The Open Data Platform Alliance


This morning, Pivotal and Hortonworks announced the formation of the Open Data Platform Initiative. Cloudera has elected not to join, and I’d like to explain why.

I have an engineer’s disdain for industry consortia in general, and for vendor-driven consortia in particular. Far too often, these organizations aim not at promoting, but rather at slowing, innovation in the technology industry. I am old enough to have witnessed the creation of the Open Software Foundation first-hand. The signatories were legacy vendors with legacy UNIX-variant operating systems to protect. The threat they saw in 1988 was the proliferation of free versions of UNIX, notably those based on the Berkeley Software Distribution. As matters played out, Linux — never a party to the OSF — emerged, and innovated, and crushed the proprietary UNIX market.

I learned then that code trumps cash.

Pivotal and Hortonworks claim that the ODP is driven by an industry-wide longing for standardization in the Apache Hadoop ecosystem.

I don’t believe them.

First of all, if that hunger were real, then you’d see a large collection of ISVs and customers leading the charge, not merely signing on. The fact that Pivotal played the central role suggests that Pivotal’s strategic challenges, and not those of the industry broadly, are paramount. More fundamentally, Cloudera’s partner ecosystem includes 1,447 companies at the time of this writing. We’re simply not hearing from them that they’re confused about building applications on core Hadoop.

Every vendor shipping a Hadoop distribution builds off the Hadoop trunk. The APIs, data formats and semantics of trunk are stable. The project is a decade old now, and the global Hadoop community exercises its governance obligations responsibly. There’s simply no fundamental incompatibility among the core Hadoop components shipped by the various vendors.

Of course we all ship different products. Each of us exercises our best curatorial judgment in the complementary packages and services that customers require. You see the same thing in the differences among Red Hat Enterprise Linux, SUSE Linux Enterprise and Canonical’s Ubuntu products. The market works better when vendors can exercise their best judgment, free from the oversight of vendor-driven consortia.

My biggest reservations, though, are for the community.

The software industry has been fundamentally transformed by the success of the open source development model. Talented engineers use the internet to collaborate. The price of membership in this community is creativity, talent and a willingness to bring the code. No matter where you live, regardless of your background, if you have the talent, you can join the club.

Apache Hadoop has been successful for lots of reasons. It’s a powerful new platform for data storage and analysis. It’s dramatically cheaper than systems that preceded it. Fundamentally, though, Hadoop won because it’s open source. The global community innovates faster than any single company can. Users are free to download and try the software, and to join the effort to improve it. The proprietary model for platform software development has been overtaken by open source, and you’ll never see a dominant proprietary platform emerge again. Hadoop is the latest, and best, example.

The Pivotal and Hortonworks alliance, notwithstanding the marketing, is antithetical to the open source model and the Apache way.

While the ASF is open to vendors, the ODP isn’t actually open at all. As a vendor-driven consortium, membership is only for enterprises with serious money — it ought to be called the “Only Dollars Play” alliance. The price of entry is beyond the means of precisely the people who really drive the Hadoop standard — the individual engineers who participate in the Apache projects, and who actually bring the code. Developers in Hadoop already collaborate to design and implement standards, using tried, tested, successful open source collaboration tools.

Vendors can play. They can join the global Apache development community, and bring the talent of their developers and the power of their payrolls to the party. They can argue for, and help to build, the standard inside of the Apache Software Foundation.

Pivotal has visibly stepped back from genuine participation in the Hadoop ecosystem and in the ASF generally. The ODP allows the company to continue to exert its will without actually contributing to the platform.

By the mid-1990s, the OSF had largely faded into irrelevance. Today, the influence of the Linux development community matters, and the legacy UNIX vendors have scattered. It may well be that the ODP turns out to be no more than a retrograde marketing effort, too.

Even if so, though, it’s a mistake. Our colleagues at Hortonworks are deeply involved in the open source community. They are prolific contributors to the Apache Hadoop ecosystem. Our competitive positions notwithstanding, we value their collaboration on the platform enormously. I am disappointed to see them offer their influence to high bidders who don’t understand how open source works.

