The Next Ten Years


Last week, I wrote about our Series F fundraising round. We raised $160M from T. Rowe Price and three other institutional investors, and from strategic investors MSD Capital and Google Ventures. That’s a lot of money, and it was a big deal for us.

Today, we’re announcing a relationship with Intel that is a bigger deal. The particulars will be covered exhaustively elsewhere, but briefly, they are:

  • Intel will replace its own Apache Hadoop-based software distribution with CDH. We’ll transition current customers and partners over a reasonable time period.
  • Intel and Cloudera will cooperate to enhance the open source projects that make up CDH so that they take advantage of the power and features of Intel’s chipset. Intel silicon powers 94% of the world’s servers today, so we expect to make life better for an enormous number of users.
  • We’ll work together to promote and distribute the software worldwide, directly and through the channels and partnerships we each bring to the relationship.
  • Intel Capital has acquired a stake in Cloudera, adding to the funds we announced last week. It’s the largest data center investment in Intel Capital’s history, and the investment makes Intel our largest strategic shareholder.

The investment, on top of the numbers we announced last week, is nice, but the commercial terms are what made this a compelling deal to us here at Cloudera.

Intel’s share, stature and reach in the market mean that it has enormous expertise, deep relationships and high-value partnerships for delivering hardware and software innovation to customers. Intel has consistently driven more function, performance and savings into its silicon. That lets software do more things better and more cheaply with every new chip Intel designs.

So far, the Hadoop ecosystem has been slow to roll support for those features into the various projects. Our collaboration allows us to speed that process up substantially. The two companies will collaborate on roadmap and implementation, delivering better software solutions faster than we could separately. Intel has already done great work here, for example improving AES performance by taking advantage of on-chip encryption. We’ll make sure those changes get into the trunk releases of the open source projects, and we’ll do much more together. Not only Cloudera’s users and customers but the open source community as a whole will benefit.

Working with Intel will also allow Cloudera to reach more customers around the world more easily. We expect to speed up adoption of our enterprise data hub offering by working creatively with Intel and its network of partners to make the software better, and to make it easier to get, deploy, operate and use worldwide.

The opportunity was just too good to pass up.

But that’s not the main thing I want to talk about today.


Google published its Google File System paper in October of 2003 and its MapReduce paper in December 2004. It’s reasonable to say that the ideas that turned into Apache Hadoop are just about ten years old today. Ten years ago, the work on storage was already out, and the computational model was under active development and measurement ahead of its end-of-year presentation.

The Hadoop project itself was formally created in 2005. Doug Cutting and Mike Cafarella founded it, and Yahoo! staffed up a team to contribute. Other web companies — notably Facebook, but likewise LinkedIn and others — got involved. They were joined by individual contributors and committers, distributed around the world.

I’ve written elsewhere of my conviction that infrastructure software has to be open source. It insulates users and buyers from vendor lock-in and bad corporate behavior. More importantly, though, good open source projects benefit from a network effect: They get useful enough to be adopted widely, attracting new contributors and committers, making them better still, driving even more adoption. That snowball gets to roll down a very long hill.

That’s the way that the Mosaic browser created the world-wide web (and, as a side effect, the Apache Software Foundation), and it’s exactly what happened to Apache Hadoop.

Over the last ten years, the original Apache Hadoop project has spawned dozens of complementary open source projects. They come with goofy names — Avro, Pig, Hive, Impala, Zookeeper, Flume, Sqoop, Spark, Hue, Accumulo and many more — but they all have a serious purpose. They extend the capabilities of the core project by making it more powerful, more useful, more accessible and more valuable to new users. All of those projects create more opportunities for contributors and committers to get involved in rolling snowballs of their own downhill.

I find Hadoop fascinating because, like Mosaic, it’s brand new. Most open source projects — Linux, MySQL and PostgreSQL, JBoss — are good, but they’re really just knocking off long-standing proprietary software products. Very rarely, a genuinely innovative open source project comes along. Even more rarely, that innovative project creates a huge commercial market.

That’s what Hadoop has done.

More precisely, that’s what the global community of contributors and committers in the Apache Hadoop ecosystem has done. Sharing their work openly, collaborating across company lines when working for competitors or contributing software and expertise as individuals, these developers have created a huge new market while cranking out some pretty great, and entirely novel, code.

The resulting market is big enough to have attracted the attention and the active participation of the very biggest vendors of silicon, servers, storage, databases, cloud infrastructure and analytic systems. Ten years ago, the Hadoop project didn’t exist. Five years ago, virtually none of those companies had heard of it. Those that had didn’t take it seriously. Today, it’s their biggest threat or their biggest opportunity. For many, it’s both.

That snap-your-head-back, force-your-face-into-a-rictus acceleration can only come from real innovation. You can only go that fast if you harness the talent of the entire planet. You have to give people the tools they need to do good work and make it simple for them to share and collaborate. Then you have to get out of their way.

Open source is the only way I know to do that. The community that makes up the global Apache Hadoop ecosystem, including especially the Apache Software Foundation, has done it here.

All this market opportunity — all this economic benefit — was created by developers working together in the open.

We’re pleased to be part of this robust and innovative global community. In the midst of our celebration here at Cloudera of the tremendous partnership with Intel and the hard work that led to it, I want to take a moment to say to all the contributors, committers, PMC members and other volunteers around the world: Very nice work, folks. We should all be proud.

What’s Next?

The pace of innovation in data management from 2004 to now has been breathtaking. I expect that the second decade of Hadoop will be more interesting still.

We don’t announce product at Cloudera before we ship, but you can certainly expect the broad trends of our recent major releases to continue. Security, performance, reliability and integration with existing IT systems have been and will remain high priorities. Our Intel relationship will certainly help there. Support for more and better applications and tools from our partner ecosystem makes the platform easier to use. That stays true. Operations, compliance and governance workloads matter hugely to our enterprise clients and we’ll further extend our offerings there.

We started Cloudera with HDFS and MapReduce. Over the years, we added Apache HBase and Apache Accumulo for NoSQL support, Impala for real-time SQL analytics, Search based on Apache Solr for faceted text search, Oryx for machine learning and analytics and Apache Spark for streaming and analytic workloads. We’ve opened the platform to third-party engines from companies like Splunk and SAS. Every one of these engines is able to deploy and run right on the cluster, against the data stored natively in HDFS. You don’t have to move it to use it.

This real-time, secure, manageable and flexible platform is our enterprise data hub offering. We’ve come a long way from plain old batch-mode MapReduce. We expect to continue that advance, adding new ways of working with data and new ways of getting value out of it. Enterprises make decades-long bets on platform technology. The enterprise data hub offers a decades-long promise of continued improvement and innovation.

We’ve long had distribution and channel agreements with industry leaders. HP, Dell and Oracle all resell Cloudera, for example. Our bet has been, from the beginning, on a rich partner ecosystem of vendors (more than 900 hardware, software, systems integrator, hosting and other companies) who create customer value with their own products and help drive adoption of ours. Our new relationship with Intel is unique, but it’s very much in keeping with that philosophy. You can expect us to remain partner-focused and to grow the Cloudera Connect program well beyond the 900 partners who support our platform today.

The Intel relationship gives us more resources to invest for the long term in product and in making our customers successful. We have more money, of course, but we also have a world-class partner with a deep shared interest in proliferation of the scale-out enterprise data hub.

Our collaboration means that customers will be able to rationalize their IT spending. They’ll be able to do existing work better, faster and cheaper. More importantly, they’ll be able to attack new workloads, analyzing more data than ever before in ways previously impossible. They’ll get value from data they couldn’t capture before.

