CDH 5, Cloudera’s software distribution containing Apache Hadoop and related projects, and Cloudera Manager 5.0 are now both generally available. It’s common in these sorts of announcements to review “what’s new.” In our fast-paced industry, I think it’s worth talking about “what’s new” from two vantage points.
CDH 4.0 was our last major release, which became generally available a little more than 18 months ago. Looking at 5.0 from the perspective of a 4.0 user, the advances would seem unbelievable. Contrasting CDH 4.0 with 5.0, we can really see the evolution of the platform from a Hadoop distribution to an enterprise data hub.
- Hadoop is now open for business when it comes to mainstream BI users. SQL is 10-100X faster, with low-latency (under 2 seconds) response times and good support for the ANSI SQL standard.
- Hadoop data can power search applications for business users. Any file or object can be indexed and searched in near-real time.
- Data scientists no longer have to choose between running their favorite tools and running their models at scale. SAS, Revolution Analytics and other popular predictive analytic tools can run natively inside a Cloudera enterprise data hub.
- Jobs run faster. Thanks to Apache Spark, data processing jobs can now run 5-100X faster with some or all phases of the jobs running in memory.
- Developers are happier and more productive. Also thanks to Spark, the development experience for creating data processing jobs is strikingly more usable. Jobs take 1/5th the number of lines of code they used to and can be tweaked using an interactive interpreter.
- Operators sleep easier. The entire platform has full disaster recovery capabilities as well as automated backup and restore tooling.
- Managers can feel (just) a little less paranoid. The platform now supports role-based access control, and users can secure collections of tables or columns or ranges of rows.
- Data in the system is taggable, searchable and has full-fidelity lineage, down to the column and calculation.
- Now with streaming. The system can now process continuous streams of data in addition to the traditional batch or interactive methods.
- We can all just get along. Customers can run SQL, MapReduce and Spark workloads concurrently and dynamically schedule and share resources among them using the new YARN resource management framework. YARN also enables developers and third-party ISVs to develop new kinds of scale-out applications that run natively inside the platform.
- Even prettier. The user experience for Cloudera’s Hue 3.0 is out of this world.
For those of you who have been following along with every update and add-on, there’s also a “what’s new” comparing Cloudera Enterprise 5 with our very latest release (CDH 4.6, Impala 1.2, Spark 0.9, Search 1.2). You can find the release notes for CDH here and Cloudera Manager here.
The new capabilities of Cloudera Enterprise 5 are important, but equally important are the tools and applications that our ISV partners develop for the platform. Our commercial partner ecosystem has never been larger, and this release sets a new benchmark for collaboration between Cloudera and that ecosystem. As of today, 100 (yes, exactly 100) of our partners’ products were certified on CDH 5 prior to the general availability of the release, which is a true testament to their commitment. We are excited to continue working with our partners on certification and to help maximize the opportunity for companies to gain value from all their data.
Looking at the capabilities of the platform, the strength of the partner ecosystem and the real-world customer experiences, it makes sense to see the enterprise data hub as the natural evolution of Hadoop. By CDH 4, Hadoop was widely recognized as more flexible, more scalable and less expensive than traditional data management platforms. By CDH 5, we can also say it is faster, more functional and better integrated.
We developed this platform by listening to our customers, our partners and our user community, but how this release got built has mattered as much as what got built. Cloudera has always developed and shipped a 100% open source platform, and the benefits of that strategy have never been clearer. During the course of this release, Cloudera employees co-founded several new open source projects, including Impala, Parquet and Apache Sentry. Today, each of these projects is recognized as a leader in its category. Recognizing that our company cannot be the sole source of innovation for the industry, Cloudera also started to contribute to other existing open source projects and folded these into the platform as well, notably Apache Solr and Apache Spark. Lastly, we’ve continued to demonstrate our ability to add value on top of the open source platform for enterprises by building and enhancing licensed software tools such as Cloudera Manager, Cloudera Navigator and Cloudera BDR.
With CDH 5 now generally available, we’ll proceed with our regular cadence of quarterly updates. We will continue to update CDH 4 as well, but you’ll see the majority of enhancements added to the CDH 5 releases.
Thank you for your support of Apache Hadoop, Cloudera, and CDH!