A key benefit of Apache Hadoop is the platform’s ability to support multiple frameworks for working with shared data, so that users across a company – from developers to business analysts to data scientists, and anyone in between – can use the right tool for their workload and their skillset. With Cloudera Enterprise, all of these users get the fastest performance for their workloads with best-of-breed technologies. The platform is easy to manage even as it scales out to support more and different use cases, and compliance-level security is built in. The new release, Cloudera Enterprise 5.7, advances each of these three key areas (performance, manageability, and security) so enterprises can fully embrace the platform across the business.
Faster across workloads
As part of this release, we have added support for Hive-on-Spark. When it comes to ETL development and batch processing, Apache Hive remains the de facto tool with its familiar SQL-like language and large-scale adoption across enterprises. Hive traditionally leverages MapReduce as its underlying execution engine. However, Apache Spark provides a number of advantages compared with MapReduce – including easier development, greater flexibility, and faster processing – which is why it’s poised to succeed MapReduce as the standard data processing engine for Hadoop. The One Platform Initiative is the roadmap for completing this transition, focused on better uniting Spark and Hadoop and ensuring Spark can meet all enterprise requirements. This release marks a critical milestone toward this goal.
With Hive-on-Spark, data engineers can seamlessly transition existing and future Hive workloads to Spark and take advantage of its faster processing, with an average 3x performance improvement. To further minimize business disruption, partners including BMC, ClearStory Data, Elastic, NGDATA, Solix, Trillium Software, Zementis, and others are working with Cloudera to certify their solutions, so customers can continue to use the leading data integration and preparation tools while taking advantage of this latest technology.
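As a rough illustration of what that transition can look like in practice, the sketch below prepends an engine selection to an existing HiveQL script. It assumes the engine is chosen per session through the Hive property hive.execution.engine (with mr for MapReduce and spark for Spark); the helper name with_execution_engine and the web_logs table are hypothetical, and how a given cluster exposes this setting may differ.

```python
# A minimal sketch, not Cloudera's tooling: it only builds the HiveQL text.
# Assumption: the session-level property hive.execution.engine selects the
# engine, with "mr" meaning MapReduce and "spark" meaning Spark.
VALID_ENGINES = {"mr", "spark"}


def with_execution_engine(hiveql: str, engine: str = "spark") -> str:
    """Return the HiveQL script with an engine-selection statement prepended.

    The query text itself is left unchanged, which is the point of the
    transition: the same Hive workload runs on a different engine.
    """
    if engine not in VALID_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return f"SET hive.execution.engine={engine};\n{hiveql.strip()}\n"


if __name__ == "__main__":
    # Hypothetical existing ETL query; the table name is illustrative only.
    query = "SELECT page, COUNT(*) AS views FROM web_logs GROUP BY page;"
    print(with_execution_engine(query))
```

The resulting script can be submitted through whatever Hive client is already in use; passing engine="mr" instead returns the same workload to MapReduce.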
For most enterprises, though, data processing isn’t the end goal. Once data is processed, it can be opened up to other departments and users for business intelligence or other analytics to discover new insights. And, of course, with Cloudera and Hadoop, this can all be done within the same platform. Apache Impala (incubating) is the fastest analytic SQL engine for Hadoop and supports high-concurrency data access for business analysts across the company. With Cloudera 5.7, we continue to widen Impala’s performance leadership, even as it scales to support hundreds of users, with an average 2x performance improvement compared to earlier versions.
As the platform supports more users, workloads, and applications, it’s important that each has the right resources available to run its jobs and meet SLAs. With Cloudera Manager, administrators can dynamically manage resources for these user groups based on workload priorities, day of the week, time of day, or other business needs. With Cloudera 5.7, these administrators can now get full visibility into historical usage and efficiency across users, tenants, and applications. This new Cluster Utilization Reporting feature supports efficient operations and proper resource allocation between groups and workload types based on what’s actually being used. This automatic reporting also helps verify that SLAs are being met, simplifies troubleshooting of job and query performance issues, and enables better capacity planning. Check out the videos below to see how this reporting can be used for investigating YARN and Impala workloads.
Cloudera Enterprise 5.7 is now available for download at cloudera.com/downloads. For more details on what’s new in Cloudera Enterprise 5.7, register for the “Cloudera 5.7 Webinar Series” and check out the Developer Blog.