Delivering High-Performance Analytics on the Public Cloud


It’s rare these days that a customer conversation occurs without the topic of cloud being broached. Our customers like the idea of being able to provision capacity and grow and shrink clusters on demand, as well as the convenience of not having to rack and maintain hardware. And when it comes to analytics, the cloud affords unprecedented elasticity and infrastructure availability. So when a retailer needs to look over holiday sales results or a well operator needs to determine why a drill is slowing, they can quickly provision compute resources and dig into their data.

In fact, the list of reasons why companies choose to run Hadoop on public cloud infrastructure seems to grow almost daily. But flexibility, agility, and TCO notwithstanding, probably the most obvious rationale for Hadoop in the cloud is that the cloud is increasingly the birthplace of most new and largely unexplored data.

Think about data generated from mobile devices, sensors on machines, clickstream logs, and social media. It’s all there, typically sitting in a cloud object store like Amazon S3. Unfortunately, because this data is largely unstructured and unavailable for analytics without first being moved on-premises or into another cloud data store, few companies are able to gain value from it.

Recent advances in CDH have opened up access to Amazon S3 for data engineers: first with support for Apache Hive on YARN, then Apache Spark, and most recently, Hive on Spark. These additions have enabled our customers to build data pipelines and run batch and stream processing on cloud-native cluster configurations.

The latest version of Cloudera Enterprise – released just last month – includes Apache Impala (incubating) support for Amazon S3. Impala is a high-performance analytic SQL engine fully integrated with Hadoop, and is quite simply the fastest and most cost-effective way to do BI on big data in the cloud. By enabling customers to do reads from and writes to Amazon S3 with Impala (as well as Hive-on-Spark), Cloudera is providing customers with a modern analytic database in the cloud. See this latest advancement in action below:

Think about how you’re doing analytics today:

  • Are you driving insights from all the data that’s available to you?
  • Is data movement within your cloud environment creating latency?
  • Do your analysts look at data stored within multiple cloud environments?
  • Do you have a choice in the tools you use for analytics?

Whether you’re all in on public cloud today, or thinking about how cloud might impact your future, it’s important to consider how and where you’re going to ingest, store, process, and analyze your business data. Our customers tell us that they want to open up access to all their data, so that SQL developers and business analysts can explore and query that data to gain insights that drive business impact. That’s as true today in the cloud as it is on-premises. To learn more about how to deploy Cloudera in the cloud, visit

