Leverage the Full Potential of the Cloud with the Cloud-Native Capabilities of Impala


At Strata + Hadoop World next week, you’re certain to hear a great deal about the cloud, the value it provides, and its rising popularity. At Cloudera, we have seen cloud-based big data deployments accelerate among our customers. From GoPro increasing revenue by understanding customer usage, to Schiphol Airport improving its operating efficiency with proactive machine maintenance, to FINRA lowering risk by building a holistic picture of US trading markets, cloud deployments are playing a key role across industries and mission-critical use cases. It’s only a matter of time before every enterprise evaluates the cloud as a way not only to minimize its on-premises footprint, but also to gain provisioning agility and pay-per-use elasticity for a lower total cost of ownership.

Business intelligence and analytic workloads are well suited to the cloud, especially to the flexibility and cost savings that come from decoupled storage and elastic sizing. However, traditionally architected analytic databases, even cloud-based ones such as Amazon Redshift, simply aren’t designed to support these capabilities. To tap into the full value of running these workloads in the cloud, you need a modern analytic database that brings high-performance analytic SQL to big data, including direct querying of data in cloud-native object stores like Amazon S3, and that supports elastic scale whether deployed on-premises, in the cloud, or in a hybrid of the two.

Cloudera’s analytic database is built with Apache Impala (incubating) at its core, providing cloud-native capabilities that are unmatched by any other solution available. Designed with decoupled storage and compute layers and with native support for the Amazon S3 object store, Cloudera’s platform can scale elastically to support more data, more users, and more applications as you need them. For instance, you can scale up your cluster to support higher user concurrency during prime business hours, and then scale back down over the weekend. The platform also supports transient clusters to help further optimize costs: you can spin up a cluster to run a specific report or other analytic job, and then terminate the cluster after the job completes.
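To make the scale-up and scale-down pattern concrete, here is a minimal Python sketch of a sizing rule that a scheduler could consult. Everything in it is an assumption for illustration: the function name, the worker counts, and the 8:00 to 18:00 definition of prime business hours are hypothetical, and the sketch does not call any Cloudera or AWS API.

```python
from datetime import datetime

# Hypothetical sizing values; real numbers depend on workload and instance type.
PEAK_WORKERS = 20      # weekday business hours, high user concurrency
OFF_PEAK_WORKERS = 8   # weekday evenings and nights
WEEKEND_WORKERS = 3    # minimal footprint over the weekend


def desired_worker_count(now: datetime) -> int:
    """Return the target number of worker nodes for the given local time."""
    if now.weekday() >= 5:   # Monday is 0, so 5 and 6 are Saturday and Sunday
        return WEEKEND_WORKERS
    if 8 <= now.hour < 18:   # assumed prime business hours
        return PEAK_WORKERS
    return OFF_PEAK_WORKERS
```

A rule like this only works cheaply because storage and compute are decoupled: when the data lives in S3, removing worker nodes does not require moving or re-replicating it.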

Further, Impala is the only analytic SQL engine with hybrid portability. The same powerful analytic capabilities are available not only for on-premises deployments, but also across multiple clouds, so you are never locked into a single environment. Impala supports multiple storage options, such as HDFS (including on Amazon EBS volumes), the updatable Apache Kudu data store, and the Amazon S3 object store, allowing you to get value from all of your data regardless of where it lives.
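As an illustration of querying data where it lives, the following Python sketch builds an Impala `CREATE EXTERNAL TABLE` statement whose `LOCATION` points at an S3 path using the `s3a://` scheme. The helper name, table name, columns, and bucket are hypothetical, and the helper does no identifier escaping, so treat it as a sketch rather than production code.

```python
def s3_external_table_ddl(table, columns, s3_path, file_format="PARQUET"):
    """Build Impala DDL for an external table over files already in S3.

    `columns` is a list of (name, sql_type) pairs. Inputs are assumed to be
    trusted; no escaping or validation of identifiers is attempted.
    """
    if not s3_path.startswith("s3a://"):
        raise ValueError("expected an s3a:// location")
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{s3_path}'"
    )


# Hypothetical table and bucket names, for illustration only.
print(s3_external_table_ddl(
    "sales_raw",
    [("id", "BIGINT"), ("amount", "DOUBLE")],
    "s3a://example-bucket/sales/",
))
```

Because the table is external, dropping it removes only the metadata and leaves the files in S3, which is what allows a transient cluster to be terminated without losing data.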

And while traditionally architected systems require you to adhere to rigid schemas and move data into them before querying, Impala provides much greater data agility without compromising on enterprise needs for BI and analytics. Quickly bring together more data of any type, and add new data sources, all within an open, shared platform. Data can be prepared as needed and then made immediately available for reporting without data movement, and raw data is also directly available for analysis. Beyond SQL, the shared platform lets you converge multiple applications and frameworks, so you can extend the value of your data and results to your data science teams or operationalize them as part of your real-time applications.

Impala delivers these cloud-native capabilities while maintaining the interactive performance that BI and exploratory analytics require, even at high user concurrency. In fact, recent benchmarks show Impala providing better cost-performance than alternatives (a full benchmark comparison is available on our Engineering Blog).

For more information on running high-performance analytics in the cloud, check out our talk at Strata + Hadoop World New York on Wednesday, Sept. 28th, or watch our webinar, which includes a demo of Impala running on S3.

