At Cloudera, we spend most of our day talking about data. Different types of data, different ways to process it, enrich it, analyze it, track it, secure it, etc. For a good chunk of our existence, the data our customers cared most about lived in a data center within their four walls. That’s not the case anymore, and it hasn’t been for a couple of years. Most of the new data we’re working with is generated in the cloud and continues to live there.
A recent Cisco survey suggests that cloud workloads will more than triple between 2014 and 2019. Meanwhile, Gartner predicts that at least 70 percent of application development projects will be deployed in the cloud by 2020. So while cloud use has been rising steadily in our customer base and in enterprises as a whole, we’ve really only begun to scratch the surface.
For most companies, running Hadoop in the cloud is no longer a question of if, but of when, how, and on what platform. The goal of Cloudera Director 2.0 – announced today – is to make it easier than ever to deploy and manage your clusters on your terms. At the end of the day, what matters to our customers is not where the data lives, but whether they’re getting value from it – which means delivering the same fast, easy, and secure enterprise Hadoop platform across any environment.
That’s why we work closely with our customers and cloud partners to ensure we’re delivering technology that enables Cloudera Enterprise to run across cloud environments – taking advantage of all the benefits the cloud has to offer – while providing the same reliable experience customers expect from running Cloudera on-premises.
While customers like FINRA (Financial Industry Regulatory Authority), Airbnb, and Adecco have very different data strategies, requirements, and business approaches, they have all embraced Hadoop in the cloud at scale to address their respective big data challenges. Across their deployments, along with dozens of others, we have seen common patterns in the workloads that benefit most from the flexibility of the cloud. By and large, the primary use cases for Hadoop in the cloud fall into three categories:
- ETL and Modeling
- Business Intelligence (BI) and Analytics
- Application Delivery
Cloudera Director was developed to simplify the experience of running these common workloads in the cloud, providing one interface to deploy and manage all your clusters, across all your environments. The release of Cloudera Director 2.0 builds on this idea and includes new automations that reduce operating costs, critical enterprise-grade capabilities and troubleshooting necessary for production, and customized configurations to quickly get started with common workloads.
For transient workloads (common for ETL and modeling), Cloudera Director 2.0 makes it easy to provision clusters and resources on a per-job basis. Through automated job submissions, administrators can define when ETL workloads need to run, and Cloudera Director will spin up the cluster, run the job, and then terminate the cluster, ensuring the business pays only for what it uses. For further cost savings, this release also adds spot instance support for AWS and preemptible instance support for Google Cloud Platform (GCP).
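To make that lifecycle concrete, here is a minimal sketch of what a client configuration for a transient, spot-backed ETL cluster might look like. All names, values, and field spellings below are illustrative assumptions rather than exact Cloudera Director syntax; the reference configuration files that ship with Cloudera Director are the authoritative source.

```
# Illustrative only -- field names and values are assumptions,
# not exact Cloudera Director configuration syntax.
name: transient-etl

provider {
    type: aws
    region: us-east-1
}

instances {
    spotworker {
        type: m4.xlarge
        # Bid on spare capacity to cut compute costs (AWS spot /
        # GCP preemptible); the cluster is terminated after the job.
        useSpotInstances: true
        spotBidUSDPerHr: 0.10
    }
}

cluster {
    products {
        CDH: 5
    }
    services: [HDFS, YARN, SPARK_ON_YARN, HIVE]
    workers {
        count: 10
        instance: ${instances.spotworker}
    }
}
```

A scheduled workflow can then bootstrap the cluster from a file like this, submit the ETL job, and terminate the cluster when the job finishes, so compute is billed only for the job’s duration.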
Data engineers and data scientists can also work on their data directly where it lives, without first moving it to HDFS. The recent release of Cloudera Enterprise 5.5 introduced support for Apache Hive and Apache Spark on Amazon S3, so users can continue to use best-of-breed processing tools regardless of where the data resides.
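For example, with S3 support a Hive user can define an external table over data that stays in S3 and query it in place. The bucket, path, and schema below are hypothetical:

```sql
-- Hypothetical table over log data left in place on S3 (s3a:// connector).
CREATE EXTERNAL TABLE web_logs (
  event_ts STRING,
  url      STRING,
  status   INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3a://example-bucket/web-logs/';  -- bucket name is an assumption

-- Queries run against S3 directly; nothing is copied into HDFS first.
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;
```

Credentials are supplied through the cluster’s Hadoop S3 configuration rather than in the query itself.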
Powering BI and Analytics workloads requires the ability to quickly scale the cluster to support low latency, high-concurrency access for all business users. Building on the elastic capabilities already available in Cloudera Director, the latest release adds cluster cloning and repair. These capabilities make it easy to scale out to support varying usage and repair issues without disrupting the business. Additionally, with the recently announced RecordService (available in beta), Cloudera Enterprise is the only Hadoop platform able to support secure, multi-tenant access to all users analyzing data in Amazon S3 or other Hadoop storage options.
For long-running Application Delivery workloads, Cloudera Director 2.0 has integrated the critical enterprise functionality of high availability and Kerberos configuration into the overall bootstrap workflow, making it seamless to set up. To further integrate and protect these applications, the addition of external database connectors makes it easy to connect to the rest of the business and to Cloudera Enterprise’s Backup and Disaster Recovery.
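As a sketch, pointing cluster services at an existing external database might look like the following in a Director-style configuration file. The block and field names here are illustrative assumptions, not exact syntax:

```
# Illustrative only -- an external database server registered for use
# by cluster services (e.g., the Hive metastore), instead of an
# embedded database that lives and dies with the cluster.
databaseServers {
    metastoredb {
        type: mysql
        host: mysql.example.internal
        port: 3306
        user: director
        password: REDACTED
    }
}
```

Keeping this state outside the cluster is what makes long-running applications easier to back up and recover.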
At Cloudera, we remain intensely focused on helping our customers gain business value from all of their data. That means we have to deliver a seamless user experience whether your data is stored on-prem, in Azure, AWS, GCP, or any other public or private environment. As your cloud footprint continues to grow, we encourage you to think about how this new generation of data can drive your business – whether it’s designing better products, providing greater customer service, or simply cutting costs.
- For a technical look at Cloudera Director, check out the engineering blog
- Cloudera Director is available as a free download for use with CDH and Cloudera Enterprise
- Register for our webinar to learn how to deploy Cloudera Enterprise on Microsoft Azure