In a blog posted earlier this week, my esteemed colleague Sean Anderson laid out a powerful argument for machine learning (ML) as a way to fuel recommendation engines, churn reduction engines, and IoT workflows. Leveraging components like Apache Spark and its machine learning libraries, data scientists can design and train complex models on troves of data at unprecedented scale.
But having access to the right tools is only one piece of the puzzle. You also need to ensure your application is running on infrastructure that is optimized for the job you want to do. This is as true for ML as it is for ETL jobs, SQL analytics, stream processing, and the list goes on. If you’re working with data – something so foundational to your company and your customers – establishing the right environment is absolutely critical.
If blogs about the rate of cloud adoption in the enterprise were a food source, no one would ever go hungry. And there’s good reason for this trend (in adoption, not blogging). The cloud delivers a level of customization down to the application that’s simply unmatched in on-prem architectures. That means you can be very particular about how to set up an environment. You can optimize for transience, lower cost, elasticity, and so on.
But limitless choice and flexibility can often lead to more questions. What’s the right operating environment for my given workload? How do I need to think about data storage, security, and governance? What kind of expertise is necessary to run a big data platform in the cloud?
Whether you’re all in on the cloud today, thinking through a migration, or still in the exploration phase, it’s important to keep in mind the various application patterns and how each leverages different facets of cloud infrastructure. For example:
- Short-lived data engineering workloads like ETL can take advantage of transient infrastructure for lower cost.
- An analytic database in the cloud delivers elastic scaling and can query data directly from cloud object storage.
- Low-cost cloud storage can reduce the cost of backup and disaster recovery for operational database workloads.
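To make the first of those patterns concrete, here is a minimal sketch of the shape a short-lived ETL job takes: read raw data, transform it, write a summary, and exit so the transient infrastructure can be torn down. In practice this would run on something like Spark against cloud object storage; the field names, sample data, and logic below are purely illustrative.

```python
# Hypothetical short-lived ETL job: extract raw order events, keep only
# completed orders, total revenue per customer, and emit a summary.
# Once it exits, transient compute can be released, which is where the
# cost savings come from.
import csv
import io

# Illustrative stand-in for raw data pulled from cloud object storage.
RAW_EVENTS = """customer,status,amount
alice,completed,10.50
bob,pending,4.00
alice,completed,2.25
carol,completed,7.00
"""

def run_etl(raw_csv: str) -> dict:
    """Filter to completed orders and sum the amount per customer."""
    totals = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if row["status"] == "completed":  # transform step: filter
            totals[row["customer"]] = (
                totals.get(row["customer"], 0.0) + float(row["amount"])
            )
    return totals

if __name__ == "__main__":
    # In a real job this summary would be written back to object storage.
    print(run_etl(RAW_EVENTS))
```

The key property is that the job is stateless and finite: compute spins up, does its work, and goes away, while the durable data lives in low-cost object storage on either side of it.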
On February 9, we’re kicking off a webinar series on big data in the cloud where we will explore each of these use cases in greater detail.
And of course, since we’re talking about cloud, we need to have an on-demand asset, right? Check out our recently released white paper, “A Modern Data Platform for the Cloud.” Just beware: this paper does contain stats about cloud adoption in the enterprise.
Consider it “food for thought.”