Building a Self-Managed Shared Data Experience


Cloud promises many advantages as an environment for machine learning and analytics. Cloud makes it fast and easy to spin up resources for new applications. Cloud offers elasticity of those resources to efficiently support transient analytics workloads and data pipelines. Cloud offers self-service without waiting for IT infrastructure and operations teams. Cloud service providers now offer a plethora of “house brand” analytics services, some brand new, some re-purposed versions of older technologies.

Yet there is often a dark side to using public cloud service providers’ analytics offerings, too. As each new service is started, data is typically copied in for analysis. This means cloud makes it too easy to accidentally create many redundant silos of data, one for each of those cloud analytics services. That data may be stored multiple times in different pools, multiplying storage costs with each copy. That data may be used in ways that don’t comply with appropriate security and governance policies. That data may be hard for other users and other applications to discover. That data may be hard to track for audit and compliance purposes. Worse, the metadata and context associated with that data may be lost forever if a transient cluster is shut down and its resources released.

There is a better way: one that leverages the benefits of cloud for multi-disciplinary analytics without all of those problems. At Cloudera, we call that better way the Shared Data Experience (SDX). We announced SDX at Strata New York in September 2017. Now we are releasing the reference architecture so you can build your own self-managed SDX foundation for all your cloud-based data and analytics applications. Best of all, SDX is part of Cloudera’s core platform, not an expensive add-on. You can read here about how to deploy SDX in your cloud.

Cloudera SDX gives you the best of both worlds. Self-service access to universal data in a single data store for all of your applications, not siloed into a fragmented service for each type of data science, business intelligence (BI), data engineering, or real-time operational analytics you want to do. Cloud data that is already curated with business context, without needing IT to rebuild schemas or business definitions. Cloud data that is secured and governed consistently and pervasively for all users and uses, not a Wild West of unknown dangers and risks. Cloud data that is cost-efficient, not redundantly stored in different formats. Cloud data on a universal platform, built as a whole from the beginning, that lets you be active and agile in discovering new insights. Cloud data that can efficiently drive all your machine learning and analytics initiatives. It’s time for a new approach to analytics in the cloud. It’s time for SDX.
