The Modern Data Warehouse

Cloudera continually strives to advance the state of the modern data warehouse. An environment that was often defined by its constraints is being extended and complemented by new technologies. These technologies bring analysis to data that was traditionally a poor fit for the data warehouse, and do so at a scale that wasn’t a reality for most data warehouse environments. For a long time, a data warehouse was an analysis environment that only large enterprises could afford. Now, with the introduction of solutions like Cloudera Enterprise, companies large and small can reasonably store and analyze unlimited data. I sat down with Bloor Group head analyst Eric Kavanagh in a recent podcast to discuss the evolution of data warehouse design and how Cloudera users are modernizing their analytics environments, gaining new agility and functionality beyond just the ability to scale.

“I’ve been studying data warehousing now for, I’m almost afraid to admit, about 18 years, and it really has been the lifeblood of the analytics world for decades. What we see now with the Hadoop environment is a whole different way of persisting data and then of analyzing data. You talked about constraints. I’m glad you mentioned that, because if you think about how data warehouses were designed 25 or 30 years ago, there were constraints in place that really drove the design point.” – Eric Kavanagh

The data warehouse has long been the lifeblood of the analytics world, but users are increasingly seeking ways to incorporate more data into their decision making. This requires not only more data, but often complex data that was traditionally not a fit for data warehouse platforms. As a result, organizations are rethinking their data strategy and considering how to augment and extend their current capabilities with Cloudera Enterprise. You can access the full conversation here, but below are some key takeaways you should consider.

Hadoop is More than Unlimited Storage

The promise of big data has long been to enable the collection of massive amounts of data for analysis. However, as Hadoop has evolved, users are no longer content with just expanding their storage footprint. They want to ensure that the data is fueling decision making and that their analysis is more accurate because it incorporates more data. This goes beyond scaling out current analytics to new forms of analysis, including search and advanced data discovery.

“We’re constantly evolving to try to address these more complex data types that companies need in order to really create a full picture of either a customer or a product or a support situation or a threat based on a multitude of data points.” – Sean Anderson

When we talk about the concept of the enterprise data hub (EDH), we are talking about the ability not only to store and process unlimited data, but also to do advanced data discovery and power real-time analytic applications, all in a single technology solution. While an EDH has many advantages, it requires a strategy for deciding which workloads to bring online. This is where Cloudera is focusing: helping companies build a strategy around data workload adoption.

Breaking Away From a Top-Down Schema Approach

Technologists have long touted the benefits of schema-on-read. The ability to land data of any format and worry about the relationships and schema afterwards allows for rapid ingestion and quick discovery on data, even before it is incorporated into the company’s adopted schema design. Traditional data warehouse design was based on a top-down approach in which you designed the schema around assumptions about how your queries would need to organize data, so it left little flexibility to accommodate anything that didn’t fit the database design. Pioneers in data warehouse design like Ralph Kimball first broke this rigid framework with the introduction of true dimensional modeling. Now Ralph and other data warehouse professionals are discussing how Hadoop can present opportunities to break the constraints of traditional EDW environments. Cloudera has since built on these concepts alongside Ralph in a series of videos that address Hadoop’s capabilities to provide even further flexibility.
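To make the schema-on-read idea concrete, here is a minimal Python sketch. It is only an illustration, not how Hadoop or Cloudera Enterprise implements it: the sample events, the field names, and the read_with_schema helper are all hypothetical. The point it shows is that the raw records are landed untouched, and each analysis applies its own schema at the moment it reads them.

```python
import json

# Hypothetical raw events, landed as-is. Fields vary from record to record,
# which a fixed, top-down schema would have rejected or forced into shape.
RAW_EVENTS = [
    '{"user": "a1", "action": "click", "ts": 1}',
    '{"user": "b2", "action": "purchase", "amount": "19.99", "ts": 2}',
    '{"user": "c3", "ts": 3, "referrer": "email"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto a schema chosen at read time.

    schema maps a field name to a (converter, default) pair. Fields missing
    from a record take the default; fields not in the schema are ignored.
    (Hypothetical helper for illustration only.)
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            row[field] = convert(record[field]) if field in record else default
        rows.append(row)
    return rows


# Two different questions, two schemas, the same untouched raw data.
activity_schema = {"user": (str, None), "action": (str, "unknown")}
revenue_schema = {"user": (str, None), "amount": (float, 0.0)}

activity = read_with_schema(RAW_EVENTS, activity_schema)
revenue = read_with_schema(RAW_EVENTS, revenue_schema)
```

Because the schema lives with the query rather than with the stored data, a new question (for example, one about the referrer field) needs only a new schema, not a reload of the data.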

Cloudera is Focused on Enterprise Hardening

Open source software is fueling a revolution in application development and infrastructure design. It is changing the way applications are developed. The collaborative effort of the ecosystem to mature Hadoop often brings with it a fragmentation in focus, as users solve the specific problems and use cases that power their own business. In addition, iterations are often rolled out at a feverish pace. So how can organizations embrace Hadoop with the assurance that it is stable enough to meet enterprise demands in production? This is largely why users choose Cloudera. Cloudera is making Hadoop a reality for customers in some of the most demanding scenarios. Whether it is powering an entire application that serves millions of customers at scale, uncovering life-saving predictions, or meeting the demands of regulatory compliance to ensure customer data is safe, whole industries are betting on Hadoop. Our users tell us they choose Cloudera because we are the best positioned to support them, develop the functionality that enables their use case, and offer the training their teams need to be successful.

“Whenever you think about growing and contributing to open source initiatives, many of the contributors are solving very specific problems, and they have specific needs based on the type of company they are, what type of data that they’re bringing in, and so Cloudera’s role is to really bring that and make it an enterprise reality for people running Hadoop in production.” – Sean Anderson

The flip side is ensuring that this great work is fed back into the open source community, so that we are always advancing the state of Hadoop in a way that addresses the use cases our customers are focused on solving.

Cloudera is Powering Advanced Analytics

As technology allows us to move beyond basic reporting, we need the ability to test our assumptions quickly and to look at the answers first, so we can understand whether we are asking the right questions of our data. By receiving better analytics faster, we can start to improve things like product adoption and sales forecast accuracy. This will require a new approach to model development, and as with many of our peers in application development, DevOps, and cloud infrastructure, the focus is on agile, iterative development.

“The speed of iteration is so valuable these days. Once again, if you look at the disparity between the waterfall development approach for designing software in the past to the agile approach, which is much more collaborative and immediate, where you’ve got Dev Ops guys working directly with the business, making changes every single day to mission-critical systems, again, the difference is like night and day” – Eric Kavanagh

This enables your business to be more agile. You can respond to shifts in your business in real time, and you can equip a more diverse group of analysts, beyond your data science team, with an array of tools so that they also benefit from the data. By putting reliable, full-fidelity data in the hands of multiple sectors of the business, you can ensure that everyone’s business strategy is based on a real view of the world.

“I think there’s going to be an absolute renaissance in enterprise applications that leverage analysis from environments like Hadoop.” – Eric Kavanagh

With ecosystem innovations like Impala and Navigator Optimizer, Cloudera will continue building a superior analytic database, working toward a state where we are no longer bound by constraints: an analytic database that is agile, flexible, and able to scale to meet the demands of tomorrow’s data.

Listen to the full recording here.

