There has been a fair amount of confusion about Cloudera’s position on the enterprise data warehouse (EDW): in the market, as a technology, and in practice. We hope that a little education can put this confusion to rest. However, as with any disruptive technology, I anticipate that this conversation will continue and evolve as our customers identify the right platform for their unique data needs.
Over the last year or so, Cloudera has taken a strong stance, introducing the enterprise data hub (EDH) as the de facto standard for the next generation of big data management. This has caused quite a commotion within the EDW community because some people and companies have claimed that introducing the EDH means that Cloudera is trying to “kill the EDW” or something to that effect. That’s just not the case.
For starters, we have been partners with Oracle for the past couple of years, and Oracle, like pretty much every other data management vendor, sells an EDW (Exadata). In fact, Oracle also sells its own version of an EDH, the Big Data Appliance, right alongside Exadata. Why? Because Oracle understands that its customers use each platform for different workloads, depending on their needs and constraints. Similarly, we have long offered connectors to Teradata, co-developed with Teradata’s engineering team, that make it easier to deploy an EDH alongside an EDW and have the two interoperate.
EDWs and Enterprise Data Hubs
Cloudera’s vision is simply that there’s a new platform for data management, built for the next generation of data applications. Like an EDW, an EDH can be applied to similar or different workloads, depending on what the customer’s architecture looks like. There is definitely overlap: EDWs and EDHs can solve some of the same problems, but each platform can also do things the other can’t. As hundreds of customers deploy their EDHs, we continue to learn how the EDH and the EDW can and should coexist, and we will keep clarifying the strengths and weaknesses of each and which workloads suit which system.
Is the EDH a solution to all data problems? Of course not. But as I said, there is some overlap, and well-meaning people can (and do) argue about where its boundaries lie. The fact is that thousands of enterprise customers all over the world are experimenting with Hadoop and pushing it in new ways, and in many cases they are moving some data, workloads and processes from existing systems into Hadoop because it offers new capabilities at a much lower cost.
So what does Cloudera advise our customers to do with respect to EDWs?
We recommend building an EDH next to your EDW. Then, if it makes sense, you can examine the data and workloads in your EDW and decide which ones would perform better in the EDH, freeing up space and CPU cycles for the workloads that only an EDW can serve appropriately.
We have hundreds of paying enterprise customers, and many of them are doing just that. We are focused on helping our customers solve difficult data management problems using Hadoop. Now, EDWs have been around for a long time, and they are very powerful systems. In fact, many people argue that an EDW is not really a product per se but a set of business processes (Kimball-style data marts, ODSs, ETL/ELT, etc.), which we think is a reasonable way to look at it.
What do Long-Time EDW Practitioners Think?
There’s a wide range of opinion here. One view is articulated very well in a webinar we produced last month for EDW professionals, featuring Dr. Ralph Kimball, founder of The Kimball Group. In it, he did a nice job of explaining to EDW professionals what Apache Hadoop (and our EDH) actually is. He explained how Hadoop is similar to (and different from) EDWs in a variety of ways, and how he thinks EDW architects should approach Hadoop.
I highly recommend watching it (if you’re interested, you can watch it here). For those of you who don’t know Ralph, he is widely considered one of the leading thinkers in data warehousing and has written some of the seminal books on the topic. The first webinar was so popular (over 6,000 registrants) that we held a second one on May 29th for Hadoop professionals who want to better understand EDWs, essentially the mirror image of the first.
To wrap up, I’ll just repeat in case you’ve gotten this far and forgotten why you started reading:
- We do not recommend throwing out your EDW.
- We do not think EDWs are dead.
- We don’t think our enterprise data hub is a direct replacement for your EDW.
But we do think any modern enterprise can benefit from building an EDH to see how it complements its existing data management infrastructure and provides new capabilities and insights that it may have been missing.
Oh, and the EDH can also make you millions of dollars… but I will save that for a different blog post.