The Future of Data Warehousing: ETL Will Never be the Same

Last Monday, over 2,200 of you tuned in live for our webinar “The Future of Data Warehousing: ETL Will Never be the Same”, in which Dr. Ralph Kimball and Manish Vipani, VP and Chief Architect of Enterprise Architecture at Kaiser Permanente, engaged in a lively discussion of how Apache Hadoop is enabling the evolution of modern ETL and data warehousing practices.

(If you’re new to this series, be sure to first check out our other webinars with Dr. Kimball, in which he offered an introduction to Hadoop concepts for data warehouse professionals, and also an overview of data warehouse best practices for Hadoop developers.)

Picking up where those previous webinars left off, Dr. Kimball reiterated why Hadoop is having such an impact on traditional data warehousing environments: By modernizing the “back room” of data collection and preparation, it can rapidly open the door to more data, more users, and more diverse analytic perspectives than was ever possible before. Hadoop’s modular, scalable architecture presents several attractive benefits, including:

  • Performance – Meet SLAs with highly parallel distributed computing frameworks such as Apache Spark.
  • Scalability – Keep unlimited data online with predictable, linear growth.
  • Diversity – Handle any kind of structured or unstructured data without having to predefine a schema, through Hadoop’s signature “schema-on-read” capability.
  • Low Cost – Offload processing workloads or archival data from an existing data warehouse, mainframe, etc.
  • Flexibility – Go beyond SQL, using multiple programming languages and alternative techniques, such as full-text search using Apache Solr.

We put it to the audience: Which of these matters to you? Not surprisingly, the answer turns out to be “all of the above”!

Despite the benefits, over 50% of our audience had not yet begun their journey to Hadoop but were eager to learn.

Fortunately, Manish was willing to share how Kaiser Permanente overcame some of the common barriers that can hold organizations back from becoming data-driven. By taking a pragmatic, incremental approach to adopting Hadoop as their unified “landing zone”, and by using some of the advanced security and governance capabilities of tools such as Cloudera Navigator to meet their privacy and compliance requirements, the team was able to quickly roll out over 10 use cases across diverse data sets and lines of business.

Next Up: Q&A

During the webinar, many of you submitted questions (over 250!) that we were unable to answer live. Thankfully, Dr. Ralph Kimball and Manish Vipani have taken the time to address most of them. Stay tuned over the next week as we share their responses! And if you missed the live broadcast, you can catch the replay here.

