Increasing an SQL-Expert’s Agility – A Real World Story

Categories: Data Warehouse Success Stories

For our customers, powering true self-service business intelligence (BI) is a major use case for Cloudera’s modern analytic database. Its flexibility and performance enables end-users to explore and iterate on data as needed, without being limited by rigid structures or IT queues. They can answer new questions and discover new insights in ways never before possible.

But it’s not just our customers that get to benefit from this technology. Here at Cloudera, we maintain our own enterprise data hub (EDH) to provide predictive support to our customers and better run our business with a consolidated view of data. While the internal platform is used heavily by Sales, Marketing, and Support, the Accounting team was still operating in a silo. That is, until Mark Brine – accounting director in our Finance Department – endeavored into an adventure of changing how things were done and turned into a change agent for his team.

Mark Brine works in Accounting, and Revenue in particular. Most, if not all, accounting systems are designed around money and which party receives an invoice is critical. While the reports on these invoiced customers are functional to address accounting needs, they were also quite limited. Especially as executives started asking tougher questions around customer satisfaction and monitoring operations to improve satisfaction and company results. Cloudera was looking to drive operations through a holistic look at end-customers – metrics such as support cases filed, revenue, churn, satisfaction, consulting projects, training, etc. However, Mark was limited by an accounting system that struggled to report on much more than just the end-customer name. Mark feared irrelevance and knew something needed to change.

At the time, Cloudera’s accounting data was stored in a well-known database with a well-known SaaS vendor. Under the covers this system holds the data in a traditional relational database. In theory, it could be queried directly with SQL, via an ODBC driver, but faced challenges in practice when running real world reports, especially as the team tried to answer these new questions. For example, a consultant was hired to build an enriched revenue report with SQL. But the ODBC driver report broke simply trying to get the existing revenue report, much less the enriched revenue report, due to the number of joins required.  

Mark opted to try something new. After a few queries across the ODBC driver that ran until Mark went home for the day, he involved the internal data team that manages Cloudera’s EDH. It ingests data from many different sources, such as marketing leads, Salesforce data, and even diagnostic data from customer deployments, to provide a consolidated view of the business and helps departments be more data-driven. The team took a look at the reports Mark wanted to run and told him, “These are easy as long as the data is in the EDH.” Mark was tired of his limited reports and Excel lookups and wanted the agility and self-service capabilities that the EDH provided. So the ODBC driver against the traditional database was discarded and Mark and team got to work.

Using the structured data ingest tool, Apache Sqoop, it was easy to connect to the well-known accounting system’s JDBC driver interface and quickly pull the structured tables into the flexible, Apache Hadoop-based platform. Once stored in Cloudera’s platform, the data was immediately available for a variety of use cases, such as batch processing with Apache Hive, high-performance BI and SQL analytics with Apache Impala, free-text search with Apache Solr, or data science with Apache Spark. And with this new flexibility and speed of a modern analytic database, the reports that stalled on the legacy system suddenly were completed in a matter of seconds. This alone was a huge win for Mark and team, but the story doesn’t end there.

To better facilitate self-service BI and exploratory analytics, Cloudera’s platform also includes HUE as the intelligent SQL editor, designed for SQL developers. HUE provided Mark with the direct access and agility he needed to go beyond the canned accounting reports and truly start discovering insights within his data.

Mark worked to reverse engineer the ingested tables, and created new and joined tables to support an additional revenue report by end-user parent, end-user company, and so on. In the end, 32 joins were used for this report. When Mark explained this new report to his colleague, they laughed and said that kind of query would be impossible to run. Using HUE (to query data via Impala), Mark pulled up the query with the 32 joins, 1.7M transaction lines, 800k transactions, and many columns and pressed run to create the 200k revenue line items in well under a minute1.

By thinking differently about his data and what’s possible, Mark and his team now have a world of data and answers at their fingertips. New reports can easily be generated and iterated on to answer the more challenging questions, all without the need to wait for IT (or a legacy system’s response times). And thanks to HUE’s query assistance and intelligent recommendations, Mark can spend his time running new queries rather than wasting cycles trying to debug or optimize “impossible” queries.

“I created my own ad-hoc data warehouse. I have very wide tables (lots of columns) but short tables (not many rows compared to a bank). It is perhaps not the first thing you think of when you hear big data, but the flexibility and possibilities are a huge value to me and let me be more self-sufficient and productive. It, of course, helps that I work at a data-driven company that encourages us to use our own product and to try new things.”

–Mark Brine, Director of Revenue, Accounting, Cloudera

For Mark, this is still just the beginning. Through his self-service access, he’s joined accounting revenue data to Salesforce customer and opportunity details for even more enriched reporting. In Cloudera’s recent IPO, many of the revenue related numbers reported in the SEC filing were calculated with this accounting and Salesforce data. By modernizing with Cloudera’s platform, Mark is no longer constrained by rigid systems and, instead, is empowered to define how his team thinks about data and the impact it has to the rest of the business.

For more details on Cloudera’s modern analytic database, visit:


1This has since been reduced to seconds thanks to Impala performance improvements in recent Cloudera releases.


Leave a Reply