Previously, we announced that the leaders in the data governance space have joined Cloudera to provide a unified foundation for open metadata and end-to-end visibility for governance. Today, we are happy to host this guest blog from Oliver Claude, CMO of Waterline Data.
In a conversation with one of our customers, who happens to be a physician, I learned that he took time out of his busy schedule to attend a SQL class. When I asked why, he explained that he did not know what data was available or what to ask for. He had kept asking IT for different data for a year, but it was never enough data, never quite the right data, and never soon enough. After two weeks of learning SQL and obtaining direct access to the data, however, he was able to explore all of it on his own, find exactly what he needed, and crack a healthcare research challenge that he’d been working on for a year.
Data self-service is revolutionizing how enterprises make decisions.
The big data trend we’ve seen unfolding over the past year is the use of the Hadoop platform as a shared service for the business. This trend is driven by data scientists and data engineers who need quick, easy, and trusted self-service access to data assets to uncover new insights and solve business problems, and it marks a major step forward in the evolution and adoption of Hadoop as an enterprise data platform.
But self-service of data creates a potential battle between business and IT for control of the Hadoop environment and the data within it. Business users might see Hadoop as a way to bypass IT and get what they need quickly and cost-effectively, while IT recognizes the nightmare scenario of Hadoop proliferation quickly turning into an unsupportable morass. To meet the requirements of both, a shared service can strike just the right balance between data-driven agility and the need for data management and data governance.
Demilitarize the battle zone between the business and IT
The only way for data architects to handle the exploding demand for data is to deploy a self-service shared data layer, because traditional governance approaches won’t work. How do you give everyone direct access to data and still comply with government regulations on data privacy? How do you let data scientists do their own cleansing and treatment of raw data and still provide quality data to everyone else? How do you keep track of data while everyone is creating new data sets? And, of course, how do you do all that for vast amounts and wide varieties of data?
With more government and industry regulations and stricter enforcement, governance is becoming a major focus for many enterprises. Our customers are moving away from the traditional approach of implementing and maintaining governance policies across heterogeneous systems that use different technologies. A shared service approach provides a central point of governance and the ability to create consistent data quality policies, track the lineage of data, and find sensitive data in all systems and handle it appropriately. Enterprises are bringing data into a Hadoop-based data hub so that it can be treated using consistent governance policies and made available to analysts and project teams in a self-service yet controlled manner.
Waterline Data Inventory and Cloudera Navigator
Waterline Data — which provides profiling, metadata discovery and a self-service data catalog — is building on its partnership with Cloudera and investing to certify with Cloudera Navigator to continue to deliver the most comprehensive data governance solution for Cloudera customers. The Waterline Data Inventory product complements Cloudera Navigator in two primary areas:
- Metadata/tagging – Waterline Data Inventory will consume as well as augment the metadata in Navigator. It can automatically profile and discover technical, business and compliance metadata in a data hub and do so at Hadoop scale. This allows large volumes of data assets to be inventoried and tagged without coding or slow manual registration processes. In particular, Waterline Data Inventory can discover sensitive data, allowing Navigator to encrypt the data, and can detect schema changes and notify downstream business users.
- Data lineage – As part of the automated discovery, Waterline Data Inventory can discover additional data lineage to augment Navigator’s data lineage. It can do this discovery “after the fact” to augment log-based data lineage, helping provide an end-to-end view of lineage.
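To make the two bullets above more concrete, here is a minimal Python sketch of what automated tagging of sensitive columns and schema-change detection could look like in principle. It is not Waterline Data Inventory's or Cloudera Navigator's actual method or API: the tag names, the regular expressions, the 0.8 match threshold, and the functions `tag_column`, `profile_dataset`, and `schema_changes` are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Set

# Hypothetical tags and patterns, chosen only for illustration.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def tag_column(values: List[str], threshold: float = 0.8) -> Set[str]:
    """Return the tags whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return set()
    tags = set()
    for tag, pattern in SENSITIVE_PATTERNS.items():
        matches = sum(1 for v in non_empty if pattern.match(v))
        if matches / len(non_empty) >= threshold:
            tags.add(tag)
    return tags


def profile_dataset(columns: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Profile every column of a data set (column name -> sample values)."""
    return {name: tag_column(values) for name, values in columns.items()}


def schema_changes(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two schemas (column name -> type name) and report the differences."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in shared if old[c] != new[c]),
    }
```

In a sketch like this, the tags returned by `profile_dataset` are the kind of metadata a catalog could hand to a governance tool so that flagged columns are encrypted or masked, and the output of `schema_changes` is the kind of signal that could trigger a notification to downstream users.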
Data governance is an integral part of deploying a Cloudera enterprise data hub as a shared service for the business. Our customers and other leading organizations have recognized that agility and self-service are a liability without data that is managed and governed. Waterline Data and Cloudera Navigator will provide a solution that automates data governance and accelerates data discovery.
Additional information is available at www.waterlinedata.com
Oliver Claude, CMO of Waterline Data
As CMO, Oliver brings years of experience in strategic sales and marketing in the Information Management space. Prior to joining Waterline Data, Oliver was VP and Chief Solution Owner for the SAP Enterprise Master Data Management portfolio. He held similar positions focusing on MDM and Data Quality as VP of Product Marketing at Informatica, and Program Director of Product Management at IBM. Oliver also served as Business Unit Executive at IBM, creating a go-to-market Center of Excellence for IBM’s MDM portfolio. After four years as a Senior Sales Engineer at Meta Software, Oliver spent five years at Siebel Systems as a pre-sales Lead Architect. Oliver holds an M.S. in Management Information Systems from NOVA Southeastern University, and completed the Strategic Marketing Management executive program at the Stanford Graduate School of Business.