This blog post was written by Scott Hedrick, Director, Big Data Ecosystem, Informatica.
Adoption of Apache Hadoop is increasing in large organizations, but for this data management framework to become foundational, enterprises need a way to gain visibility and track their Hadoop data.
In the early days of Hadoop, Web companies would hand-code custom data infrastructures on Hadoop, and few people beyond a handful of elite developers understood what was really happening to the data. This may still work for some companies, but enterprises and other large organizations that are increasingly subject to government regulations and internal data policies need the ability to audit their data pipelines on Hadoop in the same way they do on traditional infrastructures.
Industries from financial services and insurance to healthcare and manufacturing need to comply with data management regulations and prepare for internal and government audits of their practices, including the lineage of the data on which they base critical decisions.
The integration of Informatica Metadata Manager and Cloudera Navigator provides complete visibility into the data and processing within Cloudera’s enterprise data hub offering. Navigator is a visual tool that maps out data lineage in Hadoop, including tracking access to Hive, HDFS, Impala, HBase and Sentry.
Only Informatica and Cloudera provide end-to-end data lineage from source systems through Hadoop, and into BI/analytic and data warehouse systems, including hand-coded processes. Within Informatica, you can view the complete data movement and transformations captured by Navigator in the context of a complete data pipeline across the entire information management infrastructure.
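To make the idea of end-to-end lineage concrete, here is a minimal conceptual sketch in Python. It is not the Informatica Metadata Manager or Cloudera Navigator API; the dataset names, the `LINEAGE` dictionary, and both helper functions are hypothetical and exist only for illustration. It assumes lineage can be modeled as a directed graph in which each dataset points to the datasets it was derived from, and it walks that graph backward from a BI report to the source systems.

```python
from collections import deque

# Hypothetical lineage edges (illustration only): each key is a dataset,
# and its value lists the datasets it was directly derived from.
LINEAGE = {
    "bi_report.revenue_dashboard": ["warehouse.revenue_fact"],
    "warehouse.revenue_fact": ["hive.cleaned_orders"],
    "hive.cleaned_orders": ["hdfs:/raw/orders", "hdfs:/raw/customers"],
    "hdfs:/raw/orders": ["crm.orders"],
    "hdfs:/raw/customers": ["crm.customers"],
}


def upstream_datasets(dataset, lineage):
    """Return every dataset that `dataset` depends on, directly or
    indirectly, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


def original_sources(dataset, lineage):
    """Return the upstream datasets that have no recorded parents,
    i.e. the source systems at the start of the pipeline."""
    return sorted(
        node for node in upstream_datasets(dataset, lineage)
        if node not in lineage
    )


if __name__ == "__main__":
    report = "bi_report.revenue_dashboard"
    for step in upstream_datasets(report, LINEAGE):
        print(step)
    print("Sources:", original_sources(report, LINEAGE))
```

An auditor's question such as "which source systems feed this dashboard?" then becomes a simple graph traversal, which is the kind of question an end-to-end lineage view is meant to answer.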
In addition to the governance aspect, the end-to-end data lineage provides data scientists and analysts with greater visibility into their data sources and the history of transformations and rules that have been applied to data. This gives them a clearer understanding of their data and its veracity.
Informatica’s integration with Cloudera Navigator makes big data more accessible to large organizations by providing the production-ready capabilities, governance and risk management they need to deploy operational systems at scale with confidence. A short demo of Informatica’s metadata management and integration with Cloudera Navigator is available here.
Scott leads the Big Data partner ecosystem at Informatica. He works with key partners to bring integrated joint solutions to market and promote them to the world. Prior to Informatica, Scott drove success through marketing, ecosystems, partnerships and product leadership at companies including Nokia, Opera Software, MontaVista Software, Sun Microsystems and mobile startups. Scott helped pioneer the market for Linux and web technologies for smartphones and internet devices, such as Nintendo Wii, Sony Internet TVs and Motorola phones. He has lived and worked in Europe and Japan. Scott received an MBA in Global Business from Thunderbird School of Global Management.