“Your relationship with data is changing, and your approach to data needs to change with it.”
In my role as Big Data Evangelist at Cloudera, I spend a lot of time talking to customers about data as a competitive and business-defining asset. It’s undeniable that the most successful companies – Amazon, Facebook, Apple, Microsoft, and Google – put data at the center of their business.
But desiring to be data-driven and executing on that desire are entirely different things. That’s why, when I meet with customers, I make an effort to share how modern approaches to data, rooted in open source projects like Apache Hadoop and Apache Spark, differ from how data was traditionally managed. There are myriad differences from a pure architecture perspective, to be sure, but I like to break it down along the lines of how you work with data. Namely, how you:
- Build your data asset
- Share your data securely
- Innovate with analytics
These distinctions enable businesses, people, and governments to use data in new and transformative ways and represent a tremendous opportunity for companies to improve customer insights, connect products and services, and reduce business risk.
Build your data asset
It’s time to think about data as a corporate asset in the same way we think about employees, portfolios, and property. With legacy data management systems, storing data came at a significant cost, whether measured in the hard costs of storage or in the soft costs of IT and analyst time spent on activities that didn’t contribute to top-line growth. As a result, data without clear value was discarded, and data that showed clear value often required significant transformation before it could be queried.
Today, modern platforms like Cloudera Enterprise can retain data of any type and size in its native format for as long as necessary – building a history for future use cases. This means no time or effort is lost on transforming data prior to loading it into your data lake. It also means the full fidelity of the data is available for analysis – every record, every field. So as new use cases are discovered, analysts and data scientists have access to ALL the data and can transform that data over and over again to meet their needs. Put another way, you can take action on data the moment you have it.
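To make the idea of keeping data in its native format concrete, here is a minimal Python sketch of applying structure at read time rather than at load time. It is an illustration only, not how Cloudera Enterprise is implemented: the sample events, the field names, and the helper functions (`read_with_schema`, `revenue_by_store`, `coupon_usage`) are all hypothetical.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront transformation).
RAW_EVENTS = [
    '{"ts": "2018-03-01T10:00:00", "store": "A", "sku": "123", "amount": 19.99, "coupon": "SPRING"}',
    '{"ts": "2018-03-01T10:05:00", "store": "B", "sku": "456", "amount": 5.00}',
    '{"ts": "2018-03-01T10:07:00", "store": "A", "sku": "123", "amount": 19.99}',
]


def read_with_schema(raw_lines, fields):
    """Parse raw records at read time and keep only the requested fields."""
    for line in raw_lines:
        record = json.loads(line)
        # Fields missing from a record come back as None instead of failing.
        yield {name: record.get(name) for name in fields}


def revenue_by_store(raw_lines):
    """First use case: total sales amount per store."""
    totals = {}
    for row in read_with_schema(raw_lines, ["store", "amount"]):
        totals[row["store"]] = totals.get(row["store"], 0.0) + row["amount"]
    return totals


def coupon_usage(raw_lines):
    """A later use case: count records that carry a coupon code."""
    return sum(1 for row in read_with_schema(raw_lines, ["coupon"]) if row["coupon"])
```

Because the raw records are never altered, the second question (coupon usage) can be asked long after the first one without reloading or reshaping anything.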
Share your data securely
Prior to the advent of Hadoop, businesses kept their data in the same silos where the data originated. Point-of-sale (POS) data remained in the POS system, transactional data remained in transactional systems, and log files were often not retained at all. Today organizations can break down those silos and unify their data. Our customers credit the ability to bring data into a single location with speeding their ability to bring new products to market. See how “Relay Health leverages more data in less time.”
But bringing more data, and presumably more users, to a single source of truth means that data and metadata must be protected against unauthorized access. Access must also be easy to track for auditing and compliance purposes, particularly for regulations like GDPR and HIPAA. Read how ADP, which is responsible for paying one in six Americans today, is innovating its business model with data sharing.
To take advantage of the power of data in new ways, it is imperative that organizations take a data-centric approach to business security. This reduces the risks associated with data use, provides mechanisms to ensure privacy policies meet compliance requirements, and enables users across the enterprise to access the appropriate data needed for analysis.
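As a rough illustration of what a data-centric approach can look like, the Python sketch below attaches the access policy to the datasets themselves and records every read attempt for later auditing. It is a toy model under stated assumptions, not the API of any Cloudera or Apache security component: the `POLICY` table, the role names, and `read_dataset` are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical policy: the roles allowed to read each dataset.
POLICY = {
    "pos_sales": {"analyst", "data_scientist"},
    "payroll": {"hr_admin"},
}

# In-memory audit trail; a real system would write this to durable storage.
AUDIT_LOG = []


def can_read(role, dataset):
    """Return True if the role is allowed to read the dataset."""
    return role in POLICY.get(dataset, set())


def read_dataset(user, role, dataset):
    """Check the policy, record the attempt, then return data or refuse."""
    allowed = can_read(role, dataset)
    # Every attempt is logged, including the ones that are denied.
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "dataset": dataset,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"role {role!r} may not read {dataset!r}")
    return f"rows from {dataset}"  # placeholder for the actual data
```

Logging denied attempts alongside successful ones is the design choice that matters here: an auditor can see who tried to reach which data, not just who succeeded.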
Innovate with analytics
Now that we have more data, better access, and comprehensive security and governance, we can turn our attention to innovation. We can continuously analyze data as it is captured, in real time, and take immediate action. We’re able to better understand consumers, networks, supply chains, and more, so we can look for anomalies (like fraud) and opportunities (like point-of-sale promotions).
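One simple way to picture analyzing data as it arrives is a rolling check that compares each new value with the recent past. The Python sketch below flags values that sit far from the mean of a sliding window. It is a deliberately basic technique chosen for illustration, not a description of any particular fraud system: the function name, window size, threshold, and sample amounts are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Yield (index, value) for values far from the mean of the previous `window` values."""
    recent = deque(maxlen=window)
    for index, value in enumerate(values):
        # Only score a value once a full window of history is available.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        # The value joins the history whether or not it was flagged.
        recent.append(value)


# Hypothetical transaction amounts arriving one at a time.
amounts = [20, 21, 19, 20, 22, 500, 21, 20]
flagged = list(detect_anomalies(amounts))
```

With the sample amounts above, only the 500 transaction is flagged. In a production pipeline the same check would run inside a stream processor rather than over a Python list.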
We can build machine learning models to find hidden patterns in large datasets. Machine learning is enabling businesses to build more accurate pricing models, detect network intrusions, generate real-time targeted advertising on websites, reach record sales via recommendation engine deployment, and so much more. The final stage of ML is the implementation of autonomous decision making. Check out how Airbnb’s data infrastructure supports data science and machine learning.
This analysis can now be delivered to the business through a wide range of applications: batch processing (ETL over large data sets), interactive SQL with Apache Impala, search with Apache Solr, and applications built on top of Apache HBase. That makes insights actionable for business leaders, customer support employees, field technicians, and even consumers on their devices in their native applications.
By exploring data, combining different datasets, and analyzing them in innovative ways, businesses are discovering new and valuable insights. Because the data is readily available, analysts and data scientists can dig right in and explore. They can “fail fast” and iterate without fear of falling behind on their projects.
These core changes are the foundation for the best practices laid out in Cloudera’s Center of Excellence (CoE): People and Process framework, a body of knowledge rooted in extensive experience, both in building our own Big Data projects and in collaborating with customers across the globe. To learn more, check out “The Five Markers on your Big Data Journey,” and don’t miss Cloudera Now, your fastest route to data-driven success.