Data is capital. But it’s often hard to know ahead of time which algorithm, analytic or app will create the most value from a particular piece of data. And as the variety of both data capture and use increases, it’ll just get harder. Companies will increasingly rely on data liquidity – the ability to get the data you want into the shape you need with minimal time, cost and risk.
Transforming data from one shape into another is nothing new. But this work – ETL – often runs on dedicated servers or on the same hardware running the data warehouse. Moving it into an Apache Hadoop environment lowers the cost and often improves the overall performance.
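To make "transforming data from one shape into another" concrete, here is a minimal sketch of an extract-transform-load step using only the Python standard library. The sales records, column names and the `warehouse` dictionary are hypothetical stand-ins for illustration; a real ETL job on Hadoop or a data warehouse would use that platform's own tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw feed: one row per sales transaction (long shape).
RAW = """store,product,units
north,widget,3
north,gadget,5
south,widget,2
north,widget,4
"""

def extract(text):
    """Read the raw feed into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Reshape to one entry per store with total units per product (wide shape)."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row["store"]][row["product"]] += int(row["units"])
    return {store: dict(products) for store, products in table.items()}

def load(table, target):
    """Write the reshaped data into the target store (a plain dict here)."""
    target.update(table)
    return target

warehouse = load(transform(extract(RAW)), {})
```

The same records now answer a different question (units per product per store) without touching the source feed, which is the essence of getting data into the shape you need.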
However, this is only the most basic way to increase a company’s data liquidity. The real power comes from making big data management, integration and analytics, whether based on Hadoop, NoSQL or relational databases, work seamlessly as a single system.
Take a large wireless network operator. It captures phone configuration data in a large Cloudera Enterprise Hadoop cluster on Oracle Big Data Appliances. It also captures data about every call hand-off between cell towers in a large Oracle Database 12c data warehouse on Oracle Exadata machines. With Oracle Big Data SQL, the wireless company can write a single query to ask which phone configurations have the fewest dropped calls. A query optimizer franchises pieces of the query to execute locally where the data lives, so the analyst doesn’t need to know which data is in which system. The two systems act as one.
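The idea of one query spanning two stores can be sketched in miniature. The example below is not Oracle Big Data SQL; it uses Python's built-in `sqlite3` module, with two separate in-memory databases standing in for the Hadoop cluster and the data warehouse. The table names, columns and sample rows are all assumptions made for illustration.

```python
import sqlite3

# Two separate in-memory databases stand in for the two systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

# "main" plays the Hadoop-side store of phone configurations (hypothetical schema).
conn.execute("CREATE TABLE main.phone_config (phone_id INTEGER, config TEXT)")
conn.executemany(
    "INSERT INTO main.phone_config VALUES (?, ?)",
    [(1, "fw-1.0"), (2, "fw-1.0"), (3, "fw-2.0"), (4, "fw-2.0")],
)

# "warehouse" plays the relational store of call hand-offs (hypothetical schema).
conn.execute("CREATE TABLE warehouse.handoff (phone_id INTEGER, dropped INTEGER)")
conn.executemany(
    "INSERT INTO warehouse.handoff VALUES (?, ?)",
    [(1, 1), (1, 0), (2, 1), (2, 1), (3, 0), (3, 0), (4, 1), (4, 0)],
)

# One query joins both stores; the analyst writes ordinary SQL.
QUERY = """
SELECT c.config, SUM(h.dropped) AS dropped_calls, COUNT(*) AS handoffs
FROM main.phone_config AS c
JOIN warehouse.handoff AS h ON h.phone_id = c.phone_id
GROUP BY c.config
ORDER BY dropped_calls ASC, c.config ASC
"""
results = conn.execute(QUERY).fetchall()
for config, dropped_calls, handoffs in results:
    print(config, dropped_calls, handoffs)
```

In this toy version a single engine does all the work. In the scenario described above, the difference is that pieces of the query run locally in each system, next to the data.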
Now picture a consumer packaged goods company that wants to know how it can improve new product launches. This is a general problem, not a specific pre-determined question. The factors that affect how many units of the new product it’ll sell include characteristics of the product, the customers, the stores, even the overall economy. Plus, the best data on how the launch is going comes from the sales transactions of the launch itself. But to use it, you have to act fast.
This is where discovery tools such as Oracle Big Data Discovery, which examine data landing in Hadoop from enterprise and third-party systems, make such a difference. They offer visual interfaces for finding data sets in Hadoop, profiling and transforming them, and then exploring the newly created mash-up. This is data liquidity brought to its logical conclusion: meeting spontaneous demand for fast answers to entirely new questions. That is only possible if analysts can immediately and easily get the data they want into the shape they need for the question at hand.
One last example of data liquidity comes from data science. Statistical packages, like open-source R, become even more powerful when their scripts run on highly parallelized computing infrastructure. This could be either a Hadoop cluster or a data warehouse, depending on where the data lives.
Now, take it one step further and add a production system that runs algorithms based on the data scientists’ scoring models to check for fraud, make cross-sell recommendations, or manage robotic inventory systems. Here, the key is not only to get the necessary data into the proper shape to re-fit the algorithm’s model, but to get the new, improved algorithm into production as quickly as possible.
These techniques can be used simultaneously. One of the wonders of data capital is that, unlike financial capital, a given piece of data can be used in multiple algorithms, analytics, and apps at once. A single big data environment that unites Hadoop, NoSQL, and relational technologies reduces the time, cost and risk of repurposing data, helping companies create additional value from data capital.
Paul Sonderegger is Oracle’s big data strategist. He works closely with executive teams in large organizations to help them understand how data capital changes the way companies compete. He also helps Oracle’s big data product teams use real-world examples to influence future product directions. Sonderegger is a highly sought-after speaker on the intersection of competitive strategy and data capital, and is a contributing author at Forbes.com and DataInformed. Prior to joining Oracle, Sonderegger was chief strategist at Endeca, a discovery analytics company. Before Endeca, he was a principal analyst at Forrester Research, specializing in search and user-experience design. Sonderegger has a BA from Wake Forest University.