Big Data Economics

Categories: Enterprise Data Hub General

Analytic groups within an organization have a bewildering array of tools and technologies available to them: relational databases, search, and advanced statistical modelling, to name a few. Most of these tools were designed to make one or more types of questions easier to answer than before, yet organizations still face the same fundamental problems:

  • Analytic backlog – there is a list of analytics that need to be created, and only the high-priority ones appear to get completed. This frustrates analytic consumers, who then find other ways to solve their problems.
  • Optimized for a few question types – most Business Intelligence (BI) organizations have an array of tools, but the proficiency of the organization and its common tooling cater to a narrow set, so only the common types of questions can be answered quickly.

What if your BI organization could reduce this backlog by opening the data to more resources, while increasing the number of question types that can be answered?

Juxtaposed to the BI group is the organization as a whole. In many cases the questions it needs to answer are getting more complicated, driven by internal and external forces that may include some of the following:

  • Threats to sales – due to rapid changes in the respective markets, or because few traditional optimizations remain
  • Cyber threats – attackers change behavior as they understand and react to the traditional rules an organization puts in place to protect itself.
  • Mergers and acquisitions – the ability to identify targets and react rapidly during the M&A process, to minimize customer loss and optimize market share
  • Changes in the markets – seismic events such as Brexit, or rapid fluctuations in oil prices and/or currencies.

By mapping this conflict on a graph, it should be possible to start analyzing where the problems are and how to address them. This is shown in the chart below.


Each point on the graph represents a question, or set of questions, that can be answered, with a value given by the vertical position and the cost of answering that question given by the horizontal position. Each question also carries an associated risk of low, medium, or high, representing whether the organization can guarantee it knows the outcome. This demonstrates that high-cost, high-value questions are typically risky to answer, as the outcome cannot be guaranteed.

Two observations follow from this graph: high-value answers normally require a higher investment, but they are not guaranteed to produce the result required. For instance, a merger requires a lot of IT investment, yet it might not deliver the expected increase in revenue. And because the IT investment needed to answer these questions is high, the chances are they will not be asked at all, or they will be answered subjectively.
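The value/cost/risk trade-off above can be sketched numerically. The figures and risk discounts below are illustrative assumptions, not data from this article; the point is only that a high-value question stops being worth asking once its risk-discounted value falls below its cost.

```python
# Hypothetical sketch of the value/cost/risk graph. All numbers and the
# risk-discount factors are assumptions chosen for illustration.

RISK_DISCOUNT = {"low": 0.9, "medium": 0.6, "high": 0.3}  # chance of a usable answer

def expected_return(value, cost, risk):
    """Risk-adjusted expected return of answering one question."""
    return value * RISK_DISCOUNT[risk] - cost

questions = [
    ("routine sales report",      10,   2, "low"),
    ("churn prediction model",    50,  20, "medium"),
    ("M&A target identification", 200, 120, "high"),
]

for name, value, cost, risk in questions:
    print(f"{name}: expected return {expected_return(name and value, cost, risk):+.1f}")
```

With these assumed numbers, the high-value M&A question has a negative expected return (200 × 0.3 − 120 = −60), which is exactly why such questions go unasked or get answered subjectively; reduce the cost and the sign flips.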

Organizations are really looking for a chart like the one below, where the cost of each question is drastically reduced, allowing more questions to be answered and lowering the break-even point for each. As an illustration, one customer stated:

“I have 50+ data products that I can potentially monetize, but I know that at least 50% of them will fail”

The problem is that the customer did not know which 50% would succeed; if they did, the decision would be easy. Instead, the cost has to be spread across all the products.
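The quote above can be read as simple portfolio arithmetic: since the failures cannot be identified in advance, the winners must carry the cost of the whole portfolio, so lowering per-product cost directly lowers the revenue each winner must earn. The cost figures below are assumptions for illustration only.

```python
# Illustrative arithmetic for the "50 products, 50% will fail" quote.
# Cost figures are assumptions, not data from the article.

def breakeven_per_winner(n_products, cost_per_product, success_rate):
    """Revenue each successful product must earn to cover the whole portfolio."""
    total_cost = n_products * cost_per_product
    winners = n_products * success_rate
    return total_cost / winners

# Traditional stack: each product costs 100 units to build and run.
print(breakeven_per_winner(50, 100, 0.5))  # each winner must earn 200.0

# Shared-platform stack: per-product cost drops to 25 units.
print(breakeven_per_winner(50, 25, 0.5))   # each winner must earn 50.0
```

Cutting the per-product cost by 4x cuts the break-even per winner by the same factor, which is what makes experimenting with all 50 products viable.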


The costs we incur per analytic are:

  • Data storage costs and processing time
  • Labor, organizational, and development costs associated with building new ETL flows, etc.
  • Analytic creation

Of these, the labor and organizational costs are significantly higher than the rest. The situation is made worse because divisions and groups within an organization each want their own view of the data, normally in their own datastore. One data point on this cost: a data scientist told me that every 6 hours of data science requires 3 to 6 months of preparation.

An example of how an organization adds to this processing cost: the security team may require netflow and user-behavior information for security, while the IT organization requires the same data to determine application efficiency. The same datasets can then be used by marketing to determine whether outages in the network could have negatively impacted a customer who should be offered a goodwill coupon or offer.

By landing all the data in a single cluster where it is available to all of these users and more, it is possible over time to reduce the IT cost of a single analytic. By amortizing the IT costs across all the end users (marketing, security, IT, etc.), it should be possible to build cheaper analytics for the organization. Each group may still have its own view and will probably create derivatives specific to its needs, but the work now goes into business value, not into common processing that every group can share.
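The amortization argument can be made concrete with a small cost model. The figures below are assumptions for illustration: one shared preparation cost for landing a dataset (e.g. netflow), a small per-group cost for each group's own derivative view, and a per-analytic build cost.

```python
# Hypothetical cost comparison: siloed ingestion per group vs a shared
# landing cluster. All cost units are assumptions for illustration.

SHARED_PREP_COST = 90      # one-off cost to land and prepare a dataset
PER_GROUP_DERIVATIVE = 10  # each group's cost for its own view/derivative
ANALYTIC_COST = 5          # cost of building one analytic on prepared data

def siloed_cost(groups, analytics_per_group):
    """Each group re-ingests and prepares the same data independently."""
    return groups * (SHARED_PREP_COST + PER_GROUP_DERIVATIVE
                     + analytics_per_group * ANALYTIC_COST)

def shared_cost(groups, analytics_per_group):
    """Data landed once; groups only build derivatives and analytics on top."""
    return (SHARED_PREP_COST
            + groups * (PER_GROUP_DERIVATIVE + analytics_per_group * ANALYTIC_COST))

groups, per_group = 3, 4   # e.g. security, IT, marketing; 4 analytics each
print(siloed_cost(groups, per_group) / (groups * per_group))  # cost per analytic, siloed
print(shared_cost(groups, per_group) / (groups * per_group))  # cost per analytic, shared
```

Under these assumed numbers the per-analytic cost halves (30.0 vs 15.0 units), and the gap widens as more groups and analytics share the same landed data, since the preparation cost is paid once instead of once per group.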

To conclude: using traditional techniques, organizations remain limited in the number and types of questions they can answer, unless they interact with their analytic factory in a different way. By implementing a strategy that allows implementation costs to be amortized across multiple analytics, it is possible to fundamentally change the way the analytic team serves the business and the speed with which it can respond. This allows more questions to be answered, drastically reducing the analytic backlog and increasing the variety of questions that can be answered. Practiced consistently, this gives an organization the analytic agility to react rapidly to any threat or opportunity.

