This post was written by Neil Raden, distinguished analyst for Hired Brains Research.
In my last post, I raised the issue that for “pervasive analytics” or “the democratization of analytics” to succeed, much more than technology is required. Most prominent is the lack of training and skills among the wide audience that is expected to be “pervaded,” if you will. The shortage of “data scientists” is well documented, which is the motivation for pushing advanced analytics down in the organization to business analysts. The availability of new forms of data provides an opportunity to gain a better understanding of your customers and business environment, which implies a need to analyze data at a level of complexity beyond current skills, and beyond the capabilities of your current BI tools.
Much work is needed to develop realistic game plans for this. In particular, our research at Hired Brains shows that there are three critical areas that need to be addressed:
- Skills and Training: A three-day course is not sufficient; organizations need to make a long-term commitment to guiding analysts and to ongoing training
- Organizing for Pervasive Analytics: Existing IT relationships with business analysts need reconstruction, and senior analysts and data scientists need to take on the roles of governance, mentoring and vetting
- Vastly upgraded software from the analytics vendors: In reaction to this rapidly unfolding situation, software vendors are beginning to provide packaged predictive capabilities. This raises a whole host of concerns about casually dragging statistical and predictive icons onto a palette and almost randomly generating plausible output that is completely wrong
Skills and Training
Of course it’s unrealistic to think that existing analysts who can build reports and dashboards will learn to integrate moment generating functions and understand the underlying math behind probability distributions and quantitative algorithms. However, with a little help (a lot actually) from software providers, a good man-machine mix is possible where analysts can explore data and use quantitative techniques while being guided, warned and corrected.
A more long-term problem is training people to be able to build models and make decisions based on probability, not a “single version of the truth.” This process will take longer and require more assistance from those with the training and experience to recognize what makes sense and what doesn’t. Here is an example:
The chart shows a correlation between a stock market index and the number of times Jennifer Lawrence was mentioned in the media. Not shown, but the correlation coefficient is a robust 0.80, which means the variables are tightly correlated. Be honest with yourself and think about what could explain this. After you’ve considered a few confounding variables, did you notice that both are slowly rising time series? The shared upward trend is what’s correlated, not the phenomena themselves.
The point here is that one doesn’t need to understand the algorithms that expose this spurious correlation; one just needs enough experience to know that the time-series trend has to be filtered out first.
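The trap described above is easy to reproduce. The sketch below (a hypothetical illustration, not from the original post) builds two series that have nothing to do with each other beyond a shared upward drift. Correlating the raw values produces an impressively high coefficient; correlating the first differences, which removes the trend, shows the relationship evaporate.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(42)
n = 100

# Two unrelated series whose only common feature is an upward trend,
# standing in for "stock index" and "media mentions".
stock_index = [t + random.gauss(0, 3) for t in range(n)]
media_mentions = [0.5 * t + random.gauss(0, 3) for t in range(n)]

raw_r = pearson(stock_index, media_mentions)   # dominated by the shared trend

# First-difference each series to strip the trend, then correlate again.
d_stock = [b - a for a, b in zip(stock_index, stock_index[1:])]
d_media = [b - a for a, b in zip(media_mentions, media_mentions[1:])]
detrended_r = pearson(d_stock, d_media)

print(f"raw correlation:       {raw_r:.2f}")   # high, but spurious
print(f"detrended correlation: {detrended_r:.2f}")  # near zero
```

Differencing is the simplest detrending device; the same point holds for regression residuals or any other trend-removal technique.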
The fact is that making statistical errors is far more insidious than spreadsheet or BI errors when underlying concepts are hidden. Turning business analysts into analytical analysts is possible, but not automatic.
Consider how actuaries learn their craft. Organizations hire people with an aptitude for math, demonstrated by doing well in things like Calculus and Linear Algebra, but not requiring a PhD. As they join an insurance or reinsurance or consulting organization, they are given study time at work to prepare for the exams, and have ample access to mentors to help them along because the firm has a vested interest in them succeeding. Being an analyst in a firm is a less extensive learning process, but the model still makes sense.
Organizing for Pervasive Analytics
(aka How should organizations deal with DIY analytics?)
We’re just beginning our research in this area, but one thing is certain: the BI user pyramid has got to go. In many BI implementations, the work of creating datasets fell onto the shoulders of the BI team, a handful of “power users” worked with the most useful features of the toolsets, and the remainder of the users, dependent on the two tiers above them, generated simple reports or dashboards for themselves or their departments. Creating “Pervasive BI” would have entailed dead-lifting the “business users” into the “power user” class, but no feasible approach was ever put forward.
Pervasive analytics cannot depend on the efforts of a few “go-to guys”; it has to evolve into an analytically-centered organization where a combination of training and better software can be effective. That involves a continuing commitment to longer-term training and learning; governance, so that models developed by business analysts can be monitored and vetted before finding their way into production; and a wholesale change to the analytics workflow: where do these analyses go beyond the analyst?
Expectations from Software Providers
Pre-packaged analytical tools are sorely lacking in advice and error catching. It is very easy to take an icon and drop it on some data, and the tools may offer some cryptic error message or, at worst, the “help” system displays 500 words from a statistics textbook to describe the workings of the tool. But this is 2015, and computers are a jillion times more powerful than they were a few years ago. It will take some hard work from the engineers, but there is no reason why a tool shouldn’t be able to respond to its use with:
- Those parameters are not likely to work in this model; why don’t you try these
- Hey “Texas Sharpshooter”, you drew the boundaries around the data to fit the model
- I see you’re using a p-value but haven’t verified that the distribution is normal. Shall I check for you?
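The last of those guardrails is straightforward to prototype. The sketch below (my own illustration, not a vendor feature) computes sample skewness and excess kurtosis, two quantities that are near zero for normally distributed data, and emits exactly the kind of warning a guided tool could surface before letting an analyst lean on a normality-assuming p-value. The threshold is an arbitrary assumption for demonstration.

```python
import random

def normality_warning(data, threshold=1.0):
    """Crude guardrail: flag samples whose skewness or excess kurtosis
    is far from what a normal distribution would produce.
    Returns a warning string, or None if the sample looks plausibly normal."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3  # excess kurtosis; 0 for a normal distribution
    if abs(skew) > threshold or abs(kurt) > threshold:
        return (f"Warning: skewness={skew:.2f}, excess kurtosis={kurt:.2f}; "
                "this sample does not look normal, so a p-value that assumes "
                "normality may mislead you.")
    return None

random.seed(1)
normal_sample = [random.gauss(0, 1) for _ in range(500)]       # passes quietly
skewed_sample = [random.expovariate(1.0) for _ in range(500)]  # triggers warning

print(normality_warning(normal_sample))
print(normality_warning(skewed_sample))
```

A production tool would use a proper test such as Shapiro-Wilk, but even a heuristic this simple catches the grossest misuse, which is precisely the “guided, warned and corrected” experience argued for above.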
Neil and Cloudera continue to collaborate on the industry shift toward a pervasive analytic culture. Pervasive analytics requires a new approach to data access and to optimizing and mobilizing insight creation. Access the full whitepaper series from Neil on The Enterprise Data Hub, A Next Gen Platform.