Fall is a special time of year for North American professional sports. Autumn gives us the MLB playoffs, the start of the NHL and NBA seasons, as well as the heart of the NFL season. It’s a terrific time to be a fan (particularly if your favorite team happens to have a cyborg named Madison Bumgarner).
While the fall gives us a buffet of sports viewing to choose from, the juxtaposition of the analysis available for each sport has grabbed my attention recently. Why does baseball have an endless set of statistics for players and teams? If each NFL snap has 22 actors, why does each play have only a small handful of data points recorded? Why is it difficult to tell the extent to which NHL MVP Patrick Kane’s stats were inflated because teammate Jonathan Toews’s line drew top defensive pairings? Why haven’t we seen a Billy Beane, Bill James, or Nate Silver in the NBA?
The answer isn’t obvious at first, but it comes down to the simple concept of measurability, which is broadly affected by a) how discrete the objectives are, b) what percentage of play is “on-the-ball”, and c) how important individual contributions are. In baseball, the most data-driven sport, we find the perfect storm for analysis. The objectives are nearly always consistent (offense avoids outs, defense pursues outs), all action happens on-the-ball (the defense can’t make a play without it), and the crux of the game is 1v1 (pitcher vs batter). That structure yields a game where nearly everything is measurable, with the challenging last frontier of individual defensive statistics quickly being addressed as teams seek competitive advantages.
Baseball stands in sharp contrast to American football. In football, the objectives vary for both offense and defense based on down, distance, score, and time. On-the-ball success may be attributed to a quarterback or running back, but that success is built on the off-the-ball work of 10 other teammates whose contributions are scarcely recorded. Pittsburgh Steelers running back Le’Veon Bell may rush for 100 yards in a game, but it’s difficult to assign fractions of that success across teammates and coaches.
The ambiguity of American football statistics has upside for the smartest teams – over the past 20 years, just 6 teams have accounted for 75% of championships. Clearly, when the data is complex, better analysis in player acquisition and on-field strategy can pay sustainable dividends. In contrast, absent a hard salary cap, the end-game of the baseball data revolution may be an efficient market where smart big-market teams outspend smart small-market teams, with the only noise being injuries and luck. For proof, look no further than Theo Epstein, former GM of the Boston Red Sox and now president of baseball operations for the NL favorite Chicago Cubs.
When I look at the broader world of business data, I see quite a few parallels to the sports world and the ability to derive competitive advantages. On-the-ball measurements are easy to track because they’re generated in a clean, consistent way. To me, that bears a similarity to internally generated business data. Whether the data relates to inventory, regional sales, manufacturing costs, etc., it’s clear and easy to measure because the data is black and white and generated on-site. At the opposite end of the spectrum, off-the-ball data reminds me of data external to the business, which is harder to incorporate into decision making and can be more complex. Market trends, social sentiment, political environment, weather, etc. would fall into this category – things that are “off-the-business”, but impact it nonetheless. As with sports, cutting-edge businesses find a way to harness this more abstract data and turn it to their advantage – such as a trucking company bringing traffic and weather data into real-time route calculations.
Another hot topic in the landscape of data management is the concept of structured and unstructured data. Structured data is like the data of a baseball team; it fits into pre-defined rows and categories and is easily measured, like profit and loss. Unstructured data refers to data that isn’t organized in a pre-defined manner. It can be social data, sensor data, media files, images, etc. Unstructured data brings large volumes of data to the table, and the analysis of it brings to mind the concept of team performance – the unstructured data tells the story of how your structured data came to be. Analysis of a sensor-enabled manufacturing line could show that a perceived weakness in one widget is actually caused by the misinstallation of another, with an error-prone manufacturing process affecting profit. Similarly, a poor-selling product that hurts revenue and affects inventory may have all the right features, but social analysis may show that something as simple as the color scheme is off. In American football, you might see the offensive line blamed for quarterback sacks when in reality receivers may not be shedding their defenders fast enough to give the quarterback a target in a reasonable amount of time. For teams and companies that can drive to the core of complex issues quickly, there is again a sustainable advantage to be had.
Rather than plot industries along these axes, I’ve chosen to plot data types because this chart can provide insight into what any company in any industry can do to start to develop a competitive advantage through data. The upper left quadrant is how business has been done for most of history – keep good records and try to optimize your business based on those records. However, as businesses move into the remaining three blue quadrants, they’ll find more and more opportunity to drive revenue, reduce costs, and mitigate risk: the business equivalent of better offense, better defense, and healthier players.
While deploying a robust data strategy might not get you a victory parade in your hometown, it will certainly give your employees and shareholders a reason to celebrate.