The Pathway to Multi-Million Dollar IT Savings
Every 30 terabytes of new data warehouse capacity can cost an organization as much as $2M to $4M in additional investment. IT budgets for analytics have grown into the millions, even for medium-sized businesses. But companies that move to Hadoop before their enterprise data warehouses run out of capacity stand to save millions of dollars.
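To make that figure concrete, here is a rough back-of-the-envelope sketch in Python. It takes the $2M to $4M per 30 terabytes quoted above and assumes, purely for illustration, that cost scales linearly with capacity; real pricing is negotiated and rarely linear. The function name `warehouse_expansion_cost` is hypothetical.

```python
# Figures quoted in the article: each 30 TB of new capacity
# can cost as much as $2M to $4M.
BLOCK_TB = 30
BLOCK_COST_LOW = 2_000_000
BLOCK_COST_HIGH = 4_000_000


def warehouse_expansion_cost(new_tb):
    """Return a (low, high) dollar estimate for adding new_tb terabytes.

    Assumption (not from the article): cost scales linearly with capacity.
    """
    if new_tb < 0:
        raise ValueError("new_tb must be non-negative")
    blocks = new_tb / BLOCK_TB
    return blocks * BLOCK_COST_LOW, blocks * BLOCK_COST_HIGH


if __name__ == "__main__":
    low, high = warehouse_expansion_cost(90)
    print(f"90 TB of growth: ${low:,.0f} to ${high:,.0f}")
```

Plugging in your own projected growth gives a quick sense of what staying on the traditional platform might cost under these assumptions.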
Today, large organizations are in various stages along the road to Hadoop, the most cost-effective method to store and analyze data. Companies furthest along that path – those that have made the move from sandbox to limited rollout to full production – are reaping the benefits and huge cost savings.
Any IT leader who spearheads the effort to add Hadoop alongside their traditional data warehouse technologies stands to save their company millions. Businesses have focused for years on running ever leaner, so the chance to ‘save your company millions’ is rare, and it is a real opportunity for a savvy IT professional to be a hero.
Are you a potential ‘hero’ in the making? Ask yourself:
- Do I have multiple business intelligence tools running on at least one large data warehouse?
This usually means one large database of over 40 terabytes, called ‘the data warehouse’, plus a handful of other systems almost as large, which may be called ‘marts’ or ‘operational’ systems but also support analytics. If you only have a few terabytes of data, Hadoop might still be for you, but usually only if your data has the feature known as ‘variety’: unstructured text, social media data, image files or other unusual formats.
- Do I have ‘Big Data’ syndrome?
Each year, organizations say they cannot remove any data history. They end up licensing more external data, have new internal data, or acquire companies that have their own Big Data issues. Data growth continually meets or exceeds annual projections.
- Is my data warehouse continually needing more space?
Traditional systems such as appliances and large relational database platforms are performing slowly or running out of space more quickly than planned. The rapid growth means adding to these systems sooner than IT had originally projected.
- Am I wondering why my IT budget is never enough?
The cost per terabyte of traditional data warehouses seems to be falling sharply. However, organizations are spending as much as, or more than, ever on their analytic platforms.
If you answered yes to most or all of the questions above, you are not alone. In fact, you are in the majority. You have the opportunity to save your company millions. But what are the steps to get to that point? First, determine which milestone you have reached on the path to Hadoop:
Milestone 1: Your organization has downloaded Hadoop and played with it.
Milestone 2: Your team has created a Hadoop ‘sandbox’ with one or more projects running. Multiple staff access it, are learning Hadoop and associated tools, and engage in active dialogues on how it can be used in the future. Some of the team think it should only be used for ‘new projects’, while others want a broader, more economical approach.
Milestone 3: Your firm has selected a commercial vendor, licensed Hadoop, and has some plans that are firming up. Executive sponsorship (and expectations) exist.
This milestone is where a lot of companies get stuck. They still have fledgling projects and only one or two easy-to-identify candidates, chosen because those projects use ‘variety’ data that does not fit well into traditional databases. Hadoop lingers at between five and twenty-five nodes in one or two clusters. How can you stop lingering at this spot and forge ahead?
Milestone 5: You have grown to a 200-node, multi-cluster system. Core analytical systems have been migrated to Hadoop. These Hadoop systems are in production, and payments for analytical platforms have dropped for the first time in recent years. Inside the company, one name, maybe two, is being singled out for praise by executives. Your path is clear to move ahead and succeed in the organization.
What did the companies that made the jump from Milestone 3 to Milestone 5 do? What is Milestone 4?
Milestone 4: Your organization is migrating existing structured data from appliances and traditional systems over to Hadoop. IT teams conduct an in-depth review of the most expensive analytical systems and re-platform them to Hadoop at one fifth to one tenth of the cost. During the analysis, most companies find that the most expensive platforms are those hosting structured data with a lot of history: the data warehouse and large marts.
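As a rough illustration of what ‘one fifth to one tenth of the cost’ means in dollars, the short Python sketch below turns a current platform cost into a savings range. The function name `replatform_savings` and the $10M example figure are hypothetical, and the sketch ignores migration effort, staffing and any period of running both platforms in parallel.

```python
def replatform_savings(current_cost):
    """Return (low, high) estimated savings from re-platforming to Hadoop.

    Uses the article's range: the Hadoop platform costs between one fifth
    and one tenth of the system it replaces. One-off migration costs are
    not modeled (an assumption made to keep the sketch simple).
    """
    if current_cost < 0:
        raise ValueError("current_cost must be non-negative")
    hadoop_cost_high = current_cost / 5   # least favorable case
    hadoop_cost_low = current_cost / 10   # most favorable case
    return current_cost - hadoop_cost_high, current_cost - hadoop_cost_low


if __name__ == "__main__":
    low, high = replatform_savings(10_000_000)  # hypothetical $10M platform
    print(f"Estimated savings: ${low:,.0f} to ${high:,.0f}")
```

Running the same calculation against each of your most expensive analytical systems is one simple way to rank candidates for migration.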
Companies usually discover that a typical Hadoop environment that replicates a third to half of the storage of their traditional platforms requires somewhere between 100 and 200 nodes. Interestingly, this is why many firms that have made the move to Hadoop jump from environments with 5 to 20 nodes straight to 100 to 200 nodes or more, without a lot in between.
In How I Saved My Company Millions: Part Two, we’ll discuss the steps to get to Milestones 4 and 5, how to start saving now, and ways to get there without a lot of drama.