In a recent webinar, “Data-Driven Customer Support”, Adam Warrington, Senior Manager of Internal Systems Engineering at Cloudera, presented the tools that our support organization uses to quickly solve customer issues. Some of the key points covered in the webinar are:
- Why Cloudera decided to invest in an EDH and build support tools on top of it
- What tools we use internally and how they help Cloudera resolve issues quickly
- What this means for our customers and why it’s important to keep innovating and building our internal tools
In this Q&A blog post, I’ll answer questions that came directly from webinar attendees.
Q1: Over time, how has the investment in support tools helped improve productivity of support engineers at Cloudera?
We think about two metrics specifically when measuring how tools have improved our support organization’s efficiency: Active Work Time and Time to Solve. Active Work Time measures how much time a support engineer actively spends diagnosing a case. We don’t simply look at time to resolution; we look at how much active work it takes to solve a case, because a support engineer might have several cases in their queue and isn’t working on one case throughout the day. Taking that into account, we’ve seen a 30% reduction in active work time when a COE uses our tool suite and a 20% reduction when customers submit a diagnostic bundle.
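To make the difference between active work and elapsed time concrete, here is a minimal Python sketch. It assumes each case has a log of work sessions with a start and an end time; that session log, the function names, and the sample figures are all hypothetical and are not taken from Cloudera’s actual tooling.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record: one (start, end) pair per stretch of hands-on work.
Session = Tuple[datetime, datetime]


def active_work_time(sessions: List[Session]) -> timedelta:
    """Sum only the time an engineer actually spent on the case."""
    return sum((end - start for start, end in sessions), timedelta())


def elapsed_time(opened: datetime, solved: datetime) -> timedelta:
    """Wall-clock time from case open to case solved."""
    return solved - opened


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction of a metric, in percent."""
    return 100.0 * (before - after) / before


# Illustrative data only: two work sessions on the same case in one day.
sessions = [
    (datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 5, 9, 30)),
    (datetime(2015, 1, 5, 14, 0), datetime(2015, 1, 5, 14, 45)),
]
active = active_work_time(sessions)               # 1 hour 15 minutes
elapsed = elapsed_time(sessions[0][0], sessions[-1][1])  # 5 hours 45 minutes
```

In this made-up case the engineer spent 75 minutes of active work on a case that was open for almost six hours, which is why the two numbers are tracked separately.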
Q2: Are validations sent only after a customer raises a case or are they proactively collected before that? If yes, how often are they sent?
A validation is a signature of a known issue that the support team has seen multiple times before. It has been turned into a rule that diagnostic bundles pass through when they are sent to Cloudera. That means every diagnostic bundle gets this treatment. Diagnostic bundles are sent through Cloudera Manager and can be sent as often as the administrator prefers (e.g. daily, weekly, or monthly).
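As a rough illustration of the idea, a validation can be thought of as a named check that every incoming bundle is run against. The sketch below is an assumption-laden toy in Python: the `Bundle` shape, the `namenode_heap_mb` key, the 1024 MB threshold, and both example rules are hypothetical and do not describe the real bundle format or Cloudera’s actual validations.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Bundle:
    """Hypothetical, simplified stand-in for a diagnostic bundle."""
    config: Dict[str, str]
    log_lines: List[str]


@dataclass
class Validation:
    """A known-issue signature expressed as a rule over a bundle."""
    name: str
    matches: Callable[[Bundle], bool]


def heap_too_small(bundle: Bundle) -> bool:
    # Hypothetical config key and threshold, for illustration only.
    value = bundle.config.get("namenode_heap_mb")
    return value is not None and int(value) < 1024


def oom_in_logs(bundle: Bundle) -> bool:
    return any(re.search(r"OutOfMemoryError", line) for line in bundle.log_lines)


VALIDATIONS = [
    Validation("small-heap", heap_too_small),
    Validation("oom-in-logs", oom_in_logs),
]


def run_validations(bundle: Bundle, validations: List[Validation] = VALIDATIONS) -> List[str]:
    """Run every rule against the bundle and return the names that fired."""
    return [v.name for v in validations if v.matches(bundle)]


bundle = Bundle(
    config={"namenode_heap_mb": "512"},
    log_lines=["java.lang.OutOfMemoryError: Java heap space"],
)
fired = run_validations(bundle)
```

Because the rules are just a list, adding a new signature after the support team sees an issue repeatedly means appending one more entry; every subsequent bundle is then checked against it automatically.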
Q3: How much infrastructure investment have you made to support all of the support tools?
Let’s start with hardware: we have a 40-node cluster, in-house, that runs within our own data center and powers our support tools. We’ve expanded this cluster to include other data sources, to power other business needs, but it is mainly used by the support team. We have edge nodes that applications run on, to maintain high availability for certain applications like CSI and custom processing infrastructure. Our support lab environment is 20 physical servers, with 128GB of RAM each, that are able to host a number of virtual machines to help our support engineers solve customer issues. This isn’t even considering the talented people we’ve invested in: hiring dedicated support engineers for Apache Hadoop, hiring project managers, and creating management structures has been one of our most critical investments. All the Hadoop support tools in the world would mean nothing without dedicating these engineers to our customers.
Q4: Are you going beyond validation signatures to do Predictive Analytics to predict Failure?
Validation signatures were the first step, but they only cover issues for which we already know the answer. To get beyond known issues, we are starting to explore anomaly detection in log files; this can be as simple as comparing log files of past support tickets to current diagnostic bundles. The issue here, beyond the technical hurdle of finding these anomalies, is knowing whom to escalate them to, as we don’t want our customers bombarded with false positives or non-concerning issues. We’re still developing the process and tools, and there is a lot of work left to do here.
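One simple way to read “comparing log files of past support tickets to current diagnostic bundles” is sketched below in Python. It masks numbers so that lines differing only in IDs collapse into the same template, then scores the overlap between a current bundle and each past ticket with Jaccard similarity. This is an illustrative assumption, not the approach Cloudera has built; the ticket IDs, log lines, and the 0.5 threshold are all made up.

```python
import re
from typing import Dict, List, Set, Tuple


def template(line: str) -> str:
    """Mask hex values and numbers so only the message shape remains."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<N>", line).strip()


def templates(lines: List[str]) -> Set[str]:
    return {template(line) for line in lines if line.strip()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_tickets(
    current_lines: List[str],
    past_tickets: Dict[str, List[str]],
    threshold: float = 0.5,  # hypothetical cut-off to limit false positives
) -> List[Tuple[str, float]]:
    """Return past tickets whose logs resemble the current bundle, best first."""
    current = templates(current_lines)
    scored = [
        (ticket_id, jaccard(current, templates(lines)))
        for ticket_id, lines in past_tickets.items()
    ]
    hits = [(ticket_id, score) for ticket_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Illustrative data only.
past = {
    "T-1": ["ERROR disk 12 full", "WARN retry 3"],
    "T-2": ["INFO started"],
}
current = ["ERROR disk 99 full", "WARN retry 7"]
matches = similar_tickets(current, past)
```

The threshold is where the escalation concern shows up in code: set it too low and customers hear about non-issues, set it too high and real matches are missed.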
Q5: In terms of Support tools & engineering, how far ahead is Cloudera compared to the rest of its competitors in the Hadoop vendor space?
While the competitive space is always important to consider when operating a business, our support team has only one goal in mind: drive the time to resolution down, to help our customers solve problems quickly, while also focusing on high customer satisfaction. Our customers know that Cloudera has the fastest and most secure platform for their needs, so we work hard to make deploying, maintaining, and improving their clusters as easy as possible. As long as we keep our focus on making things easy for our customers and asking ourselves “What are we doing for our customers now?”, we know we will continue to lead the market.
In case you missed any part of the webinar, the full recording is now available.