Apache Kudu: Top Use Cases for Real-Time Analytics


All businesses have assets that depreciate over time. It’s intuitive that extended use of a piece of equipment decreases its value, and we see this in daily life with the cars we drive and the old electronics we resell. Data, however, isn’t typically viewed as an asset that loses value over time.

That’s wrong.

While it might not be something the accounting department can depreciate, business data loses value once the window in which it could have prompted a better course of action has passed. Whether the data tells you to perform predictive maintenance, seize a market opportunity, or prevent fraud, the window to act on it can be narrow. That’s why real-time data and analytics are critical to keeping your business operating at its full potential.

Apache Kudu, now part of Cloudera Enterprise 5.10, is a new addition to the Cloudera platform that makes running real-time analytics easier than ever before. Previously, there were two common paths to running analytics on real-time data: stitch together multiple specialized open source projects in complicated architectures, such as the lambda architecture, or pay for extremely expensive proprietary software. With Kudu, Cloudera can offer real-time analytical capabilities at the price point customers expect from open source technologies. As a result, we expect market demand for real-time analytics to grow.

Returning to the idea of data depreciating over time, a set of use cases depends on the insight delivered by real-time analytics to take action while the opportunity still exists. These use cases are time series data, machine data analytics, and online reporting.

Time Series Data

Data that arrives sequentially, with each record capturing a discrete point-in-time measurement, is known as time series data. By continuously appending the latest measurement to the historical set of points, we can see trends emerge. Kudu allows that data to be appended in real time while also supporting analytics on it. That analytical capability can shift time series data sets from post-mortem use (analyzing what went wrong after it happened) to predictive use that enables action before an adverse event occurs.

Examples: Market data streaming, internet of things (IoT), connected cars, fraud detection/prevention, risk monitoring
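To make the append-then-analyze pattern concrete, here is a minimal Python sketch. It does not use Kudu’s client API: the `TimeSeries` class, its `append` and `trend` methods, the five-point window, the temperature readings, and the alert threshold are all hypothetical. The sketch appends measurements in arrival order and fits a least-squares slope over the most recent points. A steadily rising reading can then flag trouble before a failure, which is the predictive use described above.

```python
from typing import List, Tuple


class TimeSeries:
    """Hypothetical in-memory stand-in for an append-only time series table.

    Not the Kudu client API: it only illustrates appending sequential
    measurements and analyzing the most recent ones for a trend.
    """

    def __init__(self, window: int = 5) -> None:
        self.window = window  # how many recent points the trend looks at
        self.points: List[Tuple[int, float]] = []

    def append(self, ts: int, value: float) -> None:
        # Points arrive sequentially; require strictly increasing timestamps.
        if self.points and ts <= self.points[-1][0]:
            raise ValueError("timestamps must be strictly increasing")
        self.points.append((ts, value))

    def trend(self) -> float:
        """Least-squares slope over the last `window` points (value units per time unit)."""
        recent = self.points[-self.window:]
        n = len(recent)
        if n < 2:
            return 0.0  # not enough history to estimate a trend
        mean_t = sum(t for t, _ in recent) / n
        mean_v = sum(v for _, v in recent) / n
        num = sum((t - mean_t) * (v - mean_v) for t, v in recent)
        den = sum((t - mean_t) ** 2 for t, _ in recent)
        return num / den


# Hypothetical machine temperature readings, one per minute.
ts = TimeSeries(window=5)
for minute, temp in [(0, 70.0), (1, 71.0), (2, 73.0), (3, 76.0), (4, 80.0)]:
    ts.append(minute, temp)

slope = ts.trend()
print(slope)  # 2.5
if slope > 2.0:  # hypothetical alert threshold, degrees per minute
    print("temperature rising fast: schedule maintenance")
```

In a real deployment the points would presumably live in a Kudu table and the trend would be computed by a query engine reading that table. Keeping everything in a Python list here simply makes the sketch self-contained.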

Machine Data Analytics

Machine data analytics is the analysis of the data that your network, computers, and users generate as they go about their daily business. In calm times, this information is mundane. In stormier weather, however, it can form a map that leads to bad actors, bottlenecks in your infrastructure, and potential problems with your enterprise apps. With Kudu, real-time analytics puts this map in your hands early, showing you “what’s happening” rather than “what’s happened.”

Examples: Network threat detection, network health monitoring, advanced persistent threats (APT), cybersecurity, application performance monitoring

Online Reporting

Online reporting, such as an operational data store, has traditionally been limited by data volume and analytical capability. Keeping long histories of data was prohibitively expensive, and analytics were the domain of data warehouses. With Kudu, however, online reporting can now be real-time, store all historical data with complete fidelity, and support analytical queries.

Example: Operational Data Store

In summary, Kudu expands the functionality of the Hadoop ecosystem by providing relational storage for fast analytics on fast data. This makes use cases like those above easier to implement and accessible to more organizations. It also rounds out the full set of storage options available from Cloudera, which now includes HDFS, Apache HBase (NoSQL), Kudu (relational), and cloud-based object storage. Customers can therefore choose whichever storage type a use case demands without retraining users on the platform where the data resides.


