In September of 2015, Cloudera announced an innovative new storage engine for fast analytics on fast data: Kudu. The project is a response to users and customers who were stitching together hybrid architectures to draw out the benefits of Apache HDFS and Apache HBase while minimizing the gaps in those projects. HDFS can scan a large amount of data quickly, but it wasn’t designed to handle updates. Conversely, HBase was designed to provide fast random reads and writes for continuously updating data, but at the cost of slower scan performance. Apache Kudu, unlike either of its storage predecessors, brings strong performance for both scans and random access.
Bringing fast analytics together with fast data will enable a broad range of use cases that rely on a simultaneous combination of sequential and random reads and writes. These include time series use cases, machine data analytics, and real-time reporting critical to IoT use cases. We expect to see Kudu provide the capabilities for real-time offers, fraud detection, risk monitoring, and real-time threat detection, among other uses.
Since Kudu was accepted into the Apache Incubator in late 2015, its developer community has grown significantly. The project has received contributions from several companies (including Intel, Xiaomi, RMS, and Argyle Data) and more than 45 developers, and the Kudu beta release is producing strong results with hundreds of users. The maturing code base and the attainment of ASF milestones have culminated in Kudu’s graduation from an incubating ASF project to a top-level project. This is exciting news for the project, but more importantly, for the current and future users of Kudu!
Cloudera’s Todd Lipcon, the founder and architect of Kudu, will serve as its vice president. In that role, he’ll continue to guide the project out of its beta phase and into an eventual 1.0 release.
Kudu is the latest in a long line of Apache projects that have been founded or cofounded by current or former Cloudera employees, including Apache Avro, Apache Crunch, Apache Flume, core Apache Hadoop, Apache HBase, Apache Hive, Apache Parquet, Apache Sentry, Apache Solr, Apache Sqoop, and Apache ZooKeeper. Cloudera’s commitment to the open source software community, and the wide adoption of the projects it has founded, have resulted in a set of open standards that unite that community.
Let’s celebrate this Apache Kudu milestone together while we eagerly await the 1.0 release.