It’s been almost exactly a year since Cloudera launched Apache Kafka as a fully supported component of Cloudera Enterprise. If nothing else, the last twelve months of Kafka’s history are a poignant reminder of the blistering pace at which new open source projects can achieve prominence and near ubiquity within big data technology stacks. Fittingly, Kafka was one of the charter members of Cloudera Labs, our incubator for new technologies within the Apache Hadoop ecosystem. Since then, Kafka has become both a first-class citizen within Hadoop and a near-essential ingredient in the stream processing architectures now dominating big data IT initiatives in 2016.
Today, we’re launching Kafka 2.0, together with Cloudera Enterprise 5.5.2. While Kafka remains a young technology in the now 10-year-old Hadoop ecosystem, it has unequivocally reached the point of being enterprise-grade software, suitable for mission-critical deployments. For me, this milestone is validated by the huge demand we see from Fortune 1000s and other organizations wanting to put Kafka into production within weeks of our planned release, which brings the critical security and multi-tenancy features we’d expect to see in a version 2 release.
Kafka’s pedigree as the ‘consumer-grade’ data collection infrastructure developed at LinkedIn (where it processed over 800 billion events per day) gave it wide latitude to be deployed by R&D groups and in IT pilot projects. But it’s the real-world deployments of Kafka by our production customers where the proverbial rubber meets the road: where delivering a message on time and to the right place can literally mean the difference between life and death. Whether it’s Cisco detecting fraud in WebEx, Cox Automotive monitoring critical IT infrastructure, or Cerner Health detecting dangerous blood infections and saving hundreds of lives in the process, Kafka continues to prove itself in data pipelines where continuous, reliable, low-latency data processing is a hard requirement.
In sensitive contexts like these, where Kafka brokers customer data from the network edge, between data centers, or from devices in the field, its new ability to protect data in motion with encryption and strong authentication meets a critical need of our new enterprise customers, and eliminates the bespoke security perimeters that early adopters had to build around Kafka. New multi-tenancy and high-availability features, such as client throttling, rolling restarts, and secure ZooKeeper integration, let IT departments deploy with confidence and let application development groups fire data at will into a single, shared infrastructure monitored by Cloudera Manager. Finally, Kafka’s deepening integration with the rest of Cloudera Enterprise, including Apache Spark, Apache Flume, and Cloudera Navigator, means that stream processing, log ingestion into HDFS, and Hadoop governance functions like auditing can all benefit from Kafka’s performance and reliability guarantees immediately, with minimal or zero code.
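To make the encryption and throttling features above concrete, here is a minimal sketch of the relevant broker configuration. The property names are standard Kafka settings from this release; the hostname, file paths, and quota values are illustrative placeholders, and a real deployment managed through Cloudera Manager would set these via its configuration interface rather than by hand.

```properties
# server.properties: expose a TLS-encrypted listener alongside the plaintext one
listeners=PLAINTEXT://broker1.example.com:9092,SSL://broker1.example.com:9093
ssl.keystore.location=/etc/kafka/conf/broker1.keystore.jks
ssl.keystore.password=<keystore-password>
ssl.key.password=<key-password>
ssl.truststore.location=/etc/kafka/conf/truststore.jks
ssl.truststore.password=<truststore-password>

# Default per-client quotas in bytes/sec: the broker throttles any producer or
# consumer that exceeds its quota, protecting other tenants of the cluster
quota.producer.default=10485760
quota.consumer.default=20971520
```

A client opting into the encrypted channel then only needs to point at the SSL port and set `security.protocol=SSL` along with its own truststore, which is what eliminates the custom security perimeters early adopters built around Kafka.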
Kafka’s ubiquity within a certain class of Internet-of-Things and stream processing systems may soon give way to a broader role: replacing traditional message-oriented middleware outright, and becoming a given in standing up reliable Hadoop ingest pipelines. Time will tell to what degree Kafka becomes the backbone of the big data highway, but if the evolution of Hadoop is any guide, the leading consumer internet, telecommunications, healthcare, and gaming companies will soon show the rest of the IT world how to break down scale barriers, lower costs, and enable a new level of responsiveness with near-real-time big data applications that drive their business forward, one message at a time.