Apache Kafka is one of the hottest new projects in the Apache Hadoop ecosystem, providing easy and flexible data ingestion at Hadoop scale. Originally started at LinkedIn, Kafka has become the tool of choice for customers looking for a high-throughput publish-subscribe messaging system, and it is now an integrated and supported part of Cloudera’s platform.
Kafka’s ability to handle fast data ingestion at scale, with the flexibility to open that data up to diverse tools and use cases – including real-time streaming workloads – extends the possibilities of Cloudera’s platform. This integration makes the platform well suited to building cost-effective, end-to-end workloads for a variety of critical use cases – including fast data in conjunction with Apache Spark Streaming and/or Apache HBase – all within a single system.
Some key use cases for Kafka that we are seeing include:
Log Aggregation – collecting logs from multiple services and making them available in a standard format to multiple Hadoop components
Messaging – providing the high throughput and fault tolerance needed by large-scale message processing applications
Customer Activity Tracking – tracking website and mobile user activity for real-time monitoring and offline processing
Operational Metrics – logging operational monitoring data to a centralized feed for alerting and reporting
Stream Processing – in conjunction with Apache Spark Streaming, performing on-the-fly aggregations and scoring for real-time models
Event Sourcing – logging state changes as a time-ordered sequence of records
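To make the event sourcing use case above a little more concrete, here is a minimal sketch in plain Python. It does not use Kafka or any Kafka client API: the `EventLog` class is a hypothetical in-memory stand-in for an append-only log, and the `Event` fields (`account`, `delta`) are illustrative assumptions. The point is only that current state is never stored directly and is instead rebuilt by replaying the ordered records.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """One immutable record in the log (hypothetical schema)."""
    offset: int   # position in the log, assigned on append
    account: str  # which entity the change applies to
    delta: int    # signed change to that entity's balance


class EventLog:
    """In-memory, append-only stand-in for a time-ordered log (not a Kafka API)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, account: str, delta: int) -> Event:
        # Records are only ever added at the end, never updated in place.
        event = Event(offset=len(self._events), account=account, delta=delta)
        self._events.append(event)
        return event

    def read_from(self, offset: int = 0) -> List[Event]:
        # Readers pick a starting offset and see records in append order.
        return self._events[offset:]


def replay(events: List[Event]) -> Dict[str, int]:
    """Rebuild current state by folding over the events in order."""
    balances: Dict[str, int] = {}
    for event in events:
        balances[event.account] = balances.get(event.account, 0) + event.delta
    return balances


log = EventLog()
log.append("alice", 100)
log.append("bob", 50)
log.append("alice", -30)

# State is derived from the log rather than stored separately.
current_state = replay(log.read_from(0))
```

Because the log keeps every change, a new consumer could start from offset 0 and arrive at the same state, which is the property that makes this pattern a natural fit for a durable, ordered log.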
With the launch of Cloudera Labs last year, we introduced Kafka as one of the “charter members.” Cloudera Labs is a virtual container for innovations being incubated within Cloudera Engineering, with the goal of bringing more use cases, productivity, or other types of value to developers by constantly exploring new solutions for their problems. By incubating Kafka on Cloudera Labs, we were able to offer it to users interested in testing and developing Kafka workloads within CDH. This incubation period also allowed both our engineers and other community members time to work on making Kafka a more mature tool, with the production-ready capabilities needed to deploy it as part of an enterprise data hub.
Since its inception, Kafka’s popularity has driven steady improvements in its quality and has sparked the interest of our users across multiple industries, especially those experimenting with or implementing real-time or fast data workloads. This rapid maturation is why Kafka is now the first graduate from Cloudera Labs and an integrated part of Cloudera’s platform.
This maturity is also why many customers, across a variety of use cases and industries, already run Kafka in production on Cloudera’s platform and have relied on Cloudera Support for it since last year. Support for Kafka is also available through Cloudera Enterprise, Cloudera’s production-ready Hadoop platform. Not only can customers take advantage of Cloudera’s world-class support and engineering expertise, but Kafka also integrates with the comprehensive security and governance, zero-downtime administration, and broad partner ecosystem offered through Cloudera Enterprise.
For more information on how to get started with Kafka and Cloudera, read “How to Deploy and Configure Apache Kafka in Cloudera Enterprise” or download it now.