The deluge of data from new sources and in new formats initially drove Hadoop's momentum in the enterprise. However, the most successful Big Data organizations have expanded beyond the fundamentals of batch processing with MapReduce to analyze and solve real-world problems using the broader set of tools in an enterprise data hub.
Much has been made of the unique benefits offered by a central active archive of multi-structured data at the center of a modern data infrastructure. However, the ballyhooed 'data lake' or 'data reservoir' can easily inherit some of the same challenges that limit a traditional data warehouse or data mart. Despite the size and variety of the data, a single use case or tool—no matter how robust—can act as blinders, obscuring the full, game-changing potential of Hadoop in the enterprise.
Building Big Data Applications
In contrast, developing end-to-end applications that incorporate multiple tools from the Hadoop ecosystem is the first step towards a convergence of the disparate use cases and analytical capabilities of an enterprise data hub. Hadoop can become a complete, business-relevant solutions platform and the hub for any number of processing engines and analytical tools to customize an information value chain: capturing and ingesting more data, determining the appropriate file format for storage, transforming and processing the stored data, and presenting the results to the end-user in an easy-to-digest form.
In the right hands, an enterprise data hub provides a rapidly expanding variety of native and third-party tools that can be integrated into the Hadoop stack and scaled towards more advanced analytics and a consolidated 360-degree view of the organization, customer, and market.
Whereas MapReduce code primarily leverages Java skills, full-scale Big Data engineering requires developers who can work with multiple tools at once and drive projects that transform the business. A true Big Data applications developer can deploy tools like Impala and Spark, leverage the Kite SDK, write customizations with user-defined functions, and create a user interface with Hue.
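To make the "user-defined functions" idea concrete, here is a minimal sketch of the core logic of a simple Hive-style UDF in Java. The class name, method logic, and phone-number example are all hypothetical, not from the course; in a real Hive UDF this class would extend `org.apache.hadoop.hive.ql.exec.UDF` (omitted here so the sketch compiles without Hive on the classpath).

```java
// Hypothetical sketch of a simple Hive-style UDF's core logic.
// A real Hive UDF would extend org.apache.hadoop.hive.ql.exec.UDF;
// Hive then discovers and invokes the public evaluate() method.
public class NormalizePhone {
    // evaluate() is the conventional entry point Hive looks for
    // on a simple UDF. Here it normalizes a US phone number.
    public String evaluate(String raw) {
        if (raw == null) {
            return null;  // UDFs must tolerate NULL input rows
        }
        String digits = raw.replaceAll("[^0-9]", "");
        if (digits.length() == 10) {
            return String.format("(%s) %s-%s",
                digits.substring(0, 3),
                digits.substring(3, 6),
                digits.substring(6));
        }
        return digits;  // fall back to bare digits if not 10 long
    }
}
```

Packaged into a JAR and registered with `CREATE FUNCTION`, a function like this could then be called directly from HiveQL or Impala SQL queries.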
A New Class of Developer
Cloudera’s new four-day training course, Designing and Building Big Data Applications, targets aspiring enterprise data hub professionals who want to use Hadoop and related tools to solve real-world problems. Through instructor-led discussion and interactive, hands-on exercises, participants navigate the ecosystem, learning topics such as:
- Creating a data set with Kite SDK
- Developing custom Flume components for data ingestion
- Managing a multi-stage workflow with Oozie
- Analyzing data with Crunch
- Writing user-defined functions for Hive and Impala
- Transforming data with Morphlines
- Indexing data with Cloudera Search
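As a flavor of the multi-stage workflow topic above, here is a minimal sketch of an Oozie workflow definition. The workflow name, action names, paths, and mapper class are all hypothetical, illustrative placeholders, not material from the course.

```xml
<!-- Hypothetical two-stage Oozie workflow: clean a staging
     directory, then run a MapReduce transform. All names and
     paths below are illustrative. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="ingest-then-transform">
  <start to="clean-staging"/>
  <action name="clean-staging">
    <fs>
      <delete path="${nameNode}/data/staging"/>
    </fs>
    <ok to="transform"/>
    <error to="fail"/>
  </action>
  <action name="transform">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>com.example.TransformMapper</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Each action declares its success and failure transitions, which is what lets Oozie coordinate multi-stage pipelines across the tools listed above.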
Attend the Webinar and Get Trained
We’re hosting a webinar introducing Designing and Building Big Data Applications on Thursday, April 24, at 10 AM PT / 1 PM ET. You’ll hear more about the course’s objectives, outline, prerequisites, and technical and business benefits. The webinar also includes a portion of the full training and Q&A with the lead curriculum developer. Register now!
You can also enroll in the full Big Data Applications course by visiting Cloudera University. Public classes start in May and are currently scheduled in Redwood City, Columbia, and London, with more class dates coming soon. Private training for your team is also available at your location and according to your schedule, so contact us for more information.