Category Archives: Spark

Simplifying Big Data in the Cloud

Categories: Cloud, Data Engineering, Product, Spark

In recent years, as public cloud adoption has accelerated and customers have started looking to the cloud for large-scale data workloads, we have sought to reimagine how to offer Cloudera capabilities in the cloud most effectively. Our customers wanted to understand how to leverage the agility, scale, and ease of use offered by the cloud to efficiently and cost-effectively…

What To Consider When You’re Considering Cloud

Categories: Analytic Database, Cloud, Data Engineering, Data Science, Operational Database, Spark

In a blog post earlier this week, my esteemed colleague Sean Anderson laid out a powerful argument for machine learning (ML) as a way to fuel recommendation engines, churn reduction engines, and IoT workflows. Leveraging components like Apache Spark and its machine-learning libraries, data scientists are able to design and train complex models using troves…

Apache Spark Market Survey (Part 1 of 2)

Categories: Spark

As an IT industry analyst (and former technical product manager), I’m always fascinated by how enterprises large and small adopt new technologies. What does it take for a new solution to not only present a compelling opportunity, but also prove itself ready for prime time? What separates the eventual market-dominating solution from all…

Enhanced Streaming and Machine Learning with Apache Spark 2.0

Categories: Spark

Apache Spark has risen to be the taster’s choice for high-scale distributed computation and has solidified itself as the de facto processing engine in the Apache Hadoop ecosystem. In fact, Curt Monash of DBMS2 recently wrote, “The greatest use for Spark seems to be the same as the canonical first use for MapReduce: data transformation.” But the…
