Cloudera to Release First Recursive Hadoop Stack

Promises ease-of-use of MapReduce and speed of Hive!

PALO ALTO, Calif., April 1, 2015: Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, today announced the pending release of Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-Spark-on-Hive-on-MapReduce-on-Spark-on-Oozie-on-HBase-on-Hive-on-Spark-on-Flume-on-HDFS-on-Impala-on-…, the industry’s first truly recursive data platform.

A recursive data platform leans on advanced concepts from ‘programming’ and ‘software engineering’ to construct a single, monolithic Big Data infrastructure, according to Todd Lipcon, Computer Guy at Cloudera.

“These are exciting times,” said Lipcon. “When we were asked by the product team to engineer a system that had the maturity of Spark, the ease of use of MapReduce, and the raw, unbridled speed of Hive, we had to pull out all the stops. But we worked nights and weekends, and we really feel we have developed something quite unholy.”

“As you know, the big data community, us included, is quite allergic to the idea of picking a single set of complementary technologies,” said Charles Zedlewski, vice president of All That He Surveys at Cloudera. “What better way to hedge our bets than to have every job, task and query hit every single part of the system? The fact that some queries never complete is seen as a technical detail, and, arguably, a feature.”

“In order to be able to effectively manage a potentially boundless number of services within a Recursive Enterprise Data Hub (R-EDH), Cloudera Manager will itself be made to run on top of YARN or Spark, or Spark-on-YARN, which will of course itself be managed by a CM instance, ad infinitum,” said Dr. Amr Awadallah, Cloudera’s Chief Turtle Officer. “Oozie (running on HBase-on-Hive-on-Spark-on-Flume-on-YARN-on-HBase…) may optionally be used to schedule workflows to launch additional CM instances, and thus whole new R-EDHs within an existing R-EDH. It is our belief that a self-replicating R-EDH is the most effective, nay, only effective strategy to account for the rapidly expanding use cases of the R-EDH,” added Awadallah. “Indeed, though it’s always been ‘turtles all the way down’, there are now exponentially more turtles.”

Cloudera’s new vice president of Global Sales, Vishal Rao, said that he was excited. “The technical details are not yet abundantly clear to me,” he said, “but the licensing revenue possibilities are really compelling. Really, really compelling.”

