New Open-Source Service Enables Apache Spark Development

Categories: Product

Eight months ago we announced a bold initiative: a claim that Apache Spark would replace MapReduce as the de facto data processing engine in Apache Hadoop.  To accomplish this, work had to be done, both in the community and in our labs at Cloudera, to strengthen the enterprise capabilities of Spark and to ensure that Spark, in its enhanced role, has proper access to the Hadoop ecosystem projects.  We were happy to update everyone recently on the progress made in our One Platform Initiative, and today we are excited to announce a project we have launched to ease Spark consumption and to let end-user applications consume Spark easily.

Apache Spark has evolved quickly from a conversation topic among application architects into a critical enabler of real-time and interactive use cases.  Spark has opened up previously unimagined scenarios in the data science, genomics, physics, and healthcare delivery communities, where rapid iteration on hypotheses is paramount.  In the greater context of Hadoop, Spark has elevated the processing and exploration of data, which opens up new use cases and compelling architectures.

At Cloudera, we continue to address the hardships that developers and data scientists face in building models or applications and putting them into production.  We have seen some impressive advancements in this area, both inside the Spark community and in the adjacent coding ecosystems.  One area of focus is easing the burden of a sometimes convoluted developer experience, whether it stems from the architecture or from the overall execution experience.

Livy is an Apache-licensed open-source project, available from GitHub in an early alpha release, developed by teams at Cloudera, Intel, and Microsoft.  Livy is a web service that hosts long-running Spark contexts directly on a cluster.  It also enables fine-grained job submission and result retrieval over a simple REST-based interface.  Cluster resources can be used via YARN, and Livy provides multi-client functionality.  Livy features enterprise capabilities around fault tolerance, multi-tenancy, and user isolation, making it an ideal solution for active online applications.  Besides being an intriguing new project for developers, Livy has potential implications for enhancing Spark as a back-end solution for applications.  Below are some of the benefits that Spark application developers should expect.

  • Reduced Friction of Spark Consumption – Working with Spark was previously a task suited only to specialized developers; an API endpoint now gives application developers direct access to Spark’s resources. There is no need to go through a Spark installation or configuration process to get started; only a lightweight client that talks to an HTTP endpoint is needed.  This allows data teams to build products that can be offered to the business, scaling their ability to leverage Spark.
  • Enabling End-User Applications to Use Spark – Applications can be built with REST-based client APIs in Java, Scala, and Python for fine-grained Spark job submission, result retrieval, and management of Spark contexts. Spark can be invoked by applications written in diverse application development frameworks.
  • Enabling of New Architectures – Livy reduces much of the repeated infrastructure involved in putting Spark jobs into production. It can help run ad hoc Spark jobs, but it is especially helpful for sharing a long-running Spark context among multiple jobs.  This can help automate tracking and serialization of job results, ease deployment and monitoring, and simplify job input validation.  Livy makes it easy to integrate Spark into service-oriented or microservices-based architectures.
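
To make the "lightweight client that talks to an HTTP endpoint" idea concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not Livy's own client: the endpoint paths (/sessions and /sessions/{id}/statements), the JSON fields ("kind", "code", "id"), and port 8998 are not given in this post and follow the REST API as documented in later Livy releases, so the early alpha may differ. The names LivyClient and http_transport are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional

# Assumption: a Livy server reachable at this address (8998 is a commonly used port).
LIVY_URL = "http://localhost:8998"

# A transport takes (HTTP method, URL, optional JSON body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, url: str, body: Optional[dict]) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw.decode("utf-8")) if raw else {}


class LivyClient:
    """Hypothetical thin wrapper; no Spark installation is needed on the client side."""

    def __init__(self, base_url: str = LIVY_URL, transport: Transport = http_transport):
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def create_session(self, kind: str = "spark") -> int:
        # Assumed endpoint: asks the server to start a long-running Spark context.
        reply = self.transport("POST", f"{self.base_url}/sessions", {"kind": kind})
        return reply["id"]

    def submit(self, session_id: int, code: str) -> int:
        # Assumed endpoint: fine-grained submission of one code snippet to that context.
        url = f"{self.base_url}/sessions/{session_id}/statements"
        reply = self.transport("POST", url, {"code": code})
        return reply["id"]

    def result(self, session_id: int, statement_id: int) -> dict:
        # Assumed endpoint: fetch the current state and output of a submitted statement.
        url = f"{self.base_url}/sessions/{session_id}/statements/{statement_id}"
        return self.transport("GET", url, None)

    def close(self, session_id: int) -> None:
        # Assumed endpoint: release the Spark context and its cluster resources.
        self.transport("DELETE", f"{self.base_url}/sessions/{session_id}", None)
```

An application would call create_session() once, call submit() as often as needed against the same context, and poll result() until the server reports the statement as finished. The transport is injectable so the same client can be exercised without a running cluster.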

Increasing application developers’ access to Spark and its components, and improving their usability, is a strategic way to elevate Spark’s role in the Apache Hadoop ecosystem.  Livy and subsequent initiatives are great examples of how groups inside Cloudera and out in the open-source community are focusing on the real-world problems that data scientists and developers face when trying to leverage new technologies like Apache Spark and Hadoop.  Our hope is that we can replace these everyday heroics and convoluted workarounds with a more comprehensive developer experience.  Learn more about Livy.

