Cloudera and Intel have recently joined forces to focus on a single converged Apache Hadoop-based platform – CDH. Security continues to be a top priority for the Hadoop teams from both organizations and Project Rhino is a focal point for their collaboration in the open source community.
While Hadoop does provide several security mechanisms covering authentication and authorization, enterprises often require greater assurance of data protection including encryption of data at rest and in motion, role-based access control (RBAC), and other features required for compliance and data governance.
To catalyze the development of a comprehensive security framework for data protection in Hadoop, Intel launched Project Rhino in early 2013 as an initiative with several broad objectives:
Provide encryption with hardware-enhanced performance
Support enterprise-grade authentication and single sign-on for Hadoop services
Provide role-based access control that is unified across multiple Hadoop components and supports finer granularity, such as cell-level access control in Apache HBase
Ensure consistent auditing across essential Apache Hadoop components
Project Rhino has already achieved several important objectives. For example, efforts within Project Rhino contributed key security features to HBase 0.98, providing cell-level encryption and fine-grained access control. As a result, HBase 0.98 is now at feature parity with Apache Accumulo in terms of security.
Meanwhile, in the summer of 2013, Cloudera open-sourced the software that became the basis for the Apache Sentry project (incubating), which has garnered engagement from engineers at Oracle and IBM. Sentry provides fine-grained authorization support for both data and metadata in a Hadoop cluster and is deployed in production in a number of large enterprises.
Because Project Rhino and Sentry share the goal of developing more robust authorization mechanisms in Apache Hadoop, the engineers and security experts from both companies have merged their efforts, and their work now contributes to both projects. The specific goal is “unified authorization,” which goes beyond setting up authorization policies for multiple Hadoop components in a single administrative tool. It means setting an access policy once (typically tied to a “group” defined in an external user directory) and having it enforced across all of the tools that the group's members use to access data in Hadoop: for example, access through Hive, Impala, and search, as well as access from tools that execute MapReduce, Pig, and beyond.
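To make the "set once, enforce everywhere" idea concrete, here is a minimal Python sketch. It is purely illustrative and is not Sentry's API or policy format: the `PolicyStore` and `Engine` classes, the `analysts` group, and the `sales.orders` resource are all hypothetical names invented for this example. It only shows the shape of the idea, in which several access paths consult one shared policy keyed by group.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class PolicyStore:
    """Hypothetical central policy store: group name -> (resource, action) grants."""
    grants: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def grant(self, group: str, resource: str, action: str) -> None:
        # The policy is recorded once, against a group rather than a user.
        self.grants.setdefault(group, set()).add((resource, action))

    def is_allowed(self, groups: Iterable[str], resource: str, action: str) -> bool:
        # A request is allowed if any of the caller's groups holds the grant.
        return any((resource, action) in self.grants.get(g, set()) for g in groups)


@dataclass
class Engine:
    """Hypothetical stand-in for an access path such as Hive, Impala, or search."""
    name: str
    store: PolicyStore

    def read(self, user_groups: Iterable[str], resource: str) -> str:
        # Every engine defers to the same shared store before touching data.
        if not self.store.is_allowed(user_groups, resource, "select"):
            raise PermissionError(f"{self.name}: access to {resource} denied")
        return f"{self.name}: read {resource}"


# One grant, made once...
store = PolicyStore()
store.grant("analysts", "sales.orders", "select")

# ...is enforced identically by every engine that shares the store.
engines = [Engine(n, store) for n in ("hive", "impala", "search")]
results = [e.read(["analysts"], "sales.orders") for e in engines]
```

In this sketch, group membership is passed in directly; in a real deployment it would presumably be resolved from the external user directory mentioned above.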
Another example is the joint focus on delivering open source HDFS encryption in Apache Hadoop. Engineers at Intel who developed Hadoop encryption capabilities and improved their performance using Intel Advanced Encryption Standard New Instructions (AES-NI) are now joined by engineers from Cloudera. Together, they are focused on enabling transparent encryption of data at rest for all files stored in a Hadoop deployment without compromising performance.
We are confident that with the expertise and commitment to open source from both Intel and Cloudera, security will remain a focus of innovation in the Apache Hadoop community. Together, we will enhance security across the Apache Hadoop platform to meet the security and compliance requirements of the most demanding enterprises. And with the recent acquisition of Gazzang, Cloudera has additional security engineers on hand, some of whom will work on Project Rhino efforts as well.
For more information about how Cloudera provides comprehensive, compliance-ready security for the enterprise, register for the upcoming webinar, “Compliance-Ready Hadoop,” on June 19th at 10am PT.