Production-Ready Hadoop: An Overview of Security in Cloudera 5

Categories: Security

As Apache Hadoop becomes increasingly critical to enterprises looking to get more value from more of their data, security and governance are top of mind for anyone evaluating Hadoop platforms. Cloudera 5 and related releases (5.3 being the most recent of Cloudera 5.x) offer comprehensive security controls that address the four pillars of security: perimeter, access, visibility, and data.

Four Pillars

These four pillars, in turn, address the traditional security concerns of authentication, authorization, audit, and compliance, together forming a full compliance-ready stack. In fact, Cloudera 5.x is the first and only distribution to achieve PCI compliance.

Perimeter security (addressing authentication) governs who can access the cluster itself. Cloudera's perimeter security solution focuses on preserving users' choice among the many Hadoop services while integrating with existing enterprise standards, all at a manageable scale. Active Directory (AD) and Kerberos are enterprise standards, but they are notoriously difficult and tedious to implement, especially at Hadoop scale. Cloudera Manager simplifies and automates the work required to use AD and Kerberos for strong authentication within the platform, including direct integration with the Kerberos server built into AD. This eliminates the need to set up and maintain a cluster-specific Kerberos server, lets users keep authenticating directly against AD, allows all Hadoop services to be defined directly in AD as part of the overall service management layer, and controls user access to these services through AD groups. Cloudera Manager also fully automates the process of Kerberizing the cluster, so Kerberos can be deployed quickly without error-prone manual configuration.
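Part of what makes Kerberos tedious on a large cluster is that every service role on every host needs its own principal, conventionally written as service/host@REALM. The Python sketch below only enumerates those principals for a small, hypothetical three-host cluster; the host names, the EXAMPLE.COM realm, and the role layout are assumptions for illustration, and the code does not contact AD or generate keytabs, which is the part Cloudera Manager automates.

```python
# Hypothetical values for illustration only: real clusters derive these
# from the roles actually assigned to each host.
REALM = "EXAMPLE.COM"

HOST_ROLES = {
    "master1.example.com": ["hdfs", "yarn", "hive", "HTTP"],
    "worker1.example.com": ["hdfs", "yarn", "HTTP"],
    "worker2.example.com": ["hdfs", "yarn", "HTTP"],
}


def service_principals(host_roles, realm):
    """Return one principal per (service, host) pair in service/host@REALM form."""
    principals = []
    for host, services in sorted(host_roles.items()):
        for service in services:
            principals.append(f"{service}/{host}@{realm}")
    return principals


if __name__ == "__main__":
    for principal in service_principals(HOST_ROLES, REALM):
        print(principal)
```

Even this toy layout yields ten principals, and the count grows roughly with the number of hosts multiplied by the roles per host.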

Once users have authenticated against these services, access security defines what data they can access (addressing authorization). Because Cloudera 5 exposes data to many users through many access paths, central management of these access policies is key to ensuring that users can access the data they need to do their jobs. Cloudera 5.x delivers this unified authorization with Apache Sentry (incubating), an open source project that provides fine-grained role-based access control (RBAC) for Impala, Apache Hive, Apache Spark, MapReduce, Apache Pig, and Cloudera Search. Sentry has quickly emerged as the open standard for authorization in Hadoop, with a broad set of contributions for sustained engineering quality, multi-vendor support for portability, and third-party integrations for compatibility, all resulting in wide industry adoption across even the most regulated verticals.

[Figure: Overview of how Sentry authorization works]

The image above provides an overview of how Sentry authorization works. Here, a Sentry role (Fraud Analyst Role) has a set of permissions: in this example, read access to all transaction data, though permissions can range from access to an entire database down to a specific set of columns and rows. As part of perimeter security, there is also an AD group called Fraud Analysts that is mapped to the Sentry Fraud Analyst Role. Sam Smith, a member of this AD group, inherits the Sentry role and its permissions. Any change to AD group membership automatically updates Sentry role assignments, eliminating repetitive, manual mirroring of assignments. Within Cloudera, Sentry also offers visual policy management through the Hue UI.
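The role-to-group mapping described above can be modeled in a few lines. The Python sketch below is not Sentry's API: it is a hypothetical, in-memory model in which privileges are granted to roles, roles are granted to groups, and a user is authorized through whatever groups AD reports for them. The role name, the finance.transactions table, and the ad_groups lookup are illustrative assumptions. In a real deployment the equivalent grants are issued as SQL through Hive or Impala (for example, GRANT ROLE ... TO GROUP ...), and Sentry's object hierarchy (server, database, table, and so on) is richer than the exact-match check used here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    action: str  # e.g. "SELECT"
    obj: str     # e.g. "database.table"


@dataclass
class Policy:
    """Hypothetical in-memory model of role-based grants (not Sentry's API)."""
    role_privileges: dict = field(default_factory=dict)  # role -> set of Privilege
    group_roles: dict = field(default_factory=dict)      # group -> set of role names

    def grant_privilege(self, role, privilege):
        self.role_privileges.setdefault(role, set()).add(privilege)

    def grant_role_to_group(self, role, group):
        self.group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, user_groups, action, obj):
        """True if any role reachable from the user's groups holds the exact privilege."""
        wanted = Privilege(action, obj)
        for group in user_groups:
            for role in self.group_roles.get(group, ()):
                if wanted in self.role_privileges.get(role, ()):
                    return True
        return False


if __name__ == "__main__":
    # Stand-in for an AD group lookup; in practice membership comes from AD.
    ad_groups = {"sam.smith": {"Fraud Analysts"}}

    policy = Policy()
    policy.grant_privilege("fraud_analyst", Privilege("SELECT", "finance.transactions"))
    policy.grant_role_to_group("fraud_analyst", "Fraud Analysts")

    print(policy.is_allowed(ad_groups["sam.smith"], "SELECT", "finance.transactions"))  # True
    print(policy.is_allowed(ad_groups["sam.smith"], "INSERT", "finance.transactions"))  # False

    # Removing Sam from the AD group revokes access; the policy itself is unchanged.
    ad_groups["sam.smith"].discard("Fraud Analysts")
    print(policy.is_allowed(ad_groups["sam.smith"], "SELECT", "finance.transactions"))  # False
```

Because the check resolves groups at request time, removing Sam from the AD group revokes his access without any edit to the role grants, which mirrors the automatic propagation described above.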

Another key aspect of security is reporting on where data came from and how it is being used. To address this visibility pillar, Cloudera 5 includes Cloudera Navigator, the end-to-end data governance solution for Hadoop. Cloudera Navigator automatically centralizes a full audit history, tracking which users have accessed what data and providing playback of access-control permissions. It also offers comprehensive metadata storage and discovery for both automatically extracted technical metadata and user-supplied business metadata. Combined with a simple yet powerful search interface, this metadata lets users easily discover, classify, and locate data, supporting not only governance and compliance but also self-service discovery within the cluster. In addition, Cloudera Navigator automatically collects lineage as part of the technical metadata and provides powerful visualizations of both upstream and downstream lineage, so users can quickly verify the reliability of a dataset. Finally, Cloudera Navigator offers a robust policy engine that drives actions on specific sets and types of data: it monitors and enforces data curation rules, retention guidelines, and access permissions, and can also integrate with third-party tools.
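Upstream and downstream lineage are, at heart, graph traversals. The Python sketch below stores hypothetical lineage edges (an edge from a to b means b was derived from a) and uses breadth-first search to answer two questions: what does this dataset depend on, and what is affected if this dataset changes? The dataset names and edges are invented for illustration; Cloudera Navigator collects real lineage automatically, and this is not its API.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, derived) means "derived was built from source".
EDGES = [
    ("raw/transactions", "hive.staging_txn"),
    ("raw/customers", "hive.staging_txn"),
    ("hive.staging_txn", "hive.fraud_scores"),
    ("hive.fraud_scores", "reports/daily_fraud.csv"),
]


def build_graph(edges):
    """Index the edges in both directions so either traversal is a simple lookup."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(graph, start):
    """Breadth-first search returning every node reachable from start, excluding start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


if __name__ == "__main__":
    downstream, upstream = build_graph(EDGES)
    # What does the fraud-score table depend on?
    print(sorted(reachable(upstream, "hive.fraud_scores")))
    # ['hive.staging_txn', 'raw/customers', 'raw/transactions']
    # What is affected if the raw customer data changes?
    print(sorted(reachable(downstream, "raw/customers")))
    # ['hive.fraud_scores', 'hive.staging_txn', 'reports/daily_fraud.csv']
```

Indexing the edges in both directions up front keeps each query a plain traversal, which is why upstream and downstream views can be offered side by side.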

Big data security is not complete without data protection. Cloudera 5 is the only distribution to offer a complete encryption and key management solution that protects data at rest both in and around the cluster. Through Cloudera Navigator Encrypt, this covers not only data stored in HDFS and Apache HBase but also the supporting metadata stores and files (for example, Hive metadata, Sentry security policies, log files, and temporary staging directories for ingest tools such as Apache Flume and Apache Sqoop). However, no encryption solution is complete without an enterprise-grade key manager. Cloudera Navigator Key Trustee stores and secures all encryption keys in a dedicated system that is completely independent of the cluster. Keeping the keys separate from the encrypted data is an encryption best practice and a compliance requirement. Navigator Key Trustee can also integrate with existing hardware security modules (HSMs), such as those from Thales, RSA, and SafeNet.
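One common way to keep keys independent of the data they protect is envelope encryption: each file is encrypted with its own data encryption key (DEK), only a wrapped (encrypted) copy of the DEK is stored alongside the data, and the master key that can unwrap it never leaves the external key manager. The Python sketch below illustrates that separation under explicit assumptions: KeyServer is a hypothetical stand-in for an external key manager such as Navigator Key Trustee, toy_cipher is a standard-library-only placeholder for a real cipher such as AES, and none of it reflects how Navigator Encrypt is actually implemented. Do not use it to protect real data.

```python
import hashlib
import hmac
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY placeholder for AES: XOR with an HMAC-SHA256 counter keystream.

    The same call both encrypts and decrypts. Not for real use.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        stream = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ s for b, s in zip(data[offset:offset + 32], stream))
    return bytes(out)


class KeyServer:
    """Hypothetical external key manager: master keys never leave this object."""

    def __init__(self):
        self._master_keys = {}

    def create_key(self, name):
        self._master_keys[name] = secrets.token_bytes(32)

    def wrap(self, name, dek, nonce):
        return toy_cipher(self._master_keys[name], nonce, dek)

    def unwrap(self, name, wrapped, nonce):
        return toy_cipher(self._master_keys[name], nonce, wrapped)


def encrypt_file(ks, key_name, plaintext):
    """Cluster side: the stored record holds no usable key material."""
    dek, nonce = secrets.token_bytes(32), secrets.token_bytes(16)
    return {
        "key_name": key_name,
        "nonce": nonce,
        "wrapped_dek": ks.wrap(key_name, dek, nonce),
        "ciphertext": toy_cipher(dek, nonce, plaintext),
    }


def decrypt_file(ks, record):
    """Ask the key server to unwrap the DEK, then decrypt locally."""
    dek = ks.unwrap(record["key_name"], record["wrapped_dek"], record["nonce"])
    return toy_cipher(dek, record["nonce"], record["ciphertext"])


if __name__ == "__main__":
    ks = KeyServer()
    ks.create_key("zone-key")  # hypothetical key name
    record = encrypt_file(ks, "zone-key", b"account=12345; amount=99.50")
    assert decrypt_file(ks, record) == b"account=12345; amount=99.50"
    print(sorted(record))  # ['ciphertext', 'key_name', 'nonce', 'wrapped_dek']
```

The record written to the cluster holds only ciphertext, a nonce, a key name, and a wrapped key, so anyone who copies it without access to the key server sees no usable key material.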

Taken together, these capabilities give Cloudera 5 a built-in security solution that is comprehensive, transparent, and compliance-ready. Through the Cloudera Center for Security Excellence and our partnership with Intel, we will continue to drive security for Hadoop and our customers. To take advantage of Cloudera's security offering, upgrade to Cloudera 5.3 or try a free trial of Cloudera Enterprise.
