Securing the Enterprise Data Hub

Categories: Product Security

At Cloudera, we have seen enterprises transform data management with unified systems that combine distributed storage and computation at limitless scale, for any amount and type of data, with powerful, flexible analytics ranging from batch processing to interactive SQL to full-text search. These organizations are realizing the benefits of an enterprise data hub, the next generation in data management, and are using Apache Hadoop as its foundation to meet the business challenges of today and tomorrow.

Yet to reach their full potential, these enterprise data hubs require strong security controls, because data security is a top business and IT priority for most enterprises: 1 in 4 business leaders rank big data and security as key initiatives for their organizations, and more than 50% of organizations handle sensitive information in these new data hubs. Many of our clients do business in regulated environments that mandate specific controls for protecting data. For example:

  • Retailers and banks must adhere to the Payment Card Industry Data Security Standard (PCI DSS) to protect their customers' cardholder data and transactions;

  • Healthcare and biotech organizations are subject to the Health Insurance Portability and Accountability Act (HIPAA) compliance standards for patient information; and

  • US federal agencies must comply with the various information security mandates of the Federal Information Security Management Act of 2002 (FISMA).

Security is not just about satisfying regulations and mandates. Many of our clients also have business objectives that drive internal information security standards for their data. For instance, a strong client privacy standard can be a market differentiator and a business asset.

Making Security Real

How, then, can organizations provide these assurances for data security? Historically, early Hadoop developers did not prioritize data security; it simply wasn’t needed for the kinds of data and workloads Hadoop handled at the time. The resulting early controls were designed to address user error (protecting against accidental deletion, for example) rather than misuse, and each ecosystem project, from HDFS to Apache Hive, developed its own solutions for similar security goals.

“Cloudera has well framed perennial issues that have long proven challenging to enterprises even with their existing data platform environments.”
Tony Baer, Principal Analyst, Ovum

But that was yesterday’s Hadoop. Security in Hadoop today is very different and addresses the vast majority of the security and governance issues that enterprises contend with every day. One way to understand how Hadoop meets these challenges, especially given the diversity of its ecosystem, is to view data security as a series of business capabilities and corresponding functional components (the technical design patterns, best practices, systems, and foundations) that serve as the building blocks of modern data security:

  • Perimeter Security and Authentication, which focuses on guarding access to the system, its data, and its various services. Authentication answers the question: who are you? It mitigates the risk of unauthorized usage. In a nutshell, authentication is simply proving that a user or service is who it claims to be.

  • Visibility and Auditing, which consists of reporting on and monitoring the where, when, and how of data usage. When users and services work with information, they leave a digital trail that records what was done and when. Auditing is about capturing a complete and immutable record of all activity within a system. It is central to security operations in the event of a breach or other malicious action, to regulatory compliance, and to data forensics and usage pattern analysis.

  • Access and Authorization, which includes the definition and enforcement of what users and applications can do with data. Even if you are allowed to use information or a service, what exactly can you do? Authorization focuses on who or what has access to, or control over, a given resource or service. Since an enterprise data hub merges previously separate IT systems, it requires multiple authorization controls with different degrees of granularity.

  • Data Protection, which comprises protecting data from unauthorized access. Not all information should be free and clear, and the goal of data protection is to ensure that only the right users can view and use the right information. Data protection splits into two elements: protecting data when it is stored, commonly called “data at rest,” and protecting data while it moves around, or “data in transit.”
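To make the difference between the first three capabilities concrete, here is a minimal, purely illustrative Python sketch. It is not Hadoop's or Cloudera's API: the user names, roles, grants, and the helpers `authenticate`, `authorize`, `access`, and `AuditLog` are all hypothetical, and a real cluster would delegate authentication to a system such as Kerberos and authorization to a policy service such as Apache Sentry. The sketch only shows authentication answering "who are you?", authorization answering "what can you do?", and auditing recording every attempt, allowed or not.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative only: hypothetical in-memory stores standing in for a real
# identity provider (e.g. Kerberos) and policy service (e.g. Apache Sentry).
ITERATIONS = 100_000


def hash_password(password: str, salt: bytes) -> bytes:
    # Salted, iterated hash so plaintext passwords are never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


_SALT = os.urandom(16)
CREDENTIALS = {"alice": (_SALT, hash_password("s3cret", _SALT))}  # hypothetical user
ROLES = {"alice": {"analyst"}}                                    # user -> roles
GRANTS = {("analyst", "sales_db"): {"SELECT"}}                    # (role, resource) -> actions


@dataclass
class AuditLog:
    """Append-only record of every access attempt."""
    _records: list = field(default_factory=list)

    def record(self, user: str, action: str, resource: str, allowed: bool) -> None:
        timestamp = datetime.now(timezone.utc).isoformat()
        self._records.append((timestamp, user, action, resource, allowed))

    @property
    def records(self) -> tuple:
        # Expose a read-only snapshot; callers cannot edit past entries.
        return tuple(self._records)


def authenticate(user: str, password: str) -> bool:
    """Authentication: is this user who they claim to be?"""
    entry = CREDENTIALS.get(user)
    if entry is None:
        return False
    salt, expected = entry
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, hash_password(password, salt))


def authorize(user: str, action: str, resource: str) -> bool:
    """Authorization: does any of the user's roles grant this action?"""
    return any(
        action in GRANTS.get((role, resource), set())
        for role in ROLES.get(user, set())
    )


def access(user: str, password: str, action: str, resource: str, log: AuditLog) -> bool:
    """Check identity, then permission, and audit the outcome either way."""
    allowed = authenticate(user, password) and authorize(user, action, resource)
    log.record(user, action, resource, allowed)
    return allowed


if __name__ == "__main__":
    log = AuditLog()
    print(access("alice", "s3cret", "SELECT", "sales_db", log))  # True
    print(access("alice", "s3cret", "INSERT", "sales_db", log))  # False: no grant
    for entry in log.records:
        print(entry)
```

Denied attempts are logged alongside successful ones on purpose: for auditing and forensics, a failed request is often as informative as an allowed one.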


When you roll up your sleeves and see how Hadoop answers these challenges and enables these business capabilities, you will find that today’s Hadoop, like any modern data management system, has the functions needed to secure its data, services, and users. With Cloudera, you get not only a proven Hadoop distribution with a wealth of security controls for each of these challenges, but also tools such as Cloudera Manager and Cloudera Navigator for automating and managing those controls, simplifying an otherwise complex and error-prone process. Finally, Cloudera adds finer-grained authorization controls with Apache Sentry (incubating).

That’s why we are happy to offer our white paper, Securing Your Enterprise Hadoop Ecosystem, to help you explore these essential technical functions of enterprise Hadoop security in greater detail.

Download our white paper and discover how you can make a Cloudera Enterprise Data Hub a reality for your information-driven business in today’s regulatory and security-conscious environments.

