Over the past ten years, Apache Hadoop has come a long way. In Doug Cutting’s recent blog post, he talks about Hadoop’s history and the path of transforming this technology from developer-centric software to a powerful platform for Fortune 500 companies. That’s the premise on which Cloudera was founded, and today we see hundreds and hundreds of companies, young and old, transforming the way they do business thanks to Hadoop.
While Cloudera has constantly improved and innovated on Hadoop to make it faster and easier to use, one critical focus area has been making it secure. If you want to appeal to mainstream enterprises and ensure your software moves from testing to production, you need security. And when Hadoop was first introduced, that security simply wasn’t there. This meant Cloudera had a lot of work to do.
Cloudera has been focused on building the most secure Hadoop platform since the beginning. With multiple layers built into Cloudera Enterprise, we provide comprehensive security that balances the needs of the business with the needs of the Information Security team. Cloudera remains the first and only Hadoop distribution to pass a compliance audit and we even have a dedicated Center of Security Excellence that is constantly working on improving and expanding security in the platform to protect your data and your business.
With Hadoop, users from many departments, using many different tools, can process and analyze more data than ever before and iterate on it faster. While this is great for driving new business insights, some of this data is more sensitive than the rest (such as personally identifiable information, or PII), and not all users should have access to it. So, just like the legacy systems that came before it, Hadoop needed access controls, and Cloudera developed Apache Sentry to provide them. Today, with Sentry graduating from incubation to a top-level Apache project, let’s take a look at this important piece of the Hadoop security puzzle and where it’s headed next.
Sentry provides unified, fine-grained access controls across Hadoop. Based on users’ roles, security administrators can set the permissions that define what data those users have access to. These permissions only need to be defined once, and they hold no matter how the user chooses to access the data, whether through a SQL tool, an ETL or processing tool, or a partner tool. Managing these controls centrally dramatically reduces the manual overhead (and risk of human error!) that results from having multiple data access channels across hundreds or even thousands of users.
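To make the "define once, enforce everywhere" idea concrete, here is a minimal Python sketch. It is not Sentry's API: the `PolicyStore` class, the role and table names, and the two access-path functions are all hypothetical, chosen only to show several tools consulting one central set of role-based permissions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    """A single permission: an action on a table (hypothetical model)."""
    action: str  # e.g. "SELECT"
    table: str   # e.g. "sales.orders"


@dataclass
class PolicyStore:
    """Central store: permissions attach to roles, roles attach to users."""
    role_privileges: dict = field(default_factory=dict)
    user_roles: dict = field(default_factory=dict)

    def grant(self, role: str, action: str, table: str) -> None:
        self.role_privileges.setdefault(role, set()).add(Privilege(action, table))

    def assign(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user: str, action: str, table: str) -> bool:
        wanted = Privilege(action, table)
        return any(
            wanted in self.role_privileges.get(role, set())
            for role in self.user_roles.get(user, set())
        )


# The administrator defines the policy once.
policies = PolicyStore()
policies.grant("analyst", "SELECT", "sales.orders")
policies.assign("alice", "analyst")


# Two hypothetical access paths share the same central check.
def sql_tool_can_read(user: str, table: str) -> bool:
    return policies.is_allowed(user, "SELECT", table)


def etl_tool_can_read(user: str, table: str) -> bool:
    return policies.is_allowed(user, "SELECT", table)
```

Because both access paths call the same `is_allowed` check, a permission granted or revoked in one place takes effect for every tool, which is the property the paragraph above describes.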
Companies across all industries, including Epsilon, Equifax, Magnify Analytics, MasterCard, SFR, Western Union, and many others use Sentry to manage fine-grained access controls across Hadoop today. Sentry has also seen adoption from the broader open source community, with active development from multiple companies; other Hadoop vendors shipping it with their distributions; and integrations with leading third-party tools for extended protection.
Recently, Cloudera released the perfect complement to Sentry, RecordService (currently in beta). RecordService is a new core security layer that centrally enforces fine-grained access control policies. With this counterpart, we can now provide row- and column-level permissions for tools that previously didn’t recognize rows and columns, such as Apache Spark. Security administrators can now define the correct permissions once in Sentry, and RecordService ensures that these are all interpreted and enforced correctly. If Sentry provides the guest list, then RecordService is the bouncer for the Hadoop access party.
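As a rough illustration of row- and column-level enforcement, the Python sketch below filters records before they reach the calling tool. It does not use RecordService's actual interface: `TablePolicy`, `POLICIES`, `read_records`, and the sample role, table, and column names are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TablePolicy:
    """What a role may see in a table (hypothetical model)."""
    columns: frozenset                  # columns the role may read
    row_filter: Callable[[dict], bool]  # rows the role may read


# Policy defined once by the administrator (the "guest list").
POLICIES = {
    ("analyst", "customers"): TablePolicy(
        columns=frozenset({"id", "region"}),
        row_filter=lambda row: row["region"] == "EU",
    ),
}


def read_records(role: str, table: str, rows: list) -> list:
    """Enforcement layer (the "bouncer"): apply the policy to every read."""
    policy = POLICIES.get((role, table))
    if policy is None:
        raise PermissionError(f"role {role!r} has no access to {table!r}")
    return [
        {col: val for col, val in row.items() if col in policy.columns}
        for row in rows
        if policy.row_filter(row)
    ]


sample_rows = [
    {"id": 1, "region": "EU", "ssn": "000-00-0001"},
    {"id": 2, "region": "US", "ssn": "000-00-0002"},
]
visible = read_records("analyst", "customers", sample_rows)
```

In this sketch the calling tool never sees the `ssn` column or the non-EU row, even if the tool itself has no notion of rows and columns, because the filtering happens in the shared read path.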
So, what’s next for these two? In addition to ensuring these fine-grained permissions translate across all the access frameworks, we want to make sure that they also translate across different storage engines. Especially as more customers are looking at cloud or hybrid deployment models, it’s critical that the same access controls seamlessly translate as object stores are added into the mix.
To learn more about both Sentry and RecordService, visit Cloudera’s booth at Strata + Hadoop World San Jose this week and check out the session “Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks.”
Congratulations on your graduation, Apache Sentry!