As I mentioned in my last post, Cloudera has made significant progress in recent years in ensuring data safety and security for government agencies, no matter where the data resides. One area of data security that gets less attention than crypto or guns/gates/guards is data movement. Every time data is moved, copied, or replicated, a new security risk is introduced, shadow IT makes the organization less efficient, and the essential problem of data discovery still goes unsolved. That’s why, this year, we’re focused on secure data sharing and discovery. Here’s how:
Strategic Partnering with Academia – We are selecting university labs with a strong basic research background to collaborate on and refine methods. We expect the results of that work to help our government customers directly with problems too challenging to tackle through program funding alone. I am also scheduled to complete a book on artificial intelligence in collaboration with George Mason University (GMU). The book brings together experts from academia, industry, and government. In addition, we are excited to be working with the GMU Computational Social Sciences Program. One example of the expected utility of that direct collaboration is getting resource-intensive agent-based models to scale. Economists would no longer have to rely solely on flawed linear regression models to predict economic downturns, a reliance that was a critical gap during the 2008 housing bubble and is arguably a gap right now in the UK.
High-Performance Computing – For years, high-performance computing (HPC) has been a common investment for R&D computing. We’ve already begun integrating the Cloudera Enterprise stack so that it adds a complementary layer of features on top of customers’ existing HPC investments. In addition to message passing interface (MPI), customers will be able to run Spark, Streaming, and Impala, with greatly enhanced ease of use through Cloudera Manager.
Safe Sharing – Once data is secured (last year’s accomplishment), the logical and necessary next step is secure sharing. Rather than leaving users to request data and then take responsibility for tracking every new piece of data that arrives in the enterprise, Cloudera is working on a “dataset recommender.” The intended product uses existing recommender algorithms to suggest interesting data feeds to which users can subscribe. Users simply start grabbing data the way they always have and, once the system has learned their data preferences, it can deliver matching data through an automated channel. These algorithms are already at work at Facebook, Netflix, and Yahoo; it’s time to make them easily available to enterprise users looking at spreadsheets too.
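To make the idea concrete, here is a minimal sketch of one well-known recommender technique, item-based collaborative filtering, applied to dataset access. It is an illustration only, not Cloudera’s design: the access-log format, the dataset and user names, and the `recommend_datasets` function are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical access log: (user, dataset) pairs recording who pulled what.
SAMPLE_LOG = [
    ("ana", "census_tracts"), ("ana", "labor_stats"), ("ana", "housing_permits"),
    ("ben", "census_tracts"), ("ben", "labor_stats"),
    ("cy", "census_tracts"), ("cy", "housing_permits"), ("cy", "weather_feeds"),
    ("dee", "labor_stats"),
]


def recommend_datasets(access_log, user, top_n=3):
    """Suggest datasets the user has not pulled yet.

    Each candidate is scored by summing its cosine similarity to every
    dataset the user already uses, where similarity is measured on the
    sets of people who pulled each dataset.
    """
    # Build the audience of each dataset.
    users_by_dataset = defaultdict(set)
    for person, dataset in access_log:
        users_by_dataset[dataset].add(person)

    # Datasets this user already works with.
    seen = {d for d, audience in users_by_dataset.items() if user in audience}

    scores = defaultdict(float)
    for candidate, cand_audience in users_by_dataset.items():
        if candidate in seen:
            continue
        for known in seen:
            known_audience = users_by_dataset[known]
            overlap = len(cand_audience & known_audience)
            if overlap:
                scores[candidate] += overlap / sqrt(
                    len(cand_audience) * len(known_audience)
                )

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend_datasets(SAMPLE_LOG, "ben"))
```

In this toy log, "ben" shares datasets with "ana" and "cy", so the datasets they use and he does not are the ones suggested. A production system would also have to respect access controls before suggesting anything, which this sketch does not attempt.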
Insider Threat – Cloudera’s oldest and most useful government use case is getting a facelift this year. It’s not too surprising that, as the ultimate data science platform, a Cloudera enterprise data hub (EDH) gets the highest marks for finding, monitoring, and stopping insider threats, because it is scalable and embeds advanced analytics. We’ve been very good at constructing a wide range of artificial intelligence (AI) methods to predict insider threats, and we are now doubling down. By partnering with independent software vendors to refine the end-user experience and make it much more accessible, we expect this use case to reach critical mass this year. While scientific computing users are already comfortable interacting with a terminal screen filled with an ocean of cosines and other algorithms, other staff prefer familiar tools such as search and other graphical user interfaces. Look for several advances from us in that area this year.
But we’re not stopping there. Cloudera is upping the ante in support of the federal government. We’ve hired William Sullivan as our vice president of Public Sector, and under his leadership we’re significantly increasing our sales and support functions and expanding our office space in McLean, Virginia. All of this is aimed at putting data to work for government agencies.