EMC is launching its DSSD offering this week in a worldwide road show, and Cloudera is a proud participant and supporter. We started paying attention to DSSD several years ago, when it was still a standalone company. The company was started in 2010 with funding from Sun Microsystems founder and industry legend Andy Bechtolsheim. We’ve been working with the team since the very early days. EMC acquired DSSD in 2014, and product development (and our collaboration!) continued inside the larger company. I couldn’t be happier that all the hard work is now in front of the market.
I have written lately about the very exciting changes at the hardware layer that are creating new opportunities for Apache Hadoop and the software built on it. When Hadoop was brand new, Yahoo! and Facebook ran it on pizza-box servers with lots of dumb local disks. Putting spinning storage and CPUs on the same backplane was a key reason that the system scaled so well and ran so fast.
In the decade and a half since Google originally designed that architecture, though, the industry has made some progress. A few years back we worked with EMC to integrate the Isilon storage system with Hadoop, and we have customers running Cloudera today against large-scale Isilon stores. Separating the storage grid from the compute grid turns out to have some administrative and cost advantages for enterprise deployment at scale.
The DSSD product is something different, though.
Technically, DSSD is a NAND-based, super-dense non-volatile memory system. It’s not like slotting a couple of SSDs into the motherboard and using them as a slow local cache. This is a separate system that speaks the protocols that matter, notably including HDFS. It’s the first-ever system to offer DMA to flash, and it uses a novel low-latency fabric for speed. It supports multi-host connectivity, so it behaves like a shared store rather than a flash extension of local RAM.
We’ve had a DSSD D5 in our certification and performance testing lab for some time. On random-access Apache HBase workloads we’ve measured an order-of-magnitude performance improvement, due in large part to the system’s low latency. Latency is the performance metric that matters most to our operational DBMS customers. At these speeds, I expect to see some really interesting new applications built on super-fast big data: for example, applications that serve models in real time for fraud detection, compliance monitoring, cybersecurity and more.
The upcoming Cloudera 5.6 release will include integration with the DSSD Hadoop Plug-in, which lets DSSD customers point their big data workloads at the new appliance. We’ll be on the road with EMC in the coming weeks and months at a series of events, talking in more detail about what DSSD can do and how it integrates with Cloudera.
When we struck our strategic deal with Intel, it was because we were betting on big changes in hardware. When we began working with EMC on Isilon, and when that work grew to include DSSD, it was because we believed the big data architecture needed to support both local and remote storage well. Those bets are clearly paying off. I’m really excited to see what great new use cases our customers and partners dream up for the combination of secure Cloudera processing and analytics on DSSD’s super-fast persistent memory. I’m confident that it will drive new adoption and further growth in a market that’s already pretty hot.
Congratulations and a tip of the hat to our colleagues on the DSSD team at EMC. Lots of hard work behind you, and no doubt much you still plan to do – but take a moment to enjoy the accomplishment. Here’s to a really successful launch!