Big Data success requires professionals who can prove their mastery of the tools and techniques of the Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years. At Cloudera, we’re drawing on our industry leadership and early body of real-world experience to address the Big Data talent gap with the Cloudera Certified Professional (CCP) program.
As part of this blog series, we’ll introduce the proud few who have earned the CCP: Data Scientist distinction. Featured today is CCP-03, David F. McCoy. You can start on your own journey to data science and CCP:DS with Cloudera’s free Data Science Challenge Solution Kit. It features a live data set, a step-by-step tutorial, and a detailed explanation of the processes required to arrive at the correct outcomes, so you can get hands-on experience with a real-world scenario at your own pace.
What’s your current role?
Since earning my CCP:DS certification, I’ve had the credentials to seek full-time employment as a data scientist. I’m first trying to find data science work with my current employer, but my experience so far indicates that not every company has the resources required to conduct data science projects at scale.
My recommendation to someone entering this field is to identify and seek roles at organizations that manage their own large, diverse data. There are non-trivial logistical, privacy, bandwidth, and financial barriers to working with truly Big Data if the organization does not actually own it.
Prior to taking CCP:DS, what was your experience with Big Data, Hadoop, and data science?
Prior to CCP:DS, I had a bit of exposure to Hadoop from taking part in a local hackathon in Plano, Texas. I have 20 years of experience in remote sensing, so I’ve picked up a lot of data-science-style algorithms in the context of my work in image and video processing. I also competed in a Kaggle contest, which was a good preliminary step towards a full data science project and great practice for Cloudera’s Data Science Challenge on web analytics for classification, clustering, and collaborative filtering.
What’s most interesting about data science, and what made you want to become a data scientist?
In the Sherlock Holmes stories, Holmes’s sidekick, Dr. Watson, describes their adventures as a “half-sporting, half-intellectual pleasure.” I like to say data science is a “half-scientific, half-engineering pleasure.” There is the unpredictable “a-ha” or “eureka” factor of scientific investigation, paired with the constructive design satisfaction of engineering code to perform the analysis and deal with large, high-dimensional data sets.
I originally became interested in Hadoop to get more computing resources for image analysis. At one point, I realized that the sorts of algorithms I had been using to manage and analyze pixels all these years could also be used with other data. It has been a fairly smooth transition so far.
How did you prepare for the Data Science Essentials exam and CCP:DS? What advice would you give to aspiring data scientists?
I went through the study guide and read some of the recommended materials. I dedicated a little time to the ecosystem tools (e.g., Hive, Pig), but spending a few days with each instead of a few hours, perhaps as part of Cloudera’s Data Analyst Training, would have helped more. I’d also recommend becoming more familiar with machine learning and recommender systems. Cloudera offers an Introduction to Data Science course, and Coursera offers a few basic on-demand classes as well.
My advice is to work your way through the study guide in detail, including the parts that don’t seem important at first. Spend a few days with each tool in the Hadoop ecosystem, especially Mahout. Expect to put some serious thought and effort into the project. At least for me, it was not a simple application of APIs.
Since becoming a CCP:DS in November 2013, what has changed in your career and/or in your life?
It helped me get data science work! I’ve acquired a Big Data perspective on problem solving: Big Data allows a data scientist to sample the rare corner cases and the rare data glitches. It provides more accurate statistics that enable training more complex models. The neural networks that never made it out of the lab in the 1990s are now practical solutions.
Also, my Kaggle score improved.
Why should aspiring data scientists consider taking CCP:DS?
The business world’s understanding of who and what a data scientist is remains fuzzy. CCP:DS goes a long way towards removing that ambiguity. Being associated with Cloudera earns instant respect, as well.
Ultimately, if you are applying for a job in a hot but ill-defined area like data science, having a certification that is relevant, recognized, and verifiable makes it a no-brainer for a busy human resources person or hiring manager to move you on to the next round of interviews. Because the exam is based on real-world challenges and is fully vetted by some of the world’s top experts, the certification does the hard work of pre-evaluating candidates across multiple highly technical areas that would otherwise be difficult to assess.