Big Data success requires professionals who can prove their mastery of the tools and techniques of the Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years. At Cloudera, we’re drawing on our industry leadership and early corpus of real-world experience to address the Big Data talent gap with the Cloudera Certified Professional (CCP) program.
As part of this new blog series, we’ll introduce the proud few who have earned the CCP: Data Scientist distinction. Featured first is CCP-02, Luis Quintela. You can also learn more about becoming a certified data scientist by joining us for the Data Scientist Takeover in the Cloudera booth at Strata Santa Clara on Tuesday at 5:30pm PT.
What’s your current role?
I’m currently at Samsung SDS (provider of IT enterprise solutions and services), working as Sr. Manager for Big Data Analytics. My responsibilities include defining SDS’s strategy for adopting Big Data technologies, identifying new business opportunities enabled by data analytics, and understanding the data analytics technological landscape in search of partnership opportunities, with a focus on innovation coming from Silicon Valley.
Prior to taking CCP:DS, what was your experience with Big Data, Hadoop, and data science?
I’ve always considered Cloudera the absolute leader in this space, so when I decided to acquire knowledge on Hadoop, I signed up for the Cloudera Developer Training for Apache Hadoop and took the CCDH exam to achieve certification.
I also started reading anything I could get my hands on concerning Hadoop and machine learning. The next step was to apply that knowledge to my work, which at the time involved extending the IBM Rational software development platform.
I vectorized a set of software change requests based on term frequency calculated from the text fields (summary, description, and comments) and applied the K-means clustering algorithm using the Mahout library to uncover patterns. The intent was twofold. The first goal was to find opportunities for improving software quality: if too many defects were associated with the same general theme, improving the part of the software development process related to that theme might decrease the number of future issues. The second goal was to increase reuse: change requests that belong to the same grouping might be solved by a similar approach, following the same overall framework or applying the same solution pattern.
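(Editor's illustration: the approach Luis describes, term-frequency vectors fed into K-means, can be sketched in a few lines. This is not his code and it does not use Mahout; it is a minimal plain-Python sketch, and the sample change requests, the function names, and the choice of starting centroids are all assumptions made for the example.)

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_vectors(docs):
    """Turn each document into a term-frequency vector over a shared vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / total for t in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, max_iter=100):
    """Basic K-means; the documents at init_indices seed the centroids."""
    centroids = [list(vectors[i]) for i in init_indices]
    k = len(centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged: no vector changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical change-request summaries, invented for this sketch.
change_requests = [
    "Login page crashes on invalid password",
    "Crash on login when password is empty",
    "Report export to PDF is slow",
    "Slow PDF export for large report",
]

vocab, vectors = tf_vectors(change_requests)
# Seeding with fixed documents keeps the run deterministic; real K-means
# implementations usually pick starting centroids randomly.
assignments, centroids = kmeans(vectors, init_indices=[0, 2])
print(assignments)
```

With these four made-up requests, the two login issues land in one cluster and the two PDF export issues in the other, which is the kind of theme grouping the answer above refers to. A real data set would need stop-word removal, stemming, and a principled choice of the number of clusters.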
What’s most interesting about data science, and what made you want to become a data scientist?
I’ve worked directly with customers helping in the adoption of a business intelligence platform, which was providing good insight with several metrics categorized by different dimensions, all exposed in charts and consolidated dashboards. But there was something missing. Descriptive analytics was getting customers only so far, and those customers kept asking questions such as:
- If I see this type of negative trend in this chart, how should I react?
- Given the current state of my development project, as characterized by those metrics, am I going to meet the deadline?
Data science allows me to take the next step, using data to determine the next most probable outcome, outline possible actions given that outcome, and present the potential implications of each action.
How did you prepare for the Data Science Essentials exam and CCP:DS? What advice would you give to aspiring data scientists?
I started by reviewing all of the course slides from my Introduction to Data Science class and redoing the student exercises. Then I followed the study guide published on the Cloudera web site, which has great references to all the relevant material. I also reviewed the slides from my earlier Developer Training class.
I would encourage anyone who wants to achieve certification to also work on a practical application of the concepts discussed in the Cloudera University classes. There are a lot of data sets publicly available that could be used in applying machine learning methods and a lot of open-source tools to support the work.
Since becoming a CCP:DS in November 2013, what has changed in your career and/or in your life?
At Samsung SDS, we are working on building a Center of Excellence for Big Data Analytics, encompassing a platform, data science and development teams, and a thorough body of knowledge. The objective is to evangelize data analytics concepts and technologies and instantiate analytics solutions at scale for our customers.
The knowledge acquired in the Introduction to Data Science course, studying for the Data Science Essentials exam, and working on the Data Science Challenge gave me the necessary foundation for effectively driving this initiative.
Why should aspiring data scientists consider taking CCP:DS?
I worked several years at IBM, where certification is an important component of professional development. It provides a structured mechanism for acquiring new skills and defining professional goals. Certification can also be instrumental in finding the right resources for projects.
By creating the CCP: Data Scientist program, Cloudera is establishing that same level of structure in an emerging field, framing the role of a data scientist in a clear, unambiguous way. This effort helps in hiring, assigning, and developing the professionals who are going to be required as data science solutions become pervasive across multiple industries.
Another important factor is that the certification program that Cloudera has put together goes beyond the written test, including a challenge that is designed to assess a data scientist’s skills in much greater depth than could be achieved in a multiple-choice questionnaire. From my perspective, this makes the exercise much more compelling, valuable, and meaningful than any other certification available today. You are actually solving problems through data analysis in a full simulation of the situations data scientists face in the field.