Announcing: Cloudera Data Science Workbench Release 1.4

Categories: Data Engineering, Data Science, Machine Learning, Product

Cloudera Data Science Workbench (CDSW) makes secure, collaborative data science at scale a reality for the enterprise and accelerates the delivery of new data products. With CDSW, organisations can research and experiment faster, deploy models easily and with confidence, and rely on the wider Cloudera platform to reduce the risks and costs of data science projects.

CDSW 1.4 now extends the platform experience from research to production. Two key new capabilities, Experiments and Models, let data scientists build, train, and deploy models in a unified workflow; security enhancements automate user administration.

Experiments. As data scientists iteratively develop models, they often experiment with datasets, features, libraries, and algorithms, and tune hyperparameters. Each change can significantly affect the resulting model but is not typically recorded, making it impossible to reproduce and explain a given result. This leads to wasted time and effort during research and collaboration or, worse, compliance risk.

With Experiments, data scientists can run a batch job that will:

  • create a snapshot of model code, dependencies, and configuration parameters necessary to train the model
  • build and execute the training run in an isolated container
  • track model metrics, performance, and any model artifacts the user specifies

Users can now inspect and compare their prior training runs to determine which model is best, and then take next steps, such as deploying the best model.
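
As a rough illustration of the kind of training script an experiment might run, the sketch below fits a toy model, saves it as an artifact, and records a metric. It assumes the `cdsw` helper module with `track_metric` and `track_file` functions is available inside the experiment, and falls back to local stand-ins otherwise. The toy model, the `train` function, and the `model.json` file name are hypothetical.

```python
import json
import os

try:
    # Assumption: CDSW exposes a `cdsw` helper module inside experiments.
    import cdsw
    track_metric, track_file = cdsw.track_metric, cdsw.track_file
except ImportError:
    # Local stand-ins so the sketch also runs outside CDSW.
    def track_metric(key, value):
        print(f"metric {key} = {value}")

    def track_file(path):
        print(f"artifact {path}")


def train(learning_rate=0.05, steps=200, out_dir="."):
    """Fit y = w * x by gradient descent on a toy dataset (hypothetical model)."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2

    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad

    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    # Save the trained parameter and the settings that produced it.
    model_path = os.path.join(out_dir, "model.json")
    with open(model_path, "w") as f:
        json.dump({"w": w, "learning_rate": learning_rate, "steps": steps}, f)

    # Record the outcome of this run so that runs can be compared later.
    track_metric("mse", mse)
    track_file(model_path)
    return {"w": w, "mse": mse, "model_path": model_path}
```

Calling `train` with different `learning_rate` or `steps` values would yield separate runs whose tracked `mse` values can then be compared side by side.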

Models. Data scientists often develop models using a variety of open source Python and R packages. The hard part is exposing those models to different stakeholders: deploying models to production typically requires time-consuming and error-prone recoding, as well as complex DevOps knowledge. Furthermore, keeping track of or rolling back deployed models poses significant version control challenges for data scientists and compliance officers alike.

With Models, data scientists can simply select a Python or R function within a project file, and Cloudera Data Science Workbench will:

  • create a snapshot of model code, saved model parameters, and dependencies
  • build an immutable executable container with the trained model and serving code
  • add a REST endpoint that automatically accepts input parameters matching the function signature, and that returns a data structure matching the function’s return type
  • save the built model container, along with metadata like who built or deployed it
  • deploy and start a specified number of model API replicas, automatically load balanced
  • let the user document, test, and share the model
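
To make the idea of selecting a function concrete, here is a minimal sketch of a Python function that could be exposed this way. It assumes the deployment passes the parsed JSON request to the function as a single dictionary and serialises the returned dictionary as the response. The `predict` name, the `MODEL` coefficients, and the `x` input field are hypothetical.

```python
import json

# Hypothetical coefficients standing in for saved model parameters.
MODEL = {"w": 2.0, "b": 0.5}


def predict(args):
    """Score one request; `args` is assumed to be the parsed JSON input."""
    x = float(args["x"])
    return {"prediction": MODEL["w"] * x + MODEL["b"]}


def simulate_request(body):
    """Mimic a REST round trip locally: JSON text in, JSON text out."""
    return json.dumps(predict(json.loads(body)))
```

Locally, `simulate_request('{"x": 3}')` returns `'{"prediction": 6.5}'`, which is a convenient way to check the function before deploying it.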


Simplified user administration. Previous CDSW releases offered LDAP and SAML authentication but allowed every user to log in. The consequence was user sprawl and unintended license consumption. Designating CDSW administrators was a manual affair in the tool itself.

With release 1.4 you can now designate the LDAP and SAML groups for both users and administrators. With automatic synchronisation, the ability to log in or administer CDSW is now dependent on group membership; authorisation is now centralised in the system you already use for that purpose.
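
The group-based rule can be pictured with a small sketch. This is only an illustration of the logic, not CDSW's implementation: the `resolve_role` function and the group names are hypothetical, and it assumes that membership in an administrator group alone is enough to log in.

```python
def resolve_role(member_of, user_groups, admin_groups):
    """Return "admin", "user", or None (login denied) from group membership."""
    groups = set(member_of)
    if groups & set(admin_groups):
        return "admin"  # assumption: admin membership also grants login
    if groups & set(user_groups):
        return "user"
    return None  # in neither group: no login, so no licence consumed


# Hypothetical directory groups designated in the CDSW settings.
USER_GROUPS = ["cdsw-users"]
ADMIN_GROUPS = ["cdsw-admins"]
```

With these settings, a person in `cdsw-users` resolves to `"user"`, a person in `cdsw-admins` to `"admin"`, and anyone else is refused, so removing someone from the directory group is enough to revoke access.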

Cloudera Data Science Workbench 1.4.x is supported on CDH 5.7 or higher 5.x versions. CSD-based deployments require Cloudera Manager 5.13 or higher 5.x versions; package-based deployments require Cloudera Manager 5.11 or higher 5.x versions. In addition to cloud options, customers can now deploy on premises with Oracle Linux 7.4 (for the Oracle Big Data Appliance). Full details are available from the online release notes. To see the new capabilities in action, join our webinar on 13 June 2018.

Until CDSW Release 1.4 becomes available this summer, the current version 1.3 is available for download and trial here.

Learn more about how Cloudera Data Science Workbench makes your data science team more productive.

