What's New

Major features and updates for the Cloudera Machine Learning data service.

November 29, 2022

Release notes and fixed issues for version 2.0.34.

New Features / Improvements

  • PBJ Runtimes - PBJ Workbench Runtimes are now GA, providing a more consistent experience with the Jupyter ecosystem. With the elimination of proprietary code you can now build a new Runtime from scratch, on your custom base image with your selected kernel, with no need to build on top of a Cloudera image anymore.
  • Experiment tracking - A new Experiments feature built on MLFlow is now available. You can now track your experiments in CML sessions with the preinstalled mlflow library and visualize and compare them in the CML application. This feature is enabled by default.
  • Customizable Scratch Space - You can now configure the amount of ephemeral storage space (also known as scratch space) a CML session, job, application or model can use. This feature helps in better scheduling of CML pods, and provides a safety valve to ensure runaway computations do not consume all available scratch space on the node. By default, each user pod in CML is allowed to use up to 10 GB of scratch space.
  • Iceberg support - Iceberg v2 is supported via Spark Runtime Addon, based on the CDE 1.17-h1 Runtime Addon version.
  • Jobs UI - The Jobs List page and Job Details page now display the job ID and Created At time.
  • Projects UI - When you select a runtime kernel on the project creation page, only the latest standard version of the selected kernel is added to the project.
  • Data tab - The Cloudera Data Visualization application that provides the Data tab experience got upgraded to v7.0.2. You can review the changes here.
  • Applications - Custom polling endpoints for Applications can now be specified in the UI.
  • Register new runtimes via APIv2 - Site administrators can register new ML Runtimes using the APIv2.
  • Runtime addon management - Site administrators can now also disable or deprecate specific Runtime Addons.
  • Environmental variables - The new project environmental variable PROJECT_OWNER holds the username of the project owner.

Fixed Issues

  • Python packages (DSE-21313) - Fixed an issue where Python packages installed via pip install may not be imported correctly until a new session is started.
  • Jobs (DSE-21771) - Fixed an issue where Jobs scheduled with PBJ Runtimes may terminate with a Success state even when there were errors reported in job runs.
  • PBJ Runtimes (DSE-19971) - Fixed an issue where comment blocks may not be rendered correctly in sessions or jobs launched with PBJ Runtimes.
  • Models (DSE-22527) - Fixed an issue where the Model Monitoring Chart was displaying data from other projects or deployments.

October 19, 2022

Release notes and fixed issues for version 2.0.32-H4.

New Features / Improvements

  • In-place Upgrades - In-place upgrades are now available for CML versions 2.0.29 and 2.0.30.
  • Azure backups - When creating snapshots for an Azure backup, the resource group name defined in the environment can be used (if present) as part of the snapshot name.

Fixed Issues

  • Environmental Variables (DSE-23098) - The environment variable WORKLOAD_PASSWORD is redacted in logs.
  • Environmental Variables (DSE-23499) - Environment variables are redacted from grpc request logging.
  • CVE (DSE-23534) - Upgraded several libraries to fix critical CVE vulnerabilities.
  • UI (DSE-22128) - Improved behavior of New Project and New Session buttons.
  • Secret Generator (DSE-22696) - Fixed an issue that was causing secret generator slowness.

September 27, 2022

Release notes and fixed issues for version 2.0.32-H3.

New Features / Improvements

  • Iceberg support - CML Snippets now fully support the Iceberg v1 table format for all Spark, Hive, and Impala data connections. See Create an Iceberg data connection for details.
  • Data connections - Data Hubs are now automatically discovered and connections are created for them in newly-created CML Workspaces.
  • Backup / Restore Workspace - The default timeout is now 12 hours, and the estimated time to complete a backup (from the cloud provider) is now periodically added to the event logs.