Creating a Virtual Warehouse

A Virtual Warehouse is an instance of compute resources in Public Cloud that is equivalent to an on-prem cluster. You learn how to create a new Virtual Warehouse in Cloudera Data Warehouse (CDW) Public Cloud.

A Virtual Warehouse provides access to the data in tables and views in the data lake your Database Catalog uses. A Virtual Warehouse can access only the Database Catalog you select during creation of the Virtual Warehouse.

In this task and subtasks, you configure Virtual Warehouse features, including performance-related features for production workloads, such as the Virtual Warehouse size and auto-scaling. These features are designed to manage huge workloads in production, so if you are evaluating CDW, or just learning, simply accept the default values. This task covers the bare minimum configurations.

The subtopics describe how to configure features for production workloads, such as Hive query isolation and Impala catalog high availability.

In AWS environments, you can optionally select an availability zone. All compute resources will run in the selected zone.
  • You obtained permissions to access a running environment for creating a Virtual Warehouse.
  • You obtained the DWAdmin role to perform Data Warehouse tasks.
  • You logged in to the CDP web interface.
  • You activated the environment from Cloudera Data Warehouse.
  • If you want to enable SSO, you must obtain the CDW_ACCESS_CONTROL entitlement to set up a user group, which is available as a technical preview.

    Also, to enable SSO, in Management Console > User Management you must set up a user group that identifies the users authorized to access this Virtual Warehouse.

For more information about meeting prerequisites, see Getting started in CDW.
  1. Navigate to Data Warehouses > Virtual Warehouses > Create Virtual Warehouse.
  2. In New Virtual Warehouse, in Name, specify a Virtual Warehouse name.
  3. In Type, select Hive or Impala as the type of Virtual Warehouse.
    Virtual Warehouses can use Hive or Impala as the underlying SQL execution engine. Typically, Hive is used to support complex reports and enterprise dashboards. Impala is used to support interactive, ad-hoc analysis.
  4. In Database Catalog, select the image of the Database Catalog to query with the Virtual Warehouse.
  5. In AWS environments only, accept the default availability zone, or select an availability zone, such as us-east-1c.
    The default behavior is to randomly select an availability zone from the list of configured availability zones for the associated environment. Generally, it is fine to accept the default.
  6. Optionally, if you have the required entitlement to set up a user group and have done so, in User Groups, select a user group you set up in advance to access endpoints.
  7. Optionally, if you have selected a user group for SSO, select Enable SSO to enable single sign-on to your Virtual Warehouse.
    If you do not have a user group set up for SSO, do not select Enable SSO.
  8. Optionally, enter keys and values for Tagging the Virtual Warehouse.
  9. Select the Size of the Virtual Warehouse as described in the next subtopic.
    The AutoSuspend and Autoscaling controls appear.
  10. Configure auto-scaling as described in the subsequent subtopic.
  11. Select the Hive Image Version or the Impala Image Version, and the Hue Image Version you want to use, or accept the default version (latest) at the top of the drop-down menus.
  12. Accept default values for other settings, or change the values to suit your use case, and click Create to create the new Virtual Warehouse.
    Click the tooltip for information about settings.
    When you create a Virtual Warehouse, a cluster is created in your cloud provider account. This cluster has two buckets. One bucket is used for managed data and the other is used for external data.
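
The constraints among the steps above can be sketched as a small pre-flight check to run before filling in the form. This is only an illustration: the class, the field names, and the `validate` function are hypothetical and are not part of any CDW API or CLI. The sketch encodes only rules stated in this task: the type is Hive or Impala, an availability zone applies to AWS environments only, and Enable SSO requires a user group.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VirtualWarehouseRequest:
    """Hypothetical record of the form values; not a CDW API object."""
    name: str                                   # step 2
    vw_type: str                                # step 3: "hive" or "impala"
    database_catalog: str                       # step 4
    cloud: str = "aws"                          # assumed field: cloud provider of the environment
    availability_zone: Optional[str] = None     # step 5: AWS only; None means accept the default
    user_groups: List[str] = field(default_factory=list)   # step 6
    enable_sso: bool = False                    # step 7
    tags: Dict[str, str] = field(default_factory=dict)     # step 8


def validate(req: VirtualWarehouseRequest) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    if not req.name.strip():
        problems.append("name is required")
    if req.vw_type not in ("hive", "impala"):
        problems.append("type must be 'hive' or 'impala'")
    if not req.database_catalog.strip():
        problems.append("a Database Catalog must be selected")
    if req.availability_zone is not None and req.cloud != "aws":
        problems.append("availability zone applies to AWS environments only")
    if req.enable_sso and not req.user_groups:
        problems.append("Enable SSO requires a user group")
    return problems
```

For example, a request that names an Impala Virtual Warehouse, a Database Catalog, and the zone us-east-1c in an AWS environment passes with no problems, whereas a request that sets `enable_sso=True` with an empty `user_groups` list is reported as a problem.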