Cluster Connectivity Manager

Using Cluster Connectivity Manager (CCM), CDP can communicate with Data Lake, Data Hub, and CDP data services workload clusters that are on private subnets. This functionality is available for CDP deployments on AWS, Azure, and Google Cloud.

What's new

CCMv2 replaces CCMv1. While CCMv1 establishes and uses a tunnel based on the SSH protocol, CCMv2 connects over HTTPS. Once CCMv2 is enabled on your tenant, all new environments created with Runtime 7.2.6 or newer use CCMv2. Existing environments and new environments created with Runtime older than 7.2.6 continue to use CCMv1. All newly registered classic clusters use CCMv2, but previously registered classic clusters continue to use CCMv1.
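The selection rules above can be summarized as a small decision function. The following Python sketch is illustrative only and is not part of CDP; the function and constant names are hypothetical, and it assumes Runtime versions are plain dotted numbers such as 7.2.6.

```python
# Illustrative sketch of the CCM version rules described above.
# All names here are hypothetical; this is not a CDP API.

CCMV2_MIN_RUNTIME = (7, 2, 6)  # minimum Runtime for CCMv2, per the text above


def parse_runtime(version: str) -> tuple:
    """Turn a dotted version such as '7.2.6' into a comparable tuple."""
    parts = [int(part) for part in version.split(".")]
    # Pad short versions ('7.2' -> (7, 2, 0)) so tuple comparison is fair.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def ccm_version_for_environment(runtime: str, created_after_ccmv2_enabled: bool) -> str:
    """New environments use CCMv2 only if CCMv2 was already enabled on the
    tenant at creation time and the Runtime is 7.2.6 or newer."""
    if created_after_ccmv2_enabled and parse_runtime(runtime) >= CCMV2_MIN_RUNTIME:
        return "CCMv2"
    return "CCMv1"


def ccm_version_for_classic_cluster(newly_registered: bool) -> str:
    """Newly registered classic clusters use CCMv2; earlier ones stay on CCMv1."""
    return "CCMv2" if newly_registered else "CCMv1"
```

For example, under these assumptions an environment created with Runtime 7.2.2 maps to CCMv1 even when CCMv2 is enabled on the tenant.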

The steps to register an environment with CCMv2 are similar to the CCMv1 configuration steps. The main differences are:

  • If you are deploying in an environment with restricted outbound network access, a new port needs to be added to the allow list.
  • If you are registering a classic cluster, the steps have changed.

Supported services

The following CDP services are supported by this feature:

CCMv2

Supports environments with Runtime 7.2.6+

CDP service AWS Azure GCP
Data Lake GA GA GA
FreeIPA GA GA GA
Data Engineering GA
Data Hub GA GA GA
Data Warehouse GA
DataFlow GA GA
Machine Learning Preview Preview
Operational Database GA GA GA

CCMv1

Supports environments with Runtime <7.2.6 and environments created prior to CCMv2 GA.

CDP service AWS Azure GCP
Data Lake GA GA GA
FreeIPA GA GA GA
Data Engineering
Data Hub GA GA GA
Data Warehouse
DataFlow
Machine Learning
Operational Database

Cluster Connectivity Manager overview

When deploying environments without public IPs, a mechanism for end users to connect to the CDP endpoints should already be established via a Direct Connection, VPN or some other network setup. In the background, the CDP Control Plane must also be able to communicate with the entities deployed in your private network.

The Cluster Connectivity Manager (CCM) enables the CDP Control Plane to communicate with workload clusters that do not expose public IPs. This functionality is available for CDP deployments on all supported cloud providers. Communication takes place over private IPs and requires no inbound network access rules, but the clusters must be allowed to make outbound connections to the CDP Control Plane.
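Because CCM depends on outbound connectivity, it can be useful to confirm from a host in the private subnet that an outbound TCP connection can be opened at all. The sketch below is a generic check that uses only the Python standard library. It is not a CDP tool, the function name is hypothetical, and the actual Control Plane hostname and port for your region must be taken from the CDP networking documentation.

```python
import socket


def can_connect_outbound(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only verifies TCP reachability (routing, firewall, DNS). It does
    not verify TLS, proxies, or anything specific to CCM.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A call such as can_connect_outbound("control-plane.example.com") uses a placeholder hostname; substitute the endpoint that applies to your deployment.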

CCMv2

CCMv2 agents deployed on FreeIPA nodes initiate an HTTPS connection to the CDP Control Plane. This connection is then used for all communication thereafter. Data Lake and Data Hub instances receive connections from the CDP Control Plane via the agents deployed onto FreeIPA nodes. This is illustrated in the diagram below.

CCMv2 also supports classic clusters. You can use Replication Manager with your on-premises CDH, HDP, and CDP Private Cloud Base clusters that are accessible via private IPs to assist with data migration and synchronization to cloud storage. To do so, first register your cluster using classic cluster registration.

The following diagram illustrates which CDP services are supported by CCMv2 and when the connection is enabled for each service:



The following three diagrams illustrate CDP connectivity to a customer account without using CCM, using CCMv2, and using CCMv1.

The first diagram illustrates the CDP connectivity to a customer account without CCM. When CDP is deployed in public mode, security groups (called firewall rules in Google Cloud) must be configured to allow inbound access to the environment from the CDP Control Plane exit IP range, in addition to end-user access rules that allow only traffic originating from the customer’s own network.

  • This is done automatically for new networks created by CDP, so the only CIDRs required during deployment are from the customer’s own network.

  • Customer-provided security groups must be configured to whitelist the Cloudera Control Plane CIDRs in addition to the customer’s own network CIDR.
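The allow-list logic for public mode can be sketched as a simple CIDR membership test. The example below uses Python's standard ipaddress module. The CIDR values are placeholders from ranges reserved for documentation, not real Cloudera or customer ranges, and the names are hypothetical.

```python
import ipaddress

# Placeholder CIDRs from documentation-reserved ranges; replace them with
# the Control Plane CIDRs published for your region and your own network.
CONTROL_PLANE_CIDRS = ["203.0.113.0/24"]
CUSTOMER_CIDRS = ["198.51.100.0/24"]
ALLOWED_CIDRS = CONTROL_PLANE_CIDRS + CUSTOMER_CIDRS


def is_source_allowed(source_ip: str, allowed_cidrs: list) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDRs."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

With these placeholder values, is_source_allowed("203.0.113.10", ALLOWED_CIDRS) is true, while an address outside both ranges is rejected.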

Figure 1. Connectivity to customer account with CCM disabled

The second diagram illustrates the CDP connectivity to a customer account with CCMv2 enabled. When CCMv2 is enabled, the traffic direction is reversed, so the environment does not require inbound access from Cloudera’s network. Because inbound traffic in this setup is only allowed on the private subnets, configuring security groups is not as critical as in the public IP mode outlined in the previous diagram. However, in the case of bridged networks, it may be useful to restrict access to a certain range of private IPs.
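For bridged networks, one way to reason about such a restriction is to check that every inbound rule stays within a chosen private range. The sketch below is a hypothetical helper built on Python's ipaddress module. The range 10.0.0.0/16 is an assumed example, not a CDP requirement.

```python
import ipaddress


def rules_outside_range(inbound_cidrs: list, permitted_range: str) -> list:
    """Return the inbound CIDRs that are not fully contained in permitted_range."""
    permitted = ipaddress.ip_network(permitted_range)
    offending = []
    for cidr in inbound_cidrs:
        net = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
        if net.version != permitted.version or not net.subnet_of(permitted):
            offending.append(cidr)
    return offending


# Example with assumed values: only the first rule is inside 10.0.0.0/16.
flagged = rules_outside_range(["10.0.1.0/24", "10.1.0.0/16", "0.0.0.0/0"], "10.0.0.0/16")
```

An empty result means every inbound rule is confined to the permitted private range.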

Figure 2. Connectivity to customer account with CCMv2 enabled

CCMv1

The third diagram illustrates the CDP connectivity to a customer account with CCMv1 enabled. CCMv1 agents are deployed not only on the FreeIPA cluster (as in CCMv2), but also on the Data Lake and Data Hub. While CCMv2 establishes a connection via HTTPS, CCMv1 uses a tunnel based on the SSH protocol. Workload clusters initiate an SSH tunnel to the CDP Control Plane, which is then used for all communication thereafter.

Figure 3. Connectivity to customer account with CCMv1 enabled

To learn more about CCM, refer to the following documentation: