Kubernetes installation

This guide covers installing Deephaven in a Kubernetes environment using Helm. Commands are shown for running in a Unix-like environment. The installation can be customized by setting the Helm values described in the Kubernetes configuration settings page.

If you are trying out Deephaven in a minimal capacity for testing purposes and do not plan on ingesting a lot of data, see the Quickstart guide. Before continuing further, make sure you have read the planning guide to help determine how much memory and storage to configure for the system.

Prerequisites

You must meet the following prerequisites to deploy Deephaven with Kubernetes:

  • A Kubernetes cluster, with a dedicated namespace created for the Deephaven installation.
  • kubectl, docker, and helm command line tools.
  • An artifact repository to which Docker images can be stored and from which Kubernetes pods may pull them. In this example, my-repo.dev/my-project/images is used; replace this with your image repository URL.
  • A TLS webserver certificate and the private key that corresponds to it. The webserver and certificate must meet Deephaven's requirements. The Deephaven installation includes a LoadBalancer service (Envoy) that is the entry point for the application. A DNS entry for the hostname associated with this certificate must be created after the installation.
  • A Deephaven distributable package containing a Helm chart, support scripts, and Dockerfiles, e.g., deephaven-helm-1.20240517.344.tar.gz.
  • A distributable package containing etcd images, e.g., bitnami-etcd-containers-11.3.6.tar.gz.

If using Deephaven pre-built images:

  • A Deephaven distributable package containing pre-built Docker images, e.g., deephaven-containers-1.20240517.344.tar.gz.

If building your own images:

  • A distributable containing Deephaven Enterprise, e.g., deephaven-enterprise-1.20240517.344.tar.gz.
  • A distributable containing Deephaven Core+ worker, e.g., deephaven-coreplus-0.37.4-1.20240517.344.tgz.

Note

You can either use pre-built container images from Deephaven or build your own images. Building your own images allows for customizing them to make your JARs available to workers or to add plugins.

Kubernetes resource requirements

Before deploying Deephaven, ensure your Kubernetes cluster has sufficient resources to support the deployment. The requirements below are for a deployment with default configuration settings.

Minimum cluster resources

A minimal cluster requires:

| Resource Type | Minimum Requirement |
| --- | --- |
| Network resources | |
| Cluster IPs | 20 (plus one for each additional worker) |
| External IPs | 1 (for Envoy LoadBalancer) |
| Compute resources | |
| Memory | 60GB total across all pods |
| CPU | 10 cores total across all pods |

Note

These are baseline requirements for a minimal deployment. Production deployments should provision significantly more resources based on expected workload, number of concurrent users, and data processing requirements. See the Kubernetes-specific considerations section of the installation planning guide for guidance on sizing for production use.

Resource breakdown by component

The default Helm chart configuration allocates resources across multiple pods:

  • etcd cluster: 3 pods with memory and CPU requests.
  • Deephaven infrastructure services: Web API, Configuration Server, Authentication Server, etc.
  • Persistent Query Controller: Manages query execution.
  • Envoy proxy: Handles ingress traffic (requires one External IP).
  • Worker pods: For Code Studio sessions and Persistent Queries (scale based on usage).

Storage requirements

In addition to compute resources, your cluster must provide:

  • A read-write-many (RWX) persistent volume for shared data access across pods.
  • Persistent volumes for etcd data and backups.
  • Storage class appropriate for your Kubernetes provider (e.g., premium-rwo for GKE, gp2 for EKS, managed-csi for AKS).

Set the namespace for your Kubernetes context

Start by creating your Kubernetes namespace and setting it to the default for your kubectl context if you have not already done so.
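For example, assuming a namespace named deephaven (substitute your own):

```sh
# Create the namespace and make it the default for the current kubectl context.
kubectl create namespace deephaven
kubectl config set-context --current --namespace=deephaven
```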

Unzip the Deephaven Helm chart

Unpack the deephaven-helm package:
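```sh
# Extract the Helm chart, support scripts, and Dockerfiles (file name from the prerequisites).
tar -xzf deephaven-helm-1.20240517.344.tar.gz
```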

Push the Deephaven images to your image repository

If you are using Deephaven's pre-built images, load them into your local Docker repository.
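A sketch of loading and pushing the pre-built images; the image names and tags inside the archive are release-specific, so list them after loading (my-repo.dev/my-project/images is the example repository from the prerequisites):

```sh
# Load the pre-built Deephaven images into the local Docker image store.
docker load -i deephaven-containers-1.20240517.344.tar.gz

# List the loaded images to see their names and tags, then tag and push each one.
docker image ls
# Example only -- take the actual image name and tag from the output above.
docker tag <loaded-image>:<tag> my-repo.dev/my-project/images/<loaded-image>:<tag>
docker push my-repo.dev/my-project/images/<loaded-image>:<tag>
```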

To build images from Deephaven Docker files, first change to the docker subdirectory within the unzipped Helm distribution and copy the Enterprise and Core+ distributions to the deephaven_base and db_query_worker_coreplus directories, respectively. An example build invocation follows the list below.

  • To build custom JARs into your images and make them available in Persistent Queries and Code Studios, create a zipped tar file containing your JARs, copy it to the deephaven_customer directory, and add a --customer-coreplus-jar flag to the buildAllForK8s.sh command.
  • For Legacy workers, use the --customer-jar flag.
  • To build plugin JARs into your images and make them available to non-worker processes, create zipped tar files containing your plugins, copy them to the deephaven_customer directory, and add up to 10 --customer-plugin flags to the buildAllForK8s.sh command.
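A hypothetical sketch of the whole build step. The directory layout and flag names come from the text above, but the exact argument syntax expected by buildAllForK8s.sh may differ, and the custom-JAR file name is made up:

```sh
cd docker   # the docker subdirectory of the unzipped Helm distribution

# Copy the distributables into the directories named above (source paths are examples).
cp /path/to/deephaven-enterprise-1.20240517.344.tar.gz deephaven_base/
cp /path/to/deephaven-coreplus-0.37.4-1.20240517.344.tgz db_query_worker_coreplus/
# Optional: a zipped tar of custom JARs for Core+ workers (hypothetical file name).
cp /path/to/my-jars.tgz deephaven_customer/

# Hypothetical invocation -- confirm the real flag syntax with the script's usage output.
./buildAllForK8s.sh --customer-coreplus-jar my-jars.tgz
```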

Push the etcd images to your image repository

First, load the images to your local Docker repo.
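```sh
# Load the etcd images from the distributable into the local Docker image store.
docker load -i bitnami-etcd-containers-11.3.6.tar.gz
```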

Tag and push the etcd images to your artifact repository.
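A sketch; the image name and tag reported by docker load are release-specific, so substitute what the load output shows:

```sh
# Replace <etcd-image>:<tag> with the name and tag printed by the docker load above.
docker tag <etcd-image>:<tag> my-repo.dev/my-project/images/<etcd-image>:<tag>
docker push my-repo.dev/my-project/images/<etcd-image>:<tag>
```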

Change directory

The rest of the commands in this guide must be run from the helm subdirectory of the unpackaged Helm distribution.
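For example (the unpacked directory name depends on the release you downloaded):

```sh
cd deephaven-helm-1.20240517.344/helm
```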

Configure Persistent Storage Volumes

The Deephaven deployment requires read-write-many (RWX) persistent volumes for shared data access across pods. You have two options for providing this storage:

  1. Create a new NFS deployment (recommended for testing and simple deployments) - Use the provided manifest files to deploy an NFS server within your Kubernetes cluster.
  2. Use pre-existing PVCs - Configure Deephaven to use existing RWX persistent volume claims that you have already created in your environment.

Choose one of the following approaches:

Option 1: Create a new NFS deployment

The Helm distribution contains manifest files to easily create an NFS server for this purpose.

The default storage class defined in setupTools/nfs-server.yaml is premium-rwo, valid in a GKE environment. Change this to an appropriate storage class for your environment (e.g., gp2 for EKS, managed-csi for AKS), then apply the files to create the deployment.
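For example, after editing the storage class in the manifest:

```sh
# Create the in-cluster NFS server; apply any other NFS manifests your distribution provides as well.
kubectl apply -f setupTools/nfs-server.yaml
```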

It may take a minute to start up the pod. You can check its status with kubectl get pods. Once it is running, these commands prepare it for use:

Option 2: Use pre-existing PVCs

If you have already created RWX persistent volume claims in your Kubernetes environment, you can configure Deephaven to use them instead of creating a new NFS deployment.

First, ensure your PVCs have the required directory structure. You can prepare them by running the setup-nfs-minimal.sh script against each PVC. For example, if you have a PVC named my-shared-storage:
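A sketch, assuming the script takes the PVC name as its argument; check the script itself for the exact syntax:

```sh
# Create the directory structure Deephaven expects on the pre-existing PVC.
setupTools/setup-nfs-minimal.sh my-shared-storage
```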

Then, when installing the Deephaven Helm chart, configure the following values in your override YAML file to use your pre-existing PVCs:

Note

If you use pre-existing PVCs, you do not need to configure the nfs section in your Helm values file.

Install the etcd helm chart

The setup-etcd.sh script in the setupTools directory of the deephaven-helm package installs the etcd Helm chart. The command below creates a 3-node etcd deployment named dh-etcd with backup snapshots. Note the etcd installation name and the persistent volume prefix, as those will be needed later when installing the Deephaven Helm chart.
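The flag names below are hypothetical; run setupTools/setup-etcd.sh --help for the real options before installing:

```sh
# Hypothetical sketch: a 3-node etcd installation named dh-etcd with a persistent volume prefix.
setupTools/setup-etcd.sh --name dh-etcd --volume-prefix dh-dev1
```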

The prefix differentiates Persistent Volumes if you have multiple Deephaven installations in the same Kubernetes cluster, since volumes are not scoped to a namespace. It can be a short arbitrary string such as dh-dev1, dhqa, etc. To see all options available for the etcd install, run setupTools/setup-etcd.sh --help.

Note

The setup-etcd.sh script generates two YAML files. The etcd-backup-vol.yaml file is a manifest for a Persistent Volume Claim (PVC) and Persistent Volume (PV) that store automatic etcd backups, and the etcdChartValues.yaml file contains the override values used when installing the etcd Helm chart. To see the generated files without installing etcd, provide a --dry-run flag to the command.

It will take a minute for the etcd pods to start up and become ready. You can check the status of the pods with this command, and should eventually see all pods with a 1/1 container ready status:
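```sh
# Watch the pods until each shows a 1/1 ready status (add a label selector to narrow the output).
kubectl get pods --watch
```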

Install cert-manager (optional)

If you wish to run Deephaven services using TLS within the cluster for intra-cluster communication, you need to install cert-manager. This is optional but recommended for production environments.

Check if cert-manager is already installed

To see if cert-manager is installed on your cluster already, run:
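```sh
# cert-manager is installed if this returns a (possibly empty) list of ClusterIssuer resources.
kubectl get clusterissuer
```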

If you see a message saying error: the server doesn't have a resource type "clusterissuer", then cert-manager is not installed.

Install cert-manager

There are several ways to install cert-manager. Full instructions are provided at the cert-manager installation page. The most straightforward way is to do the default static install listed there. The cert-manager project also provides a Helm chart that may be used to install cert-manager.
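For reference, the static install is a single kubectl apply of the release manifest; substitute the cert-manager version you want to install:

```sh
# Installs the cert-manager CRDs, controller, webhook, and cainjector into the cert-manager namespace.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/vX.Y.Z/cert-manager.yaml
```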

Configure the issuer

The setupTools/ca-bootstrap-issuer.yaml file in your Helm distribution will create:

  • A ClusterIssuer for the entire Kubernetes cluster that creates a self-signed root CA certificate.
  • An Issuer in your target Kubernetes namespace that will issue certificates with the root CA in the certificate chain.

You may create a new YAML file defining an Issuer configuration that is not self-signed if there is infrastructure to support it in your organization. For example, you may define an issuer that is configured to use HashiCorp Vault or an external provider. Details for these configurations may be found in the cert-manager issuer configuration docs.

Using an existing ClusterIssuer

If a ClusterIssuer is already present in your cluster, you can copy the second and third sections from ca-bootstrap-issuer.yaml (Certificate and Issuer definitions) to a new file, and update them for the names of your ClusterIssuer and namespace. Then apply the new file using kubectl apply -f.

Creating the default self-signed cluster issuer

To create the default self-signed cluster issuer:

  1. Edit setupTools/ca-bootstrap-issuer.yaml and replace occurrences of <your-namespace> with your target Kubernetes namespace.
  2. Apply the configuration:
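```sh
# Create the ClusterIssuer, root CA Certificate, and namespace Issuer defined in the file.
kubectl apply -f setupTools/ca-bootstrap-issuer.yaml
```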

Important

When using cert-manager, you must set certmgr.enabled to true in your values YAML file (see the Install the Deephaven helm chart section below). The namespace issuer and related objects must be deployed for the dispatcher to be able to obtain certificates for the workers.

If you are not using cert-manager, several self-signed certificates (without a common root CA) for Deephaven services will be generated and kept in a keystore for use by the system.

Create a Kubernetes secret for the TLS certificate

With the TLS certificate and private key stored as files named tls.crt and tls.key, respectively, run this command to create a deephaven-tls secret from them:
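```sh
# Create the deephaven-tls secret from the certificate and private key files.
kubectl create secret tls deephaven-tls --cert=tls.crt --key=tls.key
```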

Create Roles and RoleBindings

Deephaven relies on Kubernetes role-based access control (RBAC) to grant certain pods the ability to add new objects to the Kubernetes cluster. This is used for two purposes:

  • The Helm install hook adds Kubernetes Secrets and ConfigMaps that are used by various components of the Deephaven platform.
  • The query dispatcher pod adds additional pods, services, and other Kubernetes objects to allow new persistent queries and code studios to start.

Verify that the user running the Helm installation is able to create Kubernetes Roles, RoleBindings, and ServiceAccounts. You can check this with the kubectl auth can-i command:
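```sh
kubectl auth can-i create roles
kubectl auth can-i create rolebindings
kubectl auth can-i create serviceaccounts
```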

The output should be "yes" three times:
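```
yes
yes
yes
```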

If the output is "yes" for all three permissions, then you may proceed with the installation. If not, please see Configuring roles, role bindings, and service accounts for details on how an administrator can manually create these objects before performing the installation with Helm as described below.

Install the Deephaven helm chart

You can now configure deployment settings in the Deephaven Helm chart. An example configuration YAML file is shown here, with comments providing more details. You can save this as something like deephaven-override-values.yaml, and its values will override the defaults defined in the chart.

Note

The Deephaven installation includes a LoadBalancer service (Envoy) that is the entry point for the application. The Envoy load balancer might be assigned an external IP address by default, and our helm chart allows for setting cloud provider-specific annotations that can change IP address assignment to conform to your environment. See the envoy.serviceAnnotations configurations in the override values YAML example below.

Override values YAML example
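Only a minimal, hedged sketch is shown here. The certmgr.enabled and envoy.serviceAnnotations keys are the ones referenced elsewhere in this guide; the chart's own values.yaml documents the full set of settings (image repository, etcd installation name, persistent volume prefix, NFS/PVC configuration, and so on).

```yaml
# deephaven-override-values.yaml -- minimal sketch; consult the chart's values.yaml for all settings.
certmgr:
  # Set to true when cert-manager and the namespace issuer are deployed (see above).
  enabled: true

envoy:
  # Cloud provider-specific LoadBalancer annotations, e.g., to control external IP assignment.
  serviceAnnotations: {}
```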

You can now install the Deephaven helm chart using your override YAML file:
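A sketch; the chart path depends on where the Helm distribution was unpacked, and the release name here is assumed to match the job name referenced in the note below:

```sh
# Chart path is an example -- point it at the Deephaven chart in the unpacked distribution.
helm install my-deephaven-deployment-name ./deephaven -f deephaven-override-values.yaml
```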

Note

Properties for a Helm chart are typically stored in one or more YAML files. If more than one is provided to the helm command, priority is given to the last (right-most) file specified with -f. Properties can also be provided with --set flags, and those will take precedence over YAML settings.

The installation takes a couple of minutes. You can see progress by tailing the install job's log output with the command kubectl logs -f job/my-deephaven-deployment-name-pre-release-hook.

Create a DNS entry for the application

You need a DNS entry for the hostname referenced by the TLS certificate, using the external IP address of the Envoy service. How this is done varies with your Kubernetes provider and/or infrastructure. This example uses a Google Cloud Platform environment.
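A sketch for a Google Cloud environment; the Envoy service name, the managed zone name (my-zone), and the hostname are assumptions to replace with your own:

```sh
# Find the external IP assigned to the Envoy LoadBalancer service.
kubectl get service envoy

# Create an A record for the certificate's hostname pointing at that IP.
gcloud dns record-sets create yourhost.domain.com. \
  --zone=my-zone --type=A --ttl=300 --rrdatas=<EXTERNAL-IP>
```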

Set a password for the admin user

Use the Deephaven deployment's management shell pod to run a command that sets up the dh_admin user with a password. In this example, the password is adminpw1, though you are encouraged to provide your own.

Set up resource monitoring and notifications

Each enterprise has a different way of setting up Kubernetes monitoring, metrics, and notifications using tools like Prometheus/Grafana, Datadog, or your cloud provider's offering, such as Cloud Monitoring in GKE. In a production Deephaven system, it is very important to set up notifications around persistent volume utilization so that action can be taken (e.g., expand storage, remove data) before a volume fills up, potentially stopping services and/or preventing data ingestion.

Caution

It is recommended that a warning-level alert is configured at 75% utilization of persistent volumes, and a critical-level alert is set for 90% utilization.
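As one illustration, clusters that run the Prometheus Operator could express those thresholds with a PrometheusRule along these lines; the metrics are the standard kubelet volume metrics, and the resource names are placeholders:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: deephaven-pv-utilization   # placeholder name
spec:
  groups:
    - name: deephaven-storage
      rules:
        - alert: DeephavenPVUsageWarning
          # Fires when a persistent volume is more than 75% full.
          expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.75
          for: 15m
          labels:
            severity: warning
        - alert: DeephavenPVUsageCritical
          # Fires when a persistent volume is more than 90% full.
          expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.90
          for: 5m
          labels:
            severity: critical
```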

Log in

You can now access the application at a URL similar to https://yourhost.domain.com:8000/iriside, using the hostname that matches your webserver TLS certificate.

Note

The URL example above uses port 8000. Use port 8000 for servers with Envoy and port 8123 for servers without Envoy.

Autoscaling disruptions

Autoscaling solutions like Karpenter and Cluster Autoscaler add and remove nodes based on demand, and will also move pods between nodes to consolidate resources and drain a node so it can be removed. Since Deephaven pods are typically used continuously throughout their lifetimes, it is undesirable to allow an autoscaler to disrupt them to drain a node. Deephaven pods can be annotated to exempt them from autoscaler disruption. Check the details of your particular autoscaler for the required annotations, if it is not one of the options below.

Add the relevant annotation section to your values.yaml to mark Deephaven pods as protected from autoscaler disruption; a combined sketch for both cases follows the list below.

  • Karpenter on Amazon EKS:
  • Cluster Autoscaler on Azure AKS, or Google GKE Autopilot:
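A sketch covering both cases. The annotations themselves are the standard ones for each autoscaler, but the enclosing values.yaml structure for attaching pod annotations is chart-specific, so treat the surrounding key as a placeholder:

```yaml
# Placeholder structure -- check the Deephaven chart's values.yaml for the actual annotation keys.
podAnnotations:
  # Karpenter on Amazon EKS: prevent voluntary disruption of the pod.
  karpenter.sh/do-not-disrupt: "true"
  # Cluster Autoscaler (Azure AKS, GKE Autopilot): mark the pod as not safe to evict.
  cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```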

EKS specific information

Amazon Load Balancers

The default Amazon EKS setup uses the in-tree Kubernetes load balancer controller, which provisions Classic Load Balancers. The default Classic Load Balancer settings terminate connections after 60 seconds of inactivity. This results in Deephaven workers being killed when their controlling connection is closed. It is possible to manually configure the timeout, but Deephaven recommends installing the AWS Load Balancer Controller add-on, which uses a Network Load Balancer. Additionally, the AWS Load Balancer Controller supports annotations for configuring the service. The complete set of annotations that are suitable for your network is beyond the scope of this document (e.g., subnet and IP allocation), but the section below provides some recommendations for common configurations.

AWS NLB without Karpenter

The following annotations (specified in your Deephaven values.yaml file) instruct the controller to create a suitable Network Load Balancer:
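A sketch of typical AWS Load Balancer Controller annotations, placed under the envoy.serviceAnnotations key mentioned earlier; whether the scheme should be internet-facing or internal depends on your network:

```yaml
envoy:
  serviceAnnotations:
    # Hand the service to the AWS Load Balancer Controller, which provisions an NLB.
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    # Route traffic through node ENIs (see the Karpenter section below for the ip alternative).
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
    # Use "internal" instead if the load balancer must not be reachable from the internet.
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
```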

AWS NLB with Karpenter

Karpenter provides autoscaling features for Kubernetes, deploying additional nodes when needed for new pod resources, and consolidating and removing nodes that are no longer needed. Karpenter interacts with the AWS NLB because the default instance target type routes NLB traffic through the ENIs (Elastic Network Interfaces) of cluster nodes. Karpenter is unaware of the NLB's use of the nodes, so it is possible that Karpenter will remove a node while the NLB is using it for a client session. There are two configuration options to address this:

  1. Use the ip target type, which configures the load balancer to connect directly to target IPs rather than routing through node ENIs (see the sketch after this list).
  2. If ip targets are not usable -- for example, if Cilium is in use, which is largely incompatible with ip targets -- configure the NLB to use labeled nodes that are known to be persistent in the cluster and not subject to sudden removal by Karpenter.
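Sketches for both options, again under envoy.serviceAnnotations; the node label in the second option is a placeholder for a label you apply to your persistent nodes:

```yaml
envoy:
  serviceAnnotations:
    # Option 1: send NLB traffic directly to pod IPs instead of node ENIs.
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"

    # Option 2: keep instance targets but restrict them to labeled, persistent nodes.
    # service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
    # service.beta.kubernetes.io/aws-load-balancer-target-node-labels: "node-lifecycle=persistent"
```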

Manually configuring a Classic Load Balancer Timeout

When using a Classic Load Balancer, a manual workaround is to identify the AWS load balancer that the Kubernetes system allocated and increase the connection timeout using the AWS command-line tool.

To identify the load balancer, first run kubectl to find the external name of the load balancer:
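For example (the service name envoy and the full hostname are illustrative; the leading token of the hostname is the load balancer name):

```sh
kubectl get service envoy
# NAME    TYPE           EXTERNAL-IP
# envoy   LoadBalancer   a89229d6c7c3a43fbba5728fb8216c64-1234567890.us-east-1.elb.amazonaws.com
```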

In this example, the load balancer is identified by a89229d6c7c3a43fbba5728fb8216c64. The load balancer attributes can be queried with:
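```sh
aws elb describe-load-balancer-attributes \
  --load-balancer-name a89229d6c7c3a43fbba5728fb8216c64
```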

To adjust the connection idle setting to 900 seconds, run:
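```sh
aws elb modify-load-balancer-attributes \
  --load-balancer-name a89229d6c7c3a43fbba5728fb8216c64 \
  --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":900}}'
```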

Minimal memory changes needed for EKS and AKS

When deploying to AKS (Azure) or EKS (Amazon Web Services), the default memory requests and limits for some services need to be modified. The settings are detailed in the troubleshooting guide.