Kubernetes installation

This guide covers installing Deephaven in a Kubernetes environment using Helm. Commands are shown for running in a Unix-like environment.

If you are trying out Deephaven in a minimal capacity for testing and do not plan to ingest much data, see the Quickstart guide. Before continuing, make sure you have read the planning guide to determine how much memory and storage to configure for the system.

Note

The kubectl and helm commands shown in this guide do not explicitly set a namespace and assume that your environment is configured to operate in your intended namespace by default.
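For example, you can create a dedicated namespace and make it the default for your current kubectl context before running the rest of the commands. The namespace name deephaven below is only a placeholder; substitute the namespace created for your installation:

# Placeholder namespace name; use the namespace intended for your Deephaven installation.
kubectl create namespace deephaven
kubectl config set-context --current --namespace=deephaven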

Prerequisites

You must meet the following prerequisites to deploy Deephaven with Kubernetes:

  • A Kubernetes cluster, with a dedicated namespace created for the Deephaven installation.
  • kubectl, docker and helm command line tools.
  • An artifact repository to which Docker images can be pushed and from which Kubernetes pods can pull them. In this example, my-repo.dev/my-project/images/deephaven-install is used; replace this with your image repository URL.
  • A TLS webserver certificate that meets Deephaven's requirements and the private key that corresponds to it.
  • A Deephaven distributable package containing a helm chart, support scripts, and Dockerfiles, e.g. deephaven-helm-1.20240517.344.tar.gz.

If using Deephaven pre-built images:

  • A Deephaven distributable package containing pre-built Docker images, e.g., deephaven-containers-1.20240517.344.tar.gz.

If building your own images:

  • A distributable containing Deephaven Enterprise, e.g., deephaven-enterprise-jdk17-1.20240517.344.tar.gz.
  • A distributable containing Deephaven Core+ worker, e.g., deephaven-coreplus-0.37.4-1.20240517.344-jdk17.tgz.

Note

You can either use pre-built container images from Deephaven or build your own images. Building your own images allows for customizing them to make your JARs available to workers or to add plugins.

Unzip the Deephaven helm chart

Unpack the deephaven-helm package:

tar -xzf deephaven-helm-1.20240517.344.tar.gz
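The unpacked distribution contains the docker and helm subdirectories used in the steps below. You can confirm the layout with a quick listing; the exact contents may vary by release:

ls deephaven-helm-1.20240517.344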

Push the Deephaven images to your image repository

If you are using Deephaven's pre-built images, load them into your local Docker image store, then push them to your image repository with the pushAll.sh script:

docker image load -i deephaven-containers-1.20240517.344.tar.gz
./deephaven-helm-1.20240517.344/docker/pushAll.sh --source-tag 1.20240517.344 my-repo.dev/my-project/images/deephaven-install 1.20240517.344
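If you want to confirm which images were loaded before or after pushing, you can list the local images filtered by the release tag. This check assumes the loaded images carry the release version as their tag:

docker image ls | grep 1.20240517.344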

To build images from the Deephaven Dockerfiles, first change to the docker subdirectory within the unpacked helm distribution and copy the Enterprise and Core+ distributions to the deephaven_base and db_query_worker_coreplus directories, respectively.

  • To build custom JARs into your images and make them available in Persistent Queries and Code Studios, create a zipped tar file containing your JARs, copy it to the deephaven_customer directory, and add a --customer-coreplus-jar flag to the buildAllForK8s.sh command.
  • To build plugin JARs into your images and make them available to non-worker processes, create zipped tar files containing your plugins, copy them to the deephaven_customer directory, and add up to 10 --customer-plugin flags to the buildAllForK8s.sh command.
# Example: Building Docker images with a custom plugin in my-plugins.tgz and custom jars in my-custom-jars.tgz

# Change to the Docker directory within the unzipped helm distribution.
cd deephaven-helm-1.20240517.344/docker

# Copy the Enterprise and Core+ distributions to these directories.
cp /your/path/to/deephaven-enterprise-jdk17-1.20240517.344.tar.gz deephaven_base/
cp /your/path/to/deephaven-coreplus-0.37.4-1.20240517.344-jdk17.tgz db_query_worker_coreplus/

# Optional: if building a custom JAR into a Core+ worker image
cp /your/path/to/my-custom-jars.tgz deephaven_customer/

# Optional: if building a plugin into your image
cp /your/path/to/my-plugins.tgz deephaven_customer/

# Run the docker build scripts. The --customer-coreplus-jar and --customer-plugin flags are optional.
./buildAllForK8s.sh --jdk17 \
    --version 1.20240517.344 \
    --coreplus-tar deephaven-coreplus-0.37.4-1.20240517.344-jdk17.tgz \
    --customer-coreplus-jar my-custom-jars.tgz \
    --customer-plugin my-plugins.tgz

# Push the Docker images to your image repository
./pushAll.sh --source-tag 1.20240517.344 my-repo.dev/my-project/images/deephaven-install 1.20240517.344

Change directory

The rest of the commands in this guide must be run from the helm subdirectory of the unpacked helm distribution.

cd deephaven-helm-1.20240517.344/helm

Set up an NFS deployment

The Deephaven deployment needs a read-write-many (RWX) store. While that store is not part of the Deephaven helm chart itself (and you may choose to use an available one in your environment), the helm distribution contains the manifest files to easily create an NFS server for this purpose.

The default storage class defined in setupTools/nfs-server.yaml is premium-rwo, which is valid in a GKE environment. Change this to an appropriate storage class for your environment (e.g., gp2 for EKS, managed-csi for AKS).
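If you are unsure which storage classes are available in your cluster, you can list them with kubectl:

kubectl get storageclass

After updating the storage class, apply the manifest files to create the NFS deployment: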

# NOTE: Update spec.storageClassName in setupTools/nfs-server.yaml to a valid storage class for your environment!
kubectl apply -f setupTools/nfs-server.yaml
kubectl apply -f setupTools/nfs-service.yaml

It may take a minute to start up the pod. You can check its status with kubectl get pods. Once it is running, these commands prepare it for use:

MY_NFS_POD=$(kubectl get pods -l role=deephaven-nfs-server --no-headers -o custom-columns="NAME:.metadata.name")
kubectl cp setupTools/setup-nfs-minimal.sh $MY_NFS_POD:/setup-nfs-minimal.sh
kubectl exec $MY_NFS_POD -- bash -c "export SETUP_NFS_EXPORTS=y && chmod 755 /setup-nfs-minimal.sh && /setup-nfs-minimal.sh"
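Optionally, you can verify that the exports were created by inspecting the export configuration from inside the pod. This check assumes the setup script writes the standard /etc/exports file:

kubectl exec $MY_NFS_POD -- cat /etc/exports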

Install the etcd helm chart

Deephaven depends on a Bitnami etcd helm deployment. Create a Persistent Volume Claim (PVC) and Persistent Volume (PV) to store automatic backups, configure the etcd helm chart, and then install it.

# Copy the etcd backup template file included with the helm distribution.
cp setupTools/etcd-backup-pvc-template.yaml setupTools/etcd-backup-pvc.yaml

# Get the IP address of the NFS server created in the previous step.
kubectl get svc deephaven-nfs --no-headers -o custom-columns='IP:.spec.clusterIP'
10.122.10.43

Edit the setupTools/etcd-backup-pvc.yaml PVC file. The file includes comments with descriptions of what needs to be changed. Your file should look similar to this one, with a different IP address and possibly different PVC/PV names.

Example etcd-backup-pvc.yaml file
# Note: if you have multiple Deephaven installations you may want to preface the PV and PVC names
# with a prefix to differentiate it such as dev1, qa, uat, etc.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-backup-snapshot-pv
spec:
  persistentVolumeReclaimPolicy: Retain
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 20Gi
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    server: 10.122.10.43
    path: /exports/dhsystem/etcd-backup
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd-backup-snapshot-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ''
  volumeName: etcd-backup-snapshot-pv
  resources:
    requests:
      storage: 20Gi

Now apply the file to create the objects:

kubectl apply -f setupTools/etcd-backup-pvc.yaml

If you changed the name of the PVC, edit setupTools/etcdValuesWithBackup.yaml so it reflects the correct PVC name. Then install the etcd helm chart:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install dh-etcd bitnami/etcd --values setupTools/etcdValuesWithBackup.yaml
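Before proceeding, you can check that the etcd pods come up successfully. The label selector below assumes the chart applies the standard app.kubernetes.io/instance label with your release name:

kubectl get pods -l app.kubernetes.io/instance=dh-etcd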

Create a Kubernetes secret for the TLS certificate

With the TLS certificate and private key stored as files named tls.crt and tls.key, respectively, run this command to create a deephaven-tls secret from them:

kubectl create secret tls deephaven-tls --cert=tls.crt --key=tls.key
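To confirm the secret contains the certificate you expect, you can decode it and inspect the subject and expiration with openssl. The base64 -d flag shown here is for Linux; adjust it for your platform if needed:

kubectl get secret deephaven-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -enddate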

Install the Deephaven helm chart

You can now configure deployment settings in the Deephaven helm chart. An example configuration YAML file is shown here, with comments providing more details. You can save this as something like deephaven-override-values.yaml, and the values contained in it will override the defaults defined in the chart.

Note

The Deephaven installation includes a LoadBalancer service (Envoy) that is the entry point for the application. The Envoy load balancer might be assigned an external IP address by default, and our helm chart allows for setting cloud provider-specific annotations that can change IP address assignment to conform to your environment. See the envoy.serviceAnnotations configurations in the override values YAML example below.

Override values YAML example
# deephaven-override-values.yaml

# ------------------------------------------------------------------------------------------------
# Mandatory Configuration Values:
# These values must be defined in order to deploy the Deephaven helm chart.
image:
  # This must be set to the image repository location to which you pushed your images.
  repositoryUrl: 'my-repo.dev/my-project/images/deephaven-install'

  # The tag with which the images were pushed to your repo. Note that you may opt to omit this in your
  # yaml file, and instead provide it on the command line as --set image.tag=1.20240517.344.
  # That would prevent using the incorrect tag in an old config file when performing a future upgrade to
  # a new Deephaven version.
  tag: '1.20240517.344'

# The hostname/DNS entry used for your Deephaven cluster. It must correspond with the TLS certificate
# created in an earlier step as the deephaven-tls secret.
envoyFrontProxyUrl: dh.yourdomain.com

nfs:
  # A short prefix prepended to pv and pvc names to distinguish them, if you have multiple environments.
  pvPrefix: 'dh-dev1'
  # If using the Deephaven nfs server created earlier, the server should be the CLUSTER-IP you see by
  # running kubectl get svc deephaven-nfs, e.g. 10.122.10.43
  server: '10.122.10.43'
  # If using the Deephaven nfs server, there is no need to define the root. If you use your own
  # nfs server, this would need to be set to the path available to the system.
  root: '/exports/dhsystem'

etcd:
  # This is the secret containing the etcd root password that was generated when installing
  # etcd. It may not match the etcd release name you used earlier. Run the command 'kubectl get
  # secrets' and make sure your value is there.
  release: 'dh-etcd'

global:
  # This value should be set to a storage class in your environment with a provider that can
  # dynamically provision volumes. For example, this can be 'standard-rwo' for GKE environments,
  # 'gp2' for EKS, or 'managed-csi' for AKS.
  storageClass: 'default'

# ------------------------------------------------------------------------------------------------
# Recommended Configuration Values:
# Strongly consider defining resource values for your use case, particularly memory settings.

# Resource sizes for Deephaven processes can be configured here. Note that only the default config
# for the log aggregator service (LAS) is shown here for brevity. You may add configs for these processes
# in addition to the las: aclwriter, authserver, configuration-server, controller, las, merge-server,
# query-server, tdcp, and webapi.
resources:
  las:
    binlogsSize: '20Gi'
    requests:
      cpu: '500m'
      memory: '4Gi'
      ephemeral-storage: '1Gi'
    limits:
      memory: '4Gi'
      ephemeral-storage: '1Gi'
    tailer:
      requests:
        cpu: '250m'
        memory: '2Gi'
        ephemeral-storage: '1Gi'
      limits:
        memory: '2Gi'
        ephemeral-storage: '1Gi'
  # Note that you do not need to provide all values shown above for the las. You can configure an
  # override only for controller.requests.cpu, for example. However, when setting memory it is
  # recommended that you set requests.memory and limits.memory to be identical, or pods may be
  # subject to eviction and OOMKilled errors.
  # This is an example configuration setting memory for the controller process at 4Gi and the
  # memory for the controller's tailer process to 2Gi, leaving other settings at their defaults.
  controller:
    requests:
      memory: '4Gi'
    limits:
      memory: '4Gi'
    tailer:
      requests:
        memory: '2Gi'
      limits:
        memory: '2Gi'

# Custom DIS process volume configuration. Set intradaySize and intradayUserSize to appropriate values for your
# target environment, or consult with Deephaven to evaluate your use case and make a recommendation. 10Gi default.
dis:
  intradaySize: 10Gi
  intradayUserSize: 10Gi

# ------------------------------------------------------------------------------------------------
# Optional Values:
# These values may be defined to override the Deephaven defaults. Note that this is an incomplete
# list of the values defined in the Deephaven chart, but it represents those values most likely to
# require changes. The full file with all chart configurations is in the helm distribution at
# deephaven-helm-1.20240517.344/helm/deephaven/values.yaml

# ALL OPTIONAL VALUES SHOWN ARE THE DEFAULTS AND CAN BE OMITTED UNLESS YOU ARE CHANGING THEM.

# You can add annotations for the envoy load balancer that is the front-end to the Deephaven application
# by putting the values here under envoy.serviceAnnotations. There is no default here, and values will
# vary by provider. Note that if none are provided, the default service type of 'LoadBalancer' may
# allocate an externally accessible IP address. The values in comments below are examples only - please
# consult your provider's documentation for further information.
envoy:
  serviceAnnotations:
#    service.beta.kubernetes.io/aws-load-balancer-subnets: "dh-ice2-bastion"
#    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
#    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
#    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: "eipalloc-0a4ac88c475cf2d5b"
#    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

# Set to 'Always' for k8s to always pull the image when starting a container.
imagePullPolicy: 'IfNotPresent'

tailers:
  # How many minutes must elapse since a file's last modified time before a binlog is deleted in tailer containers.
  fileCleanupDeleteAfterMinutes: 240

# If you wish to use cert-manager for intra-cluster TLS process communication please contact a Deephaven representative.
certmgr:
  enabled: false

management:
  # Add the etcd root password as an env var named ETCD_ROOT_PASSWORD in the management-shell pod.
  addRootEtcdPasswordEnvVar: true

# If you are pulling source code from git to create a persistent query, set this to true.
controller:
  gitCreateVolume: false

You can now install the Deephaven helm chart using your override YAML file:

# You can choose any name to use in place of my-deephaven-deployment-name for your deephaven deployment.
helm upgrade --install my-deephaven-deployment-name deephaven \
    -f deephaven-override-values.yaml \
    --set image.tag="1.20240517.344" \
    --debug

Note

Properties for a helm chart are typically stored in one or more YAML files. If more than one is provided to the helm command, priority is given to the last (right-most) file specified with -f. Properties can also be provided with --set flags, and those will take precedence over YAML settings.

The installation takes a couple of minutes. You can see progress by tailing the log output of the install job with the command kubectl logs -f job/my-deephaven-deployment-name-pre-release-hook.
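Once the job completes, you can confirm that the Deephaven pods are running:

kubectl get pods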

Create a DNS entry for the application

You need a DNS entry for the hostname referenced by the TLS certificate, using the external IP address of the Envoy service. How this is done varies with your Kubernetes provider and/or infrastructure. This example uses a Google Cloud Platform environment.

MY_ENVOY_IP=$(kubectl get svc envoy -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
gcloud dns record-sets create my-deephaven-deployment-name.mydomain.com --ttl=300 --type=A --zone=myzone --rrdatas=${MY_ENVOY_IP}
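Once the record has propagated, you can verify that the hostname resolves to the Envoy address; the hostname below matches the example above:

dig +short my-deephaven-deployment-name.mydomain.com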

Set a password for the admin user

Use the Deephaven deployment's management shell pod to set a password for the user iris. In this example, the password is adminpw1, though you are encouraged to choose your own. Open a shell in the pod, then run the dhconfig command inside it:

kubectl exec -it deploy/management-shell -- bash
/usr/illumon/latest/bin/dhconfig acl users set-password --name iris --hashed-password $(openssl passwd -apr1 adminpw1)
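If you prefer not to open an interactive shell, the same dhconfig command can typically be passed directly to kubectl exec. Note that the command substitution for openssl then runs on your local machine, so openssl must be installed there:

kubectl exec deploy/management-shell -- /usr/illumon/latest/bin/dhconfig acl users set-password --name iris --hashed-password $(openssl passwd -apr1 adminpw1)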

Set up resource monitoring and notifications

Each enterprise has a different way of setting up Kubernetes monitoring, metrics, and notifications using tools like Prometheus/Grafana, Datadog, or your cloud provider's offering such as Cloud Monitoring in GKE. In a production Deephaven system, it is very important to set up notifications around persistent volume utilization so that action can be taken (e.g., expand storage, remove data) before a volume fills up, potentially stopping services and/or preventing data ingestion.

Caution

It is recommended that a warning-level alert be configured at 75% utilization of persistent volumes and a critical-level alert at 90% utilization.

Log in

You can now access the application at a URL similar to https://yourhost.domain.com:8000/iriside, using the hostname that matches your webserver TLS certificate.

Note

The URL example above uses port 8000. Use port 8000 for servers with Envoy and port 8123 for servers without Envoy.

EKS Specific Information

Amazon Load Balancers

The default Amazon EKS setup uses the in-tree Kubernetes load balancer controller, which provisions Classic Load Balancers. The default Classic Load Balancer settings terminate connections after 60 seconds of inactivity. This results in Deephaven workers being killed when their controlling connection is closed. It is possible to manually configure the timeout, but Deephaven recommends installing the AWS Load Balancer Controller add-on, which uses a Network Load Balancer. Additionally, the AWS Load Balancer Controller supports annotations for configuring the service. The complete set of annotations that are suitable for your network is beyond the scope of this document (e.g., subnet and IP allocation), but the following annotations (specified in your Deephaven values.yaml file) instruct the controller to create a suitable Network Load Balancer:

envoy:
  serviceAnnotations:
    service.beta.kubernetes.io/aws-load-balancer-type: 'nlb'
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: 'instance'

Manually configuring a Classic Load Balancer Timeout

When using a Classic Load Balancer, a manual work-around is to identify the AWS load balancer that the Kubernetes system allocated and increase the connection timeout using the AWS command line tool.

To identify the load balancer, first run kubectl to find the external name of the load balancer:

$ kubectl get svc envoy
NAME    TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                         AGE
envoy   LoadBalancer   172.20.209.132   a89229d6c7c3a43fbba5728fb8216c64-882093713.us-east-1.elb.amazonaws.com   8000:31111/TCP,8001:30504/TCP   7d21h

In this example, the load balancer is identified by a89229d6c7c3a43fbba5728fb8216c64. The load balancer attributes can be queried with:

$ aws elb describe-load-balancer-attributes --load-balancer-name a89229d6c7c3a43fbba5728fb8216c64
{
    "LoadBalancerAttributes": {
        "CrossZoneLoadBalancing": {
            "Enabled": false
        },
        "AccessLog": {
            "Enabled": false
        },
        "ConnectionDraining": {
            "Enabled": false,
            "Timeout": 300
        },
        "ConnectionSettings": {
            "IdleTimeout": 60
        },
        "AdditionalAttributes": [
            {
                "Key": "elb.http.desyncmitigationmode",
                "Value": "defensive"
            }
        ]
    }
}

To adjust the connection idle setting to 900 seconds, run:

$ aws elb modify-load-balancer-attributes --load-balancer-name a89229d6c7c3a43fbba5728fb8216c64  --load-balancer-attributes="{\"ConnectionSettings\":{\"IdleTimeout\":900}}"
{
    "LoadBalancerName": "a89229d6c7c3a43fbba5728fb8216c64",
    "LoadBalancerAttributes": {
        "ConnectionSettings": {
            "IdleTimeout": 900
        }
    }
}