Kubernetes installation with Helm

To install Deephaven on Kubernetes, we use a Helm chart. Deephaven has tested this Helm chart on GKE, AKS and EKS (see Amazon Load Balancers). The Deephaven Helm installation has three prerequisites:

  • An etcd cluster (which can be installed using a Bitnami Helm chart).
  • An NFS server for various shared Deephaven volumes.
  • A TLS certificate for the Envoy front proxy that serves web traffic.

And one optional prerequisite:

  • Install cert-manager in the cluster to handle issuing certificates to allow the Deephaven services to communicate using TLS.

Each Deephaven cluster should be in its own Kubernetes namespace. The etcd installation must be in that same namespace so that we can read the root passphrase from the secret. The NFS server need not be in the same namespace, or even inside of the Kubernetes cluster, but it needs to be accessible from all the pods and have a defined set of exports.

Although this chart depends on NFS, it can be adapted to any persistent volume that provides an accessMode of ReadWriteMany.
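
If you already have a shared filesystem, a hand-created PersistentVolume along these lines could stand in for the chart-provisioned NFS volumes (a sketch only; the volume name, capacity, server, and path are placeholders for your environment):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: deephaven-shared-pv        # hypothetical name
spec:
  capacity:
    storage: 100Gi                 # size as appropriate for your data
  accessModes:
    - ReadWriteMany                # required, per the note above
  nfs:
    server: deephaven-nfs.<your-namespace>.svc.cluster.local
    path: /exports/dhsystem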

Note

We have chosen not to integrate the etcd installation with the installation of Deephaven at this time. By decoupling the charts, Deephaven can be installed and uninstalled while retaining the configuration.

General directory layout

When you extract the Helm archive provided to you (e.g., tar xzf ./deephaven-helm-1.20231218.432.tar.gz), the root extracted directory will have the following:

  • ./docker contains subdirectories for Docker container images, each with a Dockerfile and container image support scripts.
  • ./helm contains the Helm chart and related supporting files.
  • ./dh_helm is a wrapper script that automates the steps required to install or uninstall a Deephaven cluster in Kubernetes (see this section).
  • ./README.md provides basic identification of these files and information about where to find this detailed documentation.

Within the helm directory there are:

  • deephaven contains the Helm chart.

  • deephaven/templates contains subdirectories with the Helm chart templates that define the Kubernetes objects (pods, services, etc) for the Deephaven installation.

  • deephaven/values.yaml contains the default values for the Helm chart.

  • setupTools contains useful scripts and YAML for manipulating the system outside of the chart. Each file is described below; a usage example follows the list.

    • nfs-server.yaml: Creates an NFS server for use with your cluster; you can adjust volume sizes as appropriate.
    • nfs-service.yaml: Creates a service for the NFS server; you will need to use its name in your cluster's YAML.
    • etcdValues.yaml: A suitable Helm values file for an etcd installation.
    • scaleAll.sh: Scales deployments up and down. The argument is the number of replicas (0 to shut down, 1 to start).
    • restartAll.sh: Restarts all deployments by scaling them down to 0, then up to 1.
    • delete-preinstall.sh: Deletes preinstall hook resources (hooks are not automatically deleted when a release is uninstalled).
    • ca-bootstrap-issuer.yaml: Used only if cert-manager is installed in the cluster. Creates a ClusterIssuer and a self-signed root CA certificate, plus an Issuer in the namespace that will issue new certificates with the root CA in the certificate chain. This file may be copied and used as a template for defining an issuer that is appropriate for your cluster.
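
For example, the scale and restart helpers act on your current kubectl context and namespace (an illustrative sketch; run them from the helm directory, or adjust the paths to wherever you extracted the archive):

# Scale every Deephaven deployment down to zero replicas, then back up to one
./setupTools/scaleAll.sh 0
./setupTools/scaleAll.sh 1

# Or restart everything in a single step
./setupTools/restartAll.sh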

Set up your own cluster using the dh_helm tool

The dh_helm tool automates installing, upgrading, uninstalling, and reinstalling a Deephaven Kubernetes cluster. This utility allows Deephaven Kubernetes installs to be a one-line command rather than a series of manual steps. By default, for installations and upgrades, the tool checks whether the needed product and Core+ files are already in the expected paths; if they are not, it looks for them in the current directory and copies them to the required locations.

Typical installation use is to copy the Deephaven product file (deephaven-enterprise-<jdk version>-<version>.tar.gz) and the Deephaven Core+ file (deephaven-coreplus-<version>.tar) to the directory where dh_helm is located, and run dh_helm from there.

  • For installations, the tool manages such steps as deploying etcd, deploying and configuring an NFS pod, creating the TLS secret, and other steps beyond the helm install step itself. Helm installation of Deephaven requires Docker images for the various container types used in the cluster. The buildAllForK8s.sh and pushAll.sh scripts, described here, can be used to do this manually, or --build-push-images can be passed to dh_helm for the script to do it automatically.

Note

Docker or Docker Desktop is needed to build images. Additional components may be needed to build images on a Mac using Apple Silicon.

  • For uninstallation, the tool offers options to completely remove all Deephaven-related artifacts (PVs, secrets, certificates, etc.) as a single step. (Note that this does not remove Docker container images from the local or remote registries.)

  • Reinstallation is a complete uninstall followed by a fresh installation as a single command execution. This ensures the new installation is totally fresh, with no reused data or configuration.

  • For upgrades, the tool can automate steps such as running upgrade scripts or deleting the management-shell pod so it can be recreated with new standard settings.

If the --values-yaml argument is provided, the specified values file is passed to Helm. If --values-yaml is not specified, dh_helm automatically creates the values file using helm/setupTools/my-values.tmpl as the basis for the generated file. In this case, if other customizations are needed that dh_helm does not support, make those changes in my-values.tmpl before running dh_helm. An error will be thrown if --values-yaml is used along with explicit arguments for values it can contain (--image-tag, --nfs-server, --pv-prefix, --etcd-release-name, --cluster-url, --container-registry, --storage-class, and whether to use cert-manager).

The dh_helm tool has a fairly large set of argument options. These are also detailed by running dh_helm with no arguments, or with --help as an argument.

Minimum required arguments (for uninstall) are:

  • --namespace: Kubernetes namespace to install into / uninstall from.
  • --name: Release name for the Helm installation of Deephaven (existing names can be found with helm list).
  • --etcd-release-name: Release name for the Helm installation of etcd (existing names can be found with helm list).
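
For example, the smallest possible uninstall, which only removes the Deephaven Helm release, might look like this (a sketch; substitute your own namespace and release names):

./dh_helm \
  --uninstall \
  --namespace <your-namespace> \
  --name <dh-release-name> \
  --etcd-release-name <etcd-release-name>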

Installation, reinstallation, or upgrade also require:

  • --dh-version: Deephaven product version, such as 1.20231218.160.
  • --jdk-version: Java JDK version to use. Must match the Java version of the Deephaven product and Core+ packages, and must be one of jdk11 or jdk17 (case sensitive).
  • --coreplus-version: Core+ tar file version, such as 0.28.1.
  • --container-registry: The registry in which to find images, and where built images will be pushed.
  • --tls-cert: The PEM-format X.509 certificate to use for the Envoy endpoint of the cluster. For private PKI certificates, this should include the full chain.
  • --tls-key: The key file that corresponds to --tls-cert. There must be no password on this file.

Installation values YAML properties can be provided either with the arguments below or by providing a custom values YAML file:

  • --values-yaml: Path and name of a customized YAML values file. This allows more flexibility than the discrete dh_helm arguments.
  • --cluster-url: FQDN by which the cluster should be reachable (no https://, etc., just the full name). This FQDN, or a wildcard matching it, must be in the SAN block of the certificate. Use this or --values-yaml.
  • --pv-prefix: Prefix for names of persistent volumes created by the Helm install.
  • One of:
    • --create-nfs-pod: Create a new NFS server pod in the cluster.
    • --nfs-server: The name of an existing NFS server to use.

Optional arguments:

  • --etcd-release-name: Release name for the Helm installation of etcd. This is optional for installations where the default name (my-etcd-release) will be used and an alternative is not provided, but it is required (either here or from --values-yaml) for uninstallation.
  • --dry-run: Echoes the commands that would have been run without actually running anything.
  • --verbose: Does not suppress output from commands that are run; no effect when --dry-run is specified. Only one of --quiet or --verbose can be used.
  • --quiet: Suppresses all messages not generated in dh_helm itself (e.g., no errors or warnings from called commands); no effect when --dry-run is specified. Only one of --quiet or --verbose can be used.
  • --image-tag: Tag to apply to container images when building and pushing them, or to use for reading them if the images have already been built. If not provided, the value of --dh-version is used.
  • --extra-requirements: A requirements.txt file of extra Python packages to install when building worker images.
  • --build-push-images: By default, the script does not build and push the needed container images; instead, it checks whether they already exist in the container registry. This flag has the script build and push the images, which is quick when the images already exist and caching is enabled (which it is by default).
  • --nocache: Disables Docker caching when building images. By default, caching is enabled.
  • --skip-image-check: Skips the container registry image checks that normally occur for install and reinstall when --build-push-images is not specified.
  • --create-namespace: Normally, the script checks for the specified namespace and fails if it doesn't exist. With this option, the script attempts to create the namespace if it doesn't exist.
  • --remove: When used for an uninstallation or reinstallation, removes all objects, including the etcd Helm release, PVs, and PVCs. Required when running --uninstall.
  • --force: When used for an uninstallation or reinstallation, bypasses confirmation of the uninstallation and, with --remove, of the deletion of PVs and PVCs.
  • --delete-management-shell: When used with --upgrade, deletes the management shell pod so it can be recreated with possibly changed properties from the new chart. This is not needed with versions after 1.20231218.053, as the pod has been replaced with a deployment.
  • --delete-nfs-server: When used with --remove, deletes the NFS server deployment.
  • --storage-class: Storage class name for local RWO volumes. If deploying in a non-GKE environment, set this to a value appropriate for your cluster provider (e.g., gp2 for EKS). The default, standard-rwo, is suitable for GKE.
  • --no-cert-manager: Configures an installation that does not use the Kubernetes cluster issuer for TLS between cluster services.

No more than one of the following can be specified:

  • --install: The default operation, but can be explicitly stated.
  • --uninstall: By default will helm delete the Deephaven release; with --remove it will additionally uninstall etcd, NFS (if it's a pod), and delete all PVs, PVCs, and jobs.
  • --reinstall: Effectively runs --uninstall, and then installs the specified version. Requires --remove, as reinstall cannot reuse existing configuration.
  • --upgrade: May run upgrade scripts, if needed, and optionally delete the management pod. Passes through to helm upgrade for the cluster, which maintains existing data and configuration.

Example dh_helm command lines

Installation that creates a pod-based NFS server and builds and pushes the needed images:

./dh_helm \
  --install \
  --namespace test2 \
  --name test-k8s-324-2 \
  --cluster-url test-k8s-cluster-2.int.illumon.com \
  --tls-cert cus-tls/tls.crt \
  --tls-key cus-tls/tls.key \
  --container-registry us.gcr.io/eng/simple-containerization/test \
  --dh-version 1.20231218.432 \
  --jdk-version jdk17 \
  --coreplus-version 0.33.6 \
  --pv-prefix test-k8s-2 \
  --verbose \
  --create-nfs-server \
  --build-push-images \
  --create-namespace

Full uninstall with no confirmation prompts:

Warning

This removes all Deephaven Kubernetes cluster components from the test2 namespace. No configuration or data is retained.

./dh_helm \
  --uninstall \
  --namespace test2 \
  --name test-k8s-324 \
  --etcd-release-name test-etc \
  --force \
  --remove \
  --verbose

An upgrade that also removes the management shell Pod as part of the upgrade process, so it can be recreated with new properties. Some patch versions of Deephaven require that the management shell be deleted prior to the upgrade, because some immutable properties of the Pod have been changed. Versions 1.20231218.053 and later use a management shell Deployment instead of a Pod. For these more recent versions the --delete-management-shell argument is no longer needed.

./dh_helm \
  --upgrade \
  --namespace test2 \
  --name test-k8s-324-2 \
  --tls-cert cus-tls/tls.crt \
  --tls-key cus-tls/tls.key \
  --container-registry us.gcr.io/illumon-eng-170715/simple-containerization/test \
  --cluster-url test-k8s-cluster-2.int.illumon.com \
  --image-tag 1.20231218.162 \
  --dh-version 1.20231218.162 \
  --jdk-version jdk17 \
  --coreplus-version 0.32.1 \
  --pv-prefix test-k8s-2 \
  --verbose \
  --delete-management-shell

Set up your own cluster - manual process

  1. Install kubectl and helm. The machine that you will be installing from must have both of these utilities installed.

    • Verify that kubectl is installed (and that you have connectivity to your cluster) with kubectl get ns.
    • Verify that Helm is installed with helm list.
  2. Create a namespace.

    • You must create a namespace for your Deephaven installation using kubectl create namespace <your-namespace>.
    • Set your kubectl config to use that namespace with kubectl config set-context --current --namespace=<your-namespace>.
  3. Build your containers. This should be done on a host with an architecture that matches the architecture of the host on which the container should be run. Building a Docker container on one platform architecture that targets another architecture is possible but outside the scope of these instructions. The docker directory contains the Dockerfiles, and a script to build and push them.

    • Place a Deephaven installation .tar.gz file into the deephaven_base subdirectory. The Deephaven base image will be built using this version of the software.

    • Place a Core+ worker tar into the db_query_worker_coreplus subdirectory. This will normally be the Core+ tar file that matches the Deephaven version used for the above installation tar.gz.

    • Run ./buildAllForK8s.sh to build the images. The script arguments are described here:

      • --version: The Deephaven version (e.g., 1.20231218.432) used to select the tar file.
      • --jdk11|jdk17: Specifies which JDK version should be installed.
      • --container-path: Path to Dockerfile directories. Defaults to current working directory.
      • --no-cache: Disable caching. Defaults to caching.
      • --coreplus-tar: Full name of a specific Core+ tar.gz file used to build a Core+ worker image.
    • Run ./pushAll.sh <REPOSITORY> <TAG> to push the images to your container registry. The container registry must be accessible from the Kubernetes cluster.

      • REPOSITORY: The repository to push the images to. If a .reporoot file exists in the same directory as the script, the repository used is $REPOROOT/REPOSITORY.
      • TAG: The tag of the pushed images (e.g. latest or 1.20231218.432).

      For AKS, you will have to explicitly give your cluster access to your container registry. To do this, run az aks update -n <cluster-name> -g <resource-group> --attach-acr <container-registry-name>. More details can be found in the AKS documentation.
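
      For example, building and pushing with the versions used elsewhere in this document might look like the following (illustrative only; substitute your own Deephaven version, Core+ tar file name, repository, and tag):

      ./buildAllForK8s.sh --version 1.20231218.432 --jdk17 \
        --coreplus-tar deephaven-coreplus-0.33.6-1.20231218.432-jdk17.tgz
      ./pushAll.sh us.gcr.io/eng/simple-containerization/test 1.20231218.432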

  4. Create your NFS server.

    • Run the following commands to set up an nfs-server deployment and service. You may want to edit these files to rename the deployment, service, and persistent volume claim, and, most importantly, to change the storage class, which otherwise uses your platform's default. If you have not edited them, you will get the default configuration: a service called deephaven-nfs with a fully qualified domain name of deephaven-nfs.<namespace>.svc.cluster.local. This FQDN, or possibly the actual IP address, will be needed later when setting up your cluster's my-values.yaml (see the step below). Run these commands to set up the new NFS deployment:
    
    # NOTE: Consider changing the persistent volume claim storageClass in the below yaml from 'default'
    # to a storage class that meets your performance requirements. If dynamic provisioning for the
    # storage class is not configured, you may need to pre-create a persistent volume beforehand.
    kubectl apply -f setupTools/nfs-server.yaml
    kubectl apply -f setupTools/nfs-service.yaml
    

    An existing NFS server can be used if you have one; it will need certain directories exported. See setupTools/setup-nfs-minimal.sh for what is required.

  5. Set up your NFS server.

    • Run kubectl get pods to get the name of your NFS server Pod and confirm that it is running.

    • Copy the setup script to the NFS pod by running this command, using your specific NFS pod name:

      # Run 'kubectl get pods' to find your specific nfs-server pod name and use that as the copy target host in this command.
      kubectl cp setupTools/setup-nfs-minimal.sh <nfs-server-name>:/setup-nfs-minimal.sh
      
    • Run this command to execute that script, once again substituting the name of your NFS Pod:

      kubectl exec <nfs-server-name> -- bash -c "export SETUP_NFS_EXPORTS=y && chmod 755 /setup-nfs-minimal.sh && /setup-nfs-minimal.sh"
      
  6. Install the bitnami etcd chart.

    • The following command installs the etcd Helm chart with a Helm release name you must choose (e.g., etcd-deephaven). To customize the etcd installation, copy and update setupTools/etcdValues.yaml to suit your particular server.

      helm repo add bitnami https://charts.bitnami.com/bitnami
      helm install <release-name> bitnami/etcd --values setupTools/etcdValues.yaml
      
    • If you uninstall etcd, you must remove the persistent volumes and persistent volume claims before reinstalling. Alternatively, you can use a different etcd release name and update your my-values.yaml for Deephaven accordingly.

      kubectl delete pv,pvc -l app.kubernetes.io/instance=my-etcd-release
      
    • It may take a minute or two for etcd to become ready, particularly if you have replicas that need to synchronize.

      $ kubectl get pods -w -l app.kubernetes.io/name=etcd
      NAME                         READY   STATUS    RESTARTS   AGE
      my-etcd-release-0            1/1     Running   0          3h5m
      my-etcd-release-1            1/1     Running   0          3h5m
      my-etcd-release-2            1/1     Running   0          3h5m
      

      You should wait until all replicas in the stateful set report 1/1 in the READY column before proceeding with the Deephaven installation. You can also verify the etcd installation using the instructions from the Helm notes for that release.
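
      For example, a quick health check against the first etcd member might look like this (a sketch that assumes the Bitnami chart's default RBAC configuration, where the root password is available inside the pod as ETCD_ROOT_PASSWORD):

      kubectl exec my-etcd-release-0 -- bash -c 'etcdctl --user root:$ETCD_ROOT_PASSWORD endpoint health'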

  7. Install cert-manager (optional).

    If you wish to run Deephaven services using TLS within the cluster, then you will need to install cert-manager. To see if cert-manager is installed on your cluster already, run kubectl get clusterissuer. If you see a message saying error: the server doesn't have a resource type "clusterissuer", then it is not installed.

    There are several ways to install cert-manager, and full instructions are provided at the cert-manager installation page. The most straightforward way is to use the default static install listed there. A Helm chart is also provided that may be used to install cert-manager.
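
    For reference, the static install is a single kubectl apply of the manifest published with each cert-manager release (shown here with a version that existed at the time of writing; check the cert-manager documentation for the current one):

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml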

    The setupTools/ca-bootstrap-issuer.yaml file will create a ClusterIssuer for the entire Kubernetes cluster that creates a self-signed root CA certificate, and an Issuer in your target Kubernetes namespace that will issue certificates that have the root CA in the certificate chain. You may create a new yaml file defining an Issuer configuration that is not self-signed if there is infrastructure to support it in your organization. For example, you may define an issuer that is configured to use HashiCorp Vault, or an external provider. Details for these configurations may be found in the cert-manager issuer configuration docs.

    If a ClusterIssuer was already present in your cluster, you can copy the second and third sections from ca-bootstrap-issuer.yaml (Certificate and Issuer definitions) to a new file, and update them with the names of your ClusterIssuer and namespace. Apply the new file using kubectl apply -f.

    To create the default self-signed cluster issuer, first edit setupTools/ca-bootstrap-issuer.yaml and replace occurrences of <your-namespace> with your target Kubernetes namespace, then run the following command:

    # First edit ca-bootstrap-issuer.yaml and replace occurrences of <your-namespace> with your Kubernetes namespace
    kubectl apply -f setupTools/ca-bootstrap-issuer.yaml
    

    If not using cert-manager, several self-signed certificates (without a common root CA) for Deephaven services will be generated and kept in a keystore for use by the system.

    Note

    You must set certmgr.enabled to true in your my-values.yaml file for the cert-manager installation to be used.

  8. Create a TLS secret for the Kubernetes cluster. The secret must be named deephaven-tls and must be in the same namespace as your Deephaven installation. You must provide the tls.crt and tls.key files for the Web server certificate that meets the requirements specified in the Install and Upgrade Guide.

    kubectl create secret tls deephaven-tls --cert=tls.crt --key=tls.key
    
  9. Create your my-values.yaml values override file.

    • This file will contain values that override the defaults defined in the Deephaven chart's values.yaml file. It is a new file, referred to in this document as my-values.yaml, but the name is not significant and you may name it anything you like. Note that no changes should be made to the chart's existing values.yaml file in the deephaven directory. This new file is used by Helm to set properties required for, and specific to, your installation. The first five values below are required. Additionally, some default settings are only applicable in a Google GKE environment and may need to be overridden if deploying with another provider:
  • nfs.pvPrefix (Required): Prefix for persistent volume names stored on the NFS server. This disambiguates releases, as PVs are global and not namespaced.
  • nfs.server (Required): Hostname or address of your NFS server. Note that an IP address may be required in EKS and AKS; to find it, run kubectl get svc nfs-server.
  • etcd.release (Required): This will be the name you choose for the etcd release with '-etcd' appended to it. Run kubectl get secret my-etcd-release and confirm there is a secret with this name.
  • envoyFrontProxyUrl (Required): User-facing URL of the Envoy front proxy.
  • image.tag (Required): This would typically be redefined to the specific Deephaven version you are installing, or perhaps 'latest'.
  • global.storageClass (Recommended): Use this value as the default storage class, e.g., standard-rwo for GKE, gp2 for EKS, or managed-csi for AKS.
  • dis.intradaySize (Recommended): Defaults to 10G. Set to an appropriate value for your target environment, or consult with Deephaven to evaluate your use case and make a recommendation.
  • dis.intradayUserSize (Recommended): Defaults to 10G. Set to an appropriate value for your target environment, or consult with Deephaven to evaluate your use case and make a recommendation.
  • dis.storageClass (Not required): May be set to a specific storage class if desired; otherwise it takes the global.storageClass value.
  • binlogs.storageClass (Not required): May be set to a specific storage class if desired; otherwise it takes the global.storageClass value.
  • management.storageClass (Not required): May be set to a specific storage class if desired; otherwise it takes the global.storageClass value.
  • envoy.externalDns (Not required): If your cluster is configured with an ExternalDNS provider, set to true to create a DNS record that points the envoyFrontProxyUrl to the Envoy Kubernetes service.
  • envoy.enableAdminPort (Not required): If true, enables the Envoy admin interface on port 8001.
  • image.repositoryUrl (Not required): The image repository URL and path where the Deephaven container images are hosted. This has a default value but would typically be redefined to something else.
  • certmgr.enabled (Required with cert-manager): If using cert-manager, this must be set to true. Otherwise it may be omitted, as it defaults to false.

An example my-values.yaml is as follows:

image:
  repositoryUrl: us-central1-docker.pkg.dev/project/path # The repository and path where container images are stored.
  tag: '1.20231218.432'
nfs:
  pvPrefix: dhtest1
  server: 'deephaven-nfs.<your-namespace>.svc.cluster.local' # Some non-GKE K8S providers will require an IP address here
  root: '/exports/dhsystem/'
etcd:
  release: my-etcd-release
global:
  # The name of the default storage class to use as dedicated local persistent storage for pods.
  # This will vary between different Kubernetes providers; standard-rwo is applicable for Google GKE.
  storageClass: 'standard-rwo'

envoyFrontProxyUrl: 'deephaven.kubernetes.internal.company.com'
envoy:
  externalDns: true

# Set to true if cert-manager has been installed and will be used to provide secure intra-service communications.
certmgr:
  enabled: true

# If deploying in a non-GKE environment set these to values appropriate for your cluster provider, e.g. 'gp2'
# for EKS. These are not required, and will take the global.storageClass value if not present.
dis:
  storageClassName: 'standard-rwo'
binlogs:
  storageClassName: 'standard-rwo'
management:
  storageClassName: 'standard-rwo'

  10. Install the Deephaven helm chart. You are now ready to install.

    • Use the following helm install command, substituting the names of your release and my-values.yaml if they are different.
    helm install deephaven-helm-release-name ./deephaven/ -f my-values.yaml --debug
    

    In this example the Helm release name for Deephaven is deephaven-helm-release-name, but you may select a release name of your own. The chart installation executes the preinstall hook, which configures etcd, routing, and properties, initializes ACLs, and more. --debug will show progress as Helm configures the installation.

  11. Wait for all Pods to come online. After installing the chart, it will take a moment for all Pods to start up and initialize.

    • Use kubectl get pods -w to watch the status of the Pods as they come up. There are dependencies among Deephaven services, and you will see Pods appear in an Init state until each pod's dependent service is available. The configuration service will start first, then the auth service, then the remaining ones. The final Pods you will see are the query-server workers and a merge-server worker for the built-in queries.
  12. Create a password for user iris.

    • Set a password for the iris user from the management shell Pod.
    kubectl exec deploy/management-shell -- /usr/illumon/latest/bin/iris iris_db_user_mod -set_password -user iris -password  $(echo -n <your-password> | base64)
    
  13. Set or update a DNS record.

    If your cluster is already configured with an external DNS provider, then the external DNS controller will provide a correctly configured hostname. If you have not configured external DNS, then the following commands will be helpful to find the IP address needed to create a DNS entry.

    # Find the IP address of the Envoy service IP address:
    kubectl get --namespace=<yournamespace> svc envoy -o 'jsonpath={.status.loadBalancer.ingress[0].ip}'
    
    # Find the hostname of the Envoy service, which may be needed if your Kubernetes cluster ingress uses a hostname:
    kubectl get --namespace=<yournamespace> svc envoy -o 'jsonpath={.status.loadBalancer.ingress[0].hostname}'
    

    Links to setup information for common providers are provided here for reference. [ GKE | EKS | AKS ]

  14. Log in and start using Deephaven.

    You should now be able to navigate to the host defined as the envoyFrontProxyUrl in your values override file; e.g., https://deephaven.kubernetes.internal.company.com:8000/iriside/

Upgrading a Kubernetes cluster with Helm

The upgrade process is similar to the installation process, in that you will create an installation directory structure based on the Helm tar.gz, and you will build images and configure the chart with a values.yaml file. The main differences are that the prerequisites will already be in place, and helm upgrade is used instead of helm install. Note that the dh_helm tool can also be used to upgrade a Deephaven Kubernetes cluster.

  1. In a new directory for the new version, extract the new version's Helm archive; e.g.: tar xzf ./deephaven-helm-1.20231218.432.tar.gz.

  2. Copy the Deephaven product archive into the deephaven_base container directory; e.g.: cp ~/Downloads/deephaven-enterprise-jdk*-1.20231218.432.tar.gz deephaven-helm-1.20231218.432/docker/deephaven_base/

  3. Copy the Deephaven Core+ product archive into the db_query_worker_coreplus container directory; e.g.: cp ~/Downloads/deephaven-coreplus-0.33.6-1.20231218.432-jdk17.tgz deephaven-helm-1.20231218.432/docker/db_query_worker_coreplus/

  4. Copy the previously used values.yaml file to the helm directory; e.g.: cp ~/Downloads/values.yaml deephaven-helm-1.20231218.432/helm/

  5. From the docker directory, build and push the new container images:

  • Run ./buildAllForK8s.sh to build the images.
    • --version: The Deephaven version (e.g., 1.20231218.432) used to select the tar file.
    • --jdk11|jdk17: Specifies which JDK version should be installed.
    • --container-path: Path to Dockerfile directories. Defaults to current working directory.
    • --[no-]cache: Disable or enable Docker caching. Defaults to caching.
    • --coreplus-tar: Full name of a specific Core+ tar.gz file used to build a Core+ worker image.
  • Run ./pushAll.sh <REPOSITORY> <TAG> to push the images to your container registry. The container registry must be accessible from the Kubernetes cluster.

    • REPOSITORY: The repository to push the images to. If a .reporoot file exists in the same directory as the script, the repository used is $REPOROOT/REPOSITORY.
    • TAG: The tag of the pushed images (e.g. latest or 1.20231218.432).

For AKS, you will have to explicitly give your cluster access to your container registry. To do this, run az aks update -n <cluster-name> -g <resource-group> --attach-acr <container-registry-name>. More details can be found in the AKS documentation.

Warning

Using latest for the image tag, or not updating the image tag, will result in pods not restarting automatically after an upgrade and continuing to run with the older versions of the images, because the system uses the image tag value to detect whether it is already running the correct versions of images.

  6. Update the values.yaml file in the helm directory. At a minimum, you will likely need to update the tag value for the newly built container images (unless they are the new "latest"). If new functionality was added in the new release of Deephaven being installed, you may also need to add values entries to configure the new features.

Ensure that the values.yaml includes a definition for the standard storage class. This became a requirement in build 1.20230511.248. For example:

global:
  storageClass: 'standard-rwo'
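
For example, pointing the chart at the newly pushed images is usually just a tag bump in the same values file (a sketch; use whatever tag you pushed in the previous step):

image:
  tag: '1.20231218.432'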

  7. If upgrading from a build prior to 1.20230511.248, delete the management-shell pod, since it will need to be recreated by the upgrade process:

    kubectl delete pod management-shell --grace-period 1

    Refer to the Version Log for more details on this change.

  8. From the helm directory, upgrade the Helm chart; e.g.:

helm upgrade deephaven-helm-release-name ./deephaven/ -f my-values.yaml --debug

Amazon Load Balancers

The default Amazon EKS setup uses the in-tree Kubernetes load balancer controller, which provisions Classic Load Balancers. The default Classic Load Balancer settings terminate connections after 60 seconds of inactivity, which results in Deephaven workers being killed when their controlling connection is closed. It is possible to configure the timeout manually, but Deephaven recommends installing the AWS Load Balancer Controller add-on, which uses a Network Load Balancer. Additionally, the AWS Load Balancer Controller supports annotations for configuring the service. The complete set of annotations suitable for your network (e.g., subnet and IP allocation) is beyond the scope of this document, but the following annotations (specified in your Deephaven values.yaml file) instruct the controller to create a suitable Network Load Balancer:

envoy:
  serviceAnnotations:
    service.beta.kubernetes.io/aws-load-balancer-type: 'nlb'
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: 'instance'

Manually configuring a Classic Load Balancer Timeout

When using a classic load balancer, a manual work-around is to identify the AWS load balancer that the Kubernetes system allocated and increase the connection timeout using the AWS command line tool.

To identify the load balancer, first run kubectl to find the external name of the load balancer.

$ kubectl get svc envoy
NAME    TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                         AGE
envoy   LoadBalancer   172.20.209.132   a89229d6c7c3a43fbba5728fb8216c64-882093713.us-east-1.elb.amazonaws.com   8000:31111/TCP,8001:30504/TCP   7d21h

In this example, the load balancer is identified by a89229d6c7c3a43fbba5728fb8216c64. The load balancer attributes can be queried with:

$ aws elb describe-load-balancer-attributes --load-balancer-name a89229d6c7c3a43fbba5728fb8216c64
{
    "LoadBalancerAttributes": {
        "CrossZoneLoadBalancing": {
            "Enabled": false
        },
        "AccessLog": {
            "Enabled": false
        },
        "ConnectionDraining": {
            "Enabled": false,
            "Timeout": 300
        },
        "ConnectionSettings": {
            "IdleTimeout": 60
        },
        "AdditionalAttributes": [
            {
                "Key": "elb.http.desyncmitigationmode",
                "Value": "defensive"
            }
        ]
    }
}

To adjust the connection idle setting to 900 seconds, next run:

$ aws elb modify-load-balancer-attributes --load-balancer-name a89229d6c7c3a43fbba5728fb8216c64  --load-balancer-attributes="{\"ConnectionSettings\":{\"IdleTimeout\":900}}"
{
    "LoadBalancerName": "a89229d6c7c3a43fbba5728fb8216c64",
    "LoadBalancerAttributes": {
        "ConnectionSettings": {
            "IdleTimeout": 900
        }
    }
}