Kubernetes installation with Helm
To install Deephaven on Kubernetes, we use a Helm chart. Deephaven has tested this Helm chart on GKE, AKS and EKS (see Amazon Load Balancers). The Deephaven Helm installation has three prerequisites:
- An etcd cluster (which can be installed using a Bitnami Helm chart).
- An NFS server for various shared Deephaven volumes.
- A TLS certificate for the Envoy front proxy that serves web traffic.
And one optional prerequisite:
- Install cert-manager in the cluster to handle issuing certificates to allow the Deephaven services to communicate using TLS.
Each Deephaven cluster should be in its own Kubernetes namespace. The etcd installation must be in that same namespace so that we can read the root passphrase from the secret. The NFS server need not be in the same namespace, or even inside of the Kubernetes cluster, but it needs to be accessible from all the pods and have a defined set of exports.
Although this chart depends on NFS, it can be adapted to any persistent volume that provides an accessMode of ReadWriteMany.
Note
We have chosen not to integrate the etcd installation with the installation of Deephaven at this time. By decoupling the charts, Deephaven can be installed and uninstalled while retaining the configuration.
General directory layout
When you extract the Helm archive provided to you (e.g., tar xzf ./deephaven-helm-1.20231218.432.tar.gz), the root extracted directory contains the following:

- ./docker contains subdirectories for Docker container images, each with a Dockerfile, and container image support scripts.
- ./helm contains the Helm chart and related supporting files.
- ./dh_helm is a wrapper script that automates the steps required to install or uninstall a Deephaven cluster in Kubernetes (see this section).
- ./README.md provides basic identification of these files and information about where to find this detailed documentation.
Within the helm directory are:

- deephaven contains the Helm chart.
- deephaven/templates contains subdirectories with the Helm chart templates that define the Kubernetes objects (pods, services, etc.) for the Deephaven installation.
- deephaven/values.yaml contains the default values for the Helm chart.
- setupTools contains useful scripts and YAML for manipulating the system outside of the chart. See the table below.

| File | Description |
|---|---|
| nfs-server.yaml | Creates an NFS server for use with your cluster; you can adjust volume sizes as appropriate. |
| nfs-service.yaml | Creates a service for the NFS server; you will need to use the name in your cluster's YAML. |
| etcdValues.yaml | A suitable Helm values file for an etcd installation. |
| scaleAll.sh | Scales deployments up and down. The argument is the number of replicas (0 to shut down, 1 to start). |
| restartAll.sh | Restarts all deployments by scaling them down to 0, then up to 1. |
| delete-preinstall.sh | Deletes preinstall hook resources (hooks are not automatically deleted when a release is uninstalled). |
| ca-bootstrap-issuer.yaml | Used only if cert-manager is installed in the cluster. Creates a ClusterIssuer and a self-signed root CA certificate, and an Issuer in the namespace that will issue new certificates with the root CA in the certificate chain. This file may be copied and used as a template for defining an issuer that is appropriate for your cluster. |
Set up your own cluster using the dh_helm tool
The dh_helm tool automates the process of installing, upgrading, uninstalling, and reinstalling a Deephaven Kubernetes cluster. This utility allows Deephaven Kubernetes installs to be a one-line command rather than a series of manual steps. By default, for installations and upgrades, the tool checks whether the needed product and Core+ files are already in the required paths; if they are not, it checks for them in the current directory and copies them to the required locations.
Typical installation use is to copy the Deephaven product file (deephaven-enterprise-<jdk version>-<version>.tar.gz) and the Deephaven Core+ file (deephaven-coreplus-<version>.tar) to the directory where dh_helm is located, and run dh_helm from there.
- For installations, the tool manages such steps as deploying etcd, deploying and configuring an NFS pod, creating the TLS secret, and other steps beyond the helm install step itself. Helm installation of Deephaven requires Docker images for the various container types used in the cluster. The buildAllForK8s.sh and pushAll.sh scripts, described here, can be used to do this manually, or --build-push-images can be passed to dh_helm for the script to do it automatically.
Note
Docker or Docker Desktop is needed to build images. Additional components may be needed to build images on a Mac using Apple Silicon.
- For uninstallation, the tool offers options to completely remove all Deephaven-related artifacts (PVs, secrets, certificates, etc.) in a single step. (Note that this does not remove Docker container images from the local or remote registries.)
- Reinstallation is a complete uninstall followed by a fresh installation in a single command execution. This ensures the new installation is totally fresh, with no reused data or configuration.
- For upgrades, the tool can automate steps such as running upgrade scripts or deleting the management-shell pod so it can be recreated with new standard settings.
If the --values-yaml argument is provided, then the specified values file is passed to Helm. If --values-yaml is not specified, then dh_helm automatically generates a values file using helm/setupTools/my-values.tmpl as its basis. In this case, if there are other customizations to be added that dh_helm does not support, make those changes in my-values.tmpl before running dh_helm. An error will be thrown if --values-yaml is used along with explicit arguments for values the file can contain (--image-tag, --nfs-server, --pv-prefix, --etcd-release-name, --cluster-url, --container-registry, --storage-class, and whether to use cert-manager).
The dh_helm tool has a fairly large set of argument options. These are also detailed by running dh_helm with no arguments, or with --help as an argument.
Minimum required arguments (for uninstall) are:
| Argument | Description |
|---|---|
--namespace | Kubernetes namespace to install into / uninstall from. |
--name | Release name for the Helm installation of Deephaven (existing names can be found with helm list). |
--etcd-release-name | Release name for the Helm installation of etcd (existing names can be found with helm list). |
Installation, reinstallation, or upgrade also require:
| Argument | Description | Notes |
|---|---|---|
--dh-version | Deephaven product version such as 1.20231218.160. | |
--jdk-version | Java JDK version to use. | Must match the Java version of the Deephaven product and Core+ packages, and be one of jdk11 or jdk17 (case-sensitive). |
--coreplus-version | Core+ tar file version such as 0.28.1. | |
--container-registry | The registry in which to find images, and where built images will be pushed. | |
--tls-cert | The PEM format X509 certificate to use for the Envoy endpoint of the cluster. | Note that, for private PKI certificates, this should include the full chain. |
--tls-key | The key file that corresponds to --tls-cert. | Note that there must be no password on this file. |
Installation values.yaml properties can be provided either with the arguments below, or by providing a custom values YAML file:
| Argument | Description | Notes |
|---|---|---|
--values-yaml | Path and name of a customized YAML values file. | This allows more flexibility than dh_helm's discrete arguments. |
--cluster-url | FQDN by which the cluster should be reachable (no https:// prefix, etc.; just the full name). | Note that this FQDN, or a wildcard matching it, must be in the SAN block of the certificate. Use this or --values-yaml. |
--pv-prefix | Prefix for names of persistent volumes created by the Helm install. | Use this or --values-yaml. |
| Optional arguments | Description |
|---|---|
--etcd-release-name | Release name for the Helm installation of etcd. This is optional for installations where the default name (my-etcd-release) will be used and an alternative is not provided, but it is required (either here, or from --values-yaml) for uninstallation. |
--dry-run | Echoes commands that would have been run without actually running anything. |
--verbose | Does not suppress output from commands run; no effect when --dry-run is specified. Only one of --quiet or --verbose can be used. |
--quiet | Suppresses all messages not generated by dh_helm itself - e.g., errors or warnings from called commands are not shown; no effect when --dry-run is specified. Only one of --quiet or --verbose can be used. |
--image-tag | Tag to apply to container images when building and pushing them, or to use for reading them if images already have been built. If not provided, then the value of --dh-version will be used. |
--extra-requirements | A requirements.txt file of extra Python packages to install when building worker images. |
--build-push-images | By default, the script does not build and push the needed container images, and instead checks whether they already exist in the container registry. This flag has the script build and push the images; this is quick when the images already exist and caching is enabled (which it is by default). |
--nocache | Disable Docker caching when building images. By default, caching is enabled. |
--skip-image-check | Skips the container registry image checks that normally occur for install and reinstall when --build-push-images is not specified. |
--create-namespace | Normally, the script checks for the specified namespace, and fails if it doesn't exist. With this option, the script attempts to create the namespace if it doesn't exist. |
--remove | When used for an installation or reinstallation, removes all objects, including etcd Helm release, PVs, and PVCs. Required when running --uninstall. |
--force | When used for an installation or reinstallation, bypasses confirmation of uninstallation and, with --remove, deletion of PVs and PVCs. |
--delete-management-shell | When used with --upgrade, deletes the management shell pod so it can be created with possibly changed properties from the new chart. This is not needed with versions after 1.20231218.053, as the pod has been replaced with a deployment. |
--delete-nfs-server | When used with --remove, deletes the NFS server deployment. |
--storage-class | Storage class name for local RWO. If deploying in a non-GKE environment, set this to a value appropriate for your cluster provider, e.g., 'gp2' for EKS. The default, 'standard-rwo', is suitable for GKE. |
--no-cert-manager | Installs a cluster that does not use the Kubernetes cluster issuer for TLS between cluster services. |
No more than one of the following can be specified:
| Argument | Description |
|---|---|
--install | The default operation, but can be explicitly stated. |
--uninstall | By default will helm delete the Deephaven release; with --remove it will additionally uninstall etcd, NFS (if it's a pod), and delete all PVs, PVCs, and jobs. |
--reinstall | Effectively runs --uninstall, and then installs the specified version. Requires --remove, as reinstall cannot reuse existing configuration. |
--upgrade | May run upgrade scripts, if needed, and optionally delete the management pod. Passes through to helm upgrade for the cluster, which maintains existing data and configuration. |
Example dh_helm command lines
Installation with creation of pod-based NFS server and build and push of needed images:
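A sketch of a typical installation command line; every value shown (namespace, release name, versions, registry, certificate paths, URL, prefix) is a placeholder to substitute for your environment:

```sh
# Hypothetical values throughout; --build-push-images builds and pushes the container images as part of the install.
./dh_helm --install \
  --namespace deephaven-test \
  --name dh-helm-release \
  --dh-version 1.20231218.432 \
  --jdk-version jdk17 \
  --coreplus-version 0.33.6 \
  --container-registry my-registry.example.com/deephaven \
  --cluster-url deephaven.kubernetes.internal.company.com \
  --pv-prefix dhtest \
  --tls-cert ./tls.crt \
  --tls-key ./tls.key \
  --build-push-images \
  --create-namespace
```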
Full uninstall with no confirmation prompts:
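A sketch, using the hypothetical namespace and release names referenced in the warning below:

```sh
# --remove also deletes etcd, NFS (if it is a pod), PVs, PVCs, and jobs; --force skips confirmation prompts.
./dh_helm --uninstall \
  --namespace test2 \
  --name dh-helm-release \
  --etcd-release-name my-etcd-release \
  --remove \
  --force
```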
Warning
This removes all Deephaven Kubernetes cluster components from the test2 namespace. No configuration or data is retained.
An upgrade that also removes the management shell Pod as part of the upgrade process, so it can be recreated with new properties. Some patch versions of Deephaven require that the management shell be deleted prior to the upgrade, because some immutable properties of the Pod have been changed. Versions 1.20231218.053 and later use a management shell Deployment instead of a Pod. For these more recent versions the --delete-management-shell argument is no longer needed.
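A sketch of such an upgrade, again with placeholder values; --delete-management-shell is only needed when upgrading from versions prior to 1.20231218.053:

```sh
./dh_helm --upgrade \
  --namespace test2 \
  --name dh-helm-release \
  --etcd-release-name my-etcd-release \
  --dh-version 1.20231218.432 \
  --jdk-version jdk17 \
  --coreplus-version 0.33.6 \
  --container-registry my-registry.example.com/deephaven \
  --tls-cert ./tls.crt \
  --tls-key ./tls.key \
  --delete-management-shell
```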
Set up your own cluster - manual process
- Install kubectl and helm. The machine that you will be installing from must have both of these utilities installed.
  - Verify that kubectl is installed (and that you have connectivity to your cluster) with kubectl get ns.
  - Verify that Helm is installed with helm list.
- Create a namespace.
  - You must create a namespace for your Deephaven installation using kubectl create namespace <your-namespace>.
  - Set your kubectl config to use that namespace with kubectl config set-context --current --namespace=<your-namespace>.
- Build your containers. This should be done on a host with an architecture that matches the architecture of the host on which the container will run. Building a Docker container on one platform architecture that targets another architecture is possible, but outside the scope of these instructions. The docker directory contains the Dockerfiles and a script to build and push them. (Example command lines for both scripts are shown after this step.)
  - Place a Deephaven installation .tar.gz file into the deephaven_base subdirectory. The Deephaven base image will be built using this version of the software.
  - Place a Core+ worker tar into the db_query_worker_coreplus subdirectory. This will normally be the Core+ tar file that matches the Deephaven version used for the above installation tar.gz.
  - Run ./buildAllForK8s.sh to build the images. The script arguments are described here:

| Argument | Description |
|---|---|
| --version | The Deephaven version (e.g., 1.20231218.432) used to select the tar file. |
| --jdk11\|jdk17 | Specifies which JDK version should be installed. |
| --container-path | Path to Dockerfile directories. Defaults to the current working directory. |
| --no-cache | Disable caching. Defaults to caching. |
| --coreplus-tar | Full name of a specific Core+ tar.gz file used to build a Core+ worker image. |

  - Run ./pushAll.sh <REPOSITORY> <TAG> to push the images to your container registry. The container registry must be accessible from the Kubernetes cluster.

| Argument | Description |
|---|---|
| REPOSITORY | The repository to push the images to. If a .reporoot file exists in the same directory as the script, the repository used is $REPOROOT/REPOSITORY. |
| TAG | The tag of the pushed images (e.g., latest or 1.20231218.432). |

  For AKS, you will have to explicitly give your cluster access to your container registry. To do this, run az aks update -n <cluster-name> -g <resource-group> --attach-acr <container-registry-name>. More details can be found in the AKS documentation.
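For reference, a sketch of these two scripts in use, with a hypothetical version, Core+ tar name, registry, and tag:

```sh
# Build all images for a hypothetical Deephaven 1.20231218.432 / Core+ 0.33.6 combination using JDK 17.
./buildAllForK8s.sh --version 1.20231218.432 --jdk17 --coreplus-tar deephaven-coreplus-0.33.6-1.20231218.432.tgz

# Push the images to a hypothetical registry, tagged with the Deephaven version.
./pushAll.sh my-registry.example.com/deephaven 1.20231218.432
```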
- Create your NFS server.
  - Run the commands shown below to set up an nfs-server deployment and service. You may want to edit these files to rename the deployment, service, and persistent volume claim names, and, most importantly, the storage type, which will be the default for your platform. If you have not edited them, you will get the default configuration, which is a service called deephaven-nfs with a fully qualified domain name of deephaven-nfs.<namespace>.svc.cluster.local. This FQDN, or possibly the actual IP address, will be needed later when setting up your cluster's my-values.yaml (see the step below).
  - An existing NFS server can be used if you have one. If you want to use an existing NFS server, it will need some directories exported. See setupTools/setup-nfs-minimal.sh for what is required.
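A sketch of the commands to create the NFS deployment and service from the provided YAML, assuming they are run from the helm directory with your namespace already set as the current context:

```sh
# Create the NFS server deployment (edit the file first if you want different names, sizes, or storage type).
kubectl apply -f setupTools/nfs-server.yaml

# Create the service that exposes the NFS server inside the cluster.
kubectl apply -f setupTools/nfs-service.yaml
```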
- Set up your NFS server.
  - Run kubectl get pods to get the name of your NFS server Pod and confirm that it is running.
  - Copy the setup script to the NFS Pod, and then execute it there, substituting the name of your NFS Pod in the commands shown below.
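A sketch of those commands, assuming a hypothetical Pod name and assuming the setup script is setupTools/setup-nfs-minimal.sh; substitute the Pod name reported by kubectl get pods and the script name shipped with your distribution:

```sh
# Copy the setup script into the NFS pod (pod name is a placeholder).
kubectl cp setupTools/setup-nfs-minimal.sh deephaven-nfs-7d9c5b6f4-abcde:/tmp/setup-nfs-minimal.sh

# Execute the script inside the NFS pod to create and export the required directories.
kubectl exec deephaven-nfs-7d9c5b6f4-abcde -- sh /tmp/setup-nfs-minimal.sh
```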
- Install the Bitnami etcd chart.
  - The command shown below installs the etcd Helm chart with a Helm release name you must choose (e.g., etcd-deephaven). To customize the etcd installation, copy and update setupTools/etcdValues.yaml to suit your particular server.
  - If you uninstall etcd, you must remove the persistent volumes and persistent volume claims before reinstalling. Alternatively, you can use a different etcd release name and update your my-values.yaml for Deephaven accordingly.
  - It may take a minute or two for etcd to become ready, particularly if you have replicas that need to synchronize. You should wait until all replicas in the stateful set report 1/1 in the READY column before proceeding with the Deephaven installation. You can also verify the etcd installation using the instructions from the Helm notes for that release.
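A sketch of the etcd installation, assuming the Bitnami chart repository and a hypothetical release name of etcd-deephaven:

```sh
# Add the Bitnami repository if it is not already configured.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Install etcd into your namespace using the values file provided in setupTools.
helm install etcd-deephaven bitnami/etcd -f setupTools/etcdValues.yaml --namespace <your-namespace>
```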
- Install cert-manager (optional).
  If you wish to run Deephaven services using TLS within the cluster, then you will need to install cert-manager. To see if cert-manager is already installed on your cluster, run kubectl get clusterissuer. If you see a message saying error: the server doesn't have a resource type "clusterissuer", then it is not installed. There are several ways to install cert-manager, and full instructions are provided at the cert-manager installation page. The most straightforward way is to use the default static install listed there. They also provide a Helm chart that may be used to install cert-manager. If not using cert-manager, several self-signed certificates (without a common root CA) for Deephaven services will be generated and kept in a keystore for use by the system.
  The setupTools/ca-bootstrap-issuer.yaml file will create a ClusterIssuer for the entire Kubernetes cluster that creates a self-signed root CA certificate, and an Issuer in your target Kubernetes namespace that will issue certificates that have the root CA in the certificate chain. You may create a new YAML file defining an Issuer configuration that is not self-signed if there is infrastructure to support it in your organization. For example, you may define an issuer that is configured to use HashiCorp Vault or an external provider. Details for these configurations may be found in the cert-manager issuer configuration docs.
  If a ClusterIssuer is already present in your cluster, you can copy the second and third sections from ca-bootstrap-issuer.yaml (the Certificate and Issuer definitions) to a new file, and update them with the names of your ClusterIssuer and namespace. Apply the new file using kubectl apply -f.
  To create the default self-signed cluster issuer, first edit setupTools/ca-bootstrap-issuer.yaml and replace occurrences of <your-namespace> with your target Kubernetes namespace, then run the following command:
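A minimal sketch, assuming the command is run from the helm directory:

```sh
# Create the ClusterIssuer, the self-signed root CA Certificate, and the namespace Issuer.
kubectl apply -f setupTools/ca-bootstrap-issuer.yaml
```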
Note
You must set certmgr.enabled to true in your my-values.yaml file for the cert-manager installation to be used.
- Create a TLS secret for the Kubernetes cluster. The secret must be named deephaven-tls and must be in the same namespace as your Deephaven installation. You must provide the tls.crt and tls.key files for the Web server certificate that meets the requirements specified in the Install and Upgrade Guide.
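A minimal sketch of creating that secret with kubectl, assuming the certificate and key files are in the current directory:

```sh
# For private PKI certificates, tls.crt should contain the full chain; tls.key must not be password protected.
kubectl create secret tls deephaven-tls --cert=tls.crt --key=tls.key --namespace <your-namespace>
```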
- Create your my-values.yaml values override file.
  - This file will contain values that override the defaults defined in the Deephaven chart's values.yaml file. This is a new file, referred to in this document as my-values.yaml, but the name is not significant and you may name it anything you like. Note that no changes should be made to the chart's values.yaml file that already exists in the deephaven directory. This new file is used by Helm to set properties required for, and specific to, your installation. The first five of these are required. Additionally, there are some default settings that are only applicable in a Google GKE environment and may need to be overridden if deploying in another provider:
| Value | Comment | Description |
|---|---|---|
| nfs.pvPrefix | Required | Prefix for persistent volume names stored on the NFS server. This disambiguates releases as PVs are global and not namespaced. |
| nfs.server | Required | Hostname of your NFS server. Note that the IP address may be required in EKS and AKS; to find it, run kubectl get svc nfs-server. |
| etcd.release | Required | This will be the name you choose for the etcd release with '-etcd' appended to it. Run kubectl get secret my-etcd-release and confirm there is a secret with this name. |
| envoyFrontProxyUrl | Required | User facing URL of the envoy front proxy. |
| image.tag | Required | This would typically be redefined to the specific Deephaven version you are installing, or perhaps 'latest'. |
| global.storageClass | Recommended | Use this value as the default storage class, e.g., standard-rwo for GKE, gp2 for EKS, or managed-csi for AKS. |
| dis.intradaySize | Recommended | Defaults to 10G. Set to a value appropriate for your target environment, or consult with Deephaven to evaluate your use case and make a recommendation. |
| dis.intradayUserSize | Recommended | Defaults to 10G. Set to a value appropriate for your target environment, or consult with Deephaven to evaluate your use case and make a recommendation. |
| dis.storageClass | Not required | May be set to a specific storage class if desired, otherwise will be set to global.storageClass. |
| binlogs.storageClass | Not required | May be set to a specific storage class if desired, otherwise will be set to global.storageClass. |
| management.storageClass | Not required | May be set to a specific storage class if desired, otherwise will be set to global.storageClass. |
| envoy.externalDns | Not Required | If your cluster is configured with an ExternalDNS provider, set to true to create a DNS record that points the envoyFrontProxyUrl to the envoy Kubernetes service. |
| envoy.enableAdminPort | Not Required | If true, enables the envoy admin interface on port 8001. |
| image.repositoryUrl | Not Required | The image repository URL and path where the Deephaven container images are hosted. This has a default value but would typically be redefined to something else. |
| certmgr.enabled | Required with cert-manager | If using cert-manager, this must be set to true. Otherwise this may be omitted as it defaults to false. |
An example my-values.yaml is as follows:
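A sketch of such a file, written here with a heredoc; the keys mirror the table above (nesting follows the dotted key names) and every value is a placeholder for your environment:

```sh
# Placeholder values throughout; adjust for your cluster, registry, and NFS service.
cat > my-values.yaml <<'EOF'
nfs:
  pvPrefix: dhtest
  server: deephaven-nfs.<your-namespace>.svc.cluster.local
etcd:
  release: my-etcd-release        # see the etcd.release row in the table above
envoyFrontProxyUrl: deephaven.kubernetes.internal.company.com
image:
  tag: 1.20231218.432
  repositoryUrl: my-registry.example.com/deephaven
global:
  storageClass: standard-rwo      # gp2 for EKS, managed-csi for AKS
certmgr:
  enabled: true                   # only if cert-manager is installed
EOF
```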
- Install the Deephaven Helm chart. You are now ready to install.
  - Use the helm install command shown below, substituting the names of your release and my-values.yaml if they are different. In this example, the Helm release name for Deephaven is deephaven-helm-release-name, but you may select a release name of your own. The chart installation executes the preinstall hook, which configures etcd, routing, and properties, initializes ACLs, and more. --debug will show progress as Helm configures the installation.
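A sketch of that command, run from the helm directory:

```sh
# Install the chart into your namespace using your values override file.
helm install deephaven-helm-release-name ./deephaven -f my-values.yaml --namespace <your-namespace> --debug
```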
- Wait for all Pods to come online. After installing the chart, it will take a moment for all Pods to start up and initialize.
  - Use kubectl get pods -w to watch the status of the Pods as they come up. There are dependencies among Deephaven services, and you will see Pods appear in an Init state until each Pod's dependent service is available. The configuration service will start first, then the auth service, then the remaining ones. The final Pods you will see are the query-server workers and a merge-server worker for the built-in queries.
- Create a password for user iris.
  - Set a password for the iris user from the management shell Pod.
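A sketch of reaching the management shell (a Deployment in recent versions; older versions use a Pod named management-shell). The password itself is then set from that shell with Deephaven's user administration tooling, described in the ACL documentation:

```sh
# Open an interactive shell in the management shell container.
kubectl exec -it deploy/management-shell -- /bin/bash
# From this shell, use Deephaven's account administration tools to set the iris user's password.
```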
- Set or update a DNS record.
  If your cluster is already configured with an external DNS provider, then the external DNS controller will provide a correctly configured hostname. Links to setup information for common providers are provided here for reference. [ GKE | EKS | AKS ]
  If you have not configured external DNS, the following command will help you find the IP address needed to create a DNS entry.
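A sketch, assuming the Envoy front proxy is exposed as a LoadBalancer service (the exact service name depends on your release):

```sh
# The Envoy front proxy service's EXTERNAL-IP (or hostname) is the address to publish in DNS.
kubectl get svc --namespace <your-namespace>
```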
- Log in and start using Deephaven.
  You should now be able to navigate to the host defined as the envoyFrontProxyUrl in your values override file, e.g.:
  https://deephaven.kubernetes.internal.company.com:8000/iriside/
Upgrading a Kubernetes cluster with Helm
The upgrade process is similar to the installation process, in that you will create an installation directory structure based on the Helm tar.gz, and you will build images and configure the chart with a values.yaml file. The main differences are that the prerequisites will already be in place, and helm upgrade is used instead of helm install. Note that the dh_helm tool can also be used to upgrade a Deephaven Kubernetes cluster.
- In a new directory for the new version, extract the new version's Helm archive; e.g.: tar xzf ./deephaven-helm-1.20231218.432.tar.gz.
- Copy the Deephaven product archive into the deephaven_base container directory; e.g.: cp ~/Downloads/deephaven-enterprise-1.20231218.432.tar.gz deephaven-helm-1.20231218.432/docker/deephaven_base/
- Copy the Deephaven Core+ product archive into the db_query_worker_coreplus container directory; e.g.: cp ~/Downloads/deephaven-coreplus-0.33.6-1.20231218.432.tgz deephaven-helm-1.20231218.432/docker/db_query_worker_coreplus/
- Copy the previously used values.yaml file to the helm directory; e.g.: cp ~/Downloads/values.yaml deephaven-helm-1.20231218.432/helm/
- From the docker directory, build and push the new container images:
- Run ./buildAllForK8s.sh to build the images.
| Argument | Description |
|---|---|
--version | The Deephaven version (e.g., 1.20231218.432) used to select the tar file. |
--jdk11\|jdk17 | Specifies which JDK version should be installed. |
--container-path | Path to Dockerfile directories. Defaults to current working directory. |
--[no-]cache | Disable or enable Docker caching. Defaults to caching. |
--coreplus-tar | Full name of a specific Core+ tar.gz file used to build a Core+ worker image. |
- Run ./pushAll.sh to push the images to your container registry. The container registry must be accessible from the Kubernetes cluster.
| Argument | Description |
|---|---|
REPOSITORY | The repository to push the images to. If a .reporoot file exists in the same directory as the script, the repository used is $REPOROOT/REPOSITORY. |
TAG | The tag of the pushed images (e.g. latest or 1.20231218.432). |
For AKS, you will have to explicitly give your cluster access to your container registry.
To do this, run az aks update -n <cluster-name> -g <resource-group> --attach-acr <container-registry-name>. More details can be found in the AKS documentation.
Warning
If you use latest for the image tag, or do not update the image tag, pods will not restart automatically after an upgrade and will continue to run the older versions of the images, because the system uses the image tag value to detect whether it is already running the correct image versions.
- Update the values.yaml file in the helm directory. At the least, you will likely need to update the tag value for the newly built container images (unless they are the new "latest"). If new functionality was added in the new release of Deephaven that is being installed, you may also need to add values entries to configure the new features.
  Ensure that the values.yaml includes a definition for the standard storage class. This became a requirement in build 1.20230511.248. For example:
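A sketch of what that definition looks like and a quick way to check for it; standard-rwo is the GKE default, so substitute your provider's class:

```sh
# values.yaml must contain the standard storage class, for example:
#
#   global:
#     storageClass: standard-rwo
#
# (use gp2 for EKS or managed-csi for AKS). Quick check of the current file:
grep -n -A1 'global:' values.yaml
```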
- If upgrading from a build prior to 1.20230511.248, delete the management-shell pod, since it will need to be recreated by the upgrade process: kubectl delete pod management-shell --grace-period 1. Refer to the Version Log for more details on this change.
- From the helm directory, upgrade the Helm chart; e.g.:
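A sketch, assuming the same release name and values file that were used at install time:

```sh
# Upgrade the existing release in place; existing data and configuration are retained.
helm upgrade deephaven-helm-release-name ./deephaven -f values.yaml --namespace <your-namespace> --debug
```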
Amazon Load Balancers
The default Amazon EKS setup uses the in-tree Kubernetes load balancer controller, which provisions Classic Load Balancers. The default Classic Load Balancer settings terminate connections after 60 seconds of inactivity. This results in Deephaven workers being killed when their controlling connection is closed. It is possible to manually configure the timeout, but Deephaven recommends installing the AWS Load Balancer Controller add-on, which uses a Network Load Balancer. Additionally, the AWS Load Balancer Controller supports annotations for configuring the service. The complete set of annotations suitable for your network is beyond the scope of this document (e.g., subnet and IP allocation), but the following annotations (specified in your Deephaven values.yaml file) instruct the controller to create a suitable Network Load Balancer:
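A sketch of such annotations. The annotation keys and values are standard AWS Load Balancer Controller settings, but the envoy.serviceAnnotations location shown below is hypothetical; place the annotations wherever your chart version expects Envoy service annotations, and adjust the scheme and target type for your network:

```sh
# Illustration only: a hypothetical values.yaml fragment, printed with a heredoc.
cat <<'EOF'
envoy:
  serviceAnnotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
EOF
```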
Manually configuring a Classic Load Balancer Timeout
When using a classic load balancer, a manual work-around is to identify the AWS load balancer that the Kubernetes system allocated and increase the connection timeout using the AWS command line tool.
To identify the load balancer, first run kubectl to find the external name of the load balancer.
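For example (the namespace is a placeholder):

```sh
# The EXTERNAL-IP/hostname of the LoadBalancer service contains the AWS-assigned load balancer name.
kubectl get svc --namespace <your-namespace>
```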
In this example, the load balancer is identified by a89229d6c7c3a43fbba5728fb8216c64. The load balancer attributes can be queried with:
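A sketch using the AWS CLI for Classic Load Balancers, with the load balancer name identified above:

```sh
aws elb describe-load-balancer-attributes --load-balancer-name a89229d6c7c3a43fbba5728fb8216c64
```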
To adjust the connection idle setting to 900 seconds, next run:
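A sketch of the corresponding AWS CLI call:

```sh
aws elb modify-load-balancer-attributes \
  --load-balancer-name a89229d6c7c3a43fbba5728fb8216c64 \
  --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":900}}"
```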