Troubleshooting a Kubernetes installation
Access logs
Run kubectl logs -c <process> <pod> to get the logs for a given process. To avoid needing to copy-paste the pod name, you can use kubectl to get the pod name via label, and pass that into the logs command:
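For example, a sketch of looking up the pod by label and passing it to `kubectl logs` (the label key/value and process name are placeholders; adjust them to match your installation):

```shell
# Look up the pod name by label instead of copy-pasting it.
POD_NAME=$(kubectl get pods -l app=<deployment-name> -o jsonpath='{.items[0].metadata.name}')

# Fetch the logs for the given process container in that pod.
kubectl logs -c <process> "$POD_NAME"
```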
Shell access
If you need to examine the installation, you can use a management-shell pod. All the volumes are mounted read-write, so you can also update files as necessary.
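A sketch of opening a shell in the management-shell pod (the deployment name is assumed; list pods with `kubectl get pods` to confirm it):

```shell
# Open an interactive shell in the management-shell pod.
kubectl exec -it deploy/management-shell -- /bin/bash
```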
Leftover worker pods
If you have leftover worker pods, they may hold onto PVCs (PersistentVolumeClaims), preventing a new installation.
For example:
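You can see what remains with commands like:

```shell
# List any leftover pods and the PVCs they may still be holding.
kubectl get pods
kubectl get pvc
```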
This can be corrected by deleting those pods:
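A sketch of the deletion (pod names are placeholders taken from the `kubectl get pods` output):

```shell
# Delete the leftover worker pods so their PVCs are released.
kubectl delete pod <worker-pod-name-1> <worker-pod-name-2>
```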
If there are other leftover resources after uninstalling the release, they should also be removed via kubectl, as in the following example.
Caution
Note that removing the PVCs for intraday data will delete any unmerged intraday data.
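With that caveat in mind, a sketch of removing a leftover PVC (the claim name is a placeholder from the `kubectl get pvc` output):

```shell
# WARNING: deleting an intraday PVC discards any unmerged intraday data.
kubectl delete pvc <pvc-name>
```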
Some files in the NFS data directory, including caches and generated TLS/etcd keys, should be removed as well.
If the initial data was extracted to /exports/dhsystem on the NFS server, then the appropriate command to clean up the outdated configuration and caches (without deleting user or system data) would be:
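A sketch of such a cleanup, run on the NFS server. The subdirectory names below are placeholders: verify the actual cache and generated-key paths under /exports/dhsystem before deleting anything.

```shell
# Remove caches and generated keys only; do NOT touch user or system data.
rm -rf /exports/dhsystem/<cache-dir> /exports/dhsystem/<generated-keys-dir>
```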
Restart a pod
Most pods can be restarted by scaling their deployment down to 0 and then back to 1 pod with kubectl scale deployment <deployment-name> --replicas=0 followed by kubectl scale deployment <deployment-name> --replicas=1.
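For example:

```shell
# Scale the deployment down to zero pods, then back up to one.
kubectl scale deployment <deployment-name> --replicas=0
kubectl scale deployment <deployment-name> --replicas=1
```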
Debug template syntax errors
If you have a syntax error, it is often not clear where it is coming from. To debug, first run the templating engine from helm:
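A sketch of rendering the chart locally; the release name, chart path, and values file are placeholders matching the install command used elsewhere in this guide:

```shell
# Render the chart templates without installing anything.
helm template deephaven-helm-release-name ./deephaven/ -f my-values.yaml
```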
If the YAML fails to render, you will get a message like the following:
Use the --debug flag to render out invalid YAML to file:
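For example, assuming the same placeholder release name and chart path as above:

```shell
# --debug emits the rendered output even when it is not valid YAML,
# so it can be saved to a file and inspected.
helm template deephaven-helm-release-name ./deephaven/ -f my-values.yaml --debug > rendered.yaml
```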
Running that file through yamllint may point out your error:
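Assuming you saved the rendered output to a file such as `rendered.yaml`:

```shell
# yamllint reports the line and column of YAML syntax problems.
yamllint rendered.yaml
```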
If you have two separate YAML blocks, only the first one will print when the second one has errors, even with the --debug and -s options. You can comment out earlier blocks to get the block of interest to render.
You may also find the --validate flag useful, as it will check your YAML against the Kubernetes object definitions and the state of the cluster.
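For example, again with the placeholder release name and chart path:

```shell
# --validate checks the rendered manifests against the cluster's
# API schemas, so it requires access to a running cluster.
helm template deephaven-helm-release-name ./deephaven/ -f my-values.yaml --validate
```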
Debug a pre-install hook
The pre-install hook logs can be retrieved with kubectl logs. You may want to look at log files from an individual run or experiment with running different scripts. To do this, you may introduce a sleep after a failed pre-install script by including a debug.preInstall entry in your values.yaml as follows:
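A hypothetical sketch of such an entry. The exact key names and value type are assumptions; check the chart's values.yaml for the actual `debug.preInstall` schema.

```yaml
# Keep the pre-install hook pod alive after a failed script so its
# log files can be examined and scripts re-run by hand.
debug:
  preInstall:
    sleep: true
```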
Debug using a remote debugger
You can enable debugging of a remote process by following these steps:
- Configure your my-values.yaml file with port and debug information.
- Upgrade the Deephaven Helm installation.
- Scale deployment(s) to be debugged down to 0 pods, then back to 1 to ensure that they have the required settings.
- Forward port from your host to the pod.
Add configurations to your my-values.yaml file for the process you want to debug. See Creating a Service for more information on Kubernetes services and port settings in general.
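A hypothetical sketch of debug settings for the controller. The key names here are placeholders; the authoritative names are in the `process` section of the chart's values.yaml.

```yaml
# Enable JVM remote debugging on the controller process.
process:
  controller:
    debug:
      enabled: true
      port: 5005
```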
Note
Refer to the process section of the values.yaml file in the Deephaven chart to see what other processes may be configured for debugging in addition to controller.
Once the above settings have been configured, you must upgrade the Helm chart. This is similar to what was done in the chart installation. Run the following in the root directory where the Deephaven chart directory is:
helm upgrade deephaven-helm-release-name ./deephaven/ -f my-values.yaml --set image.tag=<deephaven-release-version> --debug
The easiest way to restart the component is to run kubectl scale deployment/<deployment-name> --replicas=0 followed by kubectl scale deployment/<deployment-name> --replicas=1.
To see the names of all deployments, you may run kubectl get deployments.
Next, you should enable port forwarding from your local machine to the deployment with a command like the following that forwards local port 5005 to port 5005 on the controller pod:
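For example (the controller deployment name is a placeholder; find it with `kubectl get deployments`):

```shell
# Forward local port 5005 to port 5005 on the controller pod.
kubectl port-forward deploy/<controller-deployment-name> 5005:5005
```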
After your pod has restarted, you can run a remote debugger as normal.
Debug the Swing console
To debug the Swing console, add similar JVM arguments to your getdown.global file, then reload the CUS.
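An illustrative getdown entry; the `jvmarg` key follows standard Getdown configuration syntax, but verify the property name used by your installation's getdown.global:

```
# Attach a JDWP debug agent to the locally launched console JVM.
jvmarg = -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
```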
You can then debug your locally launched IrisConsole process.