Startup / Shutdown of Deephaven processes

All Deephaven processes in non-Kubernetes deployments are started and stopped with Monit. Monit is a utility for managing and monitoring processes, programs, files, directories and filesystems on a Unix system. (Refer to https://mmonit.com/monit for more information.)

dh_monit

Deephaven provides the dh_monit wrapper utility to allow Deephaven administrators to interact with Monit without needing root privileges. Besides wrapping monit functionality, dh_monit also adds some Deephaven-specific options:

  • -up - ensures Monit is running and refreshed and starts all Deephaven services.
  • -down - ensures Monit is running and refreshed and stops all Deephaven services.
  • --b/--block - waits until requested start/stop/restart/up/down operations have reported as complete before exiting.

dh_monit also introduces sequencing of start, stop, or restart operations when called with the all option.

For example, monit start all on an infrastructure node sends start commands to every service in list order, without waiting between them. The order itself is reasonable: the root dependency, configuration_server, is listed first, and the next most common dependency, authentication_server, is listed second. However, services take some time to start, and Monit issues the start commands in quick succession, so some services fail to start because a dependency they need is not yet ready. Monit then retries the services that failed, which makes the overall start all take longer than necessary. dh_monit, on the other hand, starts configuration_server and waits until it is running, then starts authentication_server and waits again, and only then starts the remaining services.
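The sequencing described above can be approximated by hand, which helps show what dh_monit does on your behalf. The following shell sketch is illustrative only and is not dh_monit's implementation. The function names wait_until_ready and start_in_order, the READY_PATTERN regular expression, and the retry defaults are all assumptions; in particular, the pattern may need adjusting to match the status text your Monit version prints.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; dh_monit already sequences "all" operations itself.
DH_MONIT="${DH_MONIT:-/usr/illumon/latest/bin/dh_monit}"

# Assumption: the status output of a healthy service contains a line such as
# "status   OK" or "status   Running". Adjust for your Monit version.
READY_PATTERN="${READY_PATTERN:-status[[:space:]]+(OK|Running)}"

# Poll "dh_monit status <service>" until it looks ready or attempts run out.
# Arguments: service name, max attempts (default 30), delay in seconds (default 2).
wait_until_ready() {
  local service="$1" attempts="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$DH_MONIT" status "$service" | grep -Eq "$READY_PATTERN"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Start each named service in order, waiting for it before starting the next.
start_in_order() {
  local service
  for service in "$@"; do
    "$DH_MONIT" start "$service" || return 1
    if ! wait_until_ready "$service"; then
      echo "timed out waiting for $service" >&2
      return 1
    fi
  done
}
```

For example, start_in_order configuration_server authentication_server would bring up the two common dependencies one at a time before any remaining services are started.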

Monit service

The Monit service itself can be checked with the following command:

sudo systemctl status monit

If Monit is not running, it can be started with the following command:

sudo systemctl start monit

To ensure Monit starts up whenever the system restarts, use the following command:

sudo systemctl enable monit
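The check, start, and enable steps can be combined so that nothing is changed when Monit is already running and enabled. This is a minimal sketch, assuming a systemd host where the unit is named monit; the function name ensure_monit is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start and enable Monit only when needed.
# Assumes systemd and a unit named "monit"; ensure_monit is a hypothetical name.
ensure_monit() {
  # is-active exits non-zero when the unit is not running.
  if ! systemctl is-active --quiet monit; then
    sudo systemctl start monit || return 1
  fi
  # is-enabled exits non-zero when the unit will not start at boot.
  if ! systemctl is-enabled --quiet monit; then
    sudo systemctl enable monit || return 1
  fi
}
```

Calling ensure_monit a second time should be a no-op, which makes it safe to include in a provisioning script.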

All Monit configuration files for the Deephaven processes are located in:

/etc/sysconfig/illumon.d/monit

Deephaven services

If any Deephaven process terminates unexpectedly, Monit will restart the process automatically.

You can check which processes are running with the following Monit command:

/usr/illumon/latest/bin/dh_monit summary

To monitor the state of services, run:

watch /usr/illumon/latest/bin/dh_monit summary

Check the status of all processes with the following Monit command:

sudo monit status

Check the status of individual processes with the following Monit command:

/usr/illumon/latest/bin/dh_monit status <process name>

For example:

/usr/illumon/latest/bin/dh_monit status iris_controller

Starting and stopping Deephaven services

If a configuration file has been updated, the associated Deephaven processes will typically need to be restarted for the changes to take effect. One exception to this is the Deephaven Controller Process, which allows various properties to be edited without needing a restart.

When a configuration file has been updated that requires a restart of the associated Deephaven processes, use the following commands.

To stop all the configured Deephaven processes:

/usr/illumon/latest/bin/dh_monit stop all

Alternatively, stop and start individual Deephaven processes:

/usr/illumon/latest/bin/dh_monit stop <process name>
/usr/illumon/latest/bin/dh_monit start <process name>

For example:

/usr/illumon/latest/bin/dh_monit stop authentication_server
/usr/illumon/latest/bin/dh_monit start authentication_server

Stale process ID files

Monit checks for processes using the IDs recorded in /etc/deephaven/run/*.pid. After a machine reboot, these files may be stale: the ID a file contains may now belong to a running process that is not the managed Deephaven process. To prevent stale PID files, you can remove them on reboot. This can be accomplished with a crontab entry.

To edit the root user's crontab, run:

sudo crontab -e

The following entry runs rm -f /etc/deephaven/run/*.pid on each reboot:

@reboot rm -f /etc/deephaven/run/*.pid

You can verify the contents of the crontab file with sudo crontab -l.
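To see whether a PID file is stale before removing it, it can help to list which process each recorded ID currently points at. The sketch below is a hypothetical helper, not part of Deephaven: the name report_pid_files is an assumption, and it relies on the Linux /proc filesystem. Reading the files may also require elevated privileges, depending on their permissions.

```shell
#!/usr/bin/env bash
# Sketch: show the command name behind each PID file so stale entries stand out.
# Linux-only assumption: /proc/<pid>/comm holds the process's command name.
report_pid_files() {
  local dir="${1:-/etc/deephaven/run}" file pid
  for file in "$dir"/*.pid; do
    [ -e "$file" ] || continue   # the glob matched nothing
    pid=$(tr -d '[:space:]' < "$file")
    case "$pid" in
      ''|*[!0-9]*) echo "$file: invalid pid"; continue ;;
    esac
    if [ -r "/proc/$pid/comm" ]; then
      echo "$file: pid $pid -> $(cat "/proc/$pid/comm")"
    else
      echo "$file: pid $pid -> not running"
    fi
  done
}
```

A file whose ID maps to an unrelated command name, or to no process at all, is stale in the sense described above.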