Process restart guide
Properly restarting Deephaven processes is crucial for applying updates, performing maintenance, and ensuring system stability as part of a comprehensive resilience strategy. This guide provides recommendations for restarting various Deephaven components.
Deephaven Data Labs recommends restarting all Deephaven processes during the weekly maintenance window, with the exception of the Persistent Query Controller and Remote Query Dispatchers (query server and merge server). Regular restarts expose situations where the system environment has changed in a way that prevents a process from starting, which would otherwise go undetected until the next unplanned restart. When upgrading the Deephaven system (that is, installing a new Deephaven version), Deephaven Data Labs recommends restarting all Deephaven processes.
Customers may additionally choose to restart the Persistent Query Controller and Remote Query Dispatchers. However, any running Legacy workers will be terminated when the controller is restarted, and any running Legacy or Core+ workers will be terminated when the dispatcher is restarted.
- The Persistent Query Controller may need restarting to pick up certain configuration changes. If a controller is restarted, all Legacy workers that were started by this controller will be stopped, and those within their schedules will be restarted when the controller is running again. Several configuration parameters can be changed without requiring a controller restart (through the use of the PersistentQueryControllerTool), including:
  - The list of database servers
  - Default scheduling parameters
  - Temporary queues
- The Remote Query Dispatchers (also known as the "query server" and "merge server") may need restarting to pick up changed parameter values, such as maximum concurrent queries or maximum total allowed heap. If a dispatcher is restarted, all queries started by that dispatcher will be stopped, and will not be automatically restarted until their next scheduled start time is reached.
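Before restarting a controller or dispatcher, it can help to work out which running workers would be terminated. The following is a minimal sketch of the rules described above, not a Deephaven API: the `RESTART_IMPACT` table and the `workers_terminated` function are hypothetical names, and the sketch assumes every listed worker was started by the process being restarted.

```python
# Restart impact as described in this guide. All names here are illustrative
# and are not part of any Deephaven API.
RESTART_IMPACT = {
    "controller": {
        "terminates": {"Legacy"},
        "recovery": "restarted once the controller is running again, if within schedule",
    },
    "dispatcher": {
        "terminates": {"Legacy", "Core+"},
        "recovery": "not restarted until the next scheduled start time",
    },
}


def workers_terminated(process, running_workers):
    """Return the names of running workers that restarting `process` would terminate.

    `running_workers` is a list of (name, worker_type) pairs, where worker_type
    is "Legacy" or "Core+". Assumes each worker was started by `process`.
    """
    affected_types = RESTART_IMPACT[process]["terminates"]
    return [name for name, worker_type in running_workers if worker_type in affected_types]


# Hypothetical worker names, for illustration only.
workers = [("query_a", "Legacy"), ("query_b", "Core+")]
print(workers_terminated("controller", workers))
print(workers_terminated("dispatcher", workers))
```

The `recovery` text records the other difference between the two cases: workers stopped by a controller restart come back when the controller is running again (if within their schedules), while queries stopped by a dispatcher restart wait for their next scheduled start time.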
The following processes should be restarted during the maintenance window. Many existing installations already restart the Data Import Server and Remote User Table Server every night. Installations that switch to a weekly maintenance window instead will need to adjust the corresponding cron entries.
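As an illustration of the cron adjustment, the sketch below rewrites a nightly entry so that it runs once a week. It assumes a standard user crontab line (five schedule fields followed by a command); the `nightly_to_weekly` helper and the command path are hypothetical, and the actual entries at a given installation will differ.

```python
def nightly_to_weekly(cron_line, day_of_week):
    """Rewrite a nightly cron entry so that it runs on one day of the week.

    Cron fields: minute hour day-of-month month day-of-week command.
    day_of_week uses cron numbering (0 = Sunday ... 6 = Saturday).
    """
    if not 0 <= day_of_week <= 6:
        raise ValueError("day_of_week must be between 0 and 6")
    # Split off the five schedule fields; the command keeps its own spacing.
    fields = cron_line.split(None, 5)
    if len(fields) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    if fields[2] != "*" or fields[3] != "*" or fields[4] != "*":
        raise ValueError("entry is not a plain nightly schedule")
    fields[4] = str(day_of_week)
    return " ".join(fields)


# Hypothetical restart command, for illustration only.
nightly = "0 2 * * * /path/to/restart_dis.sh"
print(nightly_to_weekly(nightly, 6))  # prints "0 2 * * 6 /path/to/restart_dis.sh"
```

Entries in a system-wide crontab carry an extra user field before the command, so they would need a slightly different split.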
Authentication Servers
When multiple authentication servers are configured, they should be restarted in a staggered fashion. Stop and start the first authentication server, wait at least 5 minutes, and then restart the subsequent authentication servers. This ensures that at least one authentication server is available at all times.
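The staggered procedure can be sketched as a small loop. This is an illustration under stated assumptions, not a Deephaven tool: `staggered_restart` is a hypothetical helper, and `restart` stands for whatever site-specific command stops and starts the authentication server on one host. The sketch is deliberately conservative and pauses between every pair of consecutive restarts, not only after the first one.

```python
import time

MIN_STAGGER_SECONDS = 5 * 60  # the guide recommends a delay of at least 5 minutes


def staggered_restart(servers, restart, delay_seconds=MIN_STAGGER_SECONDS, sleep=time.sleep):
    """Restart each server in turn, pausing between consecutive restarts.

    `restart` is a caller-supplied function (hypothetical here) that stops and
    starts the authentication server on one host and returns when it is done.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    if delay_seconds < MIN_STAGGER_SECONDS:
        raise ValueError("delay must be at least 5 minutes")
    for index, server in enumerate(servers):
        if index > 0:
            # Give the previously restarted server time to become available.
            sleep(delay_seconds)
        restart(server)
```

A caller would pass the list of authentication hosts and a function that wraps the installation's own restart command. Passing a stub for `sleep` allows a dry run that only records the order of operations.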
ACL Write Server
The ACL Write Server should be restarted weekly to ensure cached resources are released.
Data Import Server (DIS)
The Data Import Server maintains a cache of recently used resources, including operating system file handles. Restarting the DIS ensures those resources are released, and standard operating system cleanup (such as the removal of files marked for deletion) may proceed.
Additionally, if an existing listener class is changed, the DIS will not be able to access the changed class until the DIS is restarted.
Log Aggregator Service
The Log Aggregator Service should be restarted weekly to ensure cached resources are released.
Local Table Data Server (LTDS)
As with the DIS, the Local Table Data Server should be restarted weekly to ensure cached resources are released.
Web API Service
The Web API Service must be restarted after upgrades to the Deephaven system, and after changes to customer-provided modules or site-specific configuration files. Additionally, changes to certain client configuration files require a Web API Service restart.
Tailer
The Tailer should be restarted during the maintenance window so that it picks up updates to its configuration files (both its properties and its XML configuration). Changes to these files that must take effect immediately require a Tailer restart at the time of the change. A Tailer restart during business hours causes a brief pause in the ingestion of real-time data.