Operate Deephaven 24x6

All Deephaven services are designed to run as long-lived processes. However, regular maintenance is essential to keep the system healthy, ensure data accuracy, and apply updates. As a best practice, establish a maintenance window at least once a week. During this window, Deephaven can be brought to a quiescent state for restarts, general maintenance, and, if necessary, software upgrades. Key activities for this window, performed while services are stopped, include schema deployment, logger generation, and cleanup of process log files (described in the Deephaven Operations Guide).

Beyond this weekly cycle, most Persistent Queries must be restarted daily so that they initialize with newly available historical and intraday data, particularly because data tables are commonly partitioned by date. The appropriate restart time for a query depends on the region it serves and on the schedules of the data imports it relies on. Deephaven Data Labs also recommends restarting all Deephaven processes, except the Persistent Query Controller and the Remote Query Dispatchers, during the weekly maintenance window to guard against problems caused by undetected environmental changes. A full restart of all processes is required when upgrading the Deephaven system (e.g., installing a new Deephaven RPM). Additionally, specific services may need to be restarted outside the regular maintenance window when database tables are added or modified, or after other configuration changes.
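To make the daily restart rule concrete, here is a minimal Python sketch that computes a restart time for one query from the region it serves and the completion times of its upstream imports. It is not a Deephaven API: `QuerySchedule`, `daily_restart_utc`, and the 15-minute safety buffer are hypothetical, and the sketch assumes each import has a known local completion time.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone, tzinfo


@dataclass
class QuerySchedule:
    """Hypothetical record of one Persistent Query's daily data dependencies."""
    name: str
    region_tz: tzinfo               # e.g. zoneinfo.ZoneInfo("Asia/Tokyo") for DST-aware regions
    import_completions: list[time]  # local times when each upstream import finishes


def daily_restart_utc(q: QuerySchedule, day: date,
                      buffer: timedelta = timedelta(minutes=15)) -> datetime:
    """Return the UTC instant at which to restart q for the given local business day.

    The restart is placed after the last upstream import finishes, plus a buffer,
    so the query initializes against that day's newly available data.
    """
    if not q.import_completions:
        raise ValueError(f"{q.name} has no import dependencies to wait for")
    # Interpret each completion time in the region's own time zone.
    latest = max(datetime.combine(day, t, tzinfo=q.region_tz) for t in q.import_completions)
    # Add the safety buffer, then express the result in UTC for a central scheduler.
    return (latest + buffer).astimezone(timezone.utc)
```

For example, a query serving Tokyo (UTC+9, no DST) whose imports finish at 06:00 and 06:30 local time on 2024-03-04 would be restarted at 2024-03-03 21:45:00+00:00. Expressing every restart in UTC makes it easier to see how restarts for different regions interleave on a single 24x6 system.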

When a Deephaven installation serves only a single region, the system typically reaches a quiescent state overnight, which makes these maintenance tasks easy to schedule. To support continuous data ingestion and analysis across multiple regions, however, configuration changes, query restarts, and the overall maintenance window must be scheduled carefully.

This guide provides a high-level overview of operating Deephaven in a 24x6 environment. For detailed instructions, see the following articles:

Maintenance window

The weekly maintenance window should be scheduled when the system can reach a quiescent state with minimal impact on operations. For single-region installations, this typically occurs overnight. For multi-region deployments, coordinate the timing to minimize disruption across all served regions.
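One way to reason about a multi-region window is to list, in UTC, the periods when any served region depends on data ingestion or queries, then look for the gaps between them. The following minimal Python sketch does this with standard interval merging. The week-minute encoding and the names `Interval`, `at`, and `quiescent_gaps` are hypothetical helpers, not part of Deephaven; real region hours and import schedules must come from your own deployment.

```python
from dataclasses import dataclass

MINUTES_PER_WEEK = 7 * 24 * 60


@dataclass(frozen=True)
class Interval:
    """Half-open [start, end) in minutes since Monday 00:00 UTC (hypothetical encoding)."""
    start: int
    end: int


def at(day: int, hour: int, minute: int = 0) -> int:
    """Minutes since Monday 00:00 UTC; day 0 is Monday, day 6 is Sunday."""
    return (day * 24 + hour) * 60 + minute


def quiescent_gaps(busy: list[Interval]) -> list[Interval]:
    """Return the parts of the week not covered by any region's busy interval.

    Each interval must satisfy 0 <= start < end <= MINUTES_PER_WEEK; split any
    interval that crosses Monday 00:00 UTC into two before calling this.
    """
    # Merge overlapping or touching busy intervals, regardless of region.
    merged: list[Interval] = []
    for iv in sorted(busy, key=lambda i: i.start):
        if merged and iv.start <= merged[-1].end:
            merged[-1] = Interval(merged[-1].start, max(merged[-1].end, iv.end))
        else:
            merged.append(iv)
    # Walk the merged intervals and collect the uncovered stretches between them.
    gaps: list[Interval] = []
    cursor = 0
    for iv in merged:
        if iv.start > cursor:
            gaps.append(Interval(cursor, iv.start))
        cursor = iv.end
    if cursor < MINUTES_PER_WEEK:
        gaps.append(Interval(cursor, MINUTES_PER_WEEK))
    return gaps
```

For example, with hypothetical hours where APAC is busy from Sunday 21:00 UTC through Friday 10:00 UTC (split at the week boundary into `Interval(at(6, 21), MINUTES_PER_WEEK)` and `Interval(0, at(4, 10))`) and the Americas are busy from Monday 08:00 UTC to Saturday 01:00 UTC, `quiescent_gaps` returns a single 44-hour gap from Saturday 01:00 UTC to Sunday 21:00 UTC, which is a candidate for the weekly maintenance window.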