Controller and dispatcher overview

The Persistent Query Controller and Remote Query Dispatcher processes work together to manage Deephaven workloads.

Workers

A worker is a Deephaven Java process that performs tasks for a Persistent Query or Code Studio (also known as a console). When a Persistent Query starts, one or more workers are created based on the number of configured replicas and spares. A Code Studio always creates one worker.

Two Deephaven processes manage workers.

  • The Persistent Query Controller (often simply called the controller) maintains an overall view of Persistent Queries (and their workers) across the entire Deephaven installation.
  • Each Remote Query Dispatcher (dispatcher) creates the workers as directed by the controller or other clients.
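
The division of labor described above can be pictured with a small toy model. This is only an illustrative sketch, not Deephaven's API: the class names `Controller` and `Dispatcher`, the methods `start_query` and `create_worker`, and the function `start_code_studio` are hypothetical. It also assumes that a Persistent Query gets one worker per configured replica plus one per spare.

```python
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical stand-in for a Remote Query Dispatcher."""
    name: str
    workers: list = field(default_factory=list)

    def create_worker(self, owner: str) -> str:
        # A real dispatcher starts a Java process (or a Kubernetes pod);
        # in this sketch a worker is just a label recorded in a list.
        worker_id = f"{self.name}/worker-{len(self.workers)}"
        self.workers.append((worker_id, owner))
        return worker_id


@dataclass
class Controller:
    """Hypothetical stand-in for the Persistent Query Controller."""
    dispatchers: dict  # dispatcher name -> Dispatcher

    def start_query(self, query_name, dispatcher_name, replicas=1, spares=0):
        # Assumption: one worker per replica plus one per spare.
        dispatcher = self.dispatchers[dispatcher_name]
        return [dispatcher.create_worker(query_name)
                for _ in range(replicas + spares)]


def start_code_studio(dispatcher: Dispatcher, user: str) -> str:
    # A Code Studio always creates exactly one worker, requested directly
    # from a dispatcher rather than through the controller.
    return dispatcher.create_worker(f"console:{user}")
```

In this model the controller never creates a worker itself. It only tells a dispatcher to do so, which mirrors the split between the two processes.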

Persistent Query Controller

The Persistent Query Controller manages every dispatcher, directing the creation and deletion of worker processes. Some key controller functions include:

  • Tracking all Persistent Queries, storing their details in etcd.
  • Starting and stopping Persistent Query workers based on their defined schedules and user requests.
  • Determining where workers should be started using automated server selection.
  • Using remote processing profiles, which are defined by an administrator and specify Java parameters such as garbage collection settings.
  • Keeping clients up-to-date with information about the Persistent Queries they are authorized to see. The Query Monitor page in the web interface is a view into the controller's state. It shows the details of every Persistent Query the user is allowed to see, along with the associated workers if the query is running.
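
The last point, showing each client only the queries it is authorized to see, amounts to filtering the controller's state per user. The sketch below illustrates the idea under stated assumptions: `QueryState` and `visible_queries` are hypothetical names, and the rule that a user sees a query they own or are listed as a viewer of is a simplification, not Deephaven's actual authorization model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryState:
    """Hypothetical summary of one Persistent Query held by the controller."""
    name: str
    owner: str
    viewers: frozenset = frozenset()
    workers: tuple = ()  # empty when the query is not running


def visible_queries(all_queries, user):
    """Return only the queries this user is authorized to see."""
    # Assumption: visibility means "owner, or explicitly listed as a viewer".
    return [q for q in all_queries if user == q.owner or user in q.viewers]


queries = [
    QueryState("DailyReport", owner="alice", workers=("worker-0",)),
    QueryState("RiskImport", owner="bob", viewers=frozenset({"alice"})),
    QueryState("Scratch", owner="bob"),
]

# A monitoring view for one user is then a projection of the full state.
for q in visible_queries(queries, "alice"):
    status = "running" if q.workers else "stopped"
    print(q.name, status)
```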

Remote Query Dispatcher

A Remote Query Dispatcher creates Deephaven workers based on requests from the controller or other clients. On a bare-metal installation, a dispatcher can only create workers on its own host, using standard Unix process control. On a Kubernetes installation, a single dispatcher creates pods that the Kubernetes scheduler distributes among the nodes of the cluster. A Deephaven installation includes at least the two dispatchers listed below and may include many more.

  • The query server runs workers that do not require the ability to write data, which covers most workers. Some installations run many query servers to handle large workloads.
  • The merge server runs privileged workers, which are allowed to write to (and delete from) historical data. Only users with specific privileges are allowed to start merge workers.
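
The distinction between the two server types can be summarized as a small decision rule. The following is a minimal sketch, not the product's actual server-selection logic: the function `choose_server_type` and its parameters are hypothetical, and it assumes that the only inputs are whether the worker must write historical data and whether the user holds the privilege to start merge workers.

```python
def choose_server_type(needs_write: bool, can_start_merge_workers: bool) -> str:
    """Pick the kind of dispatcher that should run a worker (illustrative only)."""
    if not needs_write:
        # Workers that only read data belong on a query server.
        return "query"
    if not can_start_merge_workers:
        # Writing to (or deleting from) historical data requires a merge
        # worker, which only specifically privileged users may start.
        raise PermissionError("user is not allowed to start merge workers")
    return "merge"
```

For example, `choose_server_type(False, False)` returns `"query"`, while `choose_server_type(True, False)` raises `PermissionError`.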