---
title: Worker launch
---

A **Deephaven Worker** instance is the primary place within a Deephaven system where "work" is done. A worker process may be interactive, where a user or downstream process sends individual commands, or a Persistent Query (PQ), which is pre-defined and may be set to execute on a schedule. A worker may be used to manipulate and display data to users, ingest data into the system, or perform any number of other tasks.

## Architecture

Worker processes are spawned by the `RemoteQueryDispatcher` via a shell script that is passed several parameters. These `RemoteQueryDispatcher` processes are identified as the `db_query_server` process and the `db_merge_server` process. A dispatcher process runs on each node within the cluster that is identified with a `QUERY` and/or `MERGE` role during installation. In a Kubernetes environment, these processes are run as the `query-server` and `merge-server` deployments. A worker process runs as the same user as the `RemoteQueryDispatcher` that launched it, unless [per-user workers](../configuration/per-user-workers.md) are enabled.

A `db_query_server` and a `db_merge_server` are very similar, but the processes are run by different users. By default, a `db_query_server` instance is run as the `dbquery` user, and a `db_merge_server` instance is run as the `dbmerge` user. These default users may be defined differently during initial system installation. The `dbmerge` user is able to write [historical data](../../legacy/importing-data/introduction.md#intraday-and-historical-data), so access to a `db_merge_server` process should be limited to administrative and system users. The `dbquery` workers spawned by the `db_query_server` are able to read intraday and historical data, but cannot write historical data.

Each worker is given a unique `ProcessInfoId` by the `RemoteQueryDispatcher`. This identifier is useful for troubleshooting. During worker startup, `stdout` and `stderr` are captured by the `RemoteQueryDispatcher`, and sent to the [Process Event Log](../ops-guide/finding-errors.md#querying-the-processeventlog-table) on behalf of the worker, identified by the `ProcessInfoId`. Once the worker has started, it can write to the [`ProcessEventLog`](../internal-tables/process-event-log.md) (`PEL`) on its own by writing [binary logs](./system-logs.md#dbinternal-binary-logs).

## Troubleshooting

There are a number of reasons why a worker may fail to start. In many cases, the cause can be determined by examining a stack trace for the worker-spawn attempt. For a failed PQ, the stack trace may be found in the `ExceptionDetails` column in the **Query Monitor** / **Query Config** tab of the IDE. The `ProcessInfoId` for the worker is often listed in the `ProcessInfoId` column as well. For an interactive worker session, the exception is found in the **Code Studio** / **Console** that launched the attempt.

In some cases, a problem with the `RemoteQueryDispatcher` process itself may prevent successful worker launches. In this case, information may be found in the plain-text log file for the `RemoteQueryDispatcher` process within the `/var/log/deephaven/query_server` or `/var/log/deephaven/merge_server` directory of the node where the worker launch was attempted.
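
When a `ProcessInfoId` is known from the failed launch, searching the dispatcher's log directory for it can help locate the relevant lines. The following is a minimal Python sketch, not a Deephaven tool. The helper name `find_in_logs` is hypothetical, and the sketch assumes that the log files are uncompressed plain text and that the identifier appears verbatim in the lines of interest.

```python
from pathlib import Path
from typing import List, Tuple


def find_in_logs(log_dir: str, needle: str, max_hits: int = 50) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) for lines containing `needle`.

    Hypothetical helper: scans every regular file directly inside `log_dir`
    and stops after `max_hits` matches. Assumes uncompressed text logs.
    """
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open("r", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path.name, line_number, line.rstrip("\n")))
                    if len(hits) >= max_hits:
                        return hits
    return hits
```

For example, `find_in_logs("/var/log/deephaven/query_server", process_info_id)` would be run on the node where the launch was attempted, by a user with read access to that directory.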

If the worker _process_ has started but crashes during initialization, additional details may be found in the `ProcessEventLog`. See [Debugging overview](../../debugging/overview.md) for details on troubleshooting via the `ProcessEventLog`.
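
Because each worker's entries are identified by its `ProcessInfoId`, a `ProcessEventLog` lookup usually comes down to two filters: one on the date and one on the identifier. The sketch below only builds those filter strings. The helper name `pel_filters` is hypothetical, and the `Date` column name is an assumption about the table's partitioning rather than something stated on this page.

```python
from typing import List


def pel_filters(process_info_id: str, date: str) -> List[str]:
    """Build Deephaven filter strings for one worker's log entries.

    Hypothetical helper. Deephaven string literals are delimited by
    backticks, so values containing a backtick are rejected.
    """
    for value in (process_info_id, date):
        if "`" in value:
            raise ValueError("filter values must not contain backticks")
    # Filter on the (assumed) Date partition first, then on the worker id.
    return [f"Date=`{date}`", f"ProcessInfoId=`{process_info_id}`"]


# Assumed usage inside a Core+ Python worker, where `db` is predefined
# (the namespace and accessor are assumptions, not confirmed by this page):
#   pel = db.live_table("DbInternal", "ProcessEventLog").where(
#       pel_filters("<ProcessInfoId>", "<yyyy-MM-dd>"))
```

The query itself must run in a different, healthy worker than the one being investigated.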

### Common worker startup issues

> [!NOTE]
> This is not intended to be a complete list of possible errors.

| Cause                                                             | Troubleshooting                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| :---------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `RemoteQueryDispatcher` or `LogAggregatorService` is not started. | The status of these services should be checked on the node where the worker is being launched. Run the command `/usr/illumon/latest/bin/dh_monit summary` to check the status of these processes.<br/><br/>If the appropriate `RemoteQueryDispatcher` is not started (`db_query_server` and/or `db_merge_server`), then the worker cannot be launched. If the `LogAggregatorService` is not started (`log_aggregator_service`), then logs for the worker will not be captured. <br/><br/>Start the `RemoteQueryDispatcher` service(s) and/or `LogAggregatorService` with `sudo -u irisadmin /usr/illumon/latest/bin/dh_monit start ...` for the appropriate service (system administrative privileges will be required).                                                                                                                                                                  |
| Worker requested too much heap.                                   | There is a default "cumulative heap per `RemoteQueryDispatcher`" limit defined by the [RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB](../pq-controller/worker-heap-size.md#remote-query-dispatcher-properties) property, which may be overridden per `RemoteQueryDispatcher` instance in `iris-environment.prop`. Even if there is sufficient memory installed in the system, cumulative worker memory cannot exceed the value defined by this property.<br/><br/>Similarly, the `RemoteQueryDispatcher.maxPerWorkerHeapMB` property may be defined, limiting the maximum heap allowed per worker. By default, this property is not defined, and a given worker may allocate memory up to the total `RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB` value configured.<br/><br/>Ensure that the properties are set correctly and that the worker is not requesting more heap than is available. |
| Worker crashes during initialization.                             | If the worker process has started but is unable to complete initialization, the reason should be identified in the `ProcessEventLog`. Possible causes include syntax errors in the script (for a PQ) and missing resources on the classpath (for example, plugins not installed or activated on the particular node). |
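
The two heap limits in the table can be read as a pair of checks applied to each request. The following Python sketch illustrates that rule as described here; it is not the dispatcher's actual admission logic. The function name and the `in_use_mb` argument (heap already allocated to running workers on that dispatcher) are hypothetical.

```python
from typing import Optional, Tuple


def heap_request_allowed(
    requested_mb: int,
    in_use_mb: int,
    max_total_mb: int,
    max_per_worker_mb: Optional[int] = None,
) -> Tuple[bool, str]:
    """Check a heap request against the two limits described in the table.

    max_total_mb      ~ RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB
    max_per_worker_mb ~ RemoteQueryDispatcher.maxPerWorkerHeapMB (None if unset)
    """
    # Per-worker cap applies only when the property is defined.
    if max_per_worker_mb is not None and requested_mb > max_per_worker_mb:
        return False, "request exceeds the per-worker limit"
    # Cumulative cap applies regardless of how much RAM the host has.
    if in_use_mb + requested_mb > max_total_mb:
        return False, "request exceeds the cumulative dispatcher limit"
    return True, "ok"
```

With no per-worker limit defined, a single worker may request up to the full cumulative value, provided no other workers are holding heap on that dispatcher.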

## Related documentation

- [Controlling query worker heap size](../pq-controller/worker-heap-size.md)
- [Finding errors](../ops-guide/finding-errors.md)
- [Per-user workers](../configuration/per-user-workers.md)
- [System logs](./system-logs.md)
