---
title: Deephaven Process runbooks
sidebar_label: Process runbooks
---

This section outlines the procedures for each Deephaven process.

## Incident classification key

| Severity     | Description                                                                          |
| :----------- | :----------------------------------------------------------------------------------- |
| 0 - None     | Process is running (or down as scheduled).                                           |
| 1 - Critical | Process is down when it should be up.                                                |
| 2 - Moderate | Process is up when it should be down; or process is up but configuration is missing. |
| 3 - Low      | Process is running but producing errors or performing poorly.                        |

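The key above can be read as a small decision procedure. The following is a minimal sketch of that reading, not part of any Deephaven tooling: `classify_severity` and its `yes`/`no` and `ok`/`config_missing`/`errors` arguments are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# classify_severity RUNNING SHOULD_RUN [HEALTH]
#   RUNNING, SHOULD_RUN: "yes" or "no"
#   HEALTH: "ok" (default), "config_missing", or "errors"
# Prints the severity number from the classification key.
classify_severity() {
    running=$1
    should_run=$2
    health=${3:-ok}

    if [ "$running" = "no" ]; then
        # Down when it should be up is critical; down as scheduled is fine.
        if [ "$should_run" = "yes" ]; then echo 1; else echo 0; fi
        return
    fi

    # From here on the process is running.
    if [ "$should_run" = "no" ] || [ "$health" = "config_missing" ]; then
        echo 2
    elif [ "$health" = "errors" ]; then
        echo 3
    else
        echo 0
    fi
}

classify_severity no yes          # prints 1
classify_severity yes yes errors  # prints 3
```

The per-process tables below assign a severity to each process being down, which may differ from this generic key.
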
## Authentication Server Process

| Level            | Impact                                                    |
| :--------------- | :-------------------------------------------------------- |
| Sev 1 - Critical | New users will be unable to log in or create new queries. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status authentication_server
```

View Application Log Files:

```
cat /var/log/deephaven/authentication_server/AuthenticationServer.log.current
```
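
On a long-running server the current log can be large, and `cat` prints all of it. The following is a minimal sketch for narrowing the output. It assumes that log lines carry an uppercase level token such as `WARN`, `ERROR`, or `FATAL`, which this runbook does not state, and `show_recent_problems` is a hypothetical helper name.

```shell
# show_recent_problems LOGFILE [LINES]
# Prints the lines among the last LINES lines (default 500) of LOGFILE
# that contain WARN, ERROR, or FATAL.
show_recent_problems() {
    logfile=$1
    lines=${2:-500}
    tail -n "$lines" "$logfile" | grep -E 'WARN|ERROR|FATAL'
}

# Example:
# show_recent_problems /var/log/deephaven/authentication_server/AuthenticationServer.log.current 1000
```

The same helper can be pointed at the `.log.current` file of any other process in this runbook.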

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/authentication_server/authentication_server.log.????-??-??
```
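
The `????-??-??` pattern matches the date suffix of the rotated standard out/error files. Because that suffix is an ISO date, sorting by name is also sorting by date. The following sketch uses this to pick the newest file; `latest_dated_log` is a hypothetical helper, and it orders by file name where `ls -ltr` orders by modification time.

```shell
# latest_dated_log DIR PREFIX
# Prints the newest file named PREFIX.log.YYYY-MM-DD in DIR, or returns 1
# if none exists. The shell expands globs in sorted order, so the last
# match carries the latest date.
latest_dated_log() {
    dir=$1
    prefix=$2
    newest=
    for f in "$dir/$prefix".log.????-??-??; do
        [ -e "$f" ] && newest=$f
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example:
# less "$(latest_dated_log /var/log/deephaven/authentication_server authentication_server)"
```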

Check status of MariaDB/MySQL dependency (if MySQL is used to store ACLs):

```
sudo systemctl status mariadb
```

[Check etcd](./troubleshooting/troubleshooting-etcd.md) endpoint status (if etcd is used to store ACLs):

```text
sudo -u irisadmin /usr/illumon/latest/bin/etcdctl.sh endpoint status --write-out table
```

Restart Procedure:

```
sudo -u irisadmin monit restart authentication_server
```

## ACL Write Server Process

| Level            | Impact                                                                 |
| :--------------- | :--------------------------------------------------------------------- |
| Sev 2 - Moderate | Administrators will not be able to update user permissions and groups. |

### Procedures

Check process is running with Monit:

```
sudo -u irisadmin monit status db_acl_write_server
```

View Application Log Files:

```
cat /var/log/deephaven/acl_write_server/DbAclWriteServer.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/acl_write_server/db_acl_write_server.log.????-??-??
```

Check status of MariaDB/MySQL dependency (if MySQL is used to store ACLs):

```
sudo systemctl status mariadb
```

[Check etcd](./troubleshooting/troubleshooting-etcd.md) endpoint status (if etcd is used to store ACLs):

```text
sudo -u irisadmin /usr/illumon/latest/bin/etcdctl.sh endpoint status --write-out table
```

Restart Procedure:

```
sudo -u irisadmin monit restart db_acl_write_server
```

## Configuration Server Process

| Level            | Impact                                              |
| :--------------- | :-------------------------------------------------- |
| Sev 1 - Critical | None of the system processes will be able to start. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status configuration_server
```

View Application Log Files:

```
cat /var/log/deephaven/configuration_server/ConfigurationServer.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/configuration_server/configuration_server.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart configuration_server
```

## Persistent Query Controller Process

| Level            | Impact |
| :--------------- | :----- |
| Sev 1 - Critical | When configured to run with multiple controllers, running Core+ queries are migrated to a running controller. Core+ queries that are not yet running are terminated. All Legacy queries, including WebClientData, are terminated. Until the WebClientData reinitializes, users are not able to load the Deephaven console. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status iris_controller
```

View Application Log Files:

```
cat /var/log/deephaven/iris_controller/PersistentQueryController.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/iris_controller/iris_controller.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart iris_controller
```

## Persistent Query Backup and Restore Process

| Level            | Impact                                                                                                                                                                              |
| :--------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sev 1 - Critical | The controller stores persistent queries in etcd, so it is strongly recommended that periodic backups be taken of this data. The ability to restore persistent queries is critical. |

### Procedures

To export all Deephaven queries, use the following command:

```
sudo /usr/illumon/latest/bin/dhconfig pq export --file /tmp/PersistentQueryBackup.xml
```

To import your queries to any controller running the same Deephaven version, use the following command:

```
sudo /usr/illumon/latest/bin/dhconfig pq import --file /tmp/PersistentQueryBackup.xml
```

It may be useful to keep each query's serial ID so that user workspaces will continue to work. In this case, you can add the following parameter, which will keep each query's original serial, but not import any query if a query already exists with the same serial:

`--retainSerial=keep`

To keep the original serial IDs and also overwrite existing queries with the same IDs, instead use:

`--retainSerial=replace`

For full details, see the [Persistent Query Controller Tool](./pq-controller/pq-controller.md).
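
For periodic backups, writing each export to a dated file avoids overwriting the previous one. The following is a minimal sketch built around the export command above. `pq_backup_path` and `BACKUP_DIR` are hypothetical names, and the export step only runs where `dhconfig` is installed.

```shell
# pq_backup_path DIR [DATE]
# Prints a dated backup file name under DIR. DATE defaults to today
# in YYYY-MM-DD form.
pq_backup_path() {
    dir=$1
    day=${2:-$(date +%F)}
    printf '%s/PersistentQueryBackup-%s.xml\n' "$dir" "$day"
}

# BACKUP_DIR is a hypothetical location; choose one that suits your site.
BACKUP_DIR=${BACKUP_DIR:-/tmp}
DHCONFIG=/usr/illumon/latest/bin/dhconfig

# Export only when dhconfig is present, so the sketch is harmless elsewhere.
if [ -x "$DHCONFIG" ]; then
    sudo "$DHCONFIG" pq export --file "$(pq_backup_path "$BACKUP_DIR")"
fi
```

A script like this could be scheduled with cron or a similar scheduler. How long to retain old exports is a site decision.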

## Log Aggregator Service (LAS) Process

| Level            | Impact                                                                                                                                                  |
| :--------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Sev 1 - Critical | Any process configured to use the LAS will fail to write logs to the database. This will cause failure of these processes, including the query workers. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status log_aggregator_service
```

View Application Log Files:

```
cat /var/log/deephaven/las/LogAggregatorService.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/las/log_aggregator_service.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart log_aggregator_service
```

## Tailer 1 Process

| Level            | Impact                                                                                                                                                         |
| :--------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sev 2 - Moderate | Users will not be directly affected, but internal Deephaven logs (including state, configuration, process and event logs) will not be written to the database. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status tailer1
```

View Application Log Files:

```
cat /var/log/deephaven/tailer/LogtailerMain1.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/tailer/tailer1.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart tailer1
```

## Data Import Server Process

| Level            | Impact                                                                                    |
| :--------------- | :---------------------------------------------------------------------------------------- |
| Sev 1 - Critical | Binary log file data will not be written to the database. Binary store imports will fail. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status db_dis
```

View Application Log Files:

```
cat /var/log/deephaven/dis/DataImportServer.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/dis/db_dis.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart db_dis
```

### Procedure for cleaning up corrupt intraday data

In the event that intraday ticking data becomes corrupted, you do not need to stop the DIS. Instead, simply clean up the intraday data and the DIS's state. In general, that means the following commands, run as the `dbmerge` user:

```
rm -r /db/Intraday/[namespace]/[tablename]/[intraday partition]/[date]
```

For your Order/Event table, you might use:

```
rm -r /db/Intraday/Order/Event/*/2018-02-09
```
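
Because `rm -r` with a wildcard is unforgiving, it can help to list what the pattern matches before deleting anything. The following is a minimal dry-run sketch; `list_intraday_dirs` is a hypothetical helper that only prints directories and never removes them.

```shell
# list_intraday_dirs ROOT NAMESPACE TABLE DATE
# Prints every ROOT/NAMESPACE/TABLE/<partition>/DATE directory, one per
# line, without deleting anything. Returns 1 if nothing matches.
list_intraday_dirs() {
    root=$1 namespace=$2 table=$3 day=$4
    found=1
    for d in "$root/$namespace/$table"/*/"$day"; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            found=0
        fi
    done
    return "$found"
}

# Example:
# list_intraday_dirs /db/Intraday Order Event 2018-02-09
```

Review the printed paths, then run the `rm -r` command above once they are the ones you expect.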

## Deephaven Merge Server Process

| Level            | Impact                                               |
| :--------------- | :--------------------------------------------------- |
| Sev 2 - Moderate | Persistent queries for Merges and Imports will fail. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status db_merge_server
```

View Application Log Files:

```
cat /var/log/deephaven/merge_server/db_merge_server.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/merge_server/db_merge_server.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart db_merge_server
```

## Remote Query Dispatcher Process

| Level            | Impact                                                                                                                                                  |
| :--------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Sev 1 - Critical | Any running query workers will terminate, and new ones cannot be started. This includes all running persistent queries as well as interactive consoles. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status db_query_server
```

View Application Log Files:

```
cat /var/log/deephaven/query_server/RemoteQueryDispatcher.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/query_server/db_query_server.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart db_query_server
```

### Process Shutdown

Each Deephaven process has a shutdown manager, set by the property `default.processEnvironmentFactory`. The shutdown manager ensures that processes terminate in an orderly and timely manner.
If a process fails to terminate cleanly, the shutdown manager will stop it forcefully after a timeout set by the property `ShutdownManager.deephaven.shutdownTimeoutMillis`.
Modify the following default to change the timeout for worker and dispatcher shutdown.

```
# override the shutdown timeout for all workers
[service.name=dbquery|dbmerge] {
    ShutdownManager.deephaven.shutdownTimeoutMillis=60000
}
```

## Table Data Cache Proxy Process

| Level            | Impact                               |
| :--------------- | :----------------------------------- |
| Sev 1 - Critical | Intraday data will not be available. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status db_tdcp
```

View Application Log Files:

```
cat /var/log/deephaven/tdcp/TableDataCacheProxy.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/tdcp/db_tdcp.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart db_tdcp
```

## Local Table Data Server Process

For an architectural overview of this process and its typical uses, see [Local Table Data Service](./core-components/local-table-data-service.md).

| Level            | Impact                                                                                   |
| :--------------- | :--------------------------------------------------------------------------------------- |
| Sev 2 - Moderate | If the LTDS is configured in the routing, then any data it serves will not be available. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status db_ltds
```

View Application Log Files:

```
cat /var/log/deephaven/ltds/LocalTableDataServer.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/ltds/db_ltds.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart db_ltds
```

## Status Dashboard

| Level            | Impact                                       |
| :--------------- | :------------------------------------------- |
| Sev 2 - Moderate | Status dashboard data will not be available. |

### Procedures

Check Process is running with Monit:

```
sudo -u irisadmin monit status status_dashboard
```

View Application Log Files:

```
cat /var/log/deephaven/status_dashboard/StatusDashboard.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/status_dashboard/status_dashboard.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart status_dashboard
```

## Web API Service Process Table

| Level            | Impact                                                                                                                                                                                           |
| :--------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sev 1 - Critical | Both Web API clients and Deephaven Console GUI Users will be impacted. Users will not be able to use the Launcher and Deephaven Clients will not be able to receive any updates from the server. |

### Procedures

Enable the Web API Service:

The Web API Service is disabled by default.

In the Monit config folder, rename the Web API Service config file so that its `.disabled` extension becomes `.conf`, then run `monit reload`. This instructs the Monit daemon to reread its configuration and reinitialize.

```
cd /etc/sysconfig/illumon.d/monit
mv web_api_service.disabled web_api_service.conf
sudo -u irisadmin monit reload
```

Check Process is running with Monit:

```
sudo -u irisadmin monit status web_api_service
```

View Application Log Files:

```
cat /var/log/deephaven/web_api_service/WebServer.log.current
```

List Log Files for Standard Out/Error:

```
ls -ltr /var/log/deephaven/web_api_service/web_api_service.log.????-??-??
```

Restart Procedure:

```
sudo -u irisadmin monit restart web_api_service
```

### Web API Server TLS Keystore (`.p12` keystore file)

The Web API Server's TLS keystore contains the certificate and private key of a TLS-enabled service. You must keep this file private and not distribute it to clients. The Web API Server's keystore file should be unique per node, with a certificate that is signed (issued) by a trusted CA.

The default self-signed key pair for the Web API Server is generated when installing the iris-config.rpm and saved to a `.p12` keystore file. This default keystore will work, but the browser will give security warnings until you use your own CA-signed certificate (see below).

`[-r--r----- irisadmin dbquery] webServices-keystore.p12`

The Web Server keystore file is also protected by a unique, randomly generated password. The password is stored in base64-encoded format in a read-only hidden file that is owned by user `irisadmin`, readable by the `dbquery` group, and has its permissions set to 440:

`[-r--r----- irisadmin dbquery] .webapi_passphrase`

### Important keystore properties and files

Keystore Filename:
`/etc/sysconfig/illumon.d/auth/webServices-keystore.p12`

Passphrase File:
`/db/TempFiles/irisadmin/.webapi_passphrase`

Keystore Property:
`WebServer.tls.keystore=/etc/sysconfig/illumon.d/auth/webServices-keystore.p12`

Passphrase Property:
`WebServer.tls.passphrase.file=/db/TempFiles/irisadmin/.webapi_passphrase`

If `iris-common.prop` does not exist (normal for Deephaven versions 20190117 or earlier) or `openapi-defaults.prop` does not exist (normal for versions 20180803 or earlier):

```
cd /etc/sysconfig/illumon.d/resources/
# Move existing web_api_service props to openapi-defaults
cp web_api_service.prop openapi-defaults.prop
# Replace web_api_service with an includefiles on openapi-defaults
echo includefiles=openapi-defaults.prop > web_api_service.prop
# append the include to the end of the query server configuration
echo includefiles=openapi-defaults.prop >> iris-query-server.prop
```

Alternatively, you can put the `includefiles` line at the top of `iris-query-server.prop` and manually delete or edit any properties from `openapi-defaults.prop` that are also found in `iris-query-server.prop`. Putting `includefiles` at the end of the file is easier because it overrides other settings, but it can be confusing to see a property defined and then overridden. To keep things cleaner, move any properties with a `tls` prefix to `openapi-defaults.prop`. You may also wish to move `RemoteQueryDispatcher.websocket.enabled=true`.

### Securing the Web API Server with your CA-signed Certificate

While the default self-signed certificate is good enough for testing, it presents scary security warnings to users, and encourages users to ignore security warnings (a very bad habit), so you should always use a "real" CA-signed certificate for production use.

Obtain a TLS certificate signed by your trusted CA with the domain name matching the Deephaven server, e.g., myserver.mydomain.com.

Back up the existing keystore file:

```
sudo cp /etc/sysconfig/illumon.d/auth/webServices-keystore.p12 \
/etc/sysconfig/illumon.d/auth/webServices-keystore.p12.ORG
```

Import your CA cert and key files to the Web API Service keystore file. For example:

```
STOREPASS=$(sudo cat /db/TempFiles/irisadmin/.webapi_passphrase | base64 --decode)
# This assumes you have stored your own .key and CA-provided .crt in /etc/ssl/certs/tls.* files
openssl pkcs12 -export -in /etc/ssl/certs/tls.crt -inkey /etc/ssl/certs/tls.key -name webapi -out /etc/sysconfig/illumon.d/auth/webServices-keystore.p12 -passout "pass:$STOREPASS"
```

> [!NOTE]
> If you are unfamiliar with how to generate a `.key` and `.csr` file to get a `.crt` from a CA, please contact your IT organization.

Set the correct permissions on the web services keystore file:

```
sudo chown irisadmin:dbquery \
    /etc/sysconfig/illumon.d/auth/webServices-keystore.p12
sudo chmod 440 /etc/sysconfig/illumon.d/auth/webServices-keystore.p12
```
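
After changing ownership and mode, it is worth confirming the result. The following is a minimal sketch, assuming GNU `stat` as shipped with most Linux distributions; `has_mode` is a hypothetical helper, and the check is skipped when the keystore is not present or not visible to the current user.

```shell
# has_mode FILE MODE
# Succeeds when FILE's octal permission bits equal MODE (for example 440).
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

KEYSTORE=/etc/sysconfig/illumon.d/auth/webServices-keystore.p12
if [ -e "$KEYSTORE" ]; then
    # Print owner:group, mode, and name to compare with irisadmin:dbquery 440.
    stat -c '%U:%G %a %n' "$KEYSTORE"
    has_mode "$KEYSTORE" 440 || echo "unexpected mode on $KEYSTORE" >&2
fi
```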

Set/Verify Open API Props:

```
# File: /etc/sysconfig/illumon.d/resources/iris-common.prop
WebServer.tls.keystore=/etc/sysconfig/illumon.d/auth/webServices-keystore.p12
WebServer.tls.passphrase.file=/db/TempFiles/irisadmin/.webapi_passphrase
# Enable Web Sockets for Query Workers
RemoteQueryDispatcher.websocket.enabled=true
```

Update Query Server Prop File: `/etc/sysconfig/illumon.d/resources/iris-common.prop`:

Replace two lines of content with the following:

```
# Set Dispatcher hostname to match the host for your CA-signed certificate:
RemoteQueryDispatcherParameters.host=myserver.mydomain.com
```

The host set above can also go into `iris-common.prop`, but it is not required.

Restart Web API Service with monit:

```
sudo -u irisadmin monit restart web_api_service
```

### Client Update Service

The Client Update Service (CUS) is a process that updates clients with server-side components, including JARs, properties, etc. By default, each Web API Service's web server will host a CUS instance.

When the Client Update Service is running, you can install and run the Launcher on client desktops. The installers for Windows, Mac and Linux desktops can be downloaded from the Client Update Service on your Deephaven Server at:

`http://<WEBHOST>/launcher`

#### CUS Reload Procedure

To make [new or modified server components](../legacy/legacy-ui/launcher-and-client-configuration.md#customer-updatable-values) available to clients, reload the Client Update Service by navigating to `https://<WEBHOST>/reload/`.

Clients (e.g., the Swing UI) must exit and restart the launcher to download new components. A client that is not restarted may have outdated code or configuration that is incompatible with the Deephaven installation.

> [!NOTE]
> The Client Update Service is hosted by the Web API Service. This service does not refresh properties before reloading the CUS. If any properties have changed that affect the CUS configuration, such as those described in the [client update service customer-updatable values](../legacy/legacy-ui/launcher-and-client-configuration.md#customer-updatable-values) documentation, you must [restart](#web-api-service-process-table) the Web API Service.

## `etcd` Process

| Level            | Impact                                                                                                                                                                                                                                              |
| :--------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sev 1 - Critical | Schema, persistent queries, property files, routing configuration, and optionally ACLs are stored in etcd. etcd is used as a shared store for Authentication and Dispatcher runtime processing. Without etcd, the Deephaven system cannot function. |

### Procedures

Check Process is running with systemctl:

```
sudo systemctl status dh-etcd
```

Check endpoint status:

```text
sudo -u irisadmin /usr/illumon/latest/bin/etcdctl.sh endpoint status --write-out table
```

View Log Files:

```
# Get all logs
sudo journalctl -xu dh-etcd
# Follow the logs
sudo journalctl -xefu dh-etcd
```

Restart Procedure:

```
sudo systemctl restart dh-etcd
```

Check connectivity using `etcdctl.sh`:

```
sudo -u irisadmin /usr/illumon/latest/bin/etcdctl.sh role list
```

The `etcdctl.sh` script is a thin wrapper around [`etcdctl`](https://etcd.io/docs/v3.5/dev-guide/interacting_v3/) that passes in the correct user name and password for a given Deephaven role. Each user is stored in a directory of the form `/etc/sysconfig/deephaven/etcd/client/<user>`.

By default, the script uses the `root` user. To change the user, you can set the `DH_ETCD_USER` environment variable or specify the directory manually with the `DH_ETCD_DIR` environment variable. For example, to get a single schema (replace DbInternal and AuditEventLog with the namespace and name of the table of interest) with the `schema-ro` user, the following commands are equivalent:

```
sudo -u irisadmin DH_ETCD_USER=schema-ro /usr/illumon/latest/bin/etcdctl.sh get --prefix /main/config/schema/DbInternal/tables/AuditEventLog
sudo -u irisadmin DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/schema-ro /usr/illumon/latest/bin/etcdctl.sh get --prefix /main/config/schema/DbInternal/tables/AuditEventLog
```
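
When looking up several tables, the key can be built from the namespace and table name. The following is a minimal sketch; `schema_key` is a hypothetical helper, and the key layout is inferred from the single example above, not from a documented contract.

```shell
# schema_key NAMESPACE TABLE
# Prints the etcd key prefix for one table's schema, following the layout
# seen in the DbInternal/AuditEventLog example.
schema_key() {
    printf '/main/config/schema/%s/tables/%s\n' "$1" "$2"
}

# Example:
# sudo -u irisadmin DH_ETCD_USER=schema-ro /usr/illumon/latest/bin/etcdctl.sh \
#     get --prefix "$(schema_key DbInternal AuditEventLog)"
```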

Show current disk usage per node:

```
sudo /usr/illumon/latest/bin/etcdctl.sh endpoint status --write-out=table
```

## MariaDB (MySQL) Process

If MySQL is used for ACLs, then the MySQL process is necessary for proper system function. If etcd is used for ACLs, then this process is not necessary.

| Level            | Impact                                                                                                                                                                  |
| :--------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sev 1 - Critical | The Authentication Server, ACL Write Server and Deephaven Clients will be impacted. Query workers will also be affected and unable to check effective user permissions. |

> [!NOTE]
> See:
> https://mariadb.org/

### Procedures

Check Process is running:

```
sudo systemctl status mariadb
```

Sudo access required to view Log File:

`sudo cat /var/log/mariadb/mariadb.log`

Check Config File Settings:

`/etc/my.cnf`

Check Settings in Deephaven ACL Database: `dbacl_iris`

```
sudo mysql -e "show databases"
sudo mysql -D dbacl_iris -e "show tables"
sudo mysql -D dbacl_iris -e "select * from tableacls"
```

Restart Procedure:

```
sudo systemctl restart mariadb
```

## Related documentation

- [Permissions](./permissions/permissions-overview.md)
