Deephaven Process runbooks
This section outlines the procedures for each Deephaven process.
Incident classification key
| Severity | Description |
|---|---|
| 0 - None | Process is running (or down as scheduled). |
| 1 - Critical | Process is down when it should be up. |
| 2 - Moderate | Process is up when it should be down; or process is up but configuration is missing. |
| 3 - Low | Process is running but producing errors or performing poorly. |
Authentication Server Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | New users will be unable to log in or create new queries. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Check status of MariaDB/MySQL dependency:
Restart Procedure:
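The Monit check and restart steps follow the same pattern for every process in this runbook. The following is a minimal sketch, assuming Monit 5.x and sudo access. The service name `authentication_server` is an assumption (confirm the names on your system with `sudo monit summary`), and the `MONIT` variable is a hypothetical hook for changing or stubbing the command.

```shell
#!/bin/sh
# Command used to reach Monit; override MONIT to drop sudo or to stub it out.
MONIT="${MONIT:-sudo monit}"

# Print Monit's status report for one service.
# Returns whatever exit status Monit reports.
check_process() {
    $MONIT status "$1"
}

# Ask Monit to restart one service, then print its one-line summary.
restart_process() {
    $MONIT restart "$1" && $MONIT summary "$1"
}

# Example (the service name is an assumption):
#   check_process authentication_server
#   restart_process authentication_server
```

The same two functions can be reused for the other Monit-managed processes below by passing a different service name.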
ACL Write Server Process
| Level | Impact |
|---|---|
| Sev 2 - Moderate | Administrators will not be able to update user permissions and groups |
Procedures
Check process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Check status of MariaDB/MySQL dependency:
Restart Procedure:
Configuration Server Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | None of the system processes will be able to start. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
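When checking property settings, it often helps to see which file sets a given property. The following is a minimal sketch that searches the resources directory named above. The function name `find_property` and the `PROP_DIR` override are hypothetical, and the search is a plain text match that does not follow include directives or resolve overrides.

```shell
#!/bin/sh
# Directory that holds the property files named in this runbook.
PROP_DIR="${PROP_DIR:-/etc/sysconfig/illumon.d/resources}"

# Print every non-comment line that mentions the given property name,
# prefixed with "file:line:". The name is matched literally, not as a regex.
# /dev/null is added so grep always prints the file name.
find_property() {
    grep -n -F -- "$1" "$PROP_DIR"/*.prop /dev/null |
        grep -v -E '^[^:]+:[0-9]+:[[:space:]]*#'
}

# Example:
#   find_property WebServer.tls.keystore
```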
Restart Procedure:
Persistent Query Controller Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | All persistent queries for this controller will terminate. Users will not be able to view any persistent queries in the Deephaven Console. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Restart Procedure:
Cache Backup and Restore Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | The controller cache is the location in which persistent queries are stored, so it is strongly recommended that periodic backups be taken of this data. The ability to restore persistent queries is critical. |
Procedures
To export all Deephaven queries, use the following command:
By default, the file is named controllerToolExport.xml and placed in the controller tool's workspace at:
/db/TempFiles/irisadmin/controller_tool
To import your queries to any controller running the same Deephaven version, use the following command:
It may be useful to keep each query's serial ID so that user workspaces will continue to work. In this case, you can add the following parameter, which will keep each query's original serial, but not import any query if a query already exists with the same serial:
--retainSerial=keep
To keep the original serial IDs and also overwrite existing queries with the same IDs, instead use:
--retainSerial=replace
For full details, see the Persistent Query Controller Tool.
Log Aggregator Service (LAS) Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | Any process configured to use the LAS will fail to write logs to the database. This will cause failure of these processes, including the query workers. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Restart Procedure:
Alternative procedure
Disable the LAS.
Warning
This requires restarting the Remote Query Dispatcher, which will stop all running queries.
To disable the LAS and have processes write their logs to plain text log files, add the following properties to iris-environment.prop:
Restart the affected Deephaven processes:
Tailer 1 Process
| Level | Impact |
|---|---|
| Sev 2 - Moderate | Users will not be directly affected, but internal Deephaven logs (including state, configuration, process and event logs) will not be written to the database. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Restart Procedure:
Remote Table Appender (Data Import Server) Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | Intraday user data will not be available and updates cannot be written to the database. |
The Remote Table Appender is an instance of a Data Import Server, and in many cases it is the same process as the main Data Import Server Process. If this is the case, refer to Data Import Server Process.
If you have configured a separate process for RTA, you will need to refer to your system to find the service name and configuration. This documentation assumes it is db_rta.
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/iris-common.prop
Restart Procedure:
Data Import Server Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | Binary log file data will not be written to the database. Binary store imports will fail. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Restart Procedure:
Procedure for cleaning up corrupt intraday data
In the event that intraday ticking data becomes corrupted, you do not need to stop the DIS (since the March 2018 release). Instead, simply clean up the intraday data and the DIS's state. In general, that means the following commands, run as the dbmerge user:
For your Order/Event table, you might use:
Note that in the latest Deephaven release, you do not need to stop the DIS, and instead simply need to run:
Deephaven Merge Server Process
| Level | Impact |
|---|---|
| Sev 2 - Moderate | Persistent queries for Merges and Imports will fail. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Restart Procedure:
Remote Query Dispatcher Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | Any running query workers will terminate, and new ones cannot be started. This includes all running persistent queries as well as interactive consoles. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Restart Procedure:
Process Shutdown
Each Deephaven process has a shutdown manager, set by the property default.processEnvironmentFactory. The shutdown manager ensures that processes terminate in an orderly and timely manner.
If a process fails to terminate cleanly, the shutdown manager will stop it forcefully after a timeout set by the property ShutdownManager.deephaven.shutdownTimeoutMillis.
Modify the following default to change the timeout for worker and dispatcher shutdown.
Local Table Data Server Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | Intraday data for any dates other than currentDateNy() will not be available. |
Procedures
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
Restart Procedure:
Web API Service Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | Deephaven Console GUI users will not be affected, but Web API clients will be impacted. |
Procedures
Enable the Web API Service:
The Web API Service is disabled by default.
In the Monit config folder, remove the .disabled extension from the Web API Service config file name and run monit reload. This instructs the Monit daemon to reread its configuration and reinitialize.
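The rename step can be sketched as a small helper. This is a sketch under assumptions: the function name `enable_monit_conf` is hypothetical, and the directory and file name in the usage comment are guesses that you should replace with the actual location on your installation. The rename will usually need root privileges.

```shell
#!/bin/sh
# Rename "<file>.disabled" to "<file>" so the config is read on the next reload.
# Prints the new path. Fails if the argument does not end in .disabled,
# if the file is missing, or if the enabled file already exists.
enable_monit_conf() {
    disabled="$1"
    enabled="${disabled%.disabled}"
    [ "$disabled" != "$enabled" ] || { echo "not a .disabled file: $disabled" >&2; return 1; }
    [ -f "$disabled" ] || { echo "missing: $disabled" >&2; return 1; }
    [ ! -e "$enabled" ] || { echo "already enabled: $enabled" >&2; return 1; }
    mv -- "$disabled" "$enabled" && echo "$enabled"
}

# Example (path and file name are assumptions):
#   enable_monit_conf /etc/sysconfig/illumon.d/monit/web_api_service.conf.disabled
#   sudo monit reload
```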
Check Process is running with Monit:
View Log File for successful startup messages:
Check Property File Settings:
/etc/sysconfig/illumon.d/resources/*.prop
If the above file does not exist (older installations), instead check
/etc/sysconfig/illumon.d/resources/openapi-defaults.prop
On newer installations, web_api_service.prop and iris-query-server.prop will both include openapi-defaults.prop. This reflects the fact that most OpenAPI configuration is shared between the OpenAPI webserver and system query workers. See note [1] below for details.
Restart Procedure:
Web API Server TLS Keystore (.p12 keystore file)
The Web API Server's TLS keystore contains the certificate and private key of a TLS-enabled service. You must keep this file private, and not distribute it to clients. The Web API Server's keystore file should be unique per node, with a certificate that is signed (issued) by a trusted CA.
The default self-signed key pair for the Web API Server is generated when installing the iris-config.rpm and saved to a .p12 keystore file. This default keystore will work, but the browser will give security warnings until you use your own CA-signed certificate (see below).
[-r--r----- irisadmin dbquery ] webServices-keystore.p12
The Web Server keystore file is also protected by a unique, randomly generated password stored in base64-encoded format in a read-only hidden file owned by user irisadmin and readable by the dbquery group, with permissions set to 440:
[-r--r----- irisadmin dbquery] .webapi_passphrase
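The ownership and mode shown above can be verified with a short check. The following is a minimal sketch, assuming GNU coreutils `stat` (as found on typical Linux servers). The function names are hypothetical.

```shell
#!/bin/sh
# Print "<octal mode> <owner> <group>" for a file (GNU stat syntax).
file_perms() {
    stat -c '%a %U %G' -- "$1"
}

# Succeed only if the file has the expected mode, owner, and group.
# $1 = path, $2 = octal mode, $3 = owner, $4 = group
check_perms() {
    [ "$(file_perms "$1")" = "$2 $3 $4" ]
}

# Example, using the paths and ownership listed in this section:
#   check_perms /etc/sysconfig/illumon.d/auth/webServices-keystore.p12 440 irisadmin dbquery
#   check_perms /db/TempFiles/irisadmin/.webapi_passphrase 440 irisadmin dbquery
```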
Important keystore properties and files
Keystore Filename:
/etc/sysconfig/illumon.d/auth/webServices-keystore.p12
Passphrase File:
/db/TempFiles/irisadmin/.webapi_passphrase
Property File:
/etc/sysconfig/illumon.d/resources/iris-common.prop
Note, if this file does not exist [1], you can edit the following instead:
/etc/sysconfig/illumon.d/resources/openapi-defaults.prop
For legacy installations, you can edit both of the following:
/etc/sysconfig/illumon.d/resources/iris-query-server.prop
/etc/sysconfig/illumon.d/resources/web_api_service.prop
Keystore Property:
WebServer.tls.keystore=/etc/sysconfig/illumon.d/auth/webServices-keystore.p12
Passphrase Property:
WebServer.tls.passphrase.file=/db/TempFiles/irisadmin/.webapi_passphrase
[1] If iris-common.prop does not exist (normal for Deephaven versions 20190117 or earlier) or openapi-defaults.prop does not exist (normal for versions 20180803 or earlier):
Alternatively, you may wish to put your includefiles at the top of the iris-query-server.prop file, and manually delete/edit any properties from openapi-defaults.prop that are found in iris-query-server.prop. Putting the includefiles at the end of the file is easier because it will override other settings, but it may be confusing that a property is defined and then overridden. To keep things cleaner, remove/move any properties with a tls prefix to openapi-defaults.prop. You may also wish to move RemoteQueryDispatcher.websocket.enabled=true.
Securing the Web API Server with your CA-signed Certificate
While the default self-signed certificate is good enough for testing, it presents scary security warnings to users, and encourages users to ignore security warnings (a very bad habit), so you should always use a "real" CA-signed certificate for production use.
Obtain a TLS certificate signed by your trusted CA with the domain name matching the Deephaven server, e.g., myserver.mydomain.com.
Back up the existing keystore file:
Import your CA cert and key files to the Web API Service keystore file. For example:
Note
If you are unfamiliar with how to generate a .key and .csr file to get a .crt from a CA, contact a security professional to help you obtain a .key and .crt.
Set the correct permissions on the web services keystore file:
Set/Verify Open API Props:
Update Query Server Prop File: /etc/sysconfig/illumon.d/resources/iris-common.prop:
Replace two lines of content with the following:
The host set above can also go into iris-common.prop, but it is not required.
Restart Web API Service with monit:
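The backup, import, and permission steps above can be sketched as follows. This is a sketch under assumptions, not the documented procedure: the function names are hypothetical, the commands are expected to run with root privileges, and the keystore password is passed in OpenSSL's pass-phrase argument syntax (for example `file:<path>` or `env:<VAR>`) because how the server interprets the base64 passphrase file is not specified here.

```shell
#!/bin/sh
# Keystore path from the Keystore Property shown earlier in this section.
KEYSTORE="${KEYSTORE:-/etc/sysconfig/illumon.d/auth/webServices-keystore.p12}"

# Copy the keystore next to itself with a timestamp suffix, preserving
# mode and timestamps. Prints the backup path.
backup_keystore() {
    backup="$KEYSTORE.$(date +%Y%m%d%H%M%S).bak"
    cp -p -- "$KEYSTORE" "$backup" && echo "$backup"
}

# Build a PKCS#12 keystore from a certificate and private key.
# $1 = certificate (.crt), $2 = private key (.key),
# $3 = password source in OpenSSL pass-phrase syntax (assumption: supplied by you)
import_cert() {
    openssl pkcs12 -export -in "$1" -inkey "$2" -out "$KEYSTORE" -passout "$3"
}

# Restore the ownership and mode listed earlier in this section.
fix_keystore_perms() {
    chown irisadmin:dbquery "$KEYSTORE" && chmod 440 "$KEYSTORE"
}
```

Running `backup_keystore` first means the previous keystore can be copied back if the new certificate does not work.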
Client Update Service Process (Lighttpd web server)
| Level | Impact |
|---|---|
| Sev 2 - Moderate | Users will not be able to use the Launcher and Deephaven Clients will not be able to receive any updates from the server. |
Note
The Client Update Service (CUS) is powered by lighttpd, a lightweight web server designed for speed-critical environments.
Procedures
The Client Update Service (CUS) uses lighttpd to update clients with server-side components, including JARs, properties, etc.
The CUS is disabled by default for security reasons.
By default, the CUS does not require user authentication. lighttpd provides the basic and digest authentication methods described by RFC 2617.
To enable authentication with users defined in a file, edit /etc/lighttpd/client-update-service.conf and uncomment the lines for mod_auth and mod_authn_file in the server.modules section. Also uncomment the line (further down in the file) to include conf.d/iris-auth.conf.
Authorized users are stored in the htpasswd file:
/etc/lighttpd/illumon-cus.user
The htpasswd file contains the username and the crypt()'ed password separated by a colon. Each entry in the file is terminated by a single newline.
For example:
iris:$apr1$1xsLWNhw$.qiKafnbTpoNda/d6X77l.
You can use the htpasswd utility from the Apache distribution to manage htpasswd files. Note that not all versions of htpasswd default to use Apache's modified MD5 algorithm for passwords, which is required by lighttpd. You can force most to use MD5 by running:
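A minimal sketch of generating and sanity-checking an entry, assuming the `htpasswd` utility from Apache is installed. The function names are hypothetical. The `-n` flag prints the entry to standard output instead of updating a file, and `-m` forces the MD5 variant.

```shell
#!/bin/sh
# Print an htpasswd entry ("user:$apr1$...") for the given user name.
# htpasswd prompts interactively for the password.
make_cus_entry() {
    htpasswd -nm "$1"
}

# Succeed if the argument looks like "user:$apr1$<salt>$<hash>",
# that is, an entry produced with Apache's MD5 variant.
is_apr1_entry() {
    printf '%s\n' "$1" |
        grep -q -E '^[^:]+:\$apr1\$[./0-9A-Za-z]{1,8}\$[./0-9A-Za-z]{22}$'
}

# Example:
#   make_cus_entry iris
#   is_apr1_entry 'iris:$apr1$1xsLWNhw$.qiKafnbTpoNda/d6X77l.' && echo "format ok"
```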
Append the output of the above command to:
/etc/lighttpd/illumon-cus.user
More information on configuration options is available in lighttpd's documentation.
Securing the Client Update Service (CUS) with HTTPS
To securely enable the CUS on HTTPS port 443:
Obtain a TLS certificate signed by your trusted CA with the domain name matching the Deephaven server, e.g., myserver.mydomain.com.
Concatenate your .crt and .key files together into a single PEM file. For example:
On the Deephaven Server, edit the /etc/lighttpd/client-update-service.conf file and set the following properties:
Update /var/www/lighttpd/iris/iris/getdown.txt.pre file as described in the previous section, replacing http with https. For example:
Restart the CUS with monit:
The "Client Update Service" will be available at: https://myserver.mydomain.com/
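The concatenation step above can be sketched as follows. The function name `make_pem` and the file names in the usage comment are hypothetical. The sketch assumes both input files end with a newline, as PEM files normally do.

```shell
#!/bin/sh
# Write the certificate followed by the private key into one PEM file.
# The umask keeps a newly created output file readable only by its owner,
# because it contains the private key.
# $1 = certificate (.crt), $2 = private key (.key), $3 = output (.pem)
make_pem() {
    ( umask 077 && cat -- "$1" "$2" > "$3" )
}

# Example (file names are assumptions):
#   make_pem myserver.crt myserver.key myserver.pem
```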
Check Process is Running with Monit:
Sudo access required to view Log File for successful startup messages:
/var/log/lighttpd/cus-error.log
/var/log/lighttpd/cus-access.log
Sudo access required to check Config File Settings:
/etc/lighttpd/client-update-service.conf
Sudo access required to check Files in Document Root:
/var/www/lighttpd/iris/
Restart Procedure:
To enable the CUS on cleartext HTTP port 80: (Note: This is not recommended. Do this only for testing on a trusted private network.)
On the Deephaven Server, edit the /var/www/lighttpd/iris/iris/getdown.txt.pre file:
Set the appbase value, replacing WEBHOST with the FQDN (or IP address) of your Deephaven Server.
For example:
In the Monit config folder, remove the .disabled extension from the Client Update Service config file name and run monit reload. This instructs the Monit daemon to reread its configuration and reinitialize.
Check the status of the getdown service:
Once the "Client Update Service" is up and running, you can proceed to install and run the Launcher on client desktops. The installers for Windows, Mac and Linux desktops can be downloaded from the "Client Update Service" on your Deephaven Server at:
http://<IRIS_SERVER_ADDRESS>/
MariaDB (MySQL) Process
| Level | Impact |
|---|---|
| Sev 1 - Critical | The Authentication Server, ACL Write Server and Deephaven Clients will be impacted. Query workers will also be affected and unable to check effective user permissions. |
Note
See the MariaDB website for more information on MariaDB.
Procedures
Check Process is running:
Sudo access required to view Log File for successful startup messages:
/var/log/mariadb/mariadb.log
Check Config File Settings:
/etc/my.cnf
Check Settings in Deephaven ACL Database: dbacl_iris
Restart Procedure:
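A minimal sketch of the check and restart steps, assuming a systemd-based host where the database runs as a unit named `mariadb`. The unit name is an assumption and may be `mysql` or `mysqld` on your system. The `SYSTEMCTL` variable and function names are hypothetical.

```shell
#!/bin/sh
# Command used to reach systemd; override SYSTEMCTL to drop sudo or to stub it out.
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"

# Succeed silently if the unit is active, fail otherwise.
mariadb_is_active() {
    $SYSTEMCTL is-active --quiet mariadb
}

# Restart the unit.
restart_mariadb() {
    $SYSTEMCTL restart mariadb
}

# Example:
#   mariadb_is_active || restart_mariadb
```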