System administration

This cheat sheet provides quick access to common commands and procedures for system administrators.

Management

Service status and control

Check that all Deephaven services have started and are available.

/usr/illumon/latest/bin/dh_monit up --block

Get a summary of all Deephaven services status:

/usr/illumon/latest/bin/dh_monit summary
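
For scripted checks, the summary can be filtered down to the services that are not healthy. The following is a minimal sketch that assumes dh_monit summary prints the tabular layout of recent Monit releases (service name, status, type); the function name not_ok_services is illustrative and is not part of Deephaven.

```shell
# not_ok_services: reads a Monit-style summary table on stdin and prints
# "<service>: <status>" for every Process row whose status is not "OK".
# Assumption: columns are "Service Name", "Status", "Type", and the type of
# a managed service is "Process". Other layouts are silently ignored.
not_ok_services() {
    awk '
        NF >= 3 && $NF == "Process" {
            # The status may span several words, e.g. "Does not exist".
            status = $2
            for (i = 3; i < NF; i++) status = status " " $i
            if (status != "OK") print $1 ": " status
        }
    '
}
```

Usage: /usr/illumon/latest/bin/dh_monit summary | not_ok_services. No output means every listed process reported OK.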

Restart all Deephaven services:

/usr/illumon/latest/bin/dh_monit restart all

Restart a specific service:

sudo -u irisadmin monit restart <service_name>
# Example: sudo -u irisadmin monit restart db_query_server

Start a service interactively with debug logging:

sudo su - irisadmin
/usr/illumon/latest/bin/iris --debug start <process_name>

Verify cluster connection

# For password-based authentication
/usr/illumon/latest/bin/check-deephaven-cluster --connection-json https://deephaven-host:8000/iris/connection.json --username YOUR_USERNAME

# For public key authentication
/usr/illumon/latest/bin/check-deephaven-cluster --connection-json https://deephaven-host:8000/iris/connection.json --key-file /path/to/private-key.txt

etcd management

Check etcd endpoint status:

sudo -u irisadmin /usr/illumon/latest/bin/etcdctl.sh endpoint status --write-out table

List etcd roles:

sudo -u irisadmin /usr/illumon/latest/bin/etcdctl.sh role list

Defragment etcd node (to reclaim disk space):

# Stop the etcd service on the target node
sudo systemctl stop dh-etcd.service

# Run defragmentation (replace <etcd_data_directory> with your actual etcd data directory)
sudo -u etcd -g etcd /bin/etcdctl defrag --data-dir=/var/lib/etcd/dh/<etcd_data_directory>

# Restart the etcd service
sudo systemctl start dh-etcd.service
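
The three steps above can be wrapped so that the service is restarted even when defragmentation fails. This is a sketch only: the helper names run and defrag_etcd_node are hypothetical, and restarting after a failed defragmentation is a design choice made here, not documented Deephaven behavior.

```shell
# run: prints the command when DRY_RUN is set, executes it otherwise.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# defrag_etcd_node <data_dir>: stop etcd, defragment, then start etcd again.
defrag_etcd_node() {
    data_dir=$1
    run sudo systemctl stop dh-etcd.service || return 1
    run sudo -u etcd -g etcd /bin/etcdctl defrag --data-dir="$data_dir"
    defrag_status=$?
    # Start the service again regardless of the defrag result.
    run sudo systemctl start dh-etcd.service || return 1
    return "$defrag_status"
}
```

Running it with DRY_RUN=1 prints the commands without executing them, which is a cheap way to confirm the data directory before touching a live node.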

MariaDB/MySQL management

If MySQL is used for ACLs, these commands are useful:

# Check MariaDB/MySQL service status
sudo systemctl status mariadb

# View databases
sudo mysql -e "show databases"

# View ACL tables
sudo mysql -D dbacl_iris -e "show tables"

# View table ACLs
sudo mysql -D dbacl_iris -e "select * from tableacls"

# Restart MariaDB
sudo systemctl restart mariadb

Certificate management

Inspect certificates

View a certificate's details including validity period and SAN names:

sudo openssl x509 -in /etc/deephaven/cus-tls/tls.crt -noout -text
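
The validity period can also be tested without reading the full text output. A minimal sketch using the -checkend option of openssl x509; the function name cert_valid_for_days is illustrative.

```shell
# cert_valid_for_days <cert_file> <days>
# Exits 0 if the certificate is still valid <days> days from now,
# non-zero if it expires before then (or cannot be read).
cert_valid_for_days() {
    openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}
```

For example, cert_valid_for_days /etc/deephaven/cus-tls/tls.crt 30 || echo "renew soon" warns when fewer than 30 days remain. Reading that file may require sudo, as in the command above.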

Check a remote server's certificate:

openssl s_client -connect your.server.com:443 -servername your.server.com </dev/null 2>/dev/null | openssl x509 -text -noout

Verify hostname in a certificate:

openssl s_client -connect your.server.com:443 -servername your.server.com </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject:\|Subject Alternative Name'

Replace certificates

Replace Deephaven certificates on an infra server (non-Kubernetes):

# 1. Copy new tls.crt and tls.key files to /etc/deephaven/cus-tls
# 2. Generate new web certificates and package them
/usr/illumon/latest/install/iris_keygen.sh --force-new-web
/usr/illumon/latest/install/config_packager.sh query package

# 3. Create lighttpd.pem file for Envoy
source /usr/illumon/latest/bin/dh_users
sudo -u "${DH_ADMIN_USER}" cat /etc/deephaven/cus-tls/tls.key /etc/deephaven/cus-tls/tls.crt | \
    sudo -u "${DH_ADMIN_USER}" tee /etc/sysconfig/illumon.d/client_update_service/lighttpd.pem > /dev/null

# 4. Restart Web API service and query servers
/usr/illumon/latest/bin/dh_monit restart web_api_service

Convert a DER certificate to PEM format:

openssl x509 -inform der -in certificate.cer -out tls.crt

Trust certificates in Java

Import a certificate to the Java truststore:

keytool -cacerts -trustcacerts -import -file <certificate_file> -alias <unique_alias>

Create a PKCS12 truststore:

# Create a password file
echo -n "your_password" | base64 > truststore_passphrase

# Import certificate
keytool -importcert -keystore truststore-iris.p12 -storepass "$(base64 -d < truststore_passphrase)" -alias your_server -file your_cert.pem

View certificates in a truststore:

keytool -list -keystore truststore-iris.p12 -storepass "$(base64 -d < truststore_passphrase)"

Users & groups

See ACL management for more information.

Create a user

/usr/illumon/latest/bin/dhconfig acl users add --name <username>

Delete a user

/usr/illumon/latest/bin/dhconfig acl users delete --name <username>

List users

/usr/illumon/latest/bin/dhconfig acl users list

Set a user password

# Set a password interactively
/usr/illumon/latest/bin/dhconfig acl users set-password --name <username>

# Set a password using a pre-hashed value
/usr/illumon/latest/bin/dhconfig acl users set-password --name <username> --hashed-password "$(openssl passwd -apr1 'YOUR_PASSWORD')"

Create a group

/usr/illumon/latest/bin/dhconfig acl groups add --name <groupname>

Delete a group

/usr/illumon/latest/bin/dhconfig acl groups delete --name <groupname>

List groups

/usr/illumon/latest/bin/dhconfig acl groups list

Add a user to a group

/usr/illumon/latest/bin/dhconfig acl groups add-member --name <groupname> --member <username>

Remove a user from a group

/usr/illumon/latest/bin/dhconfig acl groups remove-member --name <groupname> --member <username>
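
When many users must join one group, the add-member command can be driven from a file. A minimal sketch, assuming a plain text file with one username per line; the function name add_members and the DRY_RUN switch are illustrative and not part of dhconfig.

```shell
DHCONFIG=/usr/illumon/latest/bin/dhconfig

# add_members <groupname> <file>
# Adds every username listed in <file> to <groupname>. Blank lines and lines
# starting with # are skipped. Set DRY_RUN=1 to print the commands instead.
add_members() {
    group=$1
    while IFS= read -r user || [ -n "$user" ]; do
        case $user in ''|'#'*) continue ;; esac
        if [ -n "${DRY_RUN:-}" ]; then
            printf '%s\n' "$DHCONFIG acl groups add-member --name $group --member $user"
        else
            # </dev/null keeps the command from consuming the user list.
            "$DHCONFIG" acl groups add-member --name "$group" --member "$user" </dev/null || return 1
        fi
    done < "$2"
}
```

The loop stops at the first failing command, so a typo in one username does not go unnoticed halfway through the list.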

Backup, restore, & migration

See the backup and restore guide for more information.

Run the backup script

This example creates a full backup in a specified directory and sets a 7-day retention policy for old backups.

/usr/illumon/latest/bin/backup_deephaven --directory /path/to/backup/dir --all --retain 7

Common backup script parameters:

  • -d, --directory: Specify the backup directory.
  • -a, --all: Back up everything (etcd, ACLs, PQs, schemas, properties, routing, workspace data).
  • -e, --etcd: Back up the etcd database.
  • -A, --ACLs: Back up Access Control Lists.
  • -p, --persistentQueries: Back up Persistent Queries.
  • -w, --workspaceData: Back up the WorkspaceData table.
  • -R, --retain: Delete backups older than a specified number of days.
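
For scheduled runs (for example from cron), it helps to record whether the backup succeeded. The wrapper below is a sketch that assumes backup_deephaven exits with a non-zero status on failure; the function name run_backup and the BACKUP_CMD override are hypothetical.

```shell
# run_backup <directory> [retain_days]
# Runs a full backup and reports the outcome. BACKUP_CMD can point at a
# different executable, which is mainly useful for testing the wrapper.
run_backup() {
    backup_cmd="${BACKUP_CMD:-/usr/illumon/latest/bin/backup_deephaven}"
    dir=$1
    days="${2:-7}"
    if "$backup_cmd" --directory "$dir" --all --retain "$days"; then
        echo "backup OK: $dir"
    else
        status=$?
        echo "backup FAILED (exit $status): $dir" >&2
        return "$status"
    fi
}
```

A cron job can then call run_backup /path/to/backup/dir 7 and alert on a non-zero exit status.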

Restart all services

After restoring a backup or migrating a system, restart all Deephaven services on all nodes.

/usr/illumon/latest/bin/dh_monit restart all

Monitoring & troubleshooting

Check text logs

Standard text logs for Deephaven services are located in sub-directories under /var/log/deephaven.

# View current process log (example for query server)
cat /var/log/deephaven/query_server/db_query_server.log.current

# List all logs for a service (example for authentication server)
ls -ltr /var/log/deephaven/authentication_server/authentication_server.log.????-??-??

# View process launch logs (contains startup info and command line)
cat /var/log/deephaven/process_name/process_name.log.YYYY-MM-DD

View and search binary logs

The iriscat utility converts binary logs to a readable text format. This is useful when you cannot access a console.

# View a specific binary log file
/usr/illumon/latest/bin/iriscat -l /var/log/deephaven/binlogs/pel/DbInternal.ProcessEventLog.System.deephaven-query-1.YYYY-MM-DD.bin.YYYY-MM-DD.214340.599+0000

# Search all binary logs for a specific ProcessInfoId and page the results
/usr/illumon/latest/bin/iriscat -l /var/log/deephaven/binlogs/pel/DbInternal.ProcessEventLog.System.*.YYYY-MM-DD* | grep 'PROCESS_INFO_ID' | sort | less

Troubleshoot process startup

Check the status of all services managed by Monit.

/usr/illumon/latest/bin/dh_monit summary

If a service is failing, you can attempt to start it interactively with debug logging to get more information.

# Run as the irisadmin user
sudo su - irisadmin

# Start the service in debug mode
/usr/illumon/latest/bin/iris --debug start <process_name>

Persistent Query (PQ) management

Determine the active PQ controller

In a multi-controller setup, use this command to find which controller is the current leader.

/usr/illumon/latest/bin/dhconfig pq leader

Reload PQ controller configuration

Dynamically reload the PQ controller's configuration without a restart.

/usr/illumon/latest/bin/dhconfig pq reload

Internal tables

Deephaven uses several internal tables to log system events, metrics, and state. These tables are located in the DbInternal namespace and can be queried for monitoring and troubleshooting.

Key internal tables

  • AuditEventLog: Records security-related events such as logins, logouts, and permission changes.
  • ProcessEventLog: Contains detailed log messages from all Deephaven processes and workers. Essential for debugging.
  • PersistentQueryStateLog: Tracks the state changes of all Persistent Queries (e.g., starting, stopping, failing).
  • WorkspaceData: Stores user-created content, including dashboards, notebooks, and layouts.

Querying internal tables

You can query these tables using Python or Groovy. Here is a basic example to retrieve today's data from a table.

Groovy:

// Replace 'TableName' with the actual table name (e.g., AuditEventLog)
def table = db.liveTable("DbInternal", "TableName").where("Date=today()")

Python:

# Replace 'TableName' with the actual table name (e.g., AuditEventLog)
table = db.live_table("DbInternal", "TableName").where("Date=today()")

Table and storage management

Deephaven's data is stored on the filesystem, typically under the /db directory. Understanding the layout is key for administration.

Filesystem layout

The /db directory contains several key subdirectories:

  • /db/Systems: Stores historical data for system namespaces.
  • /db/Users: Stores data for user namespaces.
  • /db/Intraday: Stores real-time, intraday data for system namespaces.
  • /db/DbInternal: Contains internal system tables used for logging, monitoring, and state management.

Example directory structure

A simplified view of the directory structure for a partitioned table:

/db/Systems/
└── MyNamespace/
    └── MyTable/
        ├── Partitions/
        │   └── 0/
        │       ├── 2023-10-26/
        │       └── 2023-10-27/
        └── WritablePartitions/
            └── 0 -> ../Partitions/0

  • Table data is broken into partitions, often by date.
  • Each table has a .tbl metadata file (not shown) that defines its schema.
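
Given the layout sketched above, the date partitions that exist for a table can be listed from the shell. This is a minimal sketch that assumes the simplified Partitions/<n>/<YYYY-MM-DD> structure shown here; real installations may nest additional levels. The function name list_date_partitions is illustrative.

```shell
# list_date_partitions <table_dir>
# Prints the distinct date directories found under <table_dir>/Partitions/*/.
list_date_partitions() {
    for d in "$1"/Partitions/*/????-??-??; do
        [ -d "$d" ] || continue
        basename "$d"
    done | sort -u
}
```

For example, list_date_partitions /db/Systems/MyNamespace/MyTable | tail -n 1 shows the most recent date present on disk.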

Performance tuning

JVM memory settings

View the current JVM memory settings for a Deephaven service:

# Find the Java process ID
pgrep -f 'db_query_server'

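# View JVM heap settings (-Xms and -Xmx are reported as InitialHeapSize and MaxHeapSize, in bytes)
jcmd <PID> VM.flags | tr ' ' '\n' | grep -E 'InitialHeapSize|MaxHeapSize'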

Adjust JVM memory settings in the service configuration file:

sudo vim /etc/sysconfig/illumon.d/db_query_server/irisdbserver.conf
# Edit the JAVA_OPTS line to adjust -Xms (initial heap) and -Xmx (max heap)
# Example: JAVA_OPTS="-Xms4g -Xmx8g ..."
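
Before and after editing, the configured heap flags can be pulled out of the file to confirm the change. A minimal sketch, assuming the flags appear in the usual -Xms<size> and -Xmx<size> form; the function name configured_heap is illustrative.

```shell
# configured_heap <config_file>
# Prints each -Xms/-Xmx flag found on non-comment lines, one per line.
configured_heap() {
    grep -v '^[[:space:]]*#' "$1" | grep -oE -- '-Xm[sx][0-9]+[kKmMgGtT]?'
}
```

For example, configured_heap /etc/sysconfig/illumon.d/db_query_server/irisdbserver.conf prints the current values; the service still has to be restarted for a change to take effect.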

System resource monitoring

Monitor system resources used by Deephaven processes:

# CPU and memory usage for all Deephaven processes
top -u irisadmin

# Disk I/O statistics
iostat -xz 1

# Active TCP connections held by Java processes (sudo is needed to see other users' process names)
sudo netstat -tpn | grep java

Networking configuration

Verify network connectivity

Test connectivity to Deephaven services:

# Test basic HTTP connectivity
curl -Ik https://deephaven-host:8000/iris/

# Test WebSocket connectivity (requires wscat tool)
wscat -c wss://deephaven-host:8000/ws/iris/iris.api

Check open ports and services

# List listening sockets on common Deephaven ports (2379 is the etcd client port)
sudo netstat -tulpn | grep -E ':(8000|10000|2379)'

# Same check using ss, for systems without netstat
sudo ss -tulpn | grep -E ':(8000|10000|2379)'
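
The grep above shows what is listening; a scripted check usually needs the opposite, namely which expected ports are missing. A minimal sketch that reads ss or netstat listing output on stdin; the function name missing_ports is illustrative, and the port list is whatever the installation actually uses.

```shell
# missing_ports <port>...
# Reads listening-socket output (e.g. from `ss -tln`) on stdin and prints
# every given port that does not appear as a local ":<port>" entry.
missing_ports() {
    listing=$(cat)
    for port in "$@"; do
        printf '%s\n' "$listing" | grep -qE ":${port}[[:space:]]" || printf '%s\n' "$port"
    done
}
```

Usage: sudo ss -tln | missing_ports 8000 10000 2379. No output means all listed ports have a listener.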

Configure Envoy as a front proxy

Envoy can simplify network configuration by exposing only one external port for all Deephaven services:

# Check Envoy status if installed
sudo systemctl status envoy

# View Envoy configuration
cat /etc/envoy/envoy.yaml

# Restart Envoy after configuration changes
sudo systemctl restart envoy

Troubleshoot network issues

# Check firewall status
sudo firewall-cmd --list-all

# Test network latency to Deephaven server
ping -c 5 deephaven-host

# Check DNS resolution
nslookup deephaven-host

Health check commands

System health checks

Quickly check the health of the system and services:

# Check overall system resource usage
top -b -n 1

# Check disk space usage
df -h

# Check memory usage
free -h

# Check load average
uptime
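
The disk space check is easy to turn into a threshold alert. A minimal sketch that parses POSIX-format df output; the function name usage_over and the 80 percent threshold in the usage line are illustrative choices.

```shell
# usage_over <percent>
# Reads `df -P` output on stdin and prints "<mount point> <capacity>" for
# every filesystem whose used capacity is above <percent>.
usage_over() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 > limit + 0) print $6 " " $5
        }
    '
}
```

Usage: df -P | usage_over 80. The -P flag keeps each filesystem on a single line so the column positions stay stable.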

Deephaven service health checks

Verify that Deephaven services are running properly:

# Use monit to check service health
sudo -u irisadmin monit status

# Check if query server is responding
curl -ik https://localhost:8000/iris/connection.json

# Check if authentication server is healthy
curl -ik https://localhost:10000/auth/health

Log health checks

Monitor logs for errors and warnings:

# Check for errors in query server logs
grep -i error /var/log/deephaven/query_server/db_query_server.log.current

# Check for warnings in authentication server logs
grep -i warn /var/log/deephaven/authentication_server/authentication_server.log.current

# Monitor logs in real-time
tail -f /var/log/deephaven/query_server/db_query_server.log.current
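
To see at a glance which service is logging the most errors, the grep can be run across every current log. This sketch assumes each service writes to <log_root>/<service>/<name>.log.current, as in the two examples above; the function name error_summary is illustrative.

```shell
# error_summary <log_root>
# Prints "<count><TAB><file>" for each *.log.current file one level below
# <log_root>, sorted with the highest error count first.
error_summary() {
    for f in "$1"/*/*.log.current; do
        [ -f "$f" ] || continue
        # grep -c prints 0 but exits non-zero when nothing matches.
        n=$(grep -ci 'error' "$f" || true)
        printf '%s\t%s\n' "$n" "$f"
    done | sort -rn
}
```

Usage: error_summary /var/log/deephaven | head. The count is of matching lines, so a stack trace that repeats the word on several lines is counted more than once.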

Database and storage checks

Verify database and storage systems:

# Check etcd cluster health
sudo -u irisadmin /usr/illumon/latest/bin/etcdctl.sh endpoint health --write-out table

# Check MariaDB status (if used for ACLs)
sudo systemctl status mariadb

# Check filesystem geometry where Deephaven data is stored (XFS filesystems only)
sudo xfs_info /db

Security hardening

File permissions audit

Verify and fix permissions on key Deephaven directories:

# Check permissions on config directories
ls -la /etc/deephaven/
ls -la /etc/sysconfig/illumon.d/

# Fix permissions if needed
sudo chown -R irisadmin:irisadmin /etc/deephaven/
sudo chmod 750 /etc/deephaven/

Authentication configuration

Verify and update authentication settings:

# Check if public key authentication is enabled
grep -A5 "auth.publickey.enabled" /etc/sysconfig/illumon.d/authentication_server/irisauth.properties

# Enable public key authentication
sudo sed -i 's/auth.publickey.enabled=false/auth.publickey.enabled=true/' /etc/sysconfig/illumon.d/authentication_server/irisauth.properties

# Set minimum password length (default is 8)
sudo sed -i 's/auth.password.min.length=8/auth.password.min.length=12/' /etc/sysconfig/illumon.d/authentication_server/irisauth.properties
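
The sed commands above change nothing if the current value differs from the one in the pattern, or if the property is absent. A sketch of a helper that sets a key regardless of its current value follows; the function name set_property is hypothetical, and it assumes simple key=value lines with no spaces around the equals sign.

```shell
# set_property <file> <key> <value>
# Replaces the line for <key> if present, otherwise appends key=value.
# The file is rewritten in place so its owner and mode are preserved.
set_property() {
    target=$1
    scratch=$(mktemp) || return 1
    if awk -F= -v k="$2" -v v="$3" '
            $1 == k { print k "=" v; found = 1; next }
            { print }
            END { if (!found) print k "=" v }
        ' "$target" > "$scratch"; then
        cat "$scratch" > "$target"
        result=$?
    else
        result=1
    fi
    rm -f "$scratch"
    return "$result"
}
```

Run it from a shell with write access to the properties file, and keep a copy of the original file before changing authentication settings.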

Network security

Configure firewall to restrict access to Deephaven services:

# Check current firewall rules
sudo firewall-cmd --list-all

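# Allow the Deephaven ports from a trusted subnet (replace trusted-subnet/24 with a real CIDR range)
# These rules restrict access only if the ports are not otherwise open in the active zone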
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="trusted-subnet/24" port port="8000" protocol="tcp" accept'
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="trusted-subnet/24" port port="10000" protocol="tcp" accept'

# Reload firewall rules
sudo firewall-cmd --reload

System account security

Audit and secure system accounts:

# Check if irisadmin account is locked for direct login
sudo passwd -S irisadmin

# Lock system accounts not needed for direct login
sudo passwd -l irisadmin

# Check sudo permissions
sudo cat /etc/sudoers.d/deephaven