System administration

This cheat sheet provides quick access to common commands and procedures for system administrators.

Management

Service status and control

Check that all Deephaven services have started and are available.

Get a status summary for all Deephaven services:

Restart all Deephaven services:

Restart a specific service:

Start a service interactively with debug logging:

Verify cluster connection

etcd management

Check etcd endpoint status:

List etcd roles:

Defragment an etcd node to reclaim disk space:

MariaDB/MySQL management

If MySQL is used for ACLs, these commands are useful:

Certificate management

Inspect certificates

View a certificate's details, including its validity period and subject alternative names (SANs):

Check a remote server's certificate:

Verify hostname in a certificate:
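Where a scripted check is more convenient than a one-off command, the expiry and hostname checks above can be sketched with the Python standard library. This is a minimal sketch, not a Deephaven tool: the function names days_until_expiry and fetch_not_after are hypothetical, and the host and port you pass in depend on your installation. The default SSL context validates the certificate chain and the hostname, so a mismatch surfaces as an exception during the handshake.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the number of days left before a certificate's notAfter time.

    not_after uses the text format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". A negative result means the
    certificate has already expired.
    """
    if now is None:
        now = time.time()
    expiry = ssl.cert_time_to_seconds(not_after)
    return (expiry - now) / 86400.0


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host:port over TLS and return the peer certificate's notAfter.

    The default context verifies the chain and the hostname, so an
    untrusted or mismatched certificate raises ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, days_until_expiry(fetch_not_after(host, port)) gives the remaining validity of the certificate served at that address; the port here is a placeholder for whichever Deephaven endpoint you are checking.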

Replace certificates

Replace Deephaven certificates on an infra server (non-Kubernetes):

Convert a DER certificate to PEM format:

Trust certificates in Java

Import a certificate into the Java truststore:

Create a PKCS12 truststore:

View certificates in a truststore:

Users & groups

See ACL management for more information.

Create a user

Delete a user

List users

Set a user password

Create a group

Delete a group

List groups

Add a user to a group

Remove a user from a group

Backup, restore, & migration

See the backup and restore guide for more information.

Run the backup script

This example creates a full backup in a specified directory and sets a 7-day retention policy for old backups.

Common backup script parameters:

  • -d, --directory: Specify the backup directory.
  • -a, --all: Back up everything (etcd, ACLs, PQs, schemas, properties, routing, workspace data).
  • -e, --etcd: Back up the etcd database.
  • -A, --ACLs: Back up Access Control Lists.
  • -p, --persistentQueries: Back up Persistent Queries.
  • -w, --workspaceData: Back up the WorkspaceData table.
  • -R, --retain: Delete backups older than a specified number of days.
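To preview what a retention window would affect before relying on it, the age test behind a setting such as a 7-day retention can be sketched as follows. This is an illustration of the general idea only, under the assumption that age is judged by modification time; it is not how the Deephaven backup script is implemented, and the function name expired_backups is hypothetical.

```python
import os
import time


def expired_backups(directory, retain_days, now=None):
    """Return sorted paths of entries in directory older than retain_days.

    Age is measured from each entry's last-modified time (an assumption
    made for this sketch). Nothing is deleted; the caller decides what
    to do with the returned paths.
    """
    if now is None:
        now = time.time()
    cutoff = now - retain_days * 86400
    expired = []
    for entry in os.scandir(directory):
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            expired.append(entry.path)
    return sorted(expired)
```

Calling expired_backups with the backup directory and 7 lists the entries a 7-day window would consider old, which is a low-risk way to sanity-check a retention value.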

Restart all services

After restoring a backup or migrating a system, restart all Deephaven services on all nodes.

Monitoring & troubleshooting

Check text logs

Standard text logs for Deephaven services are located in subdirectories under /var/log/deephaven.

View and search binary logs

The iriscat utility converts binary logs to a readable text format. This is useful when you cannot access a console.

Troubleshoot process startup

Check the status of all services managed by M/Monit.

If a service is failing, you can attempt to start it interactively with debug logging to get more information.

Persistent Query (PQ) management

Determine the active PQ controller

In a multi-controller setup, use this command to find which controller is the current leader.

Reload PQ controller configuration

Dynamically reload the PQ controller's configuration without a restart.

Internal tables

Deephaven uses several internal tables to log system events, metrics, and state. These tables are located in the DbInternal namespace and can be queried for monitoring and troubleshooting.

Key internal tables

  • AuditEventLog: Records security-related events such as logins, logouts, and permission changes.
  • ProcessEventLog: Contains detailed log messages from all Deephaven processes and workers. Essential for debugging.
  • PersistentQueryStateLog: Tracks the state changes of all Persistent Queries (e.g., starting, stopping, failing).
  • WorkspaceData: Stores user-created content, including dashboards, notebooks, and layouts.

Querying internal tables

You can query these tables using Python or Groovy. Here is a basic example to retrieve today's data from a table.

Table and storage management

Deephaven's data is stored on the filesystem, typically under the /db directory. Understanding the layout is key for administration.

Filesystem layout

The /db directory contains several key subdirectories:

  • /db/Systems: Stores historical data for system namespaces.
  • /db/Users: Stores data for user namespaces.
  • /db/Intraday: Stores real-time, intraday data for system namespaces.
  • /db/DbInternal: Contains internal system tables used for logging, monitoring, and state management.

Example directory structure

A simplified view of the directory structure for a partitioned table:

  • Table data is broken into partitions, often by date.
  • Each table has a .tbl metadata file (not shown) that defines its schema.

Performance tuning

JVM memory settings

View the current JVM memory settings for a Deephaven service:
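One way to read heap settings from a running process is to inspect its command line for the standard JVM flags -Xms (initial heap) and -Xmx (maximum heap). The sketch below parses those two flags from a command-line string. The function name parse_jvm_heap is hypothetical, and how you obtain the command line (for example from /proc/<pid>/cmdline on Linux, or from ps output) is left to the caller.

```python
import re

# Multipliers for the size suffixes the JVM accepts on -Xms and -Xmx.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_jvm_heap(cmdline):
    """Return heap sizes in bytes for -Xms/-Xmx flags found in cmdline.

    The result maps "Xms" and/or "Xmx" to a byte count. If a flag is
    repeated, this sketch keeps the last occurrence.
    """
    settings = {}
    pattern = r"-(Xms|Xmx)(\d+)([kKmMgGtT]?)\b"
    for flag, size, unit in re.findall(pattern, cmdline):
        settings[flag] = int(size) * _UNITS[unit.lower()]
    return settings
```

A missing key in the result means the flag was not set on the command line, in which case the JVM chooses its own default.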

Adjust JVM memory settings in the service configuration file:

System resource monitoring

Monitor system resources used by Deephaven processes:

Networking configuration

Verify network connectivity

Test connectivity to Deephaven services:
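A basic reachability test only needs a TCP connection attempt. The following minimal sketch uses the Python standard library; the function name port_is_open is hypothetical, and the host and port values to test depend on how your installation is configured.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Connection refusals, timeouts, and name-resolution failures all
    surface as OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A True result shows only that something is listening on the port; it does not confirm that the service behind it is healthy.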

Check open ports and services

Configure Envoy as a front proxy

Envoy can simplify network configuration by exposing only one external port for all Deephaven services:

Troubleshoot network issues

Health check commands

System health checks

Quickly check the health of the system and services:

Deephaven service health checks

Verify that Deephaven services are running properly:

Log health checks

Monitor logs for errors and warnings:
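As a rough way to spot noisy services, the sketch below walks a log directory and counts lines that contain an ERROR or WARN token. It assumes the text logs contain those literal level names, which may not match every log format, and the function name count_log_levels is hypothetical. WARNING is deliberately not matched by the WARN pattern.

```python
import os
import re

# Match the level names as whole words; "WARNING" does not match "WARN".
LEVEL_PATTERN = re.compile(r"\b(ERROR|WARN)\b")


def count_log_levels(log_root):
    """Count ERROR and WARN lines in every file under log_root.

    Returns {path: {"ERROR": n, "WARN": n}} and omits files with no
    matches. Each line is counted once, under the first level it contains.
    """
    results = {}
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            counts = {"ERROR": 0, "WARN": 0}
            # errors="replace" keeps the scan going past undecodable bytes.
            with open(path, "r", errors="replace") as handle:
                for line in handle:
                    match = LEVEL_PATTERN.search(line)
                    if match:
                        counts[match.group(1)] += 1
            if counts["ERROR"] or counts["WARN"]:
                results[path] = counts
    return results
```

Pointing it at the text log directory gives a per-file summary that can be sorted to find the busiest error sources.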

Database and storage checks

Verify database and storage systems:
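Free space on the volume that holds table data is one of the simplest storage checks. This minimal sketch reports how full a filesystem is using the Python standard library; the function name disk_usage_percent is hypothetical, and any alert threshold is a local decision.

```python
import shutil


def disk_usage_percent(path):
    """Return the percentage of space used on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero total; avoid dividing by zero.
        return 0.0
    return 100.0 * usage.used / usage.total
```

For example, disk_usage_percent("/db") reports usage for the volume that holds the /db directory described earlier.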

Security hardening

File permissions audit

Verify and fix permissions on key Deephaven directories:
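A common first pass in a permissions audit is to look for world-writable files and directories. The sketch below only reports them and changes nothing; the function name world_writable_paths is hypothetical, and which directories to scan and which modes are acceptable depend on your installation.

```python
import os
import stat


def world_writable_paths(root):
    """Return sorted paths under root that are writable by "other".

    Symbolic links are skipped because their own mode bits are not
    meaningful. The root directory itself is not checked.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue
            if mode & stat.S_IWOTH:
                found.append(path)
    return sorted(found)
```

An empty result for a directory tree means that no entry beneath it grants write access to all users.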

Authentication configuration

Verify and update authentication settings:

Network security

Configure the firewall to restrict access to Deephaven services:

System account security

Audit and secure system accounts: