System overview

Routine procedures and operations for Deephaven system administrators

This guide provides procedures for system administrators to meet the SLA requirements for reliable and high-performing Deephaven production deployments.

A production-deployed Deephaven system comprises various Network, Storage and Application components.

The Network services include IP subnets and associated services (DNS, NTP, etc.).
The Storage components consist of Intraday data on high-speed local SSD volumes and Historical data on NFS disk mounts exported from a highly available storage system.
The Application components are deployed across multiple nodes, where each node is an x86_64 (64-bit) Linux physical or virtual server.

For more details on deployment architecture, please refer to Scaling to Multiple Servers.

List of Deephaven services

A Deephaven system comprises the following processes.

Process Name	Run User	Log Prefix
`authentication_server`	`irisadmin`	`AuthenticationServer`
`db_acl_write_server`	`irisadmin`	`DbAclWriteServer`
`db_tdcp`	`dbquery`	`TableDataCacheProxy`
`configuration_server`	`irisadmin`	`ConfigurationServer`
`iris_controller`	`irisadmin`	`PersistentQueryController`
`log_aggregator_service`	`irisadmin`	`LogAggregatorService`
`tailer1`	`irisadmin`	`LogtailerMain`
`db_dis`	`dbmerge`	`DataImportServer`
`db_merge_server`	`dbmerge`	`RemoteQueryDispatcher`
`db_ltds`	`dbquery`	`LocalTableDataServer`
`db_query_server`	`dbquery`	`RemoteQueryDispatcher`
`web_api_service`	`dbquery`	`WebServer`

Other Processes	Run User	Executable
`client_update_service`	`lighttpd`	`/sbin/lighttpd`
`MariaDB`	`mysql`	`/usr/libexec/mysqld`

Troubleshooting procedures

Deephaven is a highly robust and fault-tolerant system. In the event of temporary errors, the system is designed to retry or restart components to recover from errors without much intervention from administrators and operators.

However, even in the most controlled IT environments, failures and incidents do occur. In the event of an outage or non-recoverable error of any component making up the Deephaven system, administrators can perform the basic system checks described in this guide. These include network connectivity checks, firewall rules, configuration file settings, etc.

Administrators should be familiar with the components of the Deephaven system and how to perform basic component checks, including how to stop and start services, view log files and other component diagnostics.

Typical incidents are often caused by file permissions or other system settings that are easily resolved. If you are unable to resolve any incident with basic checks or restarts, the Deephaven Support Team is available to help.

Note: See Getting help.

Prerequisites for performing admin tasks

Deephaven Operating System users

When Deephaven is installed, three Linux users are created with the minimal operating system permissions needed to run their respective processes.

irisadmin - OS user for running Deephaven Admin processes.
dbquery - OS user for database query (read) processes.
dbmerge - OS user for database import and merge (read-write) processes.
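
As a quick sanity check after installation, the presence of these three accounts can be verified from a script. The following is a minimal sketch, not part of the Deephaven tooling: the helper name `missing_users` is hypothetical, and it only confirms that each account exists in the system user database, not that its permissions are correct.

```python
import pwd

# The three OS users named in this guide.
DEEPHAVEN_USERS = ("irisadmin", "dbquery", "dbmerge")


def missing_users(names=DEEPHAVEN_USERS):
    """Return the names that have no entry in the system user database."""
    missing = []
    for name in names:
        try:
            pwd.getpwnam(name)  # raises KeyError when the user does not exist
        except KeyError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_users()
    print("missing users:", ", ".join(absent) if absent else "none")
```

An empty result means all three accounts were found on the node where the script ran; the check would need to be repeated on each node in the cluster.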

SSH Terminal access

Certain administrative tasks described in this guide require SSH terminal access to the Deephaven compute nodes. As an administrator, please ensure you have SSH access to each server node in the Deephaven cluster using your own operating system user and not any of the users created during the installation.

Sudo access

Many administrative tasks and Deephaven commands will also require sudo for the admin user, e.g., to mount or unmount NFS file shares, edit configuration files, or set folder permissions.

Please update the /etc/sudoers file to ensure the administration user can run all commands on the system.

Deephaven Console access

Some administrative tasks, such as adding end-users or viewing query logs and performance metrics stored in the database, are done using the Deephaven Console Client GUI.

As an administrator, please ensure you have the Deephaven Console installed on your workstation. For more details on installing the Deephaven Console, please refer to Installing the Launcher.

Network services

This section describes some basic network checks administrators should verify for a healthy cluster.

Basic network service checks

System administrators need to be familiar with common Linux utilities to configure and verify that the Deephaven cluster network and associated services are functioning properly. Please refer to the Linux man pages for details on any Linux commands and services.

Check that all servers in the Deephaven cluster have their clocks synchronized using the NTP service.
A Deephaven cluster requires high-speed (Gigabit) IP network connectivity from clients and between cluster nodes. Tools such as ping, traceroute and iperf can be used to check bandwidth limits and network latency between clients and server nodes.
Deephaven processes require connectivity to various ports on cluster nodes. The Deephaven configuration files specify the host names or IP addresses and ports used by the various Deephaven components. If the configuration files use hostnames or fully qualified domain names instead of IP addresses, administrators need to check that DNS is working correctly. Tools such as dig and nslookup can verify name resolution, and ping can confirm that nodes in the Deephaven cluster are able to reach each other.
The Deephaven Console Client GUI installed on end-users' workstations also requires IP network connectivity to a range of ports on the server nodes. Administrators need to check network routes and firewall settings to ensure the specified ports are open between server processes and clients. Diagnostic tools such as netstat, netcat and nmap can be used to verify port accessibility between nodes and from clients.
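
The name-resolution and port-accessibility checks above can also be scripted. The sketch below uses only the Python standard library; the function names `resolve` and `port_open` are hypothetical, and the script is an illustration of the idea rather than a replacement for dig, netcat, or nmap.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses the system resolver gives for hostname.

    Returns an empty list when the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and resolution failures
        return False
```

For example, `port_open("query1.example.com", 22013)` would test the default Remote Query Dispatcher query port on a node; the host name here is a placeholder. A False result only shows that the connection failed from the machine running the script, so the cause (firewall, routing, or a stopped process) still has to be identified separately.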

Deephaven process ports

Deephaven processes listen on various TCP ports and port ranges. All ports are configurable.

TCP ports can be configured in the property files:

/etc/sysconfig/illumon.d/resources/*.prop
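
To see which ports a given installation actually uses, the port-related settings can be pulled out of these property files. The sketch below assumes the files contain plain `key=value` lines with `#` or `!` comments, as in the property examples in this section; it does not handle line continuations or other Java properties features, and the function names are hypothetical.

```python
import glob
import re

# Matches keys whose name ends in "port"/"ports", or has such a segment
# before a dot (e.g. queryPort, server.port, port.plaintext).
PORT_KEY = re.compile(r"[Pp]orts?(\.|$)")


def parse_props(text):
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def port_settings(props):
    """Keep only the properties whose key looks like a port setting."""
    return {k: v for k, v in props.items() if PORT_KEY.search(k)}


if __name__ == "__main__":
    # Directory taken from this guide; files that cannot be read are skipped.
    for path in sorted(glob.glob("/etc/sysconfig/illumon.d/resources/*.prop")):
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                found = port_settings(parse_props(fh.read()))
        except OSError:
            continue
        for key, value in sorted(found.items()):
            print(f"{path}: {key}={value}")
```

The key filter is a naming heuristic, so its output should be read as a starting point for review rather than a complete inventory.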

The default Deephaven ports and port ranges follow.

TCP Ports:

22013
22012
8084

Component: Remote Query Dispatcher (RQD)

Process name: db_query_server

Properties:

RemoteQueryDispatcherParameters.queryPort=22013
RemoteQueryDispatcher.workerPort=22012
RemoteQueryDispatcher.webPort=8084

TCP Ports:

30002
30003
8085

Component: Remote Merge Dispatcher (RMD)

Process name: db_merge_server

Properties:

RemoteQueryDispatcherParameters.queryPort=30002
RemoteQueryDispatcher.workerPort=30003
RemoteQueryDispatcher.webPort=8085

TCP Ports: 23000-24999

Component: RQD Workers

Process name: RemoteQueryDispatcher_worker_<number>

Properties:

RemoteQueryDispatcher.workerServerPorts=23000-23999
RemoteQueryDispatcher.workerServerWebsocketPorts=24000-24999

TCP Ports:

32000-32999
25000-25999

Component: RMD Workers

Process name: RemoteQueryDispatcher_worker_<number>

Properties:

RemoteQueryDispatcher.workerServerPorts=32000-32999
RemoteQueryDispatcher.workerServerWebsocketPorts=25000-25999
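
Because the worker port ranges are configurable, it can be useful to confirm that customized ranges do not collide when query and merge dispatchers share a host. The following is a minimal sketch with hypothetical helper names; the ranges are the defaults listed above, and whether both dispatchers run on the same node depends on the deployment.

```python
def parse_range(spec):
    """Turn '23000-23999' or a single port such as '22013' into an inclusive (low, high) tuple."""
    low, sep, high = spec.strip().partition("-")
    lo = int(low)
    hi = int(high) if sep else lo
    if lo > hi:
        raise ValueError(f"invalid port range: {spec!r}")
    return lo, hi


def overlaps(a, b):
    """Return True if two inclusive (low, high) ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


# Default worker ranges as listed in this guide.
DEFAULT_WORKER_RANGES = {
    "RQD workerServerPorts": "23000-23999",
    "RQD workerServerWebsocketPorts": "24000-24999",
    "RMD workerServerPorts": "32000-32999",
    "RMD workerServerWebsocketPorts": "25000-25999",
}


def find_overlaps(ranges):
    """Return the pairs of range names whose port ranges overlap."""
    parsed = {name: parse_range(spec) for name, spec in ranges.items()}
    names = sorted(parsed)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if overlaps(parsed[a], parsed[b])
    ]
```

Calling `find_overlaps(DEFAULT_WORKER_RANGES)` returns an empty list, since the four default ranges are disjoint; the same function can be run against edited values before they are written to the property files.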

TCP Ports:

22021
22015

Component: Data Import Server

Process name: db_dis

Properties:

routing_service.yml:

dataImportServers:
  db_dis:
    tailerPort: 22021
    tableDataPort: 22015

TCP Ports: 22020

Component: Log Aggregator Server

Process name: log_aggregator_service

Properties:

routing_service.yml:

logAggregatorServers:
  log_aggregator_service:
    port: 22020

TCP Ports: 22014

Component: Local Table Data Server

Process name: db_ltds

Properties:

routing_service.yml:

tableDataServices:
  db_ltds:
    port: 22014

TCP Ports: 22016

Component: Table Data Cache Proxy

Process name: db_tdcp

Properties:

routing_service.yml:

tableDataServices:
  db_tdcp:
    port: 22016

TCP Ports: 22023

Component: Configuration Server (Centralized Schema Service, Data Routing Service and Configuration Service)

Process name: configuration_server

Properties: configuration.server.port=22023

TCP Ports:

9030
9031

Component: User Authentication Server

Process name: authentication_server

Properties:

authentication.server.port.plaintext=9030
authentication.server.port.ssl=9031

TCP Ports:

9040
9041

Component: User Access Control Server

Process name: db_acl_write_server

Properties:

dbaclwriter.port=9040
dbaclwriter.ssl.port=9041

TCP Ports: 20126

Component: Persistent Query Controller

Process name: iris_controller

Properties: PersistentQueryController.port=20126

TCP Ports: 22021 (outbound)

Component: Log Tailer

Process name: tailer1..tailerN

Properties: Determined by the tailerPort setting in the db_dis section of routing_service.yml (see above).

TCP Ports: 80/443

Component: Client Update Server

Process name: client_update_service

Properties:

client-update-service.conf:

  server.port = 80

TCP Ports: 8123

Component: Web API Service

Process name: web_api_service

Properties: Webapi.server.port=8123

TCP Ports: 3306

Component: MySQL Database

Process name: mariadb_server

Properties:

/etc/my.cnf:

  port=3306 # default

TCP Ports: 2812

Component: Monit Daemon

Process name: monit

Properties:

/etc/monitrc:

  set httpd port 2812

Contributing applications, daemons, and services

Deephaven relies on a few third-party components. For example, Monit has already been discussed in the Starting and stopping Deephaven Services section.

These third-party components are installed on the system as prerequisites before the Deephaven installation is performed. Examples include lighttpd, Python, and MariaDB/MySQL. Please refer to the Deephaven installation guide for a complete list of third-party Deephaven package dependencies.

License agreements, maintenance procedures, troubleshooting guides and run-books for these third-party applications, daemons and services are covered in their respective official online documentation.