System overview
This guide provides procedures for system administrators to meet the SLA requirements for reliable and high-performing Deephaven production deployments.
A production-deployed Deephaven system comprises various Network, Storage and Application components.
- The Network services include IP subnets and associated services (DNS, NTP, etc.).
- The Storage components consist of Intraday data on high-speed local SSD volumes and Historical data on NFS disk mounts exported from a highly available storage system.
- The Application components are deployed across multiple nodes, where each node is an x86_64 (64-bit) Linux physical or virtual server.
For more details on deployment architecture, please refer to Scaling to Multiple Servers.
List of Deephaven services
A Deephaven system comprises the following processes.
Process Name | Run User | Log Prefix |
---|---|---|
authentication_server | irisadmin | AuthenticationServer |
db_acl_write_server | irisadmin | DbAclWriteServer |
db_tdcp | dbquery | TableDataCacheProxy |
configuration_server | irisadmin | ConfigurationServer |
iris_controller | irisadmin | PersistentQueryController |
log_aggregator_service | irisadmin | LogAggregatorService |
tailer1 | irisadmin | LogtailerMain |
db_dis | dbmerge | DataImportServer |
db_merge_server | dbmerge | RemoteQueryDispatcher |
db_ltds | dbquery | LocalTableDataServer |
db_query_server | dbquery | RemoteQueryDispatcher |
web_api_service | dbquery | WebServer |
Other Processes | Run User | Executable |
---|---|---|
client_update_service | lighttpd | /sbin/lighttpd |
MariaDB | mysql | /usr/libexec/mysqld |
Troubleshooting procedures
Deephaven is a highly robust and fault-tolerant system. In the event of temporary errors, the system is designed to retry or restart components to recover from errors without much intervention from administrators and operators.
However, even in the most controlled IT environments, failures and incidents do occur. In the event of an outage or non-recoverable error of any component making up the Deephaven system, administrators can perform the basic system checks described in this guide. These include network connectivity checks, firewall rules, configuration file settings, etc.
Administrators should be familiar with the components of the Deephaven system and how to perform basic component checks, including how to stop and start services, view log files and other component diagnostics.
Typical incidents are often caused by file permissions or other system settings that are easily resolved. If you are unable to resolve any incident with basic checks or restarts, the Deephaven Support Team is available to help.
Note
See: Getting help
Prerequisites for performing admin tasks
Deephaven Operating System users
When Deephaven is installed, three Linux users are created, each with the minimal operating system permissions needed to run its respective processes.
irisadmin
- OS user for running Deephaven Admin processes.
dbquery
- OS user for database query (read) processes.
dbmerge
- OS user for database import and merge (read-write) processes.
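As a quick sanity check after installation, the presence of these three users can be verified from a script. The following is a minimal sketch, not part of the Deephaven tooling; the function name missing_users is hypothetical, and the check only confirms that each account exists in the local passwd database, not that its permissions are correct.

```python
import pwd

# The three OS users named in this guide.
DEEPHAVEN_USERS = ("irisadmin", "dbquery", "dbmerge")


def missing_users(names=DEEPHAVEN_USERS):
    """Return the names that have no entry in the passwd database.

    Hypothetical helper for illustration only.
    """
    missing = []
    for name in names:
        try:
            pwd.getpwnam(name)  # raises KeyError when the user is unknown
        except KeyError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_users()
    print("missing users:", ", ".join(absent) if absent else "none")
```

Run it on each node with your own operating system user; no elevated privileges are needed to read the passwd database.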
SSH Terminal access
Certain administrative tasks described in this guide require SSH terminal access to the Deephaven compute nodes. As an administrator, please ensure you have SSH access to each server node in the Deephaven cluster using your own operating system user and not any of the users created during the installation.
Sudo access
Many administrative tasks and Deephaven commands will also require sudo for the admin user, e.g., to mount or unmount NFS file shares, edit configuration files, or set folder permissions.
Please update the /etc/sudoers file to ensure the administration user can run all commands on the system.
Deephaven Console access
Some administrative tasks, such as adding end-users or viewing query logs and performance metrics stored in the database, are done using the Deephaven Console Client GUI.
As an administrator, please ensure you have the Deephaven Console installed on your workstation. For more details on installing the Deephaven Console, please refer to Installing the Launcher.
Network services
This section describes some basic network checks administrators should verify for a healthy cluster.
Basic network service checks
System administrators need to be familiar with common Linux utilities to configure and verify that the Deephaven cluster network and associated services are functioning properly. Please refer to the Linux man pages for details on any Linux commands and services.
- Check that all servers in the Deephaven cluster have their clocks synchronized using NTP service.
- A Deephaven cluster requires high-speed (Gigabit) IP network connectivity from clients and between cluster nodes. Tools such as ping, traceroute and iperf can be used to check bandwidth limits and network latency between clients and server nodes.
- Deephaven processes require connectivity to various ports on cluster nodes. The Deephaven configuration files specify the host names or IP addresses and ports used by the various Deephaven components. If hostnames or fully qualified domain names are used in config files instead of IP addresses, administrators need to check that DNS is working correctly. Tools such as dig and nslookup can be used to verify name resolution, and ping can confirm that nodes in the Deephaven cluster are able to reach each other.
- The Deephaven Console Client GUI installed on end-users' workstations also requires IP network connectivity to a range of ports on the server nodes. Administrators need to check network routes and firewall settings to ensure the specified ports are open between server processes and clients. Diagnostic tools such as netstat, netcat and nmap can be used to verify port accessibility between nodes and from clients.
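Where the command-line tools above are not available, the name resolution and port checks can be approximated with a short script. This is a minimal sketch using only the Python standard library; the function names resolve and port_is_open are hypothetical, and a successful TCP connect shows only that something is listening on the port, not that the Deephaven process behind it is healthy.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS errors
        return False
```

For example, calling port_is_open with one of your query server host names and port 22013 (the default query port listed in the next section) reports whether that port is reachable from the machine running the script.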
Deephaven process ports
Deephaven processes listen on various TCP ports and port ranges. All ports are configurable.
TCP ports can be configured in the property files:
/etc/sysconfig/illumon.d/resources/*.prop
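To see which port settings are actually present on a node, the property files can be scanned for port-related keys. The sketch below is an illustration under stated assumptions: it treats each file as flat key=value lines with # or ! comments, which may not cover every feature of the Deephaven property file format, and the function names read_properties and port_settings are hypothetical.

```python
import glob


def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            props[key.strip()] = value.strip()
    return props


def port_settings(props):
    """Keep only the properties whose name mentions 'port' (case-insensitive)."""
    return {k: v for k, v in props.items() if "port" in k.lower()}


if __name__ == "__main__":
    # Path taken from this guide; prints nothing if no files match.
    for path in sorted(glob.glob("/etc/sysconfig/illumon.d/resources/*.prop")):
        with open(path, encoding="utf-8", errors="replace") as handle:
            found = port_settings(read_properties(handle.read()))
        for key, value in sorted(found.items()):
            print(f"{path}: {key}={value}")
```

Comparing this output with the defaults listed in this section is a quick way to spot ports that were changed for a particular deployment.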
The default Deephaven ports and port ranges follow.
TCP Ports:
- 22013
- 22012
- 8084
Component: Remote Query Dispatcher (RQD)
Process name: db_query_server
Properties:
RemoteQueryDispatcherParameters.queryPort=22013
RemoteQueryDispatcher.workerPort=22012
RemoteQueryDispatcher.webPort=8084
TCP Ports:
- 30002
- 30003
- 8085
Component: Remote Merge Dispatcher (RMD)
Process name: db_merge_server
Properties:
RemoteQueryDispatcherParameters.queryPort=30002
RemoteQueryDispatcher.workerPort=30003
RemoteQueryDispatcher.webPort=8085
TCP Ports: 23000-24999
Component: RQD Workers
Process name: RemoteQueryDispatcher_worker_<number>
Properties:
RemoteQueryDispatcher.workerServerPorts=23000-23999
RemoteQueryDispatcher.workerServerWebsocketPorts=24000-24999
TCP Ports:
- 32000-32999
- 25000-25999
Component: RMD Workers
Process name: RemoteQueryDispatcher_worker_<number>
Properties:
RemoteQueryDispatcher.workerServerPorts=32000-32999
RemoteQueryDispatcher.workerServerWebsocketPorts=25000-25999
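When worker port ranges are customized, it can help to confirm that the configured ranges are well formed and do not collide. The sketch below parses range strings in the form used above and reports overlapping pairs. It assumes that overlapping ranges on the same host are undesirable, which this guide does not state explicitly; the function names are hypothetical.

```python
def parse_port_range(text):
    """Convert '23000-23999' (or a single port such as '8084') to (low, high)."""
    low, _, high = text.strip().partition("-")
    low = int(low)
    high = int(high) if high else low
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {text!r}")
    return low, high


def ranges_overlap(a, b):
    """Return True if two inclusive (low, high) ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


# Default worker ranges as listed in this section.
DEFAULT_WORKER_RANGES = {
    "RQD workerServerPorts": "23000-23999",
    "RQD workerServerWebsocketPorts": "24000-24999",
    "RMD workerServerPorts": "32000-32999",
    "RMD workerServerWebsocketPorts": "25000-25999",
}


def find_overlaps(ranges):
    """Return the pairs of range names whose port ranges overlap."""
    parsed = {name: parse_port_range(text) for name, text in ranges.items()}
    names = sorted(parsed)
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if ranges_overlap(parsed[first], parsed[second])
    ]


if __name__ == "__main__":
    print("overlapping ranges:", find_overlaps(DEFAULT_WORKER_RANGES) or "none")
```

The same parsed (low, high) pairs can also be used to size firewall rules for the worker ports.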
TCP Ports:
- 22021
- 22015
Component: Data Import Server
Process name: db_dis
Properties:
routing_service.yml:
  dataImportServers:
    db_dis:
      tailerPort: 22021
      tableDataPort: 22015
TCP Ports: 22020
Component: Log Aggregator Server
Process name: log_aggregator_service
Properties:
routing_service.yml:
  logAggregatorServers:
    log_aggregator_service:
      port: 22020
TCP Ports: 22014
Component: Local Table Data Server
Process name: db_ltds
Properties:
routing_service.yml:
  tableDataServices:
    db_ltds:
      port: 22014
TCP Ports: 22016
Component: Table Data Cache Proxy
Process name: db_tdcp
Properties:
routing_service.yml:
  tableDataServices:
    db_tdcp:
      port: 22016
TCP Ports: 22023
Component: Configuration Server (Centralized Schema Service, Data Routing Service and Configuration Service)
Process name: configuration_server
Properties: configuration.server.port=22023
TCP Ports:
- 9030
- 9031
Component: User Authentication Server
Process name: authentication_server
Properties:
authentication.server.port.plaintext=9030
authentication.server.port.ssl=9031
TCP Ports:
- 9040
- 9041
Component: User Access Control Server
Process name: db_acl_write_server
Properties:
dbaclwriter.port=9040
dbaclwriter.ssl.port=9041
TCP Ports: 20126
Component: Persistent Query Controller
Process name: iris_controller
Properties: PersistentQueryController.port=20126
TCP Ports: 22021 (outbound)
Component: Log Tailer
Process name: tailer1..tailerN
Properties: This is determined by the db_dis section (see above).
TCP Ports: 80/443
Component: Client Update Server
Process name: client_update_service
Properties:
client-update-service.conf:
server.port = 80
TCP Ports: 8123
Component: Web API Service
Process name: web_api_service
Properties: Webapi.server.port=8123
TCP Ports: 3306
Component: MySQL Database
Process name: mariadb_server
Properties:
/etc/my.cnf:
port=3306 # default
TCP Ports: 2812
Component: Monit Daemon
Process name: monit
Properties:
/etc/monitrc:
set httpd port 2812
Contributing applications, daemons, and services
Deephaven includes a few third-party components. For example, Monit has already been discussed in the Starting and stopping Deephaven Services section.
These third-party components are installed on the system as prerequisites before the Deephaven installation is performed. Examples include lighttpd, Python, and MariaDB/MySQL. Please refer to the Deephaven installation guide for a complete list of third-party Deephaven package dependencies.
License agreements, maintenance and troubleshooting procedures, and run-books for these third-party applications, daemons and services are covered in their respective official online documentation.
Routine procedures and operations for Deephaven system administrators