Plan your installation
Deephaven is a distributed system that can be installed on a single node or scaled across as many nodes as your use case requires. This guide helps you determine the number and types of nodes you will need. Each installation includes one infrastructure node and zero or more query nodes.
Infrastructure nodes
An infrastructure node hosts the core Deephaven services, which provide authentication, configuration, data ingestion and subscription capabilities, and orchestration. It also serves as a base Merge node for the system.
These nodes require fast, locally attached storage to support live data ingestion and subscription.
Query nodes
Query nodes host services that run Persistent Queries and Web UI Code Studio sessions and are the primary way to scale out a Deephaven installation.
Minimum system requirements
Below are the minimum requirements for nodes in a Deephaven cluster. These values are chosen to support basic data processing use cases and should be scaled up based on your expected user loads.
Operating System
- Red Hat 8 or 9
- Rocky 8 or 9
- Ubuntu 22.04 or 24.04
For more details, see the Version support matrix.
CPU & RAM
The recommendation below is a good minimum estimate for a node that runs Deephaven infrastructure services and a query server. A more detailed breakdown of the memory requirements follows.
- x86_64 (64-bit)
- 8 Cores
- 128 GB RAM
Considerations for Infrastructure nodes
Infrastructure nodes run several core Deephaven services that have basic memory requirements. For more details about what each of these system services does, see the Architecture Overview and Process runbooks.
| Service | Memory |
|---|---|
| Web API Service | 4G |
| ACL Write Server | 1G |
| Root Tailer | 4G |
| Configuration Server | 4G |
| Log Aggregator | 4G |
| Data Import Server | 16G |
| Authentication Server | 1G |
| Persistent Query Controller | 4G |
| Status Dashboard | 2G |
| TOTAL: | 40G |
If you enable workers to run on the infrastructure node, you should combine this estimate with the CPU and memory estimates for query and merge servers.
Considerations for Query and Merge servers
Deephaven workers and Persistent Queries allocate heap space on the servers they run on. The amount of heap allocated to a worker is configurable. Complex queries that manipulate larger data sets require more heap, which needs to be accounted for when sizing RAM for your servers.
A good starting point assumes a 4GB Table Data Cache Proxy (TDCP) and that each concurrent user worker and Persistent Query will consume at least 8GB. For example, if you expect to serve 10 users running 2 workers each and 10 Persistent Queries, you would need at least:

4GB (TDCP) + (10 × 2 + 10) × 8GB = 244GB of RAM.
Similarly, for CPU cores, you should allocate at least one CPU core per expected Persistent Query.
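The same rule of thumb can be expressed as a small helper if you want to script your sizing. This is a minimal sketch, not a Deephaven API; the function name and default values simply restate the assumptions above.

```python
def query_server_ram_gb(users, workers_per_user, persistent_queries,
                        gb_per_worker=8, tdcp_gb=4):
    """Estimate minimum query-server RAM (GB) using the rule of thumb above."""
    workers = users * workers_per_user + persistent_queries
    return tdcp_gb + workers * gb_per_worker

# 10 users running 2 workers each, plus 10 Persistent Queries
print(query_server_ram_gb(10, 2, 10))  # 244
```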
Storage
At a minimum, the Infrastructure server requires locally attached high-speed storage for streaming data. All query servers require shared storage, such as NFS, to access historical data. Deephaven stores data in the following locations:
- Intraday data:
  - Mount: `/db/Intraday`
  - Size: Sufficient to store several days' worth of ingested data.
  - Type: Low-latency direct-attached storage; an SSD is ideal.
  - Needed on servers directly ingesting data.
- Historical data:
  - Mount: `/db/Systems`
  - Size: Depends on the size of the historical data set.
  - Type: Network attached storage shareable between query servers (i.e., NFS).
- Binary Log Data:
  - Mount: `/var/log/deephaven/binlogs`
  - Size: Sufficient to store at least one day's streamed data.
  - Type: Low-latency directly attached storage; an SSD is ideal.
- Process Log Files:
  - Mount: `/var/log/deephaven` (other than binlogs)
  - Size: 20GB minimum.
  - Type: Locally attached disk.
- Enterprise Binary Files:
  - Mount: `/usr/illumon`
  - Size: 1GB minimum.
  - Type: Locally attached disk; an SSD is ideal.
- Core+ Binary Files:
  - Mount: `/usr/illumon/coreplus`
  - Size: 5GB minimum.
  - Type: Locally attached disk; an SSD is ideal.
- Deephaven Configuration Files:
  - Mount: `/etc/sysconfig/deephaven`
  - Size: 100MB minimum.
  - Type: Locally attached disk; an SSD is ideal.
For example, let's say that you are planning a production system that will ingest the following data tables:
| Table | Expected # rows | Expected row size (bytes) | Daily total (bytes) |
|---|---|---|---|
| Quotes | 500 million | 72 bytes | 34 GB |
| Trades | 50 million | 48 bytes | 2 GB |
| Support | 2 million | 256 bytes | 0.5 GB |
This totals approximately 36.5 GB per day.
You want to store:
- One full day's worth of Intraday data.
- One year's worth of historical data.
- One week's worth of log files.
You would need the following, with a 20% safety margin:
- Approximately 44GB for Intraday storage (one day of ingested data).
- Approximately 307GB for Binary Log storage (one week of streamed data).
- Approximately 16TB for historical storage (one year of merged data).
- 20GB for process log files
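If you prefer to script this estimate, the Python sketch below reproduces the calculation above. The table names, row counts, row sizes, and retention periods are the assumptions from this example, not fixed Deephaven values.

```python
# Illustrative storage-sizing helper; values mirror the example tables above.
TABLES = {  # expected rows per day, average row size in bytes
    "Quotes":  (500_000_000, 72),
    "Trades":  (50_000_000, 48),
    "Support": (2_000_000, 256),
}

GIB = 1024 ** 3
daily_gb = sum(rows * row_bytes for rows, row_bytes in TABLES.values()) / GIB

def with_margin(gb, margin=0.20):
    """Apply the 20% safety margin used in the example."""
    return gb * (1 + margin)

print(f"Daily ingest:        {daily_gb:7.1f} GB")
print(f"Intraday (1 day):    {with_margin(daily_gb):7.1f} GB")
print(f"Binlogs (1 week):    {with_margin(daily_gb * 7):7.1f} GB")
print(f"Historical (1 year): {with_margin(daily_gb * 365) / 1024:7.1f} TB")
```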
Selecting a topology
While you could create a simple Deephaven installation on a single node, a production system will typically include one infrastructure node and two or more query nodes. Multiple query nodes allow you to distribute user and Persistent Query loads across multiple physical servers and provide redundancy through features like Persistent Query Replicas and Spares. A more complete guide on how to scale Deephaven can be found in the Scaling guide.
External Access
Deephaven uses a handful of ports for internal and external communication. The Ops Guide describes the required ports. Deephaven can be configured to use the Envoy front proxy so that only a single port needs to be exposed for external access. This is discussed in further detail in the Installation guide and the Envoy documentation.
Kubernetes-specific considerations
When deploying Deephaven on Kubernetes, resource planning differs from traditional bare-metal or VM-based installations. Instead of provisioning individual nodes with specific resources, you must ensure your Kubernetes cluster has sufficient aggregate resources.
Minimum Kubernetes cluster resources
A minimal deployment on Kubernetes requires:
| Resource Category | Requirement | Notes |
|---|---|---|
| Cluster IPs | 20, plus one for each additional worker | For internal service communication |
| External IPs | 1 | For LoadBalancer service (Envoy ingress) |
| Total Memory | 60GB minimum | Distributed across all pods |
| Total CPU | 10 cores minimum | Distributed across all pods |
Note
These are baseline minimums for a test deployment. Production clusters should be sized significantly larger to accommodate:
- Additional worker pods for concurrent Code Studio sessions.
- Persistent Query execution with appropriate heap allocation.
- Data processing workloads specific to your use case.
- High availability and redundancy requirements.
Translating traditional requirements to Kubernetes
The CPU and memory requirements described earlier in this guide for infrastructure and query nodes still apply, but in Kubernetes they are allocated across multiple pods:
- Infrastructure services (Web API, Configuration Server, etc.) run as separate pods with dedicated resource requests.
- etcd cluster runs as 3 separate pods for high availability.
- Worker pods are dynamically created for Code Studio sessions and Persistent Queries.
- Persistent Query Controller manages worker orchestration.
Resource scaling guidance
To size your Kubernetes cluster for production use, start with the baseline of 60GB of memory and 9.25 CPU cores, then add:
- For concurrent users: Add 8GB memory and 1 CPU core per expected concurrent Code Studio session.
- For Persistent Queries: Add 8GB memory and 1 CPU core per Persistent Query, plus replicas if using PQ redundancy.
- For data processing: Add additional resources based on expected data volumes and processing complexity.
For example, a production cluster supporting 20 concurrent users and 15 Persistent Queries would need approximately:
- Memory: 60GB + (20 + 15) × 8GB = 340GB
- CPU: 9.25 + (20 + 15) × 1 = 44.25 cores
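The same arithmetic can be captured in a short sketch. The helper name and baseline values simply restate the guidance above and are not part of Deephaven.

```python
def k8s_cluster_estimate(users, persistent_queries,
                         base_mem_gb=60, base_cpu=9.25,
                         mem_per_workload_gb=8, cpu_per_workload=1):
    """Rough Kubernetes cluster sizing from the scaling guidance above."""
    workloads = users + persistent_queries
    mem_gb = base_mem_gb + workloads * mem_per_workload_gb
    cpu = base_cpu + workloads * cpu_per_workload
    return mem_gb, cpu

mem, cpu = k8s_cluster_estimate(20, 15)
print(f"~{mem} GB memory, ~{cpu} CPU cores")  # ~340 GB memory, ~44.25 CPU cores
```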
Storage in Kubernetes
Kubernetes deployments require:
- A read-write-many (RWX) persistent volume for shared data access across pods (equivalent to the `/db/Systems` NFS mount in traditional deployments).
- Read-write-once (RWO) persistent volumes for etcd data and backups.
- Storage classes configured for your cloud provider (see the Kubernetes installation guide for provider-specific storage class names).
Network considerations
Unlike traditional deployments where you manage ports and firewalls directly, Kubernetes deployments use:
- Cluster IPs for internal service-to-service communication.
- LoadBalancer service (requiring one External IP) for the user-facing Envoy proxy endpoint.
- Network policies (optional) for additional security controls.
The Envoy proxy LoadBalancer service handles all external user traffic through a single External IP, eliminating the need to expose multiple ports.