Plan your installation
Deephaven is a distributed system that can be installed on a single node or scaled to as many nodes as your use case requires. This guide helps you determine the number and types of nodes you will need. Each installation includes one Infrastructure node and, optionally, one or more Query nodes.
Infrastructure nodes
An Infrastructure node hosts the core Deephaven services, which provide authentication, configuration, data ingestion and subscription capabilities, and orchestration. It also serves as a base Merge node for the system.
These nodes require fast, locally attached storage to support live data ingestion and subscription.
Query nodes
Query nodes host services that run Persistent Queries and Web UI Code Studio sessions and are the primary way to scale out a Deephaven installation.
Minimum system requirements
Below are the minimum requirements for nodes in a Deephaven cluster. These values are chosen to support basic data processing use cases and should be scaled up based on your expected user loads.
Operating System
- Red Hat 8 or 9
- Rocky 8 or 9
- Ubuntu 22.04 or 24.04
For more details, see the Version support matrix.
CPU & RAM
The recommendation below is a good minimum estimate for a node that runs Deephaven Infrastructure services and a Query server. A more detailed breakdown of the memory requirements follows.
- x86_64 (64-bit)
- 8 Cores
- 128 GB RAM
Considerations for Infrastructure nodes
Infrastructure nodes run several core Deephaven services, each with a baseline memory requirement. If you also run Query or Merge server processes on the Infrastructure node, add the CPU and memory estimates for Query and Merge servers to this baseline.
Basic service requirements:
Service | Memory |
---|---|
Web API Service | 4G |
ACL Write Server | 1G |
Root Tailer | 4G |
Configuration Server | 4G |
Table Data Cache Proxy | 4G |
Log Aggregator | 4G |
Data Import Server | 16G |
Authentication Server | 1G |
Persistent Query Controller | 4G |
Status Dashboard | 2G |
TOTAL: | 44G |
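The total in the table can be reproduced with a short calculation, which is also a convenient place to adjust individual values if your deployment differs. This is a minimal Python sketch; the dictionary and function names are illustrative and are not part of any Deephaven API.

```python
# Baseline memory (GB) per service, copied from the table above.
INFRA_SERVICE_MEMORY_GB = {
    "Web API Service": 4,
    "ACL Write Server": 1,
    "Root Tailer": 4,
    "Configuration Server": 4,
    "Table Data Cache Proxy": 4,
    "Log Aggregator": 4,
    "Data Import Server": 16,
    "Authentication Server": 1,
    "Persistent Query Controller": 4,
    "Status Dashboard": 2,
}


def infrastructure_memory_gb(services=INFRA_SERVICE_MEMORY_GB):
    """Return the combined baseline memory (GB) for the given services."""
    return sum(services.values())


print(f"Infrastructure baseline: {infrastructure_memory_gb()} GB")
```

If the Infrastructure node also hosts Query or Merge workloads, add their estimate to this baseline.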
Considerations for Query and Merge servers
Deephaven workers and Persistent Queries allocate heap space on the servers they run on. The amount of heap allocated to a worker is configurable. Complex queries that manipulate larger data sets require more heap, which needs to be accounted for when sizing RAM for your servers.
A good starting point assumes a 4GB Table Data Cache Proxy (TDCP) and that each concurrent user worker and Persistent Query will consume at least 8GB. For example, if you expect to serve 10 users running 2 workers each and 10 Persistent Queries, you would need at least:
4GB + (10 × 2 × 8GB) + (10 × 8GB) = 244GB of RAM.
Similarly, for CPU cores, you should allocate at least one CPU core per expected Persistent Query.
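The sizing rule described here can be expressed as a small helper. This is a minimal sketch, assuming the starting-point values from the text (a 4GB TDCP and 8GB per worker or Persistent Query); the function and parameter names are hypothetical and chosen only for illustration.

```python
def query_server_ram_gb(users, workers_per_user, persistent_queries,
                        gb_per_worker=8, tdcp_gb=4):
    """Estimate minimum RAM (GB) for Query workloads.

    Every concurrent user worker and every Persistent Query is assumed
    to need gb_per_worker of heap, on top of the TDCP allocation.
    """
    concurrent_workers = users * workers_per_user + persistent_queries
    return tdcp_gb + concurrent_workers * gb_per_worker


def minimum_cores(persistent_queries):
    """At least one CPU core per expected Persistent Query."""
    return persistent_queries


# The example from the text: 10 users, 2 workers each, 10 Persistent Queries.
print(query_server_ram_gb(users=10, workers_per_user=2, persistent_queries=10))
print(minimum_cores(persistent_queries=10))
```

Raise `gb_per_worker` if you expect complex queries over larger data sets, since those need more heap than the 8GB starting point.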
Storage
At a minimum, the Infrastructure server requires locally attached high-speed storage for streaming data. All query servers require shared storage, such as NFS, to access historical data. Deephaven stores data in the following locations:
- Intraday data:
  - Mount: /db/Intraday
  - Size: Sufficient to store several days' worth of ingested data.
  - Type: Low-latency, directly attached storage; an SSD is ideal.
  - Needed on servers directly ingesting data.
- Historical data:
  - Mount: /db/Systems
  - Size: Depends on the size of the historical data set.
  - Type: Network-attached storage shareable between query servers (e.g., NFS).
- Binary Log Data:
  - Mount: /var/log/deephaven/binlogs
  - Size: Sufficient to store at least one day's streamed data.
  - Type: Low-latency, directly attached storage; an SSD is ideal.
- Process Log Files:
  - Mount: /var/log/deephaven (other than binlogs)
  - Size: 20GB minimum.
  - Type: Locally attached disk.
- Enterprise Binary Files:
  - Mount: /usr/illumon
  - Size: 1GB minimum.
  - Type: Locally attached disk; an SSD is ideal.
- Core+ Binary Files:
  - Mount: /usr/illumon/coreplus
  - Size: 5GB minimum.
  - Type: Locally attached disk; an SSD is ideal.
- Deephaven Configuration Files:
  - Mount: /etc/sysconfig/deephaven
  - Size: 100MB minimum.
  - Type: Locally attached disk; an SSD is ideal.
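Before installing, it can be useful to confirm that these locations exist and sit on filesystems of adequate size. The following is a minimal Python sketch, not a Deephaven tool: the `STORAGE_LOCATIONS` table and `check_storage` function are hypothetical names, and the size check reports the capacity of the filesystem that contains each path rather than space reserved for that directory.

```python
import os
import shutil

# Mount points and fixed minimum sizes (GB) taken from the list above.
# None means the required size depends on your data volume.
STORAGE_LOCATIONS = {
    "/db/Intraday": None,
    "/db/Systems": None,
    "/var/log/deephaven/binlogs": None,
    "/var/log/deephaven": 20,
    "/usr/illumon": 1,
    "/usr/illumon/coreplus": 5,
    "/etc/sysconfig/deephaven": 0.1,
}


def check_storage(locations=STORAGE_LOCATIONS):
    """Return (path, problem) tuples for missing or undersized locations."""
    problems = []
    for path, min_gb in locations.items():
        if not os.path.isdir(path):
            problems.append((path, "missing"))
            continue
        if min_gb is not None:
            # Total capacity of the filesystem holding this path, in GB.
            total_gb = shutil.disk_usage(path).total / 1e9
            if total_gb < min_gb:
                problems.append(
                    (path, f"{total_gb:.1f} GB total, need {min_gb} GB")
                )
    return problems


if __name__ == "__main__":
    for path, problem in check_storage():
        print(f"{path}: {problem}")
```

An empty result means every listed path exists and meets its fixed minimum. It does not verify storage type or latency, which still need to be checked separately.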
For example, let's say that you are planning a production system that will ingest the following data tables:
Table | Expected # rows | Expected row size | Daily total |
---|---|---|---|
Quotes | 500 million | 72 bytes | 34 GB |
Trades | 50 million | 48 bytes | 2 GB |
Support | 2 million | 256 bytes | 0.5 GB |
This totals 36.5 GB per day.
You want to store:
- One full day's worth of Intraday data.
- One year's worth of historical data.
- One week's worth of log files.
You would need the following, with a 20% safety margin:
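- Approximately 44 GB for Intraday storage (36.5 GB × 1.2)
- Approximately 44 GB for Binary Log storage (36.5 GB × 1.2)
- Approximately 16 TB for historical storage (36.5 GB × 365 days × 1.2, assuming data is ingested every day of the year)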
- 20GB for process log files
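The sizing arithmetic for this example can be sketched in Python as follows. The daily volumes come from the table; the function name, its parameters, and the 365-day year are assumptions made for illustration, so adjust them if you ingest data only on business days or retain more than one day of Intraday data.

```python
# Daily ingested volume per table (GB), from the example table.
DAILY_GB = {"Quotes": 34, "Trades": 2, "Support": 0.5}

SAFETY_MARGIN = 0.20   # 20% headroom, as stated in the example
DAYS_PER_YEAR = 365    # assumption: data is ingested every calendar day


def storage_plan_gb(daily_gb=DAILY_GB, intraday_days=1, binlog_days=1,
                    historical_days=DAYS_PER_YEAR, margin=SAFETY_MARGIN):
    """Return required storage (GB) per location, including the margin."""
    per_day = sum(daily_gb.values())
    factor = 1 + margin
    return {
        "intraday": per_day * intraday_days * factor,
        "binlogs": per_day * binlog_days * factor,
        "historical": per_day * historical_days * factor,
    }


for location, gb in storage_plan_gb().items():
    print(f"{location}: {gb:,.1f} GB")
```

With these inputs the plan comes to roughly 44 GB each for Intraday and Binary Log storage and roughly 16 TB for historical storage. Process log storage is not derived from ingest volume and stays at the fixed 20GB.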
Selecting a topology
While you could create a simple Deephaven installation on a single node, a production system will typically include one Infrastructure node and two or more Query nodes. Multiple query nodes allow you to distribute user and Persistent Query loads across multiple physical servers and provide redundancy through features like Persistent Query Replicas and Spares. A more complete guide on how to scale Deephaven can be found in the Scaling guide.
External Access
Deephaven uses a handful of ports for internal and external communication. The Ops Guide describes the required ports. Deephaven can be configured to use the Envoy front proxy so that only a single port needs to be exposed for external access. This is discussed in further detail in the Installation guide and the Envoy documentation.