Plan your installation
Deephaven is a distributed system that can be installed on a single node or scaled across as many nodes as your use case requires. This guide helps you determine the number and types of nodes you will need. Each installation includes one infrastructure node and zero or more query nodes.
Infrastructure nodes
An infrastructure node hosts the core Deephaven services, which provide authentication, configuration, data ingestion and subscription capabilities, and orchestration. It also serves as a base Merge node for the system.
These nodes require fast, locally attached storage to support live data ingestion and subscription.
Query nodes
Query nodes host services that run Persistent Queries and Web UI Code Studio sessions and are the primary way to scale out a Deephaven installation.
Minimum system requirements
Below are the minimum requirements for nodes in a Deephaven cluster. These values are chosen to support basic data processing use cases and should be scaled up based on your expected user loads.
Operating System
- Red Hat 8 or 9
- Rocky 8 or 9
- Ubuntu 22.04 or 24.04
For more details, see the Version support matrix.
CPU & RAM
The recommendation below is a good minimum estimate for a node that runs Deephaven infrastructure services and a query server. A more detailed breakdown of the memory requirements follows.
- x86_64 (64-bit)
- 8 Cores
- 128 GB RAM
Considerations for Infrastructure nodes
Infrastructure nodes run several core Deephaven services that have basic memory requirements. For more details about what each of these system services does, see the Architecture Overview and Process runbooks.
| Service | Memory |
|---|---|
| Web API Service | 4G |
| ACL Write Server | 1G |
| Root Tailer | 4G |
| Configuration Server | 4G |
| Log Aggregator | 4G |
| Data Import Server | 16G |
| Authentication Server | 1G |
| Persistent Query Controller | 4G |
| Status Dashboard | 2G |
| TOTAL: | 40G |
If you enable workers to run on the infrastructure node, you should combine this estimate with the CPU and memory estimates for query and merge servers.
Considerations for Query and Merge servers
Deephaven workers and Persistent Queries allocate heap space on the servers they run on. The amount of heap allocated to a worker is configurable. Complex queries that manipulate larger data sets require more heap, which needs to be accounted for when sizing RAM for your servers.
A good starting point assumes a 4GB Table Data Cache Proxy (TDCP) and that each concurrent user worker and Persistent Query will consume at least 8GB. For example, if you expect to serve 10 users running 2 workers each and 10 Persistent Queries, you would need at least:

4GB (TDCP) + (10 × 2 + 10) × 8GB = 244GB of RAM.
Similarly, for CPU cores, you should allocate at least one CPU core per expected Persistent Query.
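The same rule of thumb can be expressed as a small helper if you want to script your sizing. This is a minimal sketch, not a Deephaven API; the function name and default values simply restate the assumptions above.

```python
def query_server_ram_gb(users, workers_per_user, persistent_queries,
                        gb_per_worker=8, tdcp_gb=4):
    """Estimate minimum query-server RAM (GB) using the rule of thumb above."""
    workers = users * workers_per_user + persistent_queries
    return tdcp_gb + workers * gb_per_worker

# 10 users running 2 workers each, plus 10 Persistent Queries
print(query_server_ram_gb(10, 2, 10))  # 244
```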
Storage
At a minimum, the Infrastructure server requires locally attached high-speed storage for streaming data. All query servers require shared storage, such as NFS, to access historical data. Deephaven stores data in the following locations:
- Intraday data:
  - Mount: `/db/Intraday`
  - Size: Sufficient to store several days' worth of ingested data.
  - Type: Low-latency direct-attached storage; an SSD is ideal.
  - Needed on servers directly ingesting data.
- Historical data:
  - Mount: `/db/Systems`
  - Size: Depends on the size of the historical data set.
  - Type: Network attached storage shareable between query servers (i.e., NFS).
- Binary Log Data:
  - Mount: `/var/log/deephaven/binlogs`
  - Size: Sufficient to store at least one day's streamed data.
  - Type: Low-latency directly attached storage; an SSD is ideal.
- Process Log Files:
  - Mount: `/var/log/deephaven` (other than binlogs)
  - Size: 20GB minimum.
  - Type: Locally attached disk.
- Enterprise Binary Files:
  - Mount: `/usr/illumon`
  - Size: 1GB minimum.
  - Type: Locally attached disk; an SSD is ideal.
- Core+ Binary Files:
  - Mount: `/usr/illumon/coreplus`
  - Size: 5GB minimum.
  - Type: Locally attached disk; an SSD is ideal.
- Deephaven Configuration Files:
  - Mount: `/etc/sysconfig/deephaven`
  - Size: 100MB minimum.
  - Type: Locally attached disk; an SSD is ideal.
For example, let's say that you are planning a production system that will ingest the following data tables:
| Table | Expected # rows | Expected row size (bytes) | Daily total (bytes) |
|---|---|---|---|
| Quotes | 500 million | 72 bytes | 34 GB |
| Trades | 50 million | 48 bytes | 2 GB |
| Support | 2 million | 256 bytes | 0.5 GB |
This totals approximately 36.5 GB per day.
You want to store:
- One full day's worth of Intraday data.
- One year's worth of historical data.
- One week's worth of log files.
You would need the following, with a 20% safety margin:
- Approximately 44GB for Intraday storage (one day of ingested data).
- Approximately 307GB for Binary Log storage (one week of streamed data).
- Approximately 16TB for historical storage (one year of merged data).
- 20GB for process log files
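If you prefer to script this estimate, the Python sketch below reproduces the calculation above. The table names, row counts, row sizes, and retention periods are the assumptions from this example, not fixed Deephaven values.

```python
# Illustrative storage-sizing helper; values mirror the example tables above.
TABLES = {  # expected rows per day, average row size in bytes
    "Quotes":  (500_000_000, 72),
    "Trades":  (50_000_000, 48),
    "Support": (2_000_000, 256),
}

GIB = 1024 ** 3
daily_gb = sum(rows * row_bytes for rows, row_bytes in TABLES.values()) / GIB

def with_margin(gb, margin=0.20):
    """Apply the 20% safety margin used in the example."""
    return gb * (1 + margin)

print(f"Daily ingest:        {daily_gb:7.1f} GB")
print(f"Intraday (1 day):    {with_margin(daily_gb):7.1f} GB")
print(f"Binlogs (1 week):    {with_margin(daily_gb * 7):7.1f} GB")
print(f"Historical (1 year): {with_margin(daily_gb * 365) / 1024:7.1f} TB")
```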
Selecting a topology
While you could create a simple Deephaven installation on a single node, a production system will typically include one infrastructure node and two or more query nodes. Multiple query nodes allow you to distribute user and Persistent Query loads across multiple physical servers and provide redundancy through features like Persistent Query Replicas and Spares. A more complete guide on how to scale Deephaven can be found in the Scaling guide.
External Access
Deephaven uses a handful of ports for internal and external communication. The Ops Guide describes the required ports. Deephaven can be configured to use the Envoy front proxy so that only a single port needs to be exposed for external access. This is discussed in further detail in the Installation guide and the Envoy documentation.
Kubernetes-specific considerations
When deploying Deephaven on Kubernetes, resource planning differs from traditional bare-metal or VM-based installations. Instead of provisioning individual nodes with specific resources, you must ensure your Kubernetes cluster has sufficient aggregate resources.
Minimum Kubernetes cluster resources
A minimal deployment on Kubernetes requires:
| Resource Category | Requirement | Notes |
|---|---|---|
| Cluster IPs | 20, plus one for each additional worker | For internal service communication |
| External IPs | 1 | For LoadBalancer service (Envoy ingress) |
| Total Memory | 60GB minimum | Distributed across all pods |
| Total CPU | 10 cores minimum | Distributed across all pods |
Note
These are baseline minimums for a test deployment. Production clusters should be sized significantly larger to accommodate:
- Additional worker pods for concurrent Code Studio sessions.
- Persistent Query execution with appropriate heap allocation.
- Data processing workloads specific to your use case.
- High availability and redundancy requirements.
Translating traditional requirements to Kubernetes
The CPU and memory requirements described earlier in this guide for infrastructure and query nodes still apply, but in Kubernetes they are allocated across multiple pods:
- Infrastructure services (Web API, Configuration Server, etc.) run as separate pods with dedicated resource requests.
- etcd cluster runs as 3 separate pods for high availability.
- Worker pods are dynamically created for Code Studio sessions and Persistent Queries.
- Persistent Query Controller manages worker orchestration.
Resource scaling guidance
To size your Kubernetes cluster for production use, start with the baseline of 60GB of memory and 9.25 CPU cores, then add:
- For concurrent users: Add 8GB memory and 1 CPU core per expected concurrent Code Studio session.
- For Persistent Queries: Add 8GB memory and 1 CPU core per Persistent Query, plus replicas if using PQ redundancy.
- For data processing: Add additional resources based on expected data volumes and processing complexity.
For example, a production cluster supporting 20 concurrent users and 15 Persistent Queries would need approximately:
- Memory: 60GB + (20 + 15) × 8GB = 340GB
- CPU: 9.25 + (20 + 15) × 1 = 44.25 cores
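The same arithmetic can be captured in a short sketch. The helper name and baseline values simply restate the guidance above and are not part of Deephaven.

```python
def k8s_cluster_estimate(users, persistent_queries,
                         base_mem_gb=60, base_cpu=9.25,
                         mem_per_workload_gb=8, cpu_per_workload=1):
    """Rough Kubernetes cluster sizing from the scaling guidance above."""
    workloads = users + persistent_queries
    mem_gb = base_mem_gb + workloads * mem_per_workload_gb
    cpu = base_cpu + workloads * cpu_per_workload
    return mem_gb, cpu

mem, cpu = k8s_cluster_estimate(20, 15)
print(f"~{mem} GB memory, ~{cpu} CPU cores")  # ~340 GB memory, ~44.25 CPU cores
```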
Storage in Kubernetes
Kubernetes deployments require:
- A read-write-many (RWX) persistent volume for shared data access across pods (equivalent to the `/db/Systems` NFS mount in traditional deployments).
- Read-write-once (RWO) persistent volumes for etcd data and backups.
- Storage classes configured for your cloud provider (see the Kubernetes installation guide for provider-specific storage class names).
Network considerations
Unlike traditional deployments where you manage ports and firewalls directly, Kubernetes deployments use:
- Cluster IPs for internal service-to-service communication.
- LoadBalancer service (requiring one External IP) for the user-facing Envoy proxy endpoint.
- Network policies (optional) for additional security controls.
The Envoy proxy LoadBalancer service handles all external user traffic through a single External IP, eliminating the need to expose multiple ports.