Add a dedicated Data Import Server
This guide explains how to add a host machine to an existing Deephaven Enterprise cluster for the purpose of running a dedicated Data Import Server (DIS).
Adding a dedicated DIS is a common scaling strategy to isolate data ingestion workloads, improve performance, and increase capacity. This is especially useful for:
- High-volume, real-time data feeds.
- Isolating different data sources from each other.
- Offloading ingestion processing from query or administrative servers.
This process involves updating the cluster configuration to recognize the new machine, running the installer, and then configuring the DIS and data routing.
Prerequisites
- An existing, operational Deephaven Enterprise cluster.
- A server that will run the new DIS process, which should be one of the following:
  - A new bare-metal server or virtual machine provisioned with a supported OS (e.g., Rocky Linux, Ubuntu) and network connectivity to the existing cluster nodes. See Installation Planning for system requirements.
  - A node in the cluster to which you will add the DIS role.
- SSH access to the new machine from your installation host, using a service account with sudo privileges.
- Your current cluster.cnf file and installer media, preferably managed under source control.
- Familiarity with the Deephaven installation process as described in the Basic Installation Guide.
Step order
The order of steps depends on whether you're adding a DIS role to an existing node or creating a new node.
Adding a DIS role to an existing node
Follow steps in this order:
- Plan data storage and access (Step 1).
- Configure the DIS and data routing (Step 2).
- Update cluster configuration (Step 3).
- Configure the DIS process on the node (Step 4).
- Run the installer (Step 5).
- Verify (Step 6).
Rationale: Configure everything first, then run the installer to apply changes. The DIS process may fail to start initially but will start correctly after you configure it in Step 4.
Adding a new node to the cluster
Follow steps in this order:
- Plan data storage and access (Step 1).
- Configure the DIS and data routing (Step 2) - run on an existing node since the new node doesn't exist yet.
- Update cluster configuration (Step 3).
- Run the installer to create the node (Step 5).
- Configure the DIS process on the new node (Step 4).
- Verify (Step 6).
Rationale: You can configure the DIS and data routing before the node exists by running dhconfig commands on an existing node (the infra node is a good choice). This allows you to define what data the new DIS will handle. After creating the node with the installer, configure how the db_dis process runs on that node.
About dhconfig commands: You can run the dhconfig commands in Step 2 on any installed node in the cluster. The infra node is a good choice because it typically has the default superuser authentication key configured. If you run dhconfig on a different node, you may need to supply authentication options.
Configuration steps
Step 1: Plan storage
The DIS writes ingested data to a storage location defined in the data routing configuration. The default DIS uses /db/Intraday/. For a dedicated DIS, Deephaven recommends /db/dataImportServers/[DIS_Name]/.
Note
Create the storage directory before importing the data routing configuration to avoid errors. The directory must be owned by and writable by the DIS process user (typically dbmerge).
This storage should be fast and local to this host. Shared storage can lead to performance issues and data corruption when other processes can write to the same location. If the data will be merged to historical, this node should include the merge server role for best performance.
Data routing must be configured so that tailers send data to this DIS, and table data is requested from this DIS. The simplest mechanism is claims, but you can configure data routing in other ways.
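As a rough sketch of the directory preparation described in the note above, the following shell snippet builds the recommended path and previews the commands that would create it. The DIS name dedicated_dis is a hypothetical placeholder, dbmerge is the typical process user mentioned above, and the RUN dry-run switch is an illustrative convention, not part of Deephaven.

```shell
DIS_NAME="dedicated_dis"     # hypothetical DIS name; use your own
DIS_USER="dbmerge"           # typical DIS process user (confirm for your install)
STORAGE_DIR="/db/dataImportServers/${DIS_NAME}"
RUN="echo"                   # dry run: commands are printed, not executed

# Create the directory and make it owned by and writable by the DIS user.
${RUN} mkdir -p "${STORAGE_DIR}"
${RUN} chown "${DIS_USER}" "${STORAGE_DIR}"
${RUN} chmod u+rwx "${STORAGE_DIR}"
```

With RUN="echo" the commands are only printed for review. Setting RUN="sudo" on the DIS host would apply them, assuming your account has the sudo privileges listed in the prerequisites.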
Step 2: Configure data routing
This step configures what data the new DIS will handle (which tables or namespaces it will manage). You'll create a named DIS configuration in the cluster's routing system that defines the data routing rules.
The recommended method is to use dhconfig dis to create a named DIS configuration with "claims" on the tables or namespaces it will manage.
See Add a Data Import Server for detailed instructions. That page assumes the new DIS will run in a Persistent Query, but the configuration is very similar.
- Decide what data to route. Identify the namespaces or specific tables you want this dedicated DIS to handle. For example, you might want it to handle all tables in the Trades and Quotes namespaces.
- Create the DIS configuration using dhconfig dis add. This command creates a new, named DIS entry with dynamic endpoints and exclusive claims.
  - --name: A unique, descriptive name for your new DIS configuration.
  - --claim: Specifies a namespace (e.g., Trades) or a table (e.g., Trades.TradeTable) that this DIS will exclusively handle. This option can be repeated as many times as needed. Tailers will send data for claimed tables only to this DIS.
- Edit the DIS configuration. The dhconfig dis add command creates a DIS with private storage and dynamic endpoints.
  - The standalone DIS process does not support private storage, so you need to specify a storage value. Unlike an in-worker DIS, where the storage path is passed directly via script arguments, a standalone DIS runs as an independent service. Other processes (workers, merge servers) must locate the data through the routing system, which requires a named storage entry.
  - Authentication is required to use the dynamic endpoints. You must either specify static values for the ports or configure authentication for the DIS process. See Authentication and dynamic routing.

  Edit the routing configuration to change the storage value from private to default or to your desired storage location, and to set static values for the ports. After editing, the DIS entry should have a named (non-private) storage value and static ports.
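The dhconfig dis add invocation described in the list above could be assembled as follows. This sketch uses only the --name and --claim options named in this guide. The DIS name dedicated_dis and the Trades and Quotes claims are illustrative values. The script prints the command for review instead of running it.

```shell
DIS_NAME="dedicated_dis"     # hypothetical DIS name; use your own
CLAIMS="Trades Quotes"       # namespaces, or Namespace.Table entries, to claim

# Start with the name, then append one --claim option per entry.
cmd="dhconfig dis add --name ${DIS_NAME}"
for claim in ${CLAIMS}; do
  cmd="${cmd} --claim ${claim}"
done

# Print the assembled command so it can be reviewed before use.
echo "${cmd}"
```

Run the printed command on an installed node such as the infra node. The resulting entry still has private storage and dynamic endpoints, so the routing edits described above remain necessary.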
Step 3: Update cluster configuration
To add the new server to your cluster, you must define it as a new node in your cluster.cnf file. This tells the Deephaven installer about the new machine's hostname and what services it should run.
- Open your cluster.cnf file for editing.
- Add a new node definition for the DIS server (if you are adding a node). Increment the node number from your last node. For example, if you have 3 nodes, the new node's settings start with DH_NODE_4_.
- Assign the DIS role. The essential role for this server is DH_NODE_4_ROLE_DIS="true". It is also common for such a node to run other supporting services like a Tailer, Table Data Cache Proxy (TDCP), or Log Aggregator Service (LAS), which are often enabled by default or as part of other roles. You will need DH_NODE_4_ROLE_MERGE="true" to perform local data merges.
Key settings in the cluster.cnf addition:
- DH_NODE_4_NAME: Set this to the short hostname of your new server.
- DH_NODE_4_ROLE_DIS: This is the key setting that tells the installer to configure and enable a Data Import Server process on this node.
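A minimal sketch of such an addition, assuming the new node is the fourth in the cluster and uses the dh-cluster-dis-1 hostname from this guide's example. Only the variables named in this guide are shown; a real node definition may need further settings described in the Cluster Configuration Guide.

```shell
# Hypothetical fourth node, dedicated to data import
DH_NODE_4_NAME="dh-cluster-dis-1"    # short hostname of the new server
DH_NODE_4_ROLE_DIS="true"            # configure and enable the DIS process
DH_NODE_4_ROLE_MERGE="true"          # only needed for local data merges
```

If you are adding the DIS role to an existing node instead, add the ROLE_DIS line to that node's existing definition rather than creating a new node number.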
See the Cluster Configuration Guide for a complete list of all possible node roles.
Step 4: Configure the DIS process
This step configures how the db_dis process runs on the node. While Step 2 defined what data the DIS handles in the cluster's routing configuration, this step tells the actual DIS process on the node to use that configuration.
Make the following changes on the DIS node. If you are adding a role to an existing node, you may do this before running the installer.
- Edit the hostconfig file (/etc/sysconfig/illumon) and add an EXTRA_ARGS setting for the db_dis process. This tells the db_dis process to load the DIS configuration you created in Step 2.
- Run /usr/illumon/latest/bin/dh_monit restart db_dis.
Step 5: Run the installer
After updating cluster.cnf, run the Deephaven installer to apply the configuration to your cluster. The installer will SSH to the new node, install Deephaven components, and start the required services.
- From your installation host, navigate to your local installation directory ($DH_LOCAL_DIR).
- Generate the installation scripts based on your updated cluster.cnf, as described in the Basic Installation Guide.
- Run the master installation script.
The installer will connect to the new node (dh-cluster-dis-1 in our example), install the necessary packages, and start the db_dis process under dh_monit.
Step 6: Verify
Once the installer finishes, verify that the new DIS is running correctly.
- SSH into the new DIS server.
- Check dh_monit status to confirm the db_dis process is running. You should see db_dis in the process list with a status of Running.
- Check the logs for any startup errors in /var/log/deephaven/.
- Verify data flow. Ingest some new data into a Trades or Quotes table and confirm it appears in queries.
- Check tailer logs on relevant machines to see connection messages to the new DIS.
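The status check in the list above can be scripted. This sketch only matches text: it reads status output on standard input and succeeds when a line mentions both db_dis and Running. The function name dis_is_running is hypothetical, and the exact output format of dh_monit is an assumption, so adjust the patterns to what your installation prints.

```shell
# Succeeds (exit 0) if any input line contains both "db_dis" and "Running".
dis_is_running() {
  grep "db_dis" | grep -q "Running"
}

# Example with made-up status text (not real dh_monit output):
printf "Process 'db_dis' Running\n" | dis_is_running && echo "db_dis is running"
```

To use it on the DIS server, pipe the output of the dh_monit status command you normally use into dis_is_running and act on its exit code.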