Add a dedicated Data Import Server

This guide explains how to add a host machine to an existing Deephaven Enterprise cluster for the purpose of running a dedicated Data Import Server (DIS).

Adding a dedicated DIS is a common scaling strategy to isolate data ingestion workloads, improve performance, and increase capacity. This is especially useful for:

  • High-volume, real-time data feeds.
  • Isolating different data sources from each other.
  • Offloading ingestion processing from query or administrative servers.

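This process involves planning storage, configuring the DIS and data routing, updating the cluster configuration to recognize the new machine, and running the installer. The order of these steps depends on whether the DIS runs on a new node or an existing one.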

Prerequisites

  • An existing, operational Deephaven Enterprise cluster.
  • A server that will run the new DIS process, which should be one of the following:
    • A new bare-metal server or virtual machine provisioned with a supported OS (e.g., Rocky Linux, Ubuntu) and network connectivity to the existing cluster nodes. See Installation Planning for system requirements.
    • A node in the cluster to which you will add the DIS role.
  • SSH access to the new machine from your installation host, using a service account with sudo privileges.
  • Your current cluster.cnf file and installer media, preferably managed under source control.
  • Familiarity with the Deephaven installation process as described in the Basic Installation Guide.
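
Before starting, it can help to confirm the SSH and sudo prerequisite from the installation host. The following is a minimal sketch and not part of the Deephaven installer: the function name check_ssh_sudo, the account name dh_service, and the host name dh-cluster-dis-1 in the usage comment are placeholders. It relies only on standard OpenSSH options and on sudo -n, which fails instead of prompting. If your service account needs a password for sudo, the check reports a failure even though the prerequisite may still be met.

```shell
#!/bin/sh
# Preflight sketch: confirm non-interactive SSH login and sudo on a target host.
# Usage: check_ssh_sudo <user> <host>
check_ssh_sudo() {
    user="$1"
    host="$2"
    # BatchMode=yes disables password prompts, so a missing key fails fast.
    # "sudo -n true" succeeds only if sudo works without prompting.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "${user}@${host}" 'sudo -n true'; then
        echo "OK: ${user}@${host} has SSH access and non-interactive sudo"
    else
        echo "FAIL: ${user}@${host} is unreachable or sudo requires a password" >&2
        return 1
    fi
}

# Example (placeholder account and host names):
# check_ssh_sudo dh_service dh-cluster-dis-1
```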

Step order

The order of steps depends on whether you're adding a DIS role to an existing node or creating a new node.

Adding a DIS role to an existing node

Follow steps in this order:

  1. Plan data storage and access (Step 1).
  2. Configure the DIS and data routing (Step 2).
  3. Update cluster configuration (Step 3).
  4. Configure the DIS process on the node (Step 4).
  5. Run the installer (Step 5).
  6. Verify (Step 6).

Rationale: Configure everything first, then run the installer to apply changes. The DIS process may fail to start initially but will start correctly after you configure it in Step 4.

Adding a new node to the cluster

Follow steps in this order:

  1. Plan data storage and access (Step 1).
  2. Configure the DIS and data routing (Step 2). Run this on an existing node, because the new node does not exist yet.
  3. Update cluster configuration (Step 3).
  4. Run the installer to create the node (Step 5).
  5. Configure the DIS process on the new node (Step 4).
  6. Verify (Step 6).

Rationale: You can configure the DIS and data routing before the node exists by running dhconfig commands on an existing node (the infra node is a good choice). This allows you to define what data the new DIS will handle. After creating the node with the installer, configure how the db_dis process runs on that node.

About dhconfig commands: You can run the dhconfig commands in Step 2 on any installed node in the cluster. The infra node is a good choice because it typically has the default superuser authentication key configured. If you run dhconfig on a different node, you may need to supply authentication options.

Configuration steps

Step 1: Plan storage

The DIS writes ingested data to a storage location defined in the data routing configuration. The default DIS uses /db/Intraday/. For a dedicated DIS, Deephaven recommends /db/dataImportServers/[DIS_Name]/.

Note

Create the storage directory before importing the data routing configuration to avoid errors. The directory must be owned and writable by the DIS process user (typically dbmerge).

This storage should be fast and local to this host. Shared storage can lead to performance issues and data corruption when other processes can write to the same location. If the data will be merged to historical, this node should include the merge server role for best performance.
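
As a minimal sketch of preparing that directory on the DIS host: the helper name prepare_dis_storage and the DIS name TradesDis are hypothetical, and dbmerge is the default owner only because it is the typical DIS process user mentioned above. Ownership is changed only when the script runs as root and the account exists; otherwise the function prints a warning and leaves ownership for you to set.

```shell
#!/bin/sh
# Sketch: create the storage directory for a dedicated DIS.
# Usage: prepare_dis_storage <storage root> <DIS name> [owner]
prepare_dis_storage() {
    root="$1"
    name="$2"
    owner="${3:-dbmerge}"   # typical DIS process user; adjust for your install
    dir="${root}/${name}"

    mkdir -p "$dir" || return 1

    # chown needs root and an existing account; otherwise warn and continue.
    if [ "$(id -u)" -eq 0 ] && id "$owner" >/dev/null 2>&1; then
        chown "$owner" "$dir" || return 1
    else
        echo "warning: ownership of $dir not changed; set it to $owner manually" >&2
    fi

    chmod u+rwx "$dir" || return 1
    echo "$dir"
}

# Example, run as root on the DIS host (TradesDis is a placeholder name):
# prepare_dis_storage /db/dataImportServers TradesDis
```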

Data routing must be configured so that tailers send data to this DIS, and table data is requested from this DIS. The simplest mechanism is claims, but you can configure data routing in other ways.

Step 2: Configure data routing

This step configures what data the new DIS will handle (which tables or namespaces it will manage). You'll create a named DIS configuration in the cluster's routing system that defines the data routing rules.

The recommended method is to use dhconfig dis to create a named DIS configuration with "claims" on the tables or namespaces it will manage.

See Add a Data Import Server for detailed instructions. That page assumes the new DIS will run in a Persistent Query, but the configuration is very similar.

  1. Decide what data to route. Identify the namespaces or specific tables you want this dedicated DIS to handle. For example, you might want it to handle all tables in the Trades and Quotes namespaces.

  2. Create the DIS configuration using dhconfig dis add. This command creates a new, named DIS entry with dynamic endpoints and exclusive claims.

    • --name: A unique, descriptive name for your new DIS configuration.
    • --claim: Specifies a namespace (e.g., Trades) or a table (e.g., Trades.TradeTable) that this DIS will exclusively handle. Repeat this option once for each namespace or table. Tailers will send data for claimed tables only to this DIS.

  3. Edit the DIS configuration. The dhconfig dis add command creates a DIS with private storage and dynamic endpoints, and both settings need to change for a standalone DIS.

    • The standalone DIS process does not support private storage, so you need to specify a storage value. Unlike an in-worker DIS where the storage path is passed directly via script arguments, a standalone DIS runs as an independent service. Other processes (workers, merge servers) must locate the data through the routing system, which requires a named storage entry.

    • Authentication is required to use the dynamic endpoints. You must either specify static values for the ports or configure authentication for the DIS process. See Authentication and dynamic routing.

      Edit the routing configuration to change the storage value from private to default (or to your desired storage location) and to set static values for the ports.
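
For the dhconfig dis add call in item 2, a small sketch follows. It only assembles and prints the command line so that it can be reviewed before being run on an installed node. The helper name dis_add_command and the DIS name TradesDis are hypothetical; --name and --claim are the options described above. The follow-up edit of the storage and port values is not shown here.

```shell
#!/bin/sh
# Sketch: print a "dhconfig dis add" command for one DIS and its claims.
# Usage: dis_add_command <DIS name> <claim> [<claim> ...]
dis_add_command() {
    name="$1"
    shift
    cmd="dhconfig dis add --name ${name}"
    # One --claim per namespace (Trades) or table (Trades.TradeTable).
    for claim in "$@"; do
        cmd="${cmd} --claim ${claim}"
    done
    echo "$cmd"
}

# Example (TradesDis is a placeholder name):
# dis_add_command TradesDis Trades Quotes
# prints: dhconfig dis add --name TradesDis --claim Trades --claim Quotes
```

Printing the command first keeps the claims visible for review, which matters because claims are exclusive: tailers send data for claimed tables only to this DIS.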

Step 3: Update cluster configuration

To add the new server to your cluster, you must define it as a new node in your cluster.cnf file. This tells the Deephaven installer about the new machine's hostname and what services it should run.

  1. Open your cluster.cnf file for editing.

  2. Add a new node definition for the DIS server (if you are adding a node). Increment the node number from your last node. For example, if you have 3 nodes, the new one will be DH_NODE_4_....

  3. Assign the DIS role. The essential role for this server is DH_NODE_4_ROLE_DIS="true". It is also common for such a node to run other supporting services like a Tailer, Table Data Cache Proxy (TDCP), or Log Aggregator Service (LAS), which are often enabled by default or as part of other roles. You will need DH_NODE_4_ROLE_MERGE="true" to perform local data merges.

The key settings in the cluster.cnf addition are:

  • DH_NODE_4_NAME: Set this to the short hostname of your new server.
  • DH_NODE_4_ROLE_DIS: This is the key setting that tells the installer to configure and enable a Data Import Server process on this node.
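
As an illustration, the settings above might appear in cluster.cnf as follows. This is a fragment under stated assumptions, not a complete node definition: it shows only the keys named in this guide, uses dh-cluster-dis-1 as an example hostname, and assumes the existing cluster has three nodes. Your node definition may need further DH_NODE_4_ settings that mirror your other nodes.

```shell
# Fragment for cluster.cnf: fourth node, dedicated to data import.

# Short hostname of the new server (example value).
DH_NODE_4_NAME="dh-cluster-dis-1"

# Enable a Data Import Server process on this node.
DH_NODE_4_ROLE_DIS="true"

# Include only if this node should perform local data merges.
DH_NODE_4_ROLE_MERGE="true"
```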

See the Cluster Configuration Guide for a complete list of all possible node roles.

Step 4: Configure the DIS process

This step configures how the db_dis process runs on the node. While Step 2 defined what data the DIS handles in the cluster's routing configuration, this step tells the actual DIS process on the node to use that configuration.

Make the following changes on the DIS node. If you are adding a role to an existing node, you may do this before running the installer.

  1. Edit the hostconfig file (/etc/sysconfig/illumon) and add EXTRA_ARGS as below. This tells the db_dis process to load the DIS configuration you created in Step 2.
  2. Run /usr/illumon/latest/bin/dh_monit restart db_dis.

Step 5: Run the installer

After updating cluster.cnf, run the Deephaven installer to apply the configuration to your cluster. The installer will SSH to the new node, install Deephaven components, and start the required services.

  1. From your installation host, navigate to your local installation directory ($DH_LOCAL_DIR).

  2. Generate the installation scripts based on your updated cluster.cnf.

  3. Run the master installation script.

The installer will connect to the new node (dh-cluster-dis-1 in our example), install the necessary packages, and start the db_dis process under dh_monit.

Step 6: Verify

Once the installer finishes, verify that the new DIS is running correctly.

  1. SSH into the new DIS server.

  2. Check dh_monit status to confirm the db_dis process is running:

    You should see db_dis in the process list with a status of Running.

  3. Check the logs for any startup errors in /var/log/deephaven/.

  4. Verify data flow. Ingest some new data into a Trades or Quotes table and confirm it appears in queries.

  5. Check tailer logs on relevant machines to see connection messages to the new DIS.
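
For the log check in item 3, a rough sketch using standard grep follows. The function name scan_dis_logs is hypothetical, and the search patterns ERROR and FATAL are assumptions about what a startup problem looks like in the logs, not documented Deephaven log levels; adjust them to match what you see in your own files.

```shell
#!/bin/sh
# Sketch: list log files under a directory that contain likely error lines.
# Usage: scan_dis_logs [log directory]
scan_dis_logs() {
    log_dir="${1:-/var/log/deephaven}"
    if [ ! -d "$log_dir" ]; then
        echo "no such directory: $log_dir" >&2
        return 2
    fi
    # -r: recurse, -l: print file names only, -E: extended regex.
    # The patterns are assumptions; grep exits 1 when nothing matches.
    grep -rlE 'ERROR|FATAL' "$log_dir"
}

# Example on the DIS host:
# scan_dis_logs /var/log/deephaven || echo "no matching lines found"
```

A file appearing in the output is only a pointer for manual review; open it and read the surrounding lines before concluding that the DIS failed to start.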