Add a Data Import Server

There are several reasons to add Data Import Servers (DIS) to a Deephaven system. Whenever a new data source is added, the system's data routing configuration must be updated. This guide first shows the simplest ("happy path") way to add an in-worker DIS that ingests specific tables, then covers more complex scenarios in Customize the new Data Import Server.

Note

See Data routing service configuration via YAML for a full explanation of the data routing format, and the dhconfig dis command for an explanation of additional DIS configuration.

How to configure a Deephaven ingester

The easy way

The data routing file must be configured so that Table Data Service (TDS) filtering uses (or is compatible with) claims filters, and "all DISes" are included with the dataImportServers source (typically in the td_tdcp TDS). See Update data routing configuration syntax for help transforming an older Deephaven installation.

The new DIS will use private storage: the DIS script must provide the on-disk storage location, and this location will not be directly accessible by workers.

The new DIS will use a dynamic endpoint. Clients of this service must have access to the Deephaven service registry, so this configuration is not suitable if you need to support clients that are "outside" the Deephaven cluster. The query worker running this DIS must have permission to register endpoints. See Authentication and dynamic routing.

A DIS created this way can be customized later, as described in the rest of this document.

Create the new Data Import Server

First, decide what tables and namespaces should be handled (exclusively) by this DIS. Claims can be made on entire namespaces, or specific tables. The example below will claim two namespaces and two tables. Ingester1 is a unique arbitrary identifier.

sudo -u irisadmin /usr/illumon/latest/bin/dhconfig dis add --name Ingester1 --claim NamespaceOne --claim NamespaceTwo --claim OtherNamespace.TableOne --claim OtherNamespace.TableTwo

This command creates a new DIS with the following configuration:

---
Ingester1:
  endpoint:
    serviceRegistry: registry
  userIntradayDirectoryName: Users
  throttleKbps: -1
  claims:
    - namespace: NamespaceTwo
    - namespace: NamespaceOne
    - namespace: OtherNamespace
      tableName: TableTwo
    - namespace: OtherNamespace
      tableName: TableOne
  storage: private

Customize the new Data Import Server

The process described above is sufficient for most purposes. The sections below describe the process for making changes to an existing DIS and some scenarios that require more complex configuration.

Edit a Data Import Server configuration

To change an existing DIS, export its configuration to a file, edit the file, and then import the changes. The YAML format is fully described in Data routing service configuration via YAML.

To export the configuration to a file:

/usr/illumon/latest/bin/dhconfig dis export --name Ingester1 --file /tmp/dis.yml

To import the updated file:

sudo -u irisadmin /usr/illumon/latest/bin/dhconfig dis import --file /tmp/dis.yml --force
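Because the import overwrites the stored configuration, it can help to keep a copy of the exported file and review a diff before importing. The following is a minimal sketch of that idea, not a dhconfig feature: the helper name has_changes and the backup path /tmp/dis.yml.orig are hypothetical, and only the dhconfig commands already shown above are used.

```shell
# has_changes ORIGINAL EDITED  (hypothetical helper, not part of dhconfig)
# Prints a unified diff of the two files.
# Returns 0 if the files differ, 1 if they are identical,
# and 2 if diff itself failed (for example, a missing file).
has_changes() {
  if diff -u "$1" "$2"; then
    echo "No changes to import." >&2
    return 1
  else
    rc=$?
  fi
  [ "$rc" -eq 1 ] && return 0
  return 2
}

# Intended use around the export and import commands:
#   /usr/illumon/latest/bin/dhconfig dis export --name Ingester1 --file /tmp/dis.yml
#   cp /tmp/dis.yml /tmp/dis.yml.orig
#   vi /tmp/dis.yml
#   has_changes /tmp/dis.yml.orig /tmp/dis.yml && \
#     sudo -u irisadmin /usr/illumon/latest/bin/dhconfig dis import --file /tmp/dis.yml --force
```

With this pattern the import runs only when the edited file actually differs from the export, and the diff is visible before the configuration is overwritten.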

Use static endpoints

If you want to use static addresses for the tailer and TDS connections to this DIS, you can specify them in the endpoint section. Edit the DIS configuration and change the endpoint section to include host and port values:

---
Ingester1:
  endpoint:
    serviceRegistry: none
    host: localhost
    tailerPort: 1111
    tableDataPort: 2222
  ...

serviceRegistry can be left as registry, in which case the DIS registers its static endpoints with the service registry.

Make the storage location public

Local storage (reading directly from the disk where the process is running) is the default method for accessing historical tables' data files. It is also the default method when reading data to be merged from intraday to historical. The table data service called "local" is generally the default configuration used for merge processes. The new storage can be configured into "local", or it can be accessed directly by another name. Making the storage public is optional if workers will never read this data directly from disk.

To edit the system’s data routing configuration, you will need to export the configuration to a file, edit the file, and then update the system.

/usr/illumon/latest/bin/dhconfig routing export --file /tmp/routing.yml

vi /tmp/routing.yml

sudo -u irisadmin /usr/illumon/latest/bin/dhconfig routing import --file /tmp/routing.yml

Add a named storage item:

Add a storage location for the ingester. In the example below, "Ingester1" is an arbitrary identifier naming this entry; it does not have to match the ingester's name. The dbRoot "/db/dataImportServers/Ingester1" is also arbitrary and not directly related to the identifier.

Note

Create the directory before importing the data routing file to avoid errors later.

  storage:
    ...
    - name: Ingester1
      dbRoot: /db/dataImportServers/Ingester1
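The directory named by dbRoot has to exist before the routing file is imported. A minimal sketch of creating it follows. The function name prepare_dis_storage is hypothetical, and the owning account is an assumption you should adjust: it depends on which worker type runs the DIS (see A note on storage).

```shell
# prepare_dis_storage DIR [OWNER]  (hypothetical helper)
# Creates DIR (and any missing parents) and makes sure its owner can
# read, write, and traverse it. If OWNER is given, ownership of DIR is
# changed to that account, which normally requires root privileges.
prepare_dis_storage() {
  dir="$1"
  owner="${2:-}"
  mkdir -p "$dir" || return 1
  if [ -n "$owner" ]; then
    chown "$owner" "$dir" || return 1
  fi
  chmod u+rwx "$dir"
}

# Example, run as a user allowed to create and chown under /db.
# The owner shown here assumes the DIS runs in a merge worker:
#   prepare_dis_storage /db/dataImportServers/Ingester1 dbmerge
```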

There are two ways to change the local storage routing to include the new storage location.

Modify local to include the original local and Ingester1:

This example:

  • creates a TDS named "ingester1" with the new storage location
  • renames the original "local" TDS to "localDefault"
  • combines the original "local" with the new "Ingester1" storage location, and names the combined service "local"

  tableDataServices:
    ...
    localDefault:
      storage: default
    ingester1:
      storage: Ingester1
    local:
      sources:
        - name: localDefault
        - name: ingester1

Default local and Ingester1 are separate table data services:

This example creates a TDS named "ingester1" with the new storage location, and leaves the "local" TDS unchanged.

  tableDataServices:
    ...
    local:
      storage: default
    ingester1:
      storage: Ingester1

Any table data services defined in this section will be available in the Persistent Query Configuration Editor's Merge Settings tab, under Table Data Service Configuration. Tags can be used to restrict which TableDataService entries are shown in the UI. See Tags and Descriptions in Data Routing Configuration.

Make the DIS use the new storage

Edit the DIS configuration to use the new storage location.

---
Ingester1:
  ...
  storage: Ingester1

Update claims

You might want to claim additional tables or namespaces, or remove claims that are no longer needed.

If no other changes have been made, you can use the dhconfig dis add command to replace the configuration, specifying the complete set of claims. The --force flag is required to overwrite an existing configuration.

sudo -u irisadmin /usr/illumon/latest/bin/dhconfig dis add --name Ingester1 --force --claim NamespaceToAdd --claim NamespaceOne --claim NamespaceTwo --claim OtherNamespace.TableOne --claim OtherNamespace.TableTwo
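Since the command replaces the whole configuration, every claim must be listed each time. One way to keep the list manageable is to build the --claim arguments from a single list, as in the sketch below. The variable names are hypothetical, and the script only prints the resulting command so that it can be reviewed before it is run.

```shell
# Assemble the full `dhconfig dis add` command from one list of claims.
# Variable names are illustrative; the command is printed, not executed.
DHCONFIG=/usr/illumon/latest/bin/dhconfig
DIS_NAME=Ingester1

claim_args=""
for claim in NamespaceToAdd NamespaceOne NamespaceTwo \
             OtherNamespace.TableOne OtherNamespace.TableTwo; do
  claim_args="$claim_args --claim $claim"
done

cmd="sudo -u irisadmin $DHCONFIG dis add --name $DIS_NAME --force$claim_args"
echo "$cmd"
```

The printed line is the same command as the one above; adding or removing a claim then means editing a single entry in the loop list.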

For any other changes (or if you prefer to edit the file directly), edit the DIS configuration and change the claims section, following the pattern in the example.

---
Ingester1:
  endpoint:
    serviceRegistry: registry
  userIntradayDirectoryName: Users
  throttleKbps: -1
  claims:
    - namespace: NamespaceToAdd
    - namespace: NamespaceTwo
    - namespace: NamespaceOne
    - namespace: OtherNamespace
      tableName: TableTwo
    - namespace: OtherNamespace
      tableName: TableOne
  storage: private
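The pattern in the claims section mirrors the --claim arguments: a bare namespace becomes a namespace entry, and Namespace.Table becomes a namespace entry with a tableName. The sketch below prints a claims section from command-style claims to illustrate that mapping. The function name claims_yaml is hypothetical, and the output is meant to be pasted into the exported file by hand, not imported directly.

```shell
# claims_yaml CLAIM...  (hypothetical helper)
# Prints a YAML claims section, indented as in the DIS configuration.
# "Namespace" yields a namespace-only claim; "Namespace.Table" is split
# at the first dot into namespace and tableName.
claims_yaml() {
  echo "  claims:"
  for claim in "$@"; do
    case "$claim" in
      *.*)
        echo "    - namespace: ${claim%%.*}"
        echo "      tableName: ${claim#*.}"
        ;;
      *)
        echo "    - namespace: $claim"
        ;;
    esac
  done
}

claims_yaml NamespaceToAdd OtherNamespace.TableOne
```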

A note on storage

The DIS process needs a location to write data. In all cases, the storage location must be created and provisioned with adequate space. The DIS expects exclusive control of this location, and other processes should not modify the data stored there. File ownership and permissions must allow the account that runs the DIS process to read and write files: dbmerge if it runs in a merge worker, and dbquery if it runs in a query worker.

Deephaven recommends the following location for ingester storage directories: /db/dataImportServers/[DIS_Name]/
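Before starting the DIS, a quick check of the directory can catch permission and space problems early. The following is a minimal sketch with a hypothetical function name, check_dis_storage. It tests access for the user who runs it, so it is only meaningful when run as the same account the DIS will use.

```shell
# check_dis_storage DIR  (hypothetical helper)
# Verifies that DIR exists and is readable and writable by the current
# user, then reports the space available on its filesystem.
check_dis_storage() {
  dir="$1"
  [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
  [ -r "$dir" ] && [ -w "$dir" ] || {
    echo "not readable and writable: $dir" >&2
    return 1
  }
  # df -Pk prints one POSIX-format line per filesystem; field 4 is the
  # available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo "$dir: ${avail_kb} KB available"
}

# Example:
#   check_dis_storage /db/dataImportServers/Ingester1
```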

If other Deephaven processes need to read this data directly (e.g., for merging), you must name the storage location and make it public. Otherwise, you can skip that configuration step and specify private storage. The recommended location is still valid for private storage, but the importer script must then provide the path, because it is not defined in the data routing configuration.