Add a Data Import Server
There are several reasons to add Data Import Servers (DIS) to a Deephaven system. Whenever new data sources are added, the system data routing configuration must be updated. This guide first shows the easy (happy path) way to add an in-worker DIS that ingests specific tables, then covers more complex scenarios in Customize the new Data Import Server.
Note
See Data routing service configuration via YAML for a full explanation of the data routing format, and the dhconfig dis command for an explanation of additional DIS configuration.
How to configure a Deephaven ingester
The easy way
The data routing file must be configured so that Table Data Service (TDS) filtering uses (or is compatible with) claims filters, and "all DISes" are included with the dataImportServers source (typically in the td_tdcp TDS).
See Update data routing configuration syntax for help transforming an older Deephaven installation.
The new DIS will use private storage: the DIS script must provide the on-disk storage location, and this location will not be directly accessible by workers.
The new DIS will use a dynamic endpoint. Clients of this service must have access to the Deephaven service registry, so this configuration is not suitable if you need to support clients that are "outside" the Deephaven cluster. The query worker running this DIS must have permission to register endpoints. See Authentication and dynamic routing.
A DIS created this way can be customized later, as described in the rest of this document.
Create the new Data Import Server
First, decide which tables and namespaces should be handled (exclusively) by this DIS. Claims can be made on entire namespaces or on specific tables. The example below claims two namespaces and two tables. Ingester1 is a unique arbitrary identifier.
sudo -u irisadmin /usr/illumon/latest/bin/dhconfig dis add --name Ingester1 --claim NamespaceOne --claim NamespaceTwo --claim OtherNamespace.TableOne --claim OtherNamespace.TableTwo
This command creates a new DIS with the following configuration:
---
Ingester1:
  endpoint:
    serviceRegistry: registry
  userIntradayDirectoryName: Users
  throttleKbps: -1
  claims:
    - namespace: NamespaceTwo
    - namespace: NamespaceOne
    - namespace: OtherNamespace
      tableName: TableTwo
    - namespace: OtherNamespace
      tableName: TableOne
  storage: private
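When the list of claims is long, it can help to assemble the command from a list instead of typing it by hand. The following is a minimal sketch, not part of Deephaven: build_dis_add_cmd is a hypothetical helper that only prints the command for review, and it uses only the --name and --claim flags shown above.

```shell
# Assumption: DHCONFIG points at the dhconfig script used in this guide.
DHCONFIG="${DHCONFIG:-/usr/illumon/latest/bin/dhconfig}"

# Hypothetical helper: print (do not run) a "dhconfig dis add" command.
# Usage: build_dis_add_cmd <dis-name> <claim>...
# Each claim is either Namespace or Namespace.TableName.
build_dis_add_cmd() {
  name="$1"
  shift
  cmd="sudo -u irisadmin $DHCONFIG dis add --name $name"
  for claim in "$@"; do
    cmd="$cmd --claim $claim"
  done
  printf '%s\n' "$cmd"
}

# Prints the same command as the example above.
build_dis_add_cmd Ingester1 NamespaceOne NamespaceTwo \
  OtherNamespace.TableOne OtherNamespace.TableTwo
```

Printing the command first makes it easy to check the claims before anything is written to the configuration.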
Customize the new Data Import Server
The process described above is sufficient for most purposes. The sections below describe the process for making changes to an existing DIS and some scenarios that require more complex configuration.
Edit a Data Import Server configuration
You need to export the configuration to a file, edit the file, and then import the changes. The YAML format is fully described in Data routing service configuration via YAML.
To export the configuration to a file:
/usr/illumon/latest/bin/dhconfig dis export --name Ingester1 --file /tmp/dis.yml
To import the updated file:
sudo -u irisadmin /usr/illumon/latest/bin/dhconfig dis import --file /tmp/dis.yml --force
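The export, edit, and import steps can be chained in one small wrapper. This is a sketch under stated assumptions rather than a Deephaven tool: edit_dis_config is a hypothetical name, the editor comes from $EDITOR (falling back to vi), and the import is skipped when the file was not changed. Only the export and import flags shown above are used.

```shell
# Assumption: DHCONFIG points at the dhconfig script used in this guide.
DHCONFIG="${DHCONFIG:-/usr/illumon/latest/bin/dhconfig}"

# Hypothetical helper: export a DIS configuration, edit it, import it if changed.
# Usage: edit_dis_config <dis-name> [file]
edit_dis_config() {
  name="$1"
  file="${2:-/tmp/dis.yml}"

  "$DHCONFIG" dis export --name "$name" --file "$file" || return 1
  cp "$file" "$file.orig"        # untouched copy, used to detect edits

  ${EDITOR:-vi} "$file"

  if cmp -s "$file" "$file.orig"; then
    echo "No changes; skipping import."
    return 0
  fi
  sudo -u irisadmin "$DHCONFIG" dis import --file "$file" --force
}

# Example (requires a Deephaven installation):
# edit_dis_config Ingester1
```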
Use static endpoints
If you want to use static addresses for the tailer and TDS connections to this DIS, you can specify them in the endpoint section.
Edit the DIS configuration and change the endpoint section to include host and port values:
---
Ingester1:
  endpoint:
    serviceRegistry: none
    host: localhost
    tailerPort: 1111
    tableDataPort: 2222
  ...
serviceRegistry can be left as registry, in which case the DIS registers its static endpoints with the service registry.
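Before importing an edited file, a quick check that the static endpoint keys are present can catch an incomplete edit. This is only a text-level sketch: check_static_endpoint is a hypothetical helper, it assumes the file uses the key names shown above, and it does not validate the YAML structure.

```shell
# Hypothetical helper: confirm that host, tailerPort, and tableDataPort
# each appear with a value somewhere in the exported DIS file.
# Usage: check_static_endpoint <file>
check_static_endpoint() {
  file="$1"
  for key in host tailerPort tableDataPort; do
    if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]]" "$file"; then
      echo "missing ${key} in ${file}" >&2
      return 1
    fi
  done
  echo "static endpoint keys present"
}

# Example:
# check_static_endpoint /tmp/dis.yml
```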
Make the storage location public
Local storage (reading directly from the disk where the process is running) is the default method for accessing historical tables' data files. It is also the default method when reading data to be merged from intraday to historical. The table data service called "local" is generally the default configuration used for merge processes. The new storage can be configured into "local", or it can be accessed directly by another name. Making the storage public is optional if workers will never read this data directly from disk.
To edit the system’s data routing configuration, you will need to export the configuration to a file, edit the file, and then update the system.
/usr/illumon/latest/bin/dhconfig routing export --file /tmp/routing.yml
vi /tmp/routing.yml
sudo -u irisadmin /usr/illumon/latest/bin/dhconfig routing import --file /tmp/routing.yml
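Because the edited file is imported back into the system, it is worth keeping a copy of the exported file to diff against before importing. A minimal sketch follows; backup_copy is a hypothetical helper and the timestamped .bak naming is an assumption, not a Deephaven convention.

```shell
# Hypothetical helper: copy a file to <file>.<timestamp>.bak and print the new path.
# Usage: backup_copy <file>
backup_copy() {
  src="$1"
  dest="$src.$(date +%Y%m%d%H%M%S).bak"
  cp -p "$src" "$dest" || return 1
  printf '%s\n' "$dest"
}

# Typical use (commented out because it needs a Deephaven installation):
# /usr/illumon/latest/bin/dhconfig routing export --file /tmp/routing.yml
# orig="$(backup_copy /tmp/routing.yml)"
# vi /tmp/routing.yml
# diff -u "$orig" /tmp/routing.yml    # review the changes before importing
# sudo -u irisadmin /usr/illumon/latest/bin/dhconfig routing import --file /tmp/routing.yml
```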
Add a named storage item:
Add a storage location for the ingester. In the example below, "Ingester1" is an arbitrary identifier naming this entry; it does not have to match the ingester's name. The dbRoot "/db/dataImportServers/Ingester1" is also arbitrary and not directly related to the identifier.
Note
Create the directory before importing the data routing file to avoid errors later.
storage:
  ...
  - name: Ingester1
    dbRoot: /db/dataImportServers/Ingester1
There are two ways to change the local storage routing to include the new storage location.
Modify local to include the original local and Ingester1:
This example:
- creates a TDS named "ingester1" with the new storage location
- renames the original "local" TDS to "localDefault"
- combines the original "local" with the new "Ingester1" storage location, and names the combined service "local"
tableDataServices:
  ...
  localDefault:
    storage: default
  ingester1:
    storage: Ingester1
  local:
    sources:
      - name: localDefault
      - name: ingester1
Default local and Ingester1 are separate table data services:
This example creates a TDS named "ingester1" with the new storage location, and leaves the "local" TDS unchanged.
tableDataServices:
  ...
  local:
    storage: default
  ingester1:
    storage: Ingester1
Any table data services defined in this section will be available in the Persistent Query Configuration Editor's Merge Settings tab, under Table Data Service Configuration. Tags can be used to restrict which TableDataService entries are shown in the UI. See Tags and Descriptions in Data Routing Configuration.
Make the DIS use the new storage
Edit the DIS configuration to use the new storage location.
---
Ingester1:
  ...
  storage: Ingester1
Update claims
You might want to claim additional tables or namespaces, or remove claims that are no longer needed.
If no other changes have been made, you can use the dhconfig dis add command to replace the configuration, specifying the complete set of claims. The --force flag is required to overwrite an existing configuration.
sudo -u irisadmin /usr/illumon/latest/bin/dhconfig dis add --name Ingester1 --force --claim NamespaceToAdd --claim NamespaceOne --claim NamespaceTwo --claim OtherNamespace.TableOne --claim OtherNamespace.TableTwo
For any other changes (or if you prefer to edit the file directly), edit the DIS configuration and change the claims section, following the pattern in the example.
---
Ingester1:
  endpoint:
    serviceRegistry: registry
  userIntradayDirectoryName: Users
  throttleKbps: -1
  claims:
    - namespace: NamespaceToAdd
    - namespace: NamespaceTwo
    - namespace: NamespaceOne
    - namespace: OtherNamespace
      tableName: TableTwo
    - namespace: OtherNamespace
      tableName: TableOne
  storage: private
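Since replacing the configuration requires the complete set of claims, it helps to list the claims currently in an exported file before adding or removing any. The sketch below is a hypothetical helper, list_claims, which assumes the exported file follows the layout shown in this guide (a namespace entry optionally followed by a tableName line). It is a line-based scan, not a YAML parser.

```shell
# Hypothetical helper: print one claim per line from an exported DIS file,
# as Namespace or Namespace.TableName (the form accepted by --claim).
# Usage: list_claims <file>
list_claims() {
  awk '
    # A new "- namespace:" entry: flush any pending namespace-only claim.
    $1 == "-" && $2 == "namespace:" { if (ns != "") print ns; ns = $3; next }
    # A tableName line narrows the pending namespace to a single table.
    $1 == "tableName:" && ns != ""  { print ns "." $2; ns = ""; next }
    END { if (ns != "") print ns }
  ' "$1"
}

# Example (after: dhconfig dis export --name Ingester1 --file /tmp/dis.yml):
# list_claims /tmp/dis.yml
```

Each output line can be passed back as a --claim argument, together with any new claims.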
A note on storage
The DIS process needs a location to write data. In all cases, the storage location must be created and provisioned with adequate space. The DIS expects to own this location exclusively: no other process should modify the data stored there.
File ownership and permissions must allow the DIS process to read and write files. The relevant account is dbmerge if the DIS runs in a merge worker, and dbquery if it runs in a query worker.
Deephaven recommends the following location for ingester storage directories:
/db/dataImportServers/[DIS_Name]/
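Creating the directory and setting its owner might look like the following sketch. ensure_dis_storage is a hypothetical helper, not a Deephaven command. It assumes GNU stat (as found on typical Linux hosts), and changing ownership generally requires root.

```shell
# Hypothetical helper: create the storage directory and make sure the
# given account owns it.
# Usage: ensure_dis_storage <directory> <owner>
#   owner: dbmerge for a merge worker, dbquery for a query worker
ensure_dis_storage() {
  dir="$1"
  owner="$2"
  mkdir -p "$dir" || return 1
  # Only call chown when the owner differs (chown normally needs root).
  if [ "$(stat -c %U "$dir")" != "$owner" ]; then
    chown "$owner" "$dir" || return 1
  fi
  echo "ready: $dir"
}

# Example (run as root):
# ensure_dis_storage /db/dataImportServers/Ingester1 dbmerge
```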
If other Deephaven processes need to read this data directly (e.g., for merging), you must name the storage location and make it public. Otherwise, you can skip that configuration step and specify private storage. The recommended location is still valid, but it must be configured in the importer script some other way.