Data routing configuration

There are many ways to configure the storage, ingestion, and retrieval of data in Deephaven. The data routing configuration is the information governing the locations, servers, and services that handle data. Like most Deephaven configuration, this routing information is stored in etcd and accessed via the Configuration Service.

What is Data Routing?

A Deephaven deployment is composed of one or more machines (or other deployment environments, such as Kubernetes), services, data providers, and storage locations. Each of these entities is responsible for serving or processing a subset of the available data.

The data routing configuration defines the topology of the system for data, dictating:

Where all live/intraday and historical data is located.
Where the Tailer will send data for any given table (which will be one or more Data Import Servers).
Where query workers and other processes will send internal logging.
Where query workers will send central/live user data for processing.
What data each Data Import Server (DIS) will accept for processing and storage.
What shared file storage is available for query workers to read historical data from, and what tables are contained in those locations.
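As a rough illustration of the routing idea in the list above, the following Python sketch models how a table might be matched to the Data Import Servers that accept it. It is not Deephaven's implementation or its configuration format: the `ImportServer` class, the `route_table` function, the server names, and the filter rules are all hypothetical and exist only to make the concept concrete.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImportServer:
    """Hypothetical stand-in for one Data Import Server (DIS) routing entry."""
    name: str
    # Filter deciding which tables this server accepts: (namespace, table_name) -> bool
    accepts: Callable[[str, str], bool]


def route_table(servers: List[ImportServer], namespace: str, table_name: str) -> List[str]:
    """Return the names of every server whose filter accepts the given table."""
    return [s.name for s in servers if s.accepts(namespace, table_name)]


# Hypothetical topology: one DIS dedicated to a "Market" namespace,
# and one catch-all DIS for every other namespace.
SERVERS = [
    ImportServer("market_dis", lambda ns, tbl: ns == "Market"),
    ImportServer("default_dis", lambda ns, tbl: ns != "Market"),
]
```

In this sketch a table can match more than one server, which mirrors the point above that a Tailer may send a table's data to one or more Data Import Servers.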

What is the data routing configuration?

The data routing configuration is a collection of YAML files stored in etcd. These files are accessed by the Data Routing Service, which is hosted in the Configuration Server. Data Import Servers are managed with the dhconfig dis command, and the rest of the configuration is managed with dhconfig routing.

When will the data routing configuration need to be changed?

A default data routing configuration is created when Deephaven is installed. From then on, the configuration is owned by the customer, and Deephaven cannot update it during later system upgrades. Check the Deephaven release notes for each upgrade to see whether changes to the configuration are necessary.

It is very common to create a Persistent Query that ingests, stores, and serves certain data. This requires changes to the data routing configuration so that all parts of the system handle the data correctly. The dhconfig dis command makes this change convenient.

As a Deephaven system grows, scaling and reliability concerns can be addressed with a more complex data routing topology, for example:

Partitioning data, so that it is stored and/or served by separate processes.
Adding duplicate storage locations or failover groups for data recovery or redundancy.
Sending data off-system for monitoring.

These changes require edits to the data routing configuration.
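To make the redundancy idea concrete, here is a minimal Python sketch of a failover group, under the assumption that writers send data to every member of the group and readers use the first member that is currently healthy. The group members, the `write_targets` and `read_source` functions, and the health map are hypothetical; the sketch is not Deephaven's configuration syntax or its actual failover logic.

```python
from typing import Dict, List, Optional

# Hypothetical failover group: two servers intended to hold the same data.
FAILOVER_GROUP: List[str] = ["dis_primary", "dis_backup"]


def write_targets(group: List[str]) -> List[str]:
    """Assumed write policy: send to every member so each holds a full copy."""
    return list(group)


def read_source(group: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Assumed read policy: use the first member marked healthy, or None if none is."""
    for name in group:
        if healthy.get(name, False):
            return name
    return None
```

For example, if `dis_primary` is marked unhealthy and `dis_backup` is healthy, `read_source` returns `dis_backup`, while `write_targets` still lists both members.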