Data routing configuration
There are many ways to configure the storage, ingestion, and retrieval of data in Deephaven. The data routing configuration is the information governing the locations, servers, and services that handle data. Like most Deephaven configuration, this routing information is stored in etcd and accessed via the Configuration Service.
What is Data Routing?
A Deephaven deployment is composed of one or more machines (or other deployments such as Kubernetes), services, data providers, and storage locations. Each of these entities is responsible for serving or processing a subset of the data available.
The data routing configuration defines the topology of the system for data, dictating:
- Where all live/intraday and historical data is located.
- Where the Tailer will send data for any given table (which will be one or more Data Import Servers).
- Where query workers and other processes will send internal logging.
- Where query workers will send central/live user data for processing.
- What data each Data Import Server (DIS) will accept for processing and storage.
- What shared file storage is available for query workers to read historical data from, and what tables are contained in those locations.
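As a rough illustration of the second and fifth points above, the sketch below models how a table could be matched to one or more Data Import Servers by each server's acceptance filter. It is a conceptual toy, not Deephaven code: `TableKey`, `ImportServer`, `tailer_targets`, and the server names are all hypothetical, and the real routing rules live in the YAML configuration rather than in Python.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model: none of these names are Deephaven APIs or config keys.


@dataclass(frozen=True)
class TableKey:
    namespace: str
    table_name: str


@dataclass
class ImportServer:
    name: str
    # The filter decides which tables this import server accepts.
    accepts: Callable[[TableKey], bool]


def tailer_targets(table: TableKey, servers: List[ImportServer]) -> List[str]:
    """Return the name of every import server whose filter accepts the table."""
    return [s.name for s in servers if s.accepts(table)]


# Made-up topology: one general server, one dedicated server, one catch-all copy.
servers = [
    ImportServer("dis_main", lambda t: t.namespace != "Orders"),
    ImportServer("dis_orders", lambda t: t.namespace == "Orders"),
    ImportServer("dis_backup", lambda t: True),
]

orders_targets = tailer_targets(TableKey("Orders", "Fills"), servers)
audit_targets = tailer_targets(TableKey("Audit", "Events"), servers)
print(orders_targets)
print(audit_targets)
```

In this toy topology a table can resolve to more than one server, which mirrors the "one or more Data Import Servers" wording above.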
What is the data routing configuration?
The data routing configuration is a collection of YAML files stored in etcd. These files are accessed by the Data Routing Service, which is hosted in the Configuration Server. Data import servers are managed with dhconfig dis, and the rest of the configuration with dhconfig routing.
When will the data routing configuration need to be changed?
A default data routing configuration is created when Deephaven is installed. The configuration is thereafter owned by the customer, and Deephaven cannot update it later when updating the system. An upgrade might still require changes; the Deephaven release notes for that upgrade describe any that are needed.
It is very common to create a Persistent Query that ingests certain data and handles storing and serving it. This requires changes to the data routing configuration so that all parts of the system will handle the data correctly.
Deephaven makes this change very convenient via the dhconfig dis command.
As a Deephaven system grows, scaling and reliability concerns can be addressed by adopting a more complex data routing topology, such as:
- Partitioning data, so that it is stored and/or served by separate processes.
- Adding duplicate storage locations or failover groups for data recovery or redundancy.
- Sending data off-system for monitoring.
These changes require edits to the data routing configuration.
Data routing configuration changes are dynamic
When you change the configuration, many processes incorporate the changes immediately. When the routing configuration is changed in any way:
- The Table Data Cache Proxy recreates its TableDataService.
- Tailers restart all Data Import Server connections. This might change where data is sent.
- A Data Import Server replaces its configuration if the only changes are filtering (defining the data accepted and served). If the DIS configuration changes in other ways, the DIS will start logging errors until it is restarted or the configuration is corrected.
- Query workers don’t automatically see data routing changes. However, live data generally comes from the local TDCP, and that process does respond to changes dynamically. The most common case - adding an import server (DIS) for a new table - is seamlessly handled.
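The Data Import Server rule above (filter-only changes are applied in place, anything else needs a restart or a corrected configuration) can be pictured as a simple comparison of the old and new configuration. This is only a sketch of the decision, assuming a flat dictionary representation; the key names `filter`, `storage`, and `port`, the values, and the function `classify_dis_change` are invented for illustration and are not the real configuration schema or any Deephaven API.

```python
from typing import Any, Dict

# Hypothetical: "filter" stands in for whatever keys define the data a DIS
# accepts and serves. It is not the real configuration schema.
FILTER_KEYS = {"filter"}
_MISSING = object()


def classify_dis_change(old: Dict[str, Any], new: Dict[str, Any]) -> str:
    """Classify a change as 'none', 'hot-reload', or 'restart-required'."""
    changed = {
        key
        for key in old.keys() | new.keys()
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    }
    if not changed:
        return "none"
    if changed <= FILTER_KEYS:
        # Only the filtering changed, so the running process can swap it in.
        return "hot-reload"
    # Anything else corresponds to the "logs errors until restarted" case.
    return "restart-required"


# Made-up configuration values for illustration only.
old_cfg = {"filter": "namespace == 'Orders'", "storage": "/example/a", "port": 1}

filter_only = classify_dis_change(old_cfg, {**old_cfg, "filter": "namespace == 'Fills'"})
storage_changed = classify_dis_change(old_cfg, {**old_cfg, "storage": "/example/b"})
unchanged = classify_dis_change(old_cfg, dict(old_cfg))
print(filter_only, storage_changed, unchanged)
```

The point of the sketch is the subset test: a change is safe to apply dynamically only when every differing key is a filtering key.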