Dynamic data routing and endpoint configuration in YAML
The Deephaven data routing configuration specifies and configures the services involved in moving data in the system. This configuration contains "endpoints" (an address, or host and port for a service). Specifying these endpoints can be painful, especially when using in-worker services, containers, or dynamic provisioning (e.g., Kubernetes).
Dynamic data routing delays decisions about addresses and ports until runtime, by using a registration service. This allows services to choose addresses later, according to information from sources other than the static configuration file. Actual service endpoints are registered when available, and consumers look them up at runtime.
Dynamic data routing requires some new syntax in the data routing YAML file. The old syntax (defining static endpoints) is still supported, so no file changes are required. The new endpoint syntax supports both static and dynamic endpoints, and allows free-form configuration data to be passed to services.
To avoid confusion, the two formats cannot be mixed within a single service definition. Each service must be given either an endpoint section or the old host/port tags, but not both.
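As a rough illustration of that rule, the sketch below shows one service in each format. The service names db_dis_new and db_dis_legacy and the port values are made up for this example, and other keys a real service would need (storage, filters, and so on) are left out.

```yaml
dataImportServers:
  # New format: all address information lives under "endpoint"
  db_dis_new:              # hypothetical service name
    endpoint:
      host: localhost
      tailerPort: 22021    # illustrative port values
      tableDataPort: 22015
  # Legacy format: host/port tags sit directly on the service
  db_dis_legacy:           # hypothetical service name
    host: localhost
    tailerPort: 22031
    tableDataPort: 22025
  # Not allowed: an "endpoint" section together with legacy host/port
  # tags on the same service.
```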
Note
A consumer might request an endpoint when the service is not running. In this case, a NotReadyException (a type of IOException) will be thrown. Any attempt to resolve a dynamic endpoint could result in a NotReadyException.
Authentication and dynamic routing
When an endpoint is configured for dynamic routing, server and client must interact with the process registry service to resolve the actual endpoint address.
Clients
Any process may read process registration data from the process registry service. Authentication is not required.
Servers
Any process that registers endpoints (writes process registration data) must be authenticated, with a user context configured for service registry writing. Deephaven workers are always authenticated, so in-worker ingesters (Data Import Servers) always have a user context that can be configured for write access. Standalone processes, including TDCP and Log Aggregator, might not be configured with authentication. These processes will need to be configured with an authentication key in the hostconfig file. See Database authentication for more information.
The authenticated user must be in a group configured for service registry writing. The superusers group (iris-superusers by default) always has write permission. Additional groups can be given permission with the following property, which must be visible to the configuration server:
ServiceRegistry.writers=group1,group2
In-worker ingestion processes typically run on a merge server as a Live Script. By default, merge servers have more restricted access than query servers based on the RemoteQueryDispatcher.allowedGroups property. The service.name=dbmerge stanza in iris-defaults.prop sets this to iris-schemamanagers, but administrators may override it in another property file.
An in-worker ingestion process can also be run with configuration type In-Worker Service and service type Data Import Server. In this case, the property InWorkerService.dis.allowedGroups must be set to allow the authenticated user to run this type of query.
Endpoint syntax
An "endpoint" represents a service address (host and port). A given service might host multiple endpoints. An endpoint may be static or dynamic:
- "Static" means the address is fully configured in the data routing file.
- "Dynamic" means the server will register the endpoint at runtime when it is available, and clients will look up the address as needed at runtime.
Tags
The sample YAML block below highlights the tags that capture a simple endpoint:
endpoint:
  serviceRegistry: none
  host: localhost
  port: 20222
endpoint: [map]
- This defines all the endpoints (host and ports) supported by this service configuration.
serviceRegistry: [String]
- (Optional) Valid values are none and registry. none is the default value and indicates a static configuration. registry indicates that the configured endpoints will be registered and retrieved at runtime.
host: [String]
- Optional if serviceRegistry is registry. If not provided, the server must be able to determine its address.
port: [int]
- Optional if serviceRegistry is registry. If not provided, the server must be able to determine what port to use.
Any additional tags provided under endpoint will be passed to the server. These values can be used to help the service choose how to select ports and register its endpoints.
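A minimal sketch of a dynamic endpoint that carries an extra tag. The service name, the storage name, and the tag portHint are all hypothetical, chosen only to show where free-form data would go; the services that ship with Deephaven may not recognize such a tag.

```yaml
dataImportServers:
  MyIngester:                    # hypothetical in-worker DIS
    storage: myIngesterStorage   # hypothetical storage name
    endpoint:
      serviceRegistry: registry  # host and ports are chosen and registered at runtime
      portHint: 23000            # hypothetical free-form tag, handed to the server
```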
A Data Import Server (DIS) hosts multiple services, all of which are optional. The following additional tags are used by endpoints in the dataImportServers section:
tailerPortDisabled: [boolean]
- (Optional) If present and true, the DIS will not accept tailed data (or commands).
tableDataPortDisabled: [boolean]
- (Optional) If present and true, the DIS will not start or register a table data service.
tailerPort: [int]
- (Optional) This indicates the port for tailing data. -1 implies tailerPortDisabled: true.
tableDataPort: [int]
- (Optional) This indicates the port where the Table Data Service will be hosted. -1 implies tableDataPortDisabled: true.
tableDataPortEnabled: [boolean]
- Deprecated. Use tableDataPortDisabled where needed.
tailerPortEnabled: [boolean]
- Deprecated. Use tailerPortDisabled where needed.
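Based on the tag descriptions above, the two hypothetical services in this sketch should behave the same way with respect to tailing: one disables it explicitly, the other does so by setting the port to -1. The service and storage names are invented for the example.

```yaml
dataImportServers:
  IngesterA:                     # hypothetical service name
    storage: ingesterAStorage    # hypothetical storage name
    endpoint:
      serviceRegistry: registry
      tailerPortDisabled: true   # explicit: no tailed data accepted
  IngesterB:                     # hypothetical service name
    storage: ingesterBStorage    # hypothetical storage name
    endpoint:
      serviceRegistry: registry
      tailerPort: -1             # implies tailerPortDisabled: true
```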
serviceRegistry
When serviceRegistry is omitted, or given as serviceRegistry: none, the configuration is static, and all configuration required by the server and clients must be present. host and port are required for endpoints in the logAggregatorServers and tableDataServices sections. In dataImportServers sections, host is required, and the tailer and TDS ports must be disabled or given values.
When serviceRegistry is given as serviceRegistry: registry, the configuration is dynamic. The configured service will register actual values for its endpoint or endpoints, and clients will look the addresses up at runtime. The configuration values may be configured statically, and they will still be registered at runtime.
The hostname that a Data Import Server (DIS) registers with the service registry may be defined in the host tag within the DIS's routing endpoint or by using the ServiceRegistry.overrideHostname system property. The service registry host is taken from the first of the following that applies:
- The routing endpoint configuration.
- The ServiceRegistry.overrideHostname property, which may be set globally, per-host in an appropriate configuration stanza, or at the query level with an Extra JVM Argument.
- On Kubernetes, the worker's service's hostname.
- On bare metal, the result of the Java InetAddress.getLocalHost().getHostName() function.
Most Deephaven services will choose an ephemeral port if one is not provided.
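The sketch below combines these points for a single dynamic DIS endpoint: the host is pinned in the routing file, one port is fixed, and the other is left for the service to choose. The service name, storage name, hostname, and port number are placeholders, not values from a real installation.

```yaml
dataImportServers:
  PinnedHostIngester:              # hypothetical service name
    storage: pinnedStorage         # hypothetical storage name
    endpoint:
      serviceRegistry: registry
      host: dis-host.example.com   # placeholder; the routing file takes precedence
                                   # over ServiceRegistry.overrideHostname
      tableDataPort: 22223         # illustrative fixed port, still registered at runtime
      # tailerPort is omitted, so the service is expected to pick a port
      # (typically an ephemeral one) and register it
```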
Aliased Maps
Default data routing files provided by Deephaven include some aliased maps (e.g., <<: *DIS-default) for convenience. This YAML syntax copies all the key-value pairs from the template into the target location. If this aliased map contains tags involved in the endpoint syntax change, it will cause parsing errors.
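One plausible way this goes wrong is sketched below. It assumes the anchors *default-tailerPort, *default-tableDataPort, and *ddl_dis are defined elsewhere in the file, as in the default configuration, and it trims the template to two keys for brevity.

```yaml
DIS-default:
  - &DIS-default
    tailerPort: *default-tailerPort   # legacy tag inside the template
    storage: default
dataImportServers:
  db_dis:
    <<: *DIS-default     # copies tailerPort and storage into db_dis
    endpoint:            # new-format section on the same service
      host: *ddl_dis
      tableDataPort: *default-tableDataPort
    # After the merge, db_dis carries both a legacy tailerPort tag and an
    # endpoint section, a combination the two formats do not allow.
```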
Deephaven recommends that the legacy tags (tailerPort, tableDataPort, etc.) be removed from the aliased map, or that the aliased map be removed entirely.
For example:
Before:
Default data routing configuration files might look something like the following. Some comments are removed for clarity.
DIS-default:
  - &DIS-default
    tailerPort: *default-tailerPort
    throttleKbps: -1
    userIntradayDirectoryName: "IntradayUser"
    storage: default
    tableDataPort: *default-tableDataPort
    # if a DIS instance adds a properties key, it will REPLACE this one. Add another anchor so they can be added back in.
    properties: &DIS-defaultProperties
      StringCacheHint.tableNameAndColumnNameEquals_QuoteArcaStock/Exchange: ConcurrentUnboundedStringCache,String,1
      StringCacheHint.columnNameEquals_USym: ConcurrentUnboundedStringCache,String,20000
# ...
dataImportServers:
  # The primary data import server
  db_dis:
    # import the default values
    <<: *DIS-default
    host: *dh-import # reference the address defined above for "dh-import"
    userIntradayDirectoryName: "IntradayUser"
    webServerParameters:
      enabled: true
      port: 8086
      authenticationRequired: false
      sslRequired: false
    filters:
      - {namespaceSet: System, online: true}
    properties:
      # re-add the default properties (omit the <<: if you don't want the defaults)
      <<: *DIS-defaultProperties
      ## any additional properties for this DIS
After:
The "Before" example, modified to remove legacy endpoint tags and the imported defaults.
DIS-default:
  - &DIS-default
    properties: &DIS-defaultProperties
      StringCacheHint.tableNameAndColumnNameEquals_QuoteArcaStock/Exchange: ConcurrentUnboundedStringCache,String,1
      StringCacheHint.columnNameEquals_USym: ConcurrentUnboundedStringCache,String,20000
# ...
dataImportServers:
  db_dis:
    throttleKbps: -1
    userIntradayDirectoryName: "IntradayUser"
    storage: default
    endpoint:
      host: *ddl_dis
      tailerPort: *default-tailerPort
      tableDataPort: *default-tableDataPort
    webServerParameters:
      enabled: true
      port: 8086
      authenticationRequired: false
      sslRequired: false
    filters:
      - {namespaceSet: System, online: true}
    properties:
      # re-add the default properties (omit the <<: if you don't want the defaults)
      <<: *DIS-defaultProperties
Example endpoint configurations
Static and dynamic endpoints
The YAML examples below illustrate static and dynamic endpoint configurations. Using dynamic endpoints for in-worker Data Import Servers is recommended.
Data import servers
In the following example, db_dis has static endpoints. The SimpleLastBy and Kafka ingesters have dynamic endpoints. Tailing is disabled for the KafkaImporter.
dataImportServers:
  db_dis:
    throttleKbps: -1
    userIntradayDirectoryName: "IntradayUser"
    storage: default
    # static endpoints
    endpoint:
      host: *ddl_dis
      tailerPort: *default-tailerPort
      tableDataPort: *default-tableDataPort
    filters: {whereTableKey: "NamespaceSet = `System` && Online"}
    webServerParameters:
      enabled: true
      port: 8083
  # Last-By DIS that uses dynamic endpoints
  SimpleLastBy:
    storage: lastByStorage
    endpoint:
      serviceRegistry: registry
    claims: {namespace: LastByNamespace}
  # Kafka DIS that does not accept tailed data
  KafkaImporter:
    storage: kafkaStorage
    endpoint:
      serviceRegistry: registry
      tailerPortDisabled: true
    claims: {namespace: Kafka}
Log aggregators
The rta log aggregator is configured with a static endpoint in the following example:
logAggregatorServers: !!omap # need an ordered map for precedence, to allow the filters to overlap
  # rta static endpoint
  - rta:
      endpoint:
        port: *default-lasPort
        host: *localhost
      filters:
        - namespaceSet: User
The equivalent configuration with a dynamic endpoint:
logAggregatorServers: !!omap
  # need an ordered map for precedence, to allow the filters to overlap
  # rta dynamic endpoint
  - rta:
      endpoint:
        serviceRegistry: registry
      filters:
        - namespaceSet: User
Table data services
Endpoints are used in tableDataServices only when defining a remote Table Data Service provider, such as the table data cache proxy (TDCP) or a local table data service (LTDS).
This example shows the LTDS configured with a static endpoint:
tableDataServices:
  # Local Table Data Service with static configuration
  db_ltds:
    endpoint:
      host: *iris-dis
      port: *default-localTableDataPort
This example shows the LTDS configured with a dynamic endpoint:
tableDataServices:
  # Local Table Data Service with dynamic configuration
  db_ltds:
    endpoint:
      serviceRegistry: registry
Static endpoints (legacy format)
The example below uses the legacy format for all endpoints. Any endpoints using this format will need to be converted before some new syntax features can be used (e.g., claims).
# Data Import Servers
dataImportServers:
  # db_dis represented in Legacy Format
  db_dis:
    <<: *DIS_default # import all defaults from the DIS_default section
    host: *ddl_dis
    tailerPort: *default-tailerPort
    tableDataPort: *default-tableDataPort
    filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `Order`"}
    ...
  # Last By DIS in Legacy Format
  SimpleLastBy:
    host: *dh-import
    tailerPort: 22222
    tableDataPort: 22223
    ...
# Log Aggregator Servers
logAggregatorServers: !!omap # need an ordered map for precedence, to allow the filters to overlap
  # rta in Legacy Format
  - rta:
      port: *default-lasPort
      host: *localhost
      filters:
        - namespaceSet: User
# Table Data Services
tableDataServices:
  # Local Table Data Service in Legacy Format
  db_ltds:
    host: *iris-dis
    port: *default-localTableDataPort
    ...