Dynamic data routing and endpoint configuration in YAML
The Deephaven data routing configuration specifies and configures the services involved in moving data in the system. This configuration contains "endpoints" (an address, or host and port for a service). Specifying these endpoints can be painful, especially when using in-worker services, containers, or dynamic provisioning (e.g., Kubernetes).
Dynamic data routing delays decisions about addresses and ports until runtime, by using a registration service. This allows services to choose addresses later, according to information from sources other than the static configuration file. Actual service endpoints are registered when available, and consumers look them up at runtime.
Dynamic data routing requires some new syntax in the data routing YAML file. The old syntax, which defines static endpoints, is still supported, so existing files do not need to change. The new endpoint syntax supports both static and dynamic endpoints, and allows free-form configuration data to be passed to services.
To avoid confusion, the new format cannot be mixed with the old format within a service. Any given service must be given either an endpoint section or the old host/port tags, but not both.
Note
A consumer might request an endpoint when the service is not running. In this case, a NotReadyException (a type of IOException) will be thrown. Any attempt to resolve a dynamic endpoint could result in a NotReadyException.
Authentication and dynamic routing
When an endpoint is configured for dynamic routing, server and client must interact with the process registry service to resolve the actual endpoint address.
Clients
Any process may read process registration data from the process registry service. Authentication is not required.
Servers
Any process that registers endpoints (writes process registration data) must be authenticated, with a user context configured for service registry writing. Deephaven workers are always authenticated, so in-worker ingesters (Data Import Servers) always have a user context that can be configured for write access. Standalone processes, including TDCP and Log Aggregator, might not be configured with authentication. These processes will need to be configured with an authentication key in the hostconfig file. See Database authentication for more information.
The authenticated user must be in a group configured for service registry writing. The superusers group (iris-superusers by default) always has write permission. Additional groups can be given permission with the following property, which must be visible to the configuration server:
Note
The authenticated user is the user that authenticated to the system to initiate the operation. The effective user is the user that Deephaven enforces permission checks for.
A simple example is the "Operate as" login, where the user who logs in (via password, SAML, etc.) is the Authenticated User, and the user they are operating as is the Effective User.
In-worker ingestion processes typically run on a merge server as a Live Script. By default, merge servers have more restricted access than query servers based on the RemoteQueryDispatcher.allowedGroups property. The service.name=dbmerge stanza in iris-defaults.prop sets this to iris-schemamanagers, but administrators may override it in another property file.
An in-worker ingestion process can also be run with configuration type In-Worker Service and service type Data Import Server. In this case, the InWorkerService.dis.allowedGroups property must be set to allow the authenticated user to run this type of query.
Endpoint syntax
An "endpoint" represents a service address (host and port). A given service might host multiple endpoints. An endpoint may be static or dynamic:
- "Static" means the address is fully configured in the data routing file.
- "Dynamic" means the server will register the endpoint at runtime when it is available, and clients will look up the address as needed at runtime.
Tags
The sample YAML block below highlights the tags that capture a simple endpoint:
- endpoint: [map] - This defines all the endpoints (host and ports) supported by this service configuration.
  - serviceRegistry: [String] - (Optional) Valid values are none and registry. none is the default value and indicates a static configuration. registry indicates that the configured endpoints will be registered and retrieved at runtime.
  - host: [String] - Optional if serviceRegistry is registry. If not provided, the server must be able to determine its address.
  - port: [int] - Optional if serviceRegistry is registry. If not provided, the server must be able to determine what port to use.
Any additional tags provided under endpoint will be passed to the server. These values can be used to help the service choose how to select ports and register its endpoints.
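As a minimal sketch of these tags, the fragment below shows one static and one dynamic endpoint. The service names, hostname, port, and the extra tag myServiceHint are hypothetical placeholders, not values from a real routing file, and only the endpoint section of each service is shown.

```yaml
# Hypothetical services; only the endpoint section is shown.
staticService:
  endpoint:
    serviceRegistry: none            # default value; may be omitted
    host: service-host.example.com   # placeholder hostname
    port: 12345                      # placeholder port
dynamicService:
  endpoint:
    serviceRegistry: registry        # endpoint is registered and looked up at runtime
    # host and port are omitted, so the server must determine them itself
    myServiceHint: some-value        # hypothetical extra tag, passed through to the server
```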
A Data Import Server (DIS) hosts multiple services, all of which are optional. The following additional tags are used by endpoints in the dataImportServers section:
- tailerPortDisabled: [boolean] - (Optional) If present and true, the DIS will not accept tailed data (or commands).
- tableDataPortDisabled: [boolean] - (Optional) If present and true, the DIS will not start or register a table data service.
- tailerPort: [int] - (Optional) This indicates the port for tailing data. -1 implies tailerPortDisabled: true.
- tableDataPort: [int] - (Optional) This indicates the port where the Table Data Service will be hosted. -1 implies tableDataPortDisabled: true.
- tableDataPortEnabled: [boolean] - Deprecated. Use tableDataPortDisabled where needed.
- tailerPortEnabled: [boolean] - Deprecated. Use tailerPortDisabled where needed.
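For illustration, a static DIS endpoint that hosts a table data service but does not accept tailed data might be sketched as follows. The DIS name, hostname, and port number are placeholders, and all other DIS tags are omitted.

```yaml
dataImportServers:
  exampleDis:                        # hypothetical DIS name
    endpoint:
      serviceRegistry: none
      host: dis-host.example.com     # placeholder hostname
      tailerPortDisabled: true       # do not accept tailed data or commands
      tableDataPort: 22015           # placeholder port for the table data service
```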
serviceRegistry
When serviceRegistry is omitted, or given as serviceRegistry: none, the configuration is static, and all configuration required by the server and clients must be present. host and port are required for endpoints in the logAggregatorServers and tableDataServices sections. In dataImportServers sections, host is required, and the tailer and table data service (TDS) ports must each be either disabled or given a value.
When serviceRegistry is given as serviceRegistry: registry, the configuration is dynamic. The configured service will register actual values for its endpoint or endpoints, and clients will look the addresses up at runtime. The configuration values may be configured statically, and they will still be registered at runtime.
The hostname that a Data Import Server (DIS) registers with the service registry may be defined in the host tag within the DIS's routing endpoint or by using the ServiceRegistry.overrideHostname system property. The service registry host is determined in the following order of precedence:
- The routing endpoint configuration.
- The ServiceRegistry.overrideHostname property, which may be set globally, per-host in an appropriate configuration stanza, or at the query level with an Extra JVM Argument.
- On Kubernetes, the worker's service's hostname.
- On bare metal, the result of the Java InetAddress.getLocalHost().getHostName() function.
Most Deephaven services will choose an ephemeral port if one is not provided.
Aliased maps
Default data routing files provided by Deephaven include some aliased maps (e.g., <<: *DIS-default) for convenience. This YAML syntax copies all the key-value pairs from the template into the target location. If the aliased map contains tags involved in the endpoint syntax change, merging it into a service that uses the new endpoint section will cause parsing errors.
Deephaven recommends removing the legacy tags (tailerPort, tableDataPort, etc.) from the aliased map, or removing the aliased map entirely.
For example:
Before:
Default data routing configuration files might look something like the following. Some comments are removed for clarity.
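A rough sketch of such a file is shown here. The anchors layout, the throttleKbps setting, the hostname, and the port numbers are assumptions for illustration only; an actual default file may differ.

```yaml
anchors:
  - &DIS-default
    tailerPort: 22021          # legacy endpoint tag (placeholder value)
    tableDataPort: 22015       # legacy endpoint tag (placeholder value)
    throttleKbps: -1           # illustrative non-endpoint setting
dataImportServers:
  db_dis:
    <<: *DIS-default           # copies every key from the template, including the legacy port tags
    host: dis-host.example.com # placeholder hostname, legacy host tag
```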
After:
The "Before" example, modified to remove legacy endpoint tags and the imported defaults.
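One possible shape for the converted configuration is sketched here. The hostname, port numbers, and the throttleKbps setting are illustrative assumptions; the point is that the port tags move under endpoint and the merge key is gone.

```yaml
dataImportServers:
  db_dis:
    endpoint:
      serviceRegistry: none
      host: dis-host.example.com   # placeholder hostname
      tailerPort: 22021            # placeholder value, now under endpoint
      tableDataPort: 22015         # placeholder value, now under endpoint
    throttleKbps: -1               # illustrative setting formerly supplied by the aliased map
```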
Example endpoint configurations
Static and dynamic endpoints
The YAML examples below illustrate static and dynamic endpoint configurations. Using dynamic endpoints for in-worker Data Import Servers is recommended.
Data import servers
In the following example, db_dis has static endpoints. The SimpleLastBy and Kafka ingesters have dynamic endpoints. Tailing is disabled for the KafkaImporter.
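A minimal sketch of a configuration matching this description follows. The hostname and port numbers are placeholders, and every tag outside the endpoint sections is omitted.

```yaml
dataImportServers:
  db_dis:
    endpoint:
      serviceRegistry: none          # static: address is fully configured here
      host: dis-host.example.com     # placeholder hostname
      tailerPort: 22021              # placeholder port
      tableDataPort: 22015           # placeholder port
  SimpleLastBy:
    endpoint:
      serviceRegistry: registry      # dynamic: registered at runtime
  KafkaImporter:
    endpoint:
      serviceRegistry: registry      # dynamic: registered at runtime
      tailerPortDisabled: true       # this ingester does not accept tailed data
```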
Log aggregators
The rta log aggregator is configured with a static endpoint in the following example:
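A sketch of what this might look like is given here. The hostname and port are placeholders, and the !!omap list layout of logAggregatorServers is an assumption about the surrounding file; other tags, such as filters, are omitted.

```yaml
logAggregatorServers: !!omap         # layout assumed; entries are evaluated in order
  - rta:
      endpoint:
        serviceRegistry: none
        host: rta-host.example.com   # placeholder hostname
        port: 22020                  # placeholder port
```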
The equivalent configuration with a dynamic endpoint:
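A sketch of the dynamic form, under the same assumption that logAggregatorServers is an !!omap list, with host and port left for the service to register at runtime:

```yaml
logAggregatorServers: !!omap         # layout assumed
  - rta:
      endpoint:
        serviceRegistry: registry    # host and port are registered by the log aggregator at runtime
```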
Table data services
Endpoints are used in tableDataServices only when defining a remote table data service provider, such as the Table Data Cache Proxy (TDCP) or the Local Table Data Service (LTDS).
This example shows the LTDS configured with a static endpoint:
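A minimal sketch, assuming the LTDS entry is named db_ltds and using a placeholder hostname and port:

```yaml
tableDataServices:
  db_ltds:                            # assumed service name
    endpoint:
      serviceRegistry: none
      host: query-host.example.com    # placeholder hostname
      port: 22014                     # placeholder port
```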
This example shows the LTDS configured with a dynamic endpoint:
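A minimal sketch of the dynamic form, again assuming the entry is named db_ltds:

```yaml
tableDataServices:
  db_ltds:                            # assumed service name
    endpoint:
      serviceRegistry: registry       # address is registered by the LTDS and looked up by clients at runtime
```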
Static endpoints (legacy format)
The example below uses the legacy format for all endpoints. Any endpoints using this format will need to be converted before some new syntax features can be used (e.g., claims).
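The legacy format might be sketched as follows, with host and port tags placed directly under each service rather than under endpoint. The service names db_ltds and the !!omap layout are assumptions, and all hostnames and ports are placeholders.

```yaml
dataImportServers:
  db_dis:
    host: dis-host.example.com        # legacy host tag
    tailerPort: 22021                 # legacy port tag
    tableDataPort: 22015              # legacy port tag
logAggregatorServers: !!omap          # layout assumed
  - rta:
      host: rta-host.example.com
      port: 22020
tableDataServices:
  db_ltds:                            # assumed service name
    host: query-host.example.com
    port: 22014
```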