Dynamic data routing and endpoint configuration in YAML

The Deephaven data routing configuration specifies and configures the services involved in moving data through the system. This configuration contains "endpoints": the address (host and port) at which each service is reachable. Specifying these endpoints can be painful, especially when using in-worker services, containers, or dynamic provisioning (e.g., Kubernetes).

Dynamic data routing delays decisions about addresses and ports until runtime, by using a registration service. This allows services to choose addresses later, according to information from sources other than the static configuration file. Actual service endpoints are registered when available, and consumers look them up at runtime.

Dynamic data routing requires some new syntax in the data routing YAML file. The old syntax (which defines static endpoints) is still supported, so no file changes are required. The new endpoint syntax supports both static and dynamic endpoints, and allows free-form configuration data to be passed to services.

To avoid confusion, the new format may not be mixed with the old: any given service must be given either an endpoint section or the legacy host/port tags, but not both.

Note

A consumer might request an endpoint when the service is not running. In this case, a NotReadyException (a type of IOException) will be thrown. Any attempt to resolve a dynamic endpoint could result in a NotReadyException.

Authentication and dynamic routing

When an endpoint is configured for dynamic routing, server and client must interact with the process registry service to resolve the actual endpoint address.

Clients

Any process may read process registration data from the process registry service. Authentication is not required.

Servers

Any process that registers endpoints (writes process registration data) must be authenticated, with a user context configured for service registry writing. Deephaven workers are always authenticated, so in-worker ingesters (Data Import Servers) always have a user context that can be configured for write access. Standalone processes, including the Table Data Cache Proxy (TDCP) and the Log Aggregator, might not be configured with authentication. These processes will need to be configured with an authentication key in the hostconfig file. See Database authentication for more information.

The authenticated user must be in a group configured for service registry writing. The superusers group (iris-superusers by default) always has write permission. Additional groups can be given permission with the following property, which must be visible to the configuration server:

ServiceRegistry.writers=group1,group2

In-worker ingestion processes typically run on a merge server as a Live Script. By default, merge servers have more restricted access than query servers, based on the RemoteQueryDispatcher.allowedGroups property. The service.name=dbmerge stanza in iris-defaults.prop sets this to iris-schemamanagers, but administrators may override it in another property file.
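
For example, to extend dbmerge access to an additional group, an administrator could set the following in the service.name=dbmerge stanza of a site-specific property file (the dis-ingesters group name here is purely illustrative):

RemoteQueryDispatcher.allowedGroups=iris-schemamanagers,dis-ingesters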

An in-worker ingestion process can also be run with the configuration type In-Worker Service and the service type Data Import Server. In this case, the InWorkerService.dis.allowedGroups property must be set to allow the authenticated user to run this type of query.
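
As a sketch, a visible property file could grant permission to run such an In-Worker Service query to the following groups (both group names are illustrative):

InWorkerService.dis.allowedGroups=iris-schemamanagers,kafka-ingesters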

Endpoint syntax

An "endpoint" represents a service address (host and port). A given service might host multiple endpoints. An endpoint may be static or dynamic:

  • "Static" means the address is fully configured in the data routing file.
  • "Dynamic" means the server will register the endpoint at runtime when it is available, and clients will look up the address as needed at runtime.

Tags

The sample YAML block below highlights the tags that capture a simple endpoint:

endpoint:
  serviceRegistry: none
  host: localhost
  port: 20222

  • endpoint: [map] - This defines all the endpoints (hosts and ports) supported by this service configuration.
  • serviceRegistry: [String] - (Optional) Valid values are none and registry. none is the default value and indicates a static configuration. registry indicates that the configured endpoints will be registered and retrieved at runtime.
  • host: [String] - Optional if serviceRegistry is registry. If not provided, the server must be able to determine its address.
  • port: [int] - Optional if serviceRegistry is registry. If not provided, the server must be able to determine what port to use.

Any additional tags provided under endpoint will be passed to the server. These values can be used to help the service choose how to select ports and register its endpoints.
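
For instance, a dynamic endpoint could include an extra, service-specific tag. The preferredPortRange key below is a hypothetical illustration, not a tag recognized by Deephaven services:

endpoint:
  serviceRegistry: registry
  # any additional keys are passed through to the service as configuration data
  preferredPortRange: "23000-23100"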

A Data Import Server (DIS) hosts multiple services, all of which are optional. The following additional tags are used by endpoints in the dataImportServers section:

  • tailerPortDisabled: [boolean] - (Optional) If present and true, the DIS will not accept tailed data (or commands).
  • tableDataPortDisabled: [boolean] - (Optional) If present and true, the DIS will not start or register a table data service.
  • tailerPort: [int] - (Optional) This indicates the port for tailing data. -1 implies tailerPortDisabled: true.
  • tableDataPort: [int] - (Optional) This indicates the port where the Table Data Service will be hosted. -1 implies tableDataPortDisabled: true.
  • tableDataPortEnabled: [boolean] - Deprecated. Use tableDataPortDisabled where needed.
  • tailerPortEnabled: [boolean] - Deprecated. Use tailerPortDisabled where needed.
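
For example, a dynamically registered DIS that serves table data but accepts no tailed data could declare its endpoint as follows (a sketch; the port value is illustrative):

endpoint:
  serviceRegistry: registry
  # do not accept tailed data or commands
  tailerPortDisabled: true
  # register the table data service on this specific port instead of an ephemeral one
  tableDataPort: 22223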

serviceRegistry

When serviceRegistry is omitted, or given as serviceRegistry: none, the configuration is static, and all configuration required by the server and clients must be present. host and port are required for endpoints in the logAggregatorServers and tableDataServices sections. In dataImportServers sections, host is required, and the tailer and TDS ports must be disabled or given values.

When serviceRegistry is given as serviceRegistry: registry, the configuration is dynamic. The configured service will register actual values for its endpoint or endpoints, and clients will look up the addresses at runtime. Values such as host and port may still be configured statically; they will nonetheless be registered at runtime.

The hostname that a Data Import Server (DIS) registers with the service registry may be defined in the host tag within the DIS's routing endpoint or by using the ServiceRegistry.overrideHostname system property. The service registry host is determined from the following sources, in order of precedence:

  • The host tag in the routing endpoint configuration.
  • The ServiceRegistry.overrideHostname property, which may be set globally, per-host in an appropriate configuration stanza, or at the query level with an Extra JVM Argument.
  • On Kubernetes, the hostname of the worker's service.
  • On bare metal, the result of the Java InetAddress.getLocalHost().getHostName() call.

Most Deephaven services will choose an ephemeral port if one is not provided.
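
For example, the override can be supplied at the query level as an Extra JVM Argument (the hostname is a placeholder):

-DServiceRegistry.overrideHostname=dis-host.example.com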

Aliased Maps

Default data routing files provided by Deephaven include some aliased maps (e.g., <<: *DIS-default) for convenience. This YAML syntax copies all the key-value pairs from the referenced anchor into the target location. If an aliased map contains the legacy endpoint tags (tailerPort, tableDataPort, etc.), combining it with the new endpoint syntax mixes the two formats and causes parsing errors. Deephaven recommends removing the legacy tags from the aliased map, or removing the aliased map entirely.

For example:

Before:

Default data routing configuration files might look something like the following. Some comments are removed for clarity.

  DIS-default:
    - &DIS-default
      tailerPort: *default-tailerPort
      throttleKbps: -1
      userIntradayDirectoryName: "IntradayUser"
      storage: default
      tableDataPort: *default-tableDataPort
      # if a DIS instance adds a properties key, it will REPLACE this one. Add another anchor so they can be added back in.
      properties: &DIS-defaultProperties
        StringCacheHint.tableNameAndColumnNameEquals_QuoteArcaStock/Exchange: ConcurrentUnboundedStringCache,String,1
        StringCacheHint.columnNameEquals_USym: ConcurrentUnboundedStringCache,String,20000
        # ...

  dataImportServers:
    # The primary data import server
    db_dis:
      # import the default values
      <<: *DIS-default
      host: *dh-import   # reference the address defined above for "dh-import"
      userIntradayDirectoryName: "IntradayUser"
      webServerParameters:
        enabled: true
        port: 8086
        authenticationRequired: false
        sslRequired: false
      filters:
        - {namespaceSet: System, online: true}
      properties:
        # re-add the default properties (omit the <<: if you don't want the defaults)
        <<: *DIS-defaultProperties
        ## any additional properties for this DIS

After:

The "Before" example, modified so that the aliased map contains only the shared properties, the DIS no longer imports the defaults, and the endpoint values are moved under an endpoint section.

  DIS-default:
    - &DIS-default
      properties: &DIS-defaultProperties
        StringCacheHint.tableNameAndColumnNameEquals_QuoteArcaStock/Exchange: ConcurrentUnboundedStringCache,String,1
        StringCacheHint.columnNameEquals_USym: ConcurrentUnboundedStringCache,String,20000
        # ...

  dataImportServers:
    db_dis:
      throttleKbps: -1
      userIntradayDirectoryName: "IntradayUser"
      storage: default
      endpoint:
        host: *ddl_dis
        tailerPort: *default-tailerPort
        tableDataPort: *default-tableDataPort
      webServerParameters:
        enabled: true
        port: 8086
        authenticationRequired: false
        sslRequired: false
      filters:
        - {namespaceSet: System, online: true}
      properties:
        # re-add the default properties (omit the <<: if you don't want the defaults)
        <<: *DIS-defaultProperties

Example endpoint configurations

Static and dynamic endpoints

The YAML examples below illustrate static and dynamic endpoint configurations. Using dynamic endpoints for in-worker Data Import Servers is recommended.

Data import servers

In the following example, db_dis has static endpoints. The SimpleLastBy and Kafka ingesters have dynamic endpoints. Tailing is disabled for the KafkaImporter.

  dataImportServers:
    db_dis:
      throttleKbps: -1
      userIntradayDirectoryName: "IntradayUser"
      storage: default
      # static endpoints
      endpoint:
        host: *ddl_dis
        tailerPort: *default-tailerPort
        tableDataPort: *default-tableDataPort
      filters: {whereTableKey: "NamespaceSet = `System` && Online"}
      webServerParameters:
        enabled: true
        port: 8083

    # Last-By DIS using dynamic endpoints
    SimpleLastBy:
      storage: lastByStorage
      endpoint:
        serviceRegistry: registry
      claims: {namespace: LastByNamespace}

    # Kafka DIS that does not accept tailed data
    KafkaImporter:
      storage: kafkaStorage
      endpoint:
        serviceRegistry: registry
        tailerPortDisabled: true
      claims: {namespace: Kafka}

Log aggregators

The rta log aggregator is configured with a static endpoint in the following example:

  logAggregatorServers: !!omap  # need an ordered map for precedence, to allow the filters to overlap
    # rta static endpoint
    - rta:
        endpoint:
          port: *default-lasPort
          host: *localhost
        filters:
          - namespaceSet: User

The equivalent configuration with a dynamic endpoint:

  logAggregatorServers: !!omap  # need an ordered map for precedence, to allow the filters to overlap
    # rta dynamic endpoint
    - rta:
        endpoint:
          serviceRegistry: registry
        filters:
          - namespaceSet: User

Table data services

Endpoints are used in tableDataServices only when defining a remote Table Data Service provider, such as the table data cache proxy (TDCP) or a local table data service (LTDS).

This example shows the LTDS configured with a static endpoint:

  tableDataServices:
    # Local Table Data Service with static configuration
    db_ltds:
      endpoint:
        host: *iris-dis
        port: *default-localTableDataPort

This example shows the LTDS configured with a dynamic endpoint:

  tableDataServices:
    # Local Table Data Service with dynamic configuration
    db_ltds:
      endpoint:
        serviceRegistry: registry

Static endpoints (legacy format)

The example below uses the legacy format for all endpoints. Any endpoints using this format will need to be converted before some new syntax features can be used (e.g., claims).

  #Data Import Servers
  dataImportServers:
    #db_dis represented in Legacy Format
    db_dis:
      <<: *DIS_default # import all defaults from the DIS_default section
      host: *ddl_dis
      tailerPort: *default-tailerPort
      tableDataPort: *default-tableDataPort
      filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `Order`"}
    ...

    # Last By DIS in Legacy Format
    SimpleLastBy:
      host: *dh-import
      tailerPort: 22222
      tableDataPort: 22223
      ...

  #Log Aggregator Servers
  logAggregatorServers: !!omap  # need an ordered map for precedence, to allow the filters to overlap
    # rta in Legacy Format
    - rta:
        port: *default-lasPort
        host: *localhost
        filters:
          - namespaceSet: User

  # Table Data Services
  tableDataServices:
    #Local Table Data Service in Legacy Format
    db_ltds:
      host: *iris-dis
      port: *default-localTableDataPort
    ...