How to configure and use a Development Persistent Query-based DIS Instance
Background
Historically, development and testing of new schemata and logging processes in Deephaven had to occur in a separate development environment to avoid the disruption of restarting a production Data Import Server (DIS) during business hours. With the addition of the in-Worker DIS process to Deephaven v1.20181212, it is now possible to run multiple DIS instances that can be restarted or otherwise reconfigured without impacting production data handling.
Prerequisites
- Deephaven v1.20190607 or later release installed.
- Installation configured to use the Data Routing Service (YAML-based configuration). See Data Routing Service Configuration via YAML.
- Entries defined in the Routing Service YAML file for one or more in-Worker DIS processes:
  - Storage
  - DIS Instance
  - Table Data Services
- Filters to ensure development schemata are handled exclusively by the in-Worker DIS. The easiest way to configure this is to have one or more development namespaces that are routed to the in-Worker DIS, and then move the related tables to a production namespace when development is complete. See Data Filters.
Some of the elements above are listed as prerequisites so they can be set ahead of time without requiring changes to the Data Routing Service YAML file during development work.
Example
The snippets from a Routing Service YAML file (shown below) define the default production DIS (running as a monit-controlled process), db_dis, and an in-Worker DIS, test_LastBy, in the dataImportServers section. test_LastBy, as the name implies, can also provide Import-driven lastBy processing.
The filters used in both the DIS and TDCP (Table Data Cache Proxy) configuration sections split System (non-user) namespaces into DXFeed (processed by test_LastBy) and everything else (processed by db_dis). For querying, most data is routed through the TDCP, but the DXFeed development namespace is routed directly to the test_LastBy DIS. Working directly with the DIS sacrifices some caching performance for DXFeed queries, but allows the schema to be updated and data to be deleted and replaced without requiring a TDCP restart.
storage:
  # The standard db root location
  - name: default
    dbRoot: /db
  # This defines a storage location for use in an in-worker import server
  - name: test_LastBy
    dbRoot: /db/dataImportServers/test
...
dataImportServers:
  db_dis:
    <<: *DIS_default
    host: *ddl_dis
    userIntradayDirectoryName: "IntradayUser"
    webServerParameters:
      enabled: true
      port: 8084
      authenticationRequired: false
      sslRequired: false
    # do not process DXFeed
    filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `DXFeed`"}
  ...
  # configuration for an in-worker lastBy import server, handling the DXFeed namespace
  test_LastBy:
    host: *ddl_query
    tailerPort: 22222
    # handle DXFeed tables only
    filters: {whereTableKey: "NamespaceSet = `System` && Namespace == `DXFeed`"}
    webServerParameters:
      enabled: false
    storage: test_LastBy
    definitionsStorage: default
    tableDataPort: 22223
...
# Configuration for the TableDataCacheProxy named db_tdcp, and define
# the data published by that service.
# There is typically one such service on each worker machine.
db_tdcp:
  host: localhost
  port: *default-tableDataCacheProxyPort
  sources:
    # SYSTEM_INTRADAY tables for "current date", minus DXFeed tables handled by test_LastBy
    - name: db_dis
      filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `DXFeed`", whereLocationKey: "ColumnPartition == currentDateNy()"}
    # local for SYSTEM_INTRADAY non-current date, minus DXFeed tables handled by test_LastBy.
    # This assumes that /db/Systems is mounted locally for the tdcp process.
    - name: local
      filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `DXFeed`", whereLocationKey: "ColumnPartition != currentDateNy()"}
    # LTDS for SYSTEM_INTRADAY non-current date, minus DXFeed tables handled by test_LastBy.
    # If historical system data is not mounted locally, get it from the LTDS service where it is available
    # - name: db_ltds
    #   filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `DXFeed`", whereLocationKey: "ColumnPartition != currentDateNy()"}
    # all user data
    - name: db_rta
      filters: {namespaceSet: User}
# define the TableDataService for query servers.
query:
  sources:
    # Read offline system data from local storage.
    # Note that it is not absolutely necessary to exclude Namespace DXFeed, because that data is not visible to local.
    - name: local
      filters: {whereTableKey: "NamespaceSet == `System` && Offline"}
    # Read everything else from the defined table data cache proxy.
    - name: db_tdcp
      filters: {whereTableKey: "Online && Namespace != `DXFeed`"}
    # only DXFeed data
    - name: test_LastBy
      filters: {whereTableKey: "NamespaceSet = `System` && Namespace == `DXFeed`"}
Note that query is the name of the default table data service configuration used by all Deephaven workers. As such, every worker will try to connect to all of the sources specified there. If the in-worker DIS is not running, workers will report connection-refused errors in their logs, including the console log. Similar errors may appear when an in-worker DIS is restarted.
Alternatively, a separate TDS group can be set up for use during development of in-worker DIS data feeds and their clients:
# define the dev TableDataService.
query_dev:
  sources:
    - name: query
      filters: {whereTableKey: "Namespace != `DXFeed`"}
    # only DXFeed data
    - name: test_LastBy
      filters: {whereTableKey: "NamespaceSet = `System` && Namespace == `DXFeed`"}
Workers connecting to the tables provided by test_LastBy must then use DataRoutingService.TableDataService=query_dev to see the data. This property can be set with -D as a custom Java command-line argument, or as an extra JVM argument when launching a Deephaven console connection.
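As a concrete sketch, the property is passed like this (query_dev matches the example service group above; the exact launch mechanism depends on how the worker or console is started):

```shell
# Route this worker's table data reads through the dev service group
# instead of the default "query" service.
-DDataRoutingService.TableDataService=query_dev
```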
Since a worker can only have one value of TableDataService, it may be desirable to duplicate the other sources into the dev group, so that development workers can see all data, including the development data, while production users see everything except the development data.
# Define the TableDataService for query servers
query:
  sources:
    # Read offline system data from local storage.
    # Note that it is not absolutely necessary to exclude Namespace DXFeed, because that data is not visible to local.
    - name: local
      filters: {whereTableKey: "NamespaceSet == `System` && Offline"}
    # Read everything else from the defined table data cache proxy.
    - name: db_tdcp
      filters: {whereTableKey: "Online && Namespace != `DXFeed`"}
# Define the dev TableDataService
query_dev:
  sources:
    # Read offline system data from local storage.
    # Note that it is not absolutely necessary to exclude Namespace DXFeed, because that data is not visible to local.
    - name: local
      filters: {whereTableKey: "NamespaceSet == `System` && Offline"}
    # Read everything else from the defined table data cache proxy.
    - name: db_tdcp
      filters: {whereTableKey: "Online && Namespace != `DXFeed`"}
    # Only DXFeed data
    - name: test_LastBy
      filters: {whereTableKey: "NamespaceSet = `System` && Namespace == `DXFeed`"}
Use
Once this routing is set up, tailers writing to DXFeed tables will send their data to the test_LastBy DIS instance, and intraday queries for DXFeed data will also receive that data from test_LastBy. As such, DXFeed schemata, loggers, and listeners can be modified mid-day, and the test_LastBy persistent query can be restarted to pick up the changes without impacting the other namespaces, which are handled by the db_dis default DIS instance.
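With that routing in place, reading the development data from a worker looks the same as any other intraday read. A minimal sketch, assuming the QuoteStock table used later in this guide and a Date partitioning column:

```groovy
// Fetch the intraday DXFeed table; under the routing above, this data
// is served by the test_LastBy in-worker DIS rather than the TDCP.
quotes = db.i("DXFeed", "QuoteStock").where("Date=currentDateNy()")
```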
The script below can be used in a persistent query or console query to run the test_LastBy DIS process:
import com.illumon.iris.db.tables.dataimport.logtailer.DataImportServer;
import com.illumon.iris.db.tables.dataimport.importstate.lastby.LastByTableMapFactory;
import com.fishlib.configuration.Configuration;
import com.fishlib.util.process.ProcessEnvironment;
import com.illumon.iris.db.v2.configuration.DataRoutingService;
import com.illumon.iris.db.v2.configuration.DataRoutingServiceFactory;

// Look up the routing configuration for test_LastBy and start the DIS
routingService = DataRoutingServiceFactory.getDefault();
disConfig = routingService.getDataImportServiceConfig("test_LastBy");
dis = new DataImportServer(ProcessEnvironment.getDefaultLog(), disConfig, Configuration.getInstance());
dis.start();

// Create a "Fast LastBy" table: DXQuoteLastByTable
tmf = LastByTableMapFactory.forDataImportServer(dis);
DXQuoteLastByTableMap = tmf.createTableMap("DXFeed", "QuoteStock", currentDate(TZ_NY));
DXQuoteLastByTable = DXQuoteLastByTableMap.merge();
The bottom part of this script creates an Import-driven lastBy table called DXQuoteLastByTable; if this is running in a persistent query, that table can be read by other users who have access rights to the query. The lastBy table portion of the script is not required to enable the DIS process: once dis.start() is executed, the DIS process will begin accepting and processing tailer data streams for tables in the DXFeed namespace.
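A quick way to confirm that the DIS is up is to check its listening sockets on the host running the persistent query worker (a sketch; 22222 and 22223 are the tailerPort and tableDataPort values from the example routing configuration above):

```shell
# Confirm the in-worker DIS is listening on its tailer and table data ports
ss -tln | grep -E ':(22222|22223)\s'
```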
The Data Services persistent query type (available in Deephaven v1.20190322 or higher) simplifies configuration of an in-worker DIS process by setting up much of the prerequisite script code for the user. For the same configuration as shown above, a Data Services query would require only this script:
DXQuoteLastByTableMap = lastByTableMapFactory
        .createTableMap("DXFeed", "QuoteStock", currentDate(TZ_NY));
DXQuoteLastByTable = DXQuoteLastByTableMap.merge();
The rest of the work to import classes and to configure and start the DIS process is handled by the built-in setup code of the Data Services script type itself. The one other piece of configuration required with this script type is to select the DIS process on the DataServiceScript Settings tab.