When to schedule Import, Merge, and Validation queries

Import Queries

When not using the Deephaven log -> tailer -> DIS mechanism, data is usually imported through the Deephaven import queries (CsvImport, JdbcImport, and XmlImport). The right schedule for an import query depends on when its data becomes available to import.

  • Some data may be imported at the end of a business day, in which case the query should be configured to run at the appropriate time. Remember the query scheduling panel allows selection of several different time zones.
  • Data may be imported many times during a day, in which case repeated scheduling may be a good option. Select the "Append" option for such queries so that each run adds to the data already imported instead of overwriting it.

Merge Queries

Merge queries should typically be run at the end of the day, when all data has been loaded. This may mean that different data needs to be merged at different times.

For example, data for a given Asia business day may be complete many hours before data for the New York business day (Tokyo is 13 to 14 hours ahead of New York, depending on daylight saving time). In this case, if the data is being merged into two different tables, two merge queries should be defined and run at different times. However, if the data is being merged into a single table, then the merge should wait until the intraday data is complete for both time zones.
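
The timing argument above can be checked with a short calculation. The following is a minimal sketch in plain Python, not Deephaven code: the 18:00 local completion times, the sample date, and the function names are assumptions chosen for illustration. It converts each region's local completion time to UTC, then takes the later of the two as the earliest point at which a single shared table could be merged.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def close_in_utc(business_date, close_time, tz_name):
    """Return the UTC instant at which a local business day's data is complete."""
    local_close = datetime.combine(business_date, close_time, tzinfo=ZoneInfo(tz_name))
    return local_close.astimezone(timezone.utc)


def earliest_merge_time(business_date, closes):
    """For one shared table, the merge has to wait for the latest regional close."""
    return max(close_in_utc(business_date, t, tz) for t, tz in closes)


# Assumed completion times: 18:00 local in each region (illustrative only).
day = date(2024, 1, 15)
tokyo = close_in_utc(day, time(18, 0), "Asia/Tokyo")
new_york = close_in_utc(day, time(18, 0), "America/New_York")

gap = new_york - tokyo
print(gap)  # 14:00:00 (New York is on standard time in January)

# A single merged table cannot be merged before the New York data is complete.
shared = earliest_merge_time(
    day, [(time(18, 0), "Asia/Tokyo"), (time(18, 0), "America/New_York")]
)
print(shared == new_york)  # True
```

With two separate tables, each merge query could instead be scheduled shortly after its own region's completion time.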

Validation Queries

If the latest version of Deephaven is installed, automated query dependencies with validation are recommended. Add the appropriate validation XML to the schemas, then define validation queries that depend on the tables' merge queries. This allows the expected data to be validated automatically after each merge, and even allows the intraday data to be deleted once validation is complete.
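
The ordering that such dependencies enforce can be illustrated with a small sketch. This is plain Python using the standard library's graphlib module, not the Deephaven scheduler; the step names are hypothetical, and the deletion of intraday data is shown as a separate step only to make the ordering visible.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each key runs only after every step in its set.
dependencies = {
    "merge_trades": {"import_trades"},
    "validate_trades": {"merge_trades"},
    "delete_intraday_trades": {"validate_trades"},
}

# Resolve the dependency chain into a valid execution order.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
# ['import_trades', 'merge_trades', 'validate_trades', 'delete_intraday_trades']
```

The point of the chain is that intraday data is removed only after the merged data has passed validation, never before.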