Preemptive tables
Deephaven provides the ability to share tables between queries. Table sharing is typically accomplished via viewport tables or Preemptive Tables:
Viewport Tables provide a consistent view of a portion of a table. The Deephaven GUI uses viewport tables when displaying tables in GUI panels. Only the portion of the table being "viewed" is delivered to the GUI. This allows users to manipulate large tables — tens of millions of rows — directly within the Deephaven user interface, with minimal use of CPU, memory and network bandwidth on their computers.
With Preemptive Tables, the query processor automatically pushes a consistent snapshot of all data from a table on the server to subscribed clients at regular intervals.
Preemptive tables are typically used when:
- One query performs expensive data analysis and shares the results with other queries. For example, an existing query may already use extensive computing power and time to analyze a large dataset, which subsequently creates a much smaller table containing the results of that analysis. By sharing this small result table, other queries and other users can use the result without needing to rerun the original (and expensive) query multiple times for each additional query and/or user.
- Users do not have permissions to access raw data but do have permissions to access results derived from the raw data. For example, a trading firm may decide that competing trading groups are not allowed to see each other's positions, but all trading groups can see the aggregate positions for the firm. A query could generate a Preemptive Table containing the aggregate positions for the firm and share it with all trading groups.
During a Preemptive Table refresh, the entire table is sent over the network to subscribed clients. Therefore, care should be taken to ensure:
- the table size will not overwhelm clients during initial connection, and
- the table's update frequency will not cause network congestion.
Although Preemptive Tables solve some important problems, they should be employed judiciously, and data size and update frequency must be carefully considered.
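As a rough way to reason about table size and update frequency together, the worst-case outbound traffic can be estimated from the snapshot size, the number of subscribers, and the refresh interval. The sketch below is back-of-the-envelope arithmetic only. The function name and the sample figures are illustrative assumptions, not Deephaven APIs or measurements, and it assumes every refresh sends the full table to every client.

```python
def estimate_outbound_mb_per_sec(snapshot_mb, num_clients, refresh_interval_ms):
    """Worst-case server egress if each refresh sends the whole table to each client."""
    if refresh_interval_ms <= 0:
        raise ValueError("refresh_interval_ms must be positive")
    # MB per refresh across all clients, scaled to refreshes per second
    return snapshot_mb * num_clients * 1000.0 / refresh_interval_ms


# Hypothetical figures: a 20 MB table, 10 subscribers, refreshed every 5 seconds.
print(estimate_outbound_mb_per_sec(20, 10, 5000))  # 40.0 (MB per second)
```

If the estimate is uncomfortably close to the available bandwidth, either the published table or the refresh rate probably needs to be reduced.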
Publishing a Preemptive Table
Warning
A threshold is set that will trigger a warning in the log when a certain amount of data is transmitted. This can be manually configured by setting the RemoteQueryProcessor.warnTransmitThresholdMB property to a positive integer value not exceeding 1000, or disabled by setting it to 0. If a threshold is crossed, it is raised to the level that triggered the warning, creating a high-water mark in the log record. This threshold is initially set to 75% of the lesser of the NIO.default.server.maxWriteQueue and NIO.default.client.maxReadBuffer properties. If the serialized size of a table snapshot or delta exceeds NIO.default.server.maxWriteQueue or NIO.default.client.maxReadBuffer, the sending or receiving process will terminate with a fatal error.
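The default threshold and the high-water-mark behavior described in this warning can be illustrated with a small model. This is a sketch, not Deephaven code. The class and function names are hypothetical, both NIO limits are assumed to be expressed in MB for simplicity, and "crossed" is assumed to mean strictly greater than the current threshold.

```python
def default_warn_threshold_mb(max_write_queue_mb, max_read_buffer_mb):
    """75% of the lesser of the two NIO limits (units assumed to be MB here)."""
    return 0.75 * min(max_write_queue_mb, max_read_buffer_mb)


class TransmitWarner:
    """Toy model of the warning threshold acting as a high-water mark."""

    def __init__(self, threshold_mb):
        self.threshold_mb = threshold_mb  # 0 disables the warning

    def check(self, transmitted_mb):
        """Return True if a warning would be logged for this transmission."""
        if self.threshold_mb == 0 or transmitted_mb <= self.threshold_mb:
            return False
        # Raise the threshold to the level that triggered the warning.
        self.threshold_mb = transmitted_mb
        return True


warner = TransmitWarner(default_warn_threshold_mb(128, 64))  # starts at 48.0
print(warner.check(40))  # False: below the threshold
print(warner.check(60))  # True: warning logged, threshold becomes 60
print(warner.check(55))  # False: below the new high-water mark
```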
Creating and publishing a Preemptive Table for sharing is very easy. Consider the example persistent query below, which is named "ExamplePreemptivePublisher" and is managed by a user named Mike. The query computes a result table named result and then creates and publishes a preemptive version of the result table named "resultPre". In this case, the preemptive "resultPre" table refreshes every five seconds (5000 ms).
// perform some calculations to produce a result table
result = db.timeTable("00:00:03")
// create and publish a preemptive version of the result table
// which refreshes every 5 seconds
resultPre = result.preemptiveUpdatesTable(5000)
from deephaven import *
# perform some calculations to produce a result table
result = db.timeTable("00:00:03")
# create and publish a preemptive version of the result table
# which refreshes every 5 seconds
resultPre = result.preemptiveUpdatesTable(5000)
To create a Preemptive Table, the preemptiveUpdatesTable method is called on the source table. The only argument is the refresh interval, specified in milliseconds. This value determines how often the query will push data changes to connected clients. The refresh interval should always be greater than or equal to 1,000 milliseconds.
Subscribing to a Preemptive Table
To subscribe to a Preemptive Table:
- a query must exist that publishes the Preemptive Table, and
- you must have sufficient permissions to access the Preemptive Table.
The example below assumes the query from the prior section is being run as a persistent query owned by Mike and named "ExamplePreemptivePublisher". (The query owner and query name can be found in the Query Config window.) In the example, the query first connects to Mike's ExamplePreemptivePublisher query. Once the connection is established, the "resultPre" table from the query is retrieved as "t", which can now be used like any other table.
import com.illumon.iris.controller.utils.PersistentQueryTableHelper
// connect to Mike's ExamplePreemptivePublisher query
client = PersistentQueryTableHelper.getClientForPersistentQuery(log, "Mike", "ExamplePreemptivePublisher", 3*60*1000)
// get the "resultPre" table from Mike's ExamplePreemptivePublisher query
t = PersistentQueryTableHelper.getPreemptiveTableFromPersistentQuery(client, "resultPre")
// use the table to perform calculations
t2 = t.where("i>10")
// connect to the ExamplePreemptivePublisher query by using its serial number
client = PersistentQueryTableHelper.getClientForPersistentQuery(log, 1502904477776000000, 3*60*1000)
from deephaven import *
# connect to Mike's ExamplePreemptivePublisher query
client = PersistentQueryTableHelper.getClientForPersistentQuery(3*60*1000, owner="Mike", name="ExamplePreemptivePublisher")
# get the "resultPre" table from Mike's ExamplePreemptivePublisher query
t = PersistentQueryTableHelper.getPreemptiveTableFromPersistentQuery("resultPre", helper=client)
# use the table to perform calculations
t2 = t.where("i>10")
# connect to the ExamplePreemptivePublisher query by using its serial number
client = PersistentQueryTableHelper.getClientForPersistentQuery(3*60*1000, configSerial=1502904477776000000)
In the example, the connection attempt will time out and the query will fail to run if a connection to ExamplePreemptivePublisher cannot be established within three minutes (3*60*1000 milliseconds).
In addition to connecting to a query using the query owner and query name, it is also possible to connect to a query by using the query's unique serial number, which is also displayed in the Query Config window. As shown in the following example client script, the query's serial number (e.g., 1502904477776000000 below) replaces the query owner's name and the query name:
// connect to the ExamplePreemptivePublisher query by using its serial number
client = PersistentQueryTableHelper.getClientForPersistentQuery(log, 1502904477776000000, 3*60*1000)
Note
These examples can be run in your own instance of Deephaven. However, you will need to change the respective user name, query name and/or the query's serial number as needed based on how you configured the persistent queries.
Connection-Aware Remote Table (CART)
The Connection-Aware Remote Table provides an error-resilient proxy for preemptive tables being delivered from a persistent query. If the persistent query stops, the Connection-Aware Remote Table will continue operating, and will reconnect if the persistent query later restarts. This allows queries to have optional dependencies on persistent queries. The Connection-Aware Remote Table (CART) can be used in most places that would otherwise use the PersistentQueryTableHelper.getPreemptiveTableFromPersistentQuery function.
Using the Connection-Aware Remote Table
The Connection-Aware Remote Table (CART) is a caching proxy between a query and a table that has been exposed preemptively from a persistent query. A parent query using a CART can be written as though it were using the preemptive table directly, but if the persistent query stops, the CART will not cause the parent query to stop as well. Instead, the CART will either immediately remove all of its own rows, or retain the last-known set of data but stop updating until the persistent query restarts.
To create a CART, a query must have several pieces of information available:
- The owner of the persistent query that the CART will connect to.
- The name of the persistent query that the CART will connect to.
- The name of the table within the persistent query that the CART will retrieve.
- Whether the CART should clear its data on disconnect.
  - When true, the CART will immediately remove all of its own rows if the underlying persistent query stops, and then add new data when the underlying persistent query restarts. In other words, a stopped query is treated as an empty table.
  - When false, the CART will retain the last-known information from the underlying persistent query until the underlying persistent query restarts, then send out an update replacing the old data with the restarted data. In other words, a stopped query is treated as a non-updating table.
- A prototype Table or TableDefinition that defines the columns returned by the CART. The underlying preemptive table may contain more columns than the prototype or definition, but it must contain every column that the prototype or definition includes, and each of those columns must have exactly the same type.
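The difference between the two clear-on-disconnect settings can be sketched with a toy model. This is plain Python written for illustration, not the Deephaven implementation. The class and method names are hypothetical, and rows are represented as a simple list.

```python
class CartModel:
    """Toy stand-in for the rows a CART exposes to its parent query."""

    def __init__(self, clear_on_disconnect):
        self.clear_on_disconnect = clear_on_disconnect
        self.rows = []  # a new CART starts as a valid empty table

    def on_connect(self, snapshot):
        # Data from a (re)started query replaces whatever was shown before.
        self.rows = list(snapshot)

    def on_disconnect(self):
        if self.clear_on_disconnect:
            self.rows = []  # stopped query is treated as an empty table
        # Otherwise the last-known rows are kept and simply stop updating.


clearing = CartModel(clear_on_disconnect=True)
retaining = CartModel(clear_on_disconnect=False)
for cart in (clearing, retaining):
    cart.on_connect(["row1", "row2"])
    cart.on_disconnect()

print(clearing.rows)   # []
print(retaining.rows)  # ['row1', 'row2']
```

In both cases the parent query keeps running. Only what it sees while the persistent query is down differs.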
Properties
- The boolean property ConnectionAwareRemoteTable.printDependencyInformation defaults to false. When set to true, the CART will output additional information about its internal dependency resolution. However, this should normally be left false unless performing explicit dependency resolution error investigations.
- ConnectionAwareRemoteTable.autoReconnectDelayMillis: if the CART disconnects from a query, but the query does not actually stop, the CART will wait this many milliseconds before attempting to automatically reconnect.
Logs
Log entries related to the CART will be prefaced with the words "Connection-aware remote table". Most log entries will also identify which CART instance is logging the message, by specifying which underlying persistent query table the logging CART is listening to.
Any time a CART disconnects from or connects to a query, a log message will record that fact. If the underlying persistent query table has an error, the CART will continue operating, and the underlying error will be recorded in the log as a warning.
Example CART
This example creates a CART that has the columns "Timestamp" and "a", and connects to a query owned by user "owner", named "queryName", with table "tableName". The table "tableName" must at least have the columns "Timestamp" (of type DBDateTime) and "a" (of type int). Since clearOnDisconnect is set to true, if the query stops, the CART will immediately remove all of its own rows.
import com.illumon.dataobjects.ColumnDefinition
import com.illumon.iris.db.tables.TableDefinition
import com.illumon.iris.controller.utils.*
// define the columns the CART should expose
cols = new LinkedList<ColumnDefinition>()
cols.add(new ColumnDefinition("Timestamp", DBDateTime.class))
cols.add(new ColumnDefinition("a", int.class))
defn = new TableDefinition(cols)
// arguments: log, query owner, query name, table name, clearOnDisconnect, definition
cart = new ConnectionAwareRemoteTable(log, "owner", "queryName", "tableName", true, defn)
Retrieving Schemas
It may be inconvenient to specify a schema within a query. One pattern that eliminates this need and makes it easier to keep the CART and source query in sync is to have the source query explicitly write out the schema of its running tables to a known file location, then have the query with the CART read in that definition.
Note
This example assumes that queries will have unique table names; if that is not the case, you may need to adjust accordingly.
In the persistent query, after generating the preemptive table to be published:
import com.illumon.iris.db.tables.TableDefinition
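TableDefinition definition = preemptiveTable.getDefinition()
// the third argument is assumed to be the table's name (a String), so that the
// definition is written where the CART query looks for it: tableName + ".tbl"
TableDefinition.persistDefinition(definition, new File(KNOWN_FILE_LOCATION), tableName)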
In the query with the CART:
import com.illumon.iris.db.tables.TableDefinition
TableDefinition definition = TableDefinition.loadDefinition(
    new File(KNOWN_FILE_LOCATION + "/" + tableName + ".tbl"))
cart = new ConnectionAwareRemoteTable(log, owner, queryName, tableName, true, definition)
Troubleshooting
- If the query specified in the CART's creation cannot be found, the CART will immediately fail. The query does not have to be running, but it does have to exist. Otherwise, a typo in the owner or query name would make it appear that the CART was simply failing to pick up data; failing immediately gives a more direct indication of the problem.
- If two persistent queries are running with the same owner and name, any CART listening for that query will fail, as it will be unable to determine which query it is supposed to be tracking.
- If the table prototype or definition passed in is not compatible with the table definition from the persistent query, the CART will fail. A compatible definition means that the persistent query's table must contain all of the columns defined in the CART, with exactly matching types. Any extra columns in the persistent query's table will be ignored.
- If you connect a console to the persistent query you want the CART to connect to, and then try creating the CART from within that console, the CART will get stuck in an infinite loop, and the persistent query will become unresponsive and have to be restarted.
- An initially-created CART is a valid empty table until it receives data from its underlying persistent query. In some cases (such as writing data out to a file), it may be important to wait for the CART to populate. In these cases, the awaitUpdate method is recommended.
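When a CART fails because of an incompatible definition, it can help to compare the two sets of columns by hand. The compatibility rule described above (every CART column present with an exactly matching type, extra published columns ignored) can be sketched as follows. This is an illustrative check in plain Python, not a Deephaven API. The function name is hypothetical, and each definition is represented as a dict mapping column names to type names.

```python
def find_incompatibilities(prototype, published):
    """List the reasons 'published' cannot satisfy 'prototype' (empty if compatible)."""
    problems = []
    for name, col_type in prototype.items():
        if name not in published:
            problems.append("missing column: " + name)
        elif published[name] != col_type:
            problems.append(
                "type mismatch for " + name
                + ": expected " + col_type + ", found " + published[name]
            )
    return problems  # columns only in 'published' are ignored


prototype = {"Timestamp": "DBDateTime", "a": "int"}

# Extra column "b" is ignored, so this table is compatible.
print(find_incompatibilities(
    prototype, {"Timestamp": "DBDateTime", "a": "int", "b": "double"}))  # []

# Column "a" has a different type, so this table is not compatible.
print(find_incompatibilities(
    prototype, {"Timestamp": "DBDateTime", "a": "long"}))
# ['type mismatch for a: expected int, found long']
```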