Snapshots
Deephaven tables are updated on a cycle - when data sources tick, all tables that are dependent on the data source are updated. Typically snapshots are used to:
- Reduce the update frequency for a given table to either improve query performance or prevent excessive changes from propagating downstream.
- Provide a mechanism for creating a static table from a ticking table, or static snapshot.
With snapshots, you can explicitly set the interval at which a table updates to reflect changing data:
snapshotTable = triggerTable.snapshot(targetTable)
The snapshot operation produces an in-memory copy of a table ("targetTable"), which refreshes every time another table ("triggerTable") ticks.
The triggerTable is often a timeTable, a special type of table that adds new rows at a regular, user-defined interval. The sole column of a timeTable is Timestamp.
Note
You must ensure that the corresponding column names in the targetTable do not have the same name as the column in your triggerTable. For instance, your timeTable column will be Timestamp, which will likely conflict with the Timestamp column in your targetTable. Be sure to change these column names in your query if necessary; e.g., .renameColumns("TradeTime=Timestamp")
.
Ticking at a Specified Interval
Standard Snapshots
The following line of code is an example of a timeTable that ticks (adding a new row) every five seconds:
timeTable = timeTable("00:00:05")
The following line of code demonstrates how to use timeTable as the trigger table to create a snapshot of a target table called quotes. The value from the Timestamp column in the newest row of timeTable will be included in all rows of the resulting quotesSnapshot table.
quotesSnapshot = timeTable.snapshot(quotes)
Whenever timeTable ticks, in this case, every five seconds, quotesSnapshot will refresh to reflect the current version of the quotes table. The query engine will recalculate the values for all cells in the quotes table (whether the rows have ticked or not) and store them in memory; further, downstream table operations will process ticks for all rows in the quotesSnapshot table. This will incur the cost of running all deferred calculations from view and updateView operations, as well as updating the state of downstream operations (such as joins and aggregations). However, quotesSnapshot will not change again until the next tick in timeTable, regardless of how much the quotes table changes in the interim. By replacing all ticks in quotes with a user-defined five-second interval, the frequency at which quotes ticks will no longer significantly affect performance.
The standard snapshot()
operation will:
- recalculate values for all rows in the target table, even if those rows in the target table have not ticked.
- force downstream operations to process a tick for all rows.
This is often desirable when a query's calculations depend on time, such as formulas referencing the currentTime() method or a timestamp from a time table.
Example
In this example, we create two tables that will update every 5 seconds, when the triggerTable (a time table) ticks. Let's start with a large table of trades data, and two sub-tables from that data. We will also create the triggerTable, a time table that adds a row at 1 minute intervals.
Note
In a real use case, these tables would be ticking once per second. However, only historical data is available in the demo LearnDeephaven namespace.
For a real ticking dataset, you would replace db.t
with db.i
.
allStockTrades=db.t("LearnDeephaven", "StockTrades").where()
lastTrades=allStockTrades.lastBy("Sym").renameColumns("TradeTime=Timestamp")
volume=allStockTrades.view("Sym", "Size").sumBy("Sym").renameColumns("Volume=Size")
triggerTable=timeTable("00:01:00")
snappedPrice=triggerTable.snapshot(lastTrades)
snappedVolume=triggerTable.snapshot(volume)
lastByVolume=snappedVolume.dropColumns("Timestamp").sortDescending("Volume").naturalJoin(snappedPrice, "Sym")
snappedPrice and snappedVolume only update when the time table ticks. Since the lastByVolume table depends on the periodically refreshing tables, it need only be recomputed every minute rather than every second.
Incremental Snapshots
The fundamental difference between a regular snapshot and an incremental snapshot is that a regular snapshot replaces ticks of the target table, whereas an incremental snapshot collapses ticks of the target table.
The syntax is the same:
quotesSnapshot = timeTable.snapshotIncremental(quotes, "Timestamp")
With the snapshotIncremental()
operation:
- The query engine will track which rows of the target table have changed between ticks of the trigger table.
- When the trigger table ticks, the query engine will only update the result table for the rows that changed in the target table, reducing the amount of processing that must occur.
This is useful when not all formulas must be refreshed at a regular interval. In these situations, it may be more expedient to refresh only the rows in the result table for which the corresponding rows in the target table have ticked.
Tip
Since there is an added cost to tracking ticks in the targetTable, snapshot()
should be preferred over snapshotIncremental()
, unless it is likely that only a portion of the target table changes between ticks of the triggerTable. (This is rarely the case for tables containing market data.)
Creating a static snapshot
When the left hand side of a snapshot operation is static, then the resulting table is also static. This can be useful if you would like to freeze a table in time by making an in-memory copy of it. The TableTools emptyTable
method creates a static table with zero columns and a specified number of rows. A single row zero-column table can be used to make an exact (but static) copy of a ticking table as shown below.
This example uses one of Deephaven's internal tables:
pqsl=db.i("DbInternal", "PersistentQueryStateLog").where("Date=currentDateNy()")
snappqsl=emptyTable(1).snapshot(pqsl, true)
println("PQSL Refreshing: ” + pqsl.isLive())
println("SnapPQSL Refreshing:" + snappqsl.isLive())
from deephaven import ttools
pqsl=db.i("DbInternal", "PersistentQueryStateLog").where("Date=currentDateNy()")
snappqsl=ttools.emptyTable(1).snapshot(pqsl, True)
print("PQSL Refreshing: " + str(pqsl.isLive()))
print("SnapPQSL Refreshing: " + str(snappqsl.isLive()))
Snapshot History
The snapshotHistory()
operation provides a history of the right hand side table at each interval:
historyTable = triggerTable.snapshotHistory(targetTable)
For example, the following snippet records the current state of each persistent query every minute in the "psqlHistory" table.
pqsl=db.i("DbInternal", "PersistentQueryStateLog").where("Date=currentDateNy()").lastBy("SerialNumber")
tt=timeTable("00:01:00").renameColumns("SnapshotTime=Timestamp")
pqslHistory=tt.snapshotHistory(pqsl)
The history table will grow in memory on each interval, because previous versions and the new version must be saved. This memory utilization must be taken into account when determining the proper interval for snapshotting and the required query heap resources.