Table data service

This guide covers using the Table Data Service API to integrate custom, partitioned, lazily-loaded data into Deephaven workflows.

In Groovy, you use the Java TableDataService interface and its implementations directly. The Python Table Data Service API is a higher-level wrapper built on top of the same interface.

Note: This feature is currently experimental. The API and its characteristics are subject to change.

TableDataService API

The TableDataService interface provides a way to integrate external data sources into Deephaven tables in both static and refreshing contexts. The data is partitioned, and each partition is loaded lazily: the engine reads column data only when a query needs it.

The API involves seven cooperating classes, each responsible for one layer of the data access stack:

Class                          Role
TableKey                       Unique identifier for a logical table
TableLocationKey               Identifies a specific partition within a table
AbstractTableDataService       Service entry point: maps a TableKey to a TableLocationProvider
AbstractTableLocationProvider  Discovers and manages TableLocation objects for a given key
AbstractTableLocation          Per-partition: reports row count and creates ColumnLocation objects
AbstractColumnLocation         Per-column: provides typed data to the engine via ColumnRegion objects
PartitionAwareSourceTable      Assembles everything into a queryable Deephaven table

The following sections describe the implementation requirements for each class and show a complete working example using an in-memory data store.

TableKey

A TableKey is a unique identifier for a logical table. Implement ImmutableTableKey (a sub-interface of TableKey) to get a default makeImmutable() that returns this. You must provide:

  • getImplementationName(): A human-readable name used in logging.
  • append(LogOutput): Describes the key in log output.
  • hashCode() and equals(): Must be consistent with each other, so that two keys that compare equal also return the same hash code.
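As a minimal sketch of the equals()/hashCode() requirement, the class below models a key identified by a namespace and a table name. It deliberately does not implement ImmutableTableKey, so that it compiles without Deephaven on the classpath; the class name StockTableKey and both of its fields are hypothetical.

```java
import java.util.Objects;

// Hypothetical stand-in for a TableKey implementation. A real key would
// implement ImmutableTableKey and also provide append(LogOutput).
final class StockTableKey {
    private final String namespace;
    private final String tableName;

    StockTableKey(String namespace, String tableName) {
        this.namespace = Objects.requireNonNull(namespace);
        this.tableName = Objects.requireNonNull(tableName);
    }

    // Human-readable name, analogous to getImplementationName().
    String getImplementationName() {
        return "StockTableKey";
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof StockTableKey)) {
            return false;
        }
        StockTableKey that = (StockTableKey) other;
        return namespace.equals(that.namespace) && tableName.equals(that.tableName);
    }

    // Hashes exactly the fields compared in equals(), so equal keys hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(namespace, tableName);
    }

    @Override
    public String toString() {
        return getImplementationName() + "[" + namespace + "." + tableName + "]";
    }
}
```

Deriving both methods from the same immutable fields is what keeps the key safe to use in hash-based caches.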

TableLocationKey

A TableLocationKey identifies a specific partition within a table. Extend PartitionedTableLocationKey, which handles partition-map storage, sorting, hashCode(), and equals(). Only getImplementationName() and append(LogOutput) need to be provided:

For unpartitioned tables, use the built-in singleton StandaloneTableLocationKey.getInstance().
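To picture the partition-map behaviour that this section attributes to PartitionedTableLocationKey, the sketch below keeps partition values in a sorted map and derives equality, hashing, and ordering from it. It is a simplified model, not the Deephaven class: the name DatePartitionKey, the use of String values, and the column-by-column comparison are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in that models a partition-keyed location key.
final class DatePartitionKey implements Comparable<DatePartitionKey> {
    // Sorted by partition column name so iteration order is deterministic.
    private final Map<String, String> partitions;

    DatePartitionKey(Map<String, String> partitions) {
        this.partitions = Collections.unmodifiableMap(new TreeMap<>(partitions));
    }

    String getPartitionValue(String column) {
        String value = partitions.get(column);
        if (value == null) {
            throw new IllegalArgumentException("Unknown partition column: " + column);
        }
        return value;
    }

    // Orders keys by their partition values, one column at a time.
    @Override
    public int compareTo(DatePartitionKey other) {
        Iterator<String> mine = partitions.values().iterator();
        Iterator<String> theirs = other.partitions.values().iterator();
        while (mine.hasNext() && theirs.hasNext()) {
            int cmp = mine.next().compareTo(theirs.next());
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(partitions.size(), other.partitions.size());
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof DatePartitionKey
                && partitions.equals(((DatePartitionKey) other).partitions);
    }

    @Override
    public int hashCode() {
        return partitions.hashCode();
    }
}
```

With ISO-formatted dates as partition values, this ordering places earlier dates first.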

ColumnLocation

A ColumnLocation provides the actual column data for one partition. It is created on demand by AbstractTableLocation.makeColumnLocation() and returns typed ColumnRegion objects that the engine uses to read data.

Extend AbstractColumnLocation and implement one makeColumnRegion* method per column type present in your table schema. For each, return an AppendOnlyFixedSizePageRegion* backed by an AppendOnlyRegionAccessor that reads from your data store.

AppendOnlyRegionAccessor requires two methods:

  • readChunkPage(firstRowPosition, minimumSize, destination): Fills destination with at least minimumSize values starting at firstRowPosition.
  • size(): Returns the total number of rows at this location.

The following example stores column data in typed arrays. Each array is keyed by column name in a map passed from AbstractTableLocation:
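As a minimal, library-free sketch of that layout, the code below keeps int columns in a map keyed by column name and serves them through a simplified accessor. The interface IntRegionAccessor is a hypothetical stand-in for AppendOnlyRegionAccessor: it keeps the two methods described above but writes into a java.nio.IntBuffer instead of an engine chunk, and the class name ArrayBackedAccessor is likewise an assumption.

```java
import java.nio.IntBuffer;
import java.util.Map;

// Hypothetical, simplified stand-in for AppendOnlyRegionAccessor.
interface IntRegionAccessor {
    // Appends at least minimumSize values, starting at firstRowPosition, to destination.
    void readChunkPage(long firstRowPosition, int minimumSize, IntBuffer destination);

    // Total number of rows currently available at this location.
    long size();
}

final class ArrayBackedAccessor implements IntRegionAccessor {
    private final int[] values;

    // columns maps column name to its typed array, as a location might pass it in.
    ArrayBackedAccessor(Map<String, int[]> columns, String columnName) {
        int[] found = columns.get(columnName);
        if (found == null) {
            throw new IllegalArgumentException("No such int column: " + columnName);
        }
        this.values = found;
    }

    @Override
    public void readChunkPage(long firstRowPosition, int minimumSize, IntBuffer destination) {
        if (firstRowPosition < 0 || firstRowPosition > values.length) {
            throw new IndexOutOfBoundsException("Bad start row: " + firstRowPosition);
        }
        int start = (int) firstRowPosition;
        // Copy as many rows as are available and fit, but never fewer than requested.
        int count = Math.min(values.length - start, destination.remaining());
        if (count < minimumSize) {
            throw new IllegalStateException("Cannot supply " + minimumSize + " rows");
        }
        destination.put(values, start, count);
    }

    @Override
    public long size() {
        return values.length;
    }
}
```

Reading past minimumSize when the buffer has room is a choice made in this sketch; it saves a second call for rows that are already in memory.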

TableLocation

A TableLocation represents one partition of a table. It reports the partition's row count to the engine and creates ColumnLocation objects on demand.

Extend AbstractTableLocation and implement:

  • makeColumnLocation(name): Returns a ColumnLocation for the named column.
  • refresh(): Re-checks row count; call handleUpdate(RowSet, timestamp) to report the current size.
  • activateUnderlyingDataSource(): Called on first subscriber; initialize subscription state and call activationSuccessful.
  • deactivateUnderlyingDataSource(): Called when all subscribers detach; clear subscription state.
  • matchSubscriptionToken(token): Returns true if the token matches the current subscription.
  • getSortedColumns(), getDataIndexColumns(), hasDataIndex(), loadDataIndex(): Return empty values if the location does not support sorted columns or data indexes.
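The subscription lifecycle in the list above can be sketched without the engine. The class below is a hypothetical stand-in that does not extend AbstractTableLocation: its handleUpdate only records the latest size and timestamp, and returning the token from activateUnderlyingDataSource() takes the place of calling activationSuccessful. It is not thread-safe.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the activate / refresh / deactivate cycle.
final class LocationLifecycleSketch {
    private final LongSupplier rowCountSource; // e.g. the length of the backing arrays
    private Object subscriptionToken;          // null while no subscriber is attached
    private long lastReportedSize = -1;
    private long lastReportedMillis = -1;

    LocationLifecycleSketch(LongSupplier rowCountSource) {
        this.rowCountSource = rowCountSource;
    }

    // Re-checks the row count and reports it.
    void refresh() {
        handleUpdate(rowCountSource.getAsLong(), System.currentTimeMillis());
    }

    // First subscriber: create subscription state and publish the current size.
    Object activateUnderlyingDataSource() {
        subscriptionToken = new Object();
        refresh();
        return subscriptionToken;
    }

    // Last subscriber gone: drop subscription state.
    void deactivateUnderlyingDataSource() {
        subscriptionToken = null;
    }

    // A stale token from an earlier subscription must not match.
    boolean matchSubscriptionToken(Object token) {
        return token != null && token == subscriptionToken;
    }

    private void handleUpdate(long size, long timestampMillis) {
        lastReportedSize = size;
        lastReportedMillis = timestampMillis;
    }

    long lastReportedSize() {
        return lastReportedSize;
    }

    long lastReportedMillis() {
        return lastReportedMillis;
    }
}
```

Using a fresh token object per activation makes it easy to reject callbacks that belong to a subscription that has already been torn down.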

TableLocationProvider

A TableLocationProvider discovers and manages the set of TableLocation objects for a given TableKey. Extend AbstractTableLocationProvider and implement:

  • makeTableLocation(locationKey): Returns the TableLocation for the given TableLocationKey.
  • refresh(): Re-enumerates locations; call handleTableLocationKeyAdded for each known key.
  • activateUnderlyingDataSource(): Subscribe and register known locations; call activationSuccessful.
  • deactivateUnderlyingDataSource(): Unsubscribe.
  • matchSubscriptionToken(token): Returns true if the token matches the current subscription.
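The discovery step can be pictured with a small model in which partition keys are plain strings. Everything here is hypothetical: ProviderDiscoverySketch does not extend AbstractTableLocationProvider, the partitionLister supplier stands in for whatever enumerates partitions in your store, and ignoring keys that were already seen is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for location discovery in a provider.
final class ProviderDiscoverySketch {
    private final Supplier<List<String>> partitionLister;
    private final Set<String> knownKeys = new LinkedHashSet<>();
    private final List<String> announced = new ArrayList<>();

    ProviderDiscoverySketch(Supplier<List<String>> partitionLister) {
        this.partitionLister = partitionLister;
    }

    // Re-enumerates the store and reports every key it currently holds.
    void refresh() {
        for (String key : partitionLister.get()) {
            handleTableLocationKeyAdded(key);
        }
    }

    // Records a notification only the first time a key is seen.
    private void handleTableLocationKeyAdded(String key) {
        if (knownKeys.add(key)) {
            announced.add(key);
        }
    }

    // Keys announced so far, in discovery order.
    List<String> announcedKeys() {
        return new ArrayList<>(announced);
    }
}
```

Because refresh() reports every key each time, repeated calls are harmless in this model and new partitions appear as soon as the store lists them.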

AbstractTableDataService

AbstractTableDataService is the service entry point. It caches TableLocationProvider instances and only requires implementing makeTableLocationProvider(TableKey):
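The caching behaviour described here can be sketched as a map from key to provider that invokes the factory at most once per key. This is an illustration under stated assumptions, not the Deephaven implementation: CachingServiceSketch is a hypothetical generic class, and the factory function plays the role of makeTableLocationProvider(TableKey).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a service that caches one provider per key.
final class CachingServiceSketch<KEY, PROVIDER> {
    private final Map<KEY, PROVIDER> cache = new ConcurrentHashMap<>();
    private final Function<KEY, PROVIDER> makeTableLocationProvider;

    CachingServiceSketch(Function<KEY, PROVIDER> makeTableLocationProvider) {
        this.makeTableLocationProvider = makeTableLocationProvider;
    }

    // Returns the cached provider, creating it on first request for this key.
    PROVIDER getTableLocationProvider(KEY key) {
        return cache.computeIfAbsent(key, makeTableLocationProvider);
    }
}
```

The cache depends on the key's equals() and hashCode(), which is why a TableKey must keep the two consistent.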

Usage

The following code uses all six classes defined above to build a PartitionAwareSourceTable backed by an in-memory stock prices store with two date partitions: