Table data service
This guide covers using the Table Data Service API to integrate custom, partitioned, lazily-loaded data into Deephaven workflows.
In Groovy, you use the Java TableDataService interface and its implementations directly. The Python Table Data Service API is a higher-level wrapper built on top of the same interface.
Note
This feature is currently experimental. The API and its characteristics are subject to change.
TableDataService API
The TableDataService interface provides a way to integrate external data sources into Deephaven tables in both static and refreshing contexts. The data is partitioned, and each partition is loaded lazily — the engine only reads column data when a query needs it.
The API involves seven cooperating classes, each responsible for one layer of the data access stack:
| Class | Role |
|---|---|
| TableKey | Unique identifier for a logical table |
| TableLocationKey | Identifies a specific partition within a table |
| AbstractTableDataService | Service entry point: maps a TableKey to a TableLocationProvider |
| AbstractTableLocationProvider | Discovers and manages TableLocation objects for a given key |
| AbstractTableLocation | Per-partition: reports row count and creates ColumnLocation objects |
| AbstractColumnLocation | Per-column: provides typed data to the engine via ColumnRegion objects |
| PartitionAwareSourceTable | Assembles everything into a queryable Deephaven table |
The following sections describe the implementation requirements for each class and show a complete working example using an in-memory data store.
TableKey
A TableKey is a unique identifier for a logical table. Implement ImmutableTableKey (a sub-interface of TableKey) to get a default makeImmutable() that returns this. You must provide:
- getImplementationName(): A human-readable name used in logging.
- append(LogOutput): Describes the key in log output.
- hashCode() and equals(): Must be consistent with each other.
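The hashCode()/equals() requirement can be sketched without Deephaven on the classpath. The class StockTableKey and its name field are hypothetical. A real key would also declare that it implements ImmutableTableKey and would provide append(LogOutput); both are omitted here so the block compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a table key. A real implementation would also
// implement ImmutableTableKey and append(LogOutput) (omitted: needs Deephaven).
final class StockTableKey {
    private final String name;

    StockTableKey(String name) {
        this.name = Objects.requireNonNull(name, "name");
    }

    // Human-readable name used in logging.
    public String getImplementationName() {
        return "StockTableKey";
    }

    // equals() and hashCode() are derived from the same field, so two keys
    // that compare equal always produce the same hash code.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof StockTableKey)) {
            return false;
        }
        return name.equals(((StockTableKey) other).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public String toString() {
        return getImplementationName() + "[" + name + "]";
    }
}

final class StockTableKeyDemo {
    public static void main(String[] args) {
        Map<StockTableKey, String> cache = new HashMap<>();
        cache.put(new StockTableKey("prices"), "provider-1");
        // A distinct but equal key instance finds the cached entry.
        System.out.println(cache.get(new StockTableKey("prices")));
    }
}
```

Consistency matters because keys of this kind are typically used for map lookups: if equal keys hashed differently, a lookup with a fresh key instance would miss the existing entry.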
TableLocationKey
A TableLocationKey identifies a specific partition within a table. Extend PartitionedTableLocationKey, which handles partition-map storage, sorting, hashCode(), and equals(). Only getImplementationName() and append(LogOutput) need to be provided:
For unpartitioned tables, use the built-in singleton StandaloneTableLocationKey.getInstance().
ColumnLocation
A ColumnLocation provides the actual column data for one partition. It is created on demand by AbstractTableLocation.makeColumnLocation() and returns typed ColumnRegion objects that the engine uses to read data.
Extend AbstractColumnLocation and implement one makeColumnRegion* method per column type present in your table schema. For each, return an AppendOnlyFixedSizePageRegion* backed by an AppendOnlyRegionAccessor that reads from your data store.
AppendOnlyRegionAccessor requires two methods:
- readChunkPage(firstRowPosition, minimumSize, destination): Fills destination with at least minimumSize values starting at firstRowPosition.
- size(): Returns the total number of rows at this location.
The following example stores column data in typed arrays. Each array is keyed by column name in a map passed from AbstractTableLocation:
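As a minimal, Deephaven-free sketch of that layout, the block below keeps one partition's columns in a map of typed arrays and reads a page from one of them. DoubleArrayAccessor is a hypothetical stand-in: the real AppendOnlyRegionAccessor fills a Deephaven chunk, whereas this version fills a plain double[] and returns the number of values copied, which is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an append-only region accessor. A plain double[]
// replaces Deephaven's chunk type so the sketch runs on its own.
final class DoubleArrayAccessor {
    private final double[] data;

    DoubleArrayAccessor(double[] data) {
        this.data = data;
    }

    // Total number of rows at this location.
    long size() {
        return data.length;
    }

    // Copies values starting at firstRowPosition into destination and returns
    // how many were copied: at least minimumSize, and as many more as fit.
    int readChunkPage(long firstRowPosition, int minimumSize, double[] destination) {
        long available = size() - firstRowPosition;
        if (firstRowPosition < 0 || minimumSize < 0
                || available < minimumSize || destination.length < minimumSize) {
            throw new IllegalArgumentException(
                    "cannot supply " + minimumSize + " rows from position " + firstRowPosition);
        }
        int count = (int) Math.min(available, destination.length);
        System.arraycopy(data, (int) firstRowPosition, destination, 0, count);
        return count;
    }
}

final class ColumnStoreDemo {
    public static void main(String[] args) {
        // One partition's column data, keyed by column name (sample values).
        Map<String, double[]> columns = new HashMap<>();
        columns.put("Price", new double[] {101.5, 102.0, 99.75, 100.25});

        DoubleArrayAccessor accessor = new DoubleArrayAccessor(columns.get("Price"));
        double[] page = new double[3];
        int filled = accessor.readChunkPage(1, 2, page);
        System.out.println(filled + " values, first = " + page[0]);
    }
}
```

In an actual implementation, an accessor like this would be wrapped in the matching AppendOnlyFixedSizePageRegion* type inside the makeColumnRegion* method for the column's type.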
TableLocation
A TableLocation represents one partition of a table. It reports the partition's row count to the engine and creates ColumnLocation objects on demand.
Extend AbstractTableLocation and implement:
- makeColumnLocation(name): Returns a ColumnLocation for the named column.
- refresh(): Re-checks row count; call handleUpdate(RowSet, timestamp) to report the current size.
- activateUnderlyingDataSource(): Called on first subscriber; initialize subscription state and call activationSuccessful.
- deactivateUnderlyingDataSource(): Called when all subscribers detach; clear subscription state.
- matchSubscriptionToken(token): Returns true if the token matches the current subscription.
- getSortedColumns(), getDataIndexColumns(), hasDataIndex(), loadDataIndex(): Return empty values if not supporting sorting or data indexes.
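The subscription hooks in the list above follow a simple token pattern, sketched here without Deephaven. SubscriptionState and its method names are hypothetical. In a real AbstractTableLocation subclass, the engine invokes the hooks and activation is confirmed through activationSuccessful; this stand-in only records the state so that the matching logic is visible.

```java
// Hypothetical stand-in for the subscription state a table location keeps.
// Not a Deephaven class: it only illustrates the token lifecycle.
final class SubscriptionState {
    // Null while no subscription is active.
    private Object currentToken;

    // First subscriber attached: create the token identifying this subscription.
    synchronized Object activate() {
        currentToken = new Object();
        return currentToken;
    }

    // All subscribers detached: clear subscription state.
    synchronized void deactivate() {
        currentToken = null;
    }

    // True only for the token of the currently active subscription.
    synchronized boolean matchSubscriptionToken(Object token) {
        return token != null && token == currentToken;
    }
}

final class SubscriptionStateDemo {
    public static void main(String[] args) {
        SubscriptionState state = new SubscriptionState();
        Object token = state.activate();
        System.out.println("active match: " + state.matchSubscriptionToken(token));
        state.deactivate();
        System.out.println("after detach: " + state.matchSubscriptionToken(token));
    }
}
```

Comparing tokens by identity means a token from an earlier subscription can never match a later one, even if both were created the same way.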
TableLocationProvider
A TableLocationProvider discovers and manages the set of TableLocation objects for a given TableKey. Extend AbstractTableLocationProvider and implement:
- makeTableLocation(locationKey): Returns the TableLocation for the given TableLocationKey.
- refresh(): Re-enumerates locations; call handleTableLocationKeyAdded for each known key.
- activateUnderlyingDataSource(): Subscribe and register known locations; call activationSuccessful.
- deactivateUnderlyingDataSource(): Unsubscribe.
- matchSubscriptionToken(token): Returns true if token matches current subscription.
AbstractTableDataService
AbstractTableDataService is the service entry point. It caches TableLocationProvider instances and only requires implementing makeTableLocationProvider(TableKey):
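The caching behavior can be pictured with a small Deephaven-free sketch. CachingService, CountingService, makeProvider, and getProvider are hypothetical names chosen for illustration; the use of ConcurrentHashMap.computeIfAbsent is an assumption about one reasonable way to cache, not a description of Deephaven's internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a service that caches one provider per key.
abstract class CachingService<K, P> {
    private final Map<K, P> providers = new ConcurrentHashMap<>();

    // Subclasses supply only the factory, analogous in role to
    // makeTableLocationProvider(TableKey).
    protected abstract P makeProvider(K key);

    // Returns the cached provider for the key, creating it on first request.
    public final P getProvider(K key) {
        return providers.computeIfAbsent(key, this::makeProvider);
    }
}

// Example subclass that counts how often the factory runs.
final class CountingService extends CachingService<String, String> {
    final AtomicInteger created = new AtomicInteger();

    @Override
    protected String makeProvider(String key) {
        created.incrementAndGet();
        return "provider-for-" + key;
    }
}

final class CachingServiceDemo {
    public static void main(String[] args) {
        CountingService service = new CountingService();
        service.getProvider("prices");
        service.getProvider("prices"); // served from the cache
        System.out.println("factory calls: " + service.created.get());
    }
}
```

Because the map is keyed by the table key, this pattern depends on the key's hashCode() and equals() being consistent with each other.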
Usage
The following code uses all six classes defined above to build a PartitionAwareSourceTable backed by an in-memory stock prices store with two date partitions:
Related documentation
- AbstractColumnLocation Javadoc
- AbstractTableDataService Javadoc
- AbstractTableLocation Javadoc
- AbstractTableLocationProvider Javadoc
- AppendOnlyRegionAccessor Javadoc
- ColumnLocation Javadoc
- ColumnRegion Javadoc
- Custom data sources
- ImmutableTableKey Javadoc
- PartitionAwareSourceTable Javadoc
- PartitionedTableLocationKey Javadoc
- StandaloneTableLocationKey Javadoc
- TableDataService Javadoc
- TableKey Javadoc
- TableLocation Javadoc
- TableLocationKey Javadoc
- TableLocationProvider Javadoc