deephaven.experimental.table_data_service

This module defines a table service backend interface, TableDataServiceBackend, that users can implement to provide external data, in the form of pyarrow Tables, to Deephaven tables. An implementation of the backend is passed to the TableDataService constructor to create a new TableDataService instance, which can then be used to create Deephaven tables backed by the backend service.

class TableDataService(backend, *, chunk_reader_factory=None, stream_reader_options=None, page_size=None)[source]

Bases: JObjectWrapper

A TableDataService serves as a wrapper around a tightly-coupled Deephaven TableDataService implementation (Java class PythonTableDataService) that delegates to a Python TableDataServiceBackend for TableKey creation, TableLocationKey discovery, and data subscription/retrieval operations. It supports the creation of Deephaven tables from the Python backend service that provides table data and table data locations to the Deephaven tables.

Creates a new TableDataService with the given user-implemented backend service.

Parameters:
  • backend (TableDataServiceBackend) – the user-implemented backend service implementation

  • chunk_reader_factory (Optional[jpy.JType]) – the Barrage chunk reader factory, default is None

  • stream_reader_options (Optional[jpy.JType]) – the Barrage stream reader options, default is None

  • page_size (int) – the page size for the table service; the default is None, which means the value of the configurable JVM property PythonTableDataService.defaultPageSize is used (64K unless overridden).

j_object_type

alias of PythonTableDataService

make_table(table_key, *, refreshing)[source]

Creates a Table backed by the backend service with the given table key.

Parameters:
  • table_key (TableKey) – the table key

  • refreshing (bool) – whether the table is live or static

Returns:

a new table

Return type:

Table

Raises:

DHError

class TableDataServiceBackend[source]

Bases: ABC

An interface for a backend service that provides access to table data.

abstract column_values(table_key, table_location_key, col, offset, min_rows, max_rows, values_cb, failure_cb)[source]

Provides the data values for the column with the given name, for the table location identified by the given table key and table location key, via the values_cb callback. The values_cb callback should be called with a single-column pyarrow.Table that contains the data values for the given column within the specified range requirement.

The failure callback should be invoked when a failure to provide the column values occurs.

The column_values caller will block until one of the values or failure callbacks is called.

Note that asynchronous calls to any callback may block until this method has returned.

Parameters:
  • table_key (TableKey) – the table key

  • table_location_key (TableLocationKey) – the table location key

  • col (str) – the column name

  • offset (int) – the starting row index

  • min_rows (int) – the minimum number of rows to return; min_rows is always <= page size

  • max_rows (int) – the maximum number of rows to return

  • values_cb (Callable[[pa.Table], None]) – the callback function with one argument: the pyarrow.Table that contains the data values for the column within the specified range

  • failure_cb (Callable[[Exception], None]) – the failure callback function

Return type:

None
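The range requirement and the one-callback-per-call contract described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: rows_to_serve and serve_column are hypothetical helpers, and a plain Python list stands in for the single-column pyarrow.Table that a real backend would pass to values_cb. The sketch serves as many rows as are available up to max_rows and reports a failure if fewer than min_rows can be provided.

```python
def rows_to_serve(offset: int, min_rows: int, max_rows: int, location_size: int) -> int:
    """Return how many rows to deliver starting at offset (hypothetical helper).

    Serves as many rows as are available, capped at max_rows, and raises if
    fewer than min_rows can be provided.
    """
    available = location_size - offset
    count = min(max_rows, available)
    if count < min_rows:
        raise ValueError(
            f"need at least {min_rows} rows at offset {offset}, only {available} available"
        )
    return count


def serve_column(values, offset, min_rows, max_rows, values_cb, failure_cb):
    """Call exactly one of values_cb or failure_cb for a single request.

    `values` is a plain list standing in for one column of a pyarrow.Table.
    """
    try:
        count = rows_to_serve(offset, min_rows, max_rows, len(values))
        chunk = values[offset:offset + count]
    except Exception as e:
        # Report the problem instead of leaving the caller blocked.
        failure_cb(e)
        return
    values_cb(chunk)
```

In a real backend, the list slice would be replaced by selecting the requested column from a pyarrow.Table and slicing it to the same offset and row count.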

abstract subscribe_to_table_location_size(table_key, table_location_key, size_cb, success_cb, failure_cb)[source]

Provides the current and future sizes of the table location with the given table key and table location key via the size_cb callback. The size is the number of rows in the table location.

The success callback should be called when the subscription is established successfully and after the current table location size has been delivered to the size callback.

The failure callback should be invoked at initial failure to establish a subscription, or on a permanent failure to keep the subscription active (e.g. failure with no reconnection possible, or failure to reconnect/resubscribe before a timeout).

This is called for tables created when TableDataService.make_table() is called with refreshing=True.

Note that asynchronous calls to any callback will block until this method has returned.

Parameters:
  • table_key (TableKey) – the table key

  • table_location_key (TableLocationKey) – the table location key

  • size_cb (Callable[[int], None]) – the table location size callback function

  • success_cb (Callable[[], None]) – the success callback function

  • failure_cb (Callable[[Exception], None]) – the failure callback function

Returns:

a function that can be called to unsubscribe from this subscription

Return type:

Callable[[], None]
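The subscription flow described above (deliver the current size, signal success, keep delivering later sizes, and return an unsubscribe function) can be sketched with an in-memory source. LocationSizeFeed and its methods are hypothetical names used only for illustration, and the sketch is single-threaded; a real backend that publishes sizes from other threads would need its own synchronization.

```python
from typing import Callable, Dict, List


class LocationSizeFeed:
    """Hypothetical in-memory source of table location sizes."""

    def __init__(self) -> None:
        self._sizes: Dict[object, int] = {}
        self._listeners: Dict[object, List[Callable[[int], None]]] = {}

    def set_size(self, location_key, size: int) -> None:
        """Record a new size and notify the current subscribers."""
        self._sizes[location_key] = size
        for listener in list(self._listeners.get(location_key, [])):
            listener(size)

    def subscribe(self, location_key, size_cb, success_cb, failure_cb):
        """Deliver the current size, report success, then stream later sizes."""
        if location_key not in self._sizes:
            # Initial failure to establish the subscription.
            failure_cb(KeyError(location_key))
            return lambda: None

        size_cb(self._sizes[location_key])  # current size first
        self._listeners.setdefault(location_key, []).append(size_cb)
        success_cb()  # only after the current size has been delivered

        def unsubscribe() -> None:
            listeners = self._listeners.get(location_key, [])
            if size_cb in listeners:
                listeners.remove(size_cb)

        return unsubscribe
```

A subscribe_to_table_location_size implementation could delegate to such an object and return the unsubscribe function it produces.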

abstract subscribe_to_table_locations(table_key, location_cb, success_cb, failure_cb)[source]

Provides the table locations, existing and new, for the table with the given table key via the location_cb callback.

The location callback should be called with the table location key and an optional pyarrow.Table that contains the partitioning values for the location. The schema of the table must match the optional partitioning column schema returned by table_schema() for the table_key. The table must have a single row for the particular table location key provided in the 1st argument, with values for each partitioning column in the row.

The success callback should be called when the subscription is established successfully and after all existing table locations have been delivered to the table location callback.

The failure callback should be invoked at initial failure to establish a subscription, or on a permanent failure to keep the subscription active (e.g. failure with no reconnection possible, or failure to reconnect/resubscribe before a timeout).

This is called for tables created when TableDataService.make_table() is called with refreshing=True.

Note that asynchronous calls to any callback will block until this method has returned.

Parameters:
  • table_key (TableKey) – the table key

  • location_cb (Callable[[TableLocationKey, Optional[pa.Table]], None]) – the table location callback function

  • success_cb (Callable[[], None]) – the success callback function

  • failure_cb (Callable[[Exception], None]) – the failure callback function

Returns:

a function that can be called to unsubscribe from this subscription

Return type:

Callable[[], None]

abstract table_location_size(table_key, table_location_key, size_cb, failure_cb)[source]

Provides the size of the table location with the given table key and table location key via the size_cb callback. The size is the number of rows in the table location.

The failure callback should be invoked when a failure to provide the table location size occurs.

The table_location_size caller will block until one of the size or failure callbacks is called.

This is called for tables created when TableDataService.make_table() is called with refreshing=False.

Note that asynchronous calls to any callback may block until this method has returned.

Parameters:
  • table_key (TableKey) – the table key

  • table_location_key (TableLocationKey) – the table location key

  • size_cb (Callable[[int], None]) – the callback function

  • failure_cb (Callable[[Exception], None]) – the failure callback function

Return type:

None

abstract table_locations(table_key, location_cb, success_cb, failure_cb)[source]

Provides the existing table locations for the table with the given table key via the location_cb callback.

The location callback should be called with the table location key and an optional pyarrow.Table that contains the partitioning values for the location. The schema of the table must match the optional partitioning column schema returned by table_schema() for the table_key. The table must have a single row for the particular table location key provided in the 1st argument, with values for each partitioning column in the row.

The success callback should be called when all existing table locations have been delivered to the table location callback.

The failure callback should be invoked when failure to provide existing table locations occurs.

The table_locations caller will block until one of the success or failure callbacks is called.

This is called for tables created when TableDataService.make_table() is called with refreshing=False.

Note that asynchronous calls to any callback may block until this method has returned.

Parameters:
  • table_key (TableKey) – the table key

  • location_cb (Callable[[TableLocationKey, Optional[pa.Table]], None]) – the callback function

  • success_cb (Callable[[], None]) – the success callback function

  • failure_cb (Callable[[Exception], None]) – the failure callback function

Return type:

None
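The callback ordering described above for static tables (every existing location first, then success, or failure instead) might look like the sketch below. deliver_locations and load_locations are hypothetical names; load_locations is assumed to return (location key, partitioning values) pairs, where the partitioning values would be a single-row pyarrow.Table or None in a real backend.

```python
def deliver_locations(load_locations, location_cb, success_cb, failure_cb):
    """Report every existing location, then signal success.

    If the locations cannot be loaded, failure_cb is called instead, so the
    blocked caller is always released by exactly one of the two callbacks.
    """
    try:
        locations = list(load_locations())  # may raise, e.g. on a lookup error
    except Exception as e:
        failure_cb(e)
        return
    for location_key, partition_values in locations:
        location_cb(location_key, partition_values)
    # Success is signalled only after all locations have been delivered.
    success_cb()
```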

abstract table_schema(table_key, schema_cb, failure_cb)[source]

Provides the table data schema and the partitioning column schema for the table with the given table key via the schema_cb callback. The table data schema is not required to include the partitioning columns defined in the partitioning column schema.

The failure callback should be invoked when a failure to provide the schemas occurs.

The table_schema caller will block until one of the schema or failure callbacks is called.

Note that asynchronous calls to any callback may block until this method has returned.

Parameters:
  • table_key (TableKey) – the table key

  • schema_cb (Callable[[pa.Schema, Optional[pa.Schema]], None]) – the callback function with two arguments: the table data schema and the optional partitioning column schema

  • failure_cb (Callable[[Exception], None]) – the failure callback function

Return type:

None

class TableKey[source]

Bases: ABC

A key that identifies a table. The key should be unique for each table. The key can be any Python object and should include sufficient information to uniquely identify the table for the backend service. The __hash__ method must be implemented to ensure that the key is hashable.

class TableLocationKey[source]

Bases: ABC

A key that identifies a specific location of a table. The key should be unique for each table location of the table. The key can be any Python object and should include sufficient information to uniquely identify the location for the backend service to fetch the data values and data size. The __hash__ method must be implemented to ensure that the key is hashable.
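One way to satisfy the hashing requirement for both key types is a frozen dataclass, which generates __eq__ and __hash__ from its fields. The sketch below is illustrative only: the two base classes are local stand-ins so it runs without a Deephaven installation (real code would import TableKey and TableLocationKey from this module), and TradeTableKey and DatePartitionKey are hypothetical key types.

```python
from abc import ABC
from dataclasses import dataclass


# Stand-ins for the real base classes; in real code, import TableKey and
# TableLocationKey from deephaven.experimental.table_data_service instead.
class TableKey(ABC):
    pass


class TableLocationKey(ABC):
    pass


@dataclass(frozen=True)
class TradeTableKey(TableKey):  # hypothetical key type
    name: str


@dataclass(frozen=True)
class DatePartitionKey(TableLocationKey):  # hypothetical key type
    date: str


# Equal keys hash equally, so a fresh instance finds the same dictionary entry.
location_sizes = {DatePartitionKey("2024-01-02"): 1000}
same_location = DatePartitionKey("2024-01-02")
```

Because the fields fully determine equality and the hash, the backend can use such keys directly as dictionary keys when looking up tables and locations.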