deephaven

The Deephaven Python Integration Package provides native access to Deephaven’s query engine, making the power of Deephaven available to the Python community.

exception DHError(cause=None, message='')[source]

Bases: Exception

The custom exception class for the Deephaven Python package.

This exception can be raised due to user errors or system errors when Deephaven resources and functions are accessed, for example, while reading a CSV/Parquet file into a Deephaven table or performing an aggregation or join operation on Deephaven tables. It is good practice for Python code to catch this exception and handle it appropriately.

property compact_traceback

The compact traceback of the exception.

property root_cause

The root cause of the exception.

property traceback

The traceback of the exception.

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
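
As a rough illustration of catching this exception, the sketch below wraps a read_csv call and reports the root_cause and compact_traceback properties. The helper names summarize_error and load_or_none are hypothetical, and the Deephaven imports are deferred so that the pure-Python helper can be used without a running Deephaven session.

```python
def summarize_error(err: Exception) -> str:
    """Return a one-line summary, preferring the root_cause property when present."""
    # DHError exposes root_cause; other exceptions fall back to str(err).
    root = getattr(err, "root_cause", None) or str(err)
    return f"{type(err).__name__}: {root}"


def load_or_none(path: str):
    """Read a CSV file into a table, returning None if Deephaven reports an error."""
    # Deferred import: calling this function requires a running Deephaven session.
    from deephaven import DHError, read_csv

    try:
        return read_csv(path)
    except DHError as e:
        print(summarize_error(e))
        print(e.compact_traceback)
        return None
```

Whether to return None, re-raise, or retry is an application decision; the sketch only shows where the DHError properties fit.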

class DynamicTableWriter(col_defs)[source]

Bases: JObjectWrapper

The DynamicTableWriter creates a new in-memory table and supports writing data to it.

This class implements the context manager protocol and thus can be used in with statements.

Initializes the writer and creates a new in-memory table.

Parameters:

col_defs (Dict[str, DType]) – a map of column names and types of the new table

Raises:

DHError

close()[source]

Closes the writer.

Raises:

DHError

Return type:

None

j_object_type

alias of DynamicTableWriter

write_row(*values)[source]

Writes a row to the newly created table.

The type of each value must be convertible to the type of the corresponding column. The conversion may be safe or unsafe (for example, it may lose precision or overflow).

Parameters:

*values (Any) – the values of the new row; the data types of these values must match the column definitions of the table

Raises:

DHError

Return type:

None
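
A minimal sketch of the writer used in a with statement follows. The helper names write_rows and write_sample_prices, the column names, and the sample values are all illustrative, and dht.string and dht.double are assumed to be the relevant types in deephaven.dtypes.

```python
from typing import Any, Iterable, Sequence


def write_rows(writer, rows: Iterable[Sequence[Any]]) -> int:
    """Write each row through writer.write_row and return the number written."""
    count = 0
    for row in rows:
        # One positional value per column, in column-definition order.
        writer.write_row(*row)
        count += 1
    return count


def write_sample_prices() -> int:
    """Create a writer for a two-column table and write two rows to it."""
    # Deferred imports: calling this function requires a running Deephaven session.
    import deephaven.dtypes as dht
    from deephaven import DynamicTableWriter

    col_defs = {"Sym": dht.string, "Price": dht.double}
    # The context manager protocol takes care of closing the writer on exit.
    with DynamicTableWriter(col_defs) as writer:
        return write_rows(writer, [("AAPL", 150.25), ("MSFT", 310.0)])
```

Because write_rows only relies on a write_row method, it can be exercised with a stand-in object outside Deephaven.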

class SortDirection(value)[source]

Bases: Enum

An enum defining the sorting orders.

ASCENDING = 2
DESCENDING = 1

class TableReplayer(start_time, end_time)[source]

Bases: JObjectWrapper

The TableReplayer is used to replay historical data.

Tables to be replayed are registered with the replayer. The resulting dynamic replay tables all update in sync, using the same simulated clock. Each registered table must contain a timestamp column.

Initializes the replayer.

Parameters:
  • start_time (Union[dtypes.Instant, int, str, datetime.datetime, np.datetime64, pd.Timestamp]) – replay start time. Integer values are nanoseconds since the Epoch.

  • end_time (Union[dtypes.Instant, int, str, datetime.datetime, np.datetime64, pd.Timestamp]) – replay end time. Integer values are nanoseconds since the Epoch.

Raises:

DHError

add_table(table, col)[source]

Registers a table for replaying and returns the associated replay table.

Parameters:
  • table (Table) – the table to be replayed

  • col (str) – column in the table containing timestamps

Return type:

Table

Returns:

a replay Table

Raises:

DHError

j_object_type

alias of Replayer

shutdown()[source]

Shuts down and invalidates the replayer. After this call, the replayer can no longer be used.

Return type:

None

start()[source]

Starts replaying.

Raises:

DHError

Return type:

None
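
The sequence described above (create the replayer, register tables, start) might look like the sketch below. It passes the start and end times as integer nanoseconds since the Epoch; to_epoch_nanos and start_replay are hypothetical helper names, and the table argument is assumed to be an existing Deephaven table with a timestamp column.

```python
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanos(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the Epoch."""
    delta = dt - _EPOCH
    # Integer arithmetic avoids floating-point rounding at nanosecond scale.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def start_replay(table, ts_col: str, start: datetime, end: datetime):
    """Register one table with a new replayer and start replaying it."""
    # Deferred import: calling this function requires a running Deephaven session.
    from deephaven import TableReplayer

    replayer = TableReplayer(to_epoch_nanos(start), to_epoch_nanos(end))
    replay_table = replayer.add_table(table, ts_col)
    replayer.start()
    # The caller should call replayer.shutdown() when the replay is no longer needed.
    return replayer, replay_table
```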

empty_table(size)[source]

Creates a table with rows but no columns.

Parameters:

size (int) – the number of rows

Return type:

Table

Returns:

a Table

Raises:

DHError

function_generated_table(table_generator, source_tables=None, refresh_interval_ms=None, exec_ctx=None, args=(), kwargs={})[source]

Creates an abstract table that is generated by running the table_generator() function. The function is first run when this method is called, and then again either (a) whenever one of the ‘source_tables’ ticks or (b) after ‘refresh_interval_ms’ milliseconds have elapsed. Exactly one of ‘refresh_interval_ms’ or ‘source_tables’ must be set.

Function-generated tables can be used to produce dynamic tables from sources outside Deephaven. For example, function-generated tables can create tables that are produced by arbitrary Python logic (including using Pandas or numpy). They can also be used to retrieve data from external sources (such as files or websites).

The table definition must not change between invocations of the ‘table_generator’ function, or an exception will be raised.

Note that the ‘table_generator’ may access data in the ‘source_tables’ but should not perform further table operations on them without careful handling. Table operations may be memoized, and it is possible that a table operation will return a table created by a previous invocation of the same operation. Since that result will not have been included in ‘source_tables’, it is not automatically treated as a dependency for purposes of determining when it is safe to invoke ‘table_generator’, allowing races to exist between accessing the operation result and that result’s own update processing. It is best to include all dependencies directly in ‘source_tables’, or only compute on-demand inputs under a LivenessScope.

Parameters:
  • table_generator (Callable[..., Table]) – The table generator function. This function must return a Table.

  • source_tables (Union[Table, List[Table]]) – Source tables used by the ‘table_generator’ function. The ‘table_generator’ is rerun when any of these tables tick.

  • refresh_interval_ms (int) – Interval (in milliseconds) at which the ‘table_generator’ function is rerun.

  • exec_ctx (ExecutionContext) – A custom execution context. If ‘None’, the current execution context is used. If there is no current execution context, a ValueError is raised.

  • args (Tuple) – Optional tuple of positional arguments to pass to table_generator. Defaults to ()

  • kwargs (Dict) – Optional dictionary of keyword arguments to pass to table_generator. Defaults to {}

Return type:

Table

Returns:

a new table

Raises:

DHError
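
The sketch below shows one way a time-driven function-generated table could be set up. The names pick_trigger and make_clock_table are hypothetical; Table.update and the now() query function are not described in this section and are assumed to be available in the session.

```python
def pick_trigger(source_tables=None, refresh_interval_ms=None) -> str:
    """Return which trigger is set, enforcing that exactly one is provided."""
    if (source_tables is None) == (refresh_interval_ms is None):
        raise ValueError("set exactly one of source_tables or refresh_interval_ms")
    return "source_tables" if source_tables is not None else "refresh_interval_ms"


def make_clock_table(refresh_interval_ms: int = 1000):
    """Create a one-row table that is regenerated on a fixed interval."""
    # Deferred import: calling this function requires a running Deephaven session.
    from deephaven import empty_table, function_generated_table

    pick_trigger(refresh_interval_ms=refresh_interval_ms)

    def table_generator():
        # The definition (a single Timestamp column) is the same on every invocation.
        # Assumption: Table.update and the now() query function exist in this session.
        return empty_table(1).update(["Timestamp = now()"])

    return function_generated_table(table_generator, refresh_interval_ms=refresh_interval_ms)
```

The pick_trigger check only mirrors the rule stated above on the Python side; the library performs its own validation.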

garbage_collect()[source]

Runs full garbage collection in Python first and then requests the JVM to run its garbage collector twice due to the cross-referencing nature of the Python/Java integration in Deephaven. Since there is no way to force the Java garbage collector to run, the effect of calling this function is non-deterministic. Users also need to be mindful of the overhead that running garbage collection generally incurs.

Raises:

DHError

Return type:

None

input_table(col_defs=None, init_table=None, key_cols=None)[source]

Creates an in-memory InputTable from either column definitions or an initial table. When key columns are provided, the InputTable is keyed; otherwise it is append-only.

There are two types of in-memory InputTable - append-only and keyed.

The append-only input table is not keyed: all rows are added to the end of the table, and deletions and edits are not permitted.

The keyed input table has keys for each row and supports addition/deletion/modification of rows by the keys.

Parameters:
  • col_defs (Dict[str, DType]) – the column definitions

  • init_table (Table) – the initial table

  • key_cols (Union[str, Sequence[str]]) – the name(s) of the key column(s)

Return type:

InputTable

Returns:

an InputTable

Raises:

DHError
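
To make the two flavors concrete, the following sketch creates one append-only and one keyed input table from the same column definitions. The helper names input_table_kind and make_input_tables and the column names are illustrative, and dht.string and dht.int32 are assumed to be the relevant types in deephaven.dtypes.

```python
from typing import Sequence, Union


def input_table_kind(key_cols: Union[str, Sequence[str], None]) -> str:
    """Mirror the rule above: key columns given means keyed, otherwise append-only."""
    return "keyed" if key_cols else "append-only"


def make_input_tables():
    """Create an append-only and a keyed input table with identical columns."""
    # Deferred imports: calling this function requires a running Deephaven session.
    import deephaven.dtypes as dht
    from deephaven import input_table

    col_defs = {"Sym": dht.string, "Qty": dht.int32}
    append_only = input_table(col_defs=col_defs)         # no key columns
    keyed = input_table(col_defs=col_defs, key_cols="Sym")  # one row per Sym
    return append_only, keyed
```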

merge(tables)[source]

Combines two or more tables into one aggregate table. This essentially appends the tables one on top of the other. Null tables are ignored.

Parameters:

tables (List[Table]) – the source tables

Returns:

a Table

Raises:

DHError

merge_sorted(tables, order_by)[source]

Combines two or more tables into one sorted, aggregate table. This essentially stacks the tables one on top of the other and sorts the result. Null tables are ignored. merge_sorted is more efficient than using merge followed by sort.

Parameters:
  • tables (List[Table]) – the source tables

  • order_by (str) – the name of the key column

Return type:

Table

Returns:

a Table

Raises:

DHError

new_table(cols)[source]

Creates an in-memory table from a list of input columns or a Dict (mapping) of column names and column data. Each column must have an equal number of elements.

When the input is a mapping, an intermediary Pandas DataFrame is created from the mapping, which then is converted to an in-memory table. In this case, as opposed to when the input is a list of InputColumns, the column types are determined by Pandas’ type inference logic.

Parameters:

cols (Union[List[InputColumn], Mapping[str, Sequence]]) – a list of InputColumns or a mapping of column names and column data.

Return type:

Table

Returns:

a Table

Raises:

DHError
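
Both input forms can be sketched side by side as below. The helper names common_length and build_tables and the sample data are illustrative; string_col and int_col are assumed to come from deephaven.column, which this section does not document.

```python
from typing import Mapping, Sequence


def common_length(data: Mapping[str, Sequence]) -> int:
    """Return the shared column length, or raise ValueError if lengths differ."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) != 1:
        raise ValueError(f"columns must have equal lengths, got {sorted(lengths)}")
    return lengths.pop()


def build_tables():
    """Build the same table from InputColumns and from a mapping."""
    # Deferred imports: calling this function requires a running Deephaven session.
    from deephaven import new_table
    from deephaven.column import int_col, string_col  # assumed helper module

    data = {"Sym": ["AAPL", "MSFT", "GOOG"], "Qty": [100, 250, 75]}
    common_length(data)  # fail early on ragged input

    # Column types are stated explicitly by the InputColumn constructors.
    from_columns = new_table([string_col("Sym", data["Sym"]), int_col("Qty", data["Qty"])])
    # Column types are inferred by Pandas, so they may differ from the explicit version.
    from_mapping = new_table(data)
    return from_columns, from_mapping
```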

read_csv(path, header=None, headless=False, header_row=0, skip_rows=0, num_rows=9223372036854775807, ignore_empty_lines=False, allow_missing_columns=False, ignore_excess_columns=False, delimiter=',', quote='"', ignore_surrounding_spaces=True, trim=False)

Read the CSV data specified by the path parameter as a table.

Parameters:
  • path (str) – a file path or a URL string

  • header (Dict[str, DType]) – a dict to define the table columns with key being the name, value being the data type

  • headless (bool) – whether the csv file doesn’t have a header row, default is False

  • header_row (int) – the header row number, all the rows before it will be skipped, default is 0. Must be 0 if headless is True, otherwise an exception will be raised

  • skip_rows (long) – number of data rows to skip before processing data. This is useful when you want to parse data in chunks. Defaults to 0

  • num_rows (long) – max number of rows to process. This is useful when you want to parse data in chunks. Defaults to the maximum 64bit integer value

  • ignore_empty_lines (bool) – whether to ignore empty lines, default is False

  • allow_missing_columns (bool) – whether the library should allow missing columns in the input. If this flag is set, then rows that are too short (that have fewer columns than the header row) will be interpreted as if the missing columns contained the empty string. Defaults to False.

  • ignore_excess_columns (bool) – whether the library should allow excess columns in the input. If this flag is set, then rows that are too long (that have more columns than the header row) will have those excess columns dropped. Defaults to False.

  • delimiter (str) – the delimiter used by the CSV, default is the comma

  • quote (str) – the quote character for the CSV, default is double quote

  • ignore_surrounding_spaces (bool) – Indicates whether surrounding white space should be ignored for unquoted text fields, default is True

  • trim (bool) – indicates whether to trim white space inside a quoted string, default is False

Return type:

Table

Returns:

a table

Raises:

DHError
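
The skip_rows and num_rows parameters can be combined to parse a file in chunks, roughly as sketched below. The helper names chunk_windows and read_csv_in_chunks are hypothetical, and the caller is assumed to already know the total number of data rows.

```python
from typing import List, Tuple


def chunk_windows(total_rows: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (skip_rows, num_rows) pairs that cover total_rows in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (skip, min(chunk_size, total_rows - skip))
        for skip in range(0, total_rows, chunk_size)
    ]


def read_csv_in_chunks(path: str, total_rows: int, chunk_size: int):
    """Read a CSV file as a list of tables, one per chunk of data rows."""
    # Deferred import: calling this function requires a running Deephaven session.
    from deephaven import read_csv

    return [
        read_csv(path, skip_rows=skip, num_rows=count)
        for skip, count in chunk_windows(total_rows, chunk_size)
    ]
```

For example, 10 data rows with a chunk size of 4 yield the windows (0, 4), (4, 4), and (8, 2).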

read_sql(conn, query, driver='connectorx')[source]

Executes the provided SQL query via a supported driver and returns a Deephaven table.

Parameters:
  • conn (Any) – must either be a connection string for the given driver or a Turbodbc/ADBC DBAPI Connection object; when it is a Connection object, the driver argument will be ignored.

  • query (str) – SQL query statement

  • driver (Literal['odbc', 'adbc', 'connectorx']) – the driver to use, supported drivers are “odbc”, “adbc”, “connectorx”, default is “connectorx”

Return type:

Table

Returns:

a new Table

Raises:

DHError

ring_table(parent, capacity, initialize=True)[source]

Creates a ring table that retains the latest ‘capacity’ number of rows from the parent table. The latest rows are determined solely by the new rows added to the parent table. Deleted rows are ignored. Updated rows are not expected and will raise an exception.

Ring tables are mostly used with blink tables, which do not retain their own data for more than an update cycle.

Parameters:
  • parent (Table) – the parent table

  • capacity (int) – the capacity of the ring table

  • initialize (bool) – whether to initialize the ring table with a snapshot of the parent table, default is True

Return type:

Table

Returns:

a Table

Raises:

DHError
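
The retention behavior is loosely analogous to a bounded deque, which the first helper below illustrates in plain Python. The second helper sketches a ring table over a blink time_table. Both helper names (latest_rows, make_ring) are hypothetical, and the deque is only an analogy, not how the engine implements ring tables.

```python
from collections import deque
from typing import Iterable, List


def latest_rows(rows: Iterable, capacity: int) -> List:
    """Plain-Python analogy: keep only the last `capacity` appended rows."""
    return list(deque(rows, maxlen=capacity))


def make_ring(capacity: int = 5):
    """Retain the latest `capacity` rows of a blink table that ticks every second."""
    # Deferred import: calling this function requires a running Deephaven session.
    from deephaven import ring_table, time_table

    # A blink table keeps rows for one update cycle only, so the ring table
    # is what preserves recent history.
    ticks = time_table("PT1s", blink_table=True)
    return ring_table(parent=ticks, capacity=capacity, initialize=True)
```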

time_table(period, start_time=None, blink_table=False)[source]

Creates a table that adds a new row on a regular interval.

Parameters:
  • period (Union[dtypes.Duration, int, str, datetime.timedelta, np.timedelta64, pd.Timedelta]) – time interval between new row additions, can be expressed as an integer in nanoseconds, a time interval string, e.g. “PT00:00:00.001” or “PT1s”, or other time duration types.

  • start_time (Union[None, Instant, int, str, datetime.datetime, np.datetime64, pd.Timestamp], optional) – start time for adding new rows, defaults to None which means use the current time as the start time.

  • blink_table (bool, optional) – if the time table should be a blink table, defaults to False

Return type:

Table

Returns:

a Table

Raises:

DHError

write_csv(table, path, cols=[])

Write a table to a standard CSV file.

Parameters:
  • table (Table) – the source table

  • path (str) – the path of the CSV file

  • cols (List[str]) – the names of the columns to be written out

Raises:

DHError

Return type:

None