Version: Python

data_index

data_index gets the DataIndex for the given key columns on a table, and can create one if none exists. A DataIndex records which rows hold each unique key, so the engine can locate data by key without scanning the whole table. Some table operations perform better when a data index is present.

note

When a new data index is created, it is not immediately computed. The index is computed when a table operation first uses it, or when its table attribute is first accessed. This matters for performance: the cost of building the index is paid at first use, not at creation.
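To make the deferred computation concrete, here is a plain-Python toy model. It is not Deephaven code and does not reflect how the engine actually builds its indexes; the class LazyKeyIndex and its compute_count counter are hypothetical. Creating the object does no work, and the key-to-row mapping is built only on the first access of its table attribute.

```python
from functools import cached_property


class LazyKeyIndex:
    """Toy model of a lazily computed index (hypothetical; not Deephaven's DataIndex)."""

    def __init__(self, rows, key_cols):
        # Creating the index only records what to index; no work is done yet.
        self._rows = rows
        self.key_cols = list(key_cols)
        self.compute_count = 0

    @cached_property
    def table(self):
        # The first access builds a key -> row-positions mapping;
        # cached_property stores the result so later accesses reuse it.
        self.compute_count += 1
        mapping = {}
        for pos, row in enumerate(self._rows):
            key = tuple(row[c] for c in self.key_cols)
            mapping.setdefault(key, []).append(pos)
        return mapping


rows = [{"X": 3}, {"X": 1}, {"X": 3}]
index = LazyKeyIndex(rows, ["X"])
print(index.compute_count)  # 0: created but not computed
print(index.table)          # first access triggers the computation
print(index.compute_count)  # 1
index.table                 # cached: no recomputation
print(index.compute_count)  # still 1
```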

note

The Deephaven engine will only use a DataIndex when the keys exactly match what is needed for an operation. For example, if a data index is present for the columns X and Y, it will not be used if the engine only needs an index for column X.
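The exact-match rule can be illustrated with another plain-Python toy model. The IndexRegistry class is hypothetical and not part of Deephaven: it stores each index under its full set of key columns and looks indexes up only by that same set. Treating the key columns as unordered is an assumption of this sketch.

```python
class IndexRegistry:
    """Toy model: an index is found only by an exact match on its key columns (hypothetical)."""

    def __init__(self):
        self._indexes = {}

    def add(self, key_cols):
        # Store under the full set of key columns (this sketch ignores column order).
        self._indexes[frozenset(key_cols)] = f"index on {sorted(key_cols)}"

    def find(self, needed_cols):
        # No subset matching: an index on X and Y does not serve a lookup on X alone.
        return self._indexes.get(frozenset(needed_cols))


registry = IndexRegistry()
registry.add(["X", "Y"])
print(registry.find(["X", "Y"]))  # index on ['X', 'Y']
print(registry.find(["X"]))       # None
```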

Syntax

data_index(
    table: Table,
    key_cols: List[str],
    create_if_absent: bool = True
) -> Optional[DataIndex]

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| table | Table | The table to index. |
| key_cols | List[str] | The names of the key column(s) to index. A single column name can also be passed as a string, as in the examples below. |
| create_if_absent | bool (optional) | If True, create the index if it does not already exist. If False, return None when the index does not exist. Creating an index does not compute it; the index is computed when a table operation first uses it, or when the index's table attribute is first accessed. Default is True. |

Returns

A DataIndex, or None if no index exists for the given key columns and create_if_absent is False.
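The create_if_absent contract can be modeled in a few lines of plain Python. This is a toy stand-in, not Deephaven's implementation: get_index and the dictionary registry are hypothetical, and a string stands in for a DataIndex. It shows that, in this model, passing create_if_absent=False returns None only while no matching index has been created.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical stand-in for the set of indexes attached to one table.
Registry = Dict[FrozenSet[str], str]


def get_index(registry: Registry, key_cols: List[str], create_if_absent: bool = True) -> Optional[str]:
    """Toy get-or-create lookup mirroring the create_if_absent contract (not Deephaven code)."""
    key = frozenset(key_cols)
    if key in registry:
        return registry[key]
    if not create_if_absent:
        return None
    # "Creating" only records the index; nothing is computed here.
    registry[key] = "index on " + ", ".join(key_cols)
    return registry[key]


indexes: Registry = {}
print(get_index(indexes, ["X"], create_if_absent=False))  # None: no index yet
print(get_index(indexes, ["X"]))                          # index on X (created)
print(get_index(indexes, ["X"], create_if_absent=False))  # index on X (found)
```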

Examples

The following example demonstrates creating a data index on a table using a single key column.

from deephaven.experimental.data_index import data_index
from deephaven import empty_table

source = empty_table(10).update("X = randomInt(0, 10)")

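# Create the index on column X; it is not computed yet.
index = data_index(table=source, key_cols="X")

# Accessing the table attribute computes the index.
source_index = index.table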

The following example demonstrates creating a data index on a table using multiple key columns.

from deephaven.experimental.data_index import data_index
from deephaven import empty_table

source = empty_table(50).update(["Key1 = randomInt(0, 5)", "Key2 = randomInt(5, 8)"])

index = data_index(table=source, key_cols=["Key1", "Key2"])

source_index = index.table

The following example sets create_if_absent to False. Because no index exists for column X yet, the call returns None.

from deephaven.experimental.data_index import data_index
from deephaven import empty_table

source = empty_table(10).update("X = randomInt(0, 10)")

# No index exists for column X yet, so data_index returns None.
index = data_index(table=source, key_cols="X", create_if_absent=False)

print(index)