data_index
data_index gets the DataIndex for the given key columns on a table. It can also create a DataIndex when none is present. A DataIndex is an index that can make it faster for the engine to locate data by key. Some table operations have improved performance when a data index is present.
When a new data index is created, it is not immediately computed. The index is computed when a table operation first uses it, or when its table attribute is accessed. This matters for performance: the cost of building the index is paid on first use, not at creation.
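The deferred computation described above can be pictured with a small pure-Python toy model. This is only an illustrative sketch and not Deephaven's implementation: the `LazyIndex` class, its `build_count` counter, and the dictionary-based index are all hypothetical names invented for this example.

```python
from collections import defaultdict


class LazyIndex:
    """Toy stand-in for a data index: maps each key to its row positions.

    Hypothetical class for illustration only; not part of Deephaven.
    """

    def __init__(self, rows, key_col):
        self._rows = rows
        self._key_col = key_col
        self._mapping = None  # nothing is computed at creation time
        self.build_count = 0  # counts how many times the index was built

    @property
    def table(self):
        # The first access triggers the computation; later accesses reuse it.
        if self._mapping is None:
            mapping = defaultdict(list)
            for pos, row in enumerate(self._rows):
                mapping[row[self._key_col]].append(pos)
            self._mapping = dict(mapping)
            self.build_count += 1
        return self._mapping


rows = [{"X": 1}, {"X": 2}, {"X": 1}]
index = LazyIndex(rows, "X")
created_without_work = index.build_count == 0  # creation alone builds nothing

first = index.table   # the index is built here
second = index.table  # the already-built index is reused
```

In this toy model, creating the index is cheap and the first access to `table` pays the full build cost, which mirrors the behavior the paragraph above describes.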
The Deephaven engine will only use a DataIndex when the keys exactly match what is needed for an operation. For example, if a data index is present for the columns X and Y, it will not be used if the engine only needs an index for column X.
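The exact-match rule can be sketched with a toy lookup keyed on the full set of key columns. This is a simplified model under stated assumptions, not the engine's actual lookup logic: `IndexRegistry`, `add`, and `find` are hypothetical names, and keying on an ordered tuple is a choice made for this sketch only (it makes no claim about how the real engine treats column order).

```python
from typing import Dict, List, Optional, Tuple


class IndexRegistry:
    """Toy registry of indexes, looked up by their exact key columns.

    Hypothetical class for illustration only; not part of Deephaven.
    """

    def __init__(self) -> None:
        self._indexes: Dict[Tuple[str, ...], str] = {}

    def add(self, key_cols: List[str], index: str) -> None:
        self._indexes[tuple(key_cols)] = index

    def find(self, key_cols: List[str]) -> Optional[str]:
        # Exact match only: an index on (X, Y) does not satisfy a request for (X,).
        return self._indexes.get(tuple(key_cols))


registry = IndexRegistry()
registry.add(["X", "Y"], "index on X, Y")

hit = registry.find(["X", "Y"])  # keys match exactly, so the index is found
miss = registry.find(["X"])      # only X is needed, so the (X, Y) index is not used
```

The practical consequence is that an operation keyed only on X would need its own index on X to benefit.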
Syntax
data_index(
    table: Table,
    key_cols: List[str],
    create_if_absent: bool = True
) -> Optional[DataIndex]
Parameters
Parameter | Type | Description |
---|---|---|
table | Table | The table to index. |
key_cols | List[str] | The names of the key column(s) to index. |
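create_if_absent optional | bool | If True, create the data index when none is present. If False, return None when no data index exists. The default is True. |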
Returns
A DataIndex, or None if no data index exists and create_if_absent is False.
Examples
The following example demonstrates creating a data index on a table using a single key column.
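from deephaven.experimental.data_index import data_index
from deephaven import empty_table

source = empty_table(10).update("X = randomInt(0, 10)")
# key_cols takes a list of column names, even for a single key column
index = data_index(table=source, key_cols=["X"])
# Accessing the table attribute computes the index if it has not been computed yet
source_index = index.table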
The following example demonstrates creating a data index on a table using multiple key columns.
from deephaven.experimental.data_index import data_index
from deephaven import empty_table
source = empty_table(50).update(["Key1 = randomInt(0, 5)", "Key2 = randomInt(5, 8)"])
index = data_index(table=source, key_cols=["Key1", "Key2"])
source_index = index.table
The following example sets create_if_absent to False, so the operation returns None since the index does not yet exist.
from deephaven.experimental.data_index import data_index
from deephaven import empty_table
source = empty_table(10).update("X = randomInt(0, 10)")
index = data_index(table=source, key_cols=["X"], create_if_absent=False)
print(index)