SortOrderProvider
The SortOrderProvider class specifies the sort order to use when writing data to an Iceberg table from Deephaven. It allows users to select the sort order by sort order ID, use the default sort order of the table, specify unsorted data, or configure error handling for unmapped columns.
Constructors
A SortOrderProvider is constructed using one of the following class methods:
SortOrderProvider.from_sort_id(sort_order_id: int)
SortOrderProvider.use_table_default()
SortOrderProvider.unsorted()
Methods
from_sort_id: Selects a sort order by its ID from the Iceberg table metadata.use_table_default: Uses the table's default sort order.unsorted: Specifies that data should be written without any sort order.with_fail_on_unmapped: Determines if an existingSortOrderProviderwill fail or not if any columns in the sort order are not present in the data to be written.with_id: Returns a new sort order provider that uses the existing provider to determine the columns to sort on, but writes a provided sort order ID to the Iceberg table.
Parameters
| Parameter | Type | Description |
|---|---|---|
| sort_order_id | int | The ID of the sort order as defined in the Iceberg table's metadata. |
| fail_on_unmapped | bool | If |
Usage
A SortOrderProvider is used as an argument to TableParquetWriterOptions when writing Deephaven tables to Iceberg, allowing control over the sort order of the written data.
Examples
The following example creates a TableParquetWriterOptions that writes data using the table's default sort order:
from deephaven.experimental import iceberg
from deephaven import dtypes as dht
source_def = {"ID": dht.int32, "Name": dht.string, "Value": dht.double}
sort_order_provider = iceberg.SortOrderProvider.use_table_default()
writer_options = iceberg.TableParquetWriterOptions(
table_definition=source_def, sort_order_provider=sort_order_provider
)
To specify a particular sort order by ID and fail on unmapped columns:
from deephaven.experimental import iceberg
from deephaven import dtypes as dht
source_def = {"ID": dht.int32, "Name": dht.string, "Value": dht.double}
sort_order_provider = iceberg.SortOrderProvider.from_sort_id(1).with_fail_on_unmapped(
True
)
writer_options = iceberg.TableParquetWriterOptions(
table_definition=source_def, sort_order_provider=sort_order_provider
)
To write data without any sort order:
from deephaven.experimental import iceberg
from deephaven import dtypes as dht
source_def = {"ID": dht.int32, "Name": dht.string, "Value": dht.double}
sort_order_provider = iceberg.SortOrderProvider.unsorted()
writer_options = iceberg.TableParquetWriterOptions(
table_definition=source_def, sort_order_provider=sort_order_provider
)