Version: Python

IcebergReadInstructions

The IcebergReadInstructions class specifies the instructions for reading Iceberg tables into Deephaven. These include column renames, table definitions, and special instructions for loading files from cloud storage.

Syntax

IcebergReadInstructions(
    table_definition: Union[Dict[str, DType], List[Column]] = None,
    data_instructions: S3Instructions = None,
    column_renames: Dict[str, str] = None,
    update_mode: IcebergUpdateMode = None,
    snapshot_id: int = None,
)

Parameters

Parameter (Type)

table_definition (Union[Dict[str, DType], List[Column]])

The table definition. If not given, the definition is inferred from the Iceberg schema. Setting a definition guarantees that the returned table has the given definition. This is mostly used to select a subset of the Iceberg schema columns.

data_instructions (S3Instructions)

Special instructions for reading data files from S3 cloud storage.

column_renames (Dict[str, str])

A mapping from old (Iceberg) column names to new (Deephaven) column names. If not given, the column names are the same as in the Iceberg schema.

update_mode (IcebergUpdateMode)

The update mode for the table. Options include:

  • IcebergUpdateMode.static(): Specifies that the Iceberg table should be loaded once and not refreshed.
  • IcebergUpdateMode.manual_refresh(): Specifies that the Iceberg table should be loaded once and refreshed manually.
  • IcebergUpdateMode.auto_refresh(): Specifies that the Iceberg table should be loaded once and refreshed automatically. The default refresh interval is 60 seconds; it can be changed with the auto_refresh_ms input parameter, which is given in milliseconds.

snapshot_id (int)

The snapshot ID to read. If not given, the most recent snapshot ID is used.

Methods

None.

Constructors

An IcebergReadInstructions is constructed directly from the class.

Examples

The following example creates an IcebergReadInstructions object that renames Iceberg columns region and item_type to Area and Category in Deephaven, respectively:

from deephaven.experimental import iceberg

custom_instructions = iceberg.IcebergReadInstructions(
    column_renames={"region": "Area", "item_type": "Category"}
)
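To make the effect of the mapping concrete, the sketch below applies the same dictionary to a list of column names in plain Python. It does not call Deephaven. The column list, including units_sold, is hypothetical, and the sketch assumes that columns without an entry in the mapping keep their Iceberg names.

```python
# Hypothetical Iceberg schema column names, used only for illustration.
iceberg_columns = ["region", "item_type", "units_sold"]

# Same mapping as in the example above: old (Iceberg) name -> new (Deephaven) name.
column_renames = {"region": "Area", "item_type": "Category"}

# Assumption: a column with no entry in the mapping keeps its Iceberg name.
deephaven_columns = [column_renames.get(name, name) for name in iceberg_columns]

print(deephaven_columns)  # ['Area', 'Category', 'units_sold']
```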

The following example creates an IcebergReadInstructions object that renames columns as well as specifies the table definition:

from deephaven.experimental import iceberg
from deephaven import dtypes as dht

custom_instructions = iceberg.IcebergReadInstructions(
    column_renames={"region": "Area", "item_type": "Category", "unit_price": "Price"},
    table_definition={
        "Area": dht.string,
        "Category": dht.string,
        "Price": dht.double,
    },
)

The following example creates four IcebergReadInstructions objects, one for each of four update modes. The first is for static Iceberg tables, the second is for Iceberg tables that can be manually refreshed, and the third and fourth are for Iceberg tables that are refreshed automatically. The third uses the default refresh interval of 60 seconds, whereas the fourth sets the interval to 30 seconds (30000 milliseconds).

from deephaven.experimental import iceberg

static_mode = iceberg.IcebergUpdateMode.static()
manual_refresh_mode = iceberg.IcebergUpdateMode.manual_refresh()
auto_refresh_mode_60s = iceberg.IcebergUpdateMode.auto_refresh()
auto_refresh_mode_30s = iceberg.IcebergUpdateMode.auto_refresh(auto_refresh_ms=30000)

static_instructions = iceberg.IcebergReadInstructions(update_mode=static_mode)
manual_refresh_instructions = iceberg.IcebergReadInstructions(
    update_mode=manual_refresh_mode
)
auto_refresh_instructions_60s = iceberg.IcebergReadInstructions(
    update_mode=auto_refresh_mode_60s
)
auto_refresh_instructions_30s = iceberg.IcebergReadInstructions(
    update_mode=auto_refresh_mode_30s
)
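The auto_refresh_ms value in the example above is expressed in milliseconds, so 30 seconds becomes 30000. A small plain-Python helper can make that conversion explicit. The name refresh_interval_ms is hypothetical and not part of Deephaven; this is only a sketch of the arithmetic.

```python
from datetime import timedelta


def refresh_interval_ms(interval: timedelta) -> int:
    """Convert a refresh interval to whole milliseconds (hypothetical helper)."""
    # Integer division of two timedeltas yields an exact int.
    return interval // timedelta(milliseconds=1)


default_ms = refresh_interval_ms(timedelta(seconds=60))
thirty_seconds_ms = refresh_interval_ms(timedelta(seconds=30))

print(default_ms, thirty_seconds_ms)  # 60000 30000
```

The result can then be passed as auto_refresh_ms, for example iceberg.IcebergUpdateMode.auto_refresh(auto_refresh_ms=thirty_seconds_ms).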

The following example creates an IcebergReadInstructions object that supplies the region, credentials, and endpoint override that a catalog adapter needs to read Iceberg tables from S3 cloud storage:

from deephaven.experimental import iceberg
from deephaven.experimental import s3

s3_instructions = s3.S3Instructions(
    region_name="us-east-1",
    access_key_id="admin",
    secret_access_key="password",
    endpoint_override="http://minio:9000",
)

iceberg_instructions = iceberg.IcebergReadInstructions(
    data_instructions=s3_instructions
)
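The snapshot_id parameter is not used in the examples above. As a minimal sketch, the following pins a read to a single snapshot instead of the most recent one. The ID shown is a hypothetical placeholder; a real ID has to come from the Iceberg table's own snapshot history. The guarded import is an addition for illustration, so that the snippet also runs outside a Deephaven session, where it simply leaves the instructions unset.

```python
# Hypothetical snapshot ID, for illustration only; use an ID that
# actually exists in your Iceberg table's snapshot history.
SNAPSHOT_ID = 1234567890123456789

try:
    from deephaven.experimental import iceberg
except ImportError:
    # Not running inside a Deephaven environment.
    iceberg = None

if iceberg is not None:
    # Read this specific snapshot rather than the latest one.
    snapshot_instructions = iceberg.IcebergReadInstructions(snapshot_id=SNAPSHOT_ID)
else:
    snapshot_instructions = None
```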