deephaven.experimental.iceberg
This module adds Iceberg table support to Deephaven.
- class IcebergCatalogAdapter(j_object)
Bases:
JObjectWrapper
This class provides an interface for interacting with Iceberg catalogs. It allows listing namespaces, tables and snapshots, as well as reading Iceberg tables into Deephaven tables.
- j_object_type
alias of
IcebergCatalogAdapter
- namespaces(namespace=None)
Returns information on the namespaces in the catalog as a Deephaven table. If a namespace is specified, the namespaces nested within it are listed; otherwise the top-level namespaces are listed.
- Parameters:
namespace (Optional[str]) – the higher-level namespace from which to list namespaces; if omitted, the top-level namespaces are listed.
- Return type:
Table
- Returns:
a table containing the namespaces.
- read_table(table_identifier, instructions=None, snapshot_id=None)
Reads the table from the catalog using the provided instructions. Optionally, a snapshot id can be provided to read a specific snapshot of the table.
- Parameters:
table_identifier (str) – the table to read.
instructions (Optional[IcebergInstructions]) – the instructions for reading the table. These instructions can include column renames, table definition, and specific data instructions for reading the data files from the provider. If omitted, the table will be read with default instructions.
snapshot_id (Optional[int]) – the snapshot id to read; if omitted, the most recent snapshot is selected.
- Returns:
the table read from the catalog.
- Return type:
Table
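As a minimal sketch of how namespaces and read_table might be used together, the helper below lists the top-level namespaces and then reads one table. The helper name preview_catalog is hypothetical, and adapter is assumed to be an IcebergCatalogAdapter obtained from one of this module's adapter functions.

```python
def preview_catalog(adapter, table_identifier, snapshot_id=None):
    """List top-level namespaces, then read one table from the catalog.

    `adapter` is assumed to be an IcebergCatalogAdapter; this helper and its
    name are illustrative and not part of the module.
    """
    # With no argument, namespaces() lists the top-level namespaces.
    top_level = adapter.namespaces()
    # With snapshot_id=None, read_table() selects the most recent snapshot.
    table = adapter.read_table(table_identifier, snapshot_id=snapshot_id)
    return top_level, table
```

A call such as preview_catalog(adapter, "sales.trades") would use a made-up table identifier; passing an integer snapshot_id reads that specific snapshot instead of the latest one.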
- class IcebergInstructions(table_definition=None, data_instructions=None, column_renames=None)
Bases:
JObjectWrapper
This class specifies the instructions for reading an Iceberg table into Deephaven. These include column rename instructions and table definitions, as well as special data instructions for loading data files from the cloud.
Initializes the instructions using the provided parameters.
- Parameters:
table_definition (Optional[TableDefinitionLike]) – the table definition; if omitted, the definition is inferred from the Iceberg schema. Setting a definition guarantees the returned table will have that definition. This is useful for specifying a subset of the Iceberg schema columns.
data_instructions (Optional[s3.S3Instructions]) – special instructions for reading data files, useful when reading files from a non-local file system, like S3.
column_renames (Optional[Dict[str, str]]) – a dictionary mapping old column names to new column names; the columns are renamed accordingly in the output table.
- Raises:
DHError – If unable to build the instructions object.
- j_object_type
alias of
IcebergInstructions
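The following sketch shows one way the constructor parameters above might be assembled. The helper names check_renames and build_instructions are hypothetical, and the uniqueness check on new column names is an assumption of this sketch rather than documented behavior of IcebergInstructions.

```python
def check_renames(column_renames):
    """Return a copy of the old-to-new mapping after a basic sanity check."""
    new_names = list(column_renames.values())
    # Assumption of this sketch: renaming two columns to the same new name
    # is treated as a mistake and rejected early.
    if len(set(new_names)) != len(new_names):
        raise ValueError("column_renames maps several columns to the same new name")
    return dict(column_renames)


def build_instructions(column_renames, table_definition=None, data_instructions=None):
    """Build an IcebergInstructions object from a validated rename mapping."""
    renames = check_renames(column_renames)
    # Imported lazily so check_renames can be used outside a Deephaven session.
    from deephaven.experimental import iceberg

    return iceberg.IcebergInstructions(
        table_definition=table_definition,
        data_instructions=data_instructions,
        column_renames=renames,
    )
```

The resulting object can then be passed as the instructions argument of read_table. Leaving table_definition as None lets the definition be inferred from the Iceberg schema, as described above.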
- adapter_aws_glue(catalog_uri, warehouse_location, name=None)
Create a catalog adapter using an AWS Glue catalog.
- Parameters:
catalog_uri (str) – the URI of the AWS Glue catalog.
warehouse_location (str) – the location of the warehouse.
name (Optional[str]) – a descriptive name for the catalog; if omitted, the catalog name is inferred from the catalog URI.
- Returns:
the catalog adapter for the provided AWS Glue catalog.
- Return type:
IcebergCatalogAdapter
- Raises:
DHError – If unable to build the catalog adapter.
- adapter_s3_rest(catalog_uri, warehouse_location, name=None, region_name=None, access_key_id=None, secret_access_key=None, end_point_override=None)
Create a catalog adapter using an S3-compatible provider and a REST catalog.
- Parameters:
catalog_uri (str) – the URI of the REST catalog.
warehouse_location (str) – the location of the warehouse.
name (Optional[str]) – a descriptive name for the catalog; if omitted, the catalog name is inferred from the catalog URI.
region_name (Optional[str]) – the S3 region name to use; if omitted, the AWS SDK picks the default region from the aws.region system property, the AWS_REGION environment variable, the {user.home}/.aws/credentials or {user.home}/.aws/config files, or, when running in EC2, the EC2 metadata service.
access_key_id (Optional[str]) – the access key for reading files. Both access key and secret access key must be provided to use static credentials; otherwise default credentials will be used.
secret_access_key (Optional[str]) – the secret access key for reading files. Both access key and secret access key must be provided to use static credentials; otherwise default credentials will be used.
end_point_override (Optional[str]) – the S3 endpoint to connect to. Callers connecting to AWS do not typically need to set this; it is most useful when connecting to non-AWS, S3-compatible APIs.
- Returns:
the catalog adapter for the provided S3 REST catalog.
- Return type:
IcebergCatalogAdapter
- Raises:
DHError – If unable to build the catalog adapter.
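A minimal sketch of calling adapter_s3_rest is given below. The helper names s3_rest_kwargs and open_s3_rest_catalog are hypothetical. Note one deliberate difference from the documented behavior: when only one of the two credentials is supplied, this helper raises an error instead of falling back to default credentials. That stricter choice is an assumption of the sketch, made so that a half-configured credential pair is noticed early.

```python
def s3_rest_kwargs(catalog_uri, warehouse_location, name=None, region_name=None,
                   access_key_id=None, secret_access_key=None,
                   end_point_override=None):
    """Collect keyword arguments for adapter_s3_rest, dropping unset options."""
    # Static credentials only apply when both halves are present; this sketch
    # fails fast rather than silently using default credentials.
    if (access_key_id is None) != (secret_access_key is None):
        raise ValueError("access_key_id and secret_access_key must be provided together")
    kwargs = {"catalog_uri": catalog_uri, "warehouse_location": warehouse_location}
    optional = {
        "name": name,
        "region_name": region_name,
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "end_point_override": end_point_override,
    }
    # Leave out anything not set so the adapter applies its own defaults.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


def open_s3_rest_catalog(**settings):
    """Build the catalog adapter; requires a running Deephaven session."""
    # Imported lazily so s3_rest_kwargs can be used without Deephaven installed.
    from deephaven.experimental import iceberg

    return iceberg.adapter_s3_rest(**s3_rest_kwargs(**settings))
```

For a non-AWS, S3-compatible store, end_point_override would be set to that store's endpoint; any URIs and region names passed in are the caller's own values and are not implied by this module.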