TableParquetWriterOptions

The TableParquetWriterOptions class provides specialized instructions for configuring your IcebergTableWriter instances.

Syntax

TableParquetWriterOptions(
    table_definition: Union[TableDefinition, Mapping[str, DType], Iterable[ColumnDefinition], JType],
    schema_provider: Optional[SchemaProvider] = None,
    field_id_to_column_name: Optional[Dict[int, str]] = None,
    data_instructions: Optional[object] = None,
    compression_codec_name: Optional[str] = None,
    maximum_dictionary_keys: Optional[int] = None,
    maximum_dictionary_size: Optional[int] = None,
    target_page_size: Optional[int] = None,
)

Parameters

table_definition: Union[TableDefinition, Mapping[str, DType], Iterable[ColumnDefinition], JType]

The table definition to use when writing Iceberg data files. The definition can be used to skip some columns or add additional columns with null values. The provided definition should have at least one column.

schema_provider: Optional[SchemaProvider]

Used to extract a schema from an Iceberg table. The schema is used in conjunction with field_id_to_column_name to map Deephaven columns from the table definition to Iceberg columns. You can specify how to extract the schema in multiple ways (by schema ID, snapshot ID, initial schema, etc.). Defaults to None, which uses the current schema of the table.

field_id_to_column_name: Optional[Dict[int, str]]

A one-to-one mapping of Iceberg field IDs from the schema specification to Deephaven column names from the table definition. Defaults to None, which maps Iceberg columns to Deephaven columns by name.
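As a sketch, such a mapping is a plain Python dictionary. The column names below match the example table at the bottom of this page; the field IDs themselves are made up for illustration:

```python
# Hypothetical Iceberg field IDs -> Deephaven column names.
# Field IDs come from the Iceberg schema specification; the column
# names must exist in the table definition being written.
field_id_to_column_name = {
    1: "X",
    2: "Y",
    3: "Z",
}

# The mapping must be one-to-one: no two field IDs may share a column name.
assert len(set(field_id_to_column_name.values())) == len(field_id_to_column_name)
```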

data_instructions: Optional[object]

Special instructions for writing the data files, useful when the files are stored in a non-local file system such as S3 (for example, S3Instructions, as in the example below). Defaults to None.

compression_codec_name: Optional[str]

The compression codec to use for writing the Parquet file. Allowed values include UNCOMPRESSED, SNAPPY, GZIP, LZO, LZ4, LZ4_RAW, and ZSTD. Defaults to None, which uses SNAPPY.

maximum_dictionary_keys: Optional[int]

The maximum number of unique keys the Parquet writer should add to a dictionary page before switching to non-dictionary encoding. Never used for non-string columns. Defaults to None, which uses 2^20 (1,048,576).

maximum_dictionary_size: Optional[int]

The maximum number of bytes the Parquet writer should add to the dictionary before switching to non-dictionary encoding. Never used for non-string columns. Defaults to None, which uses 2^20 (1,048,576).

target_page_size: Optional[int]

The target Parquet file page size in bytes. Defaults to None, which uses 2^20 bytes (1 MiB).
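Taken together, the tuning parameters above can be gathered as plain keyword arguments before constructing the options. The values below are illustrative only; 1 << 20 simply spells out the 2^20 default noted above:

```python
# Illustrative keyword arguments for TableParquetWriterOptions.
# Every one of these is optional; passing None (or omitting it) keeps
# the stated default.
writer_kwargs = {
    "compression_codec_name": "ZSTD",    # one of the allowed codec names
    "maximum_dictionary_keys": 1 << 20,  # 2^20 = 1,048,576, the default
    "maximum_dictionary_size": 1 << 20,  # in bytes; also defaults to 2^20
    "target_page_size": 1 << 20,         # in bytes; also defaults to 2^20
}
print(writer_kwargs["maximum_dictionary_keys"])  # 1048576
```

These could then be unpacked into the constructor alongside the required table definition, e.g. iceberg.TableParquetWriterOptions(table_definition=source_def, **writer_kwargs).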

Methods

None.

Constructors

A TableParquetWriterOptions object is constructed directly from the class constructor.

Examples

The following example creates a TableParquetWriterOptions object that can be used to write Deephaven tables to an Iceberg table:

from deephaven.experimental import iceberg
from deephaven.experimental import s3
from deephaven import empty_table

# Create a source table to write to Iceberg.
source = empty_table(10).update(["X = i", "Y = 0.1 * X", "Z = pow(Y, 2)"])

# Capture the table definition, which the writer options require.
source_def = source.definition

# S3 instructions for reaching the object store (here, a MinIO endpoint)
# that backs the Iceberg table.
s3_instructions = s3.S3Instructions(
    region_name="us-east-1",
    endpoint_override="http://minio:9000",
    credentials=s3.Credentials.basic("admin", "password"),
)

# Combine the table definition and data instructions into writer options.
writer_options = iceberg.TableParquetWriterOptions(
    table_definition=source_def, data_instructions=s3_instructions
)