TableParquetWriterOptions
The TableParquetWriterOptions class provides specialized instructions for configuring your IcebergTableWriter instances.
Syntax
TableParquetWriterOptions(
    table_definition: Union[TableDefinition, Mapping[str, DType], Iterable[ColumnDefinition], JType],
    schema_provider: Optional[SchemaProvider] = None,
    field_id_to_column_name: Optional[Dict[int, str]] = None,
    compression_codec_name: Optional[str] = None,
    maximum_dictionary_keys: Optional[int] = None,
    maximum_dictionary_size: Optional[int] = None,
    target_page_size: Optional[int] = None,
    data_instructions: Optional[s3.S3Instructions] = None,
)
Parameters
Parameter | Type | Description |
---|---|---|
table_definition | Union[TableDefinition, Mapping[str, DType], Iterable[ColumnDefinition], JType] | The table definition to use when writing Iceberg data files. The definition can be used to skip some columns or add additional columns with null values. The provided definition should have at least one column. |
schema_provider | Optional[SchemaProvider] | Used to extract a schema from an Iceberg table. The schema is used in conjunction with the field_id_to_column_name mapping to map Deephaven columns from the table definition to Iceberg columns. Defaults to None, which means use the current schema of the table. |
field_id_to_column_name | Optional[Dict[int, str]] | A one-to-one mapping of Iceberg field IDs from the schema specification to Deephaven column names from the table definition. Defaults to None, which means map Iceberg columns to Deephaven columns of the same name. |
compression_codec_name | Optional[str] | The compression codec to use for writing the Parquet file. Allowed values include UNCOMPRESSED, SNAPPY, GZIP, LZO, LZ4, LZ4_RAW, and ZSTD. Defaults to SNAPPY. |
maximum_dictionary_keys | Optional[int] | The maximum number of unique keys the Parquet writer should add to a dictionary page before switching to non-dictionary encoding. Never used for non-string columns. Defaults to 1048576 (2^20). |
maximum_dictionary_size | Optional[int] | The maximum number of bytes the Parquet writer should add to the dictionary before switching to non-dictionary encoding. Never used for non-string columns. Defaults to 1048576 (2^20). |
target_page_size | Optional[int] | The target Parquet file page size in bytes. Defaults to 65536 (2^16). |
data_instructions | Optional[s3.S3Instructions] | Special instructions for writing the data files, useful when the files are stored in a non-local file system such as S3. Defaults to None. |
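The Parquet-specific options above can be tuned independently of where the data lives. The following is a minimal sketch of doing so; the column names, codec choice, and numeric limits are illustrative values, not recommendations:

from deephaven import dtypes
from deephaven.experimental import iceberg

# Illustrative tuning only: ZSTD compression, smaller dictionary limits,
# and a 1 MiB target page size
tuned_options = iceberg.TableParquetWriterOptions(
    table_definition={"Id": dtypes.long, "Price": dtypes.double},
    compression_codec_name="ZSTD",
    maximum_dictionary_keys=262144,
    maximum_dictionary_size=262144,
    target_page_size=1048576,
)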
Methods
None.
Constructors
A TableParquetWriterOptions is constructed directly from the class.
Examples
The following example creates a TableParquetWriterOptions object that can be used to write Deephaven tables to an Iceberg table:
from deephaven.experimental import iceberg
from deephaven.experimental import s3
from deephaven import empty_table

# Create a source table and capture its definition
source = empty_table(10).update(["X = i", "Y = 0.1 * X", "Z = pow(Y, 2)"])
source_def = source.definition

# Instructions for reaching the S3-compatible object store that backs the table
s3_instructions = s3.S3Instructions(
    region_name="us-east-1",
    endpoint_override="http://minio:9000",
    credentials=s3.Credentials.basic("admin", "password"),
)

# Writer options pairing the table definition with the S3 instructions
writer_options = iceberg.TableParquetWriterOptions(
    table_definition=source_def, data_instructions=s3_instructions
)
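A TableParquetWriterOptions object does nothing on its own; it is passed to an IcebergTableWriter. The following sketch continues the example above, assuming a REST catalog backed by MinIO; the adapter arguments and table name are placeholders for your own deployment:

# Assumed deployment: a REST catalog at http://rest:8181 backed by MinIO
adapter = iceberg.adapter_s3_rest(
    name="minio-iceberg",
    catalog_uri="http://rest:8181",
    warehouse_location="s3a://warehouse/wh",
    region_name="us-east-1",
    access_key_id="admin",
    secret_access_key="password",
    end_point_override="http://minio:9000",
)

# Create the Iceberg table from the source definition, then write to it
table_adapter = adapter.create_table("sales.source_data", table_definition=source_def)
writer = table_adapter.table_writer(writer_options)
writer.append(iceberg.IcebergWriteInstructions(source))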