write_partitioned

The write_partitioned method writes a table to disk in Parquet format with partitioning columns written as key=value directories. For example, for a partitioning column Date, this creates a directory structure like Date=2021-01-01/<base_name>.parquet, Date=2021-01-02/<base_name>.parquet, etc.
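The key=value layout described above can be modeled with a few lines of standard-library Python. This is only an illustrative sketch, not Deephaven's implementation: the helper name `partition_file_path` and the `/data/output` root are hypothetical, and nesting one directory per partitioning column is an assumption based on the usual Hive-style convention.

```python
import posixpath


def partition_file_path(root, partitions, base_name):
    """Build <root>/<key>=<value>/.../<base_name>.parquet for one partition."""
    # One key=value directory per partitioning column, in column order
    # (nesting for multiple columns is an assumption, see the text above).
    dirs = [f"{key}={value}" for key, value in partitions.items()]
    return posixpath.join(root, *dirs, base_name + ".parquet")


print(partition_file_path("/data/output", {"Date": "2021-01-01"}, "table"))
# prints /data/output/Date=2021-01-01/table.parquet
```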

Syntax
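The call shape can be reconstructed from the parameter list that follows. The stub below is a placeholder that only mirrors the parameter names, order, and types; it is not the library's code. Showing `None` as the default for every optional parameter is an assumption, and the effective defaults are those given in the parameter descriptions.

```python
def write_partitioned(
    table: "Union[Table, PartitionedTable]",
    destination_dir: "str",
    table_definition: "Optional[TableDefinitionLike]" = None,
    col_instructions: "Optional[list[ColumnInstruction]]" = None,
    compression_codec_name: "Optional[str]" = None,
    max_dictionary_keys: "Optional[int]" = None,
    max_dictionary_size: "Optional[int]" = None,
    target_page_size: "Optional[int]" = None,
    base_name: "Optional[str]" = None,
    generate_metadata_files: "Optional[bool]" = None,
    index_columns: "Optional[Sequence[Sequence[str]]]" = None,
    row_group_info: "Optional[RowGroupInfo]" = None,
    special_instructions: "Optional[s3.S3Instructions]" = None,
) -> None:
    """Placeholder stub: shows the call shape only and writes nothing."""
```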

Parameters

Parameter (Type): Description

table (Union[Table, PartitionedTable])

The source table or partitioned table to write.

destination_dir (str)

The path or URI to the destination root directory in which the partitioned Parquet data is stored. Non-existing directories in the provided path are created.

table_definition (TableDefinitionLike, optional)

The table definition to use for writing, instead of the definitions implied by the table. Default is None, which uses the column definitions implied by the table. This definition can skip some columns or add additional columns with null values.

col_instructions (list[ColumnInstruction], optional)

One or more optional ColumnInstruction objects that contain instructions for writing particular columns in the table.

compression_codec_name (str, optional)

The compression codec to use. See the write parameters for available options. If not specified, defaults to SNAPPY.

max_dictionary_keys (int, optional)

The maximum number of unique dictionary keys the writer is allowed to add to a dictionary page before switching to non-dictionary encoding. If not specified, defaults to 2^20 (1,048,576).

max_dictionary_size (int, optional)

The maximum number of bytes the writer adds to the dictionary before switching to non-dictionary encoding. Only evaluated for String columns. If not specified, defaults to 2^20 (1,048,576).

target_page_size (int, optional)

The target page size in bytes. If not specified, defaults to 2^20 bytes (1 MiB).
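For reference, the documented defaults for the codec, dictionary, and page-size parameters can be written out as explicit keyword arguments. This is a sketch only: the dictionary name `writer_tuning` is hypothetical, and the values simply restate the defaults given above.

```python
# Documented defaults, spelled out as explicit keyword arguments.
writer_tuning = {
    "compression_codec_name": "SNAPPY",
    "max_dictionary_keys": 1 << 20,  # 2^20 = 1,048,576 keys
    "max_dictionary_size": 1 << 20,  # 2^20 = 1,048,576 bytes
    "target_page_size": 1 << 20,     # 2^20 bytes = 1 MiB
}

print(writer_tuning["target_page_size"])  # prints 1048576
```

Inside a Deephaven session, a dictionary like this could be unpacked into the call, for example `write_partitioned(table, destination_dir, **writer_tuning)`, and individual values adjusted from there.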

base_name (str, optional)

The base name for the individual partitioned files. If not specified, defaults to {uuid}, so files have names of the format <uuid>.parquet. The following tokens are available:

  • {uuid} — Replaced with a random UUID. For example, table-{uuid} results in table-8e8ab6b2-62f2-40d1-8191-1c5b70c5f330.parquet.
  • {partitions} — Replaced with an underscore-delimited, concatenated string of partition values. For example, for {partitions}-table with columns PC1 and PC2, the result is PC1=pc1_PC2=pc2-table.parquet.
  • {i} — Replaced with an auto-incremented integer for files in a directory. For example, table-{i} results in PC=partition1/table-0.parquet, PC=partition1/table-1.parquet, etc.
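The three tokens can be modeled with plain string substitution. The sketch below only mirrors the documented behavior and is not Deephaven's implementation; the helper name `expand_base_name` and its arguments are hypothetical.

```python
import uuid


def expand_base_name(base_name, partitions, index):
    """Model the documented base_name tokens for a single output file."""
    # {partitions}: underscore-delimited key=value pairs, in column order.
    joined = "_".join(f"{key}={value}" for key, value in partitions.items())
    name = (
        base_name.replace("{uuid}", str(uuid.uuid4()))  # random UUID
        .replace("{partitions}", joined)
        .replace("{i}", str(index))  # per-directory counter
    )
    return name + ".parquet"


print(expand_base_name("{partitions}-table", {"PC1": "pc1", "PC2": "pc2"}, 0))
# prints PC1=pc1_PC2=pc2-table.parquet
print(expand_base_name("table-{i}", {"PC": "partition1"}, 1))
# prints table-1.parquet
```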

generate_metadata_files (bool, optional)

Whether to generate Parquet _metadata and _common_metadata files. Defaults to False. Generating these files speeds up reading partitioned data because they contain metadata (including schema) about the entire dataset.

index_columns (Sequence[Sequence[str]], optional)

Sequence of sequences containing the column names for indexes to persist. The write operation stores the index info as sidecar tables. For example, [["Col1"], ["Col1", "Col2"]] stores indexes for ["Col1"] and ["Col1", "Col2"]. By default, indexes to write are determined by those present on the source table.

row_group_info (RowGroupInfo, optional)

Requested RowGroup instructions, as returned by a call to RowGroupInfo.

special_instructions (s3.S3Instructions, optional)

Special instructions for writing Parquet files to a non-local file system, like S3. Default is None.

Returns

None. Writes partitioned Parquet files to the specified directory.

Examples

Note

All examples in this document write data to the /data directory in Deephaven. For more information on this directory and how it relates to your local file system, see Docker data volumes.

Write a partitioned table

In this example, write_partitioned writes a partitioned table with the X column as the partitioning key:
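A minimal sketch of such a call is shown here. The function name, the sample data, and the `/data/partitioned_by_x` destination are illustrative assumptions. The imports sit inside the function because the `deephaven` modules need a running Deephaven session.

```python
def write_partitioned_by_x(destination_dir="/data/partitioned_by_x"):
    # Imported here because these modules need a running Deephaven session.
    from deephaven import empty_table
    from deephaven.parquet import write_partitioned

    # Six rows: X alternates between A and B, Y is the row index.
    source = empty_table(6).update(["X = i % 2 == 0 ? `A` : `B`", "Y = i"])

    # Partition on X so each distinct value gets its own X=<value> directory.
    partitioned = source.partition_by(["X"])

    write_partitioned(partitioned, destination_dir)
```

Calling `write_partitioned_by_x()` from a Deephaven Python console performs the write.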

This creates:
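With the default base name, file names are random UUIDs, so the simplest way to confirm the layout is to list the destination directory after the write. The helper below is a standard-library sketch; its name `list_parquet_files` is hypothetical.

```python
import os


def list_parquet_files(root):
    """Return sorted paths of .parquet files under root, relative to root."""
    found = []
    for current_dir, _subdirs, files in os.walk(root):
        for name in files:
            if name.endswith(".parquet"):
                full = os.path.join(current_dir, name)
                # Normalize separators so results look the same on any OS.
                found.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(found)
```

For a table partitioned on X, each returned entry is expected to have the form `X=<value>/<base_name>.parquet`.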

Write with custom base name

In this example, write_partitioned uses a custom base name with an incrementing index:
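A sketch of this call, assuming the same kind of sample table partitioned on X. The function name and the `/data/partitioned_indexed` destination are illustrative assumptions; `base_name="data-{i}"` uses the `{i}` token described in the parameters.

```python
def write_with_indexed_names(
    destination_dir="/data/partitioned_indexed", base_name="data-{i}"
):
    # Imported here because these modules need a running Deephaven session.
    from deephaven import empty_table
    from deephaven.parquet import write_partitioned

    # Six rows: X alternates between A and B, Y is the row index.
    source = empty_table(6).update(["X = i % 2 == 0 ? `A` : `B`", "Y = i"])

    # {i} is replaced with an auto-incremented integer within each directory.
    write_partitioned(
        source.partition_by(["X"]), destination_dir, base_name=base_name
    )
```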

This creates files like X=A/data-0.parquet, X=B/data-0.parquet, etc.

Write with metadata files

Generate metadata files to speed up reading:
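A sketch of this call with `generate_metadata_files=True`. The function name, sample data, and `/data/partitioned_with_metadata` destination are illustrative assumptions.

```python
def write_with_metadata(destination_dir="/data/partitioned_with_metadata"):
    # Imported here because these modules need a running Deephaven session.
    from deephaven import empty_table
    from deephaven.parquet import write_partitioned

    # Six rows: X alternates between A and B, Y is the row index.
    source = empty_table(6).update(["X = i % 2 == 0 ? `A` : `B`", "Y = i"])

    # Ask the writer to also produce _metadata and _common_metadata files.
    write_partitioned(
        source.partition_by(["X"]),
        destination_dir,
        generate_metadata_files=True,
    )
```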

This creates the partitioned files plus _metadata and _common_metadata files in the root directory.