write

The write method writes a table to a standard Parquet file.

Syntax
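
Based on the parameter table below, the call shape can be sketched as the following stub. The `= None` defaults are an assumption standing in for "optional", the type names are kept as strings so the stub runs without Deephaven installed, and the body is a placeholder rather than the library's implementation.

```python
from typing import Optional, Sequence


def write(
    table: "Table",
    path: str,
    table_definition: "Optional[TableDefinitionLike]" = None,
    col_instructions: "Optional[list[ColumnInstruction]]" = None,
    compression_codec_name: Optional[str] = None,
    max_dictionary_keys: Optional[int] = None,
    max_dictionary_size: Optional[int] = None,
    target_page_size: Optional[int] = None,
    generate_metadata_files: Optional[bool] = None,
    index_columns: Optional[Sequence[Sequence[str]]] = None,
    row_group_info: "Optional[RowGroupInfo]" = None,
    special_instructions: "Optional[s3.S3Instructions]" = None,
):
    # Signature sketch only: parameter names, order, and types mirror the
    # parameter table; the None defaults are an assumption.
    raise NotImplementedError("signature sketch only")
```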

Parameters

Parameter Type Description
table Table

The table to write to file.

path str

Path name of the file where the table will be stored. The file name should end with the .parquet extension. If the path includes non-existing directories, they are created.

table_definition optional TableDefinitionLike

The table definition to use for writing, instead of the definitions implied by the table. This definition can be used to skip some columns or add additional columns with null values.

col_instructions optional list[ColumnInstruction]

One or more optional ColumnInstruction objects that contain instructions for how to write particular columns in the table.

compression_codec_name optional str

The compression codec to use.

Options are:

  • SNAPPY: Aims for high speed, and a reasonable amount of compression. Based on Google's Snappy compression format.
  • UNCOMPRESSED: The output will not be compressed.
  • LZ4_RAW: A codec based on the LZ4 block format. Should always be used instead of LZ4.
  • LZ4: Deprecated. Use LZ4_RAW instead.
  • LZO: Compression codec based on or interoperable with the LZO compression library.
  • GZIP: Compression codec based on the GZIP format (not the closely-related "zlib" or "deflate" formats) defined by RFC 1952.
  • ZSTD: Compression codec with a high compression ratio based on the Zstandard format defined by RFC 8478.
  • BROTLI: Compression codec based on Brotli, offering high compression ratios.

If not specified, defaults to SNAPPY.
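
As a minimal sketch of how the codec list and its default fit together, the helper below checks a codec name before it is passed as compression_codec_name. `SUPPORTED_CODECS` and `resolve_codec` are hypothetical names for illustration, not part of the Deephaven API, and the exact-match (case-sensitive) check is an assumption.

```python
from typing import Optional

# Codec names as listed above; this tuple is illustrative, not a Deephaven constant.
SUPPORTED_CODECS = (
    "SNAPPY",
    "UNCOMPRESSED",
    "LZ4_RAW",
    "LZ4",  # deprecated in favor of LZ4_RAW
    "LZO",
    "GZIP",
    "ZSTD",
    "BROTLI",
)


def resolve_codec(name: Optional[str] = None) -> str:
    """Return the codec to use, falling back to SNAPPY when none is given."""
    if name is None:
        return "SNAPPY"
    if name not in SUPPORTED_CODECS:
        raise ValueError(f"Unknown compression codec: {name!r}")
    return name
```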

max_dictionary_keys optional int

The maximum number of unique dictionary keys the writer is allowed to add to a dictionary page before switching to non-dictionary encoding. If not specified, the default value is 2^20 (1,048,576).

max_dictionary_size optional int

The maximum number of bytes the writer should add to the dictionary before switching to non-dictionary encoding. If not specified, the default value is 2^20 (1,048,576).

target_page_size optional int

The target page size in bytes. If not specified, defaults to 2^20 bytes (1 MiB).

generate_metadata_files optional bool

Whether to generate Parquet _metadata and _common_metadata files. These files can help speed up reading of partitioned Parquet data. Defaults to False.

index_columns optional Sequence[Sequence[str]]

Sequence of sequences containing the column names for indexes to persist. The write operation will store the index info for the provided columns as sidecar tables. For example, if the input is [["Col1"], ["Col1", "Col2"]], the write operation will store the index info for ["Col1"] and for ["Col1", "Col2"]. By default, data indexes to write are determined by those present on the source table.

row_group_info optional RowGroupInfo

The Row Group configuration for writing. Available options are:

  • RowGroupInfo.single_group(): All data is within a single Row Group. This is the default.
  • RowGroupInfo.max_rows(max_rows): Splits into a number of Row Groups, each of which has no more than the requested number of rows.
  • RowGroupInfo.max_groups(num_row_groups): Splits evenly into a pre-defined number of Row Groups.
  • RowGroupInfo.by_groups(groups, max_rows): Splits each unique group into a Row Group. If max_rows is set, Row Groups exceeding that size are split further.
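
To make the row-count constraints of max_rows and max_groups concrete, here is a pure-Python sketch of the arithmetic. It shows one way to satisfy the descriptions above and is not Deephaven's actual splitting algorithm; `split_evenly` and `split_by_max_rows` are hypothetical helpers.

```python
def split_evenly(total_rows: int, num_groups: int) -> list[int]:
    """Row counts for num_groups groups whose sizes differ by at most one."""
    if num_groups <= 0:
        return []
    base, extra = divmod(total_rows, num_groups)
    # The first `extra` groups each take one additional row.
    return [base + 1 if i < extra else base for i in range(num_groups)]


def split_by_max_rows(total_rows: int, max_rows: int) -> list[int]:
    """Row counts for the fewest groups that each hold at most max_rows rows."""
    num_groups = -(-total_rows // max_rows)  # ceiling division
    return split_evenly(total_rows, num_groups)
```

For a 10-row table, split_evenly(10, 3) and split_by_max_rows(10, 4) both give [4, 3, 3].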

special_instructions optional s3.S3Instructions

Special instructions for writing Parquet files to S3 or other remote storage. See S3Instructions.

Returns

A Parquet file at the specified path.

Examples

Note

All examples in this document write data to the /data directory in Deephaven. For more information on this directory and how it relates to your local file system, see Docker data volumes.

Single Parquet file

In this example, write writes the source table to /data/output.parquet.
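
A minimal sketch of this example follows. The sample columns and values are illustrative, and `write_single_file` is a hypothetical wrapper: the Deephaven imports sit inside it because the deephaven package needs a running Deephaven session.

```python
def write_single_file(path: str = "/data/output.parquet") -> None:
    # Imported lazily so this block can be loaded outside a Deephaven session.
    from deephaven import new_table
    from deephaven.column import int_col, string_col
    from deephaven.parquet import write

    # Illustrative sample data; any table can be written the same way.
    source = new_table(
        [
            string_col("X", ["A", "B", "B", "C"]),
            int_col("Y", [2, 4, 2, 1]),
        ]
    )

    # Write the table to a single Parquet file with default settings.
    write(source, path)
```

Calling write_single_file() from a Deephaven session writes the table to the default path.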

Compression codec

In this example, write writes the source table to /data/output_GZIP.parquet with GZIP compression.
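
A minimal sketch of this example follows, using the compression_codec_name parameter described above. The sample data is illustrative, and `write_gzip_file` is a hypothetical wrapper that keeps the Deephaven imports inside the function, since they need a running Deephaven session.

```python
def write_gzip_file(path: str = "/data/output_GZIP.parquet") -> None:
    # Imported lazily so this block can be loaded outside a Deephaven session.
    from deephaven import new_table
    from deephaven.column import int_col, string_col
    from deephaven.parquet import write

    # Illustrative sample data.
    source = new_table(
        [
            string_col("X", ["A", "B", "B", "C"]),
            int_col("Y", [2, 4, 2, 1]),
        ]
    )

    # Select GZIP instead of the default SNAPPY codec.
    write(source, path, compression_codec_name="GZIP")
```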