
write

The write method writes a table to a standard Parquet file.

Syntax

write(
    table: Table,
    path: str,
    col_definitions: list[Column] = None,
    col_instructions: list[ColumnInstruction] = None,
    compression_codec_name: str = None,
    max_dictionary_keys: int = None,
    target_page_size: int = None,
)

Parameters

table (Table)

The table to write to file.

path (str)

Path name of the file where the table will be stored. The file name should end with the .parquet extension. If the path includes non-existent directories, they are created.

col_definitions (list[Column], optional)

The column definitions to use. The default is None.

col_instructions (list[ColumnInstruction], optional)

One or more optional ColumnInstruction objects that contain instructions for how to write particular columns in the table.

compression_codec_name (str, optional)

The compression codec to use.

Options are:

  • SNAPPY: Aims for high speed, and a reasonable amount of compression. Based on Google's Snappy compression format.
  • UNCOMPRESSED: The output will not be compressed.
  • LZ4_RAW: A codec based on the LZ4 block format. Should always be used instead of LZ4.
  • LZ4: Deprecated. A compression codec loosely based on the LZ4 compression algorithm, but with an additional undocumented framing scheme. The framing is part of the original Hadoop compression library and was historically copied first in parquet-mr, then emulated with mixed results by parquet-cpp. Use LZ4_RAW instead.
  • LZO: Compression codec based on or interoperable with the LZO compression library.
  • GZIP: Compression codec based on the GZIP format (not the closely-related "zlib" or "deflate" formats) defined by RFC 1952.
  • ZSTD: Compression codec with the highest compression ratio based on the Zstandard format defined by RFC 8478.

If not specified, defaults to SNAPPY.

max_dictionary_keys (int, optional)

The maximum number of unique dictionary keys the writer is allowed to add to a dictionary page before switching to non-dictionary encoding. If not specified, the default value is 2^20 (1,048,576).

target_page_size (int, optional)

The target page size in bytes. If not specified, defaults to 2^20 bytes (1 MiB).

Returns

A Parquet file located in the specified path.

Examples

note

All examples in this document write data to the /data directory in Deephaven. For more information on this directory and how it relates to your local file system, see Docker data volumes.

Single Parquet file

In this example, write writes the source table to /data/output.parquet.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.parquet import write

source = new_table([
    string_col("X", ["A", "B", "B", "C", "B", "A", "B", "B", "C"]),
    int_col("Y", [2, 4, 2, 1, 2, 3, 4, 2, 3]),
    int_col("Z", [55, 76, 20, 4, 230, 50, 73, 137, 214]),
])

write(source, "/data/output.parquet")

Compression codec

In this example, write writes the source table to /data/output_GZIP.parquet with GZIP compression.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.parquet import write

source = new_table([
    string_col("X", ["A", "B", "B", "C", "B", "A", "B", "B", "C"]),
    int_col("Y", [2, 4, 2, 1, 2, 3, 4, 2, 3]),
    int_col("Z", [55, 76, 20, 4, 230, 50, 73, 137, 214]),
])

write(source, "/data/output_GZIP.parquet", compression_codec_name="GZIP")