
write

The write method writes a table to a standard Parquet file.

Syntax

write(
    table: Table,
    path: str,
    col_definitions: list[Column] = None,
    col_instructions: list[ColumnInstruction] = None,
    compression_codec_name: str = None,
    max_dictionary_keys: int = None,
    target_page_size: int = None,
)

Parameters

table (Table)

The table to write to file.

path (str)

Path name of the file where the table will be stored. The file name should end with the .parquet extension. If the path includes non-existent directories, they are created.

col_definitions (list[Column], optional)

The column definitions to use. The default is None.

col_instructions (list[ColumnInstruction], optional)

One or more optional ColumnInstruction objects that contain instructions for how to write particular columns in the table.

compression_codec_name (str, optional)

The compression codec to use.

Options are:

  • SNAPPY: Aims for high speed, and a reasonable amount of compression. Based on Google's Snappy compression format.
  • UNCOMPRESSED: The output will not be compressed.
  • LZ4_RAW: A codec based on the LZ4 block format. Should always be used instead of LZ4.
  • LZ4: Deprecated. A compression codec loosely based on the LZ4 compression algorithm, but with an additional undocumented framing scheme. The framing is part of the original Hadoop compression library and was historically copied first in parquet-mr, then emulated with mixed results by parquet-cpp. Use LZ4_RAW instead.
  • LZO: Compression codec based on or interoperable with the LZO compression library.
  • GZIP: Compression codec based on the GZIP format (not the closely-related "zlib" or "deflate" formats) defined by RFC 1952.
  • ZSTD: Compression codec with the highest compression ratio based on the Zstandard format defined by RFC 8478.

If not specified, defaults to SNAPPY.

max_dictionary_keys (int, optional)

The maximum number of unique dictionary keys the writer is allowed to add to a dictionary page before switching to non-dictionary encoding. If not specified, the default value is 2^20 (1,048,576).

target_page_size (int, optional)

The target page size in bytes. If not specified, defaults to 2^20 bytes (1 MiB).

Returns

A Parquet file located in the specified path.

Examples

note

All examples in this document write data to the /data directory in Deephaven. For more information on this directory and how it relates to your local file system, see Docker data volumes.

Single Parquet file

In this example, write writes the source table to /data/output.parquet.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.parquet import write

source = new_table([
    string_col("X", ["A", "B", "B", "C", "B", "A", "B", "B", "C"]),
    int_col("Y", [2, 4, 2, 1, 2, 3, 4, 2, 3]),
    int_col("Z", [55, 76, 20, 4, 230, 50, 73, 137, 214]),
])

write(source, "/data/output.parquet")

Compression codec

In this example, write writes the source table to /data/output_GZIP.parquet with GZIP compression.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.parquet import write

source = new_table([
    string_col("X", ["A", "B", "B", "C", "B", "A", "B", "B", "C"]),
    int_col("Y", [2, 4, 2, 1, 2, 3, 4, 2, 3]),
    int_col("Z", [55, 76, 20, 4, 230, 50, 73, 137, 214]),
])

write(source, "/data/output_GZIP.parquet", compression_codec_name="GZIP")