Skip to main content
Version: Python

ColumnInstruction

A ColumnInstruction specifies the instructions for reading or writing a Parquet column.

Syntax

ColumnInstruction(
column_name=None,
parquet_column_name=None,
codec_name=None,
codec_args=None,
use_dictionary=False,
) = ColumnInstruction

Parameters

ParameterTypeDescription
column_name optionalstr

The name of the column to apply these instructions to.

parquet_column_name optionalstr

The name of the column in the resulting Parquet file.

codec_name optionalstr

The compression codec to use.

Options are:

  • SNAPPY: (default) Aims for high speed and a reasonable amount of compression. Based on Google's Snappy compression format.
  • UNCOMPRESSED: The output will not be compressed.
  • LZ4_RAW: A codec based on the LZ4 block format. Should always be used instead of LZ4.
  • LZ4: Deprecated Compression codec loosely based on the LZ4 compression algorithm, but with an additional undocumented framing scheme. The framing is part of the original Hadoop compression library and was historically copied first in parquet-mr, then emulated with mixed results by parquet-cpp. Note that LZ4 is deprecated; use LZ4_RAW instead.
  • LZO: Compression codec based on or interoperable with the LZO compression library.
  • GZIP: Compression codec based on the GZIP format (not the closely-related "zlib" or "deflate" formats) defined by RFC 1952.
  • ZSTD: Compression codec with the highest compression ratio based on the Zstandard format defined by RFC 8478.
codec_args optionalstr

An implementation-specific string used to map types to/from bytes. Typically used in cases where there is no obvious language-agnostic representation in Parquet. Default is None.

use_dictionary optionalbool

Whether or not to use dictionary-based encoding for string columns.

Returns

A ColumnInstruction object that will give Deephaven instructions for handling a particular column.

Examples

In this example, we create a ColumnInstruction that can be passed into read or write.

from deephaven.parquet import ColumnInstruction

instruction = ColumnInstruction(
column_name="X",
parquet_column_name="X",
codec_name="GZIP",
codec_args=None,
use_dictionary=False,
)