write
The `write` method writes a table to a standard Parquet file.
Syntax
```python syntax
write(
    table: Table,
    path: str,
    col_definitions: list[Column] = None,
    col_instructions: list[ColumnInstruction] = None,
    compression_codec_name: str = None,
    max_dictionary_keys: int = None,
    target_page_size: int = None,
)
```
Parameters
Parameter | Type | Description
---|---|---
`table` | `Table` | The table to write to file.
`path` | `str` | Path name of the file where the table will be stored. The file name should end with the `.parquet` extension.
`col_definitions` (optional) | `list[Column]` | The column definitions to use. The default is `None`, which uses the definitions of the source table.
`col_instructions` (optional) | `list[ColumnInstruction]` | One or more optional `ColumnInstruction` objects with instructions for how particular columns are written.
`compression_codec_name` (optional) | `str` | The compression codec to use. Options include `SNAPPY`, `GZIP`, `LZO`, `LZ4`, `ZSTD`, and `UNCOMPRESSED`. If not specified, defaults to `SNAPPY`.
`max_dictionary_keys` (optional) | `int` | The maximum number of unique dictionary keys the writer is allowed to add to a dictionary page before switching to non-dictionary encoding. If not specified, the default value is 2^20 (1,048,576).
`target_page_size` (optional) | `int` | The target page size in bytes. If not specified, defaults to 2^20 bytes (1 MiB).
Returns
A Parquet file written to the specified path.
Examples
All examples in this document write data to the `/data` directory in Deephaven. For more information on this directory and how it relates to your local file system, see Docker data volumes.
Single Parquet file
In this example, `write` writes the source table to `/data/output.parquet`.
```python
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.parquet import write

source = new_table(
    [
        string_col("X", ["A", "B", "B", "C", "B", "A", "B", "B", "C"]),
        int_col("Y", [2, 4, 2, 1, 2, 3, 4, 2, 3]),
        int_col("Z", [55, 76, 20, 4, 230, 50, 73, 137, 214]),
    ]
)

write(source, "/data/output.parquet")
```
Compression codec
In this example, `write` writes the source table to `/data/output_GZIP.parquet` with `GZIP` compression.
```python
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.parquet import write

source = new_table(
    [
        string_col("X", ["A", "B", "B", "C", "B", "A", "B", "B", "C"]),
        int_col("Y", [2, 4, 2, 1, 2, 3, 4, 2, 3]),
        int_col("Z", [55, 76, 20, 4, 230, 50, 73, 137, 214]),
    ]
)

write(source, "/data/output_GZIP.parquet", compression_codec_name="GZIP")
```
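Page-level tuning parameters

The `max_dictionary_keys` and `target_page_size` parameters from the signature can be passed the same way as the codec. The following sketch, assuming the same Deephaven environment as the examples above, writes the source table with a lowered dictionary-key limit and a smaller target page size. The specific values here are illustrative only, not recommendations; the output path `/data/output_tuned.parquet` is likewise a hypothetical example.

```python
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.parquet import write

source = new_table(
    [
        string_col("X", ["A", "B", "B", "C", "B", "A", "B", "B", "C"]),
        int_col("Y", [2, 4, 2, 1, 2, 3, 4, 2, 3]),
        int_col("Z", [55, 76, 20, 4, 230, 50, 73, 137, 214]),
    ]
)

# Cap dictionary pages at 1024 unique keys and target 64 KiB pages.
# Columns exceeding the key cap fall back to non-dictionary encoding.
write(
    source,
    "/data/output_tuned.parquet",
    max_dictionary_keys=1024,
    target_page_size=65536,
)
```

Smaller pages and dictionaries trade file size for lower per-page memory use; the defaults (2^20 for both) are appropriate for most tables.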