
readTable

The readTable method reads a single Parquet file, metadata file, or directory with a recognized layout into an in-memory table.

Syntax

readTable(inputFile)
readTable(inputFile, parquetInstructions)

Parameters

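Parameter: inputFile
Type: String
Description: The file to load into a table. The path must exist. A single Parquet data file should end with the .parquet extension; metadata files and directories with a recognized layout are also accepted, as described above.

Parameter: parquetInstructions (optional)
Type: String
Description: Optional instructions for customizing the read. Valid values are: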

  • LZ4: Compression codec loosely based on the LZ4 compression algorithm, but with an additional undocumented framing scheme. The framing is part of the original Hadoop compression library and was historically copied first in parquet-mr, then emulated with mixed results by parquet-cpp.
  • LZO: Compression codec based on or interoperable with the LZO compression library.
  • GZIP: Compression codec based on the GZIP format (not the closely-related "zlib" or "deflate" formats) defined by RFC 1952.
  • ZSTD: Compression codec based on the Zstandard format defined by RFC 8478. It offers the highest compression ratio of the codecs listed here.
  • LEGACY: Load any binary fields as strings. Helpful to load files written in older versions of Parquet that lacked a distinction between binary and string.
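
The parameter rules above can be checked before readTable is called. The following is a minimal sketch in plain Python that assumes only what this page states. The names check_read_args, VALID_INSTRUCTIONS, and METADATA_NAMES are hypothetical and are not part of the Deephaven API. The sketch accepts a path if it is a directory, ends with .parquet, or is named _metadata or _common_metadata.

```python
import os

# Values this page lists for parquetInstructions (hypothetical constant name).
VALID_INSTRUCTIONS = {"LZ4", "LZO", "GZIP", "ZSTD", "LEGACY"}
# Metadata file names mentioned on this page (hypothetical constant name).
METADATA_NAMES = {"_metadata", "_common_metadata"}


def check_read_args(input_file, parquet_instructions=None):
    """Return a list of problems; an empty list means the arguments look fine."""
    problems = []
    if not os.path.exists(input_file):
        problems.append(f"path does not exist: {input_file}")
    elif os.path.isfile(input_file):
        name = os.path.basename(input_file)
        # A single file should be a .parquet file or a metadata file.
        if not name.endswith(".parquet") and name not in METADATA_NAMES:
            problems.append(f"not a .parquet or metadata file: {input_file}")
    if parquet_instructions is not None and parquet_instructions not in VALID_INSTRUCTIONS:
        problems.append(f"unknown instruction: {parquet_instructions}")
    return problems
```

A call such as check_read_args("/data/output_GZIP.parquet", "GZIP") returns an empty list when the file exists. Returning a list of problems rather than raising on the first one lets a caller report everything at once.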

Returns

A new in-memory table from a Parquet file, metadata file, or directory with a recognized layout.

Examples

note

All examples in this document use data mounted in /data in Deephaven. For more information on how this location in Deephaven relates to your local file system, see Docker data volumes.

Single Parquet file

note

For the following examples, the example data found in Deephaven's example repository will be used. Follow the instructions in Launch Deephaven from pre-built images to download and manage the example data.

In this example, readTable is used to load the file /data/examples/Taxi/parquet/taxi.parquet into a Deephaven table.

from deephaven.ParquetTools import readTable

source = readTable("/data/examples/Taxi/parquet/taxi.parquet")

Compression codec

In this example, readTable is used to load the file /data/output_GZIP.parquet, with GZIP compression, into a Deephaven table.

caution

This file needs to exist for this example to work. The code below creates it with writeTable before reading it back. For more information, see writeTable.

from deephaven.ParquetTools import readTable, writeTable
from deephaven.TableTools import newTable, intCol, stringCol

# Build a small in-memory table to write out.
source = newTable(
    stringCol("X", "A", "B", "B", "C", "B", "A", "B", "B", "C"),
    intCol("Y", 2, 4, 2, 1, 2, 3, 4, 2, 3),
    intCol("Z", 55, 76, 20, 4, 230, 50, 73, 137, 214),
)

# Write the table to disk with GZIP compression.
writeTable(source, "/data/output_GZIP.parquet", "GZIP")

# Read the GZIP-compressed file back into a table.
source = readTable("/data/output_GZIP.parquet", "GZIP")

Partitioned datasets

_metadata and/or _common_metadata files are occasionally present in partitioned datasets. They can be used to load Parquet datasets more quickly. Only certain frameworks write these files, and they are not required to read the data into a Deephaven table.

  • _common_metadata: File containing schema information needed to load the whole dataset faster.
  • _metadata: File containing (1) complete relative pathnames to individual data files, and (2) column statistics, such as min, max, etc., for the individual data files.

warning

When a directory of Parquet files is read, all sub-directories are also searched. These directories should contain only files with a .parquet extension and _common_metadata or _metadata files. All files ending with .parquet must have the same schema.
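
The directory rule above can be checked ahead of time. The sketch below is plain Python and is not part of Deephaven; find_unexpected_files and ALLOWED_NAMES are hypothetical names chosen for illustration. It walks a directory and its sub-directories and reports any file that is neither a .parquet file nor one of the two metadata files. It does not check that the schemas match.

```python
import os

# Metadata file names that may sit alongside the data files (hypothetical constant name).
ALLOWED_NAMES = {"_metadata", "_common_metadata"}


def find_unexpected_files(root):
    """Return the sorted paths under root that are neither .parquet nor metadata files."""
    unexpected = []
    # os.walk visits root and every sub-directory beneath it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".parquet") or name in ALLOWED_NAMES:
                continue
            unexpected.append(os.path.join(dirpath, name))
    return sorted(unexpected)
```

An empty result from find_unexpected_files("/data/examples/Pems/parquet/pems") would suggest the directory holds only the kinds of files described here.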

note

The following examples use data in Deephaven's example repository. Follow the instructions in Launch Deephaven from pre-built images to download and manage the example data.

In this example, readTable is used to load the directory /data/examples/Pems/parquet/pems into a Deephaven table.

from deephaven.ParquetTools import readTable

source = readTable("/data/examples/Pems/parquet/pems")
