
Parquet import via query

This guide will show you how to read data from a Parquet file into a Deephaven table, in both Python and Groovy, using the readTable method.

note

In this guide, we use files in locations relative to the base of the Docker container. See Docker data volumes to learn more about the relation between locations in the container and the local file system.

Standard Parquet Files

The Deephaven Query Language makes importing and manipulating data easy and efficient. In this example, we will import a Parquet file into a new, in-memory Deephaven table.

First, you'll need a Parquet file. You can get one from the Deephaven Examples repository. Follow the directions in the README to mount the data into your deephaven-core clone. For this guide, we'll use yellow taxi trip data that is publicly available on Microsoft Azure. In the Deephaven Docker container, the path to this data is:

/data/examples/taxi/parquet/taxi.parquet
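If the data volume is not mounted as expected, the read will fail, so it can help to check the path first. The following is a minimal sketch and not part of the Deephaven API: it uses only the Python standard library, and the helper name looks_like_parquet is hypothetical. It relies on the fact that Parquet files without an encrypted footer begin and end with the 4-byte marker PAR1.

```python
import os

# 4-byte marker at the start and end of Parquet files (unencrypted footer)
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if `path` is a file that starts and ends with the Parquet marker.

    Hypothetical helper for illustration only; it does not validate the schema
    or the data pages.
    """
    # Header marker (4 bytes) + footer length (4 bytes) + footer marker (4 bytes)
    if not os.path.isfile(path) or os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


path = "/data/examples/taxi/parquet/taxi.parquet"
if not looks_like_parquet(path):
    print(f"{path} is missing or is not a Parquet file; check the mounted data volume")
```

A passing check only shows that a file with Parquet markers exists at that location inside the container. It says nothing about whether the contents are readable by any particular reader.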

The file taxi.parquet is a standard-format Parquet version 1 file with SNAPPY compression. We want to read the file into a new, in-memory Deephaven table.

from deephaven.ParquetTools import readTable
taxiTable = readTable("/data/examples/taxi/parquet/taxi.parquet")

You have successfully loaded the Parquet file as a Deephaven table, taxiTable. Now you can use the imported table like any other Deephaven table.


Related documentation