Deephaven and PyArrow

This guide covers how Deephaven works with PyArrow. PyArrow is the Python library for Apache Arrow, a columnar in-memory format similar to Deephaven's table format. Deephaven's Arrow integration lets you do two things:

  • Convert between Deephaven and Arrow tables.
  • Read Arrow feather files into Deephaven tables.

deephaven.arrow

Note: Converting between Deephaven and PyArrow tables copies all the objects into memory. Be cautious when converting large datasets.
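One way to act on this caution is to make a rough size estimate before converting. The sketch below is a minimal illustration in plain Python, not part of deephaven.arrow: the helper names estimate_copy_bytes and fits_budget are hypothetical, and the 8-bytes-per-value figure is an assumption that only suits fixed-width numeric columns (string columns can be much larger).

```python
def estimate_copy_bytes(num_rows: int, num_cols: int, bytes_per_value: int = 8) -> int:
    """Roughly estimate the bytes needed to copy a table of fixed-width values.

    Hypothetical helper: assumes every value takes bytes_per_value bytes,
    which understates the size of string or other variable-width columns.
    """
    if num_rows < 0 or num_cols < 0 or bytes_per_value < 0:
        raise ValueError("counts and sizes must be non-negative")
    return num_rows * num_cols * bytes_per_value


def fits_budget(num_rows: int, num_cols: int, budget_bytes: int) -> bool:
    """Return True when the rough estimate stays within the caller's budget."""
    return estimate_copy_bytes(num_rows, num_cols) <= budget_bytes


# Example: one million rows of five numeric columns against a 64 MiB budget.
rows, cols = 1_000_000, 5
print(estimate_copy_bytes(rows, cols))          # estimated bytes
print(fits_budget(rows, cols, 64 * 1024 ** 2))  # whether it fits the budget
```

In practice the row and column counts would come from the table being converted; for a Deephaven table, the row count is assumed here to be available through its size property. Treat the result as a lower bound, not a guarantee.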

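The deephaven.arrow submodule provides only three functions:

  • read_feather: read a feather file into a Deephaven table.
  • to_arrow: convert a Deephaven table to a PyArrow table.
  • to_table: convert a PyArrow table to a Deephaven table.

The examples below use each in turn. First, read a feather file into a Deephaven table named iris: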

from deephaven import arrow as dharrow

iris = dharrow.read_feather("/data/examples/Iris/feather/Iris.feather")

Then, convert iris to a PyArrow table: once with all columns, and once with only the Class column.

pa_iris = dharrow.to_arrow(iris)
pa_class_only = dharrow.to_arrow(iris, cols=["Class"])

Finally, convert pa_iris back to a Deephaven table. The first call copies all columns; the second copies every column except Class.

iris_from_pa = dharrow.to_table(pa_iris)
iris_no_class = dharrow.to_table(
    pa_iris, cols=["SepalLengthCM", "SepalWidthCM", "PetalLengthCM", "PetalWidthCM"]
)
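Listing every kept column by hand works for a small table but becomes tedious for wide ones. As a minimal sketch, the cols list can instead be built by excluding names from the full column list. The helper columns_except below is hypothetical, not part of deephaven.arrow, and the column names are the ones used in the example above.

```python
from typing import Iterable, List


def columns_except(all_columns: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return all_columns without the excluded names, preserving order.

    Hypothetical helper: raises ValueError if an excluded name is not
    present, so a typo does not silently keep every column.
    """
    columns = list(all_columns)
    excluded_set = set(excluded)
    unknown = excluded_set - set(columns)
    if unknown:
        raise ValueError(f"unknown column(s): {sorted(unknown)}")
    return [name for name in columns if name not in excluded_set]


iris_columns = ["SepalLengthCM", "SepalWidthCM", "PetalLengthCM", "PetalWidthCM", "Class"]
print(columns_except(iris_columns, ["Class"]))

# With the tables from the example above, this might be used as:
# iris_no_class = dharrow.to_table(
#     pa_iris, cols=columns_except(pa_iris.column_names, ["Class"])
# )
```

The commented usage relies on the column_names attribute of a PyArrow table to supply the full list of names.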