Deephaven and PyArrow
This guide covers how Deephaven and PyArrow work together. PyArrow is the Python library for Apache Arrow, a columnar in-memory format similar to the format Deephaven uses for its tables. Deephaven's Arrow integration lets you do two things:
- Convert between Deephaven and Arrow tables.
- Read Arrow Feather files into Deephaven tables.
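To make "columnar" concrete, the stdlib-only sketch below pivots row-oriented records into a column-oriented layout, where all values of one column sit together. It is a toy model for intuition, not how Arrow or Deephaven are implemented: real Arrow columns live in contiguous typed buffers, which this sketch approximates with one Python list per column. The helper name `rows_to_columns` is hypothetical, and the two sample rows are illustrative.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into a dict mapping column name -> values.

    Toy model only. Assumes every row has the same keys; otherwise the
    resulting columns would have different lengths.
    """
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for name, value in row.items():
            # Append each value to the list that holds its column.
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"SepalLengthCM": 5.1, "Class": "Iris-setosa"},
    {"SepalLengthCM": 4.9, "Class": "Iris-setosa"},
]
columns = rows_to_columns(rows)
print(columns)
# {'SepalLengthCM': [5.1, 4.9], 'Class': ['Iris-setosa', 'Iris-setosa']}
```

Storing each column contiguously is what lets columnar formats scan or copy a single column without touching the others.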
deephaven.arrow
Note: Converting between Deephaven and PyArrow tables copies all of the data into memory. Be cautious when converting large datasets.
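Because a conversion makes a full copy, a rough size estimate can help decide whether a table is safe to convert. The sketch below is a back-of-the-envelope calculation under stated assumptions: it counts only fixed-width columns, using standard byte widths (8 bytes for a 64-bit double or long, 4 for a 32-bit int or float), and ignores strings, null bitmaps, padding, and metadata, so the real footprint will be higher. The function name and the type labels are hypothetical, not part of the Deephaven API.

```python
# Hypothetical type labels mapped to their fixed byte widths.
BYTE_WIDTHS = {"double": 8, "long": 8, "float": 4, "int": 4, "short": 2, "byte": 1}


def estimate_fixed_width_bytes(num_rows: int, column_types: list[str]) -> int:
    """Lower-bound estimate of the bytes one copy of these columns occupies.

    Assumes plain fixed-width values; strings, validity bitmaps, padding,
    and metadata are not counted.
    """
    per_row = sum(BYTE_WIDTHS[t] for t in column_types)
    return num_rows * per_row


# Example: 100 million rows with four double columns.
size = estimate_fixed_width_bytes(100_000_000, ["double"] * 4)
print(f"{size / 1024**3:.1f} GiB")  # prints 3.0 GiB
```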
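The deephaven.arrow submodule provides three functions:
- read_feather: reads an Arrow Feather file into a Deephaven table.
- to_arrow: converts a Deephaven table to a PyArrow table.
- to_table: converts a PyArrow table to a Deephaven table.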
First, read a Feather file into a Deephaven table named iris:
from deephaven import arrow as dharrow
iris = dharrow.read_feather("/data/examples/Iris/feather/Iris.feather")
Then, convert iris to a PyArrow table twice: once with all columns, and once with only the Class column.
# Convert every column of iris to a PyArrow table
pa_iris = dharrow.to_arrow(iris)
# Convert only the Class column
pa_class_only = dharrow.to_arrow(iris, cols=["Class"])
Finally, convert pa_iris back to a Deephaven table twice. The first conversion copies all columns; the second copies every column except Class.
iris_from_pa = dharrow.to_table(pa_iris)
iris_no_class = dharrow.to_table(
pa_iris, cols=["SepalLengthCM", "SepalWidthCM", "PetalLengthCM", "PetalWidthCM"]
)
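Rather than typing out every column to keep, the list passed to cols can be derived by excluding the unwanted names. The sketch below works on a plain list of names so it runs without Deephaven; it assumes the iris table has exactly the five columns named in this guide, in this order. The helper name columns_except is hypothetical.

```python
def columns_except(all_columns: list[str], exclude: list[str]) -> list[str]:
    """Return column names in their original order, minus those in `exclude`."""
    excluded = set(exclude)
    return [name for name in all_columns if name not in excluded]


# Assumed column names (and order) of the iris table in this guide.
iris_columns = ["SepalLengthCM", "SepalWidthCM", "PetalLengthCM", "PetalWidthCM", "Class"]
keep = columns_except(iris_columns, ["Class"])
print(keep)
# ['SepalLengthCM', 'SepalWidthCM', 'PetalLengthCM', 'PetalWidthCM']
```

With real tables, the names could come from the PyArrow table itself, for example dharrow.to_table(pa_iris, cols=columns_except(pa_iris.column_names, ["Class"])), which avoids hard-coding the list if the schema changes.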