Use pandas in Deephaven queries
This guide covers the intersection of Pandas and Deephaven in queries. Pandas is a popular Python library for data analysis and manipulation that centers around DataFrames, similar to how Deephaven centers around tables. Deephaven's Pandas integration is used to convert between tables and DataFrames.
deephaven.pandas
Converting between Deephaven tables and Pandas DataFrames copies the entire object into memory. Be cautious when converting large datasets.
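Because a conversion copies all of the data, it can help to check how much memory a DataFrame already occupies before converting it. The following is a minimal pandas-only sketch and is not part of the Deephaven API. The helper name dataframe_mebibytes and the sample data are hypothetical and chosen only for illustration.

```python
import pandas as pd


def dataframe_mebibytes(df: pd.DataFrame) -> float:
    """Return the approximate in-memory size of a DataFrame in MiB."""
    # deep=True measures the contents of object columns (such as Python
    # strings) rather than only the pointers that reference them.
    total_bytes = df.memory_usage(index=True, deep=True).sum()
    return float(total_bytes) / (1024**2)


# Hypothetical sample data: two numeric columns with 100,000 rows each.
sample = pd.DataFrame(
    {
        "Ints": range(100_000),
        "Doubles": [0.5] * 100_000,
    }
)

size_mib = dataframe_mebibytes(sample)
print(f"Approximate size: {size_mib:.2f} MiB")
```

A converted copy needs at least roughly this much additional memory, so a check like this gives a rough lower bound rather than an exact figure.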
The deephaven.pandas Python module provides only two functions:
- to_pandas: Converts a Deephaven table to a Pandas DataFrame.
- to_table: Converts a Pandas DataFrame to a Deephaven table.
The following example creates a table, then converts it to a DataFrame and back.
from deephaven import column as dhcol
from deephaven import pandas as dhpd
from deephaven import new_table
source = new_table(
    [
        dhcol.string_col("Strings", ["Hello", "ABC", "123"]),
        dhcol.int_col("Ints", [1, 100, 10000]),
        dhcol.bool_col("Booleans", [True, False, True]),
        dhcol.double_col("Doubles", [-3.14, 2.73, 9999.999]),
        dhcol.char_col("Chars", "qrs"),
    ]
)
df = dhpd.to_pandas(source)
result = dhpd.to_table(df)
The resultant data types in the Pandas DataFrame and table are correct:
result_meta = result.meta_table
print(df.dtypes)
to_pandas and to_table can convert only a subset of available columns:
df_onecol = dhpd.to_pandas(source, cols=["Strings", "Doubles", "Chars"])
result_onecol = dhpd.to_table(df, cols=["Booleans", "Doubles"])
By default, to_pandas converts a table to a Pandas DataFrame that is backed by NumPy arrays with no nullable dtypes. The NumPy nullable and PyArrow backends can be used instead by setting dtype_backend:
df_numpy_nullable = dhpd.to_pandas(source, dtype_backend="numpy_nullable")
df_pyarrow = dhpd.to_pandas(source, dtype_backend="pyarrow")
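To make the difference between plain NumPy backing and nullable dtypes concrete, here is a small pandas-only sketch. It does not call Deephaven and does not show how Deephaven nulls are mapped. It only illustrates, on hypothetical sample values, how pandas represents a missing integer under each kind of dtype.

```python
import pandas as pd

# Hypothetical sample values with one missing entry.
values = [1, None, 3]

# Plain NumPy backing: integers cannot hold a missing value, so pandas
# falls back to float64 and stores the gap as NaN.
numpy_backed = pd.Series(values)

# Nullable extension dtype: the column stays integer-typed and the gap
# is stored as pandas.NA.
nullable = pd.Series(values, dtype="Int64")

print(numpy_backed.dtype, nullable.dtype)
print(numpy_backed.isna().sum(), nullable.isna().sum())
```

Both Series report one missing value, but only the nullable version preserves the integer type of the column.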
If dtype_backend is None, conv_null can be set to False, in which case null values are not converted to pandas.NA.
df_no_na = dhpd.to_pandas(source, dtype_backend=None, conv_null=False)
Setting conv_null=False results in an error if dtype_backend is not None.
Pandas DataFrames sometimes contain generic Object columns that don't directly translate to Deephaven column types. For instance, the following DataFrame has an Object column. By default, Deephaven calls pandas.DataFrame.convert_dtypes() prior to conversion. This can be turned off by setting infer_objects=False.
from deephaven.pandas import to_table
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2.1, 3], "C": [1, pd.NA, 3]})
result_infer = to_table(df)
result_no_infer = to_table(df, infer_objects=False)
infer_meta = result_infer.meta_table
no_infer_meta = result_no_infer.meta_table
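To see what the inference step does to the example DataFrame, the following pandas-only sketch compares the column dtypes before and after convert_dtypes(). It does not call Deephaven. It assumes only, as stated above, that to_table applies convert_dtypes() when infer_objects is left at its default, so the resulting Deephaven column types are not shown here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2.1, 3], "C": [1, pd.NA, 3]})

# Column C mixes integers with pd.NA, so pandas stores it with the
# generic object dtype.
before = {name: str(dtype) for name, dtype in df.dtypes.items()}

# convert_dtypes() selects a nullable extension dtype for each column
# where one is available.
after = {name: str(dtype) for name, dtype in df.convert_dtypes().dtypes.items()}

print(before)
print(after)
```

After the call, column C is no longer a generic object column, which is why the inferred and non-inferred conversions can yield different column types.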