Skip to main content
Version: Python

Deephaven's Python package

The Deephaven Python package is how users interact with the Deephaven engine. Deephaven's Python API offers various cool ways for users to manipulate their data. This guide gives a brief overview of just what you can do.

What can you do with deephaven?

In short: a lot. Here are a few cool things that only scratch the surface:

Create tables from scratch

Tables can be created from scratch in a variety of ways. The code below creates tables with empty_table, new_table, and time_table.

from deephaven.column import int_col, double_col, string_col
from deephaven import empty_table, new_table, time_table

t1_static = empty_table(50).update(["X = 0.1 * i", "Y = sin(X)", "Z = cos(X)"])
t2_static = new_table(
[
string_col(
"Strings",
["A", "Hello world!", "The quick brown fox jumps over the lazy dog."],
),
int_col("Integers", [5, 17, -33422]),
double_col("Doubles", [3.14, -1.1111, 99.999]),
]
)
t_ticking = time_table("PT1s").update("X = ii")

Ingest data from external sources

Ingest data into tables from external sources such as CSV, Apache Parquet, Kafka, and SQL:

# Consume data from CSV and Parquet
from deephaven.parquet import read as read_pq
from deephaven import read_csv

t_from_csv = read_csv(
"https://media.githubusercontent.com/media/deephaven/examples/main/Insurance/csv/insurance.csv"
)
t_from_parquet = read_pq("/data/examples/SensorData/parquet/SensorData_gzip.parquet")
# Consume data from Kafka
from deephaven.stream.kafka import consumer as kc
from deephaven import dtypes as dht

t = kc.consume(
{"bootstrap.servers": "redpanda:9092"},
"test.topic",
table_type=kc.TableType.append(),
key_spec=kc.KeyValueSpec.IGNORE,
value_spec=kc.simple_spec("Command", dht.string),
)
# Consume data from SQL
from deephaven.dbc import read_sql
import os

my_query = "SELECT t_ts as Timestamp, CAST(t_id AS text) as Id, " +
"CAST(t_instrument as text) as Instrument, " +
"t_exchange as Exchange, t_price as Price, t_size as Size " +
"FROM CRYPTO TRADES"

username = os.environ["POSTGRES_USERNAME"]
password = os.environ["POSTGRES_PASSWORD"]
url = os.environ["POSTGRES_URL"]
port = os.environ["POSTGRES_PORT"]

sql_uri = f"postgresql://{url}:{port}/postgres?user={username}&password={password}"

crypto_trades = read_sql(conn=sql_uri, query=my_query, driver="connectorx")

Export tables to Parquet, CSV, Kafka, and more

Table data can be exported to a wide variety of formats including Parquet, CSV, Kafka, Uniform Resource Identifiers (URIs), and many more. The following code block specifically writes a table to CSV and Parquet for later use.

from deephaven.parquet import write as write_pq
from deephaven import empty_table
from deephaven import write_csv

my_table = empty_table(100).update(["X = 0.1 * i", "Y = sin(X)", "Z = cos(X)"])

write_pq(my_table, "/data/my_table.parquet")
write_csv(my_table, "/data/my_table.csv")

Filter data

Data can be filtered out of tables based on conditions being met or by row numbers, such as those at the top or bottom of a table.

from deephaven import empty_table


def filter_func(y: float, z: str) -> bool:
if y < 2 or y > 8 and z == "B":
return True
else:
return False


t = empty_table(50).update(
["X = i", "Y = randomDouble(0.0, 10.0)", "Z = (X % 2 == 0) ? `A` : `B`"]
)

t_filtered = t.where(["Y > 5.0", "Z = `B`"])
t_function_filtered = t.where("filter_func(Y, Z)")
t_head = t.head(20)
t_tail = t.tail(5)

Calculate cumulative and rolling aggregations

Aggregations can be cumulative or windowed (by either time or rows).

from deephaven.updateby import rolling_avg_tick
from deephaven import empty_table
from deephaven import agg

t = empty_table(100).update(
[
"Sym = (i % 2 == 0) ? `A` : `B`",
"X = randomDouble(0.0, 10.0)",
"Y = randomDouble(50.0, 150.0)",
]
)

t_agg = t.agg_by(aggs=agg.avg(["AvgX = X", "AvgY = Y"]), by="Sym")
t_updateby = t.update_by(
ops=rolling_avg_tick(cols=["RollingAvgX = X", "RollingAvgY = Y"], rev_ticks=5),
by="Sym",
)

Join tables together

Deephaven offers a wide variety of ways to join tables together. Joins can be exact or relational or inexact or time-series. Even ticking tables can be joined with no extra work.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

t_left = new_table(
[
string_col(
"LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers"]
),
int_col("DeptID", [31, 33, 33, 34, 34, NULL_INT]),
string_col(
"Telephone",
[
"(303) 555-0162",
"(303) 555-0149",
"(303) 555-0184",
"(303) 555-0125",
None,
None,
],
),
]
)

t_right = new_table(
[
int_col("DeptID", [31, 33, 34, 35]),
string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
string_col("DeptManager", ["Martinez", "Williams", "Garcia", "Lopez"]),
int_col("DeptGarage", [33, 52, 22, 45]),
]
)

t = t_left.natural_join(table=t_right, on=["DeptID"])

Convert to and from NumPy and Pandas

NumPy and Pandas are two of Python's most popular packages. They get used often in Deephaven Python queries. So, there are mechanisms available that make converting between tables and Python data structures a breeze.

from deephaven import pandas as dhpd
from deephaven import numpy as dhnp
from deephaven import empty_table
import pandas as pd
import numpy as np

t = empty_table(10).update("X = randomDouble(-10.0, 10.0)")
np_arr_from_table = dhnp.to_numpy(t)
df_from_table = dhpd.to_pandas(t)

print(np_arr_from_table)
print(df_from_table)

t_from_np = dhnp.to_table(np_arr_from_table, cols="X")
t_from_pd = dhpd.to_table(df_from_table)

What's in the deephaven package

The deephaven package is used to interact with tables and other Deephaven objects in queries. Each of the following subsections discusses a single submodule within the deephaven package. They are presented in alphabetical order.

agg

The agg submodule defines the combined aggregations that are usable in queries.

appmode

The appmode submodule supports writing Deephaven application mode Python scripts.

arrow

The arrow submodule supports conversions to and from PyArrow tables and Deephaven tables.

barrage

The barrage submodule provides functions for accessing resources on remote Deephaven servers.

calendar

The calendar submodule defines functions for working with business calendars.

column

The column submodule implements columns in tables. It is primarily used when creating new tables.

constants

The constants submodule defines the global constants, including Deephaven's special numerical values. Values include the maximum, minimum, NaN, null, and infinity values for different data types.

csv

The csv submodule supports reading an external CSV file into a Deephaven table and writing a table to a CSV file.

dbc

The dbc submodule enables users to connect to and execute queries on external databases in Deephaven.

dherror

The dherror submodule defines a custom exception for the Deephaven Python package. See the triage errors guide for more information.

dtypes

The dtypes submodule defines the data types supported by the Deephaven engine.

execution_context

The execution_context submodule gives users the ability to directly manage the Deephaven query execution context on threads.

experimental

The experimental submodule contains experimental features. Current experimental features include outer joins and AWS S3 support.

filters

The filters submodule implements various filters that can be used in filtering operations.

html

The html submodule supports exporting Deephaven data in HTML format.

jcompat

The jcompat submodule provides Java compatibility support and convenience functions to create Java data structures from corresponding Python ones.

learn

The learn submodule provides utilities for efficient data transfer between tables and Python objects. It serves as a framework for using machine learning libraries in Deephaven Python queries.

liveness_scope

The liveness_scope submodule gives a finer degree of control over when to clean up unreferenced nodes in the query update graph instead of solely relying on garbage collection.

numpy

The numpy submodule supports the conversion between Deephaven tables and NumPy arrays.

pandas

The pandas submodule supports the conversion between Deephaven tables and Pandas DataFrames.

pandasplugin

The pandasplugin submodule is used to display Pandas DataFrames as Deephaven tables in the UI.

parquet

The parquet submodule supports reading external Parquet files into Deephaven tables and writing Deephaven tables out as Parquet files.

perfmon

The perfmon submodule contains tools to analyze performance of the Deephaven system and queries.

plot

The plot submodule contains the framework for Deephaven's built-in plotting API.

plugin

The plugin submodule contains the framework for using plugins and bidirectional plugins in Deephaven.

query_library

The query_library submodule allows users to import Java classes or packages into the query library, which they can then use in queries.

replay

The replay submodule provides support for replaying historical data.

server

The server submodule contains the framework for running Python operations from within a JVM.

stream

The stream submodule contains Deephaven's Apache Kafka integration, Table Publisher, and a utility for converting between table types.

table

The table submodule implements the Deephaven table, partitioned table, and partitioned table proxy objects.

table_factory

The table_factory submodule provides various ways to create Deephaven tables, including empty tables, input tables, new tables, function-generated tables, time tables, ring tables, and DynamicTableWriter.

table_listener

The table_listener submodule implements Deephaven's table listeners.

time

The time submodule defines functions for handling Deephaven date-time data.

update_graph

The update_graph submodule provides access to the update graph's locks that must be acquired to perform certain table operations.

updateby

The updateby submodule supports building various cumulative and rolling aggregations to be used in the update_by table operation.

uri

The uri submodule implements tools for resolving Uniform Resource Identifiers (URIs) into objects.