Deephaven's Python package
The Deephaven Python package is how users interact with the Deephaven engine. Deephaven's Python API offers a wide range of ways to create, transform, and analyze data, both static and ticking. This guide gives a brief overview of what you can do.
What can you do with deephaven?
In short: a lot. Here are a few examples that only scratch the surface:
Create tables from scratch
Tables can be created from scratch in a variety of ways. The code below creates tables with empty_table, new_table, and time_table.
from deephaven.column import int_col, double_col, string_col
from deephaven import empty_table, new_table, time_table
t1_static = empty_table(50).update(["X = 0.1 * i", "Y = sin(X)", "Z = cos(X)"])
t2_static = new_table(
    [
        string_col(
            "Strings",
            ["A", "Hello world!", "The quick brown fox jumps over the lazy dog."],
        ),
        int_col("Integers", [5, 17, -33422]),
        double_col("Doubles", [3.14, -1.1111, 99.999]),
    ]
)
t_ticking = time_table("PT1s").update("X = ii")
Ingest data from external sources
Ingest data into tables from external sources such as CSV, Apache Parquet, Kafka, and SQL:
# Consume data from CSV and Parquet
from deephaven.parquet import read as read_pq
from deephaven import read_csv
t_from_csv = read_csv(
    "https://media.githubusercontent.com/media/deephaven/examples/main/Insurance/csv/insurance.csv"
)
t_from_parquet = read_pq("/data/examples/SensorData/parquet/SensorData_gzip.parquet")
# Consume data from Kafka
from deephaven.stream.kafka import consumer as kc
from deephaven import dtypes as dht
t = kc.consume(
    {"bootstrap.servers": "redpanda:9092"},
    "test.topic",
    table_type=kc.TableType.append(),
    key_spec=kc.KeyValueSpec.IGNORE,
    value_spec=kc.simple_spec("Command", dht.string),
)
# Consume data from SQL
from deephaven.dbc import read_sql
import os
my_query = (
    "SELECT t_ts as Timestamp, CAST(t_id AS text) as Id, "
    "CAST(t_instrument as text) as Instrument, "
    "t_exchange as Exchange, t_price as Price, t_size as Size "
    "FROM CRYPTO_TRADES"
)
username = os.environ["POSTGRES_USERNAME"]
password = os.environ["POSTGRES_PASSWORD"]
url = os.environ["POSTGRES_URL"]
port = os.environ["POSTGRES_PORT"]
sql_uri = f"postgresql://{url}:{port}/postgres?user={username}&password={password}"
crypto_trades = read_sql(conn=sql_uri, query=my_query, driver="connectorx")
Export tables to Parquet, CSV, Kafka, and more
Table data can be exported to a wide variety of destinations, including Parquet files, CSV files, Kafka streams, and Uniform Resource Identifiers (URIs). The following code block writes a table to CSV and Parquet for later use.
from deephaven.parquet import write as write_pq
from deephaven import empty_table
from deephaven import write_csv
my_table = empty_table(100).update(["X = 0.1 * i", "Y = sin(X)", "Z = cos(X)"])
write_pq(my_table, "/data/my_table.parquet")
write_csv(my_table, "/data/my_table.csv")
Filter data
Data can be filtered out of tables based on conditions being met, or by row position, such as keeping only rows at the top or bottom of a table.
from deephaven import empty_table
def filter_func(y: float, z: str) -> bool:
    # Keep rows where Y < 2, or where Y > 8 and Z is "B"
    # (Python's 'and' binds more tightly than 'or')
    return y < 2 or (y > 8 and z == "B")
t = empty_table(50).update(
    ["X = i", "Y = randomDouble(0.0, 10.0)", "Z = (X % 2 == 0) ? `A` : `B`"]
)
t_filtered = t.where(["Y > 5.0", "Z = `B`"])
t_function_filtered = t.where("filter_func(Y, Z)")
t_head = t.head(20)
t_tail = t.tail(5)
Calculate cumulative and rolling aggregations
Aggregations can be cumulative or windowed (by either time or rows).
from deephaven.updateby import rolling_avg_tick
from deephaven import empty_table
from deephaven import agg
t = empty_table(100).update(
    [
        "Sym = (i % 2 == 0) ? `A` : `B`",
        "X = randomDouble(0.0, 10.0)",
        "Y = randomDouble(50.0, 150.0)",
    ]
)
t_agg = t.agg_by(aggs=agg.avg(["AvgX = X", "AvgY = Y"]), by="Sym")
t_updateby = t.update_by(
    ops=rolling_avg_tick(cols=["RollingAvgX = X", "RollingAvgY = Y"], rev_ticks=5),
    by="Sym",
)
Join tables together
Deephaven offers a wide variety of ways to join tables together. Joins fall into two categories: exact and relational joins, and inexact, time-series joins. Even ticking tables can be joined with no extra work.
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT
t_left = new_table(
    [
        string_col(
            "LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers"]
        ),
        int_col("DeptID", [31, 33, 33, 34, 34, NULL_INT]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                None,
                None,
            ],
        ),
    ]
)
t_right = new_table(
    [
        int_col("DeptID", [31, 33, 34, 35]),
        string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
        string_col("DeptManager", ["Martinez", "Williams", "Garcia", "Lopez"]),
        int_col("DeptGarage", [33, 52, 22, 45]),
    ]
)
t = t_left.natural_join(table=t_right, on=["DeptID"])
Convert to and from NumPy and Pandas
NumPy and Pandas are two of Python's most popular packages, and both are used often in Deephaven Python queries. Mechanisms are available that make converting between Deephaven tables and these Python data structures a breeze.
from deephaven import pandas as dhpd
from deephaven import numpy as dhnp
from deephaven import empty_table
import pandas as pd
import numpy as np
t = empty_table(10).update("X = randomDouble(-10.0, 10.0)")
np_arr_from_table = dhnp.to_numpy(t)
df_from_table = dhpd.to_pandas(t)
print(np_arr_from_table)
print(df_from_table)
t_from_np = dhnp.to_table(np_arr_from_table, cols=["X"])
t_from_pd = dhpd.to_table(df_from_table)
What's in the deephaven package
The deephaven package is used to interact with tables and other Deephaven objects in queries. Each of the following subsections discusses a single submodule within the deephaven package. They are presented in alphabetical order.
agg
The agg submodule defines the combined aggregations that are usable in queries.
appmode
The appmode submodule supports writing Deephaven application mode Python scripts.
arrow
The arrow submodule supports conversions to and from PyArrow tables and Deephaven tables.
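For example, a round trip between a Deephaven table and a PyArrow table is a one-liner in each direction. A minimal sketch, assuming the to_arrow and to_table function names:

from deephaven import arrow as dharrow
from deephaven import empty_table

t = empty_table(5).update("X = i")
# Convert to a PyArrow table and back again
pa_table = dharrow.to_arrow(t)
t_roundtrip = dharrow.to_table(pa_table)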
barrage
The barrage submodule provides functions for accessing resources on remote Deephaven servers.
calendar
The calendar submodule defines functions for working with business calendars.
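A quick sketch of looking up a calendar, assuming a recent API with calendar_names and calendar, and that a calendar with the example name below is installed:

from deephaven import calendar as dhcal

# List the installed business calendars, then fetch one by name
# ("USNYSE_EXAMPLE" is an assumed example calendar name)
print(dhcal.calendar_names())
cal = dhcal.calendar("USNYSE_EXAMPLE")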
column
The column submodule implements columns in tables. It is primarily used when creating new tables.
constants
The constants submodule defines the global constants, including Deephaven's special numerical values. Values include the maximum, minimum, NaN, null, and infinity values for different data types.
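These constants are ordinary Python values that can be printed or compared directly. A small example using two of them (NULL_INT also appears in the join example above):

from deephaven.constants import MAX_LONG, NULL_INT

# Deephaven's special numerical values are plain Python constants
print(NULL_INT)  # the sentinel used for null 32-bit integers
print(MAX_LONG)  # the maximum value for long columns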
csv
The csv submodule supports reading an external CSV file into a Deephaven table and writing a table to a CSV file.
dbc
The dbc submodule enables users to connect to and execute queries on external databases in Deephaven.
dherror
The dherror submodule defines a custom exception for the Deephaven Python package. See the triage errors guide for more information.
dtypes
The dtypes submodule defines the data types supported by the Deephaven engine.
execution_context
The execution_context submodule gives users the ability to directly manage the Deephaven query execution context on threads.
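A common pattern is to capture the script session's context and enter it on a worker thread before creating tables. A minimal sketch, assuming get_exec_ctx:

from deephaven.execution_context import get_exec_ctx
from deephaven import empty_table
import threading

# Capture the script session's execution context on the main thread
ctx = get_exec_ctx()

def make_table():
    # Table operations on user-created threads must run inside an execution context
    with ctx:
        return empty_table(5).update("X = i")

thread = threading.Thread(target=make_table)
thread.start()
thread.join()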
experimental
The experimental submodule contains experimental features. Current experimental features include outer joins and AWS S3 support.
filters
The filters submodule implements various filters that can be used in filtering operations.
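For instance, a filter object can be passed to a where operation just like a filter string. A small sketch, assuming the is_null helper:

from deephaven.filters import is_null
from deephaven.column import string_col
from deephaven import new_table

t = new_table([string_col("X", ["A", None, "B"])])
# Keep only the rows where X is null
t_nulls = t.where(is_null("X"))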
html
The html submodule supports exporting Deephaven data in HTML format.
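A minimal sketch, assuming the to_html function:

from deephaven.html import to_html
from deephaven import empty_table

t = empty_table(3).update("X = i")
# Render the table as an HTML string
html_str = to_html(t)
print(html_str)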
jcompat
The jcompat submodule provides Java compatibility support and convenience functions to create Java data structures from corresponding Python ones.
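A quick sketch of building Java collections from Python ones; the helper names are assumptions:

from deephaven.jcompat import j_array_list, j_hashmap

# Build a Java ArrayList and HashMap from Python equivalents
jlist = j_array_list([1, 2, 3])
jmap = j_hashmap({"key": "value"})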
learn
The learn submodule provides utilities for efficient data transfer between tables and Python objects. It serves as a framework for using machine learning libraries in Deephaven Python queries.
liveness_scope
The liveness_scope submodule gives a finer degree of control over when to clean up unreferenced nodes in the query update graph instead of solely relying on garbage collection.
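A small sketch of a scope that releases its tables on exit unless they are explicitly kept, assuming the liveness_scope context manager and its preserve method:

from deephaven.liveness_scope import liveness_scope
from deephaven import time_table

# Tables created inside the scope are released when it exits,
# unless explicitly preserved
with liveness_scope() as scope:
    t = time_table("PT1s").update("X = ii")
    scope.preserve(t)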
numpy
The numpy submodule supports the conversion between Deephaven tables and NumPy arrays.
pandas
The pandas submodule supports the conversion between Deephaven tables and Pandas DataFrames.
pandasplugin
The pandasplugin submodule is used to display Pandas DataFrames as Deephaven tables in the UI.
parquet
The parquet submodule supports reading external Parquet files into Deephaven tables and writing Deephaven tables out as Parquet files.
perfmon
The perfmon submodule contains tools to analyze the performance of the Deephaven system and queries.
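For example, performance logs can be opened as live tables for inspection. A minimal sketch, assuming the query_performance_log and update_performance_log function names:

from deephaven.perfmon import query_performance_log, update_performance_log

# Open performance logs as live tables
qpl = query_performance_log()
upl = update_performance_log()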
plot
The plot submodule contains the framework for Deephaven's built-in plotting API.
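A short sketch of an XY series plot built from table columns, assuming the Figure builder API:

from deephaven.plot.figure import Figure
from deephaven import empty_table

t = empty_table(50).update(["X = 0.1 * i", "Y = sin(X)"])
# Build an XY series plot from the table's X and Y columns
fig = Figure().plot_xy(series_name="Sine", t=t, x="X", y="Y").show()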
plugin
The plugin submodule contains the framework for using plugins and creating your own plugins in Deephaven.
query_library
The query_library submodule allows users to import Java classes or packages into the query library, which they can then use in queries.
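For example, an imported Java class becomes usable inside query strings. A minimal sketch, assuming import_class:

from deephaven import query_library, empty_table

# Once imported, the Java class can be called from query strings
query_library.import_class("java.util.Random")
t = empty_table(5).update("X = new Random().nextInt(100)")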
replay
The replay submodule provides support for replaying historical data.
server
The server submodule contains the framework for running Python operations from within a JVM.
stream
The stream submodule contains Deephaven's Apache Kafka integration, Table Publisher, and a utility for converting between table types.
table
The table submodule implements the Deephaven table, partitioned table, and partitioned table proxy objects.
table_factory
The table_factory submodule provides various ways to create Deephaven tables, including empty tables, input tables, new tables, function-generated tables, time tables, ring tables, and DynamicTableWriter.
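For example, a ring table keeps only the newest rows of a ticking parent. A minimal sketch, assuming the ring_table signature shown:

from deephaven import time_table, ring_table

# Keep only the 5 newest rows of the ticking parent table
rt = ring_table(parent=time_table("PT1s"), capacity=5)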
table_listener
The table_listener submodule implements Deephaven's table listeners.
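A small sketch of a function listener attached to a ticking table; the handle behavior noted in the comments is an assumption:

from deephaven.table_listener import listen
from deephaven import time_table

t = time_table("PT1s").update("X = ii")

def on_update(update, is_replay):
    # update.added() returns the newly added rows as a dict of column arrays
    print("Added rows:", update.added())

# listen returns a handle that keeps the listener registered;
# handle.stop() (assumed) detaches it
handle = listen(t, on_update)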
time
The time submodule defines functions for handling Deephaven date-time data.
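A quick sketch, assuming the dh_now and to_j_instant function names from recent releases:

from deephaven.time import dh_now, to_j_instant

# The current time, and a parsed date-time string, as Java Instants
now = dh_now()
instant = to_j_instant("2024-01-01T00:00:00 ET")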
update_graph
The update_graph submodule provides access to the update graph's locks that must be acquired to perform certain table operations.
updateby
The updateby submodule supports building various cumulative and rolling aggregations to be used in the update_by table operation.
uri
The uri submodule implements tools for resolving Uniform Resource Identifiers (URIs) into objects.
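For example, resolve can fetch a table shared by another Deephaven worker. A minimal sketch; the host, port, and variable name below are placeholders:

from deephaven.uri import resolve

# Resolve a table from a remote Deephaven worker by its URI
t_remote = resolve("dh+plain://remote-host:10000/scope/my_table")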