
Release Notes for Deephaven Version 0.24.0


The next release will include an alpha feature that allows users to address Deephaven’s real-time Streaming Tables with SQL, automatically inheriting updates and changes via familiar declarative patterns. While the team lays that groundwork, version 0.24 delivers some exciting new table methods, improvements to the Python client, and performance-related enhancements.

Full release notes are found on GitHub.

range_join

Many users have asked for a range_join table operation (some prefer the term window_join). Though versatile, it is most often used to join records from a right table that fall within a time range associated with an event in a left table: “market trade events between my order send time and my order fill time” for a stock trader, or “website activity during the time range user-X was on the site” for a marketing-tech analyst.
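To make the semantics concrete, here is a minimal plain-Python sketch of the idea, not Deephaven's implementation. The function name `range_join_group` and the integer timestamps are assumptions made for illustration: for each left row, it groups the measures of right rows whose event time falls strictly between that row's start and end.

```python
from typing import List, Tuple


def range_join_group(
    left: List[Tuple[int, int]],
    right: List[Tuple[int, int]],
) -> List[List[int]]:
    """Hypothetical helper: for each (start, end) in `left`, collect the
    measures of `right` rows where start < event_time < end."""
    grouped = []
    for start, end in left:
        matches = [measure for event_time, measure in right if start < event_time < end]
        grouped.append(matches)
    return grouped


# Times are plain integers (think milliseconds) to keep the sketch simple
left_rows = [(0, 0), (0, 500), (0, 1000)]               # (start, end)
right_rows = [(100, 7), (400, 3), (600, 9), (1000, 4)]  # (event_time, measure)

grouped_events = range_join_group(left_rows, right_rows)
print(grouped_events)  # [[], [7, 3], [7, 3, 9]]
```

The first left row has an empty range, so nothing joins to it, and the event at time 1000 is excluded from the last row because both bounds are strict.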

A full description of range_join is found in the PyDoc. The script below provides an artificial, demonstrative example.

from deephaven import empty_table
import random

# Create tables to work with
left_table = empty_table(100).update([
    "Row_Num = ii",
    "Start_Time = now()",
    "End_Time = Start_Time + 'PT00:00:00.500' * Row_Num",
])

right_table = empty_table(1000).update([
    "Row_Num = ii",
    "Event_Time = now() + 'PT00:00:00.100' * Row_Num",
    "Event_Measure = (int)random.randint(0,100)",
])

# Define the aggregation to apply to each set of joined rows
# As of 0.24.0 only 'group' works. Other aggregations are to come soon.
from deephaven.agg import group

aggs = [
    group(cols=["Grouped_Events=Event_Measure"]),
]

rj_example = left_table.range_join(table=right_table, on="Start_Time < Event_Time < End_Time", aggs=aggs)

# You can do aggregations on the grouped results via vector manipulations
rj_with_aggs = rj_example.where("!isNull(Grouped_Events)").update([
    "Joined_Row_Count = len(Grouped_Events)",
    "Last_Joined_Event = last(Grouped_Events)",
    "Joined_Sum = sum(Grouped_Events)",
])

More update_by operations

As highlighted in recent release blogs, Deephaven introduced update_by() as the operation that delivers cumulative, rolling, and window-based calculations. In 0.24.0, the team has added more operators to the update_by universe.

The PyDocs detail dozens of available update_by operators.
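For readers new to these operator families, the difference between a cumulative and a rolling calculation can be sketched in plain Python. This is an illustration of the concepts only and does not use the Deephaven API; the function names `cumulative_sum` and `trailing_sum` are hypothetical.

```python
from itertools import accumulate
from typing import List


def cumulative_sum(values: List[int]) -> List[int]:
    """Running total over every row seen so far."""
    return list(accumulate(values))


def trailing_sum(values: List[int], window: int) -> List[int]:
    """Sum over the current row and up to (window - 1) preceding rows."""
    return [
        sum(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


prices = [1, 2, 3, 4, 5]
print(cumulative_sum(prices))   # [1, 3, 6, 10, 15]
print(trailing_sum(prices, 3))  # [1, 3, 6, 9, 12]
```

A cumulative operator keeps growing its input set, while a rolling operator drops rows that leave the window.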

Other enhancements

Pandas 2.0

Deephaven has upgraded its pandas integration to support pandas 2.0. Among the upgrades in the 2.0 library, our users will appreciate that pandas DataFrames can now be backed by PyArrow arrays. Formerly, NumPy arrays were the only option. Given the close integration between Arrow and Deephaven, this is an empowering inherited upgrade.

Vector iteration for better performance

When UDFs embedded in queries access vectors, the engine now uses vector iteration, which improves performance. Historically, the implementation relied on direct access.

Parallelized where

Users now automatically inherit multi-threading in their filtering. Even when queries are written in Python, the core engine parallelizes the where operation across the table. This also applies to real-time tables as they receive updates. User scripts and applications pick this up without any code changes.
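The general shape of a parallelized filter can be sketched as "split, filter each piece concurrently, merge in order". The plain-Python example below is a conceptual illustration under that assumption, not the engine's implementation; `parallel_where` and `chunk_size` are hypothetical names, and Python threads are used only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parallel_where(
    rows: List[int],
    predicate: Callable[[int], bool],
    chunk_size: int = 4,
) -> List[int]:
    """Hypothetical helper: filter `rows` chunk by chunk on a thread pool,
    then concatenate the surviving rows in their original order."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the input chunks
        filtered_chunks = pool.map(
            lambda chunk: [row for row in chunk if predicate(row)], chunks
        )
        return [row for chunk in filtered_chunks for row in chunk]


values = list(range(10))
evens = parallel_where(values, lambda v: v % 2 == 0)
print(evens)  # [0, 2, 4, 6, 8]
```

Because each chunk is filtered independently, the result matches a sequential filter as long as the chunks are merged in order.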

Looking forward to Release 0.25

Contributors are working on a slew of enhancements for the next release. Alpha SQL features and a beta-version R client lead a pack of exciting developments. Stay tuned.

We look forward to interacting with you via Deephaven’s Slack or GitHub Discussions.