
Release notes for Deephaven version 0.34


The wait is over, and Deephaven Community Core version 0.34.0 is out. This is a big release with significant enhancements and new features. Are you ready to explore the latest updates? Let's dive in and discover what's new!

New features

Command line interface for pip-installed Deephaven

Do you run Deephaven from Python without Docker? If so, chances are it's because:

  • You don't like Docker.
  • You want to keep everything in Python.
  • You like the Jupyter experience.

Well, we have good news. It just got even easier to start Deephaven from Python with the introduction of a new command line interface.

If you pip install Deephaven 0.34.0 or later via pip install deephaven-server, you can start a Deephaven server with a single deephaven command:

# Start a server on port 8080 with a random PSK
deephaven server
# Start a server on port 10000 with a random PSK
deephaven server --port 10000
# Start a server on port 9999 with 12GB of heap
deephaven server --port 9999 --jvm-args="-Xmx12g"
# Start a server with `/tmp/deephaven` as the data directory
deephaven server --jvm-args="-Ddeephaven.dataDir=/tmp/deephaven"
# Start a server with the console disabled
deephaven server --jvm-args="-Ddeephaven.console.disable=true"
# Get help with the Deephaven command
deephaven --help
# Get help with the Deephaven server command
deephaven server --help

For more information about installing Deephaven with pip, see the pip install guide.

Use the Python client to ingest tables from remote servers

Have you ever wanted to use the Deephaven Python client from within a server? Now you can! Client tables can be made available for use in server-side queries. By running a Python client on a server, you can create and ingest tables from remote servers and use them in your own queries. This is exciting because:

  • Distributing workloads across multiple Deephaven servers just got a lot easier.
  • It leverages gRPC to support large and complex queries.

To demonstrate this new feature, consider the following configuration:

  • You have a Deephaven server up and running with Python locally on port 10000. It has pydeephaven installed.
  • You have another server running locally on port 9999 with anonymous authentication. You want to create a table on this other instance, and subscribe to it on the Deephaven server at port 10000.

From the Deephaven server running on port 10000, run:

from deephaven.barrage import barrage_session
from pydeephaven.session import SharedTicket
from pydeephaven import Session

# Create a client session connected to the server running on port 9999
client_session = Session(port=9999)
# Create a table on that server with the client session
client_table = client_session.time_table("PT1s").update(["X = 0.1 * i", "Y = sin(X)"])
# Create a ticket through which `client_table` can be published
client_ticket = SharedTicket.random_ticket()
# Publish the ticket (table)
client_session.publish_table(client_ticket, client_table)

# Create a barrage session that listens to the server on port 9999
my_barrage_session = barrage_session(port=9999)
# Subscribe to the client table ticking data
local_table = my_barrage_session.subscribe(client_ticket.bytes)
# Perform operations on this now-local table
new_local_table = local_table.last_by()


Parquet

Read partitioned Parquet datasets from AWS S3

In the 0.33 release, we added support to read single Parquet files from AWS S3. That has now been expanded to read partitioned datasets from S3. The best part? It's just as easy to do! Check out this code, which reads a folder of publicly available Parquet data from an S3 bucket directly into Deephaven as a table:

from deephaven import parquet
from deephaven.experimental import s3
from datetime import timedelta

ookla_performance = parquet.read(
    "s3://ookla-open-data/parquet/performance/type=mobile/year=2023",
    special_instructions=s3.S3Instructions(
        region_name="us-east-1",
        anonymous_access=True,
        read_ahead_count=8,
        fragment_size=65536,
        read_timeout=timedelta(seconds=10),
    ),
).coalesce()

Write partitioned Parquet files and metadata files

Not only can you read partitioned Parquet datasets into Deephaven tables, you can now write Deephaven tables out as partitioned Parquet datasets, along with the associated metadata files. The following code block writes a partitioned table to a partitioned Parquet dataset.

from deephaven.parquet import write_partitioned
from deephaven import empty_table
import os

t = empty_table(10).update(["X = (i % 2 == 0) ? `A` : `B`", "Y = i"])
pt = t.partition_by("X")

write_partitioned(pt, "/data/PartitionedParquet")

print(os.listdir("/data/PartitionedParquet"))
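
As for the metadata files mentioned in the heading, the writing APIs can also generate the _common_metadata and _metadata files used by much of the partitioned-Parquet ecosystem. Here's a minimal sketch, assuming the generate_metadata_files flag is the relevant parameter (check the deephaven.parquet docs for the exact signature):

# Write the same partitioned table, also generating _metadata and _common_metadata
write_partitioned(pt, "/data/PartitionedParquetMeta", generate_metadata_files=True)

print(os.listdir("/data/PartitionedParquetMeta"))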

Data indexing for tables

A DataIndex improves the speed of data access operations on a table. It applies to one or more indexed key columns and is now available in the Python API through the deephaven.experimental.data_index module. Sort, join, aggregation, and filter operations can all benefit from this new feature.
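
Here's a minimal sketch of building an index (the data_index function and its table property reflect our reading of the experimental API):

from deephaven.experimental.data_index import data_index
from deephaven import empty_table

source = empty_table(100).update(["Key = i % 5", "Value = i"])

# Build an index on the Key column; keyed operations can take advantage of it
idx = data_index(source, ["Key"])

# The index itself is exposed as a table
idx_table = idx.table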

Keep an eye out for additions to our documentation on this topic soon!

Built-in query library functions for array columns

Two new built-in query library functions, diff and rank, have been added for use on array columns. They:

  • Compute differences between values in an array.
  • Rank values in an array.

The code block below uses these two on a table with an array column.

from deephaven import empty_table

t = (
    empty_table(10)
    .update(["X = randomInt(0, 10)"])
    .group_by()
    .update(["DiffX = diff(1, X)", "RankX = rank(X)"])
)

Improvements

Batch formula compilation

A core part of the Deephaven engine has been reworked to be significantly more performant when batching formulas together. For instance, the following code runs over 10x faster in 0.34.0 than it does in 0.33.3.

from deephaven import empty_table

formulas = [""] * 1000
values = [0] * 1000
for idx in range(1000):
    values[idx] = idx * 1024
    formulas[idx] = f"C{idx} = (long)values[{idx}]"

t = empty_table(1).update(formulas)

If you create tables with a lot of columns created from formulas, you'll see a noticeable difference.

Improved error messages

Error messages in Deephaven now contain the query string that caused them, which makes them more searchable and easier to understand.

Create blink input tables from the Python client

Server-side APIs have been able to create blink input tables for some time, so it's about time the Python client caught up: you can now create blink input tables from pydeephaven as well.
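
A minimal sketch of what this looks like from the client (the blink_table flag on Session.input_table is our reading of the new API; check the pydeephaven docs for the exact parameter name):

import pyarrow as pa
from pydeephaven import Session

session = Session()

# Schema for the input table
schema = pa.schema([("Symbol", pa.string()), ("Price", pa.float64())])

# Create a blink input table; rows written to it appear for one update cycle
blink_input = session.input_table(schema=schema, blink_table=True)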

The C++ client now works on Windows

The Deephaven C++ client previously only worked on Linux distributions, but that is no longer the case! It can now be built on both Windows 10 and Windows 11. For full instructions on building on Windows, see here.

Null to NaN conversions for NumPy arrays

Users whose Deephaven queries leverage NumPy have likely converted a table with null values to a NumPy array, and found the null values to be frustrating to deal with. Deephaven now offers a helper function to convert those to NumPy NaN values, which are much easier to handle from Python.

note

Before version 0.34.0, this conversion was done automatically for user-defined functions. That is no longer the case. For more information on this breaking change, see engine handling of type hints.

from deephaven.jcompat import dh_null_to_nan
from deephaven.numpy import to_numpy
from deephaven import empty_table

t = empty_table(10).update("X = (i % 2 == 0) ? 0.1 * i : NULL_DOUBLE")

np_t = dh_null_to_nan(to_numpy(t).squeeze())
print(np_t)

Simple date formatting in Python

A helper function, simple_date_format, has been added to the Python API. It makes date parsing easier in Python if your date-time data isn't in an ISO-8601 format:

from deephaven import new_table
from deephaven.column import string_col
from deephaven.time import simple_date_format

source = new_table(
    [
        string_col(
            "Timestamp",
            ["20230101-12:30:01 CST", "20230101-12:30:02 CST", "20230101-12:30:03 CST"],
        )
    ]
)

input_format = simple_date_format("yyyyMMdd-HH:mm:ss z")

result = source.update("NewTimestamp = input_format.parse(Timestamp).toInstant()")

source_meta = source.meta_table
result_meta = result.meta_table

Time of day methods properly handle daylight saving time

New time of day methods have been added to the Deephaven Query Language. These new methods take an additional boolean parameter that controls the desired behavior with respect to daylight saving time. See secondOfDay as an example.

danger

The older ___ofDay methods are deprecated and will be removed within the next few releases.
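
For example, here's a sketch using secondOfDay across the US spring-forward transition (the comments reflect our reading of the boolean's semantics; see the secondOfDay docs for the authoritative description):

from deephaven import empty_table

t = empty_table(2).update(
    [
        # Two timestamps straddling the 2024-03-10 spring-forward transition
        "Timestamp = '2024-03-10T01:30:00 ET' + i * 2 * HOUR",
        # true: seconds into the local (wall-clock) day, observing DST
        "LocalSecs = secondOfDay(Timestamp, 'ET', true)",
        # false: seconds elapsed since the start of the day, ignoring DST
        "ElapsedSecs = secondOfDay(Timestamp, 'ET', false)",
    ]
)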

Time binning methods now accept durations

Popular time binning methods upperBin and lowerBin now accept Java Durations as the bin size and offset. These methods work the same as the ones that take an integer number of nanoseconds:

from deephaven import empty_table

t = empty_table(10).update(
    [
        "Timestamp = '2024-05-01T09:00:00 ET' + i * MINUTE",
        "LowerBin2Min = lowerBin(Timestamp, 2 * MINUTE)",
        "UpperBin3Min = upperBin(Timestamp, 'PT3m')",
    ]
)

Ticking Python client on PyPI

pydeephaven-ticking, the Python client API that works with ticking data, is now available on PyPI!
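
Install it with pip:

pip install pydeephaven-ticking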

Breaking changes

Engine handling of type hints

The NumPy null to NaN conversion is no longer done automatically for user-defined functions that use NumPy arrays in their type hints. Users must now perform this conversion themselves if their data contains null values where NaN is correct, for instance when the array's data type is np.double.

Additionally, the data types specified in type hints are now checked against those used in the corresponding query string to ensure compatibility. If they are incompatible, an error is thrown describing the mismatch. This ensures safe usage of type hints in functions called in query strings.

These breaking changes make user-defined functions significantly more performant when called in query strings.
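
For example, here's a sketch of the pattern now required when an array-typed argument may contain nulls (the function and column names are illustrative):

import numpy as np
from deephaven import empty_table
from deephaven.jcompat import dh_null_to_nan

def avg_x(values: np.ndarray) -> float:
    # Nulls are no longer converted automatically; do it explicitly before NumPy math
    return float(np.nanmean(dh_null_to_nan(values)))

t = (
    empty_table(10)
    .update("X = (i % 2 == 0) ? 0.1 * i : NULL_DOUBLE")
    .group_by()
    .update("AvgX = avg_x(X)")
)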

Other breaking changes

  • Parquet read/write Java APIs that used File objects have been replaced by ones that use strings instead.

  • The internal public class io.deephaven.engine.table.impl.locations.local.KeyValuePartitionLayout has been renamed to io.deephaven.engine.table.impl.locations.local.FileKeyValuePartitionLayout.

Bug fixes

  • Blink tables previously could not use the special row index variables i, ii, and k. This has been fixed, and all three can now be used in blink tables.

  • In certain rare cases, the ungroup operation produced a null pointer exception where it shouldn't have. This has been fixed.

  • FilterComparison now works with string literals:

FilterComparison.geq(ColumnName.of("ColumnName"), Literal.of("A"))
  • A bug has been fixed where searching for a value in the UI could fail to work as expected on sorted columns.

  • Arrays of Instant strings are now properly handled by deephaven.dtypes.instant_array.

  • Fixed a bug where duplicate headers could be passed between the ticking Python client and server, resulting in an error.

  • move_columns could previously erroneously remove columns from tables. This has been fixed, and the method now inserts new columns at the specified locations without losing existing data.

  • deephaven.pandas.to_pandas now supports the numpy_nullable option when the Pandas version is > 1.5.0.

  • Time zone conversions in Python now handle more cases. Previously, conversions from Python types to Java ZoneId types could throw errors when the conversion should have worked. This has been fixed.

  • A bug in deephaven.learn was found that could cause null pointer exceptions; it has been fixed.

Other notable changes

Required Java version

If you run the embedded Deephaven server, it will now raise an error on startup if your Java version is below the minimum supported version. The error message makes clear that the outdated Java version is the cause.

Docker Compose v2

Deephaven officially supports Docker Compose v2 by default. All of the pre-built Docker Compose files we publish have been updated to use v2.
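
Compose v2 ships as a Docker CLI plugin, so it is invoked with a space rather than a hyphen:

# Docker Compose v2 syntax
docker compose pull
docker compose up -d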

Reach out

Our Slack community continues to grow! Join us there for updates and help with your queries.