
Release notes for Deephaven Core version 0.36

Multi-table listeners, native table iteration in Python, and more

Deephaven Community Core version 0.36.0 is available now, with several new features, improvements, bug fixes, and more. We've rounded up the highlights below.

New features

Native table iteration in Python

Four new table operations are now available that allow you to iterate over table data in Python efficiently. They are:

  • iter_dict
  • iter_tuple
  • iter_chunk_dict
  • iter_chunk_tuple

The first two iterate over the table one row at a time, while the latter two iterate over chunks of rows. All four methods use efficient chunked operations on the backend and return generators to minimize data copies and memory usage, making them ideal for large tables. Take a look at how they're used below:

from deephaven import empty_table

source = empty_table(4096).update(["I=i", "D=(double)i", "S=String.valueOf(i)"])

n_rows_dict = 0
n_rows_tuple = 0
n_chunks_dict = 0
n_chunks_tuple = 0

# One row at a time
## As a dictionary
for row_dict in source.iter_dict():
    n_rows_dict += 1

## As a tuple, and only iterate over one column
for row_tuple in source.iter_tuple("D"):
    n_rows_tuple += 1

# Chunked
## As a dictionary
for chunk_dict in source.iter_chunk_dict():
    # Default chunk size is 2048 rows - this table has two chunks
    n_chunks_dict += 1

## As a tuple, and change the default chunk size
for chunk_tuple in source.iter_chunk_tuple(chunk_size=1024):
    # With 1024-row chunks, this table has four chunks
    n_chunks_tuple += 1

print(f"Rows: {n_rows_dict}, {n_rows_tuple}, Chunks: {n_chunks_dict}, {n_chunks_tuple}")
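The chunked iterators matter for memory: rather than materializing every row at once, they yield one fixed-size chunk at a time. The generator pattern they rely on can be sketched in plain Python. This is only an illustration of the pattern, not Deephaven's actual implementation - here the "table" is an ordinary dict of column lists:

```python
def iter_chunk_dict(columns, chunk_size=2048):
    """Yield dicts of column-name -> list slices, one chunk at a time.

    Illustrative sketch only: the real Deephaven iterators read chunked
    column sources on the engine side rather than Python lists.
    """
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, chunk_size):
        # Each yield hands back only chunk_size rows' worth of data,
        # so the full table never needs to be copied at once.
        yield {name: values[start:start + chunk_size]
               for name, values in columns.items()}


columns = {"I": list(range(4096)), "D": [float(i) for i in range(4096)]}

n_chunks = sum(1 for _ in iter_chunk_dict(columns))              # 4096 / 2048 -> 2
n_small_chunks = sum(1 for _ in iter_chunk_dict(columns, 1024))  # 4096 / 1024 -> 4
```

Because the function is a generator, no chunk is built until the consuming loop asks for it - the same property that lets Deephaven's iterators scale to very large tables.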

Multi-table merged listeners

Prior to 0.36.0, a table listener could only listen to a single table at a time. If you wanted to listen to multiple tables, you had two options: use multiple listeners or combine the tables. Merged listeners now allow you to listen to an arbitrary number of tables, giving you added, modified, and removed rows from each one of them on every update cycle. Here's how you can listen to multiple tables at once:

from deephaven.table_listener import merged_listen
from deephaven import time_table

t1 = time_table("PT2s").update("RowNum = i")
t2 = time_table("PT3s").update("X = randomDouble(0, 10)")
t3 = time_table("PT5s").update("Y = randomBool()")

def listener_function(updates, is_replay):
    # Tables are keys and added, modified, and/or removed rows are values
    # Not every table has data every update cycle - make sure it does
    if tu1 := updates[t1]:
        added = tu1.added()
        row = added["RowNum"].item()
        print(f"t1: {row}")
    if tu2 := updates[t2]:
        added = tu2.added()
        x = added["X"].item()
        print(f"t2: {x}")
    if tu3 := updates[t3]:
        added = tu3.added()
        y = added["Y"].item()
        print(f"t3: {y}")

handle = merged_listen([t1, t2, t3], listener_function)


Table definitions in Python

Want to export a table definition from Python? Tables now have a definition attribute that returns the table's TableDefinition:

from deephaven.table import TableDefinition
from deephaven import empty_table

source = empty_table(10).update(["X = i", "Y = randomDouble(5, 10)"])

print(source.definition)

Compare tables more easily

There's a new table_diff method that makes comparing tables easier. Use it to find differences between tables, such as differing columns, sizes, and values. Here's how it's used:

from deephaven.table import table_diff
from deephaven import empty_table

t1 = empty_table(10).update(["X = i", "Y = randomDouble(0, 10)"])
t2 = empty_table(3).update(["Z = randomBool()", "M = `This is a string!`"])

print(table_diff(t1, t2, max_diffs=1))
print(table_diff(t1, t2, max_diffs=5))

Parquet and S3

Two new features have been added to Deephaven's Parquet integration:

  • It now supports reading Parquet files from S3 that include metadata files.
  • It now supports writing Parquet files to S3.
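As a rough sketch of how the S3 write path could look, assuming the `deephaven.parquet` module and the experimental `s3.S3Instructions` class (parameter names here are illustrative and may differ in your version - consult the API docs before relying on them):

```python
# Hypothetical sketch - requires a running Deephaven server and S3 credentials.
from deephaven import empty_table
from deephaven.experimental import s3
from deephaven import parquet

source = empty_table(100).update(["X = i", "Y = i * 2"])

# Assumed: S3Instructions carries the region and credentials for the bucket.
instructions = s3.S3Instructions(region_name="us-east-1")

# Write the table to S3 as Parquet, then read it back.
parquet.write(source, "s3://my-bucket/data.parquet", special_instructions=instructions)
result = parquet.read("s3://my-bucket/data.parquet", special_instructions=instructions)
```

The bucket name and region above are placeholders; the key point is that the same Parquet read/write entry points now accept S3 destinations.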

pip-installed Deephaven CLI

In release 0.34, a command-line interface was added for pip-installed Deephaven. It always automatically opened a browser window. Now, the boolean config flags --no-browser and --browser have been added to control this behavior. The default behavior is unchanged.
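For example, assuming the pip-installed entry point is `deephaven server` (as introduced in 0.34), the new flags could be used like this - a sketch; check `deephaven server --help` on your version for the exact invocation:

```shell
# Start the server and suppress the automatic browser window
deephaven server --no-browser

# Explicitly request the browser window (matches the default behavior)
deephaven server --browser
```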

Improvements

Iceberg

  • Deephaven can now get a table definition for an Iceberg table without having to read the table first.
  • Columns of Iceberg tables whose names are invalid in Deephaven are automatically renamed to follow Deephaven conventions when consumed into Deephaven tables.
  • Iceberg snapshot tables now produce Timestamp columns of the Instant data type.

Performance

  • Improved performance and memory use of naturalJoin in incremental cases where there are no responsive rows in either table.
  • Increased parallelism in partition-aware source tables, as well as an option to assume partitions are non-empty.
  • Parallel table snapshots, which can improve performance, particularly when reading tables with many columns from S3.

Dependencies

  • Upgraded to jedi autocomplete 0.19.1. See the jedi changelog for details.

Client APIs

  • The Java client now has a gRPC user agent, which includes relevant version information by default.

Bug fixes

Server-side APIs: Python

  • Liveness scopes can now manage table listeners in Python.
  • Errors raised by table listeners in Python now properly notify any applications used by the server.

Server-side APIs: General

  • Sorting dictionary-encoded string columns with null values will now work as expected.
  • URI path conversions now work correctly on Windows.
  • Floating point comparisons are now consistent with floating point hash code standards.
  • Java and Python wheel artifacts now have the same dependencies.
  • Reading from Parquet with a millis- or micros-since-epoch timestamp column no longer fails with a null pointer exception.

Client APIs

  • A bug in the Go and JS client authentication that could erroneously require entering login information twice has been fixed.

Parquet

  • Parquet files with missing dictionary page offsets are now read correctly.
  • Deephaven's Parquet reader now correctly handles dictionary-encoded strings in Parquet files.

Kafka

  • Deephaven's Kafka JSON specification now correctly propagates null values for integer fields.

Breaking changes

Reach out

Our Slack community continues to grow! Join us there for updates and help with your queries.