Deephaven Community Core version 0.38.0 is out now! It's been a few months since the last Community release, so there's a lot to cover. Let's get into it!
New features
Enhanced natural join behavior
You can now specify how a natural_join handles duplicate matches in the right table via the new type argument. Available choices are:
- ERROR_ON_DUPLICATE: Raises an error if a key has more than one match in the right table. This is the default behavior of the operation if type is not given.
- FIRST_MATCH: Equivalent to running a first_by on the right table prior to the join.
- LAST_MATCH: Equivalent to running a last_by on the right table prior to the join.
- EXACTLY_ONE_MATCH: Equivalent to running an exact_join operation.
The following code shows all four of these options in action:
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.table import NaturalJoinType
source_left = new_table(
[
string_col("LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith"]),
int_col("DeptID", [31, 33, 33, 34, 34]),
string_col(
"Telephone",
[
"(303) 555-0162",
"(303) 555-0149",
"(303) 555-0184",
"(303) 555-0125",
"",
],
),
]
)
source_right = new_table(
[
int_col("DeptID", [31, 33, 34, 35]),
string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
string_col(
"DeptTelephone",
["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"],
),
]
)
# Default
result_error_on_duplicates = source_left.natural_join(table=source_right, on=["DeptID"], type=NaturalJoinType.ERROR_ON_DUPLICATE)
# First match
result_first_match = source_left.natural_join(table=source_right, on=["DeptID"], type=NaturalJoinType.FIRST_MATCH)
# Last match
result_last_match = source_left.natural_join(table=source_right, on=["DeptID"], type=NaturalJoinType.LAST_MATCH)
# Exactly one match
result_exact_match = source_left.natural_join(table=source_right, on=["DeptID"], type=NaturalJoinType.EXACTLY_ONE_MATCH)
The count_where aggregation
Both the Python and Groovy APIs now support the count_where/countWhere aggregation. This operation counts the number of rows where one or more filter conditions are true. The filter conditions can be either conjunctive (AND) or disjunctive (OR). As with other aggregations, the calculation can be bucketed by one or more grouping columns. Here's an example in Python:
from deephaven import empty_table
from deephaven.agg import count_where
source = empty_table(100).update(["X = i", "Y = randomDouble(0, 1)", "Z = i % 3", "String = i % 2 == 0 ? `even` : `odd`"])
# An aggregation with no keys (grouping columns). The filters here are conjunctive (AND).
result_zerokeys = source.agg_by(aggs=count_where(col="count", filters=["X < 42", "Y >= 0.58"]))
# An aggregation with two grouping columns. The filters here are disjunctive (OR).
result_twokeys = source.agg_by(aggs=count_where(col="count", filters="X >= 29 || Y < 0.7"), by=["Z", "String"])
This feature is also available for update_by in both cumulative and rolling contexts:
from deephaven import empty_table
from deephaven.updateby import cum_count_where, rolling_count_where_tick
source = empty_table(100).update(["Key=randomInt(0,5)", "IntCol=randomInt(0,100)"])
# zero-key
result_zerokeys = source.update_by([
cum_count_where(col="CumulativeCountOver50", filters="IntCol > 50"),
rolling_count_where_tick(rev_ticks=50, col="RollingCountOver50", filters="IntCol > 50"),
])
# bucketed
result_onekey = source.update_by([
cum_count_where(col="CumulativeCountOver50", filters="IntCol > 50"),
rolling_count_where_tick(rev_ticks=50, col="RollingCountOver50", filters="IntCol > 50"),
], by="Key")
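The rolling variant comes in a time-based flavor as well. Here's a minimal sketch, assuming a Timestamp column on the source table and a trailing ten-second window; the column names are illustrative:

from deephaven import empty_table
from deephaven.updateby import rolling_count_where_time

# Build a source table with a timestamp column for the time-based window to key on.
source = empty_table(100).update([
    "Timestamp = '2025-01-01T00:00:00Z' + i * SECOND",
    "IntCol = randomInt(0, 100)",
])

# Count rows where IntCol > 50 over a trailing ten-second window.
result = source.update_by(
    rolling_count_where_time(ts_col="Timestamp", rev_time="PT10s", col="RollingCountOver50", filters="IntCol > 50")
)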
Additional features
JS API
- A custom gRPC transport layer for JS API consumers.
- JS API support for creating and consuming shared tickets.
Python
- A new systemic_obj_tracker Python module that allows users to enable or disable systemic object marking.
- Partitioned table support in the Python Table Data Service.
- An is_failed property on tables in Python to check whether a table has failed and is no longer usable (see the sketch after this list).
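For example, checking the new property on a simple ticking table might look like this. A minimal sketch; the time_table source is just illustrative:

from deephaven import time_table

# A simple ticking table that emits one row per second.
t = time_table("PT1s")

# is_failed is False while the table is healthy; it becomes True if the
# table errors out and can no longer be used.
print(t.is_failed)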
Parquet and Iceberg
- Custom resolution of Parquet file columns into Deephaven table columns based on arbitrary criteria.
- Support for reading Parquet data and Iceberg tables from URIs that use the s3a and s3n schemes (see the sketch below).
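For instance, reading Parquet data from an s3a URI might look like the following sketch. The bucket, path, and region here are hypothetical, and anonymous access is assumed:

from deephaven import parquet
from deephaven.experimental import s3

# Read a Parquet file from an s3a URI. The bucket and path are placeholders.
source = parquet.read(
    "s3a://example-bucket/data/table.parquet",
    special_instructions=s3.S3Instructions(
        region_name="us-east-1",
        anonymous_access=True,
    ),
)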
The engine
- The ability to disable data index usage in the engine.
Improvements
Improvements include both general enhancements and bug fixes. Here are some of the most notable ones.
General improvements
Iceberg
- Deephaven now verifies supported data types before writing to Iceberg, failing early if it encounters any unsupported types.
- The Iceberg writing API has been simplified, making S3-specific instructions optional by defaulting to settings from the catalog.
- Sort order for Iceberg tables is now properly recognized and handled.
JS API
- Median is now available to the JS API as an aggregation option.
The engine
- Better Data Index performance.
Bug fixes
The bug fixes in the subsections below are not comprehensive but cover the most significant issues resolved in this release.
Iceberg and Parquet
- Iceberg tables partitioned by date can now be read properly.
- Deephaven no longer resolves credentials on every S3 read if no credentials were given.
Python
- The Python gRPC client no longer fails calls that had half-closed successfully.
- The Python Table Data Service now properly handles Optional parameters in callback signatures.
- Deephaven now properly handles Python's shape typing in UDF parsing.
UI
- Plots on aggregated tables will now tick properly in lock-step with the source table.
- An error should no longer occur when switching between UI tabs with Deephaven Express plots.
- Charts via both Deephaven Express and the built-in plotting API should no longer fail on certain table types.
- Null boolean cells now accurately portray the underlying data in the UI.
Integrations
- The Flight SQL server now properly acquires the shared lock, enabling all table operations against refreshing tables.
General
- Left and full outer joins now work properly when the right-hand table is initially empty.
- Minimum and maximum update_by operations now return column types that match the input data.
- Performing a count_where on a rollup table will no longer improperly perform a regular count.
- Rollup tables will no longer print DEPENDENCY_RELEASED errors to the console when groupings are added.
- update_by will no longer incorrectly order resultant columns.
- Snapshotting a sorted rollup table will now produce the correct results.
Reach out
Our Slack community continues to grow! Join us there for updates and help with your queries.