Deephaven Community Core version 0.40.0 is now available. Let's see what's new!
Release highlights
- Predicate pushdown filtering enables you to filter data closer to its source (Parquet in particular), minimizing the amount of data being processed, thereby improving performance.
- A new keyed transpose operation lets you aggregate a table and transpose the result, giving you new perspectives and insights into your data.
- Array and vector columns can now be sorted and are handled by the search bar.
- The ungroup operation is now shift-aware.
New features
Keyed transpose
There's a new table operation, the keyed transpose, which takes a source table and a set of aggregations and produces a new table where the specified `row_by_cols` are the keys for the aggregation and the values in `col_by_cols` are used for the column names.
Consider a table of mood data for given days:
```python
from deephaven import empty_table
from deephaven import agg
from random import choice


def random_date() -> str:
    return choice([
        "2025-09-01",
        "2025-09-02",
        "2025-09-03",
        "2025-09-04",
        "2025-09-05",
    ])


def random_mood() -> str:
    return choice(["Happy", "Sad", "Angry", "Surprised", "Jubilant"])


source = empty_table(20).update([
    "Date = parseLocalDate(random_date())",
    "Mood = random_mood()",
])
```
Say you want to count the number of times each mood occurs per day. You could do this with Deephaven's standard aggregations, but you want a different view of the results, where each row corresponds to a different day and each column corresponds to a different mood. The following keyed transpose does just that:
```python
from deephaven.table import keyed_transpose

result = keyed_transpose(
    source, [agg.count_("Count")], row_by_cols=["Date"], col_by_cols=["Mood"]
)
```
Predicate pushdown filtering
Predicate pushdown is a powerful feature that filters data closer to its source, reducing the amount of data that needs to be processed and improving performance.
Tables composed of regions and/or sub-tables can get a performance boost on stateless filters by pushing those filters down to individual regions. For example, a filter applied to a merged table can be executed on each constituent table individually, so far less data reaches the rest of the filtering pipeline.
Filters are prioritized for pushdown in the following order:
- Filtering single-value column sources (columns that contain only a single value).
- Range and match filtering of Parquet data columns via row group metadata.
- Filtering columns with a data index.
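No query changes are needed to benefit: the engine applies pushdown where it determines a stateless filter is eligible. As a rough illustration, here is a minimal sketch of a filter that is a pushdown candidate; the file path and column name are hypothetical:

```python
from deephaven import parquet

# Hypothetical Parquet file with a numeric "Fare" column.
trips = parquet.read("/data/taxi.parquet")

# A stateless range filter like this one is a pushdown candidate: row group
# statistics in the Parquet metadata let the engine skip entire row groups
# whose min/max bounds cannot contain a matching row, before any row-level
# filtering happens.
expensive = trips.where("Fare > 100")
```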
Filter ordering and barriers
A new filter barrier API has been added that allows you to annotate filters to produce and respect barriers. There are two key annotations:
- `withBarrier(name)`: Declares a named barrier produced by a filter.
- `withRespectsBarrier(name)`: Declares that a filter respects a named barrier.
This API ensures that the relative order of barrier production and consumption is preserved, which can be important for maintaining the integrity of related tables.
Additionally, a serial filter mechanism has been added such that certain filters can be explicitly told to execute serially (one at a time) rather than concurrently. This provides the foundation for barrier semantics and makes it explicit which filters need deterministic serial order.
Some filter operations require strict ordering (e.g., stateful logic and auth-aware transforms). These new features allow you to declare named barriers on filters and mark other filters as respecting those barriers. When present, the engine will not reorder a filter that respects a barrier ahead of the filter that produces it. This builds on the serial filter API, giving you greater control over filter ordering for these types of operations.
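As a rough illustration, here is a minimal sketch in Python. The filter-construction helper and the barrier methods shown are hypothetical stand-ins for the `withBarrier`/`withRespectsBarrier` annotations described above; consult the API documentation for the exact names in your client language.

```python
from deephaven import empty_table

source = empty_table(100).update(["UserId = ii % 7", "Price = ii * 1.5"])

# Hypothetical: a stateful, auth-aware filter produces a named barrier...
auth_filter = make_filter("isAuthorized(UserId)").with_barrier("auth")

# Hypothetical: ...and a later filter declares that it respects that barrier,
# so the engine will never reorder it ahead of auth_filter.
price_filter = make_filter("Price > 50").respects_barrier("auth")

result = source.where([auth_filter, price_filter])
```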
Server-side APIs
General
- Deephaven has migrated from `java.util.TimeZone` to `java.time.ZoneId` on the backend.
- A new `LeaderTableFilter` class allows you to coordinate multiple related tables by using one leader table to filter one or more follower tables.
- Input tables now have an `isValue` attribute.
- Custom Java `Comparator` instances can now be used when sorting tables on Object columns.
- Deephaven tables now have `assertAppendOnly`, `assertAddOnly`, and `assertBlink` methods.
Data I/O
- Added support for writing data to multiple Parquet row groups in a single write operation.
- Iceberg catalog adapters are now closeable.
- The S3 integration can now create custom AWS client factories for Iceberg, enabling configuration of asynchronous HTTP client properties for S3.
- Added support to read and write list data types from and to Iceberg.
- Added a write timeout to the S3 integration.
Client APIs
JavaScript
- A new set of JS API methods has been added to the Deephaven Table object to request data from the server.
C#
- The `SharableDict` implementation has been simplified.
Improvements
General
- The `ungroup` table operation has been completely rewritten, improving performance and reliability around edge cases (see the sketch after this list).
- Deephaven's quick filter backend can now process Vector and Array column types.
- New safety checks for file handles ensure that a handle points to the same physical file and immediately throw an error if the underlying file key changes.
- The Deephaven engine's chunk pooling has been hardened, with improved semantics and configurable pool capacities, reducing memory churn and improving performance.
- The `select` and `update` operations now release unused sparse array blocks at the end of the update cycle to reduce memory usage.
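As a quick illustration of the rewritten `ungroup` and the new support for sorting vector columns mentioned in the highlights, here is a minimal sketch (the table and column names are arbitrary):

```python
from deephaven import empty_table

# Group rows into Vector columns, one vector per key.
t = empty_table(9).update(["Key = ii % 3", "Value = ii"])
grouped = t.group_by("Key")  # "Value" becomes a Vector column

# Vector columns can now be used as sort keys...
by_values = grouped.sort("Value")

# ...and the rewritten ungroup expands each vector back into rows.
flattened = by_values.ungroup()
```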
Python
- The Python Arrow integration has been expanded to include more type mappings so that more Arrow data types can be ingested into Deephaven tables.
Breaking changes
- `FloatVector` and `DoubleVector` column types no longer treat positive and negative zero as different values when calling the `equals` method.
Bug fixes
- A table's `assertAppendOnly` method now properly checks for append-only tables.
- `WindowCheck` no longer forces tables to be refreshing if they will not change.
- Rollup tables now immediately raise an error if they would have to sort an unsortable column.
- All export errors are now consistently logged to improve visibility and debugging.
- Fixed a bug where lazy `MergedDataIndex` objects could omit key columns, leading to incorrect data index behavior for merged tables.
- Added a null safety check so that errors won't be raised when transforming filters on columns with unexpected null values.
- `where_in` now coalesces its set table, ensuring correct filtering across partitioned inputs.
- Fixed a bug where case-insensitive utilities could cause inconsistent hashing, impacting joins, grouping, and lookups.
- Fixed a bug where context-less sorts could occur during update propagation.
- Vector grouping columns now work in rollup tables.
- Array-typed columns are no longer allowed as key columns in aggregations.
- Fixed a bug where invoking a filter condition could cause a null pointer exception.
- Performing a `last_by` operation on a blink table with vector columns now reports the correct previous values.
- The engine no longer returns an incorrect next key in some iterator implementations.
- Fixed an issue where generated hasher filenames could exceed OS limits for wide tables.
- Fixed a bug where a null pointer exception could arise from merging multiple barrage-subscribed tables.
- A failed filter operation will no longer attempt to free the same chunk of memory twice.
- Snapshotting a sorted rollup table no longer fails if keys are removed.
Server-side APIs: Python
- `format_columns` now properly respects the query scope.
Client APIs: JavaScript and C++
- Rollup constituents should no longer appear when multiple aggregation columns reference the same underlying constituent data.
- The C++ client build for Fedora and RHEL OS has been fixed.
Reach out
Our Slack community continues to grow! Join us there for updates and help with your queries.