Deephaven continues to evolve its core engine, APIs, and user experiences to support data developers, scientists, and analysts. The release notes for 0.12.0 and 0.13.0 provide the full color of the team’s work. The front-end crew continues to augment the range and depth of interactive experiences. Refer to the Web-UI project’s release notes to track the details of their progress.
Deephaven empowers people to work with tables that change. Over the past months, the team has extended those capabilities to the concept of sub-tables. The engine now supports a first-class method called partition_by, which splits a table into sub-tables based on key columns. The set of sub-tables, as well as their shape, data composition, and dependent operations, updates dynamically as new data flows through the directed acyclic graph.
We humbly suggest that partitioned tables are a big deal. These constructs can materially improve performance for some use cases and make it easy for users to parallelize queries across multiple threads, quickly retrieve sub-tables to support user interfaces, and speed up filters interactively called within loops.
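Deephaven's partition_by operates on live tables and keeps its sub-tables updated as data arrives. As a rough static sketch of the grouping semantics only (plain Python, not the Deephaven API):

```python
from collections import defaultdict

def partition_by(rows, key):
    """Split a list of row-dicts into sub-tables keyed by `key`."""
    subtables = defaultdict(list)
    for row in rows:
        subtables[row[key]].append(row)
    return dict(subtables)

# A tiny stand-in for a table, one dict per row.
rows = [
    {"sym": "AAPL", "price": 170.0},
    {"sym": "MSFT", "price": 310.0},
    {"sym": "AAPL", "price": 171.5},
]

parts = partition_by(rows, "sym")
# parts["AAPL"] holds two rows; parts["MSFT"] holds one.
```

Unlike this static sketch, the engine's partitioned tables add and update constituents as the source table ticks.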
Python pandas plug-in
From a user's point of view, Deephaven methods can be applied identically to static or dynamically updating tables. This unifying capability uniquely fuels many use cases. However, it’s understood that, at times, Python data scientists will want to move from Deephaven tables to pandas dataframes. This conversion (pandas.to_pandas()) has long been supported, with users often coupling it with snapshots. With this release, pandas dataframes are now viewable in the Web IDE and dashboards, with the same interactive capabilities (filtering, sorting, plotting, etc.) as standard Deephaven tables.
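The interactions now available on dataframes in the Web IDE map directly onto familiar pandas operations. A minimal illustration of the filter-and-sort behavior, using a small dataframe standing in for one produced by the conversion:

```python
import pandas as pd

# A small dataframe standing in for one converted from a Deephaven table.
df = pd.DataFrame({
    "sym": ["AAPL", "MSFT", "AAPL"],
    "price": [170.0, 310.0, 171.5],
})

# Filter and sort, mirroring the interactive grid operations in the Web IDE.
filtered = df[df["sym"] == "AAPL"].sort_values("price", ascending=False)
```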
Ring tables for real-time ingestion
Kafka and other streaming ingestors provide vital interoperability. Historically, you could configure event-driven ingestors one of two ways: (i) by receiving appending data, to be stored in the Deephaven worker’s memory; or (ii) by using TableType.stream(), where the worker uses new data once and then discards it (for example, in support of aggregations).
Now there is a middle ground. A third method, TableType.ring(), lets you configure how many records to retain. This preserves recent history while relieving you of worry about memory bloat.
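The retention behavior of a ring table can be sketched with a fixed-capacity buffer: once the configured record count is reached, each new record evicts the oldest one. A plain-Python sketch of that semantics (not the Deephaven ingestor API):

```python
from collections import deque

# A ring retaining only the 3 most recent records.
ring = deque(maxlen=3)

for record in range(5):   # ingest records 0, 1, 2, 3, 4
    ring.append(record)   # once full, appending drops the oldest record

# ring now holds only the last 3 records ingested.
```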
The Web-UI’s data grid documentation has been updated
The Web-IDE and its grid and dashboarding utilities are modular, independent projects. Though they support Deephaven experiences, we see them as relevant for integration with other independent, data-driven platforms as well, and we encourage JS developers to consider them. Accordingly, the documentation for the JS Grid has been updated. It now provides an interactive experience to help you explore your own customizations.
The Deephaven engine now uses open-addressed hashing for aj() and aggregation-driven table methods. As part of this change, we use JavaPoet to dynamically generate typed hash tables, much like C++ templates, so the JVM can use the right primitive operators for a particular column. This new implementation improves performance over our existing chained-bucket hash tables, which used chunks to avoid per-row virtual calls. We continue to use an incremental rehashing strategy so that the latency of an incremental computation remains consistent even when a rehash is required.
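The engine's generated hash tables are typed Java, but the open-addressing idea itself is simple: all entries live in one flat slot array, and collisions are resolved by probing to the next slot rather than chasing per-bucket chain pointers. A toy sketch with linear probing (plain Python, no rehashing, for illustration only):

```python
class OpenAddressedMap:
    """Toy open-addressed hash table with linear probing.

    Entries live in one flat slot array, so probing walks contiguous
    slots instead of following chain pointers as a bucketed table would.
    """

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot: None or a (key, value) pair

    def _probe(self, key):
        """Return the slot index for `key`: its entry, or the first empty slot."""
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)  # linear probe: try the next slot
        return i

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key):
        slot = self.slots[self._probe(key)]
        return None if slot is None else slot[1]

m = OpenAddressedMap()
m.put("AAPL", 1)
m.put("MSFT", 2)
```

A production table also grows and rehashes as it fills; as noted above, the engine does that incrementally to keep per-cycle latency consistent.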
Barrage, the gRPC API for streaming tables (for deephaven-core and beyond), now supports pub-sub for snapshots of very large tables, as well as for viewports that grow substantially. Before this release, snapshots could not exceed 2^31 (~2.1 billion) rows, array or string column accumulations, or byte lengths. That constraint no longer exists. Further, for viewport subscriptions that grow, there is now an initial target number of cells for the snapshot, followed by an adjustment based on the time elapsed as a fraction of the update graph processor (UGP) cycle time. Internally, this mechanism is now used for downloading very large CSVs via the Web IDE.
Barrage performance has now been documented, with infrastructure available for you to perform similar measurements, both for static data (via Arrow’s DoGet()) and for dynamically updating, streaming table subscriptions (i.e., “table deltas”).
Primitives rewrite: Many functions are auto-imported for use in the query language. This library of functions has been rewritten to more comprehensively support all relevant types, have better naming, provide previously missing functionality, and have high test coverage.
The Python wrapper of the table API evolved further, improving the handling of application mode and of format_column/row_where() for Python users. Further, array and vector creation are now supported.
The team continues to evolve the Python-Java bidirectional bridge, the jpy project, in support of Deephaven’s users. It now allows Java to acquire and hold the GIL when it needs it, such as for cleanup work, without a roundtrip to Python.
Initiatives and other work
Other important initiatives underway include:
- Full delivery of dynamic Python plotting experiences. For example, with the Deephaven plug-in, Matplotlib and Seaborn experiences will be able to update in real time and support front-end interactivity with streaming tables.
- Deephaven as a Python library. It is intended that in early July you will be able to pip install Deephaven from a Python experience like Jupyter and use its services in full.
- Subscriptions to dynamic tables from the C++ and Python client APIs. This will open a significant range of use cases for app developers currently available only in the Java client API.
- A native Go client.
Changes and improvements to the Deephaven user documentation can be found in this blog post.