Release Notes for Deephaven v0.9.0

DALL·E prompt: highly detailed notes of paper falling from the sky in a yellow room, digital art
Ryan Caudy

The Deephaven team dedicated its January efforts to interoperability. The use of open formats, the integration of popular tools, and the customization of experiences remain fundamental priorities. The Deephaven projects – deephaven-core, barrage, and web-client-ui – evolved in a full embrace of those principles.

Highlights

  • Developed a new (stand-alone) CSV reader – focused on speed and type inference.
  • Introduced multi-thread parallelization of select() and update().
  • Delivered an integration with Change Data Capture (CDC).
  • Established Docker images of Deephaven paired with popular Python AI libraries.
  • Changed backend technology to align with best practices for virtual environment management.
  • Improved the Kafka Avro integration.

End-of-February deliveries will center around:

  • Infrastructure to support plugins for both server-side and JS-client-side extendability.
  • Useful Python visualization libraries delivered as Deephaven plugins, starting with matplotlib and seaborn.
  • Pandas DataFrames rendered in Deephaven as table widgets in the browser UI, complete with interactive experiences.
  • A more idiomatic and interoperable Python architecture.
  • Structuring to support using Deephaven as a library in Java and Python.
  • Use of table maps to support grouped-plotting (i.e. plotBy()) and other experiences.
  • Performance benchmarking and testing for both incremental updates and batch operations.
  • WIP for the C++ client API: Dynamic data delivered from server to client using Barrage.

All release details can be explored in the Pull-Requests and related GitHub Issues itemized here.

Full Release Summary

Query engine

Parallel select() implementation

The query engine can now be configured to use multiple threads to evaluate independent column calculations (via select() and update()), either across multiple columns or within a single one. #1749, #1855
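To make the idea of column-level parallelism concrete, here is a minimal sketch in plain Python. It is not the Deephaven engine's implementation and does not use the Deephaven API: the table layout, the formulas, and the update_parallel helper are all hypothetical names chosen for illustration. It assumes each derived column reads only source columns, which is what makes the work safe to run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source table: column name -> list of values.
table = {"Price": [10.0, 20.0, 30.0], "Size": [1, 2, 3]}

# Hypothetical derived columns. Each formula reads only source columns,
# so no formula depends on another formula's result.
formulas = {
    "Notional": lambda t: [p * s for p, s in zip(t["Price"], t["Size"])],
    "DoubleSize": lambda t: [2 * s for s in t["Size"]],
}


def update_parallel(table, formulas, max_workers=4):
    """Return a new table with each derived column computed on a worker thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every column first so they can run concurrently,
        # then collect the results.
        futures = {name: pool.submit(fn, table) for name, fn in formulas.items()}
        derived = {name: fut.result() for name, fut in futures.items()}
    return {**table, **derived}


result = update_parallel(table, formulas)
```

In CPython, threads running pure-Python formulas like these show the structure rather than a real speedup. Parallelism within a single column, which would split the rows into chunks, is not shown.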


Python, ML, AI, and Data Science

"Base+" images that pair Deephaven with Python AI modules are now available

PyTorch, TensorFlow, and scikit-learn are important libraries for many Deephaven users, so easing their deployment with the core engine was paramount. You can now find the corresponding Docker images in the main README or QuickStart. #1803

Efficiency improvements to the DH Learn Library

The Deephaven Learn Library enables you to easily marry the power of the Deephaven engine – and its streaming tables – to Python libraries. RowSet generation, an important part of its value proposition, was made more efficient by introducing a builder. #1655

More Python-idiomatic implementation of Deephaven’s datetimeutils

The dtypes module was refactored into a package with new wrapper classes, proper docstrings, and unit tests. #1812

Other fixes

  • Solution for the JPY library’s mishandling of long values. #1461
  • Bump to the latest Python 3.7. #1906

Data Sources and Sinks

New CSV reader

Deephaven was relying on the Apache Commons CSV reader, but users found its performance unsatisfactory. After exploring other parsers available in open source, we decided to write a new one from scratch. Though alternatives had some interesting capabilities, Deephaven use cases rely on type inference and often need good handling of date-time fields, so a new solution was necessary. The implementation focused on delivering best-in-class performance and is available in its own repo under an Apache 2.0 license. #1629, #1837
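For readers unfamiliar with type inference in a CSV reader, the following sketch shows the basic idea using only the Python standard library. It is an illustration under simplifying assumptions, not the algorithm used by the new Deephaven reader: the candidate types, their order, and the infer_column_types helper are hypothetical, and it reads the whole input into memory. A column gets the narrowest type that parses every one of its values, with string as the fallback.

```python
import csv
import io
from datetime import datetime


def _parses(text, parser):
    """Return True if parser accepts text, False if it raises ValueError."""
    try:
        parser(text)
        return True
    except ValueError:
        return False


# Candidate types, narrowest first. "str" is the fallback when none fit.
PARSERS = [
    ("int", int),
    ("float", float),
    ("datetime", datetime.fromisoformat),
]


def infer_column_types(csv_text):
    """Return {column name: inferred type name} for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    types = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        # Pick the first candidate type that parses every value in the column.
        types[name] = next(
            (label for label, parser in PARSERS
             if all(_parses(value, parser) for value in values)),
            "str",
        )
    return types


sample = (
    "Sym,Qty,Price,Time\n"
    "AAPL,100,172.5,2022-01-31T09:30:00\n"
    "MSFT,250,310,2022-01-31T09:30:01\n"
)
inferred = infer_column_types(sample)
```

Note that Price is inferred as float even though one of its values ("310") would parse as an integer, because the type has to fit every value in the column.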

CDC ingestion support

Deephaven now supports integrating sources with Change Data Capture software patterns. CDC maps elegantly to Deephaven’s real-time capabilities generally and its incremental update model specifically. We will soon post an example integration with Debezium, using a MySQL source database. #1819
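As a rough picture of why CDC fits an incremental update model, the sketch below replays a stream of change events into a keyed table, touching only the affected row for each event. It is plain Python rather than the Deephaven integration. The event shape (an "op" code with "before" and "after" row images) is a simplified assumption loosely modeled on Debezium's change-event envelope, and apply_change, the "id" key, and the sample orders are hypothetical.

```python
def apply_change(table, event, key="id"):
    """Apply one change event, in place, to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "u"):
        # Create or update: upsert the new row image.
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Delete: drop the row identified by the old row image.
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical change stream for an "orders" table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "shipped"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

orders = {}
for event in events:
    apply_change(orders, event)
```

After the four events are applied, only order 1 remains, with its updated status.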

Other fixes

  • Support of BigDecimal publishing to Kafka Avro. #1894, #1899
  • Improvements for publishing to Kafka related to schema registry and DateTime types. #1877
  • Parquet cleanup for flat array sources as it relates to select() of static data. #1793
  • Fix to batch export response handling for the pyclient. #1859

Further reading

Enhancements to the Deephaven docs can be found here. (The DH docs are run as a software project on GitHub, which will soon be made public.)