Deephaven Release Notes: Version 1.20240517
Note
See this release's accompanying Version Log. The Version Log includes links to change-specific release notes that supplement these release notes.
If upgrading from a previous Deephaven release, please refer to the additional Upgrade documentation.
Deephaven 1.20240517 (Grizzly) provides significant improvements to query resiliency, administrative tools, and Kubernetes deployment. The Core+ workers in Grizzly now support persistent input tables, replay queries, and tools for reliably creating derived data.
Deephaven no longer supports Java 8 on the server. The minimum required version is Java 11, and Java 17 is recommended. Legacy Java clients must run the same Java version as the server; Core+ and SBE clients, however, can still run on Java 8.
Grizzly supports Python 3.9 and 3.10; support for Python 3.8 has been dropped. In a future release, Core+ workers will support Python 3.12 or later. Legacy workers will not be updated beyond Python 3.10, and Legacy Python support will be removed in 2026 when Python 3.10 is EOL.
SAML authentication is no longer distributed as a separate plugin; it is now included in the Deephaven Enterprise installation, eliminating the need for a separate installation step. The SAML configuration properties remain unchanged.
Query Replicas and Controller Failover
Persistent Query replicas provide load-balancing and redundancy to Deephaven applications. In the query settings panel, the owner can configure the number of replicas and spares. When users log into Deephaven, they are assigned to one of the query replicas. Deephaven includes a simple round-robin load balancer that assigns each user to one of the replicas in turn. Developers can implement the pluggable AssignmentPolicy interface to make more complex decisions for their own use cases.
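To illustrate the default behavior, here is a minimal round-robin sketch in Python. It is not the AssignmentPolicy interface itself (which is a Java plugin point whose exact signature is not shown here); it only demonstrates the assignment pattern the built-in balancer uses.

```python
from itertools import cycle

class RoundRobinAssignmentPolicy:
    """Illustrative sketch only: hands each new login the next replica
    in turn, wrapping around when the list is exhausted."""

    def __init__(self, replicas):
        self._replicas = cycle(replicas)

    def assign(self, user):
        # The user argument is unused here; a custom policy could
        # route based on user identity, group, or load instead.
        return next(self._replicas)

policy = RoundRobinAssignmentPolicy(["replica-1", "replica-2", "replica-3"])
assignments = [policy.assign(u) for u in ["alice", "bob", "carol", "dave"]]
# The fourth login wraps around to replica-1.
```

A custom policy would replace the `assign` logic, for example to pin users in the same group to the same replica.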
If one of the active replica queries fails, then a spare query takes its place. Users connected to the failed query are redirected to the spare without the need for the query to reinitialize. The controller then starts a new worker to act as the spare. Query owners and administrators can see all of a query’s replicas and spares from the web UI in a tree view.
The controller process can now execute on more than one node in the Deephaven cluster. At startup, each controller process campaigns to be the leader. The controller that is elected leader manages all of the persistent queries, and the other controllers continue their campaign for leader. If the primary controller terminates, then one of the other controllers takes over as leader. Core+ queries automatically register with the new controller without disruption. Legacy queries are terminated by the new controller and restarted according to their schedule.
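The campaign-and-takeover behavior can be modeled with a toy mutual-exclusion sketch. This is purely illustrative; Deephaven's actual election protocol is internal and is not exposed as an API.

```python
import threading

class LeaderElection:
    """Toy model of leader election: the first controller to acquire
    the lock leads; the rest keep campaigning until the leader resigns."""

    def __init__(self):
        self._lock = threading.Lock()
        self.leader = None

    def campaign(self, controller):
        # Non-blocking attempt: only one campaigner can win.
        if self._lock.acquire(blocking=False):
            self.leader = controller
            return True
        return False

    def resign(self):
        # Models the leader terminating, freeing the next campaigner to win.
        self.leader = None
        self._lock.release()

election = LeaderElection()
won_a = election.campaign("controller-a")   # controller-a becomes leader
won_b = election.campaign("controller-b")   # False: b keeps campaigning
election.resign()                           # leader terminates
won_b_retry = election.campaign("controller-b")  # b takes over as leader
```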
Query Server Selection
In Jackson (1.20221001), Deephaven introduced automated query server selection. When creating a console or Persistent Query in Grizzly, the default server is now “Auto_Query” or “Auto_Merge” (as appropriate for the query type). With the default server selection algorithm, when a console or query is started, it is assigned to the host with the most free memory. This relieves users and administrators of the burden of balancing resources across the query cluster. For query types that store data persistently on a single host (e.g., an in-worker ingestion query), consider using static assignment instead.
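The default selection rule reduces to a simple maximum over reported free memory. The following sketch uses illustrative field names, not Deephaven's internal data structures:

```python
def pick_query_server(servers):
    """Choose the host reporting the most free memory, mirroring the
    default 'Auto_Query'/'Auto_Merge' selection rule described above."""
    return max(servers, key=lambda s: s["free_memory_gb"])["host"]

servers = [
    {"host": "query1", "free_memory_gb": 48},
    {"host": "query2", "free_memory_gb": 112},
    {"host": "query3", "free_memory_gb": 64},
]
choice = pick_query_server(servers)  # query2 has the most free memory
```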
DIS/TDCP Responsive to Routing Changes
When the data routing configuration is changed, either by using dhconfig routing to import a new file or dhconfig dis to add a new data import server with claims, the DIS and TDCP processes now reload the routing configuration. This permits you to add a new Data Import Server without restarting other system components.
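The reload-without-restart pattern can be sketched generically as a change detector: a process polls the routing source and re-reads it only when it has changed. This is not Deephaven's implementation (the DIS and TDCP react to configuration imports, not file mtimes); it only illustrates the pattern.

```python
import os
import tempfile

class RoutingConfigWatcher:
    """Illustrative reload-on-change check: re-read the routing file
    only when its modification time changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self.config = None

    def poll(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:
            self._mtime = mtime
            with open(self.path) as f:
                self.config = f.read()
            return True  # configuration (re)loaded
        return False     # nothing changed

# Demonstrate: initial load, no-op poll, reload after a change.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("dis: [dis1]")
watcher = RoutingConfigWatcher(path)
loaded = watcher.poll()        # True: first load
unchanged = watcher.poll()     # False: no change
with open(path, "w") as f:
    f.write("dis: [dis1, dis2]")
os.utime(path, (0, 0))         # force a distinct mtime for the demo
reloaded = watcher.poll()      # True: new DIS picked up without a restart
os.remove(path)
```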
dhconfig acls and dhconfig pq
The iris_db_user_mod and controller_tool programs have been integrated into the dhconfig command as the acls and pq data types. The old tools are still present in Grizzly, but will be removed in the next release.
The command line syntax, help, and argument handling are much improved in dhconfig. For example, you can use quoted command line arguments to get the status of a query whose name contains spaces, or add an ACL containing characters like “*” without shell expansion. Unlike the old tools, dhconfig does not automatically invoke sudo, so non-privileged users can run it. This opens up a variety of use cases:
- dhconfig pq can be used to get a query’s status (either as a table or formatted as JSON)
- dhconfig pq can export or import a query on behalf of a user
- dhconfig acls allows a user to add their own public keys to the authentication server
Core+ Input Tables
Core+ workers now support persistent input tables. You can create and edit these tables programmatically using the Core+ WritableDatabase object. When loaded into the Deephaven Web UI, users can add, edit, and delete rows.
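To clarify the add/edit/delete semantics of a keyed input table, here is a toy in-memory model. It is not the Core+ WritableDatabase API (which is not reproduced here); the class and field names are illustrative.

```python
class KeyedInputTable:
    """Toy model of input-table semantics: rows are identified by their
    key columns, so adding a row with an existing key edits it in place."""

    def __init__(self, key_cols):
        self.key_cols = key_cols
        self._rows = {}  # key tuple -> row dict

    def _key(self, row):
        return tuple(row[c] for c in self.key_cols)

    def add(self, row):
        # An add with an existing key replaces (edits) the prior row.
        self._rows[self._key(row)] = row

    def delete(self, key):
        self._rows.pop(key, None)

    def snapshot(self):
        return list(self._rows.values())

t = KeyedInputTable(key_cols=["Sym"])
t.add({"Sym": "AAPL", "Price": 190.0})
t.add({"Sym": "AAPL", "Price": 191.5})  # edit: same key, new price
t.add({"Sym": "MSFT", "Price": 420.0})
t.delete(("AAPL",))                     # delete by key
```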
The format of Core+ input tables differs from the Legacy format: the key column definitions and other attributes are stored inside the table schema. A conversion tool can convert existing Legacy input tables to Core+.
DerivedTableWriter
Core+ workers now include the DerivedTableWriter, which allows you to persistently ingest add-only or blink Deephaven tables to a DataImportServer. A data developer can ensure exactly-once delivery using a sequence number stored in the checkpoint record, or with custom logic. This enables several use cases:
- Translating a data feed from a raw format like FIX into a parsed columnar format.
- Downsampling a data feed and storing the result persistently.
- Persisting in-memory tables generated by custom logic to disk.
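The checkpoint-based exactly-once pattern mentioned above can be sketched as follows. The function and field names are illustrative, not the DerivedTableWriter API: batches at or below the checkpointed sequence number were already written before a restart and are skipped on replay.

```python
def ingest(batches, checkpoint):
    """Replay-safe ingestion sketch: skip any batch whose sequence
    number is already covered by the persisted checkpoint, so a
    restart never writes the same rows twice."""
    written = []
    for seq, rows in batches:
        if seq <= checkpoint["last_seq"]:
            continue  # already persisted before the restart
        written.extend(rows)
        # In a real writer the sequence number is persisted atomically
        # with the data in the checkpoint record.
        checkpoint["last_seq"] = seq
    return written

# Simulate restarting with batches 1 and 2 already checkpointed.
checkpoint = {"last_seq": 2}
batches = [(1, ["a"]), (2, ["b"]), (3, ["c"]), (4, ["d"])]
replayed = ingest(batches, checkpoint)  # only the new rows are written
```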
Core+ Replay Queries
Replay Script Persistent Queries now support Core+ workers, enabling you to test a query using historical data.
Kubernetes Configuration
The Deephaven helm chart has migrated configuration off NFS (/etc/sysconfig) to first-class Kubernetes objects. Plugins are now included as part of the image-building process rather than requiring the installation to reference code outside of the immutable containers. Only the “/db/Users” and “/db/Systems” persistent volumes for historical data are now required.
When starting a worker from a Persistent Query or Code Studio, you can choose to allocate additional memory to account for non-heap usage. For example, when executing a Python worker, Python objects are not part of the Java heap, but are instead allocated and garbage collected by the Python interpreter's memory management subsystem. Although bare-metal query servers can also take advantage of this to account for memory usage more accurately, it is especially important on Kubernetes: Deephaven must assign a memory request and limit to each worker pod, and if the pod exceeds its memory limit, it is terminated by the kernel's OOM killer.
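The sizing arithmetic is simple but easy to get wrong: the pod limit must cover the Java heap, JVM overhead (metaspace, thread stacks, direct buffers), and the extra non-heap allowance. The overhead figure below is an assumption for illustration, not a Deephaven default.

```python
def pod_memory_limit_mb(jvm_heap_mb, extra_non_heap_mb, jvm_overhead_mb=512):
    """Illustrative sizing only: a worker pod's memory limit should
    cover the Java heap, JVM overhead, and any extra allowance for
    non-heap consumers such as the Python interpreter."""
    return jvm_heap_mb + jvm_overhead_mb + extra_non_heap_mb

# A 4 GiB heap plus a 2 GiB Python allowance needs a limit above 6 GiB,
# or the kernel's OOM killer will terminate the pod under load.
limit = pod_memory_limit_mb(4096, 2048)  # 6656 MB
```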
When starting a worker, the Kubernetes node must allocate resources, download the worker image, and finally initialize the container. To make the worker’s startup state more transparent, Deephaven now provides additional feedback from the Kubernetes engine in the “StatusDetails” column of the persistent query monitor and for the Core+ Python client. In a future release, Code Studios will include similar feedback.