Deephaven is rolling. The team is spending the summer focused on workflows for data scientists: Python pip installation, Jupyter integrations, upgraded table operations, and more real-time plotting capabilities. As fall arrives, we’ll begin releasing developer and application-focused features, like authentication hooks and authorization.
The complete 0.14.0 release notes can be found on GitHub.
Deephaven as a Python library
Deephaven is now available as a simple Python library, making it easy for you to launch it within your existing Python workflow and IDE. All you need is a
pip install deephaven-server from your Python environment. (We recommend installing in a virtual environment.)
Running Deephaven then simply requires:
from deephaven_server import Server
server = Server()
# Here is a simple example to create a new (growing) time table
from deephaven import time_table
table_t = time_table('00:00:02').update(formulas = ["Row = 1", "Squared = Row^2"])
Once you run this in your Python environment, output like the GIF below can be found on your http://localhost:8080/ide/.
Using subtables in clients
In v0.13.0 we released a method called partition_by that enables you to easily split a table into subtables based on keys. This takes advantage of a powerful server-side PartitionedTable construct. This release extends this construct to Deephaven API clients, enabling clients to receive updates (or snapshots) of just one subtable. (You could pull several subtables.)
Plot grouped data with a simple “by” extension
As a follow-on feature of
partitionedTables, you can split data on the server and create a plot that presents the respective subtables as independent series. The following script demonstrates the
by extension of the plotting API:
from deephaven import *
from deephaven.plot.figure import Figure
# populate a table of fake data
t_pb = empty_table(100).update_view(formulas = ["X=i", "Z=i%3", "Y=X*X + Z * 10000", "Series_Id = `series_` + Z"]).move_columns(idx=0, cols=["Series_Id"])
# create the partitioned plot
figure = Figure()
p = figure.plot_xy(series_name="partitioned", t=t_pb, by=["Series_Id"], x="X", y="Y").show()
Projecting columns based on grouped data
We have introduced a new table operation,
updateBy . This allows columns to be derived from aggregations over a range of rows within a group, producing an output table with the same structure and rows as the input table, but for added columns (as in
update). Currently supported aggregations include exponential moving averages (EMAs), cumulative sums, cumulative products, cumulative minimums, and cumulative maximums. In general,
updateBy allows for rows to be decorated with results derived from multiple input rows, qualified by the
In the next release,
updateBy will be wrapped for Python. Below is a simple Java Groovy example:
cumulative_sum_grouped=emptyTable(10).update("RowNum=ii", "Group=RowNum%3").updateBy(CumSum("RowNumSum=RowNum"), "Group")
cumulative_multi_grouped=emptyTable(10).update("RowNum=ii", "Group=RowNum%3").updateBy([CumSum("RowNumSum=RowNum"),CumMin("RowNumMin=RowNum")], "Group")
A better EMA method for Python users
EMAs and other moving averages will inherit more improvements in the next release as the result of incorporating the
updateBy into the Python table API. However, as a step in that direction, the syntax around EMAs is simpler:
from deephaven.experimental.ema import ByEmaSimple
from deephaven import time_table
# create some data
return random.choice(['A', 'B', 'C', 'D', 'E', 'F'])
result = time_table('00:00:01').update(formulas=["Sym = (String)random_char()", "Numbers = (int)random.randint(0, 100)"])
# define the EMA
ema1sec = ByEmaSimple('BD_SKIP','BD_SKIP','TIME',10,'SECONDS', type='LEVEL')
# use the EMA on the data
ema_data = result.view(formulas=["EMA_data = ema1sec.update(Timestamp, Numbers)"]).last_by()
ema_data_grouped = result.view(formulas=["Sym", "EMA_data = ema1sec.update(Timestamp, Numbers, Sym)"]).last_by(by=["Sym"]).sort(order_by=["Sym"])
TensorBoard integration is now available
Knowing the popularity of TensorBoard for Python ML and AI users, we have paved the way for its integration with Deephaven. Please reference our how-to’s for using TensorBoard with PyTorch and with TensorFlow, respectively.
We now support configuring SSL on the Deephaven server for secure environments. You can deploy jetty, netty, or run the server from Python with secure connections from your clients to the server, using certificates you make available.
The development team is prioritizing the following initiatives:
- An integration with Jupyter and Visual Studio Code to support Deephaven’s updating, real-time widgets, such as its interactive, ticking plots and tables.
- A native Go client.
- Dynamic table subscriptions and updates for the C++ client.
- Improved performance of
updateBy(), its wrapping for Python, and its application to other common data manipulation patterns.
- Better proxy engineering to minimize browser resource consumption.
- Authentication and authorization hooks and example integrations.
- Research of a direct integration with plotly, removing the need for a Deephaven-specific plotting API.
You can find changes and improvements to the Deephaven user documentation in this blog post.
Please direct feedback, questions, or perspective about product development to Deephaven’s Slack or its GitHub Discussions.
We welcome your direction and look forward to supporting your use cases.