Use NumPy in Deephaven queries

This guide covers the ways NumPy and Deephaven work together.

NumPy is an open-source Python module that includes a library of powerful numerical capabilities. These capabilities include support for multi-dimensional data structures, mathematical functions, and an API that enables calls to functions written in C for faster performance. It is one of the most popular and widely used Python modules.

NumPy is one of Deephaven's Python dependencies, so it is installed along with Deephaven's Python API.

Intersection with Deephaven

Deephaven and NumPy interact in a few different ways. Each of the following subsections covers one of them.

deephaven.numpy

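deephaven.numpy is a submodule of the Deephaven Python package that provides three functions:

  • to_numpy, which converts a Deephaven table to a NumPy array.
  • to_table, which converts a NumPy array to a Deephaven table.
  • to_np_busdaycalendar, which converts a Deephaven business calendar to a NumPy busdaycalendar.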

The following code block converts a Deephaven table to a NumPy array and back:

Important

  • A NumPy array has a single data type. If the table's columns have different data types, the conversion will fail.
  • to_numpy copies an entire table into memory, so copy only the data you need.
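The single-data-type constraint can be seen in NumPy alone. The sketch below is a minimal illustration that uses a hypothetical dictionary of lists as a stand-in for table columns; it does not call any Deephaven function. Note that plain NumPy coerces mixed values to a common type rather than raising an error, so it only illustrates why mixed-type columns are a problem, not how the table conversion itself behaves.

```python
import numpy as np

# Hypothetical stand-in for table columns (not a Deephaven table).
columns = {
    "X": [1, 2, 3],
    "Y": [0.5, 1.5, 2.5],
    "Label": ["a", "b", "c"],
}

# Two numeric columns share a common type: the integers are upcast to float64.
numeric = np.array([columns["X"], columns["Y"]]).T

# Mixing numbers with strings turns every element into a string.
mixed = np.array([columns["X"], columns["Label"]]).T

# Copying only the column that is needed keeps the array small.
only_x = np.array(columns["X"], dtype=np.int64)

print(numeric.dtype, numeric.shape)
print(mixed.dtype.kind)
print(only_x.nbytes, numeric.nbytes)
```

Selecting columns of one type before converting avoids the coercion problem and limits how much data is copied into memory.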

The following code block imports one of Deephaven's example business calendars and converts it to a NumPy busdaycalendar. For more on calendars in Deephaven, see business calendars.
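For readers unfamiliar with the target type, the sketch below shows what a NumPy busdaycalendar offers once it exists. It builds the calendar by hand with a Monday-to-Friday week mask and a hypothetical holiday list chosen for illustration; it does not use any of Deephaven's example calendars.

```python
import numpy as np

# Hypothetical holidays for illustration only; not taken from a Deephaven calendar.
holidays = ["2024-01-01", "2024-07-04", "2024-12-25"]

# Monday through Friday are business days; Saturday and Sunday are not.
cal = np.busdaycalendar(weekmask="1111100", holidays=holidays)

# A holiday is not a business day, but the following Friday is.
july_4_open = bool(np.is_busday(np.datetime64("2024-07-04"), busdaycal=cal))
july_5_open = bool(np.is_busday(np.datetime64("2024-07-05"), busdaycal=cal))

# Business days in the half-open range [2024-07-01, 2024-07-08).
n_days = int(
    np.busday_count(np.datetime64("2024-07-01"), np.datetime64("2024-07-08"), busdaycal=cal)
)

print(july_4_open, july_5_open, n_days)
```

The same `busdaycal` argument is accepted by `np.busday_offset`, so a converted calendar can drive NumPy's other business-day functions as well.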

NumPy in query strings

NumPy can be used in query strings like any other Python module. Deephaven recommends wrapping NumPy calls in Python functions and calling those functions from the query string, rather than calling NumPy directly.

Important

  • Python functions called in query strings should use type hints so that the resulting columns have the correct data type.
  • Python functions are usually slower than equivalent built-in methods.

The following code block calculates the cube root of an array column with NumPy:
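A minimal sketch of such a wrapper function is shown here in plain Python, without the surrounding Deephaven query. The function name `cube_root` and the use of `np.asarray` to accept any array-like input are assumptions made for illustration; how Deephaven passes array column values into the function is not shown.

```python
import numpy as np
import numpy.typing as npt


def cube_root(values) -> npt.NDArray[np.float64]:
    """Return the element-wise cube root of an array-like input."""
    # np.asarray is an assumption: it lets the sketch accept lists or arrays.
    arr = np.asarray(values, dtype=np.float64)
    # np.cbrt handles negative inputs, unlike raising to the power 1/3.
    return np.cbrt(arr)


roots = cube_root([8.0, 27.0, -64.0])
print(roots)
```

The return type hint is what tells the caller which data type to expect, in line with the type hint recommendation above.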

The following code block calculates the great circle distance between pairs of coordinates on Earth with NumPy:
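One common way to compute a great circle distance is the haversine formula. The sketch below is a plain Python version under stated assumptions: the function name `great_circle_distance`, the degree-based inputs, and the mean Earth radius of 6371 km are all choices made for illustration, and the surrounding Deephaven query is omitted.

```python
import numpy as np

# Mean Earth radius in kilometers; an assumption for this sketch.
EARTH_RADIUS_KM = 6371.0


def great_circle_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, lam1, phi2, lam2 = np.radians([lat1, lon1, lat2, lon2])
    dphi = phi2 - phi1
    dlam = lam2 - lam1
    a = np.sin(dphi / 2.0) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2.0) ** 2
    # Clip guards against rounding pushing the value slightly outside [0, 1].
    central_angle = 2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return float(EARTH_RADIUS_KM * central_angle)


# Two antipodal points on the equator are half a circumference apart.
half_circumference = great_circle_distance(0.0, 0.0, 0.0, 180.0)
print(half_circumference)
```

The scalar signature with `float` type hints suits a function that is called once per row; a vectorized variant would take and return arrays instead.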

deephaven.learn

deephaven.learn is a submodule of the Deephaven Python package that provides functions to facilitate data transfer between Deephaven tables and Python objects, namely NumPy arrays. It is typically used for AI/ML applications, but can be used for any data science workflow. It follows a gather-compute-scatter paradigm, in which data is gathered from tables into NumPy arrays, computations are performed, and the results are scattered back into a table.

The code below uses deephaven.learn to calculate the sine of a set of values.

Three functions are defined:

  • compute_sin, which applies computations to gathered data.
  • table_to_numpy, which gathers rows and columns from a table into a NumPy array.
  • numpy_to_table, which scatters the results of the computations back into a table.

At the end, the learn function calls these functions on the source table.
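The gather-compute-scatter flow described above can be sketched in plain NumPy. This is not the deephaven.learn API: the source table is a hypothetical dictionary of arrays, the function signatures are simplified assumptions, and a simple loop stands in for the learn function. Only the three function names are taken from the description above.

```python
import numpy as np

# Hypothetical stand-in for a source table with one column, "X".
source = {"X": np.linspace(0.0, 2.0 * np.pi, 5)}


def table_to_numpy(table, cols):
    """Gather: stack the requested columns into a 2-D array (rows x columns)."""
    return np.column_stack([table[c] for c in cols])


def compute_sin(data):
    """Compute: apply the sine function to the gathered data."""
    return np.sin(data)


def numpy_to_table(data, idx):
    """Scatter: return the computed value that belongs to row idx."""
    return float(data[idx, 0])


# Stand-in for the learn call: gather, compute, then scatter row by row.
gathered = table_to_numpy(source, ["X"])
computed = compute_sin(gathered)
result = dict(source)
result["SinX"] = np.array([numpy_to_table(computed, i) for i in range(len(computed))])

print(result["SinX"])
```

Keeping the three steps in separate functions means the compute step can be swapped for a model prediction or any other NumPy computation without touching the gather and scatter logic.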

For a deeper dive on this subject, see the deephaven.learn guide.