Use NumPy in Deephaven queries

This guide covers the ways NumPy and Deephaven work together.

NumPy is an open-source Python library for numerical computing. It provides multi-dimensional arrays, a broad set of mathematical functions, and an API for calling functions written in C for faster performance. It is one of the most widely used Python libraries.

NumPy is one of Deephaven's Python dependencies, so it is installed along with Deephaven's Python API.

Intersection with Deephaven

Deephaven and NumPy intersect in three ways. Each of the following subsections covers one of them.

deephaven.numpy

deephaven.numpy is a submodule of the Deephaven Python package that provides three functions:

to_np_busdaycalendar: Converts a Deephaven BusinessCalendar to a NumPy busdaycalendar.
to_numpy: Converts a Deephaven table to a NumPy array.
to_table: Converts a NumPy array to a Deephaven table.

The following code block converts a Deephaven table to a NumPy array and back:

from deephaven import numpy as dhnp
from deephaven import empty_table

source = empty_table(10).update(
    ["X = randomDouble(0, 10)", "Y = randomDouble(100, 200)"]
)

np_source = dhnp.to_numpy(source)

print(np_source)

result = dhnp.to_table(np_source, cols=["X", "Y"])

Important

A NumPy array has a single data type, so to_numpy fails if the table's columns do not all share one data type.
to_numpy copies the entire table into memory, so convert only the data you need.
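
The single-type constraint and the memory cost can both be seen in plain NumPy. The sketch below makes no Deephaven calls, and the two arrays are hypothetical stand-ins for table columns. Note that plain NumPy silently coerces mixed inputs to one shared type, whereas the to_numpy conversion described above fails instead.

```python
import numpy as np

# Two hypothetical columns of different types, standing in for table columns.
ids = np.array([1, 2, 3], dtype=np.int32)
prices = np.array([9.5, 10.25, 11.0], dtype=np.float64)

# Stacking them into one 2D array forces a single shared dtype (float64 here),
# so the integer column is no longer stored as integers.
stacked = np.column_stack([ids, prices])
print(stacked.dtype)

# Every element occupies the same number of bytes, so the memory footprint
# grows with rows * columns: 3 rows * 2 columns * 8 bytes.
print(stacked.nbytes)
```

Selecting only the columns and rows you need before converting keeps both the type and the memory footprint under control.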

The following code block imports one of Deephaven's example business calendars and converts it to a NumPy busdaycalendar. For more on calendars in Deephaven, see business calendars.

from deephaven import calendar as dhcal
from deephaven import numpy as dhnp

usnyse_example = dhcal.calendar("USNYSE_EXAMPLE")
print(type(usnyse_example))

np_usnyse_example = dhnp.to_np_busdaycalendar(usnyse_example)
print(type(np_usnyse_example))
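
Once a calendar is a NumPy busdaycalendar, it can be passed to NumPy's business-day functions. The sketch below builds a small hypothetical calendar directly in NumPy (a Monday to Friday week with one holiday) rather than converting USNYSE_EXAMPLE, so the dates and holiday are illustrative only.

```python
import numpy as np

# Hypothetical calendar: Monday-Friday work week with a single holiday.
# A calendar converted from Deephaven would be passed via busdaycal the same way.
cal = np.busdaycalendar(weekmask="1111100", holidays=["2024-07-04"])

# Check whether individual dates are business days
# (a Wednesday, the holiday, and a Saturday).
is_open = np.is_busday(["2024-07-03", "2024-07-04", "2024-07-06"], busdaycal=cal)

# Count business days in the half-open range [2024-07-01, 2024-07-08).
n_days = np.busday_count("2024-07-01", "2024-07-08", busdaycal=cal)

print(is_open, n_days)
```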

NumPy in query strings

NumPy can be used in query strings like any other Python module. Deephaven recommends wrapping NumPy calls in user-defined Python functions and calling those functions from query strings.

Important

Python functions called in query strings should use type hints to ensure the resultant column(s) are of the correct data type.
Python functions are usually slower than equivalent built-in methods.

The following code block uses NumPy to calculate the cube root of each element in an array column:

from deephaven import empty_table
from typing import Sequence
import numpy as np


def np_cuberoot(arr) -> Sequence[float]:
    return np.cbrt(arr)


source = empty_table(4).update("X = pow(i + 1, 3)").group_by()

result = source.update("Y = np_cuberoot(X)").ungroup()
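
To check what the wrapped call returns, the same NumPy function can be run outside of a query. This is a plain NumPy sketch; the array is typed in by hand to match the values that the X column holds after group_by, namely (i + 1) cubed for i from 0 to 3.

```python
import numpy as np

# Values matching the grouped X column: (i + 1) ** 3 for i = 0..3.
x = np.array([1.0, 8.0, 27.0, 64.0])

# np.cbrt is applied elementwise and returns an array of the same shape,
# which is why the wrapper above is hinted as returning a sequence of floats.
y = np.cbrt(x)
print(y)

# Unlike raising to the power 1/3, np.cbrt also handles negative inputs.
negative_root = np.cbrt(-8.0)
print(negative_root)
```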

The following code block uses NumPy to calculate the great-circle distance between pairs of coordinates on Earth with the haversine formula:

from deephaven.column import double_col
from deephaven import new_table
import numpy as np


def gcd_earth(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2

    # 6378.137 km is Earth's equatorial radius
    return 2 * 6378.137 * np.arcsin(np.sqrt(a))


source = new_table(
    [
        double_col("Lat1", [15.5, 45.2, 75.3]),
        double_col("Lon1", [156.4, -121.1, 90.0]),
        double_col("Lat2", [30.9, 62.8, 79.3]),
        double_col("Lon2", [-60.1, 26.6, -18.7]),
    ]
)

result = source.update("DistanceKM = gcd_earth(Lat1, Lon1, Lat2, Lon2)")
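
Because the formula is built entirely from NumPy functions, it can also be checked against cases with known answers and applied to whole arrays at once. The sketch below is plain NumPy; the name haversine_km and the sanity-check coordinates are illustrative choices, while the array values are copied from the table above.

```python
import numpy as np

EARTH_RADIUS_KM = 6378.137  # same radius used in gcd_earth above


def haversine_km(lat1, lon1, lat2, lon2):
    # Accepts scalars or equal-length arrays of coordinates in degrees.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2.0) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Sanity checks: identical points are 0 km apart, and two points on the
# equator 90 degrees apart span a quarter of the circumference.
same_point = haversine_km(10.0, 20.0, 10.0, 20.0)
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)

# Whole columns at once, using the coordinates from the table above.
distances = haversine_km(
    np.array([15.5, 45.2, 75.3]),
    np.array([156.4, -121.1, 90.0]),
    np.array([30.9, 62.8, 79.3]),
    np.array([-60.1, 26.6, -18.7]),
)
print(distances)
```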

deephaven.learn

deephaven.learn is a submodule of the Deephaven Python package that provides functions to facilitate data transfer between Deephaven tables and Python objects, namely NumPy arrays. It is typically used for AI/ML applications, but can be used for any data science workflow. It follows a gather-compute-scatter paradigm, in which data is gathered from tables into NumPy arrays, computations are performed, and the results are scattered back into a table.

The code below uses deephaven.learn to calculate the sine of a set of values.

Three functions are defined:

compute_sin, which applies computations to gathered data.
table_to_numpy, which gathers rows and columns from a table into a NumPy array.
numpy_to_table, which scatters the results of the computations back into a table.

At the end, the learn function calls these functions on the source table.

from deephaven import empty_table
from deephaven.learn import gather
from deephaven import learn
import numpy as np

source = empty_table(101).update(formulas=["X = (i / 101) * 2 * Math.PI"])


# Calculate the sine of a value or sequence of values
def compute_sin(x):
    return np.sin(x)


# Convert table data to a 2d NumPy array
def table_to_numpy(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)


# Return the model's answer so that it can be scattered back into a table
def numpy_to_table(data, idx):
    return data[idx]


result = learn.learn(
    table=source,
    model_func=compute_sin,
    inputs=[learn.Input("X", table_to_numpy)],
    outputs=[learn.Output("SinX", numpy_to_table, "double")],
    batch_size=101,
)
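
The gather-compute-scatter flow can be imitated in plain NumPy to make the three stages concrete. This is only a conceptual sketch, not how deephaven.learn is implemented: the dictionary of columns is a hypothetical stand-in for a table, and no Deephaven calls are made.

```python
import numpy as np

# Hypothetical stand-in for a table: column name -> 1D array of values.
columns = {"X": np.linspace(0.0, 2.0 * np.pi, 5)}

# Gather: copy the selected columns into a 2D array (rows x columns).
gathered = np.column_stack([columns["X"]])

# Compute: run the model function on the gathered batch.
computed = np.sin(gathered)

# Scatter: write one result per row back into a new column.
sin_x = np.empty(len(gathered))
for idx in range(len(gathered)):
    sin_x[idx] = computed[idx, 0]
columns["SinX"] = sin_x

print(columns["SinX"])
```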

For a deeper dive on this subject, see the deephaven.learn guide.

Related documentation

`deephaven.numpy`