
How to use NumPy in Deephaven queries

This guide will show you how to use NumPy on its own and in Deephaven Python queries.

NumPy is an open-source Python module that includes a library of powerful numerical capabilities. These capabilities include support for multi-dimensional data structures, mathematical functions, and an API that enables calls to functions written in C for faster performance. It is one of the most popular and widely used Python modules currently available.

NumPy is a part of Deephaven's base Docker image. Most of the code in this guide will assume that the module has been imported using the following import statement.

import numpy as np

The N-dimensional array

The foundation upon which NumPy is built is its N-dimensional array, also called an ndarray. This data structure is similar to that of a Python list, with the most notable exception being that every value in the ndarray must be of the same type. For instance, a Python list can be created that contains both numbers and characters, but that is not possible with an ndarray.
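
As a minimal sketch of the single-type rule (the variable names are illustrative and not part of the original guide), NumPy converts mixed inputs to one common type rather than storing different types side by side:

```python
import numpy as np

# A Python list can mix integers and floats freely
mixed_list = [1, 2.5, 3]

# NumPy upcasts the integers so that every element shares one dtype
float_array = np.array(mixed_list)
print(float_array.dtype)  # float64

# Mixing numbers and characters yields an array of strings
string_array = np.array([1, "a"])
print(string_array.dtype.kind)  # U (Unicode string)
```

In the second case the number 1 is stored as the string "1", so the numeric value is no longer available for arithmetic.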

Array creation

Creating an ndarray is simple. The code below creates a one-dimensional array (a row vector) with three elements. Then, both the array itself and its type are printed.

import numpy as np
new_array = np.array([1, 2, 3])
print(new_array)
print(type(new_array))

Multi-dimensional arrays are created in a similar fashion. The code below creates two- and three-dimensional arrays.

import numpy as np
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d)

array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(array_3d)

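Arrays can also be created with a variety of functions. Several of them, such as empty and zeros, take the dimensions of the array as a list or tuple along with an optional data type. Others, such as arange and random.rand, take a range or the dimensions as separate arguments.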

import numpy as np
# Create an empty array with 2 rows and 4 columns
empty_array_2d = np.empty([2, 4])

# Create an array of zeros (floats) with 3 rows and 2 columns
zeros_array_2d = np.zeros([3, 2], dtype=float)

# Create an array of values from 0 to 9 and reshape it to be 2d
count_array_2d = np.arange(10).reshape(5, 2)

# Create an array of complex number values from 0 to 10 in steps of 2
count_array = np.arange(0, 11, 2, dtype=np.complex64)

# Create a 3d array of random numbers
random_array_3d = np.random.rand(2, 2, 2)

Note: The above code does not cover every array creation method. For a comprehensive list, see Array creation routines in NumPy or our guide on How to work with arrays.

There are a large number of supported data types (dtype). For the full list, see NumPy's documentation.
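
The following is a small sketch, under the assumption that compact storage matters for your data, of how a dtype can be requested at creation time and changed afterwards with astype. The variable names are illustrative.

```python
import numpy as np

# Request a compact integer type when the array is created
small_ints = np.array([1, 2, 3], dtype=np.int8)
print(small_ints.dtype, small_ints.itemsize)  # int8, 1 byte per element

# astype returns a converted copy; the original array is unchanged
as_floats = small_ints.astype(np.float32)
print(as_floats.dtype, as_floats.itemsize)  # float32, 4 bytes per element
print(small_ints.dtype)  # still int8
```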

Array attributes

The ndarray object has many attributes that can be checked. Inspecting them is a quick way to confirm that an array has the shape, size, and data type you expect before using it in further calculations.

The code below creates a two-dimensional array. Then, it prints the following array attributes:

| Attribute | Description | Meaning |
| --- | --- | --- |
| shape | The size of each array dimension | How many rows, columns, pages, etc. |
| ndim | The number of array dimensions | The length of the shape |
| size | The number of array elements | The product of the shape |
| nbytes | The number of bytes of memory consumed by the array elements | The product of the shape times the number of bytes per element |
| dtype | The data type of the elements in the array | Gives information on the memory footprint and data limitations |

import numpy as np
new_array = np.array([[1, 2, 3], [4, 5, 6]])
# Print the shape of new_array
print(new_array.shape)
# Print the number of dimensions of new_array
print(new_array.ndim)
# Print the size of the array
print(new_array.size)
# Print the number of bytes of memory consumed by the array's elements
print(new_array.nbytes)
# Print the data type of the array elements
print(new_array.dtype)

Note: The above table and code do not cover every array attribute.
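
The relationships in the "Meaning" column can be checked directly. This sketch also uses itemsize (the number of bytes per element), an ndarray attribute not listed in the table above.

```python
import numpy as np

new_array = np.array([[1, 2, 3], [4, 5, 6]])

# ndim is the length of the shape
print(new_array.ndim == len(new_array.shape))  # True

# size is the product of the shape
print(new_array.size == np.prod(new_array.shape))  # True

# nbytes is the element count times the bytes per element
print(new_array.nbytes == new_array.size * new_array.itemsize)  # True
```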

Array manipulation

Arrays can be manipulated in many ways. This subsection covers some of the most important ones. Its code blocks manipulate one or more of the following arrays.

import numpy as np
first_array = np.array([[9, 8, 7, 6, 5], [4, 3, 2, 1, 0]])
second_array = np.random.rand(5, 2)
row_one = np.array([1, 9, 2, 8, 3])
row_two = np.array([7, 4, 6, 5, 10])
col_one = np.array([[1], [2], [3], [4], [5]])
col_two = np.array([[6], [7], [8], [9], [10]])

The shape of an array can be changed without modifying the array's data using reshape.

reshaped_first_array = first_array.reshape(5, 2)
print(reshaped_first_array)
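
As a short sketch of two reshape details not covered above: passing -1 lets NumPy infer one dimension, and the new shape must hold exactly the same number of elements. For a contiguous array such as this one, the result is a view that shares memory with the original.

```python
import numpy as np

first_array = np.array([[9, 8, 7, 6, 5], [4, 3, 2, 1, 0]])

# -1 asks NumPy to infer that dimension from the array's size
inferred = first_array.reshape(-1, 2)
print(inferred.shape)  # (5, 2)

# For a contiguous array, reshape returns a view on the same data
print(np.shares_memory(first_array, inferred))  # True

# The total number of elements must stay the same (10 != 3 * 4)
try:
    first_array.reshape(3, 4)
except ValueError as err:
    print("reshape failed:", err)
```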

An array can be flattened into one dimension using ravel.

flattened_first_array = np.ravel(first_array)
print(flattened_first_array)
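
One point worth knowing when flattening: ravel returns a view when it can, while the related flatten method always returns a copy. The sketch below illustrates the difference. flatten is not used elsewhere in this guide and is shown only for comparison.

```python
import numpy as np

first_array = np.array([[9, 8, 7, 6, 5], [4, 3, 2, 1, 0]])

raveled = np.ravel(first_array)      # a view for contiguous input
flattened = first_array.flatten()    # always a copy

print(np.shares_memory(first_array, raveled))    # True
print(np.shares_memory(first_array, flattened))  # False

# Writing through the view changes the original array
raveled[0] = 100
print(first_array[0, 0])  # 100
```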

An array can be transposed using either T or transpose.

transposed_first_array = first_array.T
transposed_col_one = np.transpose(col_one)
print(transposed_first_array)
print(transposed_col_one)
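
A common surprise, sketched below, is that transposing a one-dimensional array has no effect because there is only one axis to reorder. To get a column from a one-dimensional array, reshape it instead.

```python
import numpy as np

first_array = np.array([[9, 8, 7, 6, 5], [4, 3, 2, 1, 0]])
row_one = np.array([1, 9, 2, 8, 3])

# Transposing a 2D array swaps its two dimensions
print(first_array.T.shape)  # (5, 2)

# Transposing a 1D array leaves the shape unchanged
print(row_one.T.shape)  # (5,)

# Reshape to obtain an explicit column with 5 rows and 1 column
column = row_one.reshape(-1, 1)
print(column.shape)  # (5, 1)
```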

Arrays can be vertically stacked (on top of one another) using vstack or horizontally stacked (next to one another) using hstack.

stacked_rows = np.vstack((row_one, row_two))
stacked_cols = np.hstack((col_one, col_two))
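
To make the effect of stacking concrete, this sketch prints the resulting shapes. It also shows that hstack applied to one-dimensional arrays joins them end to end rather than placing them side by side as columns.

```python
import numpy as np

row_one = np.array([1, 9, 2, 8, 3])
row_two = np.array([7, 4, 6, 5, 10])
col_one = np.array([[1], [2], [3], [4], [5]])
col_two = np.array([[6], [7], [8], [9], [10]])

# Two 1D arrays of length 5 become a 2x5 array
stacked_rows = np.vstack((row_one, row_two))
print(stacked_rows.shape)  # (2, 5)

# Two 5x1 columns become a 5x2 array
stacked_cols = np.hstack((col_one, col_two))
print(stacked_cols.shape)  # (5, 2)

# hstack on 1D arrays concatenates them into one longer 1D array
joined_rows = np.hstack((row_one, row_two))
print(joined_rows.shape)  # (10,)
```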

Arrays can have dimensions with size 1 removed by using squeeze.

print(col_one)
print(col_one.shape)
squeezed_col_one = np.squeeze(col_one)
print(squeezed_col_one)
print(squeezed_col_one.shape)

Similarly, the dimensions of an array can be expanded using expand_dims.

print(row_one)
print(row_one.shape)
expanded_row_one = np.expand_dims(row_one, axis=0)
print(expanded_row_one)
print(expanded_row_one.shape)
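
The axis argument controls where the new dimension goes. As a brief sketch, the code below adds the dimension at both positions, shows the equivalent np.newaxis indexing form, and confirms that squeeze undoes the expansion.

```python
import numpy as np

row_one = np.array([1, 9, 2, 8, 3])

# axis=0 adds a leading dimension; axis=1 adds a trailing one
as_row = np.expand_dims(row_one, axis=0)
as_col = np.expand_dims(row_one, axis=1)
print(as_row.shape, as_col.shape)  # (1, 5) (5, 1)

# Indexing with np.newaxis is an equivalent way to add a dimension
same_col = row_one[:, np.newaxis]
print(np.array_equal(as_col, same_col))  # True

# squeeze removes the added dimension again
restored = np.squeeze(as_col)
print(np.array_equal(restored, row_one))  # True
```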

Array operations

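Many linear algebra operations on ndarrays are provided by the numpy.linalg module. Others, such as indexing and element-wise math, are available directly in NumPy.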

The code in this section will use the following two-dimensional arrays.

import numpy as np
array_one = np.array([[1, 0, 1], [0, 2, 0], [-1, 3, 0]])
array_two = np.array([[1, 0, 0], [1, 2, 1], [0, 0, 1]])

Rows and columns of a two-dimensional array can be accessed with slicing.

third_row_of_array_one = array_one[2, :]
print(third_row_of_array_one)
first_col_of_array_two = array_two[:, 0]
print(first_col_of_array_two)
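
Slices like the ones above are views, not copies, so writing to a slice changes the original array. The sketch below shows this and uses copy to get an independent row. The variable names are illustrative.

```python
import numpy as np

array_one = np.array([[1, 0, 1], [0, 2, 0], [-1, 3, 0]])

# A slice is a view onto the original data
row_view = array_one[2, :]
row_view[0] = 99
print(array_one[2, 0])  # 99

# copy() creates independent data that can be modified safely
row_copy = array_one[2, :].copy()
row_copy[0] = -1
print(array_one[2, 0])  # still 99
```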

Compute the dot product of two vectors using the @ operator or np.dot.

# First way to do it
dot_product_1 = third_row_of_array_one @ first_col_of_array_two
# Second way to do it
dot_product_2 = np.dot(third_row_of_array_one, first_col_of_array_two)
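
As a worked check, the dot product of the third row of array one, [-1, 3, 0], and the first column of array two, [1, 1, 0], is (-1)(1) + (3)(1) + (0)(0) = 2. The sketch below confirms this, and also shows that @ on two-dimensional arrays performs matrix multiplication, which differs from the element-wise * operator.

```python
import numpy as np

array_one = np.array([[1, 0, 1], [0, 2, 0], [-1, 3, 0]])
array_two = np.array([[1, 0, 0], [1, 2, 1], [0, 0, 1]])

row = array_one[2, :]  # [-1, 3, 0]
col = array_two[:, 0]  # [1, 1, 0]

# Sum of pairwise products, computed by hand
manual = sum(r * c for r, c in zip(row, col))
print(manual, row @ col)  # 2 2

# On 2D arrays, @ is matrix multiplication and * is element-wise
matrix_product = array_one @ array_two
element_product = array_one * array_two
print(np.array_equal(matrix_product, element_product))  # False
```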

Compute the eigenvalues and eigenvectors of a square array using np.linalg.eig.

w1, v1 = np.linalg.eig(array_one)
w2, v2 = np.linalg.eig(array_two)
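
np.linalg.eig returns the eigenvalues as a one-dimensional array and the eigenvectors as the columns of a two-dimensional array. The sketch below uses a small symmetric matrix (chosen for illustration, not taken from the guide) to check that each pair satisfies A v = w v.

```python
import numpy as np

# An illustrative 2x2 symmetric matrix with eigenvalues 1 and 3
symmetric = np.array([[2.0, 1.0], [1.0, 2.0]])
w, v = np.linalg.eig(symmetric)

# Column v[:, i] is the eigenvector paired with eigenvalue w[i]
for i in range(len(w)):
    print(np.allclose(symmetric @ v[:, i], w[i] * v[:, i]))  # True

# The order of the returned eigenvalues is not guaranteed, so sort them
print(np.sort(w))
```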

The array norm can be computed using np.linalg.norm. The type of norm can be specified with a second argument. Otherwise, the default is the Frobenius norm for two-dimensional arrays and the 2-norm for one-dimensional arrays.

# The Frobenius norm of array one
frobenius_norm1 = np.linalg.norm(array_one)
# The infinity norm of array two
inf_norm2 = np.linalg.norm(array_two, np.inf)
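
Both norms can be reproduced by hand, which helps clarify what they measure. The Frobenius norm is the square root of the sum of squared elements, and the infinity norm of a matrix is the largest row sum of absolute values. The sketch also computes the spectral 2-norm to show that it is a different quantity from the Frobenius norm.

```python
import numpy as np

array_one = np.array([[1, 0, 1], [0, 2, 0], [-1, 3, 0]])
array_two = np.array([[1, 0, 0], [1, 2, 1], [0, 0, 1]])

# Frobenius norm: sqrt(1 + 1 + 4 + 1 + 9) = sqrt(16) = 4
frobenius_by_hand = np.sqrt(np.sum(array_one ** 2))
print(np.isclose(frobenius_by_hand, np.linalg.norm(array_one)))  # True

# Infinity norm: the row sums of |array_two| are 1, 4, 1, so the norm is 4
inf_by_hand = np.max(np.sum(np.abs(array_two), axis=1))
print(np.isclose(inf_by_hand, np.linalg.norm(array_two, np.inf)))  # True

# The spectral 2-norm (largest singular value) is requested explicitly
spectral = np.linalg.norm(array_one, 2)
print(spectral < frobenius_by_hand)  # True for this matrix
```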

Solving a system of equations can be done by using np.linalg.solve. Specify a matrix and vector of the appropriate size to find its solution.

solution = np.linalg.solve(array_one, first_col_of_array_two)

print(np.allclose(np.dot(array_one, solution), first_col_of_array_two))
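
np.linalg.solve requires a square, non-singular matrix. As a minimal sketch with an illustrative singular matrix (its second row is twice its first), the call raises np.linalg.LinAlgError, which can be caught and handled.

```python
import numpy as np

# Illustrative singular matrix: the rows are linearly dependent
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
rhs = np.array([1.0, 2.0])

solve_failed = False
try:
    np.linalg.solve(singular, rhs)
except np.linalg.LinAlgError as err:
    solve_failed = True
    print("solve failed:", err)
```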

Basic math operations can be applied to arrays. The code below applies sums, differences, products, exponentials, and element-wise operations to both array one and array two.

# Column product and row sum of array one
col_prod_array_one = np.prod(array_one, axis=0)
row_sum_array_one = np.sum(array_one, axis=1)

# Difference between each element of the third row of array one
third_row_diff = np.diff(third_row_of_array_one)

# The exponential (e^x) of each element of array two
exp_array_two = np.exp(array_two)

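# Element-wise sum, product, and division of the two arrays
# (array_two contains zeros, so the division emits a RuntimeWarning
# and produces inf or nan in those positions)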
element_wise_sum = np.add(array_one, array_two)
element_wise_prod = np.multiply(array_one, array_two)
element_wise_div = np.divide(array_one, array_two)
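
Because array two contains zeros, a plain element-wise division produces inf or nan in those positions. One way to avoid this, sketched below, is to use the where and out arguments of np.divide so that division only happens where the divisor is nonzero. Filling the remaining positions with 0 is an assumption made for this example; choose whatever fill value suits your data.

```python
import numpy as np

array_one = np.array([[1, 0, 1], [0, 2, 0], [-1, 3, 0]])
array_two = np.array([[1, 0, 0], [1, 2, 1], [0, 0, 1]])

# Divide only where the divisor is nonzero; other positions keep
# the value already present in `out` (0.0 here)
safe_div = np.divide(
    array_one,
    array_two,
    out=np.zeros(array_one.shape, dtype=float),
    where=array_two != 0,
)
print(safe_div)
```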

Math functions

NumPy provides a large library of math functions. The full list can be found in the mathematical functions section of NumPy's documentation. The code below uses several of these functions.

import numpy as np
x = np.linspace(0, 2 * np.pi, 101)

# Trigonometric functions and conversions
sin_x, cos_x, tan_x = np.sin(x), np.cos(x), np.tan(x)
x_deg = np.degrees(x)
x_rad = np.radians(x_deg)

# Rounding
x_rounded_whole = np.around(x)
x_rounded_2dec = np.around(x, 2)
x_rounded_down = np.floor(x)
x_rounded_up = np.ceil(x)
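
The rounding functions differ in how they treat halves and negative numbers. This short sketch uses a few illustrative values: np.around rounds halves to the nearest even value, np.floor rounds toward negative infinity, and np.ceil rounds toward positive infinity. It also checks that degrees and radians are inverse conversions.

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, -1.5])

# Halves round to the nearest even value: 0, 2, 2, -2
print(np.around(values))

# floor rounds down: 0, 1, 2, -2
print(np.floor(values))

# ceil rounds up: 1, 2, 3, -1
print(np.ceil(values))

# Converting to degrees and back recovers the original values
x = np.linspace(0, 2 * np.pi, 101)
print(np.allclose(np.radians(np.degrees(x)), x))  # True
```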

Integration with Deephaven tables

Deephaven Python queries can utilize NumPy. There are a few ways to do so.

Use in query strings

NumPy can be used in query strings.

from deephaven import empty_table
import numpy as np

def use_numpy(x):
    return np.exp(x)

source = empty_table(5).update(formulas=["X = i"])
result = source.update(formulas=["ExpX = use_numpy(X)"])

Tables to and from DataFrames

Deephaven provides two functions in its pandas module: pandas.to_pandas and pandas.to_table. These functions convert table data to and from Pandas DataFrames. Pandas DataFrames also have an attribute called values that returns the data as an ndarray.

The code below converts a table to a NumPy array, and then back to another table by using NumPy and Pandas.

from deephaven import empty_table, pandas
import pandas as pd
import numpy as np

source = empty_table(5).update(formulas=["X = i"])

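# Convert the table to a DataFrame
converted_table = pandas.to_pandas(source)

# Extract the DataFrame's data as a NumPy array
np_array = converted_table.values

# Wrap the array in a new DataFrame and convert it back to a table
result = pandas.to_table(pd.DataFrame(np_array, columns=["X"]))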

deephaven.learn

The learn function facilitates the transfer of data between Deephaven tables and Python objects such as NumPy arrays.

The code below uses learn to see the results of calculations made using NumPy in a table.

We define three functions:

  • One that applies calculations to input data.
  • One that gathers Deephaven table data into a NumPy array.
  • A third that scatters the results of calculations back into a table.
#not yet implemented for v2

from deephaven import empty_table
from deephaven.learn import gather
from deephaven import learn
import numpy as np

source = empty_table(101).update(formulas=["X = (i / 101) * 2 * Math.PI"])


# Apply the calculation to the gathered input data
def compute_sin(x):
    return np.sin(x)


# Gather table data into a 2D NumPy array of doubles
def table_to_numpy(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)


# Scatter the computed value for row idx back into the table
def numpy_to_table(data, idx):
    return data[idx]

result = learn.learn(
    table=source,
    model_func=compute_sin,
    inputs=[learn.Input("X", table_to_numpy)],
    outputs=[learn.Output("SinX", numpy_to_table)],
    batch_size=101
)