Use SciPy

This guide will show you how to use SciPy in your Python queries in Deephaven.

Refer to our guide How to install Python packages for instructions on installing this package in Deephaven.

SciPy is an open-source scientific computing library for Python. It contains a large assortment of sub-packages with methods that can perform tasks in the following areas:

  • Optimization
  • Linear algebra
  • Calculus
  • Interpolation
  • Transforms
  • Signal processing
  • Differential equations
  • Multi-dimensional array processing

The above list is not comprehensive; SciPy's many sub-packages cover an even broader range of topics (see the Appendix at the bottom of this guide for a brief description of each). SciPy was built to operate on NumPy arrays, which we'll be using extensively in this guide. Check out How to use NumPy to learn more about how they work.

Examples

The examples below demonstrate using SciPy in a traditional Python setting, then extend them to use Deephaven. SciPy is a natural pair for deephaven.learn, so some of the corresponding Python queries with Deephaven use it. For more information on the learn submodule, check out How to use deephaven.learn.

Describe data

In this example, we'll use scipy.stats.describe to give basic insight into some data. The data we'll be describing is an ndarray with integer values from 0 to 99.

from scipy.stats import describe
import numpy as np

source = np.arange(100)
result = describe(source)
print(result)
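
The object returned by describe is a named tuple, so each statistic can be read by attribute. The following is a minimal sketch of that; the variable name summary is illustrative and not part of the example above.

```python
from scipy.stats import describe
import numpy as np

source = np.arange(100)
summary = describe(source)

# Fields of the returned named tuple
print(summary.nobs)      # number of observations
print(summary.minmax)    # (minimum, maximum)
print(summary.mean)      # arithmetic mean
print(summary.variance)  # sample variance (ddof=1 by default)
```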

To get these insights on data in a Deephaven table, we'll convert it to a NumPy array by using pandas.to_pandas and values. We can then use the method just as before.

from deephaven import empty_table, pandas
from scipy.stats import describe
import pandas as pd
import numpy as np

source = empty_table(100).update(formulas=["X = i"])
result = describe(np.squeeze(pandas.to_pandas(source).values))
print(result)

Compute the Airy function

In this example, we'll use scipy.special.airy to compute the Airy function of some values. In this case, we'll calculate the Airy function on values ranging from -20 to 0 in increments of 0.2.

from scipy.special import airy
import numpy as np

x = np.linspace(-20, 0, 101)
ai, aip, bi, bip = airy(x)
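
The four arrays returned by airy are Ai, its derivative Ai', Bi, and its derivative Bi'. As a rough sanity check (a sketch, not part of the original example), the Wronskian Ai·Bi' − Ai'·Bi should equal 1/π at every point:

```python
from scipy.special import airy
import numpy as np

x = np.linspace(-20, 0, 101)
ai, aip, bi, bip = airy(x)

# Wronskian of the two Airy functions; mathematically constant and equal to 1/pi
wronskian = ai * bip - aip * bi
print(np.allclose(wronskian, 1 / np.pi))
```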

Let's extend this code to use Deephaven tables!

from deephaven import empty_table
from scipy.special import airy
import numpy as np

source = empty_table(101).update(formulas=["x = -20 + 20 * i / 100"])


def compute_airy(x):
    ai, aip, bi, bip = airy(x)
    return ai, aip, bi, bip


result = source.update(formulas=["y = compute_airy(x)"])

Filter a noisy signal

In this example, we'll generate a sine wave and add noise. Then, we'll apply filters to reduce the noise.

from scipy.signal import medfilt, savgol_filter
import numpy as np

x = np.linspace(0, 2 * np.pi, 101)

noisy_sine_wave = np.sin(x) + np.random.normal(0, 0.25, 101)

median_filtered = medfilt(noisy_sine_wave)
savgol_filtered = savgol_filter(noisy_sine_wave, 27, 3)
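
To see how much each filter helps, one option is to compare every signal against the clean sine wave. The sketch below does this with a root-mean-square error; the fixed random seed and the helper name rmse are assumptions added for repeatability and are not part of the example above.

```python
from scipy.signal import medfilt, savgol_filter
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption) so runs are repeatable

x = np.linspace(0, 2 * np.pi, 101)
clean = np.sin(x)
noisy = clean + rng.normal(0, 0.25, 101)


def rmse(estimate):
    # Root-mean-square error relative to the noise-free sine wave
    return np.sqrt(np.mean((estimate - clean) ** 2))


noisy_rmse = rmse(noisy)
median_rmse = rmse(medfilt(noisy))  # default kernel size of 3
savgol_rmse = rmse(savgol_filter(noisy, 27, 3))  # window of 27 samples, cubic fit

print(noisy_rmse, median_rmse, savgol_rmse)
```

A lower value means the filtered signal sits closer to the underlying sine wave.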

The above example can be extended to use deephaven.learn in a couple of ways.

from deephaven.plot.figure import Figure
from deephaven.learn import gather
from deephaven import empty_table
from deephaven import learn
from scipy.signal import medfilt, savgol_filter
import numpy as np


def create_noisy_sine_wave(x) -> np.double:
    return np.sin(x) + np.random.normal(0, 0.25)


source = empty_table(101).update(
    formulas=[
        "X = i * (2 * Math.PI) / 100",
        "Noisy_Sine_Wave = create_noisy_sine_wave(X)",
    ]
)


def apply_median_filter(signal):
    return medfilt(signal)


def apply_savgol_filter(signal):
    return savgol_filter(signal, 27, 3)


def table_to_numpy(rows, cols):
    return np.squeeze(gather.table_to_numpy_2d(rows, cols, np_type=np.double))


def numpy_to_table(data, idx):
    return data[idx]


median_filtered = learn.learn(
    table=source,
    model_func=apply_median_filter,
    inputs=[learn.Input("Noisy_Sine_Wave", table_to_numpy)],
    outputs=[learn.Output("Median_Filtered", numpy_to_table, "double")],
    batch_size=source.size,
)

savitzky_golay_filtered = learn.learn(
    table=source,
    model_func=apply_savgol_filter,
    inputs=[learn.Input("Noisy_Sine_Wave", table_to_numpy)],
    outputs=[learn.Output("Savitzky_Golay_Filtered", numpy_to_table, "double")],
    batch_size=source.size,
)

plt = (
    Figure()
    .plot_xy(series_name="Noisy Signal", t=source, x="X", y="Noisy_Sine_Wave")
    .plot_xy(
        series_name="Median Filtered", t=median_filtered, x="X", y="Median_Filtered"
    )
    .plot_xy(
        series_name="Savitzky-Golay Filtered",
        t=savitzky_golay_filtered,
        x="X",
        y="Savitzky_Golay_Filtered",
    )
    .show()
)

Compute the nearest neighbor of every point in a set

In this example, we'll use a K-d tree to compute the nearest neighbor of every point in a set. SciPy's implementation is called scipy.spatial.KDTree. We construct a two-dimensional data set with a few points, and we'll find the index of each point's nearest neighbor in the set.

from scipy.spatial import KDTree as kdtree
import numpy as np

x = np.random.randint(-10, 10, 10)
y = np.random.randint(-10, 10, 10)

data = np.vstack((x, y)).T

print(data)

tree = kdtree(data)

distances, indices = tree.query(data, k=2)

nearest_neighbor_indices = indices[:, 1]

print(nearest_neighbor_indices)
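
The query uses k=2 because each point's closest match in the tree is the point itself (at distance 0), so the true nearest neighbor is in the second column. A minimal sketch that cross-checks the tree against a brute-force search follows; it assumes continuous random coordinates and a fixed seed so that there are no tied distances.

```python
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (an assumption) for repeatability
data = rng.uniform(-10, 10, size=(10, 2))

distances, indices = KDTree(data).query(data, k=2)

# Brute force: all pairwise distances, ignoring each point's distance to itself
pairwise = cdist(data, data)
np.fill_diagonal(pairwise, np.inf)

print(np.array_equal(indices[:, 1], pairwise.argmin(axis=1)))
print(np.allclose(distances[:, 1], pairwise.min(axis=1)))
```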

Computing the nearest-neighbor distances and indices can also be done using deephaven.learn.

from deephaven.pandas import to_table
from deephaven.learn import gather
from deephaven import learn

from scipy.spatial import KDTree as kdtree
import pandas as pd
import numpy as np

indices = np.arange(10)
x = np.random.randint(-10, 10, 10)
y = np.random.randint(-10, 10, 10)

source = to_table(pd.DataFrame({"Index": indices, "X": x, "Y": y}))


def calculate_nearest_neighbor_indices(data):
    tree = kdtree(data)
    distances, indices = tree.query(data, k=2)
    return indices[:, 1]


def table_to_numpy(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.int_)


def numpy_to_table(data, idx):
    return data[idx]


result = learn.learn(
    table=source,
    model_func=calculate_nearest_neighbor_indices,
    inputs=[learn.Input(["X", "Y"], table_to_numpy)],
    outputs=[learn.Output("Nearest_Neighbor_Index", numpy_to_table, "int")],
    batch_size=source.size,
)

Compute distances of various metrics between two vectors

In this example, we create two three-dimensional vectors. We then compute the distance between the two vectors using metrics supported by scipy.spatial.distance.

from scipy.spatial import distance
import numpy as np

x = np.array([1, 3, -1])
y = np.array([-2, 0, 4])


def print_distances(x, y):
    print("x: " + str(x))
    print("y: " + str(y))
    print("Bray-Curtis distance between x and y: " + str(distance.braycurtis(x, y)))
    print("Chebyshev distance between x and y: " + str(distance.chebyshev(x, y)))
    print("Cosine distance between x and y: " + str(distance.cosine(x, y)))
    print("Euclidean distance between x and y: " + str(distance.euclidean(x, y)))
    print("City block distance between x and y: " + str(distance.cityblock(x, y)))


print_distances(x, y)
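
To make the metrics concrete, the sketch below recomputes each one from its textbook definition with plain NumPy and compares it against the SciPy result. The dictionary name manual is illustrative only.

```python
from scipy.spatial import distance
import numpy as np

x = np.array([1, 3, -1])
y = np.array([-2, 0, 4])

diff = np.abs(x - y)  # element-wise absolute differences: [3, 3, 5]

manual = {
    "braycurtis": diff.sum() / np.abs(x + y).sum(),  # 11 / 7
    "chebyshev": diff.max(),  # largest difference: 5
    "cosine": 1 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)),
    "euclidean": np.sqrt((diff**2).sum()),  # sqrt(43)
    "cityblock": diff.sum(),  # sum of differences: 11
}

# Compare each hand-computed value with the matching scipy.spatial.distance function
for name, value in manual.items():
    print(name, value, np.isclose(value, getattr(distance, name)(x, y)))
```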

We can once again use deephaven.learn to calculate these distances.

from deephaven import new_table, learn
from deephaven.column import int_col
from deephaven.learn import gather

from scipy.spatial import distance
import numpy as np

source = new_table([int_col("X", [1, 3, -1]), int_col("Y", [-2, 0, 4])])


def table_to_numpy(rows, cols):
    return np.squeeze(gather.table_to_numpy_2d(rows, cols, np_type=np.intc))


def numpy_to_table(data, idx):
    return data[idx]


def print_distances(data):
    x = data[:, 0]
    y = data[:, 1]
    print("x: " + str(x))
    print("y: " + str(y))
    print("Bray-Curtis distance between x and y: " + str(distance.braycurtis(x, y)))
    print("Chebyshev distance between x and y: " + str(distance.chebyshev(x, y)))
    print("Cosine distance between x and y: " + str(distance.cosine(x, y)))
    print("Euclidean distance between x and y: " + str(distance.euclidean(x, y)))
    print("City block distance between x and y: " + str(distance.cityblock(x, y)))


learn.learn(
    table=source,
    model_func=print_distances,
    inputs=[learn.Input(["X", "Y"], table_to_numpy)],
    outputs=None,
    batch_size=source.size,
)

Appendix: Sub-packages of SciPy

Here is a comprehensive list of the sub-packages available in SciPy, and a brief description of what each one does:

| Sub-package name | Description |
| --- | --- |
| cluster | Clustering routines |
| constants | A collection of constants and conversion utilities |
| fft | Fourier transform methods |
| fftpack | Deprecated legacy Fourier transform methods |
| integrate | Numerical integration algorithms |
| interpolate | Tools for data interpolation |
| io | Data I/O |
| linalg | Linear algebra routines |
| misc | Extra miscellaneous utilities |
| ndimage | Image processing library |
| odr | Orthogonal distance regression library |
| optimize | Optimization and root-finding algorithms |
| signal | Signal processing routines |
| sparse | Tools for sparse matrix creation, handling, and processing |
| spatial | Spatial structure processing algorithms |
| special | Special functions |
| stats | Statistical function library |
| weave | Deprecated legacy C/C++ code writing tool |

Each is described below. We skip fftpack and weave because they are deprecated.

scipy.cluster

scipy.cluster is split into two further sub-modules:

  • scipy.cluster.vq supports vector quantization and K-means algorithms.
  • scipy.cluster.hierarchy provides hierarchical and agglomerative clustering.
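
As a minimal sketch of hierarchical clustering, the snippet below groups six made-up points into two flat clusters. The data, the single-linkage method, and the cluster count are illustrative assumptions.

```python
from scipy.cluster.hierarchy import fcluster, linkage
import numpy as np

# Two well-separated groups of 2-D points (made-up data)
points = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
)

# Build the merge tree, then cut it into at most two flat clusters
merge_tree = linkage(points, method="single")
labels = fcluster(merge_tree, t=2, criterion="maxclust")

print(labels)
```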

scipy.constants

scipy.constants provides an extensive number of mathematical and physical constants including (but not limited to) pi, the golden ratio, the Planck constant, subatomic particle masses, SI prefixes, binary prefixes, and many others. The full list of constants is far, far too long to include here.
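
A short sketch of how a few of these constants are accessed follows. Physical constants are given in SI units.

```python
from scipy import constants

print(constants.pi)  # pi
print(constants.golden)  # golden ratio
print(constants.h)  # Planck constant, in J s
print(constants.m_e)  # electron mass, in kg
print(constants.kilo)  # SI prefix: 1e3
print(constants.kibi)  # binary prefix: 2**10
```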

scipy.fft

scipy.fft contains methods that apply the forward and inverse discrete Fourier, sine, cosine, and Hankel transforms to input data in N dimensions.
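
As a sketch of a forward and inverse transform, the snippet below finds the dominant frequency of a 5 Hz sine wave and then reconstructs the signal. The sample rate and signal are assumptions chosen for illustration.

```python
from scipy.fft import irfft, rfft, rfftfreq
import numpy as np

sample_rate = 100  # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)  # 5 Hz sine wave, one second long

spectrum = rfft(signal)  # forward transform of real-valued input
freqs = rfftfreq(signal.size, d=1 / sample_rate)

peak_freq = freqs[np.argmax(np.abs(spectrum))]  # frequency with the largest magnitude
recovered = irfft(spectrum, n=signal.size)  # inverse transform

print(peak_freq)
print(np.allclose(recovered, signal))
```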

scipy.integrate

scipy.integrate contains a slew of methods to calculate integrals and solutions for differential equations.
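
A minimal sketch of both uses follows: a definite integral with quad and an initial value problem with solve_ivp. The integrand, the differential equation, and the tolerances are illustrative choices with known exact answers.

```python
from scipy.integrate import quad, solve_ivp
import numpy as np

# Definite integral of sin(x) from 0 to pi (exact value: 2)
area, abs_error = quad(np.sin, 0, np.pi)

# Initial value problem dy/dt = -y with y(0) = 1 (exact solution: exp(-t))
solution = solve_ivp(lambda t, y: -y, t_span=(0, 1), y0=[1.0], rtol=1e-8, atol=1e-10)
y_end = solution.y[0, -1]  # value of y at t = 1

print(area, y_end)
```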

scipy.interpolate

scipy.interpolate is a sub-package for objects used in data interpolation. It provides univariate and multivariate interpolation techniques, and spline interpolators.
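
As a sketch of univariate spline interpolation, the snippet below fits a cubic spline to samples of a sine wave and evaluates it between the samples. The sample count and evaluation point are arbitrary.

```python
from scipy.interpolate import CubicSpline
import numpy as np

x_known = np.linspace(0, 2 * np.pi, 21)
y_known = np.sin(x_known)

spline = CubicSpline(x_known, y_known)  # passes through every known sample

x_new = 1.0  # a point that lies between two known samples
print(spline(x_new), np.sin(x_new))
```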

scipy.io

scipy.io provides utilities to read and write data to and from various file formats including MATLAB, Fortran, WAV, and others.
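
A minimal sketch of a MATLAB file round trip follows. The file name example.mat and the variable name matrix are hypothetical, and the file is written to a temporary directory that is removed afterwards.

```python
from scipy.io import loadmat, savemat
import numpy as np
import os
import tempfile

matrix = np.arange(6.0).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.mat")  # hypothetical file name
    savemat(path, {"matrix": matrix})  # write a dict of named arrays
    loaded = loadmat(path)  # read back a dict keyed by variable name

print(loaded["matrix"])
```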

scipy.linalg

scipy.linalg is a library of linear algebra functions. You can use this sub-module to solve eigenvalue problems, perform matrix decompositions, solve matrix equations, and construct special matrices among other things. Matrices constructed using this sub-package are of type numpy.ndarray.

This sub-module contains two further sub-packages for low-level routines: scipy.linalg.blas and scipy.linalg.lapack. These two sub-packages provide low-level functions from the Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) libraries. Cython implementations of these low-level libraries exist as well.

Additionally, it provides routines for interpolative matrix decomposition via scipy.linalg.interpolative.
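
A short sketch of three of these operations follows: solving a matrix equation, computing eigenvalues, and an LU decomposition. The 2x2 system is made up for illustration.

```python
from scipy import linalg
import numpy as np

a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(a, b)  # solves a @ x = b
eigenvalues = linalg.eigvalsh(a)  # a is symmetric, so eigvalsh applies
p, l, u = linalg.lu(a)  # LU decomposition with a = p @ l @ u

print(x)
print(type(u))  # results are plain numpy.ndarray objects
```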

scipy.misc

The miscellaneous routines sub-module, scipy.misc, is the home of some example data sets and two special derivative calculators.

scipy.ndimage

scipy.ndimage contains a library of methods to perform image processing. Methods include filters, interpolation routines, measurements, and morphology routines.
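
As a sketch of a filter and two measurements, the snippet below blurs a small synthetic image, counts its connected regions, and locates their centers. The 8x8 image is made up for illustration.

```python
from scipy import ndimage
import numpy as np

image = np.zeros((8, 8))
image[1:3, 1:3] = 1.0  # first blob
image[5:7, 4:7] = 1.0  # second blob

smoothed = ndimage.gaussian_filter(image, sigma=1)  # filter: Gaussian blur
labeled, num_features = ndimage.label(image)  # measurement: connected regions
centers = ndimage.center_of_mass(image, labeled, [1, 2])  # (row, column) per region

print(num_features)
print(centers)
```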

scipy.odr

scipy.odr contains routines to perform orthogonal distance regression (ODR) on input data. ODR extends least-squares fitting by accounting for errors in both the independent and dependent variables, which can produce more accurate fitted curves when both are measured with noise.

scipy.optimize

The scipy.optimize sub-module provides univariate and multivariate optimization routines. These optimization routines will minimize or maximize objective functions, and can handle constraints on said functions. Solvers for both linear and nonlinear problems are supported.

There are underlying C functions for four different root finders that can be accessed via Cython.
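
A minimal sketch of root finding and bounded minimization follows. The objective functions, bracket, starting point, and bounds are illustrative assumptions.

```python
from scipy.optimize import brentq, minimize
import numpy as np

# Root finding: solve x**2 - 2 = 0 on the bracket [0, 2]
root = brentq(lambda x: x**2 - 2, 0, 2)

# Minimize (x - 1)^2 + (y + 2)^2 subject to x >= 0 and y >= 0.
# The unconstrained minimum is at (1, -2); the bounds move it to (1, 0).
result = minimize(
    lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
    x0=[0.5, 0.5],
    bounds=[(0, None), (0, None)],
)

print(root, result.x)
```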

scipy.signal

scipy.signal is SciPy's signal processing sub-module. It supports convolutions, B-splines, filters, filter design routines, continuous-time linear systems, discrete-time linear systems, waveform generators, window functions, wavelet generators, maxima/minima finders, and spectral analyzers.
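
As a sketch of the maxima/minima finders, the snippet below locates the peaks and troughs of two periods of a sine wave. Minima are found by negating the signal.

```python
from scipy.signal import find_peaks
import numpy as np

t = np.linspace(0, 4 * np.pi, 401)
wave = np.sin(t)  # two full periods

peaks, _ = find_peaks(wave)  # indices of local maxima
troughs, _ = find_peaks(-wave)  # minima are maxima of the negated signal

print(t[peaks], t[troughs])
```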

scipy.sparse

scipy.sparse contains functions for building and identifying sparse matrices in various formats. Additionally, it contains sparse matrix classes for each individual sparse matrix format.

This sub-module contains two sub-packages: scipy.sparse.csgraph and scipy.sparse.linalg. The first contains a library of compressed sparse graph routines, and the second contains a library of sparse linear algebra routines.
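
A short sketch touching all three pieces follows: a matrix in compressed sparse row format, a sparse linear solve, and a graph routine. The 3x3 matrix is made up for illustration.

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.sparse.linalg import spsolve
import numpy as np

dense = np.array([[4.0, 0.0, 0.0], [0.0, 2.0, 1.0], [0.0, 1.0, 3.0]])
matrix = csr_matrix(dense)  # compressed sparse row format

# Sparse linear solve of matrix @ x = b
x = spsolve(matrix, np.array([4.0, 3.0, 4.0]))

# Treat nonzero entries as graph edges and count the connected groups of nodes
n_components, labels = connected_components(matrix, directed=False)

print(matrix.nnz, x, n_components)
```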

scipy.spatial

scipy.spatial contains spatial algorithms and data structures. Features include nearest-neighbor query routines, triangulation, convex hulls, Voronoi diagrams, and plotting helpers.

One sub-package of scipy.spatial is scipy.spatial.distance, which provides utilities for computing distance matrices and checking their validity, as well as distance functions for numeric and boolean vectors.

There is another sub-package within scipy.spatial called scipy.spatial.transform, which provides functions for spatial transformations including rotations, spherical linear interpolations of rotations, and rotation splines.
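
As a minimal sketch of a spatial transformation, the snippet below rotates the x unit vector by 90 degrees about the z axis, which should produce the y unit vector.

```python
from scipy.spatial.transform import Rotation
import numpy as np

# Rotation of 90 degrees about the z axis
rotation = Rotation.from_euler("z", 90, degrees=True)
rotated = rotation.apply([1.0, 0.0, 0.0])

print(np.round(rotated, 6))
```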

scipy.special

scipy.special defines SciPy's special functions, which are also exposed to Cython via scipy.special.cython_special. Special functions include (but are not limited to) Airy, Bessel, Struve, Legendre, and statistical distributions.

scipy.stats

The scipy.stats sub-module contains probability distributions, summary statistics, frequency statistics, and functions for calculating correlations between distributions, among other things. Methods exist for continuous, discrete, and multivariate distributions. There are also some summary methods that can be used to describe data with little prior knowledge.

There are two sub-packages within scipy.stats: scipy.stats.mstats and scipy.stats.qmc. The former is used for calculating statistics on masked arrays, and the latter provides quasi-Monte Carlo capabilities.
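
A short sketch of a continuous distribution, a discrete distribution, and a correlation follows. The parameter values and sample data are arbitrary choices for illustration.

```python
from scipy import stats
import numpy as np

# Continuous distribution: standard normal
below_mean = stats.norm.cdf(0.0)  # probability of a value below the mean
upper_bound = stats.norm.ppf(0.975)  # 97.5th percentile

# Discrete distribution: 5 successes in 10 trials with success probability 0.5
five_of_ten = stats.binom.pmf(5, 10, 0.5)

# Correlation between two perfectly linearly related samples
x = np.arange(10.0)
r, p_value = stats.pearsonr(x, 2 * x + 1)

print(below_mean, upper_bound, five_of_ten, r)
```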