Version: Python

# How to use SciPy

This guide will show you how to use SciPy in your Python queries in Deephaven.

Refer to our guide How to install Python packages for instructions on installing this package in Deephaven.

SciPy is an open-source scientific computing library for Python. It contains a large assortment of sub-packages with methods that can perform tasks in the following areas:

• Optimization
• Linear algebra
• Calculus
• Interpolation
• Transforms
• Signal processing
• Differential equations
• Multi-dimensional array processing

The above list is not comprehensive; SciPy's many sub-packages cover an even broader range of topics (see the Appendix at the bottom of this guide for a brief description of each). SciPy was built to operate on NumPy arrays, which we'll be using extensively in this guide. Check out How to use NumPy to learn more about how they work.

## Examples​

The examples below demonstrate using SciPy in a traditional Python setting, then extend them to use Deephaven. SciPy is a natural pair for `deephaven.learn, so some of the corresponding Python queries with Deephaven use it. For more information on the `learn` submodule, check out How to use deephaven.learn.

### Describe data​

In this example, we'll use `scipy.stats.describe` to give basic insight into some data. The data we'll be describing is an `ndarray` with integer values from 0 to 99.

``from scipy.stats import describeimport numpy as npsource = np.arange(100)result = describe(source)print(result)``

To get these insights on data in a Deephaven table, we'll convert it to a NumPy array by using `pandas.to_pandas` and `values`. We can then use the method just as before.

``from deephaven import empty_table, pandasfrom scipy.stats import describeimport pandas as pdimport numpy as npsource = empty_table(100).update(formulas=["X = i"])result = describe(np.squeeze(pandas.to_pandas(source).values))print(result)`` ### Compute the Airy function​

In this example, we'll use `scipy.special.airy` to compute the Airy function of some values. In this case, we'll calculate the Airy function on values ranging from -20 to 0 in increments of 0.2.

``from scipy.special import airyimport numpy as npx = np.linspace(-20, 0, 101)ai, aip, bi, bip = airy(x)``

Let's extend this code to use Deephaven tables!

``from deephaven import empty_tablefrom scipy.special import airyimport numpy as npsource = empty_table(101).update(formulas=["x = -20 + 20 * i / 100"])def compute_airy(x):    ai, aip, bi, bip = airy(x)    return ai, aip, bi, bipresult = source.update(formulas=["y = compute_airy(x)"])`` ### Filter a noisy signal​

In this example, we'll generate a Sine wave and add noise. Then, we'll apply filters to reduce the noise.

``from scipy.signal import medfilt, savgol_filterimport numpy as npx = np.linspace(0, 2 * np.pi, 101)noisy_sine_wave = np.sin(x) + np.random.normal(0, 0.25, 101)median_filtered = medfilt(noisy_sine_wave)savgol_filtered = savgol_filter(noisy_sine_wave, 27, 3)``

The above example can be extended to use `deephaven.learn` in a couple of ways.

``from deephaven import empty_tablefrom deephaven.learn import gatherfrom deephaven import learnfrom deephaven import Plotfrom scipy.signal import medfilt, savgol_filterimport numpy as npdef create_noisy_sine_wave(x):    return np.sin(x) + np.random.normal(0, 0.25)source = empty_table(101).update(formulas=[    "X = i * (2 * Math.PI) / 100",    "Noisy_Sine_Wave = (double)create_noisy_sine_wave(X)"])def apply_median_filter(signal):    return medfilt(signal)def apply_savgol_filter(signal):    return savgol_filter(signal, 27, 3)def table_to_numpy(rows, cols):    return np.squeeze(gather.table_to_numpy_2d(rows, cols, dtype=np.double))def numpy_to_table(data, idx):    return data[idx]median_filtered = learn.learn(    table=source,    model_func=apply_median_filter,    inputs=[learn.Input("Noisy_Sine_Wave", table_to_numpy)],    outputs=[learn.Output("Median_Filtered", numpy_to_table, "double")],    batch_size=source.size())savitzky_golay_filtered = learn.learn(    table=source,    model_func=apply_savgol_filter,    inputs=[learn.Input("Noisy_Sine_Wave", table_to_numpy)],    outputs=[learn.Output("Savitzky_Golay_Filtered", numpy_to_table, "double")],    batch_size=source.size())filtered_signals = Plot.\plot("Noisy Signal", source, "X", "Noisy_Sine_Wave").\plot("Median Filtered", median_filtered, "X", "Median_Filtered").\plot("Savitzky-Golay Filtered", savitzky_golay_filtered, "X", "Savitzky_Golay_Filtered").\show()`` ### Compute the nearest neighbor of every point in a set​

In this example, we'll use a K-d tree to compute the nearest neighbor of every point in a set. SciPy's implementation is called `scipy.spatial.KDTree`. We construct a two-dimensional data set with a few points, and we'll find the index of each point's nearest neighbor in the set.

``from scipy.spatial import KDTree as kdtreeimport numpy as npx = np.random.randint(-10, 10, 10)y = np.random.randint(-10, 10, 10)data = np.vstack((x, y)).Tprint(data)tree = kdtree(data)distances, indices = tree.query(data, k=2)nearest_neighbor_indices = indices[:, 1]print(nearest_neighbor_indices)``

Computing the nearest-neighbor distances and indices can also be done using `deephaven.learn`.

``from deephaven.pandas import to_tablefrom deephaven.learn import gatherfrom deephaven import learnfrom scipy.spatial import KDTree as kdtreeimport pandas as pdimport numpy as npindices = np.arange(10)x = np.random.randint(-10, 10, 10)y = np.random.randint(-10, 10, 10)source = to_table(pd.DataFrame({"Index": indices, "X": x, "Y": y}))def calculate_nearest_neighbor_indices(data):    tree = kdtree(data)    distances, indices = tree.query(data, k=2)    return indices[:, 1]def table_to_numpy(rows, cols):    return gather.table_to_numpy_2d(rows, cols, np_type=np.int_)def numpy_to_table(data, idx):    return data[idx]result = learn.learn(    table=source,    model_func=calculate_nearest_neighbor_indices,    inputs=[learn.Input(["X", "Y"], table_to_numpy)],    outputs=[learn.Output("Nearest_Neighbor_Index", numpy_to_table, "int")],    batch_size=source.size())`` ### Compute distances of various metrics between two vectors​

In this example, we create two three-dimensional vectors. We then compute the distance between the two vectors using metrics supported by `scipy.spatial.distance`.

``from scipy.spatial import distanceimport numpy as npx = np.array([1, 3, -1])y = np.array([-2, 0, 4])def print_distances(x, y):    print("x: " + str(x))    print("y: " + str(y))    print("Bray-Curtis distance between x and y: " + str(distance.braycurtis(x, y)))    print("Chebyshev distance between x and y: " + str(distance.chebyshev(x, y)))    print("Cosine distance between x and y: " + str(distance.cosine(x, y)))    print("Euclidean distance between x and y: " + str(distance.euclidean(x, y)))    print("City block distance between x and y: " + str(distance.cityblock(x, y)))print_distances(x, y)``

We can once again use `deephaven.learn` to calculate these distances.

``from deephaven import new_table, learnfrom deephaven.column import int_colfrom deephaven.learn import gatherfrom scipy.spatial import distanceimport numpy as npsource = new_table([    int_col("X", [1, 3, -1]),    int_col("Y", [-2, 0, 4])])def table_to_numpy(rows, cols):    return np.squeeze(gather.table_to_numpy_2d(rows, cols, np_type=np.intc))def numpy_to_table(data, idx):    return data[idx]def print_distances(data):    x = data[:,0]    y = data[:,1]    print("x: " + str(x))    print("y: " + str(y))    print("Bray-Curtis distance between x and y: " + str(distance.braycurtis(x, y)))    print("Chebyshev distance between x and y: " + str(distance.chebyshev(x, y)))    print("Cosine distance between x and y: " + str(distance.cosine(x, y)))    print("Euclidean distance between x and y: " + str(distance.euclidean(x, y)))    print("City block distance between x and y: " + str(distance.cityblock(x, y)))learn.learn(    table=source,    model_func=print_distances,    inputs=[learn.Input(["X", "Y"], table_to_numpy)],    outputs=None,    batch_size=source.size())`` ## Appendix: Sub-packages of SciPy​

Here is a comprehensive list of the sub-packages available in SciPy, and a brief description of what each one does:

Sub-package nameDescription
`cluster`Clustering routines
`constants`A collection of constants and conversion utilities
`fft`Fourier transform methods
`fftpack`Deprecated legacy Fourier transform methods
`ingegrate`Numerical integration algorithms
`interpolate`Tools for data interpolation
`io`Data I/O
`linalg`Linear algebra routines
`misc`Extra miscellaneous utilities
`ndimage`Image processing library
`odr`Orthogonal distance regression library
`optimize`Algorithms for optimizing code
`signal`Signal processing routines
`sparse`Tools for sparse matrix creation, handling, and procesing
`spatial`Spatial structure processing algorithms
`special`Special functions
`stats`Statistical function library
`weave`Deprecated legacy C/C++ code writing tool

Each is described below. We skip `fftpack` and `weave` because they are deprecated.## Appendix

### scipy.cluster​

`scipy.cluster` is split into two further sub-modules:

• `scipy.cluster.vq` supports vector quantization and K-means algorithms.
• `scipy.cluster.heirarchy` provides hierarchical and agglomerative clustering.

### scipy.constants​

`scipy.constants` provides an extensive number of mathematical and physical constants including (but not limited to) pi, the golden ratio, the Planck constant, subatomic particle masses, SI prefixes, binary prefixes, and many others. The full list of constants is far, far too long to include here.

### scipy.fft​

`scipy.fft` contains methods that apply the forward and reverse discrete Fourier, Sine, Cosine, and Hankel transformations to input data in N dimensions.

### scipy.integrate​

`scipy.integrate` contains a slew of methods to calculate integrals and solutions for differential equations.

### scipy.interpolate​

`scipy.interpolate` is a sub-package for objects used in data interpolation. It provides univariate and multivariate interpolation techniques, and spline interpolators.

### scipy.io​

`scipy.io` provides utilities to read and write data to and from various file formats including MATLAB, Fortran, WAV, and others.

### scipy.linalg​

`scipy.linalg` is a library of linear algebra functions. You can use this sub-module to solve eigenvalue problems, perform matrix decompositions, solve matrix equations, and construct special matrices among other things. Matrices constructed using this sub-package are of type `numpy.ndarray`.

This sub-module contains two further sub-packages for low-level routines: `scipy.linalg.blas` and `scipy.linalg.lapack`. These two sub-packages provide low-level functions from the Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) libraries. Cython implementations of these low-level libraries exist as well.

Additionally, it provides routines for interpolative matrix decomposition via `scipy.linalg.interpolative`.

### scipy.misc​

The miscellaneous routines sub-module, `scipy.misc`, is the home of some example data sets and two special derivative calculators.

### scipy.ndimage​

`scipy.ndimage` contains a library of methods to perform image processing. Methods include filters, interpolation routines, measurements, and morphology routines.

### scipy.odr​

`scipy.odr` contains routines to perform orthogonal distance regression (ODR) on input data. ODR extends least squares fitting, which generally results in more accuracy in fitted curves.

### scipy.optimize​

The `scipy.optimize` sub-module provides univariate and multivariate optimization routines. These optimization routines will minimize or maximize objective functions, and can handle constraints on said functions. Solvers for both linear and nonlinear problems is supported.

There are underlying C functions for four different root finders that can be accessed via Cython.

### scipy.signal​

`scipy.signal` is SciPy's signal processing sub-module. It supports convolutions, B-splines, filters, filter designing routines, continuous time linear systems, discrete time linear systems, waveform generators, window functions, wavelet generators, maxima/minima finers, and spectral analyzers.

### scipy.sparse​

`scipy.sparse` contains functions for building and identifying sparse matrices in various formats. Additionally, it contains sparse matrix classes for each individual sparse matrix format.

This sub-module contains two sub-packages: `scipy.sparse.csgraph` and `scipy.sparse.linalg`. The first contains a library of compressed sparse graph routines, and the second contains a library of sparse linear algebra routines.

### scipy.spatial​

`scipy.spatial` contains spatial algorithms and data structures. Features include nearest-neighbor query routines, triangulation, convex hulls, Voronoi diagrams, and plotting helpers.

One sub-package of `scipy.spatial` is `scipy.spatial.distance`, which provides utilities for computing distance matrices, cheing their validity, and distance functions in numeric and boolean vectors.

There is another sub-pacakage within `scipy.spatial` called `scipy.spatial.transform`, which provides functions for spatial transformations including rotations, spherical linear interpolations of rotations, and rotation splines.

### scipy.special​

SciPy special functions are defined in `scipy.special`, which are extended to Cython via `scipy.special.cython_special`. Special functions include (but are not limited to) Airy, Bessel, Struve, Legendre, and statistical distributions.

### scipy.stats​

The sub-module, `scipy.stats`, contains probability distributions, summary statistics, frequency statistics, and functions for calculating correlations between distributions among other things. Methods exist for continuous, discrete, and multivariate distributions. There are are also some summary methods that can be used to describe data with little prior knowledge.

There are two sub-packages within `scipy.stats`: `scipy.stats.mstats` and `scipy.stats.qmc`. The former is used for calculating statistics on masked arrays, and the latter is used for describing quasi-Monte Carlo capabilities.