
std_by

std_by returns the standard deviation for each group. Null values are ignored.

caution

Applying this aggregation to a column where the standard deviation cannot be computed will result in an error. For example, the standard deviation is not defined for a column of string values.
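Before aggregating, you can check which columns would trigger this error. The sketch below does that in plain Python, treating a table as a dict of column lists. The helper name `numeric_columns` and the dict-of-lists table are illustrative assumptions, not part of the Deephaven API. In a Deephaven query, the columns it leaves out are the ones to remove with `drop_columns`, as the examples below do.

```python
from numbers import Real
from typing import Dict, List


def numeric_columns(table: Dict[str, list]) -> List[str]:
    """Hypothetical helper: names of columns whose non-null values are all numbers."""
    keep = []
    for name, values in table.items():
        non_null = [v for v in values if v is not None]  # nulls do not affect the check
        # bool is a subclass of int in Python, so exclude it explicitly
        if all(isinstance(v, Real) and not isinstance(v, bool) for v in non_null):
            keep.append(name)
    return keep


# Same data as the examples below, as plain Python lists
source = {
    "X": ["A", "B", "A", "C", "B", "A", "B", "B", "C"],
    "Y": ["M", "N", "O", "N", "P", "M", "O", "P", "M"],
    "Number": [55, 76, 20, 130, 230, 50, 73, 137, 214],
}
print(numeric_columns(source))  # ['Number']
```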

Syntax

table.std_by(by: List[str]=[])

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| by (optional) | List[str] | The column(s) by which to group data (see the cases listed below). |

  • [] (default) returns the standard deviation of every column over the whole table, as a single row.
  • ["X"] returns the standard deviation of each non-key column for each distinct value in column X.
  • ["X", "Y"] returns the standard deviation of each non-key column for each distinct combination of values in columns X and Y.
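The grouping behavior described above can be sketched in plain Python. The helper `std_by_sketch` is hypothetical. It takes rows as a list of dicts and aggregates a single value column, whereas `std_by` aggregates every non-key column. It skips `None` to mirror how nulls are ignored. It returns NaN for groups with fewer than two values; that is this sketch's own convention, not a statement about Deephaven's output.

```python
import math
import statistics
from collections import defaultdict


def std_by_sketch(rows, by, value_col):
    """Hypothetical illustration: {key tuple: sample std of value_col} per group.

    With by=[] every row lands in one group keyed by the empty tuple,
    which mirrors the whole-table case.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in by)
        values = groups[key]  # create the group even if every value is null
        if row[value_col] is not None:  # None stands in for a null and is ignored
            values.append(row[value_col])
    # Sample std needs at least two values (n - 1 > 0); NaN otherwise in this sketch
    return {
        key: statistics.stdev(vals) if len(vals) >= 2 else math.nan
        for key, vals in groups.items()
    }


rows = [
    {"X": "A", "Number": 55},
    {"X": "B", "Number": 76},
    {"X": "A", "Number": 20},
    {"X": "A", "Number": None},  # ignored
]
result = std_by_sketch(rows, ["X"], "Number")
whole = std_by_sketch(rows, [], "Number")
print(result)
print(whole)
```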

Returns

A new table containing the standard deviation for each group.

How to calculate standard deviation

Standard deviation measures how widely data values are spread around their mean. For a data set of n values x_1, ..., x_n with mean x̄, the formula below takes the square root of the sum of squared differences from the mean divided by n − 1 (one less than the number of values), which gives the sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
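The formula can be checked term by term in Python. The sketch below evaluates it for the Number column used in the examples and compares the result with `statistics.stdev`, which also uses the n − 1 denominator. It only illustrates the arithmetic and is not how Deephaven computes the aggregation internally.

```python
import math
import statistics

x = [55, 76, 20, 130, 230, 50, 73, 137, 214]  # the Number column from the examples
n = len(x)
x_bar = sum(x) / n                           # mean, x̄
sum_sq = sum((xi - x_bar) ** 2 for xi in x)  # Σ (x_i - x̄)²
s = math.sqrt(sum_sq / (n - 1))              # sample standard deviation

print(s)
print(math.isclose(s, statistics.stdev(x)))  # True
```

Because the first example below aggregates only Number over the whole table, the value printed here should correspond to that example's result.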

Examples

In this example, std_by returns the standard deviation of every column across the whole table. Because the standard deviation cannot be computed for the string columns X and Y, these columns are dropped before applying std_by.

```python
from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table([
    string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
    string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
    int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
])

# X and Y hold strings, so drop them before aggregating the whole table
result = source.drop_columns(cols=["X", "Y"]).std_by()
```

In this example, std_by returns the standard deviation of Number for each value in X. Because the standard deviation cannot be computed for the string column Y, this column is dropped before applying std_by.

```python
from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table([
    string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
    string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
    int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
])

# Y is a string column that is not a key, so drop it; X becomes the grouping key
result = source.drop_columns(cols=["Y"]).std_by(by=["X"])
```
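To see which Number values fall into each group of X, the grouping can be reproduced in plain Python. This sketch mirrors the source table's columns as lists and uses only the standard library. It is a cross-check of the grouping, not Deephaven's implementation. The standard deviations it prints come from `statistics.stdev` (sample, n − 1), so they should correspond to the per-group values std_by reports.

```python
import statistics
from collections import defaultdict

x_col = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Collect the Number values belonging to each distinct value of X
groups = defaultdict(list)
for key, value in zip(x_col, number):
    groups[key].append(value)

for key in sorted(groups):
    print(key, groups[key], statistics.stdev(groups[key]))
```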

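In this example, std_by returns the standard deviation of Number for each combination of values in X and Y. No columns need to be dropped, because X and Y are key columns and Number, the only remaining column, is numeric.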

```python
from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table([
    string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
    string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
    int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
])

result = source.std_by(by=["X", "Y"])
```
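Grouping by both X and Y splits the nine rows into seven groups, and five of them contain a single row. With only one value, the n − 1 denominator is zero, so the sample standard deviation is undefined for those groups. The plain-Python sketch below lists each group with its size. It reports NaN for single-row groups as its own convention (the helper `sample_std` is hypothetical); this is an assumption of the sketch, not a statement about what Deephaven outputs.

```python
import math
import statistics
from collections import defaultdict

x_col = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
y_col = ["M", "N", "O", "N", "P", "M", "O", "P", "M"]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Group Number values by each (X, Y) combination
groups = defaultdict(list)
for x, y, value in zip(x_col, y_col, number):
    groups[(x, y)].append(value)


def sample_std(values):
    # Sample std needs at least two values; NaN is this sketch's convention otherwise
    return statistics.stdev(values) if len(values) >= 2 else math.nan


for key, values in groups.items():
    print(key, len(values), sample_std(values))
```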