std_by
`std_by` returns the sample standard deviation for each group. Null values are ignored.
Sample standard deviation is calculated as the square root of the Bessel-corrected sample variance, which can be shown to be an unbiased estimator of population variance under some conditions. However, sample standard deviation is a biased estimator of population standard deviation.
Applying this aggregation to a column for which the sample standard deviation cannot be computed results in an error. For example, the sample standard deviation is not defined for a column of string values.
Syntax
```python
table.std_by(by: Union[str, list[str]] = None) -> Table
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| by (optional) | Union[str, list[str]] | The column(s) by which to group data. |
Returns
A new table containing the sample standard deviation for each group.
How to calculate sample standard deviation
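Sample standard deviation is a measure of the average dispersion of data values from the mean. Unlike sample variance, it is on the same scale as the data, so it can be readily interpreted in the same units as the data. The formula for sample standard deviation is as follows:

$$
s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
$$

Here $n$ is the number of non-null values in the group, $x_i$ are those values, and $\bar{x}$ is their mean. Dividing by $n - 1$ rather than $n$ is the Bessel correction.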
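The calculation can also be sketched in plain Python, independent of Deephaven. This is only an illustration of the arithmetic, not how `std_by` is implemented: the helper name `sample_std` and the dictionary-based grouping are assumptions made for the sketch, and the data matches the `X` and `Number` columns used in the examples on this page.

```python
import math
from collections import defaultdict
from statistics import stdev


def sample_std(values):
    """Sample standard deviation with the Bessel correction; None entries are skipped."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n < 2:
        # The n - 1 divisor would be zero (or negative), so the result is undefined.
        raise ValueError("need at least two non-null values")
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


# Grouping column and value column, as plain Python lists.
x = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Collect the values that belong to each group key.
groups = defaultdict(list)
for key, value in zip(x, number):
    groups[key].append(value)

# One sample standard deviation per group.
std_by_x = {key: sample_std(vals) for key, vals in groups.items()}

# Cross-check one group against the standard library's implementation.
assert math.isclose(std_by_x["A"], stdev([55, 20, 50]))
```

The standard library's `statistics.stdev` uses the same `n - 1` divisor, which is why it serves as a cross-check here.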
Examples
In this example, `std_by` returns the sample standard deviation of the whole table. Because the sample standard deviation cannot be computed for the string columns `X` and `Y`, these columns are dropped before applying `std_by`.
```python
from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table(
    [
        string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
        string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
        int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
    ]
)

result = source.drop_columns(cols=["X", "Y"]).std_by()
```
In this example, `std_by` returns the sample standard deviation, grouped by `X`. Because the sample standard deviation cannot be computed for the string column `Y`, this column is dropped before applying `std_by`.
```python
from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table(
    [
        string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
        string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
        int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
    ]
)

result = source.drop_columns(cols=["Y"]).std_by(by=["X"])
```
In this example, `std_by` returns the sample standard deviation, grouped by `X` and `Y`.
```python
from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table(
    [
        string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
        string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
        int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
    ]
)

result = source.std_by(by=["X", "Y"])
```
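Grouping by both `X` and `Y` leaves several groups with a single row. For those groups the Bessel-corrected divisor `n - 1` is zero, so the sample standard deviation is mathematically undefined. This page does not state what `std_by` reports in that case, so the sketch below only identifies such groups in plain Python. It relies on the standard library's `statistics.stdev`, which raises `StatisticsError` for fewer than two data points; the variable names are illustrative.

```python
from collections import defaultdict
from statistics import StatisticsError, stdev

# Same columns as the example above, as plain Python lists.
x = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
y = ["M", "N", "O", "N", "P", "M", "O", "P", "M"]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Collect values per (X, Y) key.
groups = defaultdict(list)
for kx, ky, value in zip(x, y, number):
    groups[(kx, ky)].append(value)

defined = {}    # groups with at least two values
undefined = []  # groups with a single value: n - 1 == 0

for key, vals in groups.items():
    try:
        defined[key] = stdev(vals)
    except StatisticsError:
        undefined.append(key)

print("defined:", defined)
print("undefined:", undefined)
```

Only the `("A", "M")` and `("B", "P")` groups contain two values. The remaining five groups have one value each and end up in `undefined`.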