var_by
var_by returns the sample variance for each group. Null values are ignored.
Sample variance is calculated using Bessel's correction (dividing by n - 1 rather than n), which makes it an unbiased estimator of the population variance when the values are independent draws from the same distribution.
Applying this aggregation to a column where the sample variance cannot be computed will result in an error. For example, the sample variance is not defined for a column of string values.
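The two behaviors described above (nulls are ignored, and the divisor is n - 1) can be sketched in plain Python. This sketch does not use Deephaven: the function name sample_variance, the use of None to stand in for a null, and the choice to return None when fewer than two values remain are all assumptions made for illustration.

```python
import statistics


def sample_variance(values):
    """Sample variance of the non-null entries, dividing by n - 1."""
    # Nulls (None here) are skipped entirely; they are not treated as zero.
    data = [v for v in values if v is not None]
    n = len(data)
    if n < 2:
        # The divisor n - 1 would be zero or negative, so the sample
        # variance is undefined. Returning None is this sketch's choice.
        return None
    mean = sum(data) / n
    return sum((v - mean) ** 2 for v in data) / (n - 1)


values = [2, None, 4, 4, None, 6]

# Bessel-corrected result over the four non-null values.
print(sample_variance(values))

# For comparison, the population variance divides by n instead of n - 1.
print(statistics.pvariance([2, 4, 4, 6]))
```

The corrected value is always larger than the population variance of the same values, and the gap shrinks as the group size grows.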
Syntax
table.var_by(by: Union[str, list[str]] = None) -> Table
Parameters
Parameter | Type | Description |
---|---|---|
by (optional) | Union[str, list[str]] | The column(s) by which to group data. If omitted, the sample variance is computed over the whole table. |
Returns
A new table containing the sample variance for each group.
How to calculate sample variance
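Sample variance measures how far data values are spread around their mean. Unlike sample standard deviation, it is not on the same scale as the data: it is expressed in squared units, so it cannot be readily interpreted in the same units as the data. The formula for sample variance is as follows:

$$
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$

Here, $n$ is the number of non-null values in the group, $x_i$ are those values, and $\bar{x}$ is their mean.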
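To make the difference in scale concrete, the following sketch works through the calculation by hand using only the Python standard library, not Deephaven. It uses the Number values from the examples on this page; the variable names are illustrative.

```python
import math
import statistics

# The Number values used in the examples on this page.
numbers = [55, 76, 20, 130, 230, 50, 73, 137, 214]

mean = statistics.mean(numbers)

# Sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in numbers) / (len(numbers) - 1)

# Taking the square root brings the result back to the scale of the data.
std_dev = math.sqrt(variance)

print(f"sample variance (squared units): {variance:.2f}")
print(f"sample standard deviation (data units): {std_dev:.2f}")
```

The variance comes out far larger than any individual value because it is measured in squared units, while the standard deviation is comparable to the spread of the values themselves.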
Examples
In this example, var_by returns the sample variance of the whole table. Because the sample variance cannot be computed for the string columns X and Y, these columns are dropped before applying var_by.
from deephaven import new_table
from deephaven.column import string_col, int_col
source = new_table(
    [
        string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
        string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
        int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
    ]
)
# Drop the string columns, then aggregate the whole table (no grouping columns)
result = source.drop_columns(cols=["X", "Y"]).var_by()
In this example, var_by returns the sample variance, as grouped by X. Because the sample variance cannot be computed for the string column Y, this column is dropped before applying var_by.
from deephaven import new_table
from deephaven.column import string_col, int_col
source = new_table(
    [
        string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
        string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
        int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
    ]
)
# Drop the string column Y, then compute the sample variance for each value of X
result = source.drop_columns(cols=["Y"]).var_by(by=["X"])
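One way to sanity-check the grouped result is to recompute the per-group values outside Deephaven. The sketch below uses only the Python standard library on the same X and Number data. Assuming var_by uses the n - 1 divisor described earlier, the values in result should agree with these up to floating-point rounding. The variable names are illustrative.

```python
import statistics
from collections import defaultdict

x = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Collect the Number values that belong to each key in X.
groups = defaultdict(list)
for key, value in zip(x, number):
    groups[key].append(value)

# statistics.variance divides by n - 1, i.e. it applies Bessel's correction.
expected = {key: statistics.variance(values) for key, values in groups.items()}

for key, var in sorted(expected.items()):
    print(key, round(var, 4))
```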
In this example, var_by returns the sample variance, as grouped by X and Y.
from deephaven import new_table
from deephaven.column import string_col, int_col
source = new_table(
    [
        string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
        string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
        int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
    ]
)
# Compute the sample variance for each distinct (X, Y) pair
result = source.var_by(by=["X", "Y"])
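Grouping by both X and Y produces several groups that contain a single row, and the sample variance of one value is undefined because the divisor n - 1 is zero. This page does not state how var_by reports such groups, so the plain-Python sketch below simply marks them as None. That choice, along with the helper name variance_or_none, is an assumption made for illustration and not a description of Deephaven's output.

```python
import statistics
from collections import defaultdict

x = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
y = ["M", "N", "O", "N", "P", "M", "O", "P", "M"]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Collect the Number values for each distinct (X, Y) pair.
groups = defaultdict(list)
for key, value in zip(zip(x, y), number):
    groups[key].append(value)


def variance_or_none(values):
    # With a single value the divisor n - 1 is zero, so no sample variance
    # exists. Returning None is a choice made for this sketch only.
    return statistics.variance(values) if len(values) >= 2 else None


by_pair = {key: variance_or_none(values) for key, values in groups.items()}

for key, var in sorted(by_pair.items()):
    print(key, var)
```

Only the pairs (A, M) and (B, P) occur more than once in this data, so they are the only groups for which the sketch produces a number.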