
var_by

var_by returns the variance for each group. Null values are ignored.

caution

Applying this aggregation to a column whose variance cannot be computed results in an error. For example, variance is not defined for a column of string values.

Syntax

table.var_by(by: List[str]=[])

Parameters

Parameter: by (optional)
Type: List[str]
Description: The column(s) by which to group data.

  • [] returns the variance of all non-key columns over the whole table, without grouping (default).
  • ["X"] returns the variance for each group in column X.
  • ["X", "Y"] returns the variance for each group defined by the combination of values in columns X and Y.

Returns

A new table containing the variance for each group.

How to calculate variance

  1. Find the mean of the data set. Add all data values and divide by the sample size n.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
  2. Find the squared difference from the mean for each data value. Subtract the mean from each data value and square the result.
$$(x_i - \bar{x})^2$$
  3. Find the sum of all the squared differences. The sum of squares is all the squared differences added together.
$$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
  4. Calculate the variance. For a sample, the variance is the sum of squares divided by one less than the number of data points:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
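
The four steps above can be sketched in plain Python. This is an illustration of the sample-variance formula only, not Deephaven's implementation; the function name sample_variance and the input data are made up for the example, and skipping None stands in for the "null values are ignored" behavior described earlier.

```python
import statistics


def sample_variance(values):
    """Sample variance via the four steps; None entries are skipped."""
    data = [v for v in values if v is not None]
    n = len(data)
    if n < 2:
        # Dividing by n - 1 is impossible with fewer than two values.
        raise ValueError("sample variance needs at least two non-null values")
    mean = sum(data) / n                             # step 1: mean
    squared_diffs = [(x - mean) ** 2 for x in data]  # step 2: squared differences
    sum_of_squares = sum(squared_diffs)              # step 3: sum of squares
    return sum_of_squares / (n - 1)                  # step 4: divide by n - 1


values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, not from the examples below
result = sample_variance(values)

# Cross-check against the standard library's sample variance.
assert abs(result - statistics.variance(values)) < 1e-9
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance (statistics.pvariance in the standard library).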

Examples

In this example, var_by returns the variance of the whole table. Because the variance cannot be computed for the string columns X and Y, these columns are dropped before applying var_by.

from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table([
    string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
    string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
    int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
])

result = source.drop_columns(cols=["X", "Y"]).var_by()

In this example, var_by returns the variance, as grouped by X. Because the variance cannot be computed for the string column Y, this column is dropped before applying var_by.

from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table([
    string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
    string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
    int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
])

result = source.drop_columns(cols=["Y"]).var_by(by=["X"])
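
As a cross-check that does not need a Deephaven session, the grouped values can be worked out with the Python standard library. This sketch assumes var_by uses the sample variance (n - 1) formula given above; the lists x and number simply repeat the example data, and variance_by_x is a name chosen for illustration.

```python
from collections import defaultdict
import statistics

# Same data as the X and Number columns in the example table.
x = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Collect the Number values belonging to each X key.
groups = defaultdict(list)
for key, value in zip(x, number):
    groups[key].append(value)

# Sample variance (n - 1 denominator) of each group.
variance_by_x = {key: statistics.variance(vals) for key, vals in groups.items()}

for key, var in variance_by_x.items():
    print(key, var)
```

Under that assumption, the Number column of the grouped result should hold roughly 358.33 for A, 5403.33 for B, and 3528 for C.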

In this example, var_by returns the variance, as grouped by X and Y.

from deephaven import new_table
from deephaven.column import string_col, int_col

source = new_table([
    string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
    string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
    int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
])

result = source.var_by(by=["X", "Y"])
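
Grouping by both X and Y leaves several groups with a single row, and the n - 1 formula is undefined for a single value. The plain-Python sketch below makes that visible for the example data. Returning None for such groups is a choice made for this illustration; how var_by itself reports single-row groups is not stated on this page.

```python
from collections import defaultdict
import statistics

# Same data as the example table.
x = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
y = ["M", "N", "O", "N", "P", "M", "O", "P", "M"]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Collect the Number values for each (X, Y) key pair.
groups = defaultdict(list)
for kx, ky, value in zip(x, y, number):
    groups[(kx, ky)].append(value)


def variance_or_none(vals):
    # The sample variance divides by n - 1, so it needs at least two values.
    return statistics.variance(vals) if len(vals) > 1 else None


variance_by_xy = {key: variance_or_none(vals) for key, vals in groups.items()}

for key, var in variance_by_xy.items():
    print(key, var)
```

Only the (A, M) and (B, P) groups contain two rows; the remaining five groups have one row each, so no sample variance can be computed for them.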