pydeephaven.agg

This module defines the Aggregation class and provides factory functions to create specific Aggregation instances.

class Aggregation

Bases: ABC

An Aggregation object represents an aggregation operation.

Note: User code should not instantiate it directly; create instances through the factory functions in this module.
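As a usage sketch, aggregations built by the factory functions are typically handed to a table's agg_by call. The example below assumes a Deephaven server reachable at localhost:10000 and relies on the pydeephaven Session, empty_table, update, agg_by, and to_arrow calls; the column names X, Y, SumY, AvgY, Rows and the helpers build_aggregations and run_example are hypothetical. The pydeephaven import is deferred so that defining the functions requires neither the package nor a server.

```python
def build_aggregations(agg):
    """Return a list of aggregations built from the factory functions.

    `agg` is the pydeephaven.agg module, passed in so that this helper
    can be defined without the package installed.
    """
    return [
        agg.sum_(cols=["SumY = Y"]),  # renaming expression: output SumY, input Y
        agg.avg(cols=["AvgY = Y"]),
        agg.count_(col="Rows"),       # Rows holds the count for each group
    ]


def run_example(host="localhost", port=10000):
    """Hypothetical end-to-end run; assumes a reachable Deephaven server."""
    from pydeephaven import Session, agg  # deferred import

    session = Session(host=host, port=port)
    try:
        table = session.empty_table(10).update(["X = i % 3", "Y = i"])
        # Group by X and apply all three aggregations in one call.
        result = table.agg_by(aggs=build_aggregations(agg), by=["X"])
        return result.to_arrow()
    finally:
        session.close()
```

Calling run_example() would return the aggregated result as an Arrow table, one row per distinct value of X.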

abs_sum(cols=None)

Creates an Absolute-sum aggregation.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

avg(cols=None)

Creates an Average aggregation.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

count_(col)

Creates a Count aggregation. This is not supported in ‘Table.agg_all_by’.

Parameters:

col (str) – the column to hold the counts of each distinct group

Return type:

Aggregation

Returns:

an aggregation

count_distinct(cols=None, count_nulls=False)

Creates a Count Distinct aggregation which computes the count of distinct values within an aggregation group for each of the given columns.

Parameters:
  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

  • count_nulls (bool) – whether null values should be counted, default is False

Return type:

Aggregation

Returns:

an aggregation
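The effect of count_nulls can be sketched in plain Python for a single aggregation group. This is an illustration of the semantics described above, not the library's implementation; None stands in for a Deephaven null, and the function here is a local stand-in rather than the factory function itself.

```python
def count_distinct(values, count_nulls=False):
    """Count distinct values in one group; None stands in for a null."""
    distinct = set(values)
    if not count_nulls:
        distinct.discard(None)  # nulls do not contribute to the count
    return len(distinct)


group = ["a", "b", "a", None, None]
print(count_distinct(group))                    # 2
print(count_distinct(group, count_nulls=True))  # 3
```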

distinct(cols=None, include_nulls=False)

Creates a Distinct aggregation which computes the distinct values within an aggregation group for each of the given columns and stores them as vectors.

Parameters:
  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

  • include_nulls (bool) – whether nulls should be included as distinct values, default is False

Return type:

Aggregation

Returns:

an aggregation

first(cols=None)

Creates a First aggregation.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

formula(formula, formula_param=None, cols=None)

Creates a user-defined formula aggregation. The formula can contain any combination of the following:
  • Built-in functions such as min, max, etc.

  • Mathematical arithmetic such as *, +, /, etc.

  • User-defined functions

There are two variants of this call. The preferred variant requires the formula to provide the output column name and specific input column names in the following format:

formula('output_col=(input_col1 + input_col2) * input_col3')

This form does not accept formula_param or cols arguments because the input and output columns are explicitly set within the formula string.

The second (deprecated) variant allows the user to apply a formula expression to one input column, producing one output column. In this call the formula_param is used as a placeholder for the input column name and the cols argument is used to identify the output column name and the input source column when applying the formula. If multiple input/output pairs are specified in the cols argument, the formula will be applied to each column in the list.

Parameters:
  • formula (str) – the user defined formula to apply

  • formula_param (Optional[str]) – If provided, supplies the parameter name for the input column’s vector within the formula. For example, if formula is max(each), then formula_param should be each. This must be None (the default when omitted) when the formula argument specifies the input and output columns.

  • cols (Optional[Union[str, List[str]]]) – If provided, supplies the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”. This must be None (the default when omitted) when the formula argument specifies the input and output columns.

Return type:

Aggregation

Returns:

an aggregation
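To make the relationship between the two variants concrete, the sketch below rewrites one deprecated (formula, formula_param, cols entry) triple into the single self-describing string that the preferred variant expects. The helper to_preferred is hypothetical and not part of pydeephaven; it assumes the placeholder appears in the formula as a whole word and that the cols entry is either a plain column name or a renaming expression.

```python
import re


def to_preferred(formula, formula_param, col):
    """Rewrite a deprecated-style call as one preferred-style formula string.

    `col` is either "name" or a renaming expression "new_col = col".
    """
    if "=" in col:
        output, source = (part.strip() for part in col.split("=", 1))
    else:
        output = source = col.strip()
    # Replace whole-word occurrences of the placeholder with the input column.
    pattern = rf"\b{re.escape(formula_param)}\b"
    body = re.sub(pattern, lambda _match: source, formula)
    return f"{output} = {body}"


print(to_preferred("max(each)", "each", "MaxX = X"))  # MaxX = max(X)
```

Under these assumptions, formula("max(each)", formula_param="each", cols=["MaxX = X"]) in the deprecated style corresponds to formula("MaxX = max(X)") in the preferred style.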

group(cols=None)

Creates a Group aggregation.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

last(cols=None)

Creates a Last aggregation.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

max_(cols=None)

Creates a Max aggregation.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

median(cols=None, average_evenly_divided=True)

Creates a Median aggregation which computes the median value within an aggregation group for each of the given columns.

Parameters:
  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

  • average_evenly_divided (bool) – when the group size is an even number, whether to average the two middle values for the output value. When set to True, average the two middle values. When set to False, use the smaller value. The default is True. This flag is only valid for numeric types.

Return type:

Aggregation

Returns:

an aggregation
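The average_evenly_divided flag only matters for groups with an even number of values. A minimal plain-Python sketch of the described behavior for one non-empty group of numbers follows; it is an illustration rather than the library's implementation, and null handling is not modeled.

```python
def median(values, average_evenly_divided=True):
    """Median of one non-empty group of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]         # odd size: single middle value
    lower, upper = ordered[mid - 1], ordered[mid]
    if average_evenly_divided:
        return (lower + upper) / 2  # average the two middle values
    return lower                    # use the smaller middle value


print(median([4, 1, 3, 2]))                                # 2.5
print(median([4, 1, 3, 2], average_evenly_divided=False))  # 2
print(median([5, 1, 3]))                                   # 3
```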

min_(cols=None)

Creates a Min aggregation.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

partition(col, include_by_columns=True)

Creates a Partition aggregation. This is not supported in ‘Table.agg_all_by’.

Parameters:
  • col (str) – the column to hold the sub tables

  • include_by_columns (bool) – whether to include the group by columns in the result, default is True

Return type:

Aggregation

Returns:

an aggregation

pct(percentile, cols=None, average_evenly_divided=False)

Creates a Percentile aggregation which computes the percentile value within an aggregation group for each of the given columns.

Parameters:
  • percentile (float) – the percentile used for calculation

  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

  • average_evenly_divided (bool) – when the percentile splits the group into two halves, whether to average the two middle values for the output value. When set to True, average the two middle values. When set to False, use the smaller value. The default is False. This flag is only valid for numeric types.

Return type:

Aggregation

Returns:

an aggregation

sorted_first(order_by, cols=None)

Creates a SortedFirst aggregation.

Parameters:
  • order_by (str) – the column to sort by

  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

sorted_last(order_by, cols=None)

Creates a SortedLast aggregation.

Parameters:
  • order_by (str) – the column to sort by

  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation
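The descriptions of the two sorted aggregations above are brief, so the following plain-Python sketch spells out one plausible reading: within each group, SortedFirst takes the aggregated column's value from the row with the smallest order_by value, and SortedLast from the row with the largest. This reading is an assumption, the column names Timestamp and Price are hypothetical, and tie-breaking is not modeled.

```python
def sorted_first(rows, order_by, col):
    """Value of `col` from the row with the smallest `order_by` value."""
    return min(rows, key=lambda row: row[order_by])[col]


def sorted_last(rows, order_by, col):
    """Value of `col` from the row with the largest `order_by` value."""
    return max(rows, key=lambda row: row[order_by])[col]


# One aggregation group, represented as a list of row dicts.
group = [
    {"Timestamp": 3, "Price": 10.5},
    {"Timestamp": 1, "Price": 9.0},
    {"Timestamp": 2, "Price": 11.0},
]
print(sorted_first(group, "Timestamp", "Price"))  # 9.0
print(sorted_last(group, "Timestamp", "Price"))   # 10.5
```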

std(cols=None)

Creates a Std (sample standard deviation) aggregation.

Sample standard deviation is computed using Bessel’s correction, which ensures that the sample variance will be an unbiased estimator of population variance.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation
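Bessel's correction means the sum of squared deviations is divided by n - 1 rather than n. The plain-Python sketch below shows the computation for one group and cross-checks it against the standard library's statistics.stdev; raising ValueError for fewer than two values is a choice made for this sketch, not documented library behavior.

```python
import math
import statistics


def sample_variance(values):
    """Sample variance with Bessel's correction (divide by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(data))  # 32 / 7, roughly 4.571
assert math.isclose(sample_std(data), statistics.stdev(data))
```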

sum_(cols=None)

Creates a Sum aggregation.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

unique(cols=None, include_nulls=False, non_unique_sentinel=None)

Creates a Unique aggregation which computes the single unique value within an aggregation group for each of the given columns. If all values in a column are null, or if there is more than one distinct value in a column, the result is the specified non_unique_sentinel value (defaults to null).

Parameters:
  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

  • include_nulls (bool) – whether null is treated as a value for the purpose of determining if the values in the aggregation group are unique, default is False.

  • non_unique_sentinel (Union[np.number, str, bool]) – the non-null sentinel value when no unique value exists, default is None. Must be a non-None value when include_nulls is True. When passed in as a numpy scalar number value, it must be of one of these types: np.int8, np.int16, np.uint16, np.int32, np.int64(int), np.float32, np.float64(float). Please note that np.uint16 is interpreted as a Deephaven/Java char.

Raises:

TypeError

Return type:

Aggregation

Returns:

an aggregation
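The interaction between include_nulls and non_unique_sentinel can be sketched in plain Python for a single group. This follows the description above literally (an all-null group yields the sentinel) and is an illustration only; None stands in for a Deephaven null, and the actual behavior of the library in edge cases may differ.

```python
def unique(values, include_nulls=False, non_unique_sentinel=None):
    """Single unique value of one group, or the sentinel if there is none."""
    candidates = set(values)
    if not include_nulls:
        candidates.discard(None)    # nulls are ignored, not treated as a value
    if candidates == {None} or len(candidates) != 1:
        return non_unique_sentinel  # all null, empty, or several distinct values
    return next(iter(candidates))


print(unique([7, 7, None]))                                             # 7
print(unique([7, 8], non_unique_sentinel=-1))                           # -1
print(unique([7, None], include_nulls=True, non_unique_sentinel=-1))    # -1
print(unique([None, None], non_unique_sentinel=-1))                     # -1
```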

var(cols=None)

Creates a sample Variance aggregation.

Sample variance is computed using Bessel’s correction, which ensures that the sample variance will be an unbiased estimator of population variance.

Parameters:

cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

weighted_avg(wcol, cols=None)

Creates a Weighted-average aggregation.

Parameters:
  • wcol (str) – the name of the weight column

  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation

weighted_sum(wcol, cols=None)

Creates a Weighted-sum aggregation.

Parameters:
  • wcol (str) – the name of the weight column

  • cols (Union[str, List[str]]) – the column(s) to aggregate, each of which can be a renaming expression, e.g. “new_col = col”; the default is None, which is only valid when the aggregation is used in a Table agg_all_by operation

Return type:

Aggregation

Returns:

an aggregation
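The two weighted aggregations above do not spell out their formulas, so the sketch below assumes the standard definitions: the weighted sum is the sum of value times weight, and the weighted average divides that by the total weight. The names prices and sizes are hypothetical, null handling is not modeled, and raising on a zero total weight is a choice made for this sketch.

```python
def weighted_sum(values, weights):
    """Sum of value * weight over one group."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_avg(values, weights):
    """Weighted sum divided by the total weight of the group."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    return weighted_sum(values, weights) / total_weight


prices = [10.0, 20.0, 30.0]
sizes = [1, 1, 2]  # plays the role of the wcol column
print(weighted_sum(prices, sizes))  # 90.0
print(weighted_avg(prices, sizes))  # 22.5
```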