agg_by

agg_by applies a list of aggregations to table data.

Syntax

agg_by(
    aggs: Union[Aggregation, Sequence[Aggregation]],
    by: Union[str, list[str]] = None,
    preserve_empty: bool = False,
    initial_groups: Table = None,
) -> Table

Parameters

Parameter	Type	Description
aggs	Union[Aggregation, Sequence[Aggregation]]	A list of aggregations to compute. The following aggregations are available: `agg.abs_sum_by` `agg.abs_sum` `agg.agg_all_by` `agg.avg` `agg.count_` `agg.count_distinct` `agg.count_where` `agg.first` `agg.formula` `agg.group` `agg.last` `agg.max_` `agg.median` `agg.min_` `agg.pct` `agg.sorted_first` `agg.sorted_last` `agg.std` `agg.sum_` `agg.unique` `agg.var` `agg.weighted_avg` `agg.weighted_sum`
by optional	Union[str, list[str]]	The name(s) of the column(s) by which to group data. Default is `None`, which aggregates all rows into a single group.
preserve_empty optional	bool	Whether to keep result rows for groups that are initially empty or become empty as a result of updates. Each aggregation operator defines its own value for empty groups. The default is `False`.
initial_groups optional	Table	A table whose distinct combinations of values for the grouping column(s) are used to create an initial set of aggregation groups. All other columns are ignored. This is useful with `preserve_empty=True` to ensure that particular groups appear in the result table, or with `preserve_empty=False` to control the encounter order for a collection of groups and thus their relative order in the result. Changes to this table are not expected or handled; if this table is refreshing, only its contents at instantiation time are used. Default is `None`, which gives the same result as providing a table with no rows. When `initial_groups` is provided, the `by` argument must explicitly specify the grouping columns.
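
The interaction between `initial_groups` and `preserve_empty` described in the table can be modeled in plain Python. The sketch below does not call Deephaven; `seeded_groups` and its arguments are hypothetical names used only to illustrate the documented behavior, under the assumption that seeded groups are created first and that empty groups survive only when `preserve_empty` is true.

```python
def seeded_groups(rows, key, initial_keys=(), preserve_empty=False):
    """Plain-Python model of group creation order; not the Deephaven engine."""
    groups = {}  # dicts keep insertion order, standing in for encounter order
    # Seed groups first, in the order they appear in the initial-groups data.
    for k in initial_keys:
        groups.setdefault(k, [])
    # Then assign each row to its group, creating unseeded groups on first sight.
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    if not preserve_empty:
        # Drop groups that never received a row.
        groups = {k: v for k, v in groups.items() if v}
    return groups


rows = [{"X": "A"}, {"X": "B"}, {"X": "A"}, {"X": "C"}]
seeds = ["C", "A", "D"]  # hypothetical initial groups; "D" has no rows

dropped = seeded_groups(rows, "X", seeds, preserve_empty=False)
kept = seeded_groups(rows, "X", seeds, preserve_empty=True)
print(list(dropped))  # seeded groups lead the order; empty group D is removed
print(list(kept))     # empty group D is retained
```

In this model the seeds fix the relative order of C and A ahead of the unseeded group B, which is the ordering effect the `initial_groups` description refers to.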

Caution

If an aggregation does not rename its output column, the output column keeps the input column's name and holds the aggregated values rather than the original data. If multiple aggregations on the same column do not rename their output columns, an error results, because the aggregations try to create multiple columns with the same name. For example, in `table.agg_by([agg.sum_("X"), agg.avg("X")])`, both the sum and the average aggregators produce a column named X, which results in an error. Renaming each output avoids the collision, as in `table.agg_by([agg.sum_("SumX = X"), agg.avg("AvgX = X")])`.

Returns

A new table containing the results of the aggregations specified in `aggs`, with one row per group.

Examples

In this example, agg.first returns the first Y value as grouped by X.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven import agg

source = new_table(
    [
        string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", "C"]),
        string_col("Y", ["M", "N", "O", "N", "P", "M", "O", "P", "M"]),
        int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
    ]
)

result = source.agg_by([agg.first(cols=["Y"])], by=["X"])
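
To make the expected contents of `result` easier to check, the same grouping can be reproduced in plain Python. This is only a model of what a first-per-group aggregation computes on the X and Y data above, not Deephaven code; the dictionary `first_y` is a hypothetical stand-in for the result table.

```python
x = ["A", "B", "A", "C", "B", "A", "B", "B", "C"]
y = ["M", "N", "O", "N", "P", "M", "O", "P", "M"]

# Keep the first Y value seen for each X key; later values are ignored.
first_y = {}
for key, value in zip(x, y):
    first_y.setdefault(key, value)

print(first_y)  # {'A': 'M', 'B': 'N', 'C': 'N'}
```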

In this example, agg.group returns an array of values from the Number column (Numbers), and agg.max_ returns the maximum value from the Number column (MaxNumber), as grouped by X.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven import agg

source = new_table(
    [
        string_col("X", ["A", "B", "A", "C", "B", "A", "B", "B", None]),
        string_col("Y", ["M", "N", None, "N", "P", "M", None, "P", "M"]),
        int_col("Number", [55, 76, 20, 130, 230, 50, 73, 137, 214]),
    ]
)

result = source.agg_by(
    [agg.group(cols=["Numbers = Number"]), agg.max_(cols=["MaxNumber = Number"])],
    by=["X"],
)
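
The values that the Numbers and MaxNumber columns should hold can be worked out with a plain-Python model of the same grouping. This sketch is not Deephaven code; `numbers` and `max_number` are hypothetical stand-ins for the two result columns, and it assumes that the null X value forms its own group.

```python
x = ["A", "B", "A", "C", "B", "A", "B", "B", None]
number = [55, 76, 20, 130, 230, 50, 73, 137, 214]

# Collect every Number value per X key, in encounter order (models agg.group).
numbers = {}
for key, value in zip(x, number):
    numbers.setdefault(key, []).append(value)

# Take the largest value in each group (models agg.max_).
max_number = {key: max(values) for key, values in numbers.items()}

print(numbers)     # {'A': [55, 20, 50], 'B': [76, 230, 73, 137], 'C': [130], None: [214]}
print(max_number)  # {'A': 55, 'B': 230, 'C': 130, None: 214}
```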
