# How to perform dedicated aggregations for groups

This guide will show you how to compute summary information on groups of data using dedicated data aggregations.

Often when working with data, you will want to break the data into subgroups and then perform calculations on the grouped data. For example, a large multi-national corporation may want to know their average employee salary by country, or a teacher might want to calculate grade information for groups of students or in certain subject areas.

The process of breaking a table into subgroups and then performing a single type of calculation on the subgroups is known as "dedicated aggregation." The term comes from most operations creating a summary of data within a group (aggregation) and from a single type of operation being computed at once (dedicated).

Deephaven provides many dedicated aggregations, such as `max_by`

and `min_by`

. These aggregations are good options if only one type of aggregation is needed. If more than one type of aggregation is needed or if you have a custom aggregation, combined aggregations are a more efficient and more flexible solution.

## Syntax

The general syntax follows:

`result = source.DEDICATED_AGG(by=["GroupingColumns"])`

The `by = ["GroupingColumns"]`

parameter determines the column(s) by which to group data.

`DEDICATED_AGG`

should be substituted with one of the chosen aggregations below`[]`

uses the whole table as a single group.`["X"]`

will output the desired value for each group in column`X`

.`["X", "Y"]`

will output the desired value for each group designated from the`X`

and`Y`

columns.

## Single aggregators

Each dedicated aggregator performs one calculation at a time:

`abs_sum_by`

- Sum of absolute values of each group.`avg_by`

- Average (mean) of each group.`count_by`

- Number of rows in each group.`first_by`

- First row of each group.`group_by`

- Group column content into vectors.`head_by`

- First`n`

rows of each group.`last_by`

- Last row of each group.`max_by`

- Maximum value of each group.`median_by`

- Median of each group.`min_by`

- Minimum value of each group.`std_by`

- Sample standard deviation of each group.`sum_by`

- Sum of each group.`tail_by`

- Last`n`

rows of each group.`var_by`

- Sample variance of each group.`weighted_avg_by`

- Weighted average of each group.`weighted_sum_by`

- Weighted sum of each group.

In the following examples, we have test results in various subjects for some students. We want to summarize this information to see if students perform better in one class or another.

`from deephaven import new_table`

from deephaven.column import string_col, int_col, double_col

source = new_table([

string_col("Name", ["James", "James", "James", "Lauren", "Lauren", "Lauren", "Zoey", "Zoey", "Zoey"]),

string_col("Subject", ["Math", "Science", "Art", "Math", "Science", "Art", "Math", "Science", "Art"]),

int_col("Number", [95, 100, 90, 72, 78, 92, 100, 98, 96]),

])

- source

`first_by`

and `last_by`

In this example, we want to know the first and the last test results for each student. To achieve this, we can use `first_by`

to return the first test value and `last_by`

to return the last test value. The results are grouped by `Name`

.

`first = source.first_by(by=["Name"])`

last = source.last_by(by=["Name"])

- first
- last

`head_by`

and `tail_by`

In this example, we want to know the first two and the last two test results for each student. To achieve this, we can use `head_by`

to return the first two test values and `tail_by`

to return the last two test values (`num_rows=2`

). The results are grouped by `Name`

.

`head = source.head_by(2, by=["Name"])`

tail = source.tail_by(2, by=["Name"])

- head
- tail

`count_by`

In this example, we want to know the number of tests each student completed. `count_by`

returns the number of rows in the table as grouped by `Name`

and stores that in a new column, `NumTests`

.

`count = source.count_by("NumTests", by=["Name"])`

- count

## Summary statistics aggregators

In the following examples, we start with the same source table containing students' test results as used above.

Applying these aggregations to a column where the average cannot be computed will result in an error. For example, the average is not defined for a column of string values. For more information on removing columns from a table, see `drop_columns`

. The syntax for using `drop_columns`

is `result = source.drop_columns(cols=["Col1", "Col2"]).sum_by(by=["Col3", "Col4"])`

.

`sum_by`

In this example, `sum_by`

calculates the total sum of test scores for each `Name`

. Because a sum cannot be computed for the string column `Subject`

, this column is dropped before applying `sum_by`

.

`sum_table = source.drop_columns(cols=["Subject"]).sum_by(by=["Name"])`

- sum_table

`avg_by`

In this example, `avg_by`

calculates the average (mean) of test scores for each `Name`

. Because an average cannot be computed for the string column `Subject`

, this column is dropped before applying `avg_by`

.

`mean = source.drop_columns(cols=["Subject"]).avg_by(by=["Name"])`

- mean

`std_by`

In this example, `std_by`

calculates the sample standard deviation of test scores for each `Name`

. Because a sample standard deviation cannot be computed for the string column `Subject`

, this column is dropped before applying `std_by`

.

`std_dev = source.drop_columns(cols=["Subject"]).std_by(by=["Name"])`

- std_dev

`var_by`

In this example, `var_by`

calculates the sample variance of test scores for each `Name`

. Because sample variance cannot be computed for the string column `Subject`

, this column is dropped before applying `var_by`

.

`var = source.drop_columns(cols=["Subject"]).var_by(by=["Name"])`

- var

`median_by`

In this example, `median_by`

calculates the median of test scores for each `Name`

. Because a median cannot be computed for the string column `Subject`

, this column is dropped before applying `median_by`

.

`median = source.drop_columns(cols=["Subject"]).median_by(by=["Name"])`

- median

`min_by`

In this example, `min_by`

calculates the minimum of test scores for each `Name`

. Because a minimum cannot be computed for the string column `Subject`

, this column is dropped before applying `min_by`

.

`minimum = source.drop_columns(cols=["Subject"]).min_by(by=["Name"])`

- minimum

`max_by`

In this example, `max_by`

calculates the maximum of test scores for each `Name`

. Because a maximum cannot be computed for the string column `Subject`

, this column is dropped before applying `max_by`

.

`maximum = source.drop_columns(cols=["Subject"]).max_by(by=["Name"])`

- maximum

## Aggregate columns in the UI

Single aggregations can be performed in the UI using the **Aggregate Columns** feature. However, rather than adding the aggregation in its own column as shown in the programmatic examples above, the aggregation is added as a row to the top or bottom of the table. Column data is aggregated using the operation of your choice. This is useful for quickly calculating the sum, average, or other aggregate of a column.

In the example below, we add an Average row and a Minimum row to the top of the table:

These aggregations can be re-ordered, edited, or deleted from the **Aggregate Columns** dialog.