Skip to contents

Table aggregations are a quintessential feature of Deephaven. You can apply as many aggregations as needed to static tables or streaming tables, and if the parent tables are streaming, the resulting aggregated tables will update alongside their parent tables. It is also very easy to perform grouped aggregations, which allow you to aggregate tables on a per-group basis.

Applying aggregations to a table

There are two methods for performing aggregations on a table, agg_by() and agg_all_by(). agg_by() allows you to perform many aggregations on specified columns, while agg_all_by() allows you to perform a single aggregation to every non-grouping column in the table. Both methods have an optional by parameter that is used to specify grouping columns. Here are some details on each method:

  • TableHandle$agg_by(aggs, by): Creates a new table containing grouping columns and grouped data. The resulting grouped data is defined by the aggregation(s) specified.
  • TableHandle$agg_all_by(agg, by): Creates a new table containing grouping columns and grouped data. The resulting grouped data is defined by the aggregation specified. This method applies the aggregation to all non-grouping columns of the table, so it can only accept one aggregation at a time.

The agg_by() and agg_all_by() methods themselves do not know anything about the columns on which you want to perform aggregations. Rather, the desired columns are passed to individual agg functions, enabling you to apply various kinds of aggregations to different columns or groups of columns as needed.

agg functions

agg functions are used to perform aggregation calculations on grouped data by passing them to agg_by() or agg_all_by(). These functions are generators, meaning they return functions that the Deephaven engine knows how to interpret. We call the functions that they return AggOps. See ?AggOp for more information. These AggOps are not R-level functions, but Deephaven-specific data types that perform all of the intensive calculations. Here is a list of all agg functions available in Deephaven:

For more details on each aggregation function, see the reference documentation by running ?agg_first, ?agg_last, etc.

An Example

library(rdeephaven)

# connecting to Deephaven server
client <- Client$new("localhost:10000", auth_type = "psk", auth_token = "my_secret_token")

# create data frame, push to server, retrieve TableHandle
df <- data.frame(
  X = c("A", "B", "A", "C", "B", "A", "B", "B", "C"),
  Y = c("M", "N", "O", "N", "P", "M", "O", "P", "M"),
  Number1 = c(100, -44, 49, 11, -66, 50, 29, 18, -70),
  Number2 = c(-55, 76, 20, 130, 230, -50, 73, 137, 214)
)
th <- client$import_table(df)

# get first and last elements of each column
th1 <- th$
  agg_by(
  agg_first(c("XFirst = X", "YFirst = Y", "Number1First = Number1", "Number2First = Number2")),
  agg_last(c("XLast = X", "YLast = Y", "Number1Last = Number1", "Number2Last = Number2"))
)

# compute mean and standard deviation of Number1 and Number2, grouped by X
th2 <- th$
  agg_by(
  c(
    agg_avg(c("Number1Avg = Number1", "Number2Avg = Number2")),
    agg_std(c("Number1Std = Number1", "Number2Std = Number2"))
  ),
  by = "X"
)

# compute maximum of all non-grouping columns, grouped by X and Y
th3 <- th$
  agg_all_by(agg_max(), by = c("X", "Y"))

# compute minimum and maximum of Number1 and Number2 respectively grouped by Y
th4 <- th$
  agg_by(
  c(
    agg_min("Number1Min = Number1"),
    agg_max("Number2Max = Number2")
  ),
  by = "Y"
)

client$close()