Group and ungroup data

This guide will show you how to group and ungroup table data in Deephaven.

This guide uses a table of apple data called apples created using newTable. Many of the grouping and ungrouping examples use this table. If you are unfamiliar with creating tables from scratch using newTable, please see our guide Create a new table.

Use the code below to create the apples table:

apples = newTable(
    stringCol("Type", "Granny Smith", "Granny Smith", "Gala", "Gala", "Golden Delicious", "Golden Delicious"),
    stringCol("Color", "Green", "Green", "Red-Green", "Orange-Green", "Yellow", "Yellow"),
    intCol("WeightGrams", 102, 85, 79, 92, 78, 99),
    intCol("Calories", 53, 48, 51, 61, 46, 57)
)

groupBy

The groupBy method groups columnar data into arrays. A list of grouping column names defines grouping keys. All rows from the input table with the same key values are grouped together. The values in the arrays for each group in the output table maintain their order from the input table.

If no input is supplied to groupBy, then there will be one group, which contains all of the data. The resultant table will contain a single row, where column data is grouped into a single array. This is shown in the example below:

applesByNoColumn = apples.groupBy()

If a single input is supplied to groupBy, then the resultant table will have row data grouped into arrays based on each unique value in the input column. This is shown in the example below:

applesByType = apples.groupBy("Type")

If more than one input is supplied to groupBy, then the resultant table will have row data grouped into arrays based on unique value pairs from the grouping columns. This is shown in the example below:

applesByTypeAndColor = apples.groupBy("Type", "Color")

If you want to group data by conditions on columns, you can do so by using the appropriate table update method (in this case, it's updateView) to update the table before grouping:

applesByClassAndDiet = apples.updateView(
    "Class = (WeightGrams < 90) ? `Light` : `Heavy`",
    "Diet = (Calories < 50) ? `Allowed` : `Not Allowed`")
    .groupBy("Class", "Diet")

AggGroup

The AggGroup method returns an aggregator that computes an array of all values within an aggregation group, for each column. Like the other aggregation methods, it is used in conjunction with the aggBy method.

Note

Unlike groupBy, AggGroup throws an error if you don't supply any column names.

In this example, we will group Color, WeightGrams, and Calories by Type:

applesByType = apples.aggBy(AggGroup("WeightGrams", "Calories", "Color"), "Type")

If the by parameter is not supplied, the AggGroup method will group all the values from each column:

applesByNoColumn2 = apples.aggBy(AggGroup("Type", "Color", "WeightGrams", "Calories"))

Ungroup data with ungroup

The ungroup method is the opposite of groupBy. It expands content from arrays or vectors into columns of singular values and builds a new set of rows from it. The method takes optional columns as input. If no inputs are supplied, all array or vector columns are expanded. If one or more columns are given as input, only those columns will have their array values expanded into new rows.

The example below shows how ungroup reverses the groupBy operation used to create applesByClassAndDiet when no columns are given as input. Notice how all array columns have been expanded, leaving a single element in each row of the resultant table:

newApples = applesByClassAndDiet.ungroup()

The example below uses ungroup to expand the Color column in applesByClassAndDiet. This expands only arrays in the Color column, and not the others. Notice how the Type, WeightGrams, and Calories columns still contain arrays:

applesUngroupedByColor = applesByClassAndDiet.ungroup("Color")

Different array types

The ungroup method can ungroup DbArrays and Java arrays.

The example below uses the emptyTable method to create a table with two columns and one row. Each column contains a Java array with 3 elements. The ungroup method works as expected on Java arrays.

t = emptyTable(1).update("X = new int[]{1, 2, 3}", "Z = new int[]{4, 5, 6}")
t_ungrouped = t.ungroup()

The example below uses ungroup to unwrap both an Array (column X) and a Java array (column Z) at the same time.

t = newTable(
    intCol("X", 1, 2, 3)
).groupBy().update("Z = new int[]{4, 5, 6}")

t_ungrouped = t.ungroup()

Handling different array lengths

The ungroup method cannot unpack a row that contains arrays of different length.

To demonstrate this, we'll start by creating a table with two columns and one row.

t = emptyTable(1).update("X = new int[]{1, 2, 3}", "Z = new int[]{4, 5}")

Each column in the above table contains a Java array, but one has three elements and the other has two. Since the arrays are not the same size, calling ungroup without an input column will result in an error.

t_ungrouped = t.ungroup()  // This results in an error

A collapsed error message highlighted in the Deephaven IDE

It is only possible to ungroup columns of the same length. Arrays of different lengths must be ungrouped separately.

t = emptyTable(1).update("X = new int[]{1, 2, 3}", "Z = new int[]{4, 5}")
t_ungroupedByX = t.ungroup("X")
t_ungroupedByZ = t.ungroup("Z")

Null values

Using groupBy on a table with null values will work properly. Null values will appear as empty array elements when grouped with groupBy. Null array elements expanded using ungroup will appear as null (empty) row entries in the corresponding column.

The example below uses the emptyTable method and the ternary operator to create a table with two columns of 5 rows. The first and second rows contain null values. Null values behave as expected during grouping and ungrouping.

t = emptyTable(5).update("X = i", "Z = i < 2 ? NULL_INT : i-2")
t_by = t.groupBy()
new_t = t_by.ungroup()

The example below uses the emptyTable method to create a table with one column and one row. The single cell in the table contains a null Java array. Calling ungroup on this table results in an empty table with one column.

t = emptyTable(1).update("X = (int[])(null)")
t_ungrouped = t.ungroup()

Use of grouping in table operations

Many Deephaven table operations use grouping internally. For example, aggBy creates groups specified by the key column(s) given in the by parameter. The grouping is done automatically, and the resultant table shows summary statistics calculated for each group.

Table operations that require grouping do the grouping internally. It is always more performant to use these table operations than to group data first and then apply some calculations over the groups.