Group and ungroup data
This guide will show you how to group and ungroup table data in Deephaven.
This guide uses a table of apple data called apples
created using new_table
. Many of the grouping and ungrouping examples use this table. If you are unfamiliar with creating tables from scratch using new_table
, please see our guide on creating new tables.
Use the code below to create the apples
table:
from deephaven import new_table
from deephaven.column import string_col, int_col
apples = new_table(
[
string_col(
"Type",
[
"Granny Smith",
"Granny Smith",
"Gala",
"Gala",
"Golden Delicious",
"Golden Delicious",
],
),
string_col(
"Color", ["Green", "Green", "Red-Green", "Orange-Green", "Yellow", "Yellow"]
),
int_col("WeightGrams", [102, 85, 79, 92, 78, 99]),
int_col("Calories", [53, 48, 51, 61, 46, 57]),
]
)
- apples
group_by
The group_by
method groups columnar data into arrays. A list of grouping column names defines grouping keys. All rows from the input table with the same key values are grouped together. The values in the arrays for each group in the output table maintain their order from the input table.
If no input is supplied to group_by
, then there will be one group, which contains all of the data. The resultant table will contain a single row, where column data is grouped into a single array. This is shown in the example below:
apples_by_no_column = apples.group_by()
- apples_by_no_column
If a single input is supplied to group_by
, then the resultant table will have row data grouped into arrays based on each unique value in the input column. This is shown in the example below:
apples_by_type = apples.group_by(by=["Type"])
- apples_by_type
If more than one input is supplied to group_by
, then the resultant table will have row data grouped into arrays based on unique value pairs from the grouping columns. This is shown in the example below:
apples_by_type_and_color = apples.group_by(by=["Type", "Color"])
- apples_by_type_and_color
If you want to group data by conditions on columns, you can do so by using the appropriate table update method (in this case, it's update_view
) to update the table before grouping:
apples_by_class_and_diet = apples.update_view(
formulas=[
"Class = (WeightGrams < 90) ? `Light` : `Heavy`",
"Diet = (Calories < 50) ? `Allowed` : `Not Allowed`",
]
).group_by(["Class", "Diet"])
- apples_by_class_and_diet
agg.group
The agg.group
method returns an aggregator that computes an array of all values within an aggregation group, for each column. Like the other aggregation methods, it is used in conjunction with the agg_by
method.
In this example, we will group Color
, WeightGrams
, and Calories
by Type
:
from deephaven import agg
apples_by_type = apples.agg_by(
aggs=agg.group(["WeightGrams", "Calories", "Color"]), by="Type"
)
- apples_by_type
If the by
parameter is not supplied, the agg.group
method will group all the values from each column:
apples_by_no_column_2 = apples.agg_by(
aggs=agg.group(["Type", "Color", "WeightGrams", "Calories"])
)
- apples_by_no_column_2
Ungroup data with ungroup
The ungroup
method is the opposite of group_by
. It expands content from arrays or vectors into columns of singular values and builds a new set of rows from it. The method takes optional columns as input. If no inputs are supplied, all array or vector columns are expanded. If one or more columns are given as input, only those columns will have their array values expanded into new rows.
The example below shows how ungroup
reverses the group_by
operation used to create apples_by_class_and_diet
when no columns are given as input. Notice how all array columns have been expanded, leaving a single element in each row of the resultant table:
new_apples = apples_by_class_and_diet.ungroup()
- new_apples
The example below uses ungroup
to expand the Color
column in apples_by_class_and_diet
. This expands only arrays in the Color
column, and not the others. Notice how the Type
, WeightGrams
, and Calories
columns still contain arrays:
apples_ungrouped_by_color = apples_by_class_and_diet.ungroup(["Color"])
- apples_ungrouped_by_color
Different array types
The ungroup
method can ungroup DbArrays and Java arrays.
The example below uses the empty_table
method to create a table with two columns and one row. Each column contains a Java array with 3 elements. The ungroup
method works as expected on Java arrays.
from deephaven import empty_table
t = empty_table(1).update(formulas=["X = new int[]{1, 2, 3}", "Z = new int[]{4, 5, 6}"])
t_ungrouped = t.ungroup()
- t
- t_ungrouped
from deephaven import new_table
from deephaven.column import int_col
t = (
new_table([int_col("X", [1, 2, 3])])
.group_by()
.update(formulas=["Z = new int[]{4, 5, 6}"])
)
t_ungrouped = t.ungroup()
- t
- t_ungrouped
Handling different array lengths
The ungroup
method cannot unpack a row that contains arrays of different length.
To demonstrate this, we'll start by creating a table with two columns and one row.
from deephaven import empty_table
t = empty_table(1).update(formulas=["X = new int[]{1, 2, 3}", "Z = new int[]{4, 5}"])
- t
Each column in the above table contains a Java array, but one has three elements and the other has two. Since the arrays are not the same size, calling ungroup
without an input column will result in an error.
t_ungrouped = t.ungroup() # This results in an error
It is only possible to ungroup columns of the same length. Arrays of different lengths must be ungrouped separately.
t_ungrouped_by_x = t.ungroup(["X"])
t_ungrouped_by_z = t.ungroup(["Z"])
- t_ungrouped_by_x
- t_ungrouped_by_z
Null values
Using group_by
on a table with null values will work properly. Null values will appear as empty array elements when grouped with group_by
. Null array elements expanded using ungroup
will appear as null (empty) row entries in the corresponding column.
The example below uses the empty_table
method and the ternary operator to create a table with two columns of 5 rows. The first and second rows contain null values. Null values behave as expected during grouping and ungrouping.
from deephaven import empty_table
from deephaven.constants import NULL_INT
t = empty_table(5).update(formulas=["X = i", "Z = i < 2 ? NULL_INT : i-2"])
t_by = t.group_by()
new_t = t_by.ungroup()
- t
- t_by
- new_t
The example below uses the empty_table
method to create a table with one column and one row. The single cell in the table contains a null Java array. Calling ungroup
on this table results in an empty table with one column.
from deephaven import empty_table
t = empty_table(1).update(formulas=["X = (int[])(null)"])
t_ungrouped = t.ungroup()
- t
- t_ungrouped