Skip to main content

How to group and ungroup data

This guide will show you how to group and ungroup table data in Deephaven.

This guide uses a table of apple data called apples created using newTable. Many of the grouping and ungrouping examples use this table. If you are unfamiliar with creating tables from scratch using newTable, please see our guide Create a new table.

Use the code below to create the apples table:

from deephaven.TableTools import newTable, intCol, stringCol
apples = newTable(    stringCol("Type", "Granny Smith", "Granny Smith", "Gala", "Gala", "Golden Delicious", "Golden Delicious"),    stringCol("Color", "Green", "Green", "Red-Green", "Orange-Green", "Yellow", "Yellow"),    intCol("WeightGrams", 102, 85, 79, 92, 78, 99),    intCol("Calories", 53, 48, 51, 61, 46, 57))

img

Group data with by#

The by method groups columnar data into arrays. A list of grouping column names defines grouping keys. All rows from the input table with the same key values are grouped together.

If no input is supplied to by, then there will be one group, which contains all of the data. The resultant table will contain a single row, where column data is grouped into a single array. This is shown in the example below:

applesByNoColumn = apples.by()

img

If a single input is supplied to by, then the resultant table will have row data grouped into arrays based on each unique value in the input column. This is shown in the example below:

applesByType = apples.by("Type")

img

If more than one input is supplied to by, then the resultant table will have row data grouped into arrays based on unique value pairs from the grouping columns. This is shown in the example below:

applesByTypeAndColor = apples.by("Type", "Color")

img

Formulas supplied to by can also be used to both add new columns and group data. The example below simultaneously adds two new columns and groups on the columns. If you are unfamiliar with the ternary conditional statements used in the example, see How to use the ternary conditional operator.

applesByClassAndDiet = apples.by("Class = (WeightGrams < 90) ? `Light` : `Heavy`",    "Diet = (Calories < 50) ? `Allowed` : `Not Allowed`")

img

Ungroup data with ungroup#

The ungroup method is the inverse of by. It unwraps content from arrays and builds a new set of rows from it. The method takes optional columns as input. If no inputs are supplied, all array columns are unwrapped. If one or more columns are given as input, only those columns will have their array values unwrapped into new rows.

The example below shows how ungroup reverses the by operation used to create applesByClassAndDiet when no columns are given as input. Notice how all array columns have been unwrapped, leaving a single element in each row of the resultant table:

newApples = applesByClassAndDiet.ungroup()

img

The example below uses ungroup to unwrap the Color column in applesByClassAndDiet. This unwraps only arrays in the Color column, and not the others. Notice how the Type, WeightGrams, and Calories columns still contain arrays:

applesUngroupedByColor = applesByClassAndDiet.ungroup("Color")

img

Different array types#

The ungroup method can ungroup DbArrays and Java arrays.

The example below uses the emptyTable method to create a table with two columns and one row. Each column contains a Java array with 3 elements. The ungroup method works as expected on Java arrays.

from deephaven.TableTools import emptyTable
t = emptyTable(1).update("X = new int[]{1, 2, 3}", "Z = new int[]{4, 5, 6}")t_ungrouped = t.ungroup()

img

The example below uses ungroup to unwrap both a DbArray (column X) and a Java array (column Z) at the same time.

from deephaven.TableTools import newTable, intCol
t = newTable(    intCol("X", 1, 2, 3)).by().update("Z = new int[]{4, 5, 6}")
t_ungrouped = t.ungroup()

img

Different array lengths#

The ungroup method cannot unpack a row that contains arrays of different length.

The example below uses the emptyTable method to create a table with two columns and one row. Each column contains a Java array, but one has three elements and the other has two. Calling ungroup without an input column will result in an error.

from deephaven.TableTools import emptyTable
t = emptyTable(1).update("X = new int[]{1, 2, 3}", "Z = new int[]{4, 5}")t_ungrouped = t.ungroup() # This results in an error

img img

It is only possible to ungroup columns of the same length. Arrays of different lengths must be ungrouped separately.

t_ungroupedByX = t.ungroup("X")t_ungroupedByZ = t.ungroup("Z")

img

Null values#

Using by on a table with null values will work properly. Null values will appear as empty array elements when grouped with by. Null array elements unwrapped using ungroup will appear as null (empty) row entries in the corresponding column.

The example below uses the emptyTable method and the ternary operator to create a table with two columns of 5 rows. The first and second rows contain null values. Null values behave as expected during grouping and ungrouping.

from deephaven.TableTools import emptyTablefrom deephaven.conversion_utils import NULL_INT
t = emptyTable(5).update("X = i", "Z = i < 2 ? NULL_INT : i-2")t_by = t.by()new_t = t_by.ungroup()

img

The example below uses the emptyTable method to create a table with one column and one row. The single cell in the table contains a null Java array. Calling ungroup on this table results in an empty table with one column.

from deephaven.TableTools import emptyTable
t = emptyTable(1).update("X = (int[])(null)")t_ungrouped = t.ungroup()

img

Related documentation#