Version: Java (Groovy)

Merging

Deephaven tables can be combined via two different categories of operation: merge operations and join operations. Merge operations combine tables by stacking them vertically, one on top of the other. Join operations combine tables (or specific columns from tables) horizontally, side by side.

This guide discusses how to merge tables in Deephaven. If you want to join tables horizontally, see joins.

There are two methods for merging tables in Deephaven: merge and mergeSorted.

The following code block initializes three tables, each with two columns. We will use these tables in the following examples.

source1 = newTable(
    stringCol("Letter", "A", "B", "D"),
    intCol("Number", 1, 2, 3)
)
source2 = newTable(
    stringCol("Letter", "C", "D", "E"),
    intCol("Number", 14, 15, 16)
)
source3 = newTable(
    stringCol("Letter", "E", "F", "A"),
    intCol("Number", 22, 25, 27)
)

merge

The merge method stacks one or more tables vertically, in the order they are passed.

t = merge(tables)

Note: The columns in every table must have the same names and types; otherwise, a column mismatch error occurs. Null inputs are ignored.
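
When one table's column names differ from the others, the table can be brought into line before merging. The following is a minimal sketch of that idea, not the only approach: the tables expected and mismatched and the column name Num are hypothetical and exist only for illustration, and renameColumns is used to align the names.

```groovy
// Hypothetical table with the schema we want: Letter (String), Number (int)
expected = newTable(
    stringCol("Letter", "A", "B"),
    intCol("Number", 1, 2)
)

// Hypothetical table whose second column is named "Num" instead of "Number"
mismatched = newTable(
    stringCol("Letter", "G", "H"),
    intCol("Num", 31, 32)
)

// Rename the column so both tables share the same column names and types
aligned = mismatched.renameColumns("Number = Num")

// With matching schemas, the merge stacks the two tables
combined = merge(expected, aligned)
```

Renaming fixes only a name mismatch. If the column types differ, the column would also need to be converted to a common type before the merge.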

Let's merge two of our tables using the merge method.

result = merge(source1, source2)

The resulting table, result, contains all rows of the source tables stacked vertically. If the source tables change dynamically, as with ticking data, new rows are inserted within the stack rather than appended at the end. For example, if a row is added to the end of the first source table, the new row appears in result after all other rows from the first source table and before all rows from the second source table.
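
The ticking behavior described above can be observed with two time tables. This is a sketch under stated assumptions: the table names fast, slow, and tickingResult, the Source column, and the one- and two-second periods are all chosen for illustration.

```groovy
// Two hypothetical ticking tables; each adds a row on its own schedule
fast = timeTable("PT1S").update("Source = `fast`")
slow = timeTable("PT2S").update("Source = `slow`")

// Rows from `fast` stay above rows from `slow`, so each new `fast` row
// is inserted in the middle of the merged table rather than at the end
tickingResult = merge(fast, slow)
```

Watching tickingResult in the console should show the block of fast rows growing while the slow rows remain below it.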

mergeSorted

The mergeSorted method sorts the result table after merging the data.

t = mergeSorted(keyColumn, tables)

Where keyColumn is the column by which to sort the merged table, and tables are the source tables.

Let's merge our three tables and sort by Number with mergeSorted.

result = mergeSorted("Number", source1, source2, source3)

The resulting table is all of the source tables stacked vertically and sorted by the Number column.

Perform efficient merges

When performing more than one merge operation, it is best to perform all the merges at the same time, rather than nesting several merges.

In this example, a table named result is initialized. As new tables are generated, the results are merged at every iteration. Calling the merge method on each iteration makes this example inefficient.

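result = null

for (int i = 0; i < 5; i++) {
    new_result = newTable(
        stringCol("Code", String.format("A%d", i), String.format("A%d", i)),
        intCol("Val", i, 10 * i)
    )
    // Compare with ==; a single = would assign null, so this branch would never run
    if (result == null) {
        result = new_result
    } else {
        // Inefficient: each iteration wraps the previous merge in a new one
        result = merge(result, new_result)
    }
}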

Instead, we can make the operation more efficient by calling the merge method just once. Here, merge is applied to a list containing all of the source tables:

List<Object> tableArray = new ArrayList<>()

for (int i = 0; i < 5; i++) {
    new_result = newTable(
        stringCol("Code", String.format("A%d", i), String.format("A%d", i)),
        intCol("Val", i, 10 * i)
    )
    tableArray.add(new_result)
}
// One merge call over the whole list
result = merge(tableArray)

If you are sorting the data you want to merge, it is more efficient to use the mergeSorted method instead of merge followed by sort. Your code will be easier to read, too.

source1 = newTable(stringCol("Letter", "A", "B", "D"), intCol("Number", 1, 2, 3))
source2 = newTable(stringCol("Letter", "C", "D", "E"), intCol("Number", 14, 15, 16))
source3 = newTable(stringCol("Letter", "E", "F", "A"), intCol("Number", 22, 25, 27))

// using `merge` followed by `sort`
t_merged = merge(source1, source2, source3).sort("Number")

// using `mergeSorted`
result = mergeSorted("Number", source1, source2, source3)

When we use mergeSorted, our query completes more than ten times faster (on average) than it does when using merge followed by sort.
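
To check the difference on your own system, you can time both approaches. The sketch below is a rough, single-run measurement rather than a benchmark. The tables big1, big2, and big3, the Key column, and the row counts are hypothetical, and each input is built so that it is already sorted by Key. Timings vary with hardware and data, so no particular speedup is implied.

```groovy
// Three hypothetical static tables, each already sorted by Key
big1 = emptyTable(1_000_000).update("Key = 3 * ii")
big2 = emptyTable(1_000_000).update("Key = 3 * ii + 1")
big3 = emptyTable(1_000_000).update("Key = 3 * ii + 2")

// Time `merge` followed by `sort`
start = System.nanoTime()
viaSort = merge(big1, big2, big3).sort("Key")
sortMillis = (System.nanoTime() - start) / 1_000_000

// Time `mergeSorted` on the same inputs
start = System.nanoTime()
viaMergeSorted = mergeSorted("Key", big1, big2, big3)
mergeSortedMillis = (System.nanoTime() - start) / 1_000_000

println "merge + sort: ${sortMillis} ms"
println "mergeSorted:  ${mergeSortedMillis} ms"
```

Running the block several times and comparing the averages gives a more reliable picture than a single run.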