Merge tables
Deephaven tables can be combined via two different categories of operation: merge operations and join operations. Merge operations combine tables by stacking them vertically, one on top of the other. Join operations combine tables (or specific columns from tables) horizontally, side by side.
This guide discusses how to merge tables in Deephaven. If you want to join tables horizontally, see joins.
There are two methods for merging tables in Deephaven: merge and mergeSorted.
The following code block creates three tables, each with two columns, Letter and Number. These tables are used in the examples below.
source1 = newTable(
stringCol("Letter", "A", "B", "D"),
intCol("Number", 1, 2, 3)
)
source2 = newTable(
stringCol("Letter", "C", "D", "E"),
intCol("Number", 14, 15, 16)
)
source3 = newTable(
stringCol("Letter", "E", "F", "A"),
intCol("Number", 22, 25, 27)
)
merge
The merge method simply stacks tables on top of one another, in the order they are given.
t = merge(tables)
All of the source tables must have the same column names and types, or a column mismatch error occurs. Null inputs are ignored.
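A minimal sketch of both rules follows. It assumes a Deephaven Groovy session, where the static import shown is normally already in place. The table source4 and its Char column are hypothetical and exist only to create a column mismatch. Renaming Char to Letter with renameColumns gives source4 the same schema as source1, so the merge succeeds. The null argument in the second merge is skipped.

```groovy
import static io.deephaven.engine.util.TableTools.*

// Same schema as the tables above: Letter (String) and Number (int)
source1 = newTable(
    stringCol("Letter", "A", "B", "D"),
    intCol("Number", 1, 2, 3)
)
source2 = newTable(
    stringCol("Letter", "C", "D", "E"),
    intCol("Number", 14, 15, 16)
)

// Hypothetical table whose first column is named "Char" instead of "Letter"
source4 = newTable(
    stringCol("Char", "X", "Y"),
    intCol("Number", 40, 41)
)

// merge(source1, source4) would fail with a column mismatch error.
// Renaming "Char" to "Letter" makes the schemas match.
aligned = merge(source1, source4.renameColumns("Letter = Char"))

// The null argument is ignored, so this is equivalent to merge(source1, source2)
withNull = merge(source1, null, source2)
```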
Let's merge two of our tables using the merge method.
result = merge(source1, source2)
The resulting table, result, contains all of the source tables stacked vertically. If the source tables change dynamically, as with ticking data, new rows are inserted within the stack rather than appended to the end. For example, suppose a row is added to the end of the third source table. In the resulting table, that new row appears after all other rows from the third source table and before all rows from the fourth source table.
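One way to observe this insertion behavior is to merge a ticking table with a static one. The sketch below assumes a Deephaven Groovy session. The name ticking is hypothetical, and its constant Letter and Number values exist only so that its schema matches source1. Because ticking is listed first, each row it adds lands before the rows of source1 instead of at the bottom of the merged table.

```groovy
import static io.deephaven.engine.util.TableTools.*

source1 = newTable(
    stringCol("Letter", "A", "B", "D"),
    intCol("Number", 1, 2, 3)
)

// A ticking table that gains one row per second. view() keeps only the two
// new columns, so the schema (String Letter, int Number) matches source1.
ticking = timeTable("PT1S").view("Letter = `T`", "Number = 0")

// Rows from ticking come first. Each new ticking row is inserted after the
// earlier ticking rows and before the three rows of source1.
tickingResult = merge(ticking, source1)
```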
mergeSorted
The mergeSorted method merges the source tables and then sorts the result by a key column.
t = mergeSorted(keyColumn, tables)
Here, keyColumn is the column by which to sort the merged table, and tables are the source tables.
Let's merge our three tables and sort by Number with mergeSorted.
result = mergeSorted("Number", source1, source2, source3)
The resulting table contains all of the source tables stacked vertically and sorted by the Number column.
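The three-table call above passes each table as a separate argument. If your tables are already in a Groovy list, the spread operator (*) can expand the list into that argument list. The following is a sketch that assumes a Deephaven Groovy session; sources and sortedFromList are hypothetical names.

```groovy
import static io.deephaven.engine.util.TableTools.*

source1 = newTable(stringCol("Letter", "A", "B", "D"), intCol("Number", 1, 2, 3))
source2 = newTable(stringCol("Letter", "C", "D", "E"), intCol("Number", 14, 15, 16))
source3 = newTable(stringCol("Letter", "E", "F", "A"), intCol("Number", 22, 25, 27))

// Hypothetical list holding the source tables
sources = [source1, source2, source3]

// *sources expands the list into separate arguments, so this call is
// equivalent to mergeSorted("Number", source1, source2, source3)
sortedFromList = mergeSorted("Number", *sources)
```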
Perform efficient merges
When merging several tables, it is best to merge them all in a single call rather than nesting several merge calls.
In this example, a variable named result is initialized to null. Each iteration of the loop generates a new table and merges it into result. Calling merge on every iteration, with the previous merge result as one of its inputs, makes this example inefficient.
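result = null
for (int i = 0; i < 5; i++) {
    // Each iteration creates a new two-row table
    new_result = newTable(stringCol("Code", String.format("A%d", i), String.format("A%d", i)), intCol("Val", i, 10*i))
    if (result == null) {
        // The first table becomes the initial result
        result = new_result
    } else {
        // Every later table is merged onto the previous result (inefficient)
        result = merge(result, new_result)
    }
}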
Instead, we can make the operation more efficient by calling the merge method just once. Here, merge is applied to a list containing all of the source tables:
// Collect every generated table in a list first
tableList = []
for (int i = 0; i < 5; i++) {
    new_result = newTable(stringCol("Code", String.format("A%d", i), String.format("A%d", i)), intCol("Val", i, 10*i))
    tableList.add(new_result)
}
// Merge all of the tables with a single call
result = merge(tableList)
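You can also build the list with Groovy's collect method instead of an explicit loop. The following is a sketch that assumes a Deephaven Groovy session, with the same Code and Val columns as above. The names tableList and collectResult are hypothetical. What matters is that merge is still called exactly once, on the complete list.

```groovy
import static io.deephaven.engine.util.TableTools.*

// Build five two-row tables: Code "A<i>" twice, with Val values i and 10 * i
tableList = (0..<5).collect { int i ->
    newTable(
        stringCol("Code", "A" + i, "A" + i),
        intCol("Val", i, 10 * i)
    )
}

// A single merge over the whole list
collectResult = merge(tableList)
```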
If you want the merged data to be sorted, it is more efficient to use the mergeSorted method than merge followed by sort. Your code will be easier to read, too.
source1 = newTable(stringCol("Letter", "A", "B", "D"), intCol("Number", 1, 2, 3))
source2 = newTable(stringCol("Letter", "C", "D", "E"), intCol("Number", 14, 15, 16))
source3 = newTable(stringCol("Letter", "E", "F", "A"), intCol("Number", 22, 25, 27))
// using `merge` followed by `sort`
t_merged = merge(source1, source2, source3).sort("Number")
// using `mergeSorted`
result = mergeSorted("Number", source1, source2, source3)
When we use mergeSorted, our query completes more than ten times faster (on average) than it does when using merge followed by sort.
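The speedup you see will depend on table sizes, data, and hardware. The sketch below is one way to time both approaches yourself in a Deephaven Groovy session. The table names and sizes are hypothetical, and each input is generated already sorted by Number. The script only prints the timings and makes no claim about what they will be. Running each measurement more than once gives a fairer comparison, because the first run can include JVM warm-up.

```groovy
import static io.deephaven.engine.util.TableTools.*

// Hypothetical inputs: three static tables, each already sorted by Number
n = 1_000_000
big1 = emptyTable(n).update("Letter = `A`", "Number = (int) (ii * 3)")
big2 = emptyTable(n).update("Letter = `B`", "Number = (int) (ii * 3 + 1)")
big3 = emptyTable(n).update("Letter = `C`", "Number = (int) (ii * 3 + 2)")

// Time merge followed by sort
start = System.nanoTime()
viaSort = merge(big1, big2, big3).sort("Number")
mergeThenSortMs = (System.nanoTime() - start).intdiv(1_000_000)

// Time mergeSorted on the same inputs
start = System.nanoTime()
viaMergeSorted = mergeSorted("Number", big1, big2, big3)
mergeSortedMs = (System.nanoTime() - start).intdiv(1_000_000)

println "merge + sort: ${mergeThenSortMs} ms, mergeSorted: ${mergeSortedMs} ms"
```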