Merge tables
Deephaven tables can be combined via two different categories of operation: merge operations and join operations. Merge operations combine tables by stacking them vertically, one on top of the other. Join operations combine tables (or specific columns from tables) horizontally, side by side.
This guide discusses how to merge tables in Deephaven. If you want to join tables horizontally, see Joins: Exact and Relational and Joins: Time-Series and Range.
There are two methods for merging tables in Deephaven: merge
and merge_sorted
.
The following code block initializes three tables, each with two columns. We will use these tables in the following examples.
from deephaven import merge, merge_sorted, new_table
from deephaven.column import int_col, string_col
source1 = new_table(
[string_col("Letter", ["A", "B", "D"]), int_col("Number", [1, 2, 3])]
)
source2 = new_table(
[string_col("Letter", ["C", "D", "E"]), int_col("Number", [14, 15, 16])]
)
source3 = new_table(
[string_col("Letter", ["E", "F", "A"]), int_col("Number", [22, 25, 27])]
)
- source1
- source2
- source3
merge
The merge
method simply stacks one or more tables on top of another.
t = merge(tables: List[Table])
The columns for each table must have the same names and types, or a column mismatch error will occur. NULL
inputs are ignored.
Let's merge two of our tables using the merge
method.
result = merge([source1, source2])
- result
The resulting table result
is all of the source tables stacked vertically. If the source tables dynamically change, such as for ticking data, rows will be inserted within the stack. For example, if a row is added to the end of the third source table, in the resulting table, that new row appears after all other rows from the third source table and before all rows from the fourth source table.
merge_sorted
The merge_sorted
method sorts the result table after merging the data.
t = merge_sorted(tables: List[Table], order_by: str)
Let's merge our three tables and sort by Number
with merge_sorted
.
result = merge_sorted([source1, source2, source3], "Number")
- result
The resulting table is all of the source tables stacked vertically and sorted by the Number
column.
Perform efficient merges
When performing more than one merge operation, it is best to perform all the merges at the same time, rather than nesting several merges.
In this example, a table named result
is initialized. As new tables are generated, the results are merged at every iteration. Calling the merge method on each iteration makes this example inefficient.
from deephaven import merge, new_table
from deephaven.column import int_col, string_col
result = None
for i in range(5):
new_result = new_table(
[string_col("Code", [f"A{i}", f"B{i}"]), int_col("Val", [i, 10 * i])]
)
if result is None:
result = new_result
else:
result = merge([result, new_result])
- result
Instead, we can make the operation more efficient by calling the merge
method just once. Here merge
is applied to an array containing all of the source tables:
from deephaven import merge, new_table
from deephaven.column import int_col, string_col
table_array = []
for i in range(5):
new_result = new_table(
[string_col("Code", [f"A{i}", f"B{i}"]), int_col("Val", [i, 10 * i])]
)
table_array.append(new_result)
result = merge(table_array)
- result
If you are sorting the data you want to merge, it is more efficient to use the merge_sorted
method instead of merge
followed by sort
. Your code will be easier to read, too.
from deephaven import merge, merge_sorted, new_table
from deephaven.column import int_col, string_col
source1 = new_table(
[string_col("Letter", ["A", "B", "D"]), int_col("Number", [1, 2, 3])]
)
source2 = new_table(
[string_col("Letter", ["C", "D", "E"]), int_col("Number", [14, 15, 16])]
)
source3 = new_table(
[string_col("Letter", ["E", "F", "A"]), int_col("Number", [22, 25, 27])]
)
# using `merge` followed by `sort`
t_merged = merge(tables=[source1, source2, source3]).sort(order_by="Number")
# using `merge_sorted`
result = merge_sorted(tables=[source1, source2, source3], order_by="Number")
When we use merge_sorted
, our query completes ten times as fast as it does when using merge
followed by sort
.