Skip to main content
Version: Python

How to join two tables together

This guide discusses how to join two tables in Deephaven. There are several methods to choose from; while the basic syntax is the same, this article demonstrates the differences between each join. Please see our conceptual guide, Choose a join method, to learn more about which join is right for your use case.

Frequently, related data is stored in multiple tables. For example, a company might save its employee data in one table, and its department data in another, as shown below.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

employee_table = new_table([
string_col("LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers"]),
int_col("DeptID", [31, 33, 33, 34, 34, NULL_INT]),
string_col("Telephone", ["(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", None, None])
])

department_table = new_table([
int_col("DeptID", [31, 33, 34, 35]),
string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
])

However, there may be instances where you would like to combine - or join - certain columns of data from multiple, related tables. For example, you might want a single table that lists employee phone numbers along with their associated department names and department phone numbers.

Deephaven provides several join methods that cater to common use cases.

Join TypeMethod
As-of Joinaj()
Exact Joinexact_join()
Full Outer Joinfull_outer_join()
Joinjoin()
Left Outer Joinleft_outer_join()
Natural Joinnatural_join()
Range Joinrange_join()
Reverse As-of Joinraj()

Syntax

Most join methods combine data from two tables. These tables are usually referred to as the "left table" and the "right table":

  • The left table is the base table data is added to.
  • The right table is the source of data added to the left table.

In most cases, there will be one or more columns (ColumnsToMatch) used as keys to match data between the left and right tables. This format is fundamental for writing join statements in Deephaven. However, there are variations in the syntax for different circumstances.

Here is the basic format and syntax for all join methods:

result = left_table.join_method(table = right_table, on = ["ColumnsToMatch, ColumnsToAdd"])

result = left_table.join_method(table=right_table, on=["ColumnsToMatch"], joins=["ColumnsToJoin"])

The left table is followed by the specific join method we want to use. The arguments of the join method are:

  1. Right table, expressed as table = Table,
  2. the column or columns on which to match, expressed as on = ["ColumnToMatch1, ColumnToMatch2, ..."], and
  3. (Optional) the column or columns in the right table to join to the left table, expressed as joins = ["ColumnToJoin1, ColumnToJoin2, ..."]. If excluded, all columns are joined.

Add all columns

Using the tables shown below, we want to get data from the department_table (the right table) and join it to the employee_table (the left table). Both tables have DeptID as a common column which we can use as a key to join the content from the right table to the left table.

If we want to join all the columns from the right table, we use the two-argument join method:

result_table = left_table.join_method(table=right_table, on=["ColumnsToMatch"])
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

employee_table = new_table([
string_col("LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers"]),
int_col("DeptID", [31, 33, 33, 34, 34, NULL_INT]),
string_col("Telephone", ["(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", None, None]),
])

department_table = new_table([
int_col("DeptID", [31, 33, 34, 35]),
string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
string_col("DeptManager", ["Martinez", "Williams", "Garcia", "Lopez"]),
int_col("DeptGarage", [33, 52, 22, 45])
])

combined = employee_table.natural_join(table=department_table, on=["DeptID"])

However, to reduce the number of columns in the output table, we can specify columns to add via the third argument (ColumnsToJoin). ColumnsToJoin is a comma-separated list (an array) of column names:

left_table.join_method(table = right_table, on = ["ColumnsToMatch"] , joins = ["ColumnToJoin1, ColumnToJoin2"])
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

employee_table = new_table([
string_col("LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers"]),
int_col("DeptID", [31, 33, 33, 34, 34, NULL_INT]),
string_col("Telephone", ["(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", None, None]),
])

department_table = new_table([
int_col("DeptID", [31, 33, 34, 35]),
string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
string_col("DeptManager", ["Martinez", "Williams", "Garcia", "Lopez"]),
int_col("DeptGarage", [33, 52, 22, 45])
])

combined = employee_table.natural_join(table=department_table, on=["DeptID"], joins=["DeptName, DeptManager"])

Use multiple match columns

It is possible to join tables on multiple key columns. This is done by listing all of the key columns within the ColumnsToMatch argument as comma-separated values.

For a join with two different ColumnsToMatch, the format would look like this, with each matching column listed and separated by commas within the quotes containing the second argument:

left_table.join_method(table = right_table, on = ["ColumnToMatch1, ColumnToMatch2, ..."], joins = ["ColumnsToJoin"])

In the example below, only one element matches for both DeptID and DeptGarage:

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

employee_table = new_table([
string_col("LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers"]),
int_col("DeptID", [31, 33, 33, 34, 34, NULL_INT]),
string_col("Telephone", ["(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", None, None]),
int_col("DeptGarage", [33, 33, 33, 52, 52, 22])
])

department_table = new_table([
int_col("DeptID", [31, 33, 34, 35]),
string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
string_col("DeptManager", ["Martinez", "Williams", "Garcia", "Lopez"]),
int_col("DeptGarage", [33, 52, 22, 45])
])

combined = employee_table.natural_join(table = department_table, on =["DeptID, DeptGarage"])

Match columns with different names

When joining data from two different tables, you may need to match on columns that do not have the same name. For example, in the tables below, the column representing the department number has a different name in each table.

To join the two tables based on this common column, we need to use the equals sign (=) to indicate which pairs of columns should be matched. DeptID is the name of the column containing the matching criteria in the left table, and DeptNumber is the name of the column containing the matching criteria in the right table. Therefore, the following argument would be used for the ColumnsToMatch argument: "DeptID = DeptNumber".

The basic syntax looks like this:

result = left_table.join_method(table = right_table, on = ["ColumnToMatchLeft = ColumnToMatchRight"], joins = ["ColumnsToJoin"])
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

employee_table = new_table([
string_col("LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers"]),
int_col("DeptNumber", [31, 33, 33, 34, 34, NULL_INT]),
string_col("Telephone", ["(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", None, None]),
])

department_table = new_table([
int_col("DeptID", [31, 33, 34, 35]),
string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
])

combined = employee_table.natural_join(table = department_table, on = ["DeptNumber=DeptID"])

Rename appended columns

When joining data from two different tables, you may want to rename a column from the right table before adding it to the left table. For example, both tables below have a column labeled Telephone. When adding data from the right table, Telephone can be renamed to DeptTelephone to avoid a conflict with the Telephone column inherited from the left table.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

employee_table = new_table([
string_col("LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers"]),
int_col("DeptID", [31, 33, 33, 34, 34, NULL_INT]),
string_col("Telephone", ["(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", None, None]),
])

department_table = new_table([
int_col("DeptID", [31, 33, 34, 35]),
string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
string_col("Telephone", ["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"])
])

combined = employee_table.natural_join(table = department_table, on = ["DeptID"], joins = ["DeptName, DeptTelephone=Telephone"])