Exact and Relational Joins

This guide covers exact and relational joins in Deephaven. Exact and relational join operations combine data from two tables based on one or more related key columns.

An exact join combines tables and only keeps rows where exact matches occur in the key column(s). The following table operations perform an exact join:
- exact_join
- natural_join
A relational join primarily combines rows with exact matches across tables, but can also include rows where no exact match exists, depending on the type of join used. The following table operations exemplify different relational joins:

Exact and relational joins differ from time-series and range joins. For a detailed guide, see Joins: time-series and range.

To join three or more table operations with one operation, see the multi_join operation documented in a later section of this article.

Syntax

Following convention, the tables being joined together will be referred to as the "left table" and the "right table":

The left table is the base table to which data is added.
The right table is the source of data added to the left table.

One or more columns will be used as keys to match data between the left and right tables. This format is fundamental for writing join statements in Deephaven. However, the syntax can vary depending on the circumstances.

The basic syntax for join, exact_join, and natural_join is as follows:

# Include all non-key columns
result = left_table.join_method(table=right_table, on=["ColumnsToMatch"])

# Include only some non-key columns (ColumnsToAdd)
result = left_table.join_method(
    table=right_table, on=["ColumnsToMatch"], joins=["ColumnsToAdd"]
)

The basic syntax for left_outer_join and full_outer_join are as follows:

# Port all non-key columns
result = outer_join_method(
    l_table=left_table, r_table=right_table, on=["ColumnsToMatch"]
)

# Port only some non-key columns (ColumnsToAdd)
result = outer_join_method(
    l_table=left_table,
    r_table=right_table,
    on=["ColumnsToMatch"],
    joins=["ColumnsToAdd"],
)

Note

left_outer_join and full_outer_join are currently experimental. The API may change in the future.

Outside of the left and right tables, exact and relational joins take up to two more arguments. The first is required, while the second is optional:

on: The key column(s) on which to look for exact matches. Columns of any data type can be used as key columns, but corresponding match columns in the left and right table must be of the same data type.
joins (Optional): The column(s) in the right table to join to the left table. If not specified, all columns are joined.

Match columns with different names

When two tables can be joined, their match column(s) often don't have identical names. The syntax below joins left_table and right_table on ColumnToMatchLeft and ColumnToMatchRight:

result = left_table.join_method(
    table=right_table,
    on=["ColumnToMatchLeft=ColumnToMatchRight"],
    joins=["ColumnsToAdd"],
)

Multiple match columns

Tables can be joined on more than one match column. The syntax below joins tables on two or more match columns:

result = left_table.join_method(
    table=right_table, on=["Column1", "Column2", "Column3Left = Column3Right", ...]
)

Rename joined columns

Columns being joined from the right table that have the same name as existing columns in the left table will cause a name conflict error. To avoid this, the joins argument can be renamed as a column from the right table. The following example renames the right table's OldColumnName column to NewColumnName:

result = left_table.join_method(
    table=right_table,
    on=["ColumnToMatchLeft=ColumnToMatchRight"],
    joins=["NewColumnName=OldColumnName"],
)

Exact joins

The output of an exact join operation appends columns to the left table (from the right table) for rows where an exact key match exists in the right table.

Exact matches fail if multiple matching keys are in the right table for any key in the left table.

There are two available operations to perform an exact match join. They differ based on:

If all rows from the left table are included.
How zero matches are handled.

`exact_join`

Exact_join requires the distinct key set of the left table to be identical to the full set of key column values in the right table. If there is no matching key in the right table for any value in a left table key column, the exact_join will fail. Additionally, the operation will fail if multiple matching keys exist in the right table for any key in the left table.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

left = new_table(
    [
        string_col("LastName", ["Rafferty", "Jones", "Steiner", "Robins", "Smith"]),
        int_col("DeptID", [31, 33, 33, 34, 34]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                "",
            ],
        ),
    ]
)

right = new_table(
    [
        int_col("DeptID", [31, 33, 34, 35]),
        string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
        string_col(
            "DeptTelephone",
            ["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"],
        ),
    ]
)

result = left.exact_join(table=right, on=["DeptID"])

`natural_join`

Natural_join allows for cases when there are no matching keys in the right table for particular values in the key column of the left table. If no matching key exists in the right table, appended column values are simply NULL. Similarly to exact_join, if there are multiple key matches in the right table, the operation will fail.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

left = new_table(
    [
        string_col(
            "LastName",
            ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"],
        ),
        int_col("DeptID", [31, 33, 33, 34, 34, 36, NULL_INT]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                "",
                "",
                "(303) 555-0160",
            ],
        ),
    ]
)

right = new_table(
    [
        int_col("DeptID", [31, 33, 34, 35]),
        string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
        string_col(
            "DeptTelephone",
            ["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"],
        ),
    ]
)

result = left.natural_join(table=right, on=["DeptID"])

Relational joins

In contrast to exact joins, relational joins provide operations where multiple key matches in the right table will not result in an error.

The three relational join methods differ in how zero exact matches are handled.

`join`

The output table from a join operation contains rows with matching values in both tables. Rows without matching values are not included in the result.

from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

left = new_table(
    [
        string_col(
            "LastName",
            ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"],
        ),
        int_col("DeptID", [31, 33, 33, 34, 34, 36, NULL_INT]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                "",
                "",
                "(303) 555-0160",
            ],
        ),
    ]
)

right = new_table(
    [
        int_col("DeptID", [31, 33, 34, 35]),
        string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
        string_col(
            "DeptTelephone",
            ["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"],
        ),
    ]
)

result = left.join(table=right, on=["DeptID"])

Tip

join computes the cross product of the left and right tables and subsets the rows based on the arguments. This means it is slow relative to natural_join, so natural_join should be preferred in most places.

`left_outer_join`

Note

This table operation is currently experimental. The API may change in the future.

The output table from a left_outer_join operation contains all rows from the left table as well as rows from the right table that have matching keys in the match column(s).

from deephaven.experimental.outer_joins import left_outer_join
from deephaven import empty_table

left = empty_table(5).update(["I = ii", "A = `left`"])
right = empty_table(5).update(["I = ii * 2", "B = `right`", "C = Math.sin(I)"])

result = left_outer_join(l_table=left, r_table=right, on=["I"])

`full_outer_join`

Note

This table operation is currently experimental. The API may change in the future.

The output table from a full_outer_join operation contains all rows in the key identifier columns from both tables. Keys that exist in one table but not the other project null values into the respective non-key columns for the unmatched row.

from deephaven.experimental.outer_joins import full_outer_join
from deephaven import empty_table

left = empty_table(5).update(["I = ii", "A = `left`"])
right = empty_table(5).update(["I = ii * 2", "B = `right`", "C = Math.sin(I)"])

result = full_outer_join(l_table=left, r_table=right, on=["I"])

Join three or more tables

The multi_join operation joins three or more tables. It was developed to improve the join speed by taking advantage of the potential to share a single hash table and exploit concurrency.

multi_join joins three or more tables together in the same way that natural_join joins two tables together. The result of multi_join is not a typical table, but rather a MultiJoinTable object, so calling the table method is necessary for most use cases.

There are two ways to use multi_join: with constituent tables or with one or more MultiJoinInput objects.

With constituent tables

Using constituent tables is syntactically simple. The syntax is as follows:

multi_table = multi_join(input=[t1, t2, t3], on="CommonKeyColumn")
multi_table = multi_join(input=[t1, t2, t3], on=["CommonKeyCol1, CommonKeyCol2"])

Using constituent tables requires that all tables have identical key column names and that all of the tables' output rows are desired.

The following example joins three tables that correspond to letter grades for students at three different grade levels.

from deephaven.table import multi_join
from deephaven import new_table
from deephaven.column import string_col

grade5 = new_table(
    [
        string_col("Name", ["Mark", "Austin", "Sandra", "Andy", "Caleb"]),
        string_col("Grade5", ["A", "A", "C", "B", "A"]),
    ]
)

grade6 = new_table(
    [
        string_col("Name", ["Sandra", "Andy", "Kathy", "June", "Caleb"]),
        string_col("Grade6", ["B", "C", "D", "A", "A"]),
    ]
)

grade7 = new_table(
    [
        string_col("Name", ["Austin", "Kathy", "Sandra", "Mark", "Caleb"]),
        string_col("Grade7", ["C", "B", "A", "C", "B"]),
    ]
)

multijoin_table = multi_join(input=[grade5, grade6, grade7], on=["Name"])

result = multijoin_table.table

With `MultiJoinInput` objects

Using MultiJoinInput objects as inputs for multi_join is syntactically more complex than using constituent tables, but allows for more flexibility. The syntax for creating a MultiJoinInput object is as follows:

multijoin_input = [
    MultiJoinInput(table=t1, on="KeyColumn"),  # All columns added
    MultiJoinInput(
        table=t2, on="KeyColumn", joins=["Column1", "Column2"]
    ),  # Specific columns added
]

Then, the syntax for using the multijoin_input object in a multi_join is simple:

multi_table = multi_join(input=multijoin_input)

The following example demonstrates the use of multi_join to join three tables via a MultiJoinInput object.

# import multijoin classes
from deephaven.table import MultiJoinInput, MultiJoinTable, multi_join
from deephaven import new_table
from deephaven.column import string_col

grade5 = new_table(
    [
        string_col("Name", ["Mark", "Austin", "Sandra", "Andy", "Caleb"]),
        string_col("Grade5", ["A", "A", "C", "B", "A"]),
    ]
)

grade6 = new_table(
    [
        string_col("Name", ["Sandra", "Andy", "Kathy", "June", "Caleb"]),
        string_col("Grade6", ["B", "C", "D", "A", "A"]),
    ]
)

grade7 = new_table(
    [
        string_col("Name", ["Austin", "Kathy", "Sandra", "Mark", "Caleb"]),
        string_col("Grade7", ["C", "B", "A", "C", "B"]),
    ]
)

# create a MultiJoinInput array
mji_arr = [
    MultiJoinInput(table=grade5, on="Key=Name"),
    MultiJoinInput(table=grade6, on="Key=Name"),
    MultiJoinInput(table=grade7, on="Key=Name"),
]

# create a MultiJoinTable object
multijoin_table = multi_join(input=mji_arr)

# retrieve the underlying table
result = multijoin_table.table

Which method should you use?

Choosing the right join method can be tricky, so here are some things to consider when choosing between what's available:

How should multiple exact matches be handled?
- exact_join and natural_join raise an error.
- join, left_outer_join, and full_outer_join do not raise an error.
How should zero exact matches be handled?
- exact_join raises an error.
- natural_join joins a null value.
What data should be included in the result?
- join includes only rows that have matching values in both tables.
- left_outer_join includes all rows from the left table, and only rows from the right table that match those in the left table.
- full_outer_join includes all rows from both tables.

For help in choosing a method that uses inexact matches to join tables, see here.

The following figure presents a flowchart to help choose the right join method for your query.

Exact and Relational Joins

Syntax

Match columns with different names

Multiple match columns

Rename joined columns

Exact joins

exact_join

natural_join

Relational joins

join

left_outer_join

full_outer_join

Join three or more tables

With constituent tables

With MultiJoinInput objects

Which method should you use?

Related documentation

`exact_join`

`natural_join`

`join`

`left_outer_join`

`full_outer_join`

With `MultiJoinInput` objects