Exact and Relational Joins

This guide covers exact and relational joins in Deephaven. Exact and relational join operations combine data from two tables based on one or more related key columns.

  • An exact join combines tables and only keeps rows where exact matches occur in the key column(s). The exact_join and natural_join table operations perform an exact join.
  • A relational join primarily combines rows with exact matches across tables, but can also include rows where no exact match exists, depending on the type of join used. The join, left_outer_join, and full_outer_join table operations are relational joins.

Exact and relational joins differ from time-series and range joins. For a detailed guide, see Joins: time-series and range.

To join three or more tables with one operation, see the multi_join operation documented in the "Join three or more tables" section of this article.

Syntax

Following convention, the tables being joined together will be referred to as the "left table" and the "right table":

  • The left table is the base table to which data is added.
  • The right table is the source of data added to the left table.

One or more columns will be used as keys to match data between the left and right tables. This format is fundamental for writing join statements in Deephaven. However, the syntax can vary depending on the circumstances.

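join, exact_join, and natural_join share the same basic syntax. Each is called on the left table and takes the right table as its first argument, for example left_table.natural_join(right_table, on=["ColumnsToMatch"], joins=["ColumnsToJoin"]).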

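left_outer_join and full_outer_join use a different basic syntax. Both are functions imported from deephaven.experimental.outer_joins that take the left and right tables as their first two arguments, for example left_outer_join(left_table, right_table, on=["ColumnsToMatch"], joins=["ColumnsToJoin"]).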

Note

left_outer_join and full_outer_join are currently experimental. The API may change in the future.

Outside of the left and right tables, exact and relational joins take up to two more arguments. The first is required, while the second is optional:

  • on: The key column(s) on which to look for exact matches. Columns of any data type can be used as key columns, but corresponding match columns in the left and right table must be of the same data type.
  • joins (Optional): The column(s) in the right table to join to the left table. If not specified, all columns are joined.

Match columns with different names

When two tables can be joined, their match column(s) often don't have identical names. To join left_table and right_table on ColumnToMatchLeft and ColumnToMatchRight, write the match as an equality with the left table's column first: on=["ColumnToMatchLeft = ColumnToMatchRight"].

Multiple match columns

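Tables can be joined on more than one match column. To join tables on two or more match columns, list each match in the on argument, for example on=["FirstKey", "SecondKey"], where FirstKey and SecondKey are placeholder column names.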

Rename joined columns

Columns joined from the right table that have the same name as an existing column in the left table cause a name conflict error. To avoid this, rename the column in the joins argument. For example, joins=["NewColumnName = OldColumnName"] adds the right table's OldColumnName column to the result as NewColumnName.
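The naming rules in the three subsections above can be sketched in plain Python. This is only a model of how the expressions read, not Deephaven code: parse_pair and the SharedKey column are hypothetical names introduced for illustration. It assumes that a match expression puts the left table's column first, and that a joins expression puts the new name first.

```python
def parse_pair(expression):
    """Split 'First = Second' into a (first, second) tuple.

    A bare column name such as 'Key' is treated as 'Key = Key'.
    """
    if "=" in expression:
        first, second = (part.strip() for part in expression.split("=", 1))
        return first, second
    name = expression.strip()
    return name, name


# Match columns: (left-table column, right-table column).
on = ["ColumnToMatchLeft = ColumnToMatchRight", "SharedKey"]
match_pairs = [parse_pair(expr) for expr in on]

# Joined columns: (name in the result, source column in the right table).
joins = ["NewColumnName = OldColumnName"]
rename_pairs = [parse_pair(expr) for expr in joins]

print(match_pairs)
print(rename_pairs)
```

Reading each expression as a pair makes the direction explicit: in on, the left side of the equals sign belongs to the left table; in joins, the left side is the name that appears in the result.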

Exact joins

An exact join appends columns from the right table to the left table for rows where an exact key match exists in the right table.

Exact joins fail if the right table contains multiple matching keys for any key in the left table.

There are two available operations to perform an exact match join. They differ based on:

  • If all rows from the left table are included.
  • How zero matches are handled.

exact_join

exact_join requires every key in the left table to have exactly one matching key in the right table. If any value in a left table key column has no matching key in the right table, exact_join fails. The operation also fails if multiple matching keys exist in the right table for any key in the left table.
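The two failure conditions can be modeled in plain Python with tables as lists of dictionaries. This is a sketch of the semantics described above, not the Deephaven implementation; exact_join_model, the helper fails, and the sample data are all hypothetical.

```python
def exact_join_model(left, right, key):
    """Append right-table columns to each left row; exactly one match is required."""
    matches = {}
    for row in right:
        matches.setdefault(row[key], []).append(row)

    result = []
    for row in left:
        found = matches.get(row[key], [])
        if len(found) != 1:
            raise ValueError(
                f"key {row[key]!r} has {len(found)} matches; exactly 1 is required"
            )
        extra = {col: val for col, val in found[0].items() if col != key}
        result.append({**row, **extra})
    return result


def fails(left_rows, right_rows):
    """Return True if the modeled exact join raises an error."""
    try:
        exact_join_model(left_rows, right_rows, "Id")
    except ValueError:
        return True
    return False


# Hypothetical sample data.
left = [{"Id": 1, "Name": "Ann"}, {"Id": 2, "Name": "Bo"}]
right = [{"Id": 1, "Dept": "Ops"}, {"Id": 2, "Dept": "Dev"}]

joined = exact_join_model(left, right, "Id")
missing_key_fails = fails(left, right[:1])  # Id 2 has no match
duplicate_key_fails = fails(left, right + [{"Id": 2, "Dept": "QA"}])  # Id 2 matches twice
print(joined, missing_key_fails, duplicate_key_fails)
```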

natural_join

natural_join allows for cases where there are no matching keys in the right table for particular values in the key column of the left table. If no matching key exists in the right table, the appended column values are null. As with exact_join, the operation fails if there are multiple key matches in the right table.
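A plain-Python model makes the difference from exact_join concrete. This is a sketch under stated assumptions rather than Deephaven code: natural_join_model and the sample data are hypothetical, tables are lists of dictionaries, and Python's None stands in for null.

```python
def natural_join_model(left, right, key):
    """Keep every left row; fill right-table columns with None when no match exists."""
    # Assumes the right table is non-empty so its column names can be read.
    right_columns = [col for col in right[0] if col != key]

    matches = {}
    for row in right:
        matches.setdefault(row[key], []).append(row)

    result = []
    for row in left:
        found = matches.get(row[key], [])
        if len(found) > 1:
            raise ValueError(f"key {row[key]!r} has {len(found)} matches; at most 1 is allowed")
        source = found[0] if found else {}
        result.append({**row, **{col: source.get(col) for col in right_columns}})
    return result


# Hypothetical sample data: Id 3 has no match in the right table.
left = [{"Id": 1, "Name": "Ann"}, {"Id": 2, "Name": "Bo"}, {"Id": 3, "Name": "Cy"}]
right = [{"Id": 1, "Dept": "Ops"}, {"Id": 2, "Dept": "Dev"}]

joined = natural_join_model(left, right, "Id")

try:
    natural_join_model(left, right + [{"Id": 1, "Dept": "QA"}], "Id")
    duplicate_key_fails = False
except ValueError:
    duplicate_key_fails = True

print(joined, duplicate_key_fails)
```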

Relational joins

In contrast to exact joins, relational joins provide operations where multiple key matches in the right table will not result in an error.

The three relational join methods differ in how zero exact matches are handled.

join

The output table from a join operation contains rows with matching values in both tables. Rows without matching values are not included in the result.

Tip

join computes the cross product of the left and right tables and subsets the rows based on the arguments. This makes it slow relative to natural_join, so prefer natural_join whenever each key in the left table has at most one match in the right table.
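The cross-product-then-filter behavior can be sketched in plain Python. This models the semantics only and says nothing about how Deephaven executes the operation; join_model and the sample data are hypothetical.

```python
from itertools import product


def join_model(left, right, key):
    """Pair every left row with every right row, then keep pairs whose keys match."""
    result = []
    for left_row, right_row in product(left, right):
        if left_row[key] == right_row[key]:
            extra = {col: val for col, val in right_row.items() if col != key}
            result.append({**left_row, **extra})
    return result


# Hypothetical sample data: "A" matches twice, "C" has no match.
left = [{"Sym": "A", "Qty": 1}, {"Sym": "B", "Qty": 2}, {"Sym": "C", "Qty": 3}]
right = [{"Sym": "A", "Px": 10}, {"Sym": "A", "Px": 11}, {"Sym": "B", "Px": 20}]

joined = join_model(left, right, "Sym")
print(joined)
```

In this model the key "A" produces two output rows, one per match, and the key "C" produces none.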

left_outer_join

Note

This table operation is currently experimental. The API may change in the future.

The output table from a left_outer_join operation contains all rows from the left table as well as rows from the right table that have matching keys in the match column(s).
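As a plain-Python illustration of these semantics (not the Deephaven API), the sketch below keeps every left row, repeats a left row once per match, and fills right-table columns with None when there is no match. left_outer_join_model and the sample data are hypothetical.

```python
def left_outer_join_model(left, right, key):
    """Keep all left rows; emit one output row per match, or one None-filled row."""
    # Assumes the right table is non-empty so its column names can be read.
    right_columns = [col for col in right[0] if col != key]

    result = []
    for left_row in left:
        found = [row for row in right if row[key] == left_row[key]]
        for right_row in found or [{}]:  # an empty dict stands for "no match"
            extra = {col: right_row.get(col) for col in right_columns}
            result.append({**left_row, **extra})
    return result


# Hypothetical sample data: "A" matches twice, "C" has no match.
left = [{"Sym": "A", "Qty": 1}, {"Sym": "B", "Qty": 2}, {"Sym": "C", "Qty": 3}]
right = [{"Sym": "A", "Px": 10}, {"Sym": "A", "Px": 11}, {"Sym": "B", "Px": 20}]

joined = left_outer_join_model(left, right, "Sym")
print(joined)
```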

full_outer_join

Note

This table operation is currently experimental. The API may change in the future.

The output table from a full_outer_join operation contains all rows from both tables. When a key exists in one table but not the other, the unmatched row has null values in the non-key columns that come from the other table.
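A plain-Python sketch of these semantics follows. It is a model rather than Deephaven code: full_outer_join_model and the sample data are hypothetical, None stands in for null, and the output row order is an artifact of the sketch.

```python
def full_outer_join_model(left, right, key):
    """Keep all rows from both tables, filling the other side's columns with None."""
    # Assumes both tables are non-empty so their column names can be read.
    left_columns = [col for col in left[0] if col != key]
    right_columns = [col for col in right[0] if col != key]

    result = []
    matched_keys = set()
    for left_row in left:
        found = [row for row in right if row[key] == left_row[key]]
        if found:
            matched_keys.add(left_row[key])
        for right_row in found or [{}]:  # an empty dict stands for "no match"
            extra = {col: right_row.get(col) for col in right_columns}
            result.append({**left_row, **extra})

    # Add right rows whose key never appeared in the left table.
    for right_row in right:
        if right_row[key] not in matched_keys:
            row = {key: right_row[key]}
            row.update({col: None for col in left_columns})
            row.update({col: right_row[col] for col in right_columns})
            result.append(row)
    return result


# Hypothetical sample data: "C" exists only on the left, "D" only on the right.
left = [{"Sym": "A", "Qty": 1}, {"Sym": "C", "Qty": 3}]
right = [{"Sym": "A", "Px": 10}, {"Sym": "A", "Px": 11}, {"Sym": "D", "Px": 40}]

joined = full_outer_join_model(left, right, "Sym")
print(joined)
```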

Join three or more tables

The multi_join operation joins three or more tables. It was developed to improve join speed by sharing a single hash table across the joined tables and exploiting concurrency.

multi_join joins three or more tables together in the same way that natural_join joins two tables together. The result of multi_join is not a typical table, but rather a MultiJoinTable object, so calling the table method is necessary for most use cases.

There are two ways to use multi_join: with constituent tables or with one or more MultiJoinInput objects.

With constituent tables

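Using constituent tables is syntactically simple. Pass the list of tables as input and the shared key column(s) as on, then call table on the result, for example multi_join(input=[t1, t2, t3], on="Key").table(), where t1, t2, t3, and Key are placeholder names.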

Using constituent tables requires that all tables have identical key column names and that all of the tables' output columns are desired.

The following example joins three tables that correspond to letter grades for students at three different grade levels.
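As a plain-Python illustration of that scenario (a model, not the Deephaven API), the sketch below merges three small grade tables on a shared Name key. multi_join_model and the table contents are hypothetical. The sketch also assumes that the output has one row per distinct key found in any input and that a table may not repeat a key; neither assumption is stated above.

```python
def multi_join_model(tables, key):
    """Merge several tables on a shared key column, one output row per distinct key."""
    merged = {}   # key value -> accumulated row; dicts keep first-seen order
    columns = []  # non-key output columns, in table order
    for table in tables:
        seen = set()
        for row in table:
            if row[key] in seen:
                raise ValueError(f"key {row[key]!r} appears more than once in one table")
            seen.add(row[key])
            target = merged.setdefault(row[key], {})
            target.update({col: val for col, val in row.items() if col != key})
        # Assumes each table is non-empty so its column names can be read.
        columns.extend(col for col in table[0] if col != key)

    return [
        {key: value, **{col: row.get(col) for col in columns}}
        for value, row in merged.items()
    ]


# Hypothetical letter grades at three grade levels.
grade5 = [{"Name": "Ann", "Grade5": "A"}, {"Name": "Bo", "Grade5": "B"}]
grade6 = [{"Name": "Ann", "Grade6": "B"}, {"Name": "Cy", "Grade6": "A"}]
grade7 = [{"Name": "Bo", "Grade7": "C"}]

joined = multi_join_model([grade5, grade6, grade7], "Name")
print(joined)
```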

With MultiJoinInput objects

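Using MultiJoinInput objects as inputs for multi_join is syntactically more complex than using constituent tables, but allows for more flexibility. A MultiJoinInput object is created from a table, its key column(s), and optionally the columns to join, for example MultiJoinInput(table=t1, on="Key", joins=["Col1"]), where t1, Key, and Col1 are placeholder names.

The MultiJoinInput objects are then passed to multi_join as its input, for example multi_join(input=[mji_1, mji_2, mji_3]), where mji_1, mji_2, and mji_3 are placeholder names.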

The following example demonstrates the use of multi_join to join three tables via a MultiJoinInput object.
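The extra flexibility can be modeled in plain Python. This is a sketch under stated assumptions, not the Deephaven API: JoinInputSpec is a hypothetical stand-in for a per-table input description, and multi_join_inputs_model and the sample data are likewise hypothetical. The sketch assumes each input may name its own key column and choose which columns to contribute, and that the output has one row per distinct key.

```python
from dataclasses import dataclass


@dataclass
class JoinInputSpec:
    """Hypothetical per-table input: rows, that table's key column, columns to add."""
    rows: list
    key: str
    joins: list


def multi_join_inputs_model(inputs, output_key):
    """Merge inputs on their own key columns, keeping only the requested columns."""
    merged = {}   # key value -> accumulated row; dicts keep first-seen order
    columns = []  # requested output columns, in input order
    for spec in inputs:
        seen = set()
        for row in spec.rows:
            value = row[spec.key]
            if value in seen:
                raise ValueError(f"key {value!r} appears more than once in one input")
            seen.add(value)
            merged.setdefault(value, {}).update({col: row[col] for col in spec.joins})
        columns.extend(spec.joins)

    return [
        {output_key: value, **{col: row.get(col) for col in columns}}
        for value, row in merged.items()
    ]


# Hypothetical data: the second table names its key "Student", and only the
# grade columns are wanted in the output.
grade5 = [{"Name": "Ann", "Grade5": "A", "Room5": 101}]
grade6 = [
    {"Student": "Ann", "Grade6": "B", "Room6": 202},
    {"Student": "Cy", "Grade6": "A", "Room6": 203},
]
grade7 = [{"Name": "Cy", "Grade7": "C", "Room7": 301}]

joined = multi_join_inputs_model(
    [
        JoinInputSpec(grade5, key="Name", joins=["Grade5"]),
        JoinInputSpec(grade6, key="Student", joins=["Grade6"]),
        JoinInputSpec(grade7, key="Name", joins=["Grade7"]),
    ],
    output_key="Name",
)
print(joined)
```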

Which method should you use?

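Choosing the right join method can be tricky. Based on the behavior described above, consider the following when choosing between what's available:

  • Use exact_join when every key in the left table must have exactly one match in the right table and anything else should raise an error.
  • Use natural_join when each key in the left table has at most one match in the right table and unmatched rows should receive null values.
  • Use join when keys can have multiple matches in the right table and rows without a match should be dropped.
  • Use left_outer_join when keys can have multiple matches and every row of the left table should be kept.
  • Use full_outer_join when keys can have multiple matches and unmatched rows from both tables should be kept.
  • Use multi_join to join three or more tables in the same way that natural_join joins two.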

For help in choosing a method that uses inexact matches to join tables, see Joins: time-series and range.

The following figure presents a flowchart to help choose the right join method for your query.