Skip to main content
Version: Java (Groovy)

Exact and Relational Joins

This guide covers exact and relational joins in Deephaven. Exact and relational join operations combine data from two tables based on one or more related key columns.

  • An exact join combines tables and only keeps rows where exact matches occur in the key column(s). The following table operations perform an exact join:
  • A relational join primarily combines rows with exact matches across tables, but can also include rows where no exact match exists, depending on the type of join used. The following table operations exemplify different relational joins:

Exact and relational joins differ from time-series and range joins. For a detailed guide, see Joins: time-series and range.

To join three or more table operations with one operation, see the multijoin operation documented in a later section of this article.

Syntax

Following convention, the tables being joined together will be referred to as the "left table" and the "right table":

  • The left table is the base table to which data is added.
  • The right table is the source of data added to the left table.

One or more columns will be used as keys to match data between the left and right tables. This format is fundamental for writing join statements in Deephaven. However, the syntax can vary depending on the circumstances.

The basic syntax for join, exactJoin, and naturalJoin is as follows:

// Include all non-key columns
result = leftTable.joinMethod(rightTable, columnsToMatch)

// Include only some non-key columns (ColumnsToAdd)
result = leftTable.joinMethod(rightTable, columnsToMatch, columnsToAdd)

Where rightTable is the table to join with, and columnsToMatch and columnsToAdd are the String names for columns to match and add, respectively.

The basic syntax for leftOuterJoin and fullOuterJoin are as follows:

// Port all non-key columns
result = outerJoinMethod(leftTable, rightTable, columnsToMatch)

// Port only some non-key columns (ColumnsToAdd)
result = outerJoinMethod(leftTable, rightTable, columnsToMatch, columnsToAdd)
note

leftOuterJoin and fullOuterJoin are currently experimental. The API may change in the future.

Outside of the left and right tables, exact and relational joins take up to two more arguments. The first is required, while the second is optional:

  • columnsToMatch: The key column(s) on which to look for exact matches. Columns of any data type can be used as key columns, but corresponding match columns in the left and right table must be of the same data type.
  • columnsToAdd (Optional): The column(s) in the right table to join to the left table. If not specified, all columns are joined.

Match columns with different names

When joining data from two different tables, you may need to match on columns that do not have the same name. For example, in the tables below, the column representing the department number has a different name in each table.

The basic syntax looks like this:

result = leftTable.joinMethod(rightTable, "columnToMatchLeft = columnToMatchRight", "columnsToJoin")

Multiple match columns

Tables can be joined on more than one match column. This is done by listing all of the key columns within the columnsToMatch argument as comma-separated values. The syntax below joins tables on two or more match columns:

result = leftTable.joinMethod(rightTable, "Column1", "Column2", "Column3Left = Column3Right")

Rename joined columns

Columns being joined from the right table that have the same name as existing columns in the left table will cause a name conflict error. To avoid this, the joins argument can be renamed as a column from the right table. The following example renames the right table's OldColumnName column to NewColumnName:

result = leftTable.joinMethod(rightTable, "ColumnToMatchLeft=ColumnToMatchRight", "NewColumnName=OldColumnName")

Exact joins

The output of an exact join operation appends columns to the left table (from the right table) for rows where an exact key match exists in the right table.

Exact matches fail if multiple matching keys are in the right table for any key in the left table.

There are two available operations to perform an exact match join. They differ based on:

  • If all rows from the left table are included.
  • How zero matches are handled.

exactJoin

exactJoin requires the distinct key set of the left table to be identical to the full set of key column values in the right table. If there is no matching key in the right table for any value in a left table key column, the exactJoin will fail. Additionally, the operation will fail if multiple matching keys exist in the right table for any key in the left table.

left = newTable(
stringCol("LastName", "Rafferty", "Jones", "Steiner", "Robins", "Smith"),
intCol("DeptID", 31, 33, 33, 34, 34),
stringCol("Telephone", "(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", ""),
)

right = newTable(
intCol("DeptID", 31, 33, 34, 35),
stringCol("DeptName", "Sales", "Engineering", "Clerical", "Marketing"),
stringCol("DeptTelephone", "(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"),
)

result = left.exactJoin(right, "DeptID")

naturalJoin

naturalJoin allows for cases when there are no matching keys in the right table for particular values in the key column of the left table. If no matching key exists in the right table, appended column values are simply NULL. Similarly to exactJoin, if there are multiple key matches in the right table, the operation will fail.

left = newTable(
stringCol("LastName", "Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"),
intCol("DeptID", 31, 33, 33, 34, 34, 36, NULL_INT),
stringCol("Telephone", "(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", "", "", "(303) 555-0160"),
)

right = newTable(
intCol("DeptID", 31, 33, 34, 35),
stringCol("DeptName", "Sales", "Engineering", "Clerical", "Marketing"),
stringCol("DeptTelephone", "(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"),
)

result = left.naturalJoin(right, "DeptID")

Relational joins

In contrast to exact joins, relational joins provide operations where multiple key matches in the right table will not result in an error.

The three relational join methods differ in how zero exact matches are handled.

join

The output table from a join operation contains rows with matching values in both tables. Rows without matching values are not included in the result.

left = newTable(
stringCol("LastName", "Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"),
intCol("DeptID", 31, 33, 33, 34, 34, 36, NULL_INT),
stringCol("Telephone", "(303) 555-0162", "(303) 555-0149", "(303) 555-0184", "(303) 555-0125", "", "", "(303) 555-0160"),
)

right = newTable(
intCol("DeptID", 31, 33, 34, 35),
stringCol("DeptName", "Sales", "Engineering", "Clerical", "Marketing"),
stringCol("DeptTelephone", "(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"),
)

result = left.join(right, "DeptID")
tip

join computes the cross product of the left and right tables and subsets the rows based on the arguments. This means it is slow relative to naturalJoin, so naturalJoin should be preferred in most places.

leftOuterJoin

note

This table operation is currently experimental. The API may change in the future.

The output table from a leftOuterJoin operation contains all rows from the left table as well as rows from the right table that have matching keys in the match column(s).

import io.deephaven.engine.util.OuterJoinTools

left = emptyTable(5).update("I = ii", "A = `left`")
right = emptyTable(5).update("I = ii * 2", "B = `right`", "C = Math.sin(I)")

result = OuterJoinTools.leftOuterJoin(left, right, "I")

fullOuterJoin

note

This table operation is currently experimental. The API may change in the future.

The output table from a fullOuterJoin operation contains all rows in the key identifier columns from both tables. Keys that exist in one table but not the other project null values into the respective non-key columns for the unmatched row.

import io.deephaven.engine.util.OuterJoinTools

left = emptyTable(5).update("I = ii", "A = `left`")
right = emptyTable(5).update("I = ii * 2", "B = `right`", "C = Math.sin(I)")

result = OuterJoinTools.fullOuterJoin(left, right, "I")

Join three or more tables

The Multi-Join joins three or more tables. It was developed to improve the join speed by taking advantage of the potential to share a single hash table and exploit concurrency.

MultiJoin.of joins three or more tables together in the same way that naturalJoin joins two tables together. The result of MultiJoin.of is not a typical table, but rather a MultiJoinTable object, so calling the table method is necessary for most use cases.

There are two ways to use MultiJoin.of: with constituent tables or with one or more MultiJoinInput objects.

With constituent tables

Using constituent tables is the simplest option, and can be accomplished with one line of code - however, it requires that all of the MultiJoinTable's constituent tables have identical key column names and that all of the tables' output rows are desired. The syntax is as follows:

MultiJoinTable mjTable = MultiJoinFactory.of(keys, tables...)
  • keys is a single string of comma-separated key column names; for example, "key1, key2".
  • tables is any number of tables to merge; for example, table1, table2, table3.

The following example demonstrates the simple version of the syntax.

  • First, we create three tables, each containing a list of students' names and associated grades.
  • Then, we create a MultiJoinTable object, which joins the three tables on the Name and Grade columns.
  • Finally, we call table() to access the MultiJoinTable object's underlying table.
// import multijoin classes
import io.deephaven.engine.table.MultiJoinFactory
import io.deephaven.engine.table.MultiJoinTable

// create tables
Grade5 = newTable(
stringCol("Name","Mark", "Austin", "Jane", "Alex", "May"),
stringCol("Grade", "A", "A", "C", "B", "A"),
)

Grade6 = newTable(
stringCol("Name","Sandra", "Andy", "Kathy", "June", "October"),
stringCol("Grade", "B", "C", "D", "A", "A"),
)

Grade7 = newTable(
stringCol("Name","Lando", "Han", "Luke", "Ben", "Caleb"),
stringCol("Grade", "C", "B", "A", "C", "B"),
)

// create a MultiJoinTable object and join the three tables
MultiJoinTable mtTable = MultiJoinFactory.of("Name, Grade", Grade5, Grade6, Grade7)

// access the multijoin object's internal table
result = mtTable.table()

With MultiJoinInput objects

MultiJoin.of can also take a list of MultiJoinInput objects as input. This method is more complex, but allows for more flexibility. The syntax for creating a MultiJoinInput object is as follows:

// create a MultiJoinInput array
MultiJoinInput mjInputArr = new MultiJoinInput[] {
MultiJoinInput.of(t1, "Key1=A,Key2=B", "C1=C,D1=D"),
MultiJoinInput.of(y2, "Key1=A,Key2=B", "C2=C,D2=D")
}

// create a MultiJoinTable object
MultiJoinTable mjTable = MultiJoinFactory.of(mjInputArr);

The following example demonstrates the creation of a MultiJoinTable using a MultiJoinInput array.

  • First, we create three tables.
  • Then, we create a MultiJoinInput array, which is used as an input for the MultiJoinFactory.of method instead of a String array of keys and a list of tables.
  • Finally, we retrieve the underlying table.
info

Note that if we call MultiJoinInput.of with only a table and a key, all of the table's constituent columns will be included in the join.

// import multijoin classes
import io.deephaven.engine.table.MultiJoinFactory
import io.deephaven.engine.table.MultiJoinTable
import io.deephaven.engine.table.MultiJoinInput

// create tables
t1 = newTable(intCol("C1", 1, 2), intCol("C2", 1, 1), intCol("S1", 10, 11))

t2 = newTable(intCol("C1", 3123, 62364), intCol("C3", 56, 99), intCol("S2", 10, 11))

t3 = newTable(intCol("C1", 44, 3), intCol("C4", 182374, 1231), intCol("S3", 44, 2313))

// create a MultiJoinInput array
mjArr = new MultiJoinInput[] {MultiJoinInput.of(t1, "Key=C1"), MultiJoinInput.of(t2, "Key=C1"), MultiJoinInput.of(t3, "Key=C1")}

// create a MultiJoinTable object
mjTable = MultiJoinFactory.of(mjArr)

// retrieve the underlying table
result = mjTable.table()

In this example, we repeat the previous example but specify which columns we want to include in the join.

// import multijoin classes
import io.deephaven.engine.table.MultiJoinFactory
import io.deephaven.engine.table.MultiJoinTable
import io.deephaven.engine.table.MultiJoinInput

// create tables
t1 = newTable(intCol("C1", 1, 2), intCol("C2", 1, 1), intCol("S1", 10, 11))

t2 = newTable(intCol("C1", 3123, 62364), intCol("C3", 56, 99), intCol("S2", 10, 11))

t3 = newTable(intCol("C1", 44, 3), intCol("C4", 182374, 1231), intCol("S3", 44, 2313))

// create a MultiJoinInput array
mjArr = new MultiJoinInput[] {MultiJoinInput.of(t1, "Key=C1", "C2"), MultiJoinInput.of(t2, "Key=C1", "S2"), MultiJoinInput.of(t3, "Key=C1", "C4")}

// create multijoin object and retrieve the underlying table
result = MultiJoinFactory.of(mjArr).table()

Now only the columns specified appear in the output table.

Which method should you use?

Choosing the right join method can be tricky, so here are some things to consider when choosing between what's available:

  • How should multiple exact matches be handled?
  • How should zero exact matches be handled?
  • What data should be included in the result?
    • join includes only rows that have matching values in both tables.
    • leftOuterJoin includes all rows from the left table, and only rows from the right table that match those in the left table.
    • fullOuterJoin includes all rows from both tables.

For help in choosing a method that uses inexact matches to join tables, see here.

The following figure presents a flowchart to help choose the right join method for your query.

Do all columnshave exactmatches?Do you wantthe nearest eventbefore or after?NoYesBeforeAfterAs-of JoinReverse As-of JoinAre multiplematches expectedin the right table?JoinWhat do youwant when thereis no match?NoYesRaise an errorUse null valueExact JoinNatural JoinDo you want all rowsfrom the left table?NoYesDo you want all rows from the right table?YesNoFull outer joinLeft outer joinDo you want to perform aggregations on grouped joined data?YesRange JoinNo