Skip to main content

Choose a join method

This guide discusses the set of join methods in Deephaven and their differences. There are several methods to choose from, and while the basic syntax is the same, this article will show you how to choose which join is right for your use case.

Frequently, related data is stored in two different tables, and you'd like to create a single table containing data from both data sources. Join methods allow you to do this. Each join method produces a different result, so it is important to know which produces your desired output. If more than two tables need to be joined, then the process can be repeated.

The following flowchart will help you choose which join is right for your use case.

img

Are all of your match columns exact matches?

  • When analyzing time series, it is common to join data immediately before or immediately after an event. Such inexact matches can be performed using the aj(As-of Join) and raj(Reverse As-of Join) methods. aj joins the closest data at or before an event. raj joins closest data at or after an event.

Should all data from the left table appear in the result?

  • Many joins include all data from the left table in the result. join is different. It only includes rows that have matching values in both tables.

How should multiple exact matches be handled?

How should zero exact matches be handled?

Inexact match joins (time-series joins)#

When analyzing time series, you may want to know values immediately before or immediately after an event. Inexact matches join these approximate matches in the left and right table.

As-of Join#

aj is used to join data from the right table immediately before or at the time of an event in the left table. If there is no matching key in the right table, appended row values are null. If there are multiple matches, only the closest match is returned.

Reverse As-of Join#

raj is used to join data from the right table immediately after or at the time of an event in the left table. If there is no matching key in the right table, appended row values are null. If there are multiple matches, only the closest match is returned.

Exact match joins#

There are several methods to join when values are equal. They differ based on:

  • if all rows from the left table are included,
  • how multiple matches are handled, and
  • how zero matches are handled.

Join#

join performs an inner join and returns all possible combinations having matching records in both tables. Rows without matches in both tables do not occur in the result. Rows with multiple matches appear multiple times in the result.

from deephaven.TableTools import newTable, stringCol, intColfrom deephaven.conversion_utils import NULL_INT
left = newTable(    stringCol("Letter", "A", "B", "C"),    intCol("Number", 5, 3, 2))
right = newTable(    stringCol("Letter", "A", "A", "B", "B", "D"),    intCol("Code", 10, 12, 14, NULL_INT, 16),)
result = left.join(right, "Letter")

img

Left Join#

leftJoin groups matches into arrays. If there are multiple matches, all of the values appear in the arrays. If there are no matches, joined values are null.

from deephaven.TableTools import newTable, stringCol, intColfrom deephaven.conversion_utils import NULL_INT
left = newTable(    stringCol("Letter", "A", "B", "C"),    intCol("Number", 5, 3, 2))
right = newTable(    stringCol("Letter", "A", "A", "B", "B", "D"),    intCol("Code", 10, 12, 14, NULL_INT, 16),)
result = left.leftJoin(right, "Letter")

img

Exact Join#

exactJoin joins on matching values from the right table. If there are zero matches or multiple matches, an error is raised.

from deephaven.TableTools import newTable, stringCol, intColfrom deephaven.conversion_utils import NULL_INT
left = newTable(    stringCol("Letter", "A", "B", "C"),    intCol("Number", 5, 3, 2))
right = newTable(    stringCol("Letter", "A", "B", "C", "D"),    intCol("Code", 10, NULL_INT, 16, 18))
result = left.exactJoin(right, "Letter")

img

Natural Join#

naturalJoin joins on matching values from the right table. If there are no matches, joined values are null. If there are multiple matches, an error is raised.

naturalJoin is similar to exactJoin, but no matches results in a null value instead of an error.

from deephaven.TableTools import newTable, stringCol, intColfrom deephaven.conversion_utils import NULL_INT
left = newTable(    stringCol("Letter", "A", "B", "C"),    intCol("Number", 5, 3, 2) )
right = newTable(    stringCol("Letter", "A", "B", "C", "D"),    intCol("Code", 10, NULL_INT, 16, 18) )
result = left.naturalJoin(right, "Letter")

img

Related documentation#