join
join joins data from a pair of tables, a left table and a right table, based on a set of match columns. The match columns establish key identifiers in the left table that are used to find data in the right table. Any data type can be used as a key, and keys can be constructed from multiple values.
The output table contains rows that have matching values in both tables. Rows without a match are not included in the result. If a row from the left table matches multiple rows from the right table, all matching combinations are included. If no match columns are specified ([]), every combination of left and right table rows is included.
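The matching rules above can be pictured with a plain-Python sketch. It only illustrates the semantics and is not how Deephaven implements join; the inner_join helper and the sample rows are hypothetical.

```python
from itertools import product


def inner_join(left_rows, right_rows, keys):
    """Return every left/right row pair whose values agree on all keys."""
    return [
        {**l, **r}
        for l, r in product(left_rows, right_rows)
        if all(l[k] == r[k] for k in keys)
    ]


left_rows = [
    {"DeptID": 31, "LastName": "Rafferty"},
    {"DeptID": 36, "LastName": "Rogers"},
]
right_rows = [
    {"DeptID": 31, "DeptName": "Sales"},
    {"DeptID": 31, "DeptName": "Support"},
]

# Rafferty matches both right rows; Rogers matches none and is dropped.
matched = inner_join(left_rows, right_rows, ["DeptID"])

# With no keys, every left row is paired with every right row.
everything = inner_join(left_rows, right_rows, [])
```

Here matched holds two rows and everything holds four.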
Syntax
left.join(
    table: Table,
    on: Union[str, Sequence[str]],
    joins: Union[str, Sequence[str]] = None,
) -> Table
Parameters
Parameter | Type | Description |
---|---|---|
table | Table | The table data is added from (the right table). |
on | Union[str, Sequence[str]] | Columns from the left and right tables used to join on. |
joins optional | Union[str, Sequence[str]] | Columns from the right table to be added to the left table based on key. |
Returns
A new table containing rows that have matching values in both tables. Rows that do not have matching criteria will not be included in the result. If there are multiple matches between a row from the left table and rows from the right table, all matching combinations will be included. If no match columns are specified, every combination of left and right table rows is included.
Examples
In the following example, the left and right tables are joined on a matching column named DeptID.
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

left = new_table(
    [
        string_col(
            "LastName",
            ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"],
        ),
        int_col("DeptID", [31, 33, 33, 34, 34, 36, NULL_INT]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                "",
                "",
                "(303) 555-0160",
            ],
        ),
    ]
)

right = new_table(
    [
        int_col("DeptID", [31, 33, 34, 35]),
        string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
        string_col(
            "DeptTelephone",
            ["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"],
        ),
    ]
)

result = left.join(table=right, on=["DeptID"])
The left table has seven rows of data and the right table has four rows, but the result table has five rows. This is because the last two rows of the left table and the last row of the right table have no matches in the DeptID column, so they are not included in the result.
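The row count can be checked by hand. The sketch below uses plain Python, with None standing in for NULL_INT, and counts how many right-table rows share each left-table DeptID. It illustrates the arithmetic only and does not call Deephaven.

```python
from collections import Counter

left_ids = [31, 33, 33, 34, 34, 36, None]  # None stands in for NULL_INT
right_ids = [31, 33, 34, 35]

# How many right-table rows carry each DeptID.
right_counts = Counter(right_ids)

# Each left row contributes one result row per matching right row.
matches_per_left_row = [right_counts[k] for k in left_ids]
result_rows = sum(matches_per_left_row)

print(matches_per_left_row)  # [1, 1, 1, 1, 1, 0, 0]
print(result_rows)  # 5
```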
If the right table has columns that need renaming because their names match columns in the left table, a new column name can be supplied in the joins argument. In the following example, the right table's Telephone column is renamed to DeptTelephone.
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

left = new_table(
    [
        string_col(
            "LastName",
            ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"],
        ),
        int_col("DeptID", [31, 33, 33, 34, 34, 36, NULL_INT]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                "",
                "",
                "(303) 555-0160",
            ],
        ),
    ]
)

right = new_table(
    [
        int_col("DeptID", [31, 33, 34, 35]),
        string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
        string_col(
            "Telephone",
            ["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"],
        ),
    ]
)

# Add DeptName as is, and add the right table's Telephone as DeptTelephone.
result = left.join(
    table=right, on=["DeptID"], joins=["DeptName", "DeptTelephone = Telephone"]
)
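As a rough picture of what the joins argument asks for, the following plain-Python sketch picks columns from one right-table row and applies the New = Old renaming. The select_right_columns helper is hypothetical and is not part of the Deephaven API.

```python
def select_right_columns(right_row, joins):
    """Pick columns from a right-table row; "New = Old" renames Old to New."""
    picked = {}
    for spec in joins:
        # "DeptName" has no "=", so old is empty and the name is kept as is.
        new, _, old = (part.strip() for part in spec.partition("="))
        picked[new] = right_row[old or new]
    return picked


left_row = {"LastName": "Rafferty", "DeptID": 31, "Telephone": "(303) 555-0162"}
right_row = {"DeptID": 31, "DeptName": "Sales", "Telephone": "(303) 555-0136"}

# The left Telephone survives because the right one arrives under a new name.
joined = {
    **left_row,
    **select_right_columns(right_row, ["DeptName", "DeptTelephone = Telephone"]),
}
```

The joined row keeps the employee's Telephone and gains DeptName and DeptTelephone.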
In the following example, some DeptID values appear more than once in the left table and in the right table. For each key, the result contains every combination of the matching left and right rows.
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

left = new_table(
    [
        string_col(
            "LastName",
            ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"],
        ),
        int_col("DeptID", [31, 33, 33, 34, 34, 36, NULL_INT]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                "",
                "",
                "(303) 555-0160",
            ],
        ),
    ]
)

right = new_table(
    [
        int_col("DeptID", [31, 31, 33, 34, 35, NULL_INT]),
        string_col(
            "DeptName",
            ["Sales", "Support", "Engineering", "Clerical", "Marketing", "Safety"],
        ),
        string_col(
            "DeptTelephone",
            [
                "(303) 555-0136",
                "(303) 555-0187",
                "(303) 555-0162",
                "(303) 555-0175",
                "(303) 555-0171",
                "(303) 555-0145",
            ],
        ),
    ]
)

result = left.join(table=right, on=["DeptID"])
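The number of rows each key contributes is the count of matching left rows times the count of matching right rows. The plain-Python sketch below works this out for the non-null DeptID values in the example. Rows with a null DeptID are left out because this page does not state how null keys are matched.

```python
from collections import Counter

left_ids = [31, 33, 33, 34, 34, 36]  # non-null DeptID values, left table
right_ids = [31, 31, 33, 34, 35]  # non-null DeptID values, right table

left_counts = Counter(left_ids)
right_counts = Counter(right_ids)

# Rows produced per key: (left rows with that key) * (right rows with that key).
rows_per_key = {
    k: left_counts[k] * right_counts[k] for k in left_counts if k in right_counts
}

print(rows_per_key)  # {31: 2, 33: 2, 34: 2}
print(sum(rows_per_key.values()))  # 6
```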
In some cases, the matching columns have different names in the left and right tables. Below, the left table has a column named DeptNumber that we want to match to the column DeptID in the right table. To perform this match, the on argument must name the column from each table, as in DeptNumber = DeptID.
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

left = new_table(
    [
        string_col(
            "LastName",
            ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"],
        ),
        int_col("DeptNumber", [31, 33, 33, 34, 34, 36, NULL_INT]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                "",
                "",
                "(303) 555-0160",
            ],
        ),
    ]
)

right = new_table(
    [
        int_col("DeptID", [31, 33, 34, 35]),
        string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
        string_col(
            "DeptTelephone",
            ["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"],
        ),
    ]
)

result = left.join(table=right, on=["DeptNumber = DeptID"], joins=["DeptName"])
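The effect of DeptNumber = DeptID can be sketched in plain Python by indexing right-table rows on DeptID and looking each left-table DeptNumber up in that index. The join_on helper and the shortened sample rows are hypothetical; only DeptName is copied across, mirroring joins=["DeptName"].

```python
from collections import defaultdict


def join_on(left_rows, right_rows, left_key, right_key):
    """Match left_key in the left rows against right_key in the right rows."""
    index = defaultdict(list)
    for r in right_rows:
        index[r[right_key]].append(r)
    # Left rows with no entry in the index produce no output rows.
    return [
        {**l, "DeptName": r["DeptName"]}
        for l in left_rows
        for r in index.get(l[left_key], [])
    ]


left_rows = [
    {"LastName": "Rafferty", "DeptNumber": 31},
    {"LastName": "Rogers", "DeptNumber": 36},
]
right_rows = [
    {"DeptID": 31, "DeptName": "Sales"},
    {"DeptID": 35, "DeptName": "Marketing"},
]

result_rows = join_on(left_rows, right_rows, "DeptNumber", "DeptID")
```

Only Rafferty appears in result_rows, with DeptName set to Sales.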
If no match columns are specified, every row of the left table is joined with every row of the right table.
from deephaven import new_table
from deephaven.column import string_col, int_col
from deephaven.constants import NULL_INT

left = new_table(
    [
        string_col(
            "LastName",
            ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"],
        ),
        int_col("DeptNumber", [31, 33, 33, 34, 34, 36, NULL_INT]),
        string_col(
            "Telephone",
            [
                "(303) 555-0162",
                "(303) 555-0149",
                "(303) 555-0184",
                "(303) 555-0125",
                "",
                "",
                "(303) 555-0160",
            ],
        ),
    ]
)

right = new_table(
    [
        int_col("DeptID", [31, 33, 34, 35]),
        string_col("DeptName", ["Sales", "Engineering", "Clerical", "Marketing"]),
        string_col(
            "DeptTelephone",
            ["(303) 555-0136", "(303) 555-0162", "(303) 555-0175", "(303) 555-0171"],
        ),
    ]
)

# An empty list of match columns pairs every left row with every right row.
result = left.join(table=right, on=[])
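With no match columns, the size of the result is the product of the two table sizes. A minimal plain-Python check, using the LastName and DeptName values from the example and not calling Deephaven:

```python
from itertools import product

last_names = ["Rafferty", "Jones", "Steiner", "Robins", "Smith", "Rogers", "DelaCruz"]
dept_names = ["Sales", "Engineering", "Clerical", "Marketing"]

# Every left row paired with every right row.
pairs = list(product(last_names, dept_names))

print(len(pairs))  # 28
print(pairs[0])  # ('Rafferty', 'Sales')
```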