raj
raj
, reverse-as-of join, joins data from a pair of tables - a left and right table - based upon one or more match columns. The match columns establish key identifiers in the left table that will be used to find data in the right table. Any data types can be chosen as keys.
raj
is an inexact match join. For columns appended to the left table, row values equal those from the right table where the keys from the left table most closely match the keys from the right table without going under. If there is no matching key in the right table, appended row values are NULL
.
raj
is typically used in cases where no exact match between key column row values is guaranteed. A common use case is joining data on date-time columns to find rows where the time of measurement in the right table is the closest after that in the left table.
When using raj
, the first N-1
match columns are exactly matched. The last match column is used to find the key values from the right table that are closest to the values in the left table without going under the left value. For example, if the right table contains a value 5
and the left table contains values 4
and 6
, the left table's 4
will be matched on the right table's 5
.
The output table contains all of the rows and columns of the left table plus additional columns containing data from the right table. For columns appended to the left table, row values equal the row values from the right table where the keys from the left table most closely match the keys from the right table, as defined above. If there is no matching key in the right table, appended row values are NULL
.
Syntax
left.raj(
table: Table,
on: Union[str, Sequence[str]],
joins: Union[str, Sequence[str]] = None,
) -> Table
Parameters
Parameter | Type | Description |
---|---|---|
table | Table | The table data is added from (the right table). |
on | Union[str, Sequence[str]] | Columns from the left and right tables used to join on.
The first |
joins optional | Union[str, Sequence[str]] | Columns from the right table to be added to the left table based on key may be specified in this list:
|
Returns
A new table containing all of the rows and columns of the left table plus additional columns containing data from the right table. For columns appended to the left table, row values equal the row values from the right table where the keys from the left table most closely match the keys from the right table, as defined above. If there is no matching key in the right table, appended row values are NULL
.
Examples
The first two examples join a left
and right
table on numeric columns named X
and Y
, respectively. Each column contains the integer row index.
In this example, no join
columns are given, so all columns from the right
table are appended in result
. Every row in the left
table's X
column has a row in the right
table where the criteria (X <= Y
) is met, so the result
table has all data from right
appended.
from deephaven import empty_table
left = empty_table(10).update(["X = i", "LeftVals = randomInt(1, 100)"])
right = empty_table(10).update(["Y = i", "RightVals = randomInt(1, 100)"])
result = left.raj(table=right, on=["X <= Y"])
- left
- right
- result
The following example uses X < Y
instead of X <= Y
as the on
parameter. Thus, the last row in X
no longer has a corresponding row in Y
where the criteria is met. Thus, NULL
values are appended in the last row of result
.
from deephaven import empty_table
left = empty_table(10).update(["X = i", "LeftVals = randomInt(1, 100)"])
right = empty_table(10).update(["Y = i", "RightVals = randomInt(1, 100)"])
result = left.raj(table=right, on=["X < Y"])
- left
- right
- result
The next two examples look at stock quotes and trades. Quotes are the published prices and sizes at which people are willing to trade a security, while trades are the prices and sizes of actual trades. raj
is used to find the first quote immediately after.
The following example joins all quote columns onto the trade table.
from deephaven import new_table
from deephaven.column import string_col, int_col, double_col, datetime_col
from deephaven.time import to_j_instant
trades = new_table(
[
string_col("Ticker", ["AAPL", "AAPL", "AAPL", "IBM", "IBM"]),
datetime_col(
"Timestamp",
[
to_j_instant("2021-04-05T09:10:00 ET"),
to_j_instant("2021-04-05T09:31:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:30:00 ET"),
],
),
double_col("Price", [2.5, 3.7, 3.0, 100.50, 110]),
int_col("Size", [52, 14, 73, 11, 6]),
]
)
quotes = new_table(
[
string_col("Ticker", ["AAPL", "AAPL", "IBM", "IBM", "IBM"]),
datetime_col(
"Timestamp",
[
to_j_instant("2021-04-05T09:11:00 ET"),
to_j_instant("2021-04-05T09:30:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:30:00 ET"),
to_j_instant("2021-04-05T17:00:00 ET"),
],
),
double_col("Bid", [2.5, 3.4, 97, 102, 108]),
int_col("BidSize", [10, 20, 5, 13, 23]),
double_col("Ask", [2.5, 3.4, 105, 110, 111]),
int_col("AskSize", [83, 33, 47, 15, 5]),
]
)
result = trades.raj(table=quotes, on=["Ticker", "Timestamp"])
- trades
- quotes
- result
The following example illustrates the exclusion of exact matches by using <
rather than <=
. The date-time columns are named differently, as specified in the on
input parameter.
from deephaven import new_table
from deephaven.column import string_col, int_col, double_col, datetime_col
from deephaven.time import to_j_instant
trades = new_table(
[
string_col("Ticker", ["AAPL", "AAPL", "AAPL", "IBM", "IBM"]),
datetime_col(
"Datetime",
[
to_j_instant("2021-04-05T09:10:00 ET"),
to_j_instant("2021-04-05T09:31:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:30:00 ET"),
],
),
double_col("Price", [2.5, 3.7, 3.0, 100.50, 110]),
int_col("Size", [52, 14, 73, 11, 6]),
]
)
quotes = new_table(
[
string_col("Ticker", ["AAPL", "AAPL", "IBM", "IBM", "IBM"]),
datetime_col(
"Timestamp",
[
to_j_instant("2021-04-05T09:11:00 ET"),
to_j_instant("2021-04-05T09:30:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:30:00 ET"),
to_j_instant("2021-04-05T17:00:00 ET"),
],
),
double_col("Bid", [2.5, 3.4, 97, 102, 108]),
int_col("BidSize", [10, 20, 5, 13, 23]),
double_col("Ask", [2.5, 3.4, 105, 110, 111]),
int_col("AskSize", [83, 33, 47, 15, 5]),
]
)
result = trades.raj(table=quotes, on=["Ticker", "Datetime < Timestamp"])
- trades
- quotes
- result
The following example illustrates joining on columns of different names as well as joining a subset of columns.
from deephaven import new_table
from deephaven.column import string_col, int_col, double_col, datetime_col
from deephaven.time import to_j_instant
trades = new_table(
[
string_col("Ticker", ["AAPL", "AAPL", "AAPL", "IBM", "IBM"]),
datetime_col(
"TradeTime",
[
to_j_instant("2021-04-05T09:10:00 ET"),
to_j_instant("2021-04-05T09:31:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:30:00 ET"),
],
),
double_col("Price", [2.5, 3.7, 3.0, 100.50, 110]),
int_col("Size", [52, 14, 73, 11, 6]),
]
)
quotes = new_table(
[
string_col("Ticker", ["AAPL", "AAPL", "IBM", "IBM", "IBM"]),
datetime_col(
"QuoteTime",
[
to_j_instant("2021-04-05T09:11:00 ET"),
to_j_instant("2021-04-05T09:30:00 ET"),
to_j_instant("2021-04-05T16:00:00 ET"),
to_j_instant("2021-04-05T16:30:00 ET"),
to_j_instant("2021-04-05T17:00:00 ET"),
],
),
double_col("Bid", [2.5, 3.4, 97, 102, 108]),
int_col("BidSize", [10, 20, 5, 13, 23]),
double_col("Ask", [2.5, 3.4, 105, 110, 111]),
int_col("AskSize", [83, 33, 47, 15, 5]),
]
)
result = trades.raj(table=quotes, on=["Ticker"], joins=["Bid", "Offer = Ask"])
- trades
- quotes
- result