Join tables
Joins -- bringing data from one table to another -- is vital in any data system. Deephaven is no exception.
Because this is Deephaven, joins update in real time as new (and relevant) data becomes available.
Deephaven supports a few different joins:
- These get used the most:
naturalJoin()
aj()
- Other options:
join()
leftJoin()
exactJoin()
At this time, Deephaven does not support an elegant outerJoin()
, though your support persons can provide a workaround.
Syntax relevant for all Join methods
The syntax of all Joins follows this pattern (where the joinMethod
appears in the list above):
leftTable.joinMethod(rightTable, "Key(s)", "columnsToJoinIn")
Left and right tables
The mental map of laying out two tables and then joining them is often helpful.
The leftTable
is the starting base table to which data is joined, and the rightTable
is the table from which data comes.
Important
“Join data from the right to the left“ is a concept that will help your debugging and communication with colleagues.
Using quotes
- Use double quotes around
"KeyOnly"
or around the set of"Key1, Key2, Key3"
. - Use double quotes around
"SingleColToJoin"
or around the set of"Col1, Col2, Col3"
. - Again, sets have quotes around the whole set, not each element.
Join Keys
The Join key or keys establishes the identifier in the leftTable
that will be used to find data in the rightTable
. Any data type can be chosen as a key, and there can be multiple keys.
Think of keys as “columns to match” in all Deephaven Joins.
-
Keys are chosen based on use cases: “I want to bring data to the table todayTrades, from the table yestTrades, based on the USym of each row (in todayTrades).”
leftTable = todayTrades
rightTable = yestTrades
key = USym
-
“I want to bring data to the table risk, from the table factorData, based on the Sector and FactorName of each row (in risk).”
leftTable = risk
rightTable = factorData
keys = USym, FactorName
Keys can have different column names in the two tables. The syntax below addresses this:
leftTable.joinMethod(rightTable, "KeyLeftName = KeyRightName")
OR
leftTable.joinMethod(rightTable, "Key1, Key2LeftName=Key2RightName")
OR
left.joinMethod(right, "KeyLeft1=KeyRight1, KeyLeft2=KeyRight2")
Warning
Queries will error out if the identified Key columns do not exist in the respective tables.
Columns to Join (onto the left from the right)
You can choose what columns to bring from the rightTable
to the leftTable
by populating the aforementioned columnsToJoinIn
.
Various alternatives:
- Bring all the non-Key columns:
leftTable.joinMethod(rightTable, "Key")
- Bring only the ones I select:
leftTable.joinMethod(rightTable, "Key", "ColOne")
- Bring several that I select:
leftTable.joinMethod(rightTable, "Key", "ColOne, ColTwo, ColZZZ")
- Bring and rename the ones I select:
leftTable.joinMethod(rightTable, "Key", "New1=Col1, New2=Col2")
Warning
Queries will error out if either:
- One of the
columnsToJoinIn
from therightTable
already exists in theleftTable
(i.e., same name). - The identified column from the
rightTable
does not exist.
Examples using the syntax above
Below are a series of Join queries using different Join methods and the various combinations of the degrees of freedom documented above:
- One or multiple Join keys,
- Differently named columns serving as keys between the left and right tables,
- Inheriting all, some, or renamed columns from the right as part of the Join process.
Setting up the examples with some source tables:
allQuotesFri = db.t("LearnDeephaven", "StockQuotes").where("Date = `2017-08-25`")
allTradesFri = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-25`")
eodPriceFri = allTradesFri.where("isBusinessTime(ExchangeTimestamp)").lastBy("USym").view("Date", "USym", "Last")
eodPriceFriGoog = eodPriceFri.where("USym = `GOOG`")
eodPriceThurs = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-24`", "isBusinessTime(ExchangeTimestamp)").lastBy("USym").view("Date", "USym", "Last")
eodPriceWed = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-23`", "isBusinessTime(ExchangeTimestamp)").lastBy("USym").view("USym", "Price = Last")
Examples of syntax -- don’t worry yet about what each one does. These all work:
oneKeyAllCols = eodPriceThurs.naturalJoin(eodPriceWed, "USym")
oneKeyChooseCol = eodPriceFri.join(eodPriceFriGoog,"Date", "GoogLast")
oneKeyRenameCols = eodPriceFri.exactJoin(eodPriceThurs, "USym", "Last_Thurs = Last")
multiKeysChooseCols = allTradesFri.aj(allQuotesFri, "Date, USym, ExchangeTimestamp", "Bid, Ask")
Natural Join
The naturalJoin
is somewhat like an Excel vlookup or SQL leftJoin.
- The
leftTable
always retains the same number of rows with which it started. - If there are no matching values for a row, the appended cell(s) from the right table will contain NULL values.
- The
rightTable
can only have one row for each key of the join.
Examples of naturalJoin()
Example of a one-key naturalJoin()
:
leftTable = eodPriceFri
rightTable1 = eodPriceThurs
njThem = leftTable.naturalJoin(rightTable1,"USym", "ThursLast = Last")
Example of multi-key naturalJoin()
with renaming of join columns:
# get Vwap prices by hour
leftTable2 = allTradesFri\
.where("isBusinessTime(ExchangeTimestamp)")\
.view("USym", "TimeBin = upperBin(ExchangeTimestamp, HOUR)", "Size", "Dollars = Last * Size")\
.sumBy("USym", "TimeBin").updateView("Vwap = Dollars / Size")
# get the last price of each hour
rightTable2 = allTradesFri\
.where("isBusinessTime(ExchangeTimestamp)")\
.view("USym", "TimeBin = upperBin(ExchangeTimestamp, HOUR)", "Last", "Size")\
.lastBy("USym", "TimeBin")
leftTable2.naturalJoin(rightTable2, "USym, TimeBin",\
"Last, Last_Size = Size")
As-of Join
- As-of joins [
aj()
] and reverse-as-of-joins [raj()
] are generally considered time series joins. - Technically, any sorted column can be used as the (last) join key, but in most cases some version of timestamp will be employed.
- As-of joins follow the same syntax as other joins, but the last join key is not used to exactly match the two tables.
- Instead, for that key (i.e., “time” usually), Deephaven finds the value in the
rightTable
that is closest to the corresponding value in theleftTable
without going over (i.e., “same or just-earlier”). - Then, as with any other join, the
ColumnsToJoinIn
are added to theleftTable
from therightTable
.
Note
raj()
is nearly identical to aj()
, except the engine finds the value that is the same or just after the corresponding value (i.e., “same or just-later”).
Examples of aj
and raj
Example of one key aj()
:
# all trades
leftTable3 = db.t("LearnDeephaven", "StockTrades")\
.where("Date=`2017-08-25`")\
.view("Sym", "ExchangeTimestamp", "Last")
# just GOOG trades
rightTable3 = db.t("LearnDeephaven", "StockTrades")\
.where("Date=`2017-08-25`", "USym = `GOOG`")\
.view("ExchangeTimestamp", "Last_GOOG = Last")
ajThem3 = leftTable3.aj(rightTable3, "ExchangeTimestamp")
Example of multi key aj()
with columnToJoinIn
naming:
# join Trades with Quotes
# all trades
leftTable4 = db.t("LearnDeephaven", "StockTrades")\
.where("Date=`2017-08-25`")\
.view("USym", "ExchangeTimestamp", "TradePrice = Last")
# all quotes
rightTable4 = db.t("LearnDeephaven", "StockQuotes")\
.where("Date=`2017-08-25`")\
.view("USym", "ExchangeTimestamp", "Bid", "Ask")
ajThem4 = leftTable4.aj(rightTable4, "USym, ExchangeTimestamp",\
"QuoteTime = ExchangeTimestamp, Bid, Ask")\
.updateView("Mid = (Bid + Ask) / 2")
Example of raj()
a table on itself:
# trades
leftTable5 = db.t("LearnDeephaven", "StockTrades")\
.where("Date=`2017-08-25`")
.view("USym", "ExchangeTimestamp", "TradePrice = Last")
# same table just lagged 5 minutes
rightTable5 = leftTable\
.updateView("TimeLag5min = ExchangeTimestamp - 5 * MINUTE")
ajOnSameTable = leftTable5.raj(rightTable5,\
"USym, ExchangeTime = TimeLag5min",\
"TimeLater = TimeLag5min, PriceLater = TradePrice")
Other Joins
Additional methods are available:
See our Table Operations guide for full documentation.
All follow the syntax described at the top of this document:
leftTable.joinMethod(rightTable, "Key(s)", "ColumnsToJoinIn")
where each is simply placed in the place of joinMethod
.
Read on for quick, conceptual summaries for each.
join()
Two scenarios that differ from naturalJoin()
:
- There is no match for a key from the
leftTable
in therightTable
.- For
naturalJoin()
, the joined columns are null. - For
join()
, the row [from theleftTable
] is removed from the result.
- For
- There are multiple matches for a key between the two tables.
- For
naturalJoin()
, the query fails. - For
join()
, multiple rows are produced -- i.e., theleftTable
row is repeated for each of therightTable
matches.
- For
It is possible, therefore, for the results of a join()
operation to have less rows or more rows than are present in the leftTable
.
exactJoin()
- Requires exactly one match from the
rightTable
for each of the rows in theleftTable
. - If there are zero or multiple matches, the operation will fail.
leftJoin()
- This returns the exact rows of the
leftTable
. - The data joined in from the
rightTable
are grouped into arrays of data.
So, for example, a leftTable
of end-of-day trade-count could be leftJoin-ed by USym with a rightTable
of trades where the Price column would be an array.
leftTable = trades.countBy("TradeCount", "USym")
rightTable = trades\
.where("isBusinessTime(ExchangeTimestamp)")\
.renameColumns("Price = Last")
joinThem = leftTable.leftJoin(rightTable, "USym", "Price")