Join tables

Joins -- bringing data from one table to another -- is vital in any data system. Deephaven is no exception.

Because this is Deephaven, joins update in real time as new (and relevant) data becomes available.

Deephaven supports a few different joins:

  • These get used the most:
    • naturalJoin()
    • aj()
  • Other options:
    • join()
    • leftJoin()
    • exactJoin()

At this time, Deephaven does not support an elegant outerJoin(), though your support persons can provide a workaround.

Syntax relevant for all Join methods

The syntax of all Joins follows this pattern (where the joinMethod appears in the list above):

leftTable.joinMethod(rightTable, "Key(s)", "columnsToJoinIn")

Left and right tables

The mental map of laying out two tables and then joining them is often helpful.

The leftTable is the starting base table to which data is joined, and the rightTable is the table from which data comes.

Important

“Join data from the right to the left“ is a concept that will help your debugging and communication with colleagues.

Using quotes

  • Use double quotes around "KeyOnly" or around the set of "Key1, Key2, Key3".
  • Use double quotes around "SingleColToJoin" or around the set of "Col1, Col2, Col3".
  • Again, sets have quotes around the whole set, not each element.

Join Keys

The Join key or keys establishes the identifier in the leftTable that will be used to find data in the rightTable. Any data type can be chosen as a key, and there can be multiple keys.

Think of keys as “columns to match” in all Deephaven Joins.

  1. Keys are chosen based on use cases: “I want to bring data to the table todayTrades, from the table yestTrades, based on the USym of each row (in todayTrades).”

       leftTable = todayTrades

       rightTable = yestTrades

       key = USym

  2. “I want to bring data to the table risk, from the table factorData, based on the Sector and FactorName of each row (in risk).”

       leftTable = risk

       rightTable = factorData

       keys = USym, FactorName

Keys can have different column names in the two tables. The syntax below addresses this:

leftTable.joinMethod(rightTable, "KeyLeftName = KeyRightName")

OR

leftTable.joinMethod(rightTable, "Key1, Key2LeftName=Key2RightName")

OR

left.joinMethod(right, "KeyLeft1=KeyRight1, KeyLeft2=KeyRight2")

Warning

Queries will error out if the identified Key columns do not exist in the respective tables.

Columns to Join (onto the left from the right)

You can choose what columns to bring from the rightTable to the leftTable by populating the aforementioned columnsToJoinIn.

Various alternatives:

  • Bring all the non-Key columns:    leftTable.joinMethod(rightTable, "Key")
  • Bring only the ones I select:    leftTable.joinMethod(rightTable, "Key", "ColOne")
  • Bring several that I select:    leftTable.joinMethod(rightTable, "Key", "ColOne, ColTwo, ColZZZ")
  • Bring and rename the ones I select:    leftTable.joinMethod(rightTable, "Key", "New1=Col1, New2=Col2")

Warning

Queries will error out if either:

  1. One of the columnsToJoinIn from the rightTable already exists in the leftTable (i.e., same name).
  2. The identified column from the rightTable does not exist.

Examples using the syntax above

Below are a series of Join queries using different Join methods and the various combinations of the degrees of freedom documented above:

  • One or multiple Join keys,
  • Differently named columns serving as keys between the left and right tables,
  • Inheriting all, some, or renamed columns from the right as part of the Join process.

Setting up the examples with some source tables:

allQuotesFri = db.t("LearnDeephaven", "StockQuotes").where("Date = `2017-08-25`")

allTradesFri = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-25`")

eodPriceFri = allTradesFri.where("isBusinessTime(ExchangeTimestamp)").lastBy("USym").view("Date", "USym", "Last")

eodPriceFriGoog = eodPriceFri.where("USym = `GOOG`")

eodPriceThurs = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-24`", "isBusinessTime(ExchangeTimestamp)").lastBy("USym").view("Date", "USym", "Last")

eodPriceWed = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-23`", "isBusinessTime(ExchangeTimestamp)").lastBy("USym").view("USym", "Price = Last")

Examples of syntax -- don’t worry yet about what each one does. These all work:

oneKeyAllCols = eodPriceThurs.naturalJoin(eodPriceWed, "USym")

oneKeyChooseCol = eodPriceFri.join(eodPriceFriGoog,"Date", "GoogLast")

oneKeyRenameCols = eodPriceFri.exactJoin(eodPriceThurs, "USym", "Last_Thurs = Last")

multiKeysChooseCols = allTradesFri.aj(allQuotesFri, "Date, USym, ExchangeTimestamp", "Bid, Ask")

Natural Join

The naturalJoin is somewhat like an Excel vlookup or SQL leftJoin.

  • The leftTable always retains the same number of rows with which it started.
  • If there are no matching values for a row, the appended cell(s) from the right table will contain NULL values.
  • The rightTable can only have one row for each key of the join.

Examples of naturalJoin()

Example of a one-key naturalJoin():

leftTable = eodPriceFri
rightTable1 = eodPriceThurs

img

njThem = leftTable.naturalJoin(rightTable1,"USym", "ThursLast = Last")

img

Example of multi-key naturalJoin() with renaming of join columns:

# get Vwap prices by hour
leftTable2 = allTradesFri\
    .where("isBusinessTime(ExchangeTimestamp)")\
    .view("USym", "TimeBin = upperBin(ExchangeTimestamp, HOUR)", "Size", "Dollars = Last * Size")\
    .sumBy("USym", "TimeBin").updateView("Vwap = Dollars / Size")

# get the last price of each hour
rightTable2 = allTradesFri\
    .where("isBusinessTime(ExchangeTimestamp)")\
    .view("USym", "TimeBin = upperBin(ExchangeTimestamp, HOUR)", "Last", "Size")\
    .lastBy("USym", "TimeBin")

img

leftTable2.naturalJoin(rightTable2, "USym, TimeBin",\
    "Last, Last_Size = Size")

img

As-of Join

  • As-of joins [aj()] and reverse-as-of-joins [raj()] are generally considered time series joins.
  • Technically, any sorted column can be used as the (last) join key, but in most cases some version of timestamp will be employed.
  • As-of joins follow the same syntax as other joins, but the last join key is not used to exactly match the two tables.
  • Instead, for that key (i.e., “time” usually), Deephaven finds the value in the rightTable that is closest to the corresponding value in the leftTable without going over (i.e., “same or just-earlier”).
  • Then, as with any other join, the ColumnsToJoinIn are added to the leftTable from the rightTable.

Note

raj() is nearly identical to aj(), except the engine finds the value that is the same or just after the corresponding value (i.e., “same or just-later”).

Examples of aj and raj

Example of one key aj():

# all trades
leftTable3 = db.t("LearnDeephaven", "StockTrades")\
    .where("Date=`2017-08-25`")\
    .view("Sym", "ExchangeTimestamp", "Last")

# just GOOG trades
rightTable3 = db.t("LearnDeephaven", "StockTrades")\
    .where("Date=`2017-08-25`", "USym = `GOOG`")\
    .view("ExchangeTimestamp", "Last_GOOG = Last")

img

ajThem3 = leftTable3.aj(rightTable3, "ExchangeTimestamp")

img

Example of multi key aj() with columnToJoinIn naming:

# join Trades with Quotes

# all trades
leftTable4 = db.t("LearnDeephaven", "StockTrades")\
    .where("Date=`2017-08-25`")\
    .view("USym", "ExchangeTimestamp", "TradePrice = Last")

# all quotes
    rightTable4 = db.t("LearnDeephaven", "StockQuotes")\
    .where("Date=`2017-08-25`")\
    .view("USym", "ExchangeTimestamp", "Bid", "Ask")

img

ajThem4 = leftTable4.aj(rightTable4, "USym, ExchangeTimestamp",\
    "QuoteTime = ExchangeTimestamp, Bid, Ask")\
    .updateView("Mid = (Bid + Ask) / 2")

img

Example of raj() a table on itself:

# trades
leftTable5 = db.t("LearnDeephaven", "StockTrades")\
     .where("Date=`2017-08-25`")
     .view("USym", "ExchangeTimestamp", "TradePrice = Last")

img

# same table just lagged 5 minutes
rightTable5 = leftTable\
     .updateView("TimeLag5min = ExchangeTimestamp - 5 * MINUTE")
     ajOnSameTable = leftTable5.raj(rightTable5,\
     "USym, ExchangeTime = TimeLag5min",\
     "TimeLater = TimeLag5min, PriceLater = TradePrice")

img

Other Joins

Additional methods are available:

See our Table Operations guide for full documentation.

All follow the syntax described at the top of this document:

leftTable.joinMethod(rightTable, "Key(s)", "ColumnsToJoinIn")

where each is simply placed in the place of joinMethod.

Read on for quick, conceptual summaries for each.

join()

Two scenarios that differ from naturalJoin():

  1. There is no match for a key from the leftTable in the rightTable.
    • For naturalJoin(), the joined columns are null.
    • For join(), the row [from the leftTable] is removed from the result.
  2. There are multiple matches for a key between the two tables.
    • For naturalJoin(), the query fails.
    • For join(), multiple rows are produced -- i.e., the leftTable row is repeated for each of the rightTable matches.

It is possible, therefore, for the results of a join() operation to have less rows or more rows than are present in the leftTable.

exactJoin()

  • Requires exactly one match from the rightTable for each of the rows in the leftTable.
  • If there are zero or multiple matches, the operation will fail.

leftJoin()

  • This returns the exact rows of the leftTable.
  • The data joined in from the rightTable are grouped into arrays of data.

So, for example, a leftTable of end-of-day trade-count could be leftJoin-ed by USym with a rightTable of trades where the Price column would be an array.

leftTable = trades.countBy("TradeCount", "USym")
rightTable = trades\
    .where("isBusinessTime(ExchangeTimestamp)")\
    .renameColumns("Price = Last")
joinThem = leftTable.leftJoin(rightTable, "USym", "Price")

img