Join

Joining data from one table to another is vital in any data system. Deephaven provides several join methods to choose from depending on your specific needs. In Deephaven, joins update in real time as new data becomes available:

  • Natural Join - naturalJoin()
  • As-of Join / Reverse-as-of Join - aj() and raj()
  • Exact Join - exactJoin()
  • Left Join - leftJoin()
  • Join - join ()

The syntax of all joins follows this pattern (where the joinMethod appears in the list above):

leftTable.joinMethod(rightTable, "Key(s)", "Column(s)ToJoinIn")

You may specify multiple ColumnsToJoinIn. Sets have quotes around the whole set, not each element:

  • Use double quotes around a single key - “Key” - or around a set - “Key1, Key2, Key3”.
  • Use double quotes around a single "ColumnToJoinIn” or around the set of “Col1, Col2, Col3”.

We'll discuss these general syntax components below.

Concepts

Left Table and Right Table

The leftTable is the starting base table to which data is joined, and the rightTable is the table from which data comes.

Tip

Join data from the right to the left.

Join Keys

The join key or keys establishes the identifier in the leftTable that will be used to find data in the rightTable. Any data type can be chosen as a key, and there can be multiple keys.

Think of keys as “columns to match” in all Deephaven Joins.

  1. Keys are chosen based on use cases: “I want to bring data to the table todayTrades, from the table yestTrades, based on the USym of each row (in todayTrades).”

       leftTable = todayTrades

       rightTable = yestTrades

       key = USym

  2. “I want to bring data to the table risk, from the table factorData, based on the Sector and FactorName of each row (in risk).”

       leftTable = risk

       rightTable = factorData

       keys = USym, FactorName

Keys can have different column names in the two tables. To rename columns within your query, use one of the following options:

  • leftTable.joinMethod(rightTable, "KeyLeftName = KeyRightName")
  • leftTable.joinMethod(rightTable, "Key1, Key2LeftName=Key2RightName")
  • left.joinMethod(right, "KeyLeft1=KeyRight1, KeyLeft2=KeyRight2")

Warning

Queries will error out if the identified Key columns do not exist in the respective tables.

Columns to Join In

You can choose what columns to bring from the rightTable to the leftTable by populating the ColumnsToJoinIn argument.

  • Bring all the non-Key columns:    leftTable.joinMethod(rightTable, "Key")
  • Bring only the ones I select:    leftTable.joinMethod(rightTable, "Key", "Col1")
  • Bring several that I select:    leftTable.joinMethod(rightTable, "Key", "Col1, Col2, ColZ")
  • Bring and rename the ones I select:    leftTable.joinMethod(rightTable, "Key", "New1=Col1, New2=Col2")

Warning

Queries will error out if either:

  1. One of the columnsToJoinIn from the rightTable already exists in the leftTable (i.e., same name).
  2. The identified column from the rightTable does not exist.

Examples

Continue on to learn about each distinct join method and their particularities.

Natural Join

The naturalJoin is somewhat like an Excel vlookup or SQL leftJoin:

  • The leftTable always retains the same number of rows with which it started.
  • If there are no matching values for a row, the appended cell(s) from the rightTable will contain NULL values.
  • The right side table can only have one row for each key of the join.

Syntax

leftTable.naturalJoin(rightTable, "Key", "ColumnsToJoinIn")

One-key naturalJoin()

leftTable = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-25`" , "isBusinessTime(ExchangeTimestamp)").lastBy("USym").view("Date", "USym", "Last")

rightTable = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-24`", "isBusinessTime(ExchangeTimestamp)").lastBy("USym").view("Date", "USym", "Last")

njThem = leftTable.naturalJoin(rightTable,"USym", "ThursLast = Last")

img

Multi-key naturalJoin()

# get Vwap prices by hour
leftTable2 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-25`" , "isBusinessTime(ExchangeTimestamp)")\
   .view("USym", "TimeBin = upperBin(ExchangeTimestamp, HOUR)", "Size", "Dollars = Last * Size")\
   .sumBy("USym", "TimeBin").updateView("Vwap = Dollars / Size")

# get the last price of each hour
rightTable2 = db.t("LearnDeephaven", "StockTrades").where("Date = `2017-08-25`" , "isBusinessTime(ExchangeTimestamp)")\
    .view("USym", "TimeBin = upperBin(ExchangeTimestamp, HOUR)", "Last", "Size")\
    .lastBy("USym", "TimeBin")

njThem2 = leftTable2.naturalJoin(rightTable2, "USym, TimeBin",\
	"Last, Last_Size = Size")

img

As-of Join and Reverse-as-of Join

Unique to Deephaven, the As-of Join and the Reverse As-Of Join methods enable users to join data from tables that do not have an exact match in the key argument. This is especially useful when analyzing temporal data.

  • As-of Join allows you to match on the closest value that is less than or equal to.
  • Reverse As-of Join allows you to match on the closest value that is equal to or greater than.

Syntax

As-of joins follow the same syntax as other joins, but the method usually involves two or more matching columns.

.aj(rightTable, "Key, LastKey", "ColumnsToJoinIn")

The initial key(s) is used to perform an exact match in both the left table and the right table, while the last key does not need to be an exact match. Instead, for that key, Deephaven finds the value in the right table that is closest to the corresponding value in the left table without going over (i.e. “same or just-earlier”). Technically, any sorted column can be used as the (last) join key, but in most cases some version of timestamp will be employed.

One-key aj()

leftTable3 = db.t("LearnDeephaven", "StockTrades")\
   .where("Date=`2017-08-25`")\
   .view("Sym", "ExchangeTimestamp", "Last")

rightTable3 = db.t("LearnDeephaven", "StockTrades")\
   .where("Date=`2017-08-25`", "USym = `GOOG`")\
   .view("ExchangeTimestamp", "Last_GOOG = Last")

ajThem3 = leftTable3.aj(rightTable3, "ExchangeTimestamp")

img

Multi-key aj()

leftTable4 = db.t("LearnDeephaven", "StockTrades")\
   .where("Date=`2017-08-25`")\
   .view("USym", "ExchangeTimestamp", "TradePrice = Last")

rightTable4 = db.t("LearnDeephaven", "StockQuotes")\
   .where("Date=`2017-08-25`")\
   .view("USym", "ExchangeTimestamp", "Bid", "Ask")

ajThem4 = leftTable4.aj(rightTable4, "USym, ExchangeTimestamp",\
    "QuoteTime = ExchangeTimestamp, Bid, Ask")\
    .updateView("Mid = (Bid + Ask) / 2")

img

raj() a table on itself

leftTable5 = db.t("LearnDeephaven", "StockTrades")\
    .where("Date=`2017-08-25`")\
    .view("USym", "ExchangeTimestamp", "TradePrice = Last")

# same table just lagged 5 minutes
rightTable5 = leftTable5\
    .updateView("TimeLag5min = ExchangeTimestamp - 5 * MINUTE")

ajOnSameTable = leftTable5.raj(rightTable5,\
    "USym, ExchangeTimestamp = TimeLag5min",\
    "TimeLater = TimeLag5min, PriceLater = TradePrice")

img

img

Exact Join

The Exact Join method:

  • Requires exactly one match from the rightTable for each of the rows in the leftTable.
  • If there are zero or multiple matches, the operation will fail.

Syntax

leftTable.exactJoin(rightTable,"Key","ColumnsToJoinIn")

from deephaven import *
leftTable6 = db.t("LearnDeephaven", "StockTrades")\
   .where("Date=`2017-08-25`")\
   .headBy(3, "Sym")\
   .view("Sym", "Last", "Size")
rightTable6 = leftTable6.by(caf.AggCombo(caf.AggAvg("AvgPrice=Last"), caf.AggSum("TotalShares=Size")), "Sym").update("TotalShares=(long)TotalShares")
ejThem =leftTable6.exactJoin(rightTable6, "Sym", "AvgPrice, TotalShares")

Join

When using the join method, Deephaven compares the leftTable to the rightTable, and for every instance that matches the key argument for both tables, Deephaven will then "join" the data requested from the right table to the left table.

  • If there is no match for a key from the leftTable in the rightTable, the row from the leftTable is removed from the result. (Note that for naturalJoin(), the joined columns are null.)
  • If there are multiple matches for a key between the two tables, multiple rows are produced; i.e., the leftTable row is repeated for each of the rightTable matches. (Note that for naturalJoin(), this query would fail.)
  • It is possible, therefore, for the results of a join() operation to have less rows or more rows than are present in the left table.

Syntax

leftTable.join(rightTable, "Key", "ColumnsToJoinIn", numRightBitsToReserve)

Note the optional argument numRightBitsToReserve. This is the number of bits to reserve for the right table rows. The default is 10; see the OutOfKeySpaceException section below for more details.

One special case of the join method is a cross-join, or cartesian product, which takes the following form:

crossJoin = leftTable.join(rightTable,"", "Col1,Col2")

Note that the second argument - the key - is empty. The result of this query is a table with every possible combination of rows from the two tables.

Note

The query below uses "rightTable6" from the previous example.

from deephaven import *

leftTable7 = db.t("LearnDeephaven", "StockTrades")\
   .where("Date=`2017-08-25`")\
   .headBy(3, "Sym")\
   .view("Sym", "Last", "Size")

rightTable7 = rightTable6.tail(9)

j1 = leftTable7.join(rightTable7, "Sym", "AvgPrice, TotalShares")

duplicate = ttools.emptyTable(1).updateView("Sym=`MSFT`", "AvgPrice=72.9", "TotalShares=(long)23")
# Note: using (long)23 ensures 23 is interpreted as type long

rtDup = ttools.merge(rightTable6, duplicate)

j2 = leftTable7.join(rtDup, "Sym", "AvgPrice, TotalShares")

Troubleshooting an OutOfKeySpace Exception

To efficiently produce updates, the bits that represent a key for a given result row are split into two. Unless specified, join reserves 10 bits to represent a right row. When there are too few bits to represent all of the right rows for a given aggregation group the table will shift a bit from the left side to the right side. The default of 10 bits was carefully chosen because it results in an efficient implementation to process live updates.

An OutOfKeySpaceException is thrown when the total number of bits needed to express the result table exceeds that needed to represent Long.MAX_VALUE. The exception provides details such as how much key-space the left and right tables are using, how sparse the tables appear to be, and how many of the bits in the result key-space are being used for each table.

There are a few workarounds:

  • If the leftTable is sparse, consider flattening the leftTable. To flatten a table invoke the no-argument flatten() method.
  • If there are no key-columns and the rightTable is sparse, consider flattening the rightTable. (Note: when there are key-columns, flattening the rightTable will not affect bits allocated to right rows.)
  • If the maximum size of a rightTable's group is small, you can reserve fewer bits by setting numRightBitsToReserve on initialization.

Left Join

  • The leftJoin() method returns the exact rows of the leftTable.
  • The data joined in from the rightTable are grouped into arrays of data.

Syntax

.leftJoin(rightTable, "Key", "ColumnsToJoinIn")

For example, a leftTable of end-of-day trade-count could be leftJoin-ed by USym with a rightTable of trades where the Price column would be an array.

leftTable = trades.countBy("TradeCount", "USym")

rightTable = trades\
    .where("isBusinessTime(ExchangeTimestamp)")\
    .renameColumns("Price = Last")

joinThem = leftTable.leftJoin(rightTable, "USym", "Price")

img