Understand the power of whereIn
whereIn
and whereNotIn
The whereIn
and whereNotIn
methods enable you to filter one table based on the contents of another table, which may or may not contain ticking data.
A where
clause is evaluated only when a row in the filtered table ticks. whereIn
and whereNotIn
are evaluated whenever either table changes. Join expressions, such as join
and naturalJoin
, are also evaluated when either table changes.
Sample syntax:
validResults = tableToFilter.whereIn(validValuesTable, "ColName")
The "tableToFilter" is the left table; the "validValuesTable" table is the right table. This query will filter the specified column in "tableToFilter" to only the values in the "validValueTables" and store the results in a new table, "validResults".
The whereNotIn
syntax is identical:
invalidResults = tableToFilter.whereNotIn(validValuesTable, "ColName")
As you can imagine, returns the values from the specified column that do not appear in our "validValuesTable", or the 'invalid' results.
You can also specify more than one column to filter, and the names need not be the same.
For example, if we wanted to match on USym and Expiry, but the tableToFilter table had a column named Maturity instead, we could use the following construct:
validResults=tableToFilter.whereIn(validValueTables, "USym", "Maturity=Expiry")
whereIn
is not appropriate for all situations. Its purpose is to enable more efficient filtering for a set that changes infrequently. Any time the right table (in this example "validValuesTable") ticks, all rows of the left table ("tableToFilter") must be re-evaluated. If you have a right table that often ticks, you should use a naturalJoin
instead.
Example
To demonstrate, we will create a "usymsOfInterest" table. If this were a ticking table, its contents could vary as it returns the first three distinct USyms at any given time.
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
usymsOfInterest = trades.countBy("Count", "USym").sortDescending(“Count”).head(3)
If you have another table of stock data (that might include thousands of USyms), you might want to filter that table to only show the USyms that you care about — the ones in your "usymsOfInterest" table. You can do this with whereIn()
:
interestingStockQuotes = trades.whereIn(usymsOfInterest, "USym")
This will filter the large trades table accordingly. We used the Advanced Filters dialog as a quick way to confirm "interestingStockQuotes" is limited to three USyms in our "usymsOfInterest" table.
Similarly, whereNotIn
can be used to return rows from the left table that have no match in the right table.
invalid=trades.whereNotIn(usymsOfInterest, "USym")
In this case, the invalid table contains only rows with a USym value that is not in the "usymsOfInterest" list.
whereIn
vs. naturalJoin
Unlike naturalJoin
, whereIn
can be used when there is more than one matching value in the right table for values in the left table. This is true of join
as well, but whereIn
is faster to return matching rows than join
.
Also, whereIn
only provides filtering, and does not allow adding columns from the right table. In some cases it may be desirable to use whereIn
to filter and then join
to add columns from the right table. This provides similar performance to naturalJoin
while still allowing multiple matches from the right table.
Tip
A good rule of thumb is to use whereIn
when your list of values (your right table) stays relatively stable.
The whereIn
call maintains a minimal amount of state. The set of unique values in the right table is maintained in memory. Only the Index of rows that currently match the whereIn filter is maintained for the left table. When the valid set changes, the columns that you are using to filter the left table must be scanned. When the right table changes infrequently, this is a useful trade-off. The left table is seldom scanned, and less memory and computation is used on each update.
naturalJoin
, on the other hand, must maintain a state entry for all values in both the left and right tables. This includes all keys that currently pass the filter, but also all keys that do not pass the filter. For each key, an Index data structure is maintained for the left and right side. When a value enters the right table, all of the matching left hand side rows are available and can be augmented with the corresponding right hand side row. When a value leaves the right table, all of the matching left hand side rows are available and the augmentation can be removed. This means that processing each individual left hand side update is more expensive, but the left hand table need not be scanned when the right hand side table changes.
Traders or firms often have a fairly stable set of symbols that they have a position in. Using a whereIn
clause to filter other tables to those positions is generally efficient. The query need not maintain excess state for the rare case in which the valid symbols change. However, when the positions do change, full table scans are required.
On the other hand, the set of open order IDs changes frequently. In this case, filtering a table by open order ID using whereIn
would require frequent scans. A naturalJoin
would require more in-memory state, but would be able to compute the results more incrementally.