---
title: Filter
---

> [!WARNING]
> Legacy documentation: This documentation applies to **Legacy Deephaven Enterprise only** and does not apply to Core+.

> [!NOTE]
> For Core+ workers, see Community Core documentation for [where](/core/docs/reference/table-operations/filter/where).

Filtering is a way to exclude data you do not want to see or use in your analysis. Because less data gets loaded, filtering reduces the processing times for queries so you can work faster.

Filtering is accomplished using the where method and other related methods described below

> [!TIP]
> We recommended that all your queries include a filter on the table's partitioning column.

## `where()`

The `where` method returns all the rows from the source table that meet the conditions in the argument:

    .where("USym = `AAPL`")

For example:

```python
t = (
    db.t("LearnDeephaven", "StockQuotes")
    .where("Date = `2017-08-25`")
    .where("USym = `AAPL`")
)
```

You could also write this like:

```python
t = db.t("LearnDeephaven", "StockQuotes").where("Date = `2017-08-25`", "USym = `AAPL`")
```

**Filters are applied from left to right.** Therefore, the order in which they are passed to the function can have a substantial impact on the execution time.

For example, if we moved the "USym" filter first, Deephaven would evaluate that filter for all partitions of the table before reducing the data to a single date.

> [!CAUTION]
> Example of bad ordering
>
> ```python
> t = db.t("LearnDeephaven", "StockQuotes").where("USym = `AAPL`", "Date = `2017-08-25`")
> ```

So far, the examples used for describing the `where` method have involved a table with only a couple thousand rows of data. Filtering on such a small table is a trivial task for Deephaven. However, when your table size grows to thousands or millions of rows (or more), you will want to ensure you are filtering the data in the most efficient method to reduce compute expense and execution time.

## Partitioning and Grouping Columns

When you partition a hard drive, you are dividing the entire hard drive into multiple, logical, independent volumes. In a high-end data system like Deephaven, data is also divided into multiple, logical and independent partitions, which improves the speed, performance and efficiency of the data analysis.

The most common **partitionioning column** is Date. For example, the "StockTrades" table in the "LearnDeephaven" namespace contains data from August 21, 2017 - August 25, 2017. Instead of having one massive file that contains every record for that business week, the data is stored in 5 smaller partitions. In most cases, you will want to analyze only a specific range of time. In Deephaven, you can do so without analyzing or loading the entire dataset stored in that table.

The Deephaven query language is designed to make filtering on partitioning columns highly efficient. Therefore, when possible, one should filter first on partitioning columns.

**Grouping columns** determine how table data is organized on disk such that each unique value, and its related row data, are located in contiguous blocks of rows. These have persistent indexes that the Deephaven query engine can leverage to vastly improve the efficiency of match filters.

If your analyses require filtering on the data contained in both partitioning columns and grouping columns, the most efficient method would be to first filter on partitioning columns, and then separately filter on grouping columns and/or any remaining columns as needed.
To determine the partitioning and grouping columns in your dataset, you will need to review the metadata for your applicable table(s).

One way this can be accomplished is by using the `getMeta` method, which will return a table with column details:

```python
tMeta = db.t("LearnDeephaven", "StockTrades").getMeta()
```

> [!TIP]
> If a filter operation is the cause of a failed query, the exception in the Deephaven interface will prompt you to clear filters with the accompanying **Clear Filtering** button or access to the right-click menu.

## Match Filters

Because match filters benefit from special handling in Deephaven, you should use them whenever possible and before any conditional filters. Match filters often enable Deephaven to perform optimizations that would not be possible with a potentially more expressive conditional filter. For example, grouping information on historical data sources can be used with match filters, but not conditional filters.

There are five kinds of match filters in Deephaven:

- `=` (the equal sign)
- `in`
- `not in`
- `icase in`
- `icase not in`

> [!CAUTION]
> Match filters must be used individually. They cannot be combined in the same filter string.

### `=` (the equal sign)

This method returns rows that have a matching value in the specified column.

`.where("columnName=value")`

**Example**

```python
t = db.t("LearnDeephaven", "StockQuotes").where("Date = `2017-08-25`", "USym = `AAPL`")
```

### `in`

This method returns rows that contain a match of one or more values in the specified column.

`.where("columnName in valueList")`

**Example**

```python
t = (
    db.t("LearnDeephaven", "StockQuotes")
    .where("Date = `2017-08-25`")
    .where("USym in `AAPL` , `GOOG`")
)
```

### `not in`

This method returns rows that do not contain a match of one or more values in the specified column.

`.where("columnName not in valueList")`

**Example**

```python
t = (
    db.t("LearnDeephaven", "StockQuotes")
    .where("Date = `2017-08-25`")
    .where("USym not in `AAPL` , `GOOG`")
)
```

> [!TIP]
> A backtick is used for a string-within-a-string; e.g., ``“Date=`some-other-string`”``. A single quotation mark is used for a single character; e.g., ‘a’, as opposed to a string containing a single character, “a”

### `icase in`

This method returns rows that contain a match of one or more values in the specified column regardless of the capitalization of the values.

`.where("columnName icase in valueList")`

**Example**

```python
t = (
    db.t("LearnDeephaven", "StockQuotes")
    .where("Date = `2017-08-25`")
    .where("USym icase in `aapl` , `goog`")
)
```

### `icase not in`

This method returns rows that do not contain a match of one or more values in the specified column regardless of the capitalization of the values.

`.where("columnName icase not in valueList")`

**Example**

```python
t = (
    db.t("LearnDeephaven", "StockQuotes")
    .where("Date = `2017-08-25`")
    .where("USym icase not in `aapl` , `goog`")
)
```

### Using Variables within Match Filters

The right side of a match filter using the "`in`" statement allows for use of variables as follows:

- list of variables: `A in X, Y, Z` - the filter will return `true` for all the rows where `A` is equal to `X`, `Y`, or `Z`
- single variable name: `A in X`:
  - If `X` is a Java array or `java.util.Collection`, the filter will return `true` for all the rows where `A` is equal to one element of `X`
  - For all other types of `X`, the filter will return `true` for all the rows where `A` is equal to `X`

Conversely, the right side of a "`not in`" statement allows for use of variables as follows:

- list of variables: `A not in X, Y, Z` - the filter will return `true` for all the rows where `A` is not equal to `X`, `Y`, or `Z`
- single variable name `A not in X`:
  - If `X` is a Java array or Collection, the filter will return `true` for all the rows where `A` is not equal to one element of `X`
  - For all other types of `X`, the filter will return `true` for all the rows where `A` is not equal to `X`

## Conditional Filters

Conditional filters can be used to filter data based on formulas other than those included in match filters. Conditional filters are not optimized like match filters. Therefore they should be placed after match filters in a given `where` clause. Conditional filters can be any arbitrary Java expression evaluator, including:

- `where("x==y")`
- `where("x>y")`
- `where("x<y")`
- `startsWith()`
- `endsWith()`

#### Example 1

```python
t1 = db.t("LearnDeephaven", "StockQuotes").where("Date = `2017-08-25`").where("Bid>30")
```

As you can see below, the lowest value in "t1" is now 30.79.

![img](../../assets/coreops/greaterthan.png)

#### Example 2

```python
t2 = (
    db.t("LearnDeephaven", "StockQuotes").where("Date = `2017-08-25`").where("Bid%2==0")
)
```

![img](../../assets/coreops/even.png)

#### Example 3

The next example uses formulas on multiple columns.

```python
t3 = (
    db.t("LearnDeephaven", "StockQuotes")
    .where("Date = `2017-08-25`")
    .where("BidSize%2==1", "Exchange.startsWith(`N`)")
)
```

![img](../../assets/coreops/multiple.png)

Here, StockQuotes is filtered to rows where:

- the value in the BidSize column has a remainder of 1 after being divided by 2, and
- the value in the Exchange column starts with the string N.

> [!NOTE]
> Because this is Java code, all the normal constraints of Java code must be respected. For example, in the filter `C.startsWith(`AA`)`, the variable `C`, may not be `NULL` otherwise a `NullPointerException` will result. You must keep your condition filter code consistent with your data. If you expect `NULL` values, you must first check for null values (e.g., `C != null && C.startsWith(`AA`)`).

### Filtering with Regexes

When working with Strings, it is often useful to be able to filter your data down using more complicated patterns than 'starts with' or 'ends with.' For these situations, the `RegexFilter` may be useful.

The general syntax follows:

`.where(new com.illumon.iris.db.v2.select.RegexFilter("ColumnName", "<regex>"))`

> [!NOTE]
> If you're not familiar with Regexes, or Regular Expressions, please see the Java documentation on the [`java.util.regex package`](https://docs.oracle.com/javase/8/docs/api/index.html?java/util/regex/package-summary.html); a full treatment of regexes is too extensive to include in this page.

#### Example

```python skip-test
t2 = db.t("LearnDeephaven" , "StockQuotes")\
 .where("Date = `2017-08-25`")\
 .where(new com.illumon.iris.db.v2.select.RegexFilter("USym", "C..O"))
```

This example will find all rows where the USym column has a value starting with `C` and ending with `O`, with exactly two characters in between, such as `CSCO` or `C12O` or even `C O`.

## Conjunctive and Disjunctive Filtering

When filtering on multiple columns in a table, your query can be written so they work on a conjunctive basis or a disjunctive basis.

### Conjunctive

In conjunctive filtering, all filters within a `where()` clause are evaluated. For example, two filters are working conjunctively in the following:

```python
t4 = (
    db.t("LearnDeephaven", "StockQuotes")
    .where("Date = `2017-08-25`")
    .where("Bid>30", "Exchange.startsWith(`N`)")
)
```

In this case, the filter returns rows only when the value in the Bid column is greater than 25, and the value in the Exchange column starts with the string N.

The following sample queries all use equivalent variations of conjunctive filtering:

**Example 1**

```python test-set=1 skip-test
t1 = t.where("A in firstSet", "B in secondSet")
```

`t1` will evaluate, using an optimized match filter, `"A in firstSet"` and then only for values which match the `A` filter, it will evaluate `"B in secondSet"`.

**Example 2**

Assuming `firstSet` and `secondSet` are collections, the query could also be written with a condition filter as:

```python test-set=1 skip-test
t2 = t.where("firstSet.contains(A)", "secondSet.contains(B)")
```

`t2` is logically the same, but instead of using an optimized match filter, it will evaluate the "A" column for each row and pass it to the condition filter `"firstSet.contains(A)"`. For rows that pass this filter, it will evaluate the "B" column, and pass those values to `"secondSet.contains(B)"`.

**Example 3**

This query could also be written as:

```python test-set=1 skip-test
t3 = t.where("firstSet.contains(A) && secondSet.contains(B)")
```

`t3` will produce the same output as `t2`, but the "A" and "B" columns are evaluated and passed into the filter expression (`"firstSet.contains(A) && secondSet.contains(B)"`).

### Disjunctive

In disjunctive filtering, each filter clause is evaluated independently, and results are presented if any of the filters return results.
For example, two filters are working disjunctively in this clause:

```python test-set=2 skip-test
t5 = (
    db.t("LearnDeephaven", "StockQuotes")
    .where("Date = `2017-08-25`")
    .where("Bid>30 || Exchange.startsWith(`N`)")
)
```

In this case, the filter will present rows only when the value in the Bid column is greater than 30, or the value in the Exchange column starts with the string N, or both.

The following syntax can also be used to indicate disjunctive filtering:

```python test-set=2 skip-test
t6 = t.whereOneOf("Bid>25", "Exchange.startsWith(`N`)")
```

## Composing Complex Where Clauses

- When building a query that applies multiple filters to the same data, the best approach is to chain multiple filter clauses within the same where method.
- When you use any filter on a partitioning column or when you specifically use a match filter on a grouping column, that filter should be an independent clause.
- If you have a complex formula, then it is best to combine the clauses if and only if they reference the same column(s).
- Multiple clauses can be combined into single where method calls or separated into multiple where method calls. The results and processing effort are the same, but multiple where clauses may improve readability of the query.
- Order matters and you should generally put the more selective or more efficient filters first, although the exact performance may vary depending on the filters themselves or the data being filtered. In many cases, doing so allows the query engine to avoid overhead.

The following `where` clauses illustrate these differences:

**Example 1** ✅

`where("Date=`2011-03-21`","USym==`AAPL`", "BidMkt > 100")`

The partitioning column (Date) is evaluated first, followed by the USym filter.

**Example 2** 🚫

`where("USym==`AAPL`", "Date=`2011-03-21`", "BidMkt > 100")`

The USym filter is evaluated first, without having first pruned off irrelevant partitions. Every row of every partition's USym column must be examined, which takes longer and uses more computing power.

**Example 3** 🚫

`where("Date=`2011-03-21`", "BidMkt > 100","USym==`AAPL`")`

The partitioning column (Date) is evaluated first, which is good. However, the USym filter can be executed more efficiently on grouped data, so that filter should be placed ahead of the BidMkt filter.

**Example 4** ✅

`where("Date=`2011-03-21`", "BidMkt > 100 && BidMkt < 200")`

The partitioning column (Date) is evaluated first, and there is only a single formula evaluation for the conditions on BidMck.

**Example 5** ⚠

`where("Date=`2011-03-21`", "BidMkt > 100", "BidMkt < 200")`

The partitioning column (Date) is evaluated first, but two formulas must be evaluated (one for BidMkt > 100, and one for BidMkt < 200) rather than one.

**Example 6** ⚠

`where("Date=`2011-03-21`").where("BidMkt > 100 && BidMkt < 200")`

The partitioning column (Date) is evaluated first, but an intermediate table is created by the secondary `where` clause.

**Example 7** ✅

`where("Date=`2011-03-21`", "USym ==`AAPL`", "BidMkt > 100 && BidMkt < 200")`

The partitioning column (Date) is evaluated first, and the USym filter and the BidMkt filter are independently evaluated in the best order.

**Example 8** 🚫

`where("Date=`2011-03-21`", "USym ==`AAPL`&& BidMkt > 100 && BidMkt > 200")`

The partitioning column (Date) is evaluated first, but "USym == AAPL" cannot be applied as a match filter, because it is part of a more complex formula.

## whereIn and whereNotIn

The `whereIn` and `whereNotIn` methods enable you to filter one table based on the contents of another table, which may or may not contain ticking data.

A `where` clause is evaluated only when a row in the filtered table ticks. `whereIn` and `whereNotIn` are evaluated whenever either table changes. Join expressions, such as `join` and `naturalJoin` are also evaluated when either table changes.

The syntax follows:

`validResults = tableToFilter.whereIn(validValuesTable, "ColName")`

The "tableToFilter" table is the left table; the "validValuesTable" table is the right table. This query will filter the specified column in "tableToFilter" to only the values in the "validValueTables" and store the results in a new table, "validResults".

The `whereNotIn` syntax is identical:

`invalidResults = tableToFilter.whereNotIn(validValuesTable, "ColName")`

As you can imagine, `whereNotIn` returns the values from the specified column that do not appear in our "validValuesTable", or the "invalid" results.

You can also specify more than one column to filter, and the names need not be the same.

For example, if we wanted to match on USym and Expiry, but the "tableToFilter" table had a column named Maturity instead, we could use the following construct:

`validResults=tableToFilter.whereIn(validValueTables, "USym", "Maturity=Expiry")`

> [!TIP]
> `whereIn` is not appropriate for all situations. Its purpose is to enable more efficient filtering for a set that changes infrequently. Any time the right table (in this example, "validValuesTable") ticks, all rows of the left table ("tableToFilter") must be re-evaluated. If you have a right table that often ticks, you should use a `naturalJoin` instead.

**Example**

To demonstrate, we will create a "usymsOfInterest" table. If this were a ticking table, its contents could vary as it returns the first three distinct USyms at any given time. Then, we'll filter another table of stock data (that might include thousands of USyms), to only show the USyms that we care about — the ones in your "usymsOfInterest" table. We'll also create a table with the invalid results.

```python
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")

usymsOfInterest = trades.countBy("Count", "USym").sortDescending("Count").head(3)

interestingStockQuotes = trades.whereIn(usymsOfInterest, "USym")

invalid = trades.whereNotIn(usymsOfInterest, "USym")
```

![img](../../assets/coreops/usymsofinterest.png)

In the image below, we use the **Advanced Filters** dialog as a quick way to confirm "interestingStockQuotes" is limited to the three USyms in our "usymsOfInterest" table.

![img](../../assets/coreops/whereIn.png)

The invalid table contains only rows with a USym value that is not in the "usymsOfInterest" list.

![img](../../assets/coreops/invalidresults.png)

Unlike `naturalJoin`, `whereIn` can be used when there are more than one matching value in the right table for values in the left table. This is true of `join` as well, but `whereIn` is faster to return matching rows than `join`.

Also, `whereIn` only provides filtering, and does not allow adding columns from the right table. In some cases it may be desirable to use `whereIn` to filter and then `join` to add columns from the right table. This provides similar performance to `naturalJoin` while still allowing multiple matches from the right table.

## Head & Tail Filtering

Head and tail filters are used to return the first or last set of rows of a table by specifying the number of rows desired or by specifying the percent of the table.

> [!NOTE]
> See also:
> [Dedicated Aggregators](./aggregate.md#dedicated-aggregators).

The following filters are used to return a specific number of rows:

- `head()`
- `tail()`

The following filters are used to return a specific percentage of rows:

- `headPct()`
- `tailPct()`

For example,

```python skip-test
t = db.t("LearnDeephaven" , "StockQuotes")\
    .where("Date = `2017-08-25`")\
t2 = t.head(10) # returns the first 10 rows of a table
t3 = t.tail(20) # returns the last 20 rows of a table
t4 = t.headPct(0.25) # returns the first 25% of rows contained in a table
t5 = t.tailPct(0.5) # returns the last 50% of rows contained in a table
```

![img](../../assets/coreops/headtail.png)

## DownsampledWhereFilter

The `DownsampledWhereFilter` enables users to downsample time series data by calculating the bin intervals for values, and then using `upperBin` and `lastBy` to select the last row for each bin.

> [!NOTE]
> The column containing the data to be binned must be sorted for the method to work.

```python skip-test
from deephaven import *

downsampledX = x.where(DownsampledWhereFilter("Timestamp", 5 * dbtu.MINUTE))
```

```groovy skip-test
import com.illumon.iris.db.v2.select.DownsampledWhereFilter

downsampledX = x.where(new DownsampledWhereFilter("Timestamp", 5 * MINUTE))
```

The default for this method is to downsample the bins based on `upperBin` and `lastBy`. However, you can downsample the bin based on `lowerBin` and `firstBy` by adding a third argument to the `DownsampledWhereFilter` method. An example follows with the third argument highlighted:

```python skip-test
from deephaven import *

downsampledX = x.where(
    DownsampledWhereFilter(
        "Timestamp", 5 * dbtu.MINUTE, DownsampledWhereFilter.SampleOrder.LOWERFIRST
    )
)
```

```groovy skip-test
import com.illumon.iris.db.v2.select.DownsampledWhereFilter

downsampledX = x.where(new DownsampledWhereFilter("Timestamp", 5 * MINUTE,
    DownsampledWhereFilter.SampleOrder.LOWERFIRST))
```

- `LOWERFIRST` is the constant for `lowerBin`/`firstBy`.
- `UPPERLAST` is the constant for `upperBin`/`lastBy`.

Either constant works in this query. However, if the third argument is not present, the downsampling will occur on an `upperBin`/`lastBy` basis.

> [!WARNING]
> The `DownsampleWhereFilter` cannot be used on ticking tables. To use it against intraday data, one option would be use `false` as a third argument for `db.i()`; e.g., `p=db.i("DbInternal","ProcessEventLog",false).where("Date=currentDateNy()")`.

All the following examples will give you the same result of trades after or including 11am:

```python skip-test
p = db.i("SystemEquity", "TradeData").where(
    "Date=`2012-02-15`", "Timestamp>='2012-02-15T11:00:00 NY'"
)

p = db.i("SystemEquity", "TradeData").where(
    "Date=`2012-02-15`", "Timestamp+'1:00:00'>='2012-02-15T10:00:00 NY'"
)

p = db.i("SystemEquity", "TradeData").where(
    "Date=`2012-02-15`", "Timestamp+'T1h'>='2012-02-15T10:00:00 NY'"
)

p = db.i("SystemEquity", "TradeData").where(
    "Date=`2012-02-15`", "hourOfDay(Timestamp, TZ_NY)>=11"
)
```

> [!NOTE]
> The ticks around the timestamp are regular ticks, not backticks.
