Filters
This guide focuses on using filters within query strings. Unlike formulas, which add new columns to a table, filters are boolean expressions used to create a new table that is a subset of an existing table. Filters are employed in the following table operations:
Additionally, filters can be used in partitioned table operations.
What is a filter?
A filter is a query string expression that evaluates to true or false. The filter expression is applied to every row in a table and can be used to create a new table that is a subset of an existing table.
Filters can be applied to columns of any data type. For instance, numeric filters can keep only rows that match a specific value, are in a range, or are even/odd:
String filtering provides significant flexibility. You can search for full string matches, substring matches, use regex, and more. The following example shows several string filters:
Types of filters
Match filters
A match filter evaluates to true if the column value matches one specified in the filter.
Match filters can search for a single-value match:
Match filters can be applied to a column of any data type:
Match filters can also search for multiple values. This applies to columns of any data type. Deephaven offers some special match filters to do just this:
in: Returns only rows where the column matches one of the specified values.not in: Returns only rows where the column does not match any of the specified values.
Note
Match filters created using the equality operators (=, == or !=) follow standard IEEE 754 rules for handling NaN values. Any comparison involving NaN returns false, except for !=, which returns true for all values.
In contrast, match filters created with set inclusion syntax (in, not in) will match NaN values. For example: value in NaN, 10.0 will return true if value is NaN or 10.0. Alternatively, you can use the isNaN(value) function to explicitly test for NaN values such as isNaN(value) || value < 10.0.
Range filters
Range filters evaluate to true if the column value is within a specified range. This type of filter is typically applied to numeric columns but can be applied to any column that supports comparison operators.
Note
Null values are considered less than any non-null value for sorting and comparison purposes. Therefore, < and <= comparisons will always include null. To prevent this behavior, you can add an explicit null check; for example: !isNull(value) && value < 10.
Note
Comparison operators on floating-point values follow standard IEEE 754 rules for handling NaN values. Any comparison involving NaN returns false, except for !=, which returns true for all values. To include NaN values in your comparisons, use the isNaN(value) function to explicitly test for NaN values, such as isNaN(value) || value < 10.0.
Both result_range and result_inrange can instead be implemented by conjunctively combining two separate range filters:
You can also filter for data that is not in a range by using the ! operator or by disjunctively combining two separate range filters:
String filters
String filters return only rows that match the specified criteria for string columns. As such, string filters can use any java.lang.String methods.
The following example shows several string filters:
Deephaven offers some additional special string match filters:
in: Returns only rows where the string column matches one of the specified values.icase in: Returns only rows where the string column matches one of the specified values, ignoring case.not in: Returns only rows where the string column does not match any of the specified values.icase not in: Returns only rows where the string column does not match any of the specified values, ignoring case.includes any: Returns only rows where the string column contains any of the specified substrings.includes all: Returns only rows where the string column contains all of the specified substrings.
Combine filters
Filters can be combined conjunctively or disjunctively.
Conjunctive
Conjunctive filters return only rows that match all of the specified filters. There are two ways to conjunctively combine filters:
- Pass a single query string with multiple filters separated by the
&&operator into one of the following table operations: - Pass multiple query strings into one of the following table operations:
Disjunctive
Disjunctive filters return only rows that match any of the specified filters. There are two ways to disjunctively combine filters:
- Pass a single query string with multiple filters separated by the
||operator into one of the following table operations: - Pass multiple query strings into the following table operation:
Filter performance
How you structure filter clauses can affect query performance. The following guidelines help optimize filter execution.
These guidelines provide useful rules of thumb, but they won't perform best for all filters and data combinations. Filters exercise many aspects of the Deephaven engine: each filter constructs a RowSet representing the rows that pass, and rowset construction can be expensive. Simple filters (like match or range filters) may take advantage of optimizations like Parquet row group statistics. To evaluate a filter, data must be read from its source (e.g., disk or a network server). For important queries, measure performance to determine the optimal order and structure of filters.
Combine filters on the same column
When filtering the same column multiple times, combine the conditions in a single clause. This reads the column data once instead of multiple times:
Both produce the same result, but result_combined is more efficient because it evaluates both conditions in a single pass over the data.
Separate filters on different columns
When filtering different columns, separating them into different clauses can improve performance. The engine evaluates each clause sequentially, so earlier filters reduce the data that later filters must examine:
Note
The performance benefit of separating filters on different columns depends on the selectivity of the filters. If the first filter removes most rows, subsequent filters have less work to do. However, if filters are not very selective, combining them may perform similarly.
Order matters
Put more selective filters first. A filter that eliminates most rows early reduces the work for subsequent filters:
Keep match filters separate from formulas
Match filters (equality checks like X == 5 or X in 1, 2, 3) are optimized differently than formula filters (expressions like X > 5 && X < 10). When combining them in a single clause, the match filter optimization may not apply: