Filter table data

This guide covers filtering table data in Deephaven. Many different table operations filter data in a table. Some keep data based on conditional formulas, whereas others keep data based on row position.

These table operations keep rows that match one or more conditional filter formulas:

These table operations keep rows based on row position:

These table operations keep rows based on matching values in one or more columns of a separate table:

Conditional filtering

Conditional filtering applies one or more filters to keep rows that meet the specified criteria. Comparison formulas can use match filters, range filters, string filters, and regular expression filters. Comparison formulas must equate to either true or false. When true, Deephaven keeps the row. When false, Deephaven excludes the row.

The following code block imports the Iris dataset found in Deephaven's example repository. Subsequent examples filter this data using comparison formulas:

import static io.deephaven.csv.CsvTools.readCsv

iris = readCsv("https://media.githubusercontent.com/media/deephaven/examples/main/Iris/csv/iris.csv")

Match filters

Match filters set forth one of the following criteria:

  • Equality (= and ==): If a value is equal to another value, the result is true. Otherwise, the result is false.
  • Inequality (!=): If a value is not equal to another value, the result is true. Otherwise, the result is false.
  • in: If a value exists within a set of values, the result is true. Otherwise, the result is false.
  • not in: If a value does not exist within a set of values, the result is true. Otherwise, the result is false.
  • icase in: The same as in, but ignores capitalization of letters. Only use this to filter string columns.
  • icase not in: The same as not in, but ignores capitalization of letters. Only use this to filter string columns.

The following code block applies an equality filter and an inequality filter:

filteredBySepalWidth = iris.where("SepalWidthCM = 3.5")
notVirginica = iris.where("Class != `Iris-virginica`")

The following code block applies in, not in, icase in, and icase not in:

setosaAndVirginica = iris.where("Class in `Iris-setosa`, `Iris-virginica`")
notSetosaOrVirginica = iris.where("Class not in `Iris-setosa`, `Iris-virginica`")
virginica = iris.where("Class icase in `iris-virginica`")
notVersicolor = iris.where("Class icase not in `iris-versicolor`")

Range filters

Range filters keep rows where values fall within a specified range. Range filters use any of the following operators:

  • <: Less than
  • <=: Less than or equal to
  • >: Greater than
  • >=: Greater than or equal to
  • inRange: Checks if a value is in a given range

The following code block applies each operator to filter data:

sepalwidthLessthan3 = iris.where("SepalWidthCM < 3.0")
petallenthGreaterThan2 = iris.where("PetalLengthCM > 2.0")
sepallengthGreaterThanOrEqualTo6 = iris.where("SepalLengthCM >= 6.0")
petalwidthLessThanOrEqualTo1 = iris.where("PetalWidthCM <= 1")
petalwidthInrange = iris.where("inRange(PetalWidthCM, 0, 1)")

String filters

Deephaven stores strings as Java strings. Any method you can call on a Java string can filter string data. Methods such as startsWith, endswith, contains, and matches are particularly useful.

The following code block applies these methods to filter the iris table on its String column, Class:

classStartsWith = iris.where("Class.startsWith(`Iris`)")
classEndsWith = iris.where("Class.endsWith(`setosa`)")
classContains = iris.where("Class.contains(`vir`)")
classMatches = iris.where("Class.matches(`Iris-versicolor`)")

Regular expression (regex) filters

Regular expression filtering is another type of string filtering that uses regular expressions to match patterns in string data. Deephaven's FilterPattern class enables the use of regex in filtering operations.

When performing regex filtering with FilterPattern, you apply a specific pattern mode. Two pattern modes exist:

  • FIND looks for a subsequence match in the string.
  • MATCHES matches an entire string against the given pattern.

The following code block performs regular expression filtering on the iris table with both pattern modes. The first finds strings that are eleven characters long, whereas the second looks for the subsequence virginica in the Class column:

import io.deephaven.api.filter.FilterPattern
import java.util.regex.Pattern

filterElevenChars = FilterPattern.of(ColumnName.of("Class"), Pattern.compile("..........."), FilterPattern.Mode.MATCHES, false)
filterRegexMatch = FilterPattern.of(ColumnName.of("Class"), Pattern.compile("virginica"), FilterPattern.Mode.FIND, false)

irisElevenChars = iris.where(filterElevenChars)
irisVirginica = iris.where(filterRegexMatch)

Combine filters

You can combine multiple match and/or conditional statements to filter data in a table. These combinations can be either conjunctive or disjunctive.

Conjunctive

Conjunctive filtering returns a table where all conditional filters in a where clause return true. Achieve conjunctive filtering by passing multiple filters into where via multiple parameters, or by using the && operator in a single filter.

The following example applies a conjunctive filter to the iris table to produce a new table of only Iris setosa flowers with a petal length in a specific range.

conjunctiveFilteredIris = iris.where("Class in `Iris-setosa`", "PetalLengthCM >= 1.3 && PetalLengthCM <= 1.6")

Disjunctive

Disjunctive filtering returns a table where one or more of the statements return true. You can accomplish this by using the || operator in a where statement, or by using Filter.or to combine multiple filters.

In the following example, two filters work disjunctively to return a new table where the petal length is greater than 1.9 cm or the petal width is less than 1.3 cm.

orFilteredIris = iris.where("PetalLengthCM > 1.9 || PetalWidthCM < 1.3")
irisFilterOr = iris.where(Filter.or(Filter.from("PetalLengthCM > 1.9", "PetalWidthCM < 1.3")))

Filter one table based on another

The whereIn and whereNotIn methods filter one table based on another table. Deephaven evaluates these two methods whenever either table changes, whereas it only evaluates where when the filtered table ticks. ​The following example uses whereIn and whereNotIn to find Iris virginica sepal widths that match and do not match Iris versicolor sepal widths:

virginica = iris.where("Class in `Iris-virginica`")
versicolor = iris.where("Class in `Iris-versicolor`")
virginicaMatchingPetalWidths = virginica.whereIn(versicolor, "PetalWidthCM")
virginicaNonMatchingPetalWidths = virginica.whereNotIn(versicolor, "PetalWidthCM")

[!CAUTION] > whereIn and whereNotIn are inefficient if the filter table updates frequently.

Tip

Unlike naturalJoin, whereIn works when more than one value in the right table matches values in the left table. join also handles this, but whereIn returns matching rows faster than join.

whereIn only provides filtering and does not allow adding columns from the right table. You may want to use whereIn to filter and then join to add columns from the right table. This provides similar performance to naturalJoin while still allowing matches from the right table.

Filter by row position

Filtering by row position removes unwanted data at the beginning, middle, or end of a table. You can specify row positions as absolute values or as a percentage of the total table size.

The following example uses head, tail, and slice to keep only the first, middle, and last 10 rows of iris, respectively.

irisHead = iris.head(10)
irisSlice = iris.slice(70, 80)
irisTail = iris.tail(10)

The following example uses headPct, slicePct, and tailPct to keep only 10% of the rows at the top, middle, and end of the table, respectively.

irisHeadPct = iris.headPct(0.1)
irisSlicePct = iris.slicePct(0.45, 0.55)
irisTailPct = iris.tailPct(0.1)

Incremental Release Filter

The IncrementalReleaseFilter, available from Java and Groovy, converts a static or add-only table into a ticking table that parcels out rows over time. This can be useful to simulate ticking data for development or to limit the number of rows that a complex query processes at one time. The incremental release filter takes two parameters:

  • The initial number of rows to present in the resulting table.
  • The number of rows to release at the beginning of each update graph cycle.
import io.deephaven.csv.CsvTools
import io.deephaven.engine.table.impl.select.IncrementalReleaseFilter

iris = CsvTools.readCsv("https://media.githubusercontent.com/media/deephaven/examples/main/Iris/csv/iris.csv")
// release 10 rows initially, then release 5 rows at the beginning of each update graph cycle
iris_incremental = iris.where(new IncrementalReleaseFilter(10, 5))