Skip to main content
Version: Java (Groovy)

Filter table data

This guide covers filtering table data in Deephaven. Many different table operations can be used to filter unwanted data out of a table. Some remove data based on conditional formulas, whereas others remove data based on row indices or columns.

The following table operation removes data that does not meet the conditions set forth by one or more conditional filter formulas:

The following table operations remove data based on row indices:

The following table operations remove data based on equality in one or more columns of a separate table:

Conditional filtering

Conditional filtering applies one or more comparison formulas to remove data that does not meet the specified criteria. Comparison formulas can use match filters, range filters, string filters, and regular expression filters to remove unwanted data. The comparison formulas used in conditional filtering must equate to either true or false. When true, the data is kept in the table. When false, the data is removed.

The following code block imports the Iris dataset found in Deephaven's example repository. Subsequent examples filter this data using comparison formulas:

import static io.deephaven.csv.CsvTools.readCsv

iris = readCsv(
"https://media.githubusercontent.com/media/deephaven/examples/main/Iris/csv/iris.csv"
)

Match filters

Match filters set forth one of the following criteria:

  • Equality (= and ==): If a value is equal to another value, the result is true. Otherwise, the result is false.
  • Inequality (!=): If a value is not equal to another value, the result is true. Otherwise, the result is false.
  • in: If a value exists within a set of values, the result is true. Otherwise, the result is false.
  • not in: If a value does not exist within a set of values, the result is true. Otherwise, the result is false.
  • icase in: The same as in, but capitalization of letters is ignored. This should only be used to filter string columns.
  • icase not in: The same as not in, but capitalization of letters is ignored. This should only be used to filter string columns.

The following code block applies an equality filter and an inequality filter:

filteredBySepalWidth = iris.where("SepalWidthCM = 3.5")
notVirginica = iris.where("Class != 'Iris-virginica'")

The following code block applies in, not in, icase in, and icase not in:

setosaAndVirginica = iris.where("Class in `Iris-setosa`, `Iris-virginica`")
notSetosaOrVirginica = iris.where("Class not in `Iris-setosa`, `Iris-virginica`")
virginica = iris.where("Class icase in `iris-virginica`")
notVersicolor = iris.where("Class icase not in `iris-versicolor`")

equals (= and ==)

This method returns rows that have a matching value in a specified column. In the example below, the new table filteredBySepalWidth contains only the rows from the iris table with a 3.5 cm sepal width.

filteredBySepalWidth = iris.where("SepalWidthCM = 3.5")
note

The single equals (=) and double equals (==) can be used interchangeably in filters.

not equals (!=)

This method returns rows that do not have a matching value in a specified column. In the example below, the new table notSetosa contains only Iris versicolor and virginica flowers.

notSetosa = iris.where("Class != `Iris-setosa`")

in

This method returns rows that contain a match of one or more values in a specified column. In the example below, the new table filteredByClass contains only Iris setosa and virginica flowers.

setosaAndVirginica = iris.where("Class in `Iris-setosa`, `Iris-virginica`")

not in

This method returns rows that do not contain a match of one or more values in a specified column. In the example below, the new table versicolor contains only Iris versicolor flowers.

notSetosaOrVirginica = iris.where("Class not in `Iris-setosa`, `Iris-virginica`")

icase in

This method returns rows that contain a match of one or more values in a specified column, regardless of capitalization. In the example below, the new table virginica contains only Iris virginica flowers.

virginica = iris.where("Class icase in `iris-virginica`")

icase not in

This method returns rows that do not contain a match of one or more values in a specified column, regardless of capitalization. In the example below, the new table notVersicolor contains data for Iris setosa and viriginca flowers.

notVersicolor = iris.where("Class icase not in `iris-versicolor`")

Range filters

Range filters remove data that does not fall into a range of values. Range filters use any of the following operators:

  • <: Less than
  • <=: Less than or equal to
  • >: Greater than
  • >=: Greater than or equal to
  • inRange: Checks if a value is in a given range

The following code block applies each operator to remove unwanted data:

sepalWidthLessThan3 = iris.where("SepalWidthCM < 3.0")
petalLengthGreaterThan2 = iris.where("PetalLengthCM > 2.0")
sepalLengthGreaterThanOrEqualTo6 = iris.where("SepalLengthCM >= 6.0")
petalWidthLessThanOrEqualTo1 = iris.where("PetalWidthCM <= 1")
petalWidthInRange = iris.where("inRange(PetalWidthCM, 0, 1)")

String filters

Strings in Deephaven tables are stored as Java strings. Any method that can be called on a Java string can be used to filter string data. Methods such as startsWith, endswith, contains, and matches are particularly useful.

The following code block applies these methods to filter the iris table on its String column, Class:

classStartsWith = iris.where("Class.startsWith(`Iris`)")
classEndsWith = iris.where("Class.endsWith(`setosa`)")
classContains = iris.where("Class.contains(`vir`)")
classMatches = iris.where("Class.matches(`Iris-versicolor`)")

Regular expression filters

Regular expression filtering is another type of string filtering that uses regular expressions to remove unwanted data. Deephaven's filters submodule enables the use of regex in filtering operations.

When performing regex filtering with the filters submodule, the filters are applied with a specific pattern mode. Two pattern modes are available:

  • FIND looks for a subsequence match in the string.
  • MATCHES matches an entire string against the given pattern.

The following code block performs regular expression filtering on the iris table with both pattern modes. The first finds strings that are eleven characters long, whereas the second looks for the subsequence virginica in the Class column:

import io.deephaven.api.filter.FilterPattern.Mode
import java.util.regex.Pattern
import io.deephaven.api.expression.Function
import io.deephaven.api.expression.Method
import io.deephaven.api.ColumnName

filterElevenChars = FilterPattern.of(ColumnName.of('Class'), Pattern.compile("..........."), Mode.MATCHES, false)
filterRegexMatch = FilterPattern.of(ColumnName.of('Class'), Pattern.compile("virginica"), Mode.FIND, false)

irisElevenChars = iris.where(filterElevenChars)
irisVirginica = iris.where(filterRegexMatch)

Combine filters

Multiple match and/or conditional statements can be combined to filter data in a table. These combinations can be either conjunctive or disjunctive.

Conjunctive

Conjunctive filtering is used to return a table where all conditional filters in a where clause return true. Conjunctive filtering can be achieved by passing multiple filters into where via a list, or by using the && operator in a single filter.

In the following example, a conjunctive filter is applied to the iris table to produce a new table of only Iris setosa flowers with a petal length in a specific range.

conjunctiveFilteredIris = iris.where("Class in `Iris-setosa`", "PetalLengthCM >= 1.4 && PetalLengthCM <= 1.6")

Disjunctive

Disjunctive filtering is used to return a table where one or more of the statements return true. This can be achieved using the || operator in a where statement.

In the following example, two filters work disjunctively to return a new table where the petal length is greater than 1.9 cm or less than 1.3 cm.

orFilteredIris = iris.where("PetalLengthCM > 1.9 || PetalWidthCM < 1.3")

Filter one table based on another

The whereIn and whereNotIn methods enable filtering of one table based on another table. These two methods are evaluated whenever either table changes, whereas where is only evaluated when the filtered table ticks. ​In the example below, the whereIn and whereNotIn methods are used to find Iris virginica sepal widths that match and do not match Iris versicolor sepal widths:

virginica = iris.where("Class in `Iris-virginica`")
versicolor = iris.where("Class in `Iris-versicolor`")
virginicaMatchingPetalWidths = virginica.whereIn(versicolor, "PetalWidthCM")
virginicaNonMatchingPetalWidths = virginica.whereNotIn(versicolor, "PetalWidthCM")
caution

whereIn and whereNotIn are inefficient if the filter table updates frequently.

tip

Unlike naturalJoin, whereIn can be used when there is more than one matching value in the right table for values in the left table. This is true of join as well, but whereIn is faster at returning matching rows than join.

whereIn only provides filtering and does not allow columns to be added from the right table. In some cases, it may be desirable to use whereIn to filter and then join to add columns from the right table. This provides similar performance to naturalJoin while still allowing matches from the right table.

Filter by row index

Filtering by row index removes unwanted data at the top, middle, or end of a table. Row indices can be chosen on their own or by percentage of the total size of a table.

The following example uses head, tail, and slice to keep only the first, middle, and last 10 rows of iris, respectively.

irisHead = iris.head(10)
irisSlice = iris.slice(70, 80)
irisTail = iris.tail(10)

The following example uses headPct, slicePct, and tailPct to keep only 10% of the rows at the top, middle, and end of the table, respectively.

irisHeadPct = iris.headPct(0.1)
irisSlicePct = iris.slicePct(0.45, 0.55)
irisTailPct = iris.tailPct(0.1)