Filter table data
This guide covers filtering table data in Deephaven. Many different table operations can be used to filter unwanted data out of a table. Some remove data based on conditional formulas, whereas others remove data based on row indices or columns.
The following table operation removes data that does not meet the conditions set forth by one or more conditional filter formulas:
The following table operations remove data based on row indices:
The following table operations remove data based on equality in one or more columns of a separate table:
Conditional filtering
Conditional filtering applies one or more comparison formulas to remove data that does not meet the specified criteria. Comparison formulas can use match filters, range filters, string filters, and regular expression filters to remove unwanted data. The comparison formulas used in conditional filtering must equate to either true
or false
. When true
, the data is kept in the table. When false
, the data is removed.
The following code block imports the Iris dataset found in Deephaven's example repository. Subsequent examples filter this data using comparison formulas:
import static io.deephaven.csv.CsvTools.readCsv
iris = readCsv(
"https://media.githubusercontent.com/media/deephaven/examples/main/Iris/csv/iris.csv"
)
- iris
Match filters
Match filters set forth one of the following criteria:
- Equality (
=
and==
): If a value is equal to another value, the result istrue
. Otherwise, the result isfalse
. - Inequality (
!=
): If a value is not equal to another value, the result istrue
. Otherwise, the result isfalse
. in
: If a value exists within a set of values, the result istrue
. Otherwise, the result isfalse
.not in
: If a value does not exist within a set of values, the result istrue
. Otherwise, the result isfalse
.icase in
: The same asin
, but capitalization of letters is ignored. This should only be used to filter string columns.icase not in
: The same asnot in
, but capitalization of letters is ignored. This should only be used to filter string columns.
The following code block applies an equality filter and an inequality filter:
filteredBySepalWidth = iris.where("SepalWidthCM = 3.5")
notVirginica = iris.where("Class != 'Iris-virginica'")
- filteredBySepalWidth
- notVirginica
The following code block applies in
, not in
, icase in
, and icase not in
:
setosaAndVirginica = iris.where("Class in `Iris-setosa`, `Iris-virginica`")
notSetosaOrVirginica = iris.where("Class not in `Iris-setosa`, `Iris-virginica`")
virginica = iris.where("Class icase in `iris-virginica`")
notVersicolor = iris.where("Class icase not in `iris-versicolor`")
- setosaAndVirginica
- notSetosaOrVirginica
- virginica
- notVersicolor
equals (=
and ==
)
This method returns rows that have a matching value in a specified column. In the example below, the new table filteredBySepalWidth
contains only the rows from the iris
table with a 3.5 cm sepal width.
filteredBySepalWidth = iris.where("SepalWidthCM = 3.5")
The single equals (=
) and double equals (==
) can be used interchangeably in filters.
not equals (!=
)
This method returns rows that do not have a matching value in a specified column. In the example below, the new table notSetosa
contains only Iris versicolor and virginica flowers.
notSetosa = iris.where("Class != `Iris-setosa`")
- notSetosa
in
This method returns rows that contain a match of one or more values in a specified column. In the example below, the new table filteredByClass
contains only Iris setosa and virginica flowers.
setosaAndVirginica = iris.where("Class in `Iris-setosa`, `Iris-virginica`")
not in
This method returns rows that do not contain a match of one or more values in a specified column. In the example below, the new table versicolor
contains only Iris versicolor flowers.
notSetosaOrVirginica = iris.where("Class not in `Iris-setosa`, `Iris-virginica`")
icase in
This method returns rows that contain a match of one or more values in a specified column, regardless of capitalization. In the example below, the new table virginica
contains only Iris virginica flowers.
virginica = iris.where("Class icase in `iris-virginica`")
icase not in
This method returns rows that do not contain a match of one or more values in a specified column, regardless of capitalization. In the example below, the new table notVersicolor
contains data for Iris setosa and viriginca flowers.
notVersicolor = iris.where("Class icase not in `iris-versicolor`")
Range filters
Range filters remove data that does not fall into a range of values. Range filters use any of the following operators:
<
: Less than<=
: Less than or equal to>
: Greater than>=
: Greater than or equal toinRange
: Checks if a value is in a given range
The following code block applies each operator to remove unwanted data:
sepalWidthLessThan3 = iris.where("SepalWidthCM < 3.0")
petalLengthGreaterThan2 = iris.where("PetalLengthCM > 2.0")
sepalLengthGreaterThanOrEqualTo6 = iris.where("SepalLengthCM >= 6.0")
petalWidthLessThanOrEqualTo1 = iris.where("PetalWidthCM <= 1")
petalWidthInRange = iris.where("inRange(PetalWidthCM, 0, 1)")
- sepalWidthLessThan3
- petalLengthGreaterThan2
- sepalLengthGreaterThanOrEqualTo6
- petalWidthLessThanOrEqualTo1
- petalWidthInRange
String filters
Strings in Deephaven tables are stored as Java strings. Any method that can be called on a Java string can be used to filter string data. Methods such as startsWith
, endswith
, contains
, and matches
are particularly useful.
The following code block applies these methods to filter the iris
table on its String column, Class
:
classStartsWith = iris.where("Class.startsWith(`Iris`)")
classEndsWith = iris.where("Class.endsWith(`setosa`)")
classContains = iris.where("Class.contains(`vir`)")
classMatches = iris.where("Class.matches(`Iris-versicolor`)")
- classStartsWith
- classEndsWith
- classContains
- classMatches
Regular expression filters
Regular expression filtering is another type of string filtering that uses regular expressions to remove unwanted data. Deephaven's filters
submodule enables the use of regex in filtering operations.
When performing regex filtering with the filters
submodule, the filters are applied with a specific pattern mode. Two pattern modes are available:
FIND
looks for a subsequence match in the string.MATCHES
matches an entire string against the given pattern.
The following code block performs regular expression filtering on the iris
table with both pattern modes. The first finds strings that are eleven characters long, whereas the second looks for the subsequence virginica
in the Class
column:
import io.deephaven.api.filter.FilterPattern.Mode
import java.util.regex.Pattern
import io.deephaven.api.expression.Function
import io.deephaven.api.expression.Method
import io.deephaven.api.ColumnName
filterElevenChars = FilterPattern.of(ColumnName.of('Class'), Pattern.compile("..........."), Mode.MATCHES, false)
filterRegexMatch = FilterPattern.of(ColumnName.of('Class'), Pattern.compile("virginica"), Mode.FIND, false)
irisElevenChars = iris.where(filterElevenChars)
irisVirginica = iris.where(filterRegexMatch)
- irisElevenChars
- irisVirginica
Combine filters
Multiple match and/or conditional statements can be combined to filter data in a table. These combinations can be either conjunctive or disjunctive.
Conjunctive
Conjunctive filtering is used to return a table where all conditional filters in a where
clause return true
. Conjunctive filtering can be achieved by passing multiple filters into where
via a list, or by using the &&
operator in a single filter.
In the following example, a conjunctive filter is applied to the iris
table to produce a new table of only Iris setosa flowers with a petal length in a specific range.
conjunctiveFilteredIris = iris.where("Class in `Iris-setosa`", "PetalLengthCM >= 1.4 && PetalLengthCM <= 1.6")
- conjunctiveFilteredIris
Disjunctive
Disjunctive filtering is used to return a table where one or more of the statements return true
. This can be achieved using the ||
operator in a where
statement.
In the following example, two filters work disjunctively to return a new table where the petal length is greater than 1.9 cm or less than 1.3 cm.
orFilteredIris = iris.where("PetalLengthCM > 1.9 || PetalWidthCM < 1.3")
- orFilteredIris
Filter one table based on another
The whereIn
and whereNotIn
methods enable filtering of one table based on another table. These two methods are evaluated whenever either table changes, whereas where
is only evaluated when the filtered table ticks. In the example below, the whereIn
and whereNotIn
methods are used to find Iris virginica sepal widths that match and do not match Iris versicolor sepal widths:
virginica = iris.where("Class in `Iris-virginica`")
versicolor = iris.where("Class in `Iris-versicolor`")
virginicaMatchingPetalWidths = virginica.whereIn(versicolor, "PetalWidthCM")
virginicaNonMatchingPetalWidths = virginica.whereNotIn(versicolor, "PetalWidthCM")
- virginica
- versicolor
- virginicaMatchingPetalWidths
- virginicaNonMatchingPetalWidths
whereIn
and whereNotIn
are inefficient if the filter table updates frequently.
Unlike naturalJoin
, whereIn
can be used when there is more than one matching value in the right table for values in the left table. This is true of join
as well, but whereIn
is faster at returning matching rows than join
.
whereIn
only provides filtering and does not allow columns to be added from the right table. In some cases, it may be desirable to use whereIn
to filter and then join
to add columns from the right table. This provides similar performance to naturalJoin
while still allowing matches from the right table.
Filter by row index
Filtering by row index removes unwanted data at the top, middle, or end of a table. Row indices can be chosen on their own or by percentage of the total size of a table.
The following example uses head
, tail
, and slice
to keep only the first, middle, and last 10 rows of iris
, respectively.
irisHead = iris.head(10)
irisSlice = iris.slice(70, 80)
irisTail = iris.tail(10)
- irisHead
- irisSlice
- irisTail
The following example uses headPct
, slicePct
, and tailPct
to keep only 10% of the rows at the top, middle, and end of the table, respectively.
irisHeadPct = iris.headPct(0.1)
irisSlicePct = iris.slicePct(0.45, 0.55)
irisTailPct = iris.tailPct(0.1)
- irisHeadPct
- irisSlicePct
- irisTailPct