TailInitializationFilter

TailInitializationFilter reduces the input size for downstream operations by limiting initialization to only the most recent rows from each partition. This is particularly useful when working with large datasets that periodically publish new snapshots, and you intend to run a lastBy on the data to retrieve the most recent snapshot.

The filter is designed to work with add-only source tables with one or more partitions. If the input table is in Parquet or Enterprise format, partitions are detected automatically. Otherwise, each contiguous range of row keys is assumed to represent a partition. Each partition must be sorted by timestamp, with the most recent timestamp at the end.
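The fallback rule, where each contiguous range of row keys is treated as one partition, can be pictured with a small standalone Java sketch. This is an illustration only and does not use Deephaven; the class and method names (`PartitionSketch`, `contiguousRanges`) are hypothetical, and it assumes the row keys are given in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Deephaven API.
final class PartitionSketch {
    /** Splits ascending row keys into inclusive [first, last] ranges of consecutive keys. */
    static List<long[]> contiguousRanges(long[] sortedRowKeys) {
        List<long[]> ranges = new ArrayList<>();
        if (sortedRowKeys.length == 0) {
            return ranges;
        }
        long start = sortedRowKeys[0];
        long prev = start;
        for (int i = 1; i < sortedRowKeys.length; i++) {
            long key = sortedRowKeys[i];
            if (key != prev + 1) {
                // A gap in the keys closes the current range and opens a new one.
                ranges.add(new long[] {start, prev});
                start = key;
            }
            prev = key;
        }
        ranges.add(new long[] {start, prev});
        return ranges;
    }
}
```

Under this model, row keys 0 to 2, 10 to 11, and 20 would be treated as three separate partitions, each with its own last-row reference point.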

Once initialized, the filter passes through all new rows. Rows that have already passed the filter are not removed or modified.
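The "filter once, then pass through" behavior can be modeled with a short standalone Java sketch. It is a simplified model under stated assumptions, not Deephaven's implementation: rows are plain `Long` values, the initial cut is a row-count tail, and the names `PassThroughSketch` and `onRowsAdded` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "filter at initialization, then pass everything through".
final class PassThroughSketch {
    private final List<Long> visible = new ArrayList<>();

    /** Keeps only the last tailRows of the rows present at initialization. */
    PassThroughSketch(List<Long> initialRows, int tailRows) {
        int from = Math.max(0, initialRows.size() - tailRows);
        visible.addAll(initialRows.subList(from, initialRows.size()));
    }

    /** Rows appended after initialization are never filtered. */
    void onRowsAdded(List<Long> newRows) {
        visible.addAll(newRows);
    }

    /** Rows currently visible in the result. */
    List<Long> rows() {
        return visible;
    }
}
```

In this model the result only grows: rows admitted at initialization stay, and every later row is appended regardless of how old the initial rows have become.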

Syntax

result = TailInitializationFilter.mostRecent(table, timestampName, period)
result = TailInitializationFilter.mostRecent(table, timestampName, nanos)
result = TailInitializationFilter.mostRecentRows(table, rowCount)

Parameters

`mostRecent` (period)

Parameter	Type	Description
table	Table	The source table to filter. Must be add-only with partitions sorted by timestamp.
timestampName	String	The name of the timestamp column used to determine recency.
period	String	The time period string specifying how far back from the last row to include rows. The period is parsed using DateTimeUtils.parseDurationNanos(). Examples: "PT1H" (1 hour), "PT30M" (30 minutes), "PT10S" (10 seconds).
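To make the relationship between a period string and the nanosecond overload concrete, the following standalone Java sketch converts the ISO-8601 examples to nanoseconds. It uses `java.time.Duration` as a stand-in so that it runs without Deephaven on the classpath; it is not `DateTimeUtils.parseDurationNanos()`, and the class name `PeriodNanosSketch` is hypothetical. Note that `Duration.parse` does not accept the `PT00:00:10` form used in the examples later on this page.

```java
import java.time.Duration;

// Stand-in for illustration only: uses java.time.Duration, not Deephaven's DateTimeUtils.
final class PeriodNanosSketch {
    /** Converts an ISO-8601 duration string such as "PT1H" to nanoseconds. */
    static long toNanos(String isoDuration) {
        return Duration.parse(isoDuration).toNanos();
    }
}
```

For instance, `PeriodNanosSketch.toNanos("PT10S")` yields 10,000,000,000, which is the value one would pass as `nanos` to get the same window.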

`mostRecent` (nanos)

Parameter	Type	Description
table	Table	The source table to filter. Must be add-only with partitions sorted by timestamp.
timestampName	String	The name of the timestamp column used to determine recency.
nanos	long	The interval in nanoseconds between the last row in a partition and rows that match the filter.

`mostRecentRows`

Parameter	Type	Description
table	Table	The source table to filter. Must be add-only.
rowCount	long	The number of rows to include per partition.

Returns

A table containing only the most recent rows from each partition of the source table at initialization, plus all rows added afterward.

How it works

For each partition, the filter uses the last row's timestamp as the reference point. It subtracts the specified period from this timestamp and performs a binary search to identify rows within that time window.
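The lookup described above can be sketched in plain Java for a single partition. This is an illustration of the technique on a sorted `long[]` of nanosecond timestamps, not Deephaven's implementation. The names `TailWindowSketch`, `firstRowInWindow`, and `tail` are hypothetical, and treating the window boundary as inclusive is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the tail-window lookup; not part of Deephaven.
final class TailWindowSketch {
    /**
     * Returns the index of the first row whose timestamp is at or after
     * (last timestamp - periodNanos). Assumes a non-empty array sorted in
     * ascending order and a non-negative period.
     */
    static int firstRowInWindow(long[] timestampsNanos, long periodNanos) {
        int last = timestampsNanos.length - 1;
        long threshold = timestampsNanos[last] - periodNanos;
        int lo = 0;
        int hi = last; // the last row always satisfies the threshold
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestampsNanos[mid] >= threshold) {
                hi = mid; // mid qualifies; look for an earlier qualifying row
            } else {
                lo = mid + 1; // mid is too old; the answer is to the right
            }
        }
        return lo;
    }

    /** Returns the timestamps that fall inside the window, oldest first. */
    static long[] tail(long[] timestampsNanos, long periodNanos) {
        int from = firstRowInWindow(timestampsNanos, periodNanos);
        return Arrays.copyOfRange(timestampsNanos, from, timestampsNanos.length);
    }
}
```

Because each partition is sorted, the search touches only a logarithmic number of rows per partition, which is why the sort-order assumption matters.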

The filter makes these assumptions:

The source table is add-only (no modifications, shifts, or removals).
Each partition is sorted by timestamp.
Null timestamps are not permitted.

If any of these assumptions are violated, the result table is undefined.
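If the input's ordering is in doubt, a check along the following lines could be run on a sample of a partition's timestamps before relying on the filter. This is a minimal standalone sketch, not a Deephaven utility; the class and method names are hypothetical, and timestamps are modeled as boxed `Long` nanosecond values so that nulls can be represented.

```java
// Hypothetical pre-check; not part of the Deephaven API.
final class TimestampCheckSketch {
    /** Returns true if no timestamp is null and the values never decrease. */
    static boolean isSortedWithoutNulls(Long[] timestampsNanos) {
        Long prev = null;
        for (Long ts : timestampsNanos) {
            if (ts == null) {
                return false; // null timestamps are not permitted
            }
            if (prev != null && ts < prev) {
                return false; // out-of-order timestamp
            }
            prev = ts;
        }
        return true;
    }
}
```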

Examples

Filter by time period

This example uses a time table and filters to show only rows from the last 10 seconds:

import io.deephaven.engine.table.impl.util.TailInitializationFilter

source = timeTable("2026-01-01T00:00:00 America/New_York", "PT00:00:01").update("Value = ii")

result = TailInitializationFilter.mostRecent(source, "Timestamp", "PT00:00:10")

At initialization, the result contains only rows whose timestamp is within 10 seconds of the most recent row in the table. Rows added afterward pass through unfiltered.

Filter by time in nanoseconds

This example filters to show rows from the last 5 seconds (5 billion nanoseconds):

import io.deephaven.engine.table.impl.util.TailInitializationFilter
import static io.deephaven.time.DateTimeUtils.SECOND

source = timeTable("2026-01-01T00:00:00 America/New_York", "PT00:00:01").update("Value = ii")

result = TailInitializationFilter.mostRecent(source, "Timestamp", 5 * SECOND)

Filter by row count

The mostRecentRows method filters to show a specified number of rows from the end of each partition:

import io.deephaven.engine.table.impl.util.TailInitializationFilter

source = timeTable("2026-01-01T00:00:00 America/New_York", "PT00:00:01").update("Value = ii")
rowCount = 10
result = TailInitializationFilter.mostRecentRows(source, rowCount)