TailInitializationFilter

TailInitializationFilter reduces the input size for downstream operations by limiting initialization to only the most recent rows from each partition. This is particularly useful when working with large datasets that periodically publish new snapshots, and you intend to run a lastBy on the data to retrieve the most recent snapshot.

The filter is designed to work with add-only source tables with one or more partitions. If the input table is in Parquet or Enterprise format, partitions are detected automatically. Otherwise, each contiguous range of row keys is assumed to represent a partition. Each partition must be sorted by timestamp, with the most recent timestamp at the end.
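To make the fallback rule concrete, here is a minimal plain-Python sketch of how contiguous row-key ranges could be read as partitions. It is only a model of the idea described above, not Deephaven code; the function name `contiguous_ranges` and the sample row keys are hypothetical.

```python
def contiguous_ranges(row_keys):
    """Group ascending row keys into inclusive (first, last) ranges.

    Each range stands in for one partition, mirroring the rule that every
    contiguous run of row keys is assumed to be a partition.
    """
    ranges = []
    for key in row_keys:
        if ranges and key == ranges[-1][1] + 1:
            # The key extends the current run.
            ranges[-1] = (ranges[-1][0], key)
        else:
            # A gap in the keys starts a new run (a new partition).
            ranges.append((key, key))
    return ranges


# Hypothetical row keys with gaps after 3 and after 102.
row_keys = [0, 1, 2, 3, 100, 101, 102, 500]
print(contiguous_ranges(row_keys))  # [(0, 3), (100, 102), (500, 500)]
```

Under this reading, a table whose row keys have no gaps is treated as a single partition.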

Once initialized, the filter passes through all new rows. Rows that have already passed the filter are never removed or modified afterwards.
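The two phases (filter once at initialization, then pass everything through) can be modelled in a few lines of plain Python. This is a toy model for a single partition using a row-count limit; the class `TailFilterModel` is hypothetical and is not part of any Deephaven API.

```python
class TailFilterModel:
    """Toy model: filter at initialization, then pass every new row through."""

    def __init__(self, initial_rows, row_count):
        # Initialization: keep only the last `row_count` rows.
        # The guard is needed because initial_rows[-0:] would keep everything.
        self.rows = list(initial_rows[-row_count:]) if row_count > 0 else []

    def append(self, row):
        # After initialization no filtering happens: every new row is kept,
        # and rows already in the result are left untouched.
        self.rows.append(row)


model = TailFilterModel([1, 2, 3, 4, 5], row_count=2)
print(model.rows)  # [4, 5]

model.append(6)
model.append(7)
print(model.rows)  # [4, 5, 6, 7]
```

Note that the result keeps growing after initialization; the limit applies only to the rows present when the filter is created.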

Syntax

Parameters

mostRecent (period)

Parameter      Type    Description
table          Table   The source table to filter. Must be add-only with partitions sorted by timestamp.
timestampName  String  The name of the timestamp column used to determine recency.
period         String  The time period string specifying how far back from the last row to include rows. The period is parsed using DateTimeUtils.parseDurationNanos(). Examples: "PT1H" (1 hour), "PT30M" (30 minutes), "PT10S" (10 seconds).

mostRecent (nanos)

Parameter      Type    Description
table          Table   The source table to filter. Must be add-only with partitions sorted by timestamp.
timestampName  String  The name of the timestamp column used to determine recency.
nanos          long    The interval in nanoseconds between the last row in a partition and rows that match the filter.
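Nanosecond intervals are easy to get wrong by a factor of a thousand, so it can help to compute them rather than type them. The following plain-Python sketch converts a `datetime.timedelta` to a nanosecond count; the helper `to_nanos` is hypothetical and is not the Deephaven duration parser.

```python
from datetime import timedelta

NANOS_PER_SECOND = 1_000_000_000


def to_nanos(delta: timedelta) -> int:
    """Convert a timedelta to whole nanoseconds using integer arithmetic."""
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NANOS_PER_SECOND + delta.microseconds * 1_000


five_seconds = to_nanos(timedelta(seconds=5))
one_hour = to_nanos(timedelta(hours=1))

print(five_seconds)  # 5000000000
print(one_hour)      # 3600000000000
```

The first value corresponds to a period of "PT10S" halved, that is "PT5S"; the second corresponds to "PT1H".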

mostRecentRows

Parameter  Type   Description
table      Table  The source table to filter. Must be add-only.
rowCount   long   The number of rows to include per partition.

Returns

A table containing only the most recent rows from each partition of the source table at initialization, plus all rows added afterwards.

How it works

For each partition, the filter uses the last row's timestamp as the reference point. It subtracts the specified period from this timestamp and performs a binary search to identify rows within that time window.
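The per-partition search can be sketched in plain Python with the standard `bisect` module. This is a model of the approach described above, not the Deephaven implementation. It assumes timestamps are integer nanoseconds, partitions are given as inclusive index ranges, and a row exactly at the threshold is kept (the inclusive boundary is an assumption). The function name `most_recent` and the sample data are hypothetical.

```python
from bisect import bisect_left


def most_recent(timestamps, partitions, nanos):
    """Return the row indices kept by a time-based tail filter.

    timestamps: integer nanosecond values, ascending within each partition
    partitions: inclusive (first, last) index ranges, one per partition
    nanos:      window size measured back from each partition's last row
    """
    kept = []
    for first, last in partitions:
        # The last row of the partition is the reference point.
        threshold = timestamps[last] - nanos
        # Binary search, restricted to this partition, for the first row
        # whose timestamp is at or after the threshold.
        start = bisect_left(timestamps, threshold, first, last + 1)
        kept.extend(range(start, last + 1))
    return kept


# Two partitions; each is sorted on its own, but the table as a whole is not.
timestamps = [0, 10, 20, 30, 5, 15, 25]
partitions = [(0, 3), (4, 6)]

print(most_recent(timestamps, partitions, nanos=10))  # [2, 3, 5, 6]
```

Because the search is bounded by each partition's range, only the partition itself needs to be sorted. The cost per partition is logarithmic in its size, which is why the sortedness assumption matters.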

The filter makes these assumptions:

  • The source table is add-only (no modifications, shifts, or removals).
  • Each partition is sorted by timestamp.
  • Null timestamps are not permitted.

If any of these assumptions is violated, the contents of the result table are undefined.
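Since a violated assumption leads to undefined results rather than a guaranteed error, it can be worth validating a sample of the data before relying on the filter. The following plain-Python sketch checks the two assumptions that can be verified on a static list of timestamps (sorted order and no nulls). The helper `check_partition` is hypothetical, and the add-only property cannot be checked this way.

```python
def check_partition(timestamps):
    """Raise ValueError if one partition's timestamps break the assumptions.

    Checks that no timestamp is None and that values never decrease.
    """
    for i, ts in enumerate(timestamps):
        if ts is None:
            raise ValueError(f"null timestamp at position {i}")
        # The previous value is known to be non-null at this point.
        if i > 0 and ts < timestamps[i - 1]:
            raise ValueError(f"timestamps out of order at position {i}")


check_partition([1, 2, 2, 3])  # passes silently; equal timestamps are allowed

for bad in ([1, None, 3], [1, 3, 2]):
    try:
        check_partition(bad)
    except ValueError as err:
        print(err)
```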

Examples

Filter by time period

This example uses a time table and filters to show only rows from the last 10 seconds:

The result contains only rows whose timestamp is within 10 seconds of the most recent row in the table.

Filter by time in nanoseconds

This example filters to show rows from the last 5 seconds (5 billion nanoseconds):

Filter by row count

The mostRecentRows method filters to show a specified number of rows from the end of each partition:
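As a rough illustration of the per-partition behaviour, the following plain-Python sketch models row-count filtering over partitions given as inclusive row-key ranges. It is not Deephaven code; the function name `most_recent_rows` and the sample ranges are hypothetical.

```python
def most_recent_rows(partitions, row_count):
    """Return the row keys kept when taking the last `row_count` rows
    of every partition.

    partitions: inclusive (first, last) row-key ranges, one per partition
    """
    kept = []
    for first, last in partitions:
        # Never reach back past the start of the partition.
        start = max(first, last - row_count + 1)
        kept.extend(range(start, last + 1))
    return kept


# Three partitions holding 4, 3, and 1 rows.
partitions = [(0, 3), (100, 102), (500, 500)]

print(most_recent_rows(partitions, row_count=2))  # [2, 3, 101, 102, 500]
```

The limit applies to each partition separately, so the result can hold up to `row_count` rows per partition, and a partition with fewer rows contributes all of them.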