TailInitializationFilter
TailInitializationFilter reduces the input size for downstream operations by limiting initialization to only the most recent rows from each partition. This is particularly useful for large datasets that periodically publish new snapshots, when you intend to run a lastBy on the data to retrieve the most recent snapshot.
The filter is designed to work with add-only source tables with one or more partitions. If the input table is in Parquet or Enterprise format, partitions are detected automatically. Otherwise, each contiguous range of row keys is assumed to represent a partition. Each partition must be sorted by timestamp, with the most recent timestamp at the end.
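The fallback rule for tables without automatically detected partitions can be illustrated with a small helper. ContiguousPartitions is a hypothetical, plain-Java name and is not part of Deephaven. The sketch assumes row keys arrive as a sorted long[] and treats every gap between consecutive keys as a partition boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the Deephaven implementation): split sorted row keys
// into contiguous ranges, treating each range as one partition.
final class ContiguousPartitions {

    /** Returns {firstKey, lastKey} pairs, one per run of consecutive row keys. */
    static List<long[]> partitions(long[] sortedRowKeys) {
        List<long[]> ranges = new ArrayList<>();
        if (sortedRowKeys.length == 0) {
            return ranges;
        }
        long start = sortedRowKeys[0];
        long prev = start;
        for (int i = 1; i < sortedRowKeys.length; i++) {
            long key = sortedRowKeys[i];
            if (key != prev + 1) {
                // A gap in the row keys closes the current partition.
                ranges.add(new long[] {start, prev});
                start = key;
            }
            prev = key;
        }
        ranges.add(new long[] {start, prev});
        return ranges;
    }

    public static void main(String[] args) {
        long[] keys = {0, 1, 2, 10, 11, 20};
        for (long[] r : partitions(keys)) {
            System.out.println("[" + r[0] + ", " + r[1] + "]");
        }
        // prints [0, 2], [10, 11], [20, 20] on separate lines
    }
}
```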
Once initialized, the filter passes through all newly added rows. The initial filtering is not revisited: rows that passed the filter are not later removed or modified.
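This lifecycle can be modeled with a hypothetical single-partition class. The window test runs once on the initial rows, and every later addition is appended without being tested. The class name, the inclusive cutoff, and the use of raw long timestamps are all assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: the window test runs only at initialization;
// every row appended afterward passes through unfiltered.
final class TailThenPassThrough {
    private final List<Long> result = new ArrayList<>();

    TailThenPassThrough(long[] initialTimestamps, long windowNanos) {
        if (initialTimestamps.length > 0) {
            long cutoff = initialTimestamps[initialTimestamps.length - 1] - windowNanos;
            for (long ts : initialTimestamps) {
                if (ts >= cutoff) { // inclusive bound is an assumption
                    result.add(ts);
                }
            }
        }
    }

    /** New rows are never tested against the window; all of them are kept. */
    void onRowsAdded(long[] addedTimestamps) {
        for (long ts : addedTimestamps) {
            result.add(ts);
        }
    }

    List<Long> rows() {
        return result;
    }

    public static void main(String[] args) {
        TailThenPassThrough f = new TailThenPassThrough(new long[] {0, 5, 8, 10}, 3);
        System.out.println(f.rows()); // [8, 10]
        f.onRowsAdded(new long[] {11, 30});
        System.out.println(f.rows()); // [8, 10, 11, 30]
    }
}
```

Row 8 stays in the result after 30 arrives, even though it is now well outside a 3-unit window of the newest row.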
Syntax
Parameters
mostRecent (period)
| Parameter | Type | Description |
|---|---|---|
| table | Table | The source table to filter. Must be add-only with partitions sorted by timestamp. |
| timestampName | String | The name of the timestamp column used to determine recency. |
| period | String | The time period string specifying how far back from the last row's timestamp to include rows. The string is parsed into a duration. |
mostRecent (nanos)
| Parameter | Type | Description |
|---|---|---|
| table | Table | The source table to filter. Must be add-only with partitions sorted by timestamp. |
| timestampName | String | The name of the timestamp column used to determine recency. |
| nanos | long | The maximum interval, in nanoseconds, between the timestamp of the last row in a partition and the timestamp of a row that matches the filter. |
mostRecentRows
| Parameter | Type | Description |
|---|---|---|
| table | Table | The source table to filter. Must be add-only. |
| rowCount | long | The number of rows to include per partition. |
Returns
A table containing only the most recent rows from each partition of the source table at initialization, plus all rows added after initialization.
How it works
For each partition, the filter uses the last row's timestamp as the reference point. It subtracts the specified period from this timestamp and performs a binary search to identify rows within that time window.
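The search described above can be sketched in plain Java for a single partition. This illustrates the technique under stated assumptions and is not Deephaven's implementation. Timestamps are modeled as epoch nanoseconds in an ascending long[], and the window is non-negative. A row exactly windowNanos older than the last row is assumed to match, but that inclusive bound is also an assumption.

```java
// Hypothetical sketch of the per-partition window search: find the first row
// whose timestamp is at or after (lastTimestamp - windowNanos).
final class WindowSearch {

    /**
     * Returns the index of the first row to keep in a partition whose timestamps
     * are sorted ascending. Rows [result, timestamps.length) match the filter.
     */
    static int firstIndexInWindow(long[] timestamps, long windowNanos) {
        if (timestamps.length == 0) {
            return 0;
        }
        long cutoff = timestamps[timestamps.length - 1] - windowNanos;
        int lo = 0;
        int hi = timestamps.length - 1; // with windowNanos >= 0 the last row always matches
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[mid] < cutoff) {
                lo = mid + 1; // mid is too old; the answer lies to the right
            } else {
                hi = mid;     // mid matches; an earlier row might too
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        long second = 1_000_000_000L;
        long[] ts = new long[20];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = i * second; // one row per second: 0s .. 19s
        }
        int first = firstIndexInWindow(ts, 10 * second);
        System.out.println("keep rows " + first + " to " + (ts.length - 1)); // keep rows 9 to 19
    }
}
```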
The filter makes these assumptions:
- The source table is add-only (no modifications, shifts, or removals).
- Each partition is sorted by timestamp.
- Null timestamps are not permitted.
If any of these assumptions are violated, the contents of the result table are undefined.
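The sorting and null-timestamp assumptions can be checked on a snapshot of a partition before the filter is applied. This helper is hypothetical. It models null timestamps as Java null values in a Long[], which is an assumption, since the engine's own null representation may differ. It also treats "sorted" as non-decreasing. The add-only assumption concerns how the table changes over time and cannot be checked on a single array.

```java
// Hypothetical helper: verify that a partition's timestamps are non-null and
// non-decreasing. Java null stands in for a null timestamp (an assumption).
final class AssumptionCheck {

    static void requireSortedNonNull(Long[] timestamps) {
        Long prev = null;
        for (int i = 0; i < timestamps.length; i++) {
            Long ts = timestamps[i];
            if (ts == null) {
                throw new IllegalArgumentException("null timestamp at row " + i);
            }
            if (prev != null && ts < prev) {
                throw new IllegalArgumentException("timestamps not sorted at row " + i);
            }
            prev = ts;
        }
    }

    public static void main(String[] args) {
        requireSortedNonNull(new Long[] {1L, 2L, 2L, 5L}); // passes: ties are allowed
        try {
            requireSortedNonNull(new Long[] {1L, 3L, 2L});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // timestamps not sorted at row 2
        }
    }
}
```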
Examples
Filter by time period
This example uses a time table and filters it to show only rows whose timestamp is within 10 seconds of the most recent row in the table.
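The window arithmetic in this example can be reproduced without Deephaven. The following plain-Java simulation models a time table that has ticked once per second for 60 seconds. It keeps rows at or after the last timestamp minus the period. It assumes the period is an ISO-8601 duration accepted by java.time.Duration.parse, which may not match every format the filter accepts. For brevity, it uses a linear scan rather than a binary search. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (plain Java, not the Deephaven API): keep rows whose
// timestamp is at or after (last timestamp - period).
final class TimePeriodExample {

    static List<Instant> mostRecent(List<Instant> rows, String period) {
        List<Instant> kept = new ArrayList<>();
        if (rows.isEmpty()) {
            return kept;
        }
        Duration window = Duration.parse(period); // ISO-8601 duration, e.g. "PT10S"
        Instant cutoff = rows.get(rows.size() - 1).minus(window);
        for (Instant ts : rows) {
            if (!ts.isBefore(cutoff)) { // inclusive bound is an assumption
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        List<Instant> rows = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            rows.add(start.plusSeconds(i)); // one row per second, like a 1-second time table
        }
        List<Instant> kept = mostRecent(rows, "PT10S");
        System.out.println(kept.size() + " rows, from " + kept.get(0));
        // 11 rows, from 2024-01-01T00:00:49Z
    }
}
```

With an inclusive lower bound, 11 rows survive (00:00:49 through 00:00:59). An exclusive bound would keep 10.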
Filter by time in nanoseconds
This example filters to show rows from the last 5 seconds (5 billion nanoseconds):
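The nanos overload takes a raw long, so it can help to derive the value from java.time instead of typing the literal by hand. This plain-Java sketch uses a hypothetical class and method name. The returned value is what would be passed as the nanos parameter.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Computes a nanosecond window from a readable duration instead of a hand-typed literal.
final class NanosArgument {

    static long windowNanos(long seconds) {
        return Duration.ofSeconds(seconds).toNanos();
    }

    public static void main(String[] args) {
        long nanos = windowNanos(5);
        System.out.println(nanos); // 5000000000
        // TimeUnit produces the same value.
        System.out.println(nanos == TimeUnit.SECONDS.toNanos(5)); // true
    }
}
```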
Filter by row count
The mostRecentRows method filters to show a specified number of rows from the end of each partition:
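The following minimal sketch applies the same semantics to in-memory data. It assumes the partitions are already separated into lists in timestamp order. MostRecentRowsSketch is a hypothetical name and not part of Deephaven. A partition shorter than rowCount is kept whole.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep the last rowCount rows of each partition
// (every row when the partition is shorter than rowCount).
final class MostRecentRowsSketch {

    static <T> List<List<T>> mostRecentRows(List<List<T>> partitions, long rowCount) {
        List<List<T>> result = new ArrayList<>();
        for (List<T> partition : partitions) {
            int size = partition.size();
            int from = (int) Math.max(0, size - rowCount);
            result.add(new ArrayList<>(partition.subList(from, size)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("a1", "a2", "a3", "a4"),
                List.of("b1", "b2"),
                List.of("c1", "c2", "c3"));
        System.out.println(mostRecentRows(partitions, 3));
        // [[a2, a3, a4], [b1, b2], [c1, c2, c3]]
    }
}
```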