Rolling calculations with update_by
This guide will show you how to use the update_by
table operation in your queries. update_by
creates a new table with resultant columns containing aggregated calculations (referred to as UpdateByOperations
) of columns in a source table. The calculations can be cumulative, windowed by rows (ticks), or windowed by time. The calculations are optionally done on a per-group basis, where groups are defined by one or more key columns.
Available UpdateByOperations
The calculations (UpdateByOperations
) that can be performed with update_by
are:
cum_max
cum_min
cum_prod
cum_sum
delta
ema_tick
ema_time
emmax_tick
emmax_time
emmin_tick
emmin_time
ems_tick
ems_time
emstd_tick
emstd_time
forward_fill
rolling_avg_tick
rolling_avg_time
rolling_formula_tick
rolling_formula_time
rolling_group_tick
rolling_group_time
rolling_max_tick
rolling_max_time
rolling_min_tick
rolling_min_time
rolling_prod_tick
rolling_prod_time
rolling_std_tick
rolling_std_time
rolling_sum_tick
rolling_sum_time
rolling_wavg_tick
rolling_wavg_time
The use of update_by
requires one or more of the calculations in the list above, as well as zero or more key columns to define groups. The resultant table contains all columns from the source table, as well as new columns if the output of the UpdateByOperation
renames them. If no key columns are given, then the calculations are applied to all rows in the specified columns. If one or more key columns are given, the calculations are applied to each unique group in the key column(s).
Examples
Each of the following subsections illustrates how to use update_by
.
A single UpdateByOperation
with no grouping columns
The following example calculates the tick-based rolling sum of the X
column in the source
table. No key columns are provided, so a single group exists that contains all rows of the table.
from deephaven.updateby import rolling_sum_tick
from deephaven import empty_table
source = empty_table(20).update(["X = i"])
result = source.update_by(
ops=rolling_sum_tick(cols=["RollingSumX = X"], rev_ticks=3, fwd_ticks=0)
)
- source
- result
Multiple UpdateByOperations
with no grouping columns
The following example builds on the previous by performing two UpdateByOperations
in a single update_by
. The cumulative minimum and maximum are calculated, and the range is derived from them.
from deephaven.updateby import cum_min, cum_max
from deephaven import empty_table
source = empty_table(20).update(["X = randomInt(0, 25)"])
result = source.update_by(
ops=[cum_min(cols=["MinX = X"]), cum_max(cols=["MaxX = X"])]
).update(["RangeX = MaxX - MinX"])
- source
- result
Multiple UpdateByOperations
with a single grouping column
The following example builds on the previous by specifying a grouping column. The grouping column is Letter
, which contains alternating letters A
and B
. As a result, the cumulative minimum, maximum, and range are calculated on a per-letter basis. The result
table is split by letter via where
to show this.
from deephaven.updateby import cum_min, cum_max
from deephaven import empty_table
source = empty_table(20).update(
["Letter = (i % 2 == 0) ? `A` : `B`", "X = randomInt(0, 25)"]
)
result = source.update_by(
ops=[cum_min(cols=["MinX = X"]), cum_max(cols=["MaxX = X"])], by=["Letter"]
).update(["RangeX = MaxX - MinX"])
result_a = result.where(["Letter == `A`"])
result_b = result.where(["Letter == `B`"])
- source
- result
- result_a
- result_b
A single UpdateByOperation
applied to multiple columns with multiple grouping columns
The following example builds on the previous by applying a single UpdateByOperation
to multiple columns as well as specifying multiple grouping columns. The grouping columns, Letter
and Truth
, contain alternating letters and random true/false values. Thus, groups are defined by unique combinations of letter and boolean. The result
table is split by letter and truth value to show the unique groups.
from deephaven.updateby import rolling_sum_tick, cum_max, cum_min
from deephaven import empty_table
source = empty_table(20).update(
[
"Letter = (i % 2 == 0) ? `A` : `B`",
"Truth = randomBool()",
"X = randomInt(0, 25)",
"Y = randomInt(50, 75)",
]
)
rolling_sum_ops = rolling_sum_tick(
cols=["RollingSumX = X", "RollingSumY = Y"], rev_ticks=5, fwd_ticks=0
)
min_ops = cum_min(cols=["MinX = X", "MinY = Y"])
max_ops = cum_max(cols=["MaxX = X", "MaxY = Y"])
result = source.update_by(
ops=[rolling_sum_ops, min_ops, max_ops], by=["Letter", "Truth"]
).update(["RangeX = MaxX - MinX", "RangeY = MaxY - MinY"])
result_a_true = result.where(["Letter == `A`", "Truth == true"])
result_a_false = result.where(["Letter == `A`", "Truth == false"])
result_b_true = result.where(["Letter == `B`", "Truth == true"])
result_b_false = result.where(["Letter == `B`", "Truth == false"])
- source
- result
- result_a_true
- result_a_false
- result_b_true
- result_b_false
Applying an UpdateByOperation
to all columns
The following example uses forward_fill
to fill null values with the most recent previous non-null value. No columns are given to forward_fill
, so the forward-fill is applied to all columns in the source
table except for the specified key column(s). This also means that the X
column is replaced in the result
table by the forward-filled X values.
from deephaven.updateby import forward_fill
from deephaven.constants import NULL_INT
from deephaven import empty_table
source = empty_table(10).update(
[
"Letter = (i % 2 == 0) ? `A` : `B`",
"X = (i % 3 == 0) ? NULL_INT : i",
"Y = (i % 5 == 2) ? i : NULL_INT",
]
)
result = source.update_by(ops=forward_fill(cols=[]), by=["Letter"])
- source
- result
Tick-based windowed calculations
There are multiple UpdateByOperations
that are windowed by ticks. When an operation is windowed, the window is defined when creating the operation.
For all tick-based windowed calculations, the window size and location relative to the current row are defined by two input parameters: rev_ticks
and fwd_ticks
. The former defines how far backwards the window goes, whereas the latter defines how far forwards it goes. rev_ticks
is inclusive of the current row: rev_ticks = 1
means the window starts at the current row. fwd_ticks
is not inclusive of the current row: fwd_ticks = 0
means the window ends at the current row. Both of these values can be either positive or negative. The bulleted list below gives several examples of these two parameters and the rolling window they create.
rev_ticks = 1, fwd_ticks = 0
- Contains only the current row.rev_ticks = 10, fwd_ticks = 0
- Contains 9 previous rows and the current row.rev_ticks = 0, fwd_ticks = 10
- Contains the following 10 rows; excludes the current row.rev_ticks = 10, fwd_ticks = 10
- Contains the previous 9 rows, the current row and the 10 rows following.rev_ticks = 10, fwd_ticks = -5
- Contains 5 rows, beginning at 9 rows before, ending at 5 rows before the current row (inclusive).rev_ticks = 11, fwd_ticks = -1
- Contains 10 rows, beginning at 10 rows before, ending at 1 row before the current row (inclusive).rev_ticks = -5, fwd_ticks = 10
- Contains 5 rows, beginning 5 rows following, ending at 10 rows following the current row (inclusive).
The following example:
- Creates a static source table with two columns.
- Calculates the rolling sum of
X
grouped byLetter
.- Three rolling sums are calculated using a window before, containing, and after to the current row.
- Splits the
result
table by letter viawhere
to show how the windowed calculations are performed on a per-group basis.
from deephaven.updateby import rolling_sum_tick
from deephaven import empty_table
source = empty_table(20).update(["X = i", "Letter = (i % 2 == 0) ? `A` : `B`"])
op_contains = rolling_sum_tick(cols=["ContainsX = X"], rev_ticks=1, fwd_ticks=1)
op_before = rolling_sum_tick(cols=["PriorX = X"], rev_ticks=3, fwd_ticks=-1)
op_after = rolling_sum_tick(cols=["PosteriorX = X"], rev_ticks=-1, fwd_ticks=3)
result = source.update_by(ops=[op_contains, op_before, op_after], by=["Letter"])
result_a = result.where(["Letter == `A`"])
result_b = result.where(["Letter == `B`"])
- source
- result
- result_a
- result_b
Time-based windowed calculations
There are multiple UpdateByOperations
that are windowed by time. When an operation is windowed, the window is defined when creating the operation. These operations require the source table to contain a column of date-times.
For all time-based windowed calculations, the window size and location relative to the current row are defined by two input parameters: fwd_time
and rev_time
. The former defines how far forward the window goes, whereas the latter defines how far backwards it goes. These parameters parameter can be given as an integer number of nanoseconds or a string in the form HH:MM:SS.dddd
. The bulleted list below explains how window sizes vary based on the two parameters:
rev_time = "PT00:00:00", fwd_time = "PT00:00:00"
- Contains rows that exactly match the current timestamp.rev_time = "PT00:10:00", fwd_time = "PT00:00:00"
- Contains rows from 10m earlier through the current timestamp (inclusive).rev_time = "PT00:00:00", fwd_time = "PT00:10:00"
- Contains rows from the current timestamp through 10m following the current row timestamp (inclusive).rev_time = int(60e9), fwd_time = int(60e9)
- Contains rows from 1m earlier through 1m3 following the current timestamp (inclusive).rev_time = "PT00:10:00", fwd_time = "-PT00:05:00"
- Contains rows from 10m earlier through 5m before the current timestamp (inclusive). This is a purely backwards-looking window.rev_time = int(-5e9), fwd_time = int(10e9)
- Contains rows from 5s following through 10s following the current timestamp (inclusive). This is a purely forwards-looking window.
The following example:
- Creates a static source table with three columns.
- Calculates the rolling sum of
X
grouped byLetter
.- Three rolling sums are calculated using a window before, containing, and after the current timestamp.
- Splits the
result
table by letter viawhere
to show how the windowed calculations are performed on a per-group basis.
from deephaven.updateby import rolling_sum_time
from deephaven.time import to_j_instant
from deephaven import empty_table
base_time = to_j_instant("2023-01-01T00:00:00 ET")
source = empty_table(20).update(
["Timestamp = base_time + i * SECOND", "X = i", "Letter = (i % 2 == 0) ? `A` : `B`"]
)
op_before = rolling_sum_time(
ts_col="Timestamp", cols=["PriorX = X"], rev_time="PT00:00:03", fwd_time=int(-1e9)
)
op_contains = rolling_sum_time(
ts_col="Timestamp",
cols=["ContainsX = X"],
rev_time="PT00:00:01",
fwd_time="PT00:00:01",
)
op_after = rolling_sum_time(
ts_col="Timestamp",
cols=["PosteriorX = X"],
rev_time="-PT00:00:01",
fwd_time=int(3e9),
)
result = source.update_by(ops=[op_before, op_contains, op_after], by=["Letter"])
result_a = result.where(["Letter == `A`"])
result_b = result.where(["Letter == `B`"])
- source
- result
- result_a
- result_b
Handling erroneous data
It's common for tables to contain null, NaN, or other erroneous values. Certain update_by
operations can be told how to handle these through the use of the op_control
input parameter. They are:
To see how erroneous data can be handled differently, see the OperationControl reference guide.
Related documentation
- Create an empty table
- How to use EMA
cum_max
cum_min
cum_prod
cum_sum
delta
DeltaControl
ema_tick
ema_time
emmax_tick
emmax_time
emmin_tick
emmin_time
ems_tick
ems_time
emstd_tick
emstd_time
forward_fill
OperationControl
rolling_avg_tick
rolling_avg_time
rolling_formula_tick
rolling_formula_time
rolling_group_tick
rolling_group_time
rolling_max_tick
rolling_max_time
rolling_min_tick
rolling_min_time
rolling_prod_tick
rolling_prod_time
rolling_std_tick
rolling_std_time
rolling_sum_tick
rolling_sum_time
rolling_wavg_tick
rolling_wavg_time