Other rolling utilities with update_by
This guide covers general rolling operations in the updateby
Python module. To learn about cumulative, rolling, and moving statistics, see our related guide.
Rolling formulas
The updateby
module enables users to create custom rolling aggregations with the rolling_formula_tick
and rolling_formula_time
functions. For more information on tick vs. time operations, see this section of the rolling statistics guide.
The user-defined formula can utilize any of Deephaven's built-in functions, arithmetic operators, or even user-defined Python functions.
rolling_formula_tick
Use rolling_formula_tick
to create custom tick-based rolling formulas. Here's an example that computes the rolling geometric mean of a column X
by group:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
["Letter = (i % 2 == 0) ? `A` : `B`", "X = randomInt(0, 100)"]
)
result = source.update_by(
uby.rolling_formula_tick(
formula="pow(product(x), 1/count(x))",
formula_param="x",
cols="GeomMeanX=X",
rev_ticks=3,
),
by="Letter",
)
- result
rolling_formula_time
To create custom time-based rolling formulas, use rolling_formula_time
. You must supply a timestamp column, and can specify the time window as backward-looking, forward-looking, or both. Here's an example that computes the 5-second rolling geometric mean of a column X
by group:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
[
"Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
"Letter = (i % 2 == 0) ? `A` : `B`",
"X = randomInt(0, 25)",
]
)
result = source.update_by(
uby.rolling_formula_time(
ts_col="Timestamp",
formula="pow(product(x), 1/count(x))",
formula_param="x",
cols="GeomMeanX=X",
rev_time="PT5s",
),
by="Letter",
)
- result
Rolling groups
In addition to custom rolling formulas, updateby
provides the ability to create rolling groups with rolling_group_tick
and rolling_group_time
. The grouped data are represented as arrays. See the guide on how to work with arrays for more details.
rolling_group_tick
Use rolling_group_tick
to create tick-based rolling groups, where each group will have a specified number of entries determined by rev_ticks
and fwd_ticks
. Here's an example that creates rolling groups with the three previous rows and the current row:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
["Letter = randomBool() ? `A` : `B`", "X = randomInt(0, 100)"]
)
result = source.update_by(
uby.rolling_group_tick(cols="GroupX=X", rev_ticks=4), by="Letter"
)
- result
To create groups that include data after the current row, use the fwd_ticks
parameter. This example creates a group that consists of the two previous rows, the current row, and the next four rows:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
["Letter = randomBool() ? `A` : `B`", "X = randomInt(0, 100)"]
)
result = source.update_by(
uby.rolling_group_tick(cols="GroupX=X", rev_ticks=3, fwd_ticks=4), by="Letter"
)
- result
rolling_group_time
Similarly, use rolling_group_time
to create time-based rolling groups:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
[
"Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
"Letter = randomBool() ? `A` : `B`",
"X = randomInt(0, 25)",
]
)
result = source.update_by(
uby.rolling_group_time(ts_col="Timestamp", cols="GroupX=X", rev_time="PT3s"),
by="Letter",
)
- result
These groups are timestamp-based, so they are not guaranteed to contain elements from any previous row. This is in contrast to rolling_group_tick
, which always yields groups of a fixed size after that size has been reached.
The fwd_time
parameter is used to create groups that include rows occuring after the current row. Here's an example that creates rolling groups out of every row within five seconds of the current row:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
[
"Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
"Letter = randomBool() ? `A` : `B`",
"X = randomInt(0, 25)",
]
)
result = source.update_by(
uby.rolling_group_time(
ts_col="Timestamp", cols="GroupX=X", rev_time="PT5s", fwd_time="PT5s"
),
by="Letter",
)
- result
Sequential differences with delta
Deephaven's delta
function can be used to compute sequential differences in a column of a table. Here's a simple example:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
[
"Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
"X = randomInt(0, 25)",
]
)
result = source.update_by(uby.delta(cols="DiffX=X"))
- result
Like all other updateby
functions, the by
argument is used to specify grouping columns, so that sequential differences can be computed on a per-group basis:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
[
"Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
"Letter = randomBool() ? `A` : `B`",
"X = randomInt(0, 25)",
]
)
result = source.update_by(uby.delta(cols="DiffX=X"), by="Letter")
- result
Detrend time-series data
Sequential differencing is often used as a first measure for detrending time-series data. The updateby
module provides the delta
function to make this easy:
from deephaven import empty_table
import deephaven.updateby as uby
from deephaven.plot import Figure
source = empty_table(1000).update(
[
"Timestamp='2023-01-13T12:00 ET' + i*MINUTE",
"Ticker = i%2==0 ? `ABC` : `XYZ`",
"Price = i%2==0 ? 100*sin(i/40)+100*random() : 100*cos(i/40)+100*random()+i/2",
]
)
result = source.update_by(uby.delta("DiffPrice=Price"), by="Ticker")
no_detrend = (
Figure()
.plot_xy(
series_name="ABC", t=result.where("Ticker == `ABC`"), x="Timestamp", y="Price"
)
.plot_xy(
series_name="XYZ", t=result.where("Ticker == `XYZ`"), x="Timestamp", y="Price"
)
.show()
)
detrend = (
Figure()
.plot_xy(
series_name="ABC",
t=result.where("Ticker == `ABC`"),
x="Timestamp",
y="DiffPrice",
)
.plot_xy(
series_name="XYZ",
t=result.where("Ticker == `XYZ`"),
x="Timestamp",
y="DiffPrice",
)
.show()
)
- no_detrend
- detrend
- source
- result
Handle nulls with DeltaControl
The delta
function takes an optional argument delta_control
that is used to determine how null values are treated. To use this argument, you must supply a DeltaControl
instance. The following behaviors are available via DeltaControl
:
DeltaControl.NULL_DOMINATES
: A valid value following a null value returns null.DeltaControl.VALUE_DOMINATES
: A valid value following a null value returns the valid value.DeltaControl.ZERO_DOMINATES
: A valid value following a null value returns zero.
To see how each of these behave in context, consider the following example:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
[
"Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
"Letter = randomBool() ? `A` : `B`",
"X = randomInt(0, 25)",
]
)
result = source.update_by(
[
uby.delta(cols="DefaultDeltaX=X"),
uby.delta(
cols="NullDomDeltaX=X",
delta_control=uby.DeltaControl(uby.DeltaControl.NULL_DOMINATES),
),
uby.delta(
cols="ValueDomDeltaX=X",
delta_control=uby.DeltaControl(uby.DeltaControl.VALUE_DOMINATES),
),
uby.delta(
cols="ZeroDomDeltaX=X",
delta_control=uby.DeltaControl(uby.DeltaControl.ZERO_DOMINATES),
),
],
by="Letter",
)
- result
By default, delta
uses NULL_DOMINATES
, so differencing a number from a null will always return a null.
Handle nulls with forward_fill
The updateby
module provides the forward_fill
function to help deal with null data values. It fills in null values with the most recent non-null values, and like all updateby
operations, can do so on a per-group basis.
Here's an example of using forward_fill
to fill up null values by group:
from deephaven import empty_table
import deephaven.updateby as uby
source = empty_table(100).update(
[
"Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
"Letter = randomBool() ? `A` : `B`",
"X = randomBool() ? NULL_INT : randomInt(0, 25)",
]
)
result = source.update_by(uby.forward_fill("FillX=X"), by="Letter")
- result