Other rolling utilities with update_by

This guide covers general rolling operations in the updateBy Groovy module. To learn about cumulative, rolling, and moving statistics, see our related guide.

Rolling formulas

The updateBy module enables users to create custom rolling aggregations with the RollingFormula function. For more information on tick vs. time operations, see this section of the rolling statistics guide.

The user-defined formula can utilize any of Deephaven's built-in functions, arithmetic operators, or even user-defined Groovy functions.

`RollingFormula`

Use RollingFormula to create custom tick-based rolling formulas. Here's an example that computes the rolling geometric mean of a column X by group:

source = emptyTable(100).update("Letter = (i % 2 == 0) ? `A` : `B`", "X = randomInt(0, 100)")

result = source.updateBy(
    RollingFormula(3, "pow(product(x), 1/count(x))", "x", "GeomMeanX=X"), "Letter"
)

`RollingFormula`

RollingFormula can also be used to create custom time-based rolling formulas. You must supply a timestamp column, and can specify the time window as backward-looking, forward-looking, or both. Here's an example that computes the 5-second rolling geometric mean of a column X by group:

source = emptyTable(100).update(
        "Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
        "Letter = (i % 2 == 0) ? `A` : `B`",
        "X = randomInt(0, 25)",
)

result = source.updateBy(
    RollingFormula("Timestamp", 5_000_000, "pow(product(x), 1/count(x))", "x", "GeomMeanX=X"),
    "Letter"
)

Rolling groups

In addition to custom rolling formulas, updateby provides the ability to create rolling groups with RollingGroup. The grouped data are represented as arrays. See the guide on how to work with arrays for more details.

Tick-based `RollingGroup`

Use RollingGroup to create tick-based rolling groups, where each group will have a specified number of entries determined by revTicks and fwdTicks. Here's an example that creates rolling groups with the three previous rows and the current row:

source = emptyTable(100).update("Letter = randomBool() ? `A` : `B`", "X = randomInt(0, 100)")

result = source.updateBy(RollingGroup(4, "GroupX=X"), "Letter")

To create groups that include data after the current row, use the fwdTicks parameter. This example creates a group that consists of the two previous rows, the current row, and the next four rows:

source = emptyTable(100).update("Letter = randomBool() ? `A` : `B`", "X = randomInt(0, 100)")

result = source.updateBy(RollingGroup(3, 4, "GroupX=X"), "Letter")

Time-based `RollingGroup`

Similarly, use RollingGroup to create time-based rolling groups:

source = emptyTable(100).update(
        "Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
        "Letter = randomBool() ? `A` : `B`",
        "X = randomInt(0, 25)",
)

result = source.updateBy(
    RollingGroup("Timestamp", parseDuration("PT3s"), "GroupX=X"),
    "Letter",
)

These groups are timestamp-based, so they are not guaranteed to contain elements from any previous row. This is in contrast to tick-based RollingGroup, which always yields groups of a fixed size after that size has been reached.

The fwdTime parameter is used to create groups that include rows occuring after the current row. Here's an example that creates rolling groups out of every row within five seconds of the current row:

source = emptyTable(100).update(
        "Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
        "Letter = randomBool() ? `A` : `B`",
        "X = randomInt(0, 25)",
)

result = source.updateBy(RollingGroup("Timestamp", parseDuration("PT5s"), parseDuration("PT5s"), "GroupX=X"), "Letter")

Sequential differences with `Delta`

Deephaven's Delta function can be used to compute sequential differences in a column of a table. Here's a simple example:

source = emptyTable(100).update(
    "Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
    "X = randomInt(0, 25)",
)

result = source.updateBy(Delta("DiffX=X"))

Like all other updateby functions, the by argument is used to specify grouping columns, so that sequential differences can be computed on a per-group basis:

source = emptyTable(100).update(
        "Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
        "Letter = randomBool() ? `A` : `B`",
        "X = randomInt(0, 25)",
)

result = source.updateBy(Delta("DiffX=X"), "Letter")

Detrend time-series data

Sequential differencing is often used as a first measure for detrending time-series data. The updateby module provides the Delta function to make this easy:

source = emptyTable(1000).update(
    "Timestamp='2023-01-13T12:00 ET' + i*MINUTE",
    "Ticker = i%2==0 ? `ABC` : `XYZ`",
    "Price = i%2==0 ? 100*sin(i/40)+100*random() : 100*cos(i/40)+100*random()+i/2",
)

result = source.updateBy(Delta("DiffPrice=Price"), "Ticker")

noDetrend = plot("ABC", result.where("Ticker = `ABC`"), "Timestamp", "Price")
    .plot("XYZ", result.where("Ticker = `XYZ`"), "Timestamp", "Price")
    .show()

detrend = plot("ABC", result.where("Ticker = `ABC`"), "Timestamp", "DiffPrice")
    .plot("XYZ", result.where("Ticker = `XYZ`"), "Timestamp", "DiffPrice")
    .show()

Handle nulls with `DeltaControl`

The Delta function takes an optional argument DeltaControl that is used to determine how null values are treated. To use this argument, you must supply a DeltaControl instance. The following behaviors are available via DeltaControl:

DeltaControl.NULL_DOMINATES: A valid value following a null value returns null.
DeltaControl.VALUE_DOMINATES: A valid value following a null value returns the valid value.
DeltaControl.ZERO_DOMINATES: A valid value following a null value returns zero.

To see how each of these behave in context, consider the following example:

source = emptyTable(100).update(
    "Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
    "Letter = randomBool() ? `A` : `B`",
    "X = randomInt(0, 25)",
)

result = source.updateBy(
    [
        Delta("DefaultDeltaX=X"),
        Delta(DeltaControl.NULL_DOMINATES, "NullDomDeltaX=X"),
        Delta(DeltaControl.VALUE_DOMINATES, "ValueDomDeltaX=X"),
        Delta(DeltaControl.ZERO_DOMINATES, "ZeroDomDeltaX=X"),
    ],
    "Letter",
)

By default, Delta uses NULL_DOMINATES, so differencing a number from a null will always return a null.

Handle nulls with `Fill`

The updateby module provides the Fill function to help deal with null data values. It fills in null values with the most recent non-null values, and like all updateby operations, can do so on a per-group basis.

Here's an example of using fill to fill up null values by group:

source = emptyTable(100).update(
        "Timestamp = '2023-01-01T00:00:00 ET' + i * SECOND",
        "Letter = randomBool() ? `A` : `B`",
        "X = randomBool() ? NULL_INT : randomInt(0, 25)",
)

result = source.updateBy(Fill("FillX=X"), "Letter")

Other rolling utilities with update_by

Rolling formulas

RollingFormula

RollingFormula

Rolling groups

Tick-based RollingGroup

Time-based RollingGroup

Sequential differences with Delta

Detrend time-series data

Handle nulls with DeltaControl

Handle nulls with Fill

Related documentation

`RollingFormula`

`RollingFormula`

Tick-based `RollingGroup`

Time-based `RollingGroup`

Sequential differences with `Delta`

Handle nulls with `DeltaControl`

Handle nulls with `Fill`