# How to use update_by

This guide will show you how to use the `update_by`

table operation in your queries. `update_by`

creates a new table with resultant columns containing aggregated calculations (referred to as `UpdateByOperations`

) of columns in a source table. The calculations can be cumulative, windowed by rows (ticks), or windowed by time. The calculations are optionally done on a per-group basis, where groups are defined by one or more key columns.

## Available `UpdateByOperations`

The calculations (`UpdateByOperations`

) that can be performed with `update_by`

are:

`cum_max`

`cum_min`

`cum_prod`

`cum_sum`

`delta`

`ema_tick`

`ema_time`

`emmax_tick`

`emmax_time`

`emmin_tick`

`emmin_time`

`ems_tick`

`ems_time`

`emstd_tick`

`emstd_time`

`forward_fill`

`rolling_avg_tick`

`rolling_avg_time`

`rolling_group_tick`

`rolling_group_time`

`rolling_max_tick`

`rolling_max_time`

`rolling_min_tick`

`rolling_min_time`

`rolling_prod_tick`

`rolling_prod_time`

`rolling_std_tick`

`rolling_std_time`

`rolling_sum_tick`

`rolling_sum_time`

`rolling_wavg_tick`

`rolling_wavg_time`

The use of `update_by`

requires one or more of the calculations in the list above, as well as zero or more key columns to define groups. The resultant table contains all columns from the source table, as well as new columns if the output of the `UpdateByOperation`

renames them. If no key columns are given, then the calculations are applied to all rows in the specified columns. If one or more key columns are given, the calculations are applied to each unique group in the key column(s).

## Examples

Each of the following subsections illustrates how to use `update_by`

.

### A single `UpdateByOperation`

with no grouping columns

The following example calculates the tick-based rolling sum of the `X`

column in the `source`

table. No key columns are provided, so a single group exists that contains all rows of the table.

`from deephaven.updateby import rolling_sum_tick`

from deephaven import empty_table

source = empty_table(20).update(["X = i"])

result = source.update_by(ops=rolling_sum_tick(cols=["RollingSumX = X"], rev_ticks=3, fwd_ticks=0))

- source
- result

### Multiple `UpdateByOperations`

with no grouping columns

The following example builds on the previous by performing two `UpdateByOperations`

in a single `update_by`

. The cumulative minimum and maximum are calculated, and the range is derived from them.

`from deephaven.updateby import cum_min, cum_max`

from deephaven import empty_table

source = empty_table(20).update(["X = randomInt(0, 25)"])

result = source.update_by(ops=[cum_min(cols=["MinX = X"]), cum_max(cols=["MaxX = X"])]).update(["RangeX = MaxX - MinX"])

- source
- result

### Multiple `UpdateByOperations`

with a single grouping column

The following example builds on the previous by specifying a grouping column. The grouping column is `Letter`

, which contains alternating letters `A`

and `B`

. As a result, the cumulative minimum, maximum, and range are calculated on a per-letter basis. The `result`

table is split by letter via `where`

to show this.

`from deephaven.updateby import cum_min, cum_max`

from deephaven import empty_table

source = empty_table(20).update(["Letter = (i % 2 == 0) ? `A` : `B`", "X = randomInt(0, 25)"])

result = source.update_by(ops=[cum_min(cols=["MinX = X"]), cum_max(cols=["MaxX = X"])], by=["Letter"]).update(["RangeX = MaxX - MinX"])

result_a = result.where(["Letter == `A`"])

result_b = result.where(["Letter == `B`"])

- source
- result
- result_a
- result_b

### A single `UpdateByOperation`

applied to multiple columns with multiple grouping columns

The following example builds on the previous by applying a single `UpdateByOperation`

to multiple columns as well as specifying multiple grouping columns. The grouping columns, `Letter`

and `Truth`

, contain alternating letters and random true/false values. Thus, groups are defined by unique combinations of letter and boolean. The `result`

table is split by letter and truth value to show the unique groups.

`from deephaven.updateby import rolling_sum_tick, cum_max, cum_min`

from deephaven import empty_table

source = empty_table(20).update(["Letter = (i % 2 == 0) ? `A` : `B`", "Truth = randomBool()", "X = randomInt(0, 25)", "Y = randomInt(50, 75)"])

rolling_sum_ops = rolling_sum_tick(cols=["RollingSumX = X", "RollingSumY = Y"], rev_ticks=5, fwd_ticks=0)

min_ops = cum_min(cols=["MinX = X", "MinY = Y"])

max_ops = cum_max(cols=["MaxX = X", "MaxY = Y"])

result = source.update_by(ops=[rolling_sum_ops, min_ops, max_ops], by=["Letter", "Truth"]).update(["RangeX = MaxX - MinX", "RangeY = MaxY - MinY"])

result_a_true = result.where(["Letter == `A`", "Truth == true"])

result_a_false = result.where(["Letter == `A`", "Truth == false"])

result_b_true = result.where(["Letter == `B`", "Truth == true"])

result_b_false = result.where(["Letter == `B`", "Truth == false"])

- source
- result
- result_a_true
- result_a_false
- result_b_true
- result_b_false

### Applying an `UpdateByOperation`

to all columns

The following example uses `forward_fill`

to fill null values with the most recent previous non-null value. No columns are given to `forward_fill`

, so the forward-fill is applied to *all* columns in the `source`

table except for the specified key column(s). This also means that the `X`

column is replaced in the `result`

table by the forward-filled X values.

`from deephaven.updateby import forward_fill`

from deephaven.constants import NULL_INT

from deephaven import empty_table

source = empty_table(10).update(["Letter = (i % 2 == 0) ? `A` : `B`", "X = (i % 3 == 0) ? NULL_INT : i", "Y = (i % 5 == 2) ? i : NULL_INT"])

result = source.update_by(ops=forward_fill(cols=[]), by=["Letter"])

- source
- result

### Tick-based windowed calculations

There are multiple `UpdateByOperations`

that are windowed by ticks. When an operation is windowed, the window is defined when creating the operation.

For all tick-based windowed calculations, the window size and location relative to the current row are defined by two input parameters: `rev_ticks`

and `fwd_ticks`

. The former defines how far *backwards* the window goes, whereas the latter defines how far *forwards* it goes. `rev_ticks`

is *inclusive* of the current row: `rev_ticks = 1`

means the window starts at the current row. `fwd_ticks`

is *not inclusive* of the current row: `fwd_ticks = 0`

means the window ends at the current row. Both of these values can be either positive or negative. The bulleted list below gives several examples of these two parameters and the rolling window they create.

`rev_ticks = 1, fwd_ticks = 0`

- Contains only the current row.`rev_ticks = 10, fwd_ticks = 0`

- Contains 9 previous rows and the current row.`rev_ticks = 0, fwd_ticks = 10`

- Contains the following 10 rows; excludes the current row.`rev_ticks = 10, fwd_ticks = 10`

- Contains the previous 9 rows, the current row and the 10 rows following.`rev_ticks = 10, fwd_ticks = -5`

- Contains 5 rows, beginning at 9 rows before, ending at 5 rows before the current row (inclusive).`rev_ticks = 11, fwd_ticks = -1`

- Contains 10 rows, beginning at 10 rows before, ending at 1 row before the current row (inclusive).`rev_ticks = -5, fwd_ticks = 10`

- Contains 5 rows, beginning 5 rows following, ending at 10 rows following the current row (inclusive).

The following example:

- Creates a static source table with two columns.
- Calculates the rolling sum of
`X`

grouped by`Letter`

.- Three rolling sums are calculated using a window before, containing, and after to the current row.

- Splits the
`result`

table by letter via`where`

to show how the windowed calculations are performed on a per-group basis.

`from deephaven.updateby import rolling_sum_tick`

from deephaven import empty_table

source = empty_table(20).update(["X = i", "Letter = (i % 2 == 0) ? `A` : `B`"])

op_contains = rolling_sum_tick(cols=["ContainsX = X"], rev_ticks=1, fwd_ticks=1)

op_before = rolling_sum_tick(cols=["PriorX = X"], rev_ticks=3, fwd_ticks=-1)

op_after = rolling_sum_tick(cols=["PosteriorX = X"], rev_ticks=-1, fwd_ticks=3)

result = source.update_by(ops=[op_contains, op_before, op_after], by=["Letter"])

result_a = result.where(["Letter == `A`"])

result_b = result.where(["Letter == `B`"])

- source
- result
- result_a
- result_b

### Time-based windowed calculations

There are multiple `UpdateByOperations`

that are windowed by time. When an operation is windowed, the window is defined when creating the operation. These operations *require* the source table to contain a column of date-times.

For all time-based windowed calculations, the window size and location relative to the current row are defined by two input parameters: `fwd_time`

and `rev_time`

. The former defines how far *forward* the window goes, whereas the latter defines how far *backwards* it goes. These parameters parameter can be given as an integer number of nanoseconds or a string in the form `HH:MM:SS.dddd`

. The bulleted list below explains how window sizes vary based on the two parameters:

`rev_time = "PT00:00:00", fwd_time = "PT00:00:00"`

- Contains rows that exactly match the current timestamp.`rev_time = "PT00:10:00", fwd_time = "PT00:00:00"`

- Contains rows from 10m earlier through the current timestamp (inclusive).`rev_time = "PT00:00:00", fwd_time = "PT00:10:00"`

- Contains rows from the current timestamp through 10m following the current row timestamp (inclusive).`rev_time = int(60e9), fwd_time = int(60e9)`

- Contains rows from 1m earlier through 1m3 following the current timestamp (inclusive).`rev_time = "PT00:10:00", fwd_time = "-PT00:05:00"`

- Contains rows from 10m earlier through 5m before the current timestamp (inclusive). This is a purely backwards-looking window.`rev_time = int(-5e9), fwd_time = int(10e9)`

- Contains rows from 5s following through 10s following the current timestamp (inclusive). This is a purely forwards-looking window.

The following example:

- Creates a static source table with three columns.
- Calculates the rolling sum of
`X`

grouped by`Letter`

.- Three rolling sums are calculated using a window before, containing, and after the current timestamp.

- Splits the
`result`

table by letter via`where`

to show how the windowed calculations are performed on a per-group basis.

`from deephaven.updateby import rolling_sum_time`

from deephaven.time import to_j_instant

from deephaven import empty_table

base_time = to_j_instant("2023-01-01T00:00:00 ET")

source = empty_table(20).update(["Timestamp = base_time + i * SECOND", "X = i", "Letter = (i % 2 == 0) ? `A` : `B`"])

op_before = rolling_sum_time(ts_col="Timestamp", cols=["PriorX = X"], rev_time="PT00:00:03", fwd_time=int(-1e9))

op_contains = rolling_sum_time(ts_col="Timestamp", cols=["ContainsX = X"], rev_time="PT00:00:01", fwd_time="PT00:00:01")

op_after = rolling_sum_time(ts_col="Timestamp", cols=["PosteriorX = X"], rev_time="-PT00:00:01", fwd_time=int(3e9))

result = source.update_by(ops=[op_before, op_contains, op_after], by=["Letter"])

result_a = result.where(["Letter == `A`"])

result_b = result.where(["Letter == `B`"])

- source
- result
- result_a
- result_b

## Handling erroneous data

It's common for tables to contain null, NaN, or other erroneous values. Certain `update_by`

operations can be told how to handle these through the use of the `op_control`

input parameter. They are:

To see how erroneous data can be handled differently, see the OperationControl reference guide.

## Related documentation

- How to create an empty table
- How to use EMA
`cum_max`

`cum_min`

`cum_prod`

`cum_sum`

`delta`

`DeltaControl`

`ema_tick`

`ema_time`

`emmax_tick`

`emmax_time`

`emmin_tick`

`emmin_time`

`ems_tick`

`ems_time`

`emstd_tick`

`emstd_time`

`forward_fill`

- `OperationControl
`rolling_avg_tick`

`rolling_avg_time`

`rolling_group_tick`

`rolling_group_time`

`rolling_max_tick`

`rolling_max_time`

`rolling_min_tick`

`rolling_min_time`

`rolling_prod_tick`

`rolling_prod_time`

`rolling_std_tick`

`rolling_std_time`

`rolling_sum_tick`

`rolling_sum_time`

`rolling_wavg_tick`

`rolling_wavg_time`