Each Deephaven Core release brings new features to get excited about. One of this year's biggest and most exciting is the `update_by` table operation. Its introduction to the table API has transformed what used to be difficult and tedious into something doable in a single function call.

## What is `update_by`?

`update_by` adds new capabilities for computing inter-row statistics. It enables efficient and expressive cross-row calculations, such as the difference between adjacent rows or an exponential moving average. It can perform two types of operations: cumulative and rolling.

- Cumulative operations are applied to an entire table, such as a cumulative sum.
- Rolling operations are applied to a subset of rows in a table, such as a rolling sum over the previous ten rows or the previous ten seconds.
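The distinction between cumulative and rolling can be sketched outside Deephaven in plain Python. This is a hypothetical illustration of the semantics, not the `update_by` API itself:

```python
from itertools import accumulate

data = [1, 2, 3, 4, 5]

# Cumulative sum: each row sees every row before it.
cum_sum = list(accumulate(data))
# [1, 3, 6, 10, 15]

# Rolling sum over the previous 3 rows (current row inclusive),
# mirroring a tick-based window with rev_ticks=3, fwd_ticks=0.
rolling_sum = [sum(data[max(0, i - 2): i + 1]) for i in range(len(data))]
# [1, 3, 6, 9, 12]
```

A cumulative result keeps growing with the table, while a rolling result only ever reflects the rows inside its window.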

`update_by` is highly efficient: it recomputes only the minimum amount of data required for each calculation. It retains the current table rows and simply appends the new result columns. It's also parallelized - concurrency can be achieved even in Python, which is typically limited by the GIL.

## Why is it called `update_by`?

This operation's name is derived from the combination of two of Deephaven's most commonly used operations - `update` and `agg_by`.

- It's like `update` because it updates table data, creating new columns with specified calculations.
- It's like `agg_by` because the calculations implemented by `update_by` are done on grouped (aggregated) sets of data. As with other aggregations, they can be done on a per-group basis or applied to an entire table.
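The per-group behavior can be sketched in plain Python with a running sum. This is a hypothetical illustration of what `by=[]` versus `by=["Symbol"]` means, not Deephaven code:

```python
data = [("Tau", 0), ("Epsilon", 1), ("Tau", 2), ("Epsilon", 3)]

# by=[] : one running sum over the whole table.
total = 0
whole_table = []
for _, x in data:
    total += x
    whole_table.append(total)
# [0, 1, 3, 6]

# by=["Symbol"] : an independent running sum per symbol.
per_group = {}
grouped = []
for sym, x in data:
    per_group[sym] = per_group.get(sym, 0) + x
    grouped.append(per_group[sym])
# [0, 1, 2, 4]
```

Each symbol's calculation only ever sees that symbol's rows, exactly as with other Deephaven aggregations.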

## Examples

Examples illustrate capabilities better than a long list of features. So, here are a few that show what `update_by` can do.

### Rolling averages

The following example calculates row-based and time-based rolling averages of a column with numeric data. No grouping columns are given, so the calculations are applied to the whole table. The row-based moving average calculates an average given the previous row, current row, and next row. The same is true for the time-based average, which includes all rows within plus or minus 1.5 seconds from the current row.

```python
from deephaven.updateby import rolling_avg_tick, rolling_avg_time
from deephaven.time import parse_instant
from deephaven import empty_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = empty_table(25).update(["Timestamp = base_time + i * SECOND", "X = i"])

row_based_avg = rolling_avg_tick(cols=["RowAvgX = X"], rev_ticks=2, fwd_ticks=1)
time_based_avg = rolling_avg_time(ts_col="Timestamp", cols=["TimeAvgX = X"], rev_time="PT1.5S", fwd_time="PT1.5S")

result = source.update_by(ops=[row_based_avg, time_based_avg], by=[])
```
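The row-based window semantics above can be sketched in plain Python. `rolling_avg_tick_sketch` is a hypothetical helper that mirrors how `rev_ticks` (which includes the current row) and `fwd_ticks` bound the window:

```python
def rolling_avg_tick_sketch(xs, rev_ticks, fwd_ticks):
    # rev_ticks counts the current row plus rows before it;
    # fwd_ticks counts rows after it. Windows shrink at the edges.
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - rev_ticks + 1): i + 1 + fwd_ticks]
        out.append(sum(window) / len(window))
    return out

# rev_ticks=2, fwd_ticks=1: previous row, current row, and next row.
averages = rolling_avg_tick_sketch([0, 1, 2, 3], rev_ticks=2, fwd_ticks=1)
# [0.5, 1.0, 2.0, 2.5]
```

Note how the first and last windows are smaller because there are no rows before or after them to include.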

### EMAs grouped by a symbol

The following example calculates row-based and time-based EMAs of the same data as in the example above. This time, the `Symbol` column is given as the grouping column. It contains alternating symbols `Tau` and `Epsilon`, so the EMAs are calculated on a per-symbol basis.

```python
from deephaven.updateby import ema_tick, ema_time
from deephaven.time import parse_instant
from deephaven import empty_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = empty_table(25).update(["Timestamp = base_time + i * SECOND", "Symbol = (i % 2 == 0) ? `Tau` : `Epsilon`", "X = i"])

row_based_ema = ema_tick(decay_ticks=5, cols=["RowEmaX = X"])
time_based_ema = ema_time(ts_col="Timestamp", decay_time="PT5S", cols=["TimeEmaX = X"])

result = source.update_by(ops=[row_based_ema, time_based_ema], by=["Symbol"])

result_tau = result.where(["Symbol = `Tau`"])
result_epsilon = result.where(["Symbol = `Epsilon`"])
```
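A tick-based EMA can be sketched in plain Python. This assumes the per-row decay factor `a = e^(-1 / decay_ticks)`; the exact formula is defined by Deephaven's `ema_tick` documentation, so treat this as an approximation of the idea rather than the implementation:

```python
import math

def ema_tick_sketch(xs, decay_ticks):
    # Assumed decay factor per row: a = exp(-1 / decay_ticks).
    a = math.exp(-1.0 / decay_ticks)
    ema = None
    out = []
    for x in xs:
        # The first row seeds the EMA; later rows blend old and new.
        ema = x if ema is None else a * ema + (1 - a) * x
        out.append(ema)
    return out
```

Larger `decay_ticks` values weight history more heavily, so the EMA reacts more slowly to new rows.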

### Rolling group in real-time

The following example calculates a rolling group on real-time data.

```python
from deephaven.updateby import rolling_group_tick, rolling_group_time
from deephaven.time import parse_instant
from deephaven import time_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = time_table("PT0.5S").update(["Symbol = (i % 2 == 0) ? `Tau` : `Epsilon`", "X = i"])

row_group_op = rolling_group_tick(cols=["RowGroupX = X"], rev_ticks=4, fwd_ticks=5)
time_group_op = rolling_group_time(ts_col="Timestamp", cols=["TimeGroupX = X"], rev_time="PT5S", fwd_time="PT3S")

result = source.update_by(ops=[row_group_op, time_group_op], by=["Symbol"])

result_tau = result.where(["Symbol = `Tau`"])
result_epsilon = result.where(["Symbol = `Epsilon`"])
```

## What calculations can be performed with `update_by`?

A lot. Here's the full list:

| Operation name | Description |
| --- | --- |
| `cum_max` | Cumulative maximum |
| `cum_min` | Cumulative minimum |
| `cum_prod` | Cumulative product |
| `cum_sum` | Cumulative sum |
| `delta` | Difference between adjacent rows |
| `ema_tick` | Row-based EMA |
| `ema_time` | Time-based EMA |
| `emmax_tick` | Row-based exponential moving maximum |
| `emmax_time` | Time-based exponential moving maximum |
| `emmin_tick` | Row-based exponential moving minimum |
| `emmin_time` | Time-based exponential moving minimum |
| `ems_tick` | Row-based exponential moving sum |
| `ems_time` | Time-based exponential moving sum |
| `forward_fill` | Forward-fill null values |
| `rolling_avg_tick` | Row-based moving average |
| `rolling_avg_time` | Time-based moving average |
| `rolling_count_tick` | Row-based moving count |
| `rolling_count_time` | Time-based moving count |
| `rolling_group_tick` | Row-based moving group |
| `rolling_group_time` | Time-based moving group |
| `rolling_std_tick` | Row-based moving standard deviation |
| `rolling_std_time` | Time-based moving standard deviation |
| `rolling_sum_tick` | Row-based moving sum |
| `rolling_sum_time` | Time-based moving sum |
| `rolling_wavg_tick` | Row-based moving weighted average |
| `rolling_wavg_time` | Time-based moving weighted average |
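Two of the less obvious entries, `delta` and `forward_fill`, can be sketched in plain Python. These hypothetical helpers only illustrate the semantics described in the table:

```python
def delta_sketch(xs):
    # Difference between each row and the previous one;
    # the first row has no predecessor, so its result is null (None).
    return [None if i == 0 else xs[i] - xs[i - 1] for i in range(len(xs))]

def forward_fill_sketch(xs):
    # Replace each null (None) with the most recent non-null value.
    out, last = [], None
    for x in xs:
        last = x if x is not None else last
        out.append(last)
    return out

deltas = delta_sketch([3, 5, 4])
# [None, 2, -1]
filled = forward_fill_sketch([1, None, None, 7])
# [1, 1, 1, 7]
```

As with every operation in the table, the real versions run incrementally on live tables and per group when grouping columns are supplied.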

## Reach out

Have any questions for us? Reach out on Slack!