Skip to main content

Real-time windowed and cumulative calculations made easy

· 5 min read
DALL·E prompt: lots of numbers around angled blue window floating in the air in a blue room with blue floors, monochromatic
JJ Brosnan
Performance and efficiency with expressive syntax

Each Deephaven Core release brings new features to get excited about. One of this year's biggest and most exciting is the update_by table operation. Its introduction to the table API has transformed what used to be difficult and tedious doable in a single function call.

Efficiently perform windowed or cumulative calculations, compute inter-row statistics, aggregate, and group data all with one function.

What is update_by?

update_by adds new capability for computing inter-row statistics. This enables efficient and expressive cross-row calculations, such as the differences between rows, or an exponential moving average. It can perform two types of operations: cumulative and rolling.

  • Cumulative operations are applied to an entire table, such as a cumulative sum.
  • Rolling operations are applied to a subset of rows in a table, such as a rolling sum over the previous ten rows or the previous ten seconds.

update_by is highly efficient and will only recompute the absolute minimum amount of data required for each calculation. It retains the current table rows and simply appends the aggregating columns. It's also parallelized - concurrency can be achieved even in Python, which is typically limited by the GIL.

Why is it called update_by?

This operation's name is derived from the combination of two of Deephaven's most commonly used operations - update and agg_by.

  • It's like update because it updates table data, creating new columns with specified calculations.
  • It's like agg_by because the calculations implemented by update_by are done on grouped (aggregated) sets of data. As with other aggregations, they can be done on a per-group basis, or applied to an entire table.

Examples

Examples are more illustrative of capabilities than a long list. So, here are a few that show what update_by can do.

Rolling averages

The following example calculates row-based and time-based rolling averages of a column with numeric data. No grouping columns are given, so the calculations are applied to the whole table. The row-based moving average calculates an average given the previous row, current row, and next row. The same is true for the time-based average, which includes all rows within plus or minus 1.5 seconds from the current row.

from deephaven.updateby import rolling_avg_tick, rolling_avg_time
from deephaven.time import parse_instant
from deephaven import empty_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = empty_table(25).update(["Timestamp = base_time + i * SECOND", "X = i"])

row_based_avg = rolling_avg_tick(cols=["RowAvgX = X"], rev_ticks=2, fwd_ticks=1)
time_based_avg = rolling_avg_time(ts_col="Timestamp", cols=["TimeAvgX = X"], rev_time="PT1.5S", fwd_time="PT1.5S")

result = source.update_by(ops=[row_based_avg, time_based_avg], by=[])

EMAs grouped by a symbol

The following example calculates row and time-based EMAs of the same data as the example above. This time around, the Symbol column is given as the grouping column. It contains alternating symbols Tau and Epsilon, so the EMAs are calculated on a per-symbol basis.

from deephaven.updateby import ema_tick, ema_time
from deephaven.time import parse_instant
from deephaven import empty_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = empty_table(25).update(["Timestamp = base_time + i * SECOND", "Symbol = (i % 2 == 0) ? `Tau` : `Epsilon`", "X = i"])

row_based_ema = ema_tick(decay_ticks=5, cols=["RowEmaX = X"])
time_based_ema = ema_time(ts_col="Timestamp", decay_time="PT5S", cols=["TimeEmaX = X"])

result = source.update_by(ops=[row_based_ema, time_based_ema], by=["Symbol"])

result_tau = result.where(["Symbol = `Tau`"])
result_epsilon = result.where(["Symbol = `Epsilon`"])

Rolling group in real-time

The following example calculates a rolling group on real-time data.

from deephaven.updateby import rolling_group_tick, rolling_group_time
from deephaven.time import parse_instant
from deephaven import time_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = time_table("PT0.5S").update(["Symbol = (i % 2 == 0) ? `Tau` : `Epsilon`", "X = i"])

row_group_op = rolling_group_tick(cols=["RowGroupX = X"], rev_ticks=4, fwd_ticks=5)
time_group_op = rolling_group_time(ts_col="Timestamp", cols=["TimeGroupX = X"], rev_time="PT5S", fwd_time="PT3S")

result = source.update_by(ops=[row_group_op, time_group_op], by=["Symbol"])

result_tau = result.where(["Symbol = `Tau`"])
result_epsilon = result.where(["Symbol = `Epsilon`"])

img

What calculations can be performed with update_by?

A lot. Here's the full list:

Operation nameDescription
cum_maxCumulative maximum
cum_minCumulative minimum
cum_prodCumulative product
cum_sumCumulative Sum
deltaDifference between adjacent rows
ema_tickRow-based EMA
ema_timeTime-based EMA
emmax_tickRow-based exponential moving maximum
emmax_timeTime-based exponential moving maximum
emmin_tickRow-based exponential moving minimum
emmin_timeTime-based exponential moving mimnimum
ems_tickRow-based exponential moving sum
ems_timeTime-based exponential moving sum
forward_fillForward-fill null values
rolling_avg_tickRow-based moving average
rolling_avg_timeTime-based moving average
rolling_count_tickRow-based moving count
rolling_count_timeTime-based moving count
rolling_group_tickRow-based moving group
rolling_group_timeTime-based moving group
rolling_std_tickRow-based moving standard deviation
rolling_std_timeTime-based moving standard deviation
rolling_sum_tickRow-based moving sum
rolling_sum_timeTime-based moving sum
rolling_wavg_tickRow-based moving weighted average
rolling_wavg_timeTime-based moving weighted average

Reach out

Have any questions for us? Reach out on Slack!