UpdateBy

UpdateBy provides users with a way to apply operations that must account for the values of other rows within the same table in an efficient manner that properly takes updates into account.

The updateBy method applies a set of user-specified operations to all rows within the table, producing a new table with the same index as the parent. Users may provide one or more operations to compute at once.

// For a single operation.
result = source.updateBy(@NotNull final UpdateByClause operations, final String... byColumns)

// For multiple operations,  Clauses may be grouped together with UpdateByClause.of()
result = source.updateBy(@NotNull final Collection<UpdateByClause> operations, final String... byColumns)

Note

UpdateByClause must be imported into groovy sessions before use with import com.illumon.iris.db.v2.updateby.UpdateByClause

Example

The following will perform a forward fill on one column and a cumulative sum on another grouped by Sym.

result = parent.updateBy(UpdateByClause.of(
                         UpdateByClause.fill("Price"),
                         UpdateByClause.sum("VolSum=Volume")), "Sym")

from Deephaven import UpdateBy as ubc
result = parent.updateBy(ubc.of(\
                         ubc.fill("Price")\
                         ubc.sum("VolSum=Volume")), "Sym)

Forward Fill

Forward fill will replace any null values found in the current column and group with the most recent non-null value that occurred before each row. If no columns are passed, all non-grouping columns will be filled.

UpdateByClause.fill(String... columnsToFill)

ubc.fill(columns_to_fill)

Cumulative Sum

Cumulative Sum computes the sum of each row and all previous rows. If no columns are passed, all non-grouping columns will be computed. Only numerical types such as byte, short, int, long ,float, double, BigInteger and BigDecimal may be summed The result column for integral types such as byte, short, int and long will be long. The result column for float, double, BigDecimal and BigInteger will be the same type.

UpdateByClause.sum(String... columnsToSum)

ubc.sum(columns_to_sum)

Cumulative Product

Cumulative Product computes the product of each row and all previous rows. If no columns are passed, all non-grouping columns will be computed. Only numerical types such as byte, short, int, long ,float, double, BigInteger and BigDecimal may participate in a cumulative product. The result column for integral types such as byte, short, int and long will be long. The result column for floating point types float and double, as well as BigDecimal and BigInteger will be the same type.

UpdateByClause.prod(String... columns)

ubc.prod(columns)

Cumulative Minimum and Maximum

Cumulative Minimum and Maximum computes the minimum or maximum value of a column at each row. If no columns are passed, all non-grouping columns will be computed.

UpdateByClause.min(String... columns)
UpdateByClause.max(String... columns)

ubc.min(columns)
ubc.max(columns)

Exponential Moving Average (EMA)

Compute an EMA for a column. Deephaven supports computing EMAs using either time or ticks as the base time unit. The formula used for the tick-based version is:

a = e^{-1/\tau}

Where the time decay unit, $\tau$ , is the number of ticks.

The formula for the time-based version is:

a = e^{-\Delta t/\tau}

Where the time decay unit, $\tau$ , is time in nanoseconds.

The EMA function is:

EMA(col_n) = a * EMA(col_{n-1}) + (1 - a) * col_n

You can also control how the EMA function handles different forms of bad input data, including null values, null timestamps, negative timestamps, NaNs and $\Delta t = 0$ using the EmaControl object.

See the Javadoc for more details.

UpdateByClause.ema("Timestamp",
        30 * MINUTE,
        EmaControl.get().onNullValue(BadDataBehavior.Reset),
        "Price")

from deephaven import UpdateBy as ubc
from deephaven.UpdateBy import EmaControl as emac

ubc.ema("Timestamp", \
        30 * MINUTE, \
        emac.get().onNullValue(BadDataBehavior.Reset), \
        "Price")