Exponential moving standard deviation by group with time as the decay unit
uby_emstd_time.Rd
Creates an exponential moving standard deviation (EMSTD) UpdateByOp for each column in cols, using time as the decay unit.
Arguments
- ts_col
String denoting the column to use as the timestamp.
- decay_time
ISO-8601-formatted duration string specifying the decay rate.
- cols
String or list of strings denoting the column(s) to operate on. Can be renaming expressions, i.e. “new_col = col”. Default is to compute the exponential moving standard deviation for all non-grouping columns.
- operation_control
OperationControl that defines how special cases will behave. See ?op_control for more information.
Details
The formula used is
$$a_i = e^{\frac{-dt_i}{\tau}}$$
$$s^2_0 = 0$$
$$s^2_i = a_i*(s^2_{i-1} + (1-a_i)*(x_i - \bar{x}_{i-1})^2)$$
$$s_i = \sqrt{s^2_i}$$
Where:
\(dt_i\) is the difference between time \(t_i\) and \(t_{i-1}\) in nanoseconds.
\(\tau\) is decay_time in nanoseconds, an input parameter to the method.
\(\bar{x}_i\) is the exponential moving average of column \(X\) at step \(i\).
\(s_i\) is the exponential moving standard deviation of column \(X\) at time step \(i\).
\(x_i\) is the current value.
\(i\) denotes the time step, ranging from \(i=1\) to \(i = n-1\), where \(n\) is the number of elements in \(X\).
Note that in the above formula, \(s^2_0 = 0\) yields the correct results for subsequent calculations. However, sample variance for fewer than two data points is undefined, so the first element of an EMSTD calculation will always be NaN.
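The recurrence above can be traced by hand in plain R. The following is a minimal sketch, not the server-side implementation: the function name emstd_time_sketch and its arguments are hypothetical, and it assumes the running mean is updated as \(\bar{x}_i = a_i \bar{x}_{i-1} + (1-a_i) x_i\), which this page does not state.

```r
# Plain-R sketch of the EMSTD recurrence; names here are hypothetical.
# Assumption: the running mean follows xbar_i = a_i * xbar_{i-1} + (1 - a_i) * x_i.
emstd_time_sketch <- function(t_nanos, x, tau_nanos) {
  n <- length(x)
  if (n == 0) {
    return(numeric(0))
  }
  s2 <- numeric(n) # running variance; s2[1] plays the role of s^2_0 = 0
  xbar <- x[1]     # running mean, seeded with the first value
  for (i in seq_len(n)[-1]) {
    # decay factor from the elapsed time between consecutive rows
    a <- exp(-(t_nanos[i] - t_nanos[i - 1]) / tau_nanos)
    s2[i] <- a * (s2[i - 1] + (1 - a) * (x[i] - xbar)^2)
    xbar <- a * xbar + (1 - a) * x[i]
  }
  s <- sqrt(s2)
  s[1] <- NaN      # fewer than two data points, so the first element is NaN
  s
}

# three rows one second apart, with a 10-second decay time (all in nanoseconds)
t_nanos <- c(0, 1e9, 2e9)
x <- c(1, 3, 2)
emstd_sketch_out <- emstd_time_sketch(t_nanos, x, tau_nanos = 10e9)
```

With rows one second apart and a 10-second decay time, each step uses \(a_i = e^{-0.1}\), so older rows lose roughly 10% of their weight per second.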
This function acts on aggregation groups specified with the by parameter of the update_by() caller function. The aggregation groups are defined by the unique combinations of values in the by columns. For example, if by = c("A", "B"), then the aggregation groups are defined by the unique combinations of values in the A and B columns.
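To make the grouping rule concrete, here is a small base-R illustration on a local data frame. It is only a sketch of how unique combinations form groups; the data and the names example_df, groups, and by_group are made up, and this is not how the Deephaven server partitions tables.

```r
# hypothetical data, only to illustrate how groups are formed
example_df <- data.frame(
  A = c("x", "x", "y", "y", "x"),
  B = c(1, 2, 1, 1, 1),
  value = c(10, 20, 30, 40, 50)
)

# each distinct (A, B) pair is one aggregation group
groups <- unique(example_df[c("A", "B")])

# the values that fall into each group; unused combinations are dropped
by_group <- split(example_df$value, list(example_df$A, example_df$B), drop = TRUE)
```

An operation applied with by = c("A", "B") would then run separately over the values in each of these groups.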
This function, like other Deephaven uby functions, is a generator function. That is, its output is another function called an UpdateByOp intended to be used in a call to update_by(). This detail is typically hidden from the user. However, it is important to understand this detail for debugging purposes, as the output of a uby function can otherwise seem unexpected.
For more information, see the vignette on uby functions by running vignette("update_by").
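The generator pattern can be mimicked with ordinary R closures. The sketch below is purely illustrative: make_cumsum_op and apply_op are hypothetical stand-ins for a uby function and for update_by(), and they say nothing about how rdeephaven actually represents an UpdateByOp.

```r
# hypothetical stand-in for a uby function: it returns an operation, not a result
make_cumsum_op <- function(col) {
  function(data) {
    data[[paste0(col, "Cumsum")]] <- cumsum(data[[col]])
    data
  }
}

# hypothetical stand-in for update_by(): applies the operation to the data
apply_op <- function(data, op) op(data)

op <- make_cumsum_op("col1") # nothing is computed yet
result <- apply_op(data.frame(col1 = c(1, 2, 3)), op)
```

Printing op on its own shows a function rather than a table, which is the kind of output that can seem unexpected when a uby function is called outside update_by().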
Examples
if (FALSE) { # \dontrun{
library(rdeephaven)

# connecting to Deephaven server
client <- Client$new("localhost:10000", auth_type = "psk", auth_token = "my_secret_token")

# create data frame, push to server, retrieve TableHandle
df <- data.frame(
  timeCol = seq.POSIXt(as.POSIXct(Sys.Date()), as.POSIXct(Sys.Date() + 0.01), by = "1 sec")[1:500],
  boolCol = sample(c(TRUE, FALSE), 500, TRUE),
  col1 = sample(10000, size = 500, replace = TRUE),
  col2 = sample(10000, size = 500, replace = TRUE),
  col3 = 1:500
)
th <- client$import_table(df)

# compute 10-second exponential moving standard deviation of col1 and col2
th1 <- th$
  update_by(uby_emstd_time(ts_col = "timeCol", decay_time = "PT10s", cols = c("col1Emstd = col1", "col2Emstd = col2")))

# compute 5-second exponential moving standard deviation of col1 and col2, grouped by boolCol
th2 <- th$
  update_by(uby_emstd_time(ts_col = "timeCol", decay_time = "PT5s", cols = c("col1Emstd = col1", "col2Emstd = col2")), by = "boolCol")

# compute 20-second exponential moving standard deviation of col1 and col2, grouped by boolCol and parity of col3
th3 <- th$
  update("col3Parity = col3 % 2")$
  update_by(uby_emstd_time(ts_col = "timeCol", decay_time = "PT20s", cols = c("col1Emstd = col1", "col2Emstd = col2")), by = c("boolCol", "col3Parity"))

client$close()
} # }