Master Groovy closures

Advanced Formulas and Groovy Closures

A Groovy closure is a function stored as a variable. Closures can be used to embed complex logic in formulas and filters in the Deephaven Query Language.

Defining a closure

myClosure = { String var1, int var2, double var3 ->
     /* code goes here */
}

For example, the following closure will return a string stating the values of the arguments ‘x’ and ‘str’:

myClosure = { int x, String str ->
    return "myClosure called with x=" + x + " and str='" + str + "'"
}

Invoking a closure

In Groovy code, a closure can be invoked like a regular method, or through the call method. The following two statements are equivalent:

println myClosure(12, "hello")
println myClosure.call(12, "hello")

Closures can also be used in the Deephaven Query Language:

myTable = newTable(intCol("IntCol", 12, 14), col("StrCol", "hello", "goodbye"))
myTable2 = myTable.update("Result = (String) myClosure.call(IntCol, StrCol)")

Note that the closure return value must be explicitly cast to the expected type. Since it is not possible to automatically determine the return type of a closure at compile time, a closure result will be typed as Object by default, which will reduce its usefulness later in the query.
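
One way to see why the cast is needed is the plain-Groovy sketch below, which runs outside of any query. The call method that every closure inherits from groovy.lang.Closure has a generic return type that erases to Object, so that is all a compiler can know in advance. The closure name greet is hypothetical, and the link to how the query compiler behaves is an assumption based on the note above.

```groovy
// Hypothetical closure used only for this illustration.
greet = { int x, String str ->
    return "x=" + x + ", str=" + str
}

// The declared (erased) return type of Closure.call(Object...) is Object,
// regardless of what the closure body actually returns.
declaredType = groovy.lang.Closure.class.getMethod("call", Object[].class).returnType
println declaredType

// At runtime the value is still a String, so an explicit cast succeeds.
result = (String) greet.call(12, "hello")
println result
```

The same reasoning applies to any other return type: the cast in the formula restores type information that the closure call itself cannot provide.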

Handling Nulls

Deephaven represents null data for primitive types (e.g., int, long, double) using sentinel values. Closures should be written to detect and handle these values. Nulls can be identified with the isNull() method (for both primitive and reference types), and null primitives can be returned by using the appropriate constant from com.illumon.util.QueryConstants. These constants (NULL_INT, NULL_LONG, NULL_DOUBLE, etc.) are automatically imported in Deephaven Groovy consoles.

A common practice is to return null if any of the closure arguments are null, as in the nullCheckingClosure example below.

nullCheckingClosure = { int arg1, double arg2, String arg3 ->
    if (isNull(arg1) || isNull(arg2) || isNull(arg3)) {
        return NULL_LONG
    }

    /* proceed to calculate and return a non-null long value */
}

The example closure myClosure from above can be changed to handle nulls by checking whether the integer argument x is null before converting it to a string:

myClosureNullCheck = { int x, String str ->
    String xStr = isNull(x) ? null : Integer.toString(x)
    return "myClosure called with x=" + xStr + " and str='" + str + "'"
}

The following code snippet demonstrates the difference between the original and null-safe versions of the closure:

myTableWithNull = newTable(intCol("IntCol", 12, 14, NULL_INT), col("StrCol", "hello", "goodbye", null))
myTableWithNull2 = myTableWithNull.update(
    "ResultNoNullCheck = (String) myClosure.call(IntCol, StrCol)",
    "ResultWithNullCheck = (String) myClosureNullCheck.call(IntCol, StrCol)"
)
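
The difference can also be sketched in plain Groovy, outside of Deephaven. The sketch assumes that the int null sentinel is Integer.MIN_VALUE. NULL_INT_STANDIN, isNullInt, noNullCheck, and withNullCheck are hypothetical stand-ins for NULL_INT, isNull(), and the two closures above.

```groovy
// ASSUMPTION: stand-in for Deephaven's NULL_INT so the sketch runs in plain Groovy.
NULL_INT_STANDIN = Integer.MIN_VALUE
isNullInt = { int v -> v == NULL_INT_STANDIN }   // hypothetical helper

// No null check: the raw sentinel value is formatted like any other int.
noNullCheck = { int x, String str ->
    return "x=" + x + " and str='" + str + "'"
}

// Null check: the sentinel is detected and reported as null.
withNullCheck = { int x, String str ->
    String xStr = isNullInt.call(x) ? null : Integer.toString(x)
    return "x=" + xStr + " and str='" + str + "'"
}

println noNullCheck.call(NULL_INT_STANDIN, null)
println withNullCheck.call(NULL_INT_STANDIN, null)
```

Under that assumption, the first line printed contains the number -2147483648, while the second reports x=null.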

Common Uses for Closures

Advanced logic in filters or formulas

Closures can be used to define some multi-step formulas that are difficult to implement in the query language. This can include parsing or cleaning data, complex calculations, and calls to external libraries.

For example, consider a dataset where one field has been overloaded to contain three values: a food item’s name, its category, and the number in inventory:

FoodInfoStr
apple_fruit:24
banana_fruit:8
onion_vegetable:5
chocolate_candy:15

foods = newTable(col("FoodInfoStr", "apple_fruit:24", "banana_fruit:8", "onion_vegetable:5", "chocolate_candy:15"))

If we are only interested in the middle term (fruit, vegetable, or candy), we can write a closure to extract it from the strings in the FoodInfoStr column:

parseCategory = { String str ->
    if (str == null) return null
    int startOfCategoryIdx = str.indexOf("_")
    int endOfCategoryIdx = str.lastIndexOf(":")
    if (startOfCategoryIdx < 0 || endOfCategoryIdx < 0) {
        return null
    }
    return str.substring(startOfCategoryIdx+1, endOfCategoryIdx)
}

Using the closure in an update statement, we can parse out the category into a new column:

foods2 = foods.update("Category = (String) parseCategory.call(FoodInfoStr)")

FoodInfoStr          Category
apple_fruit:24       fruit
banana_fruit:8       fruit
onion_vegetable:5    vegetable
chocolate_candy:15   candy
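
Because a parsing closure like this is ordinary Groovy, it can be tried on sample strings before it is used in a query. The sketch below is a variant under the hypothetical name parseCategorySafe. It adds one guard that is not in the original: it returns null when the colon appears before the underscore, a case in which substring() would otherwise throw.

```groovy
// Variant of the parsing closure with an extra ordering guard (an addition for illustration).
parseCategorySafe = { String str ->
    if (str == null) return null
    int start = str.indexOf("_")
    int end = str.lastIndexOf(":")
    // Covers a missing "_", a missing ":", and a ":" that precedes the "_".
    if (start < 0 || end <= start) return null
    return str.substring(start + 1, end)
}

samples = ["apple_fruit:24", "onion_vegetable:5", "no-delimiters", "odd:order_here"]
samples.each { s -> println s + " -> " + parseCategorySafe.call(s) }
```

The first two samples yield fruit and vegetable; the last two yield null rather than an exception.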

Closures also provide a way of accessing additional language features, such as switch statements, that are not available in query language expressions. The findIsHealthy closure below uses a switch statement to classify each category as healthy (true) or not (false), returning null for unrecognized categories:

findIsHealthy = { String category ->
    switch (category) {
        case "fruit":
        case "vegetable":
            return true;
        case "candy":
            return false;
        default:
            return null;
    }
}

foods3 = foods2.update("IsHealthy = (Boolean) findIsHealthy.call(Category)")

FoodInfoStr          Category    IsHealthy
apple_fruit:24       fruit       true
banana_fruit:8       fruit       true
onion_vegetable:5    vegetable   true
chocolate_candy:15   candy       false
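
Groovy's switch statement goes beyond Java's: a case can also be a list, in which case the value matches if it is a member of that list. The sketch below, under the hypothetical name findIsHealthyCompact, uses this to express the same classification without fall-through cases.

```groovy
// A list in a case label matches when the switch value is contained in the list.
findIsHealthyCompact = { String category ->
    switch (category) {
        case ["fruit", "vegetable"]:
            return true
        case "candy":
            return false
        default:
            return null   // unrecognized category
    }
}

["fruit", "vegetable", "candy", "grain"].each { c ->
    println c + " -> " + findIsHealthyCompact.call(c)
}
```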

It is also possible to parse multiple fields with a single closure.

parseCategoryAndQty = { String str ->
    if (str == null) return null
    int startOfCategoryIdx = str.indexOf("_")
    int endOfCategoryIdx = str.lastIndexOf(":")

    String category;
    Integer qty
    if (startOfCategoryIdx < 0 || endOfCategoryIdx < 0) {
        category = null
        qty = null
    } else {
        category = str.substring(startOfCategoryIdx+1, endOfCategoryIdx)
        try {
            qty = Integer.parseInt(str.substring(endOfCategoryIdx+1))
        } catch (Exception ex) {
            qty = null
        }
    }
    return [ category, qty ] as Object[] // return the results in an array
}

foods2 = foods
    .update("InfoParsed = (Object[]) parseCategoryAndQty.call(FoodInfoStr)")
    .updateView(
        "Category = (String) InfoParsed[0]",
        "Qty = (Integer) InfoParsed[1]"
    )

Iterating over a table with stateful processing

When evaluating a formula with the update() method, Deephaven will process the table rows sequentially. This allows closures to be used for calculations that require maintaining state while iterating over a table.

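The calcEma example below is a closure that calculates an exponential moving average (EMA) of the values in a table, giving equal weight (0.5) to the previous average and to each new value. Null values leave the running average unchanged.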

emaState = NULL_DOUBLE
calcEma = { double value ->
    if(!isNull(value)) {
        if(isNull(emaState)) {
            emaState = value
        } else {
            emaState = emaState * 0.5 + value * 0.5
        }
    }
    return emaState
}

emaTable = newTable(
    doubleCol("MyDouble", 5, 3, 3, 7, 1, NULL_DOUBLE, 7, 8, 9, 10)
).update("Ema = (double) calcEma.call(MyDouble)")
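
The same recurrence can be traced in plain Groovy to see which values to expect. This sketch assumes -Double.MAX_VALUE as a stand-in for NULL_DOUBLE. It also swaps in a different design, which is not the approach used above: a hypothetical factory closure, makeEma, keeps the running value in a captured local variable rather than in a global variable.

```groovy
// ASSUMPTION: stand-in for Deephaven's NULL_DOUBLE so this runs in plain Groovy.
NULL_DOUBLE_STANDIN = -Double.MAX_VALUE

// Hypothetical factory: each call returns an independent EMA closure whose
// running value lives in a local variable captured by that closure.
makeEma = { double weight ->
    double state = NULL_DOUBLE_STANDIN
    return { double value ->
        if (value != NULL_DOUBLE_STANDIN) {
            // The first non-null value seeds the average; later values are blended in.
            state = (state == NULL_DOUBLE_STANDIN) ? value : state * (1.0d - weight) + value * weight
        }
        return state
    }
}

ema = makeEma.call(0.5d)
input = [5d, 3d, 3d, 7d, 1d, NULL_DOUBLE_STANDIN, 7d, 8d, 9d, 10d]
emaValues = input.collect { v -> ema.call(v) }
println emaValues
```

With a weight of 0.5 this prints [5.0, 4.0, 3.5, 5.25, 3.125, 3.125, 5.0625, 6.53125, 7.765625, 8.8828125]. The null in the sixth position simply repeats the previous average.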

Array column processing

Closures can be useful for writing custom operations involving arrays, which can be used to supplement the dozens of built-in array operations.

The following closure will iterate through a DbIntArray and return the index of the minimum value within the array, or NULL_LONG if the array is null, empty, or contains only nulls.

import com.illumon.iris.db.tables.dbarrays.DbIntArray

findIndexOfMin = { DbIntArray x ->
    if (x == null) return NULL_LONG // nothing to scan in a null array
    long xSize = x.size()
    long idxOfMin = NULL_LONG
    int oldMin = NULL_INT
    int ii
    for(ii = 0; ii < xSize; ii++) {
        int val = x.get(ii)
        if(!isNull(val) && (isNull(oldMin) || val < oldMin)) {
            oldMin = val
            idxOfMin = ii
        }
    }
    return idxOfMin
}

The code snippet below calls the closure on an example table to find the index of the minimum value within each group:

myTable = newTable(
    col("StrCol", "A", "A", "A", "B", "B", "B"),
    intCol("IntCol", 4, 0, 3, NULL_INT, 10, 1)
)
myTable2 = myTable
    .by("StrCol") // roll up data into arrays for each value of StrCol
    .update("IdxOfMinVal = (long) findIndexOfMin.call(IntCol)")
    .updateView("MinVal = IntCol[IdxOfMinVal]")

StrCol   IntCol
A        4
A        0
A        3
B        NULL_INT
B        10
B        1

StrCol   IntCol              IdxOfMinVal   MinVal
A        [4, 0, 3]           1             0
B        [NULL_INT, 10, 1]   2             1
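
The scan itself can be checked in plain Groovy by substituting an int[] for the DbIntArray. The sketch assumes Integer.MIN_VALUE and Long.MIN_VALUE as stand-ins for NULL_INT and NULL_LONG. The name findIndexOfMinSketch is hypothetical.

```groovy
// ASSUMPTION: stand-ins for Deephaven's NULL_INT and NULL_LONG sentinels.
NULL_INT_STANDIN = Integer.MIN_VALUE
NULL_LONG_STANDIN = Long.MIN_VALUE

// Same scan as the closure above, but over a plain int[] instead of a DbIntArray.
findIndexOfMinSketch = { int[] x ->
    if (x == null) return NULL_LONG_STANDIN
    long idxOfMin = NULL_LONG_STANDIN
    int oldMin = NULL_INT_STANDIN
    for (int ii = 0; ii < x.length; ii++) {
        int val = x[ii]
        boolean valIsNull = (val == NULL_INT_STANDIN)
        // Accept the first non-null value, then any strictly smaller one.
        if (!valIsNull && (oldMin == NULL_INT_STANDIN || val < oldMin)) {
            oldMin = val
            idxOfMin = ii
        }
    }
    return idxOfMin
}

println findIndexOfMinSketch.call([4, 0, 3] as int[])
println findIndexOfMinSketch.call([NULL_INT_STANDIN, 10, 1] as int[])
println findIndexOfMinSketch.call([] as int[])
```

The first two calls reproduce the IdxOfMinVal values for groups A and B (1 and 2). The empty array returns the long sentinel.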

Additional Concerns

Developers working with fast-ticking data or very large datasets should take care to ensure closures perform well and are thread-safe.

Performance

It is best to use static types in closures, as this allows Groovy to compile them into much faster code.

Consider the following closure, which declares arguments without explicit data types:

xPlusY = { x, y ->
    return x + y
}

Since Groovy cannot tell the data types of x and y, it must check what they are at runtime and what "plus" should mean for those two arguments. If x and y are numbers, it will add x and y; if x or y is a string, it will concatenate x and y. Checking the data types and determining how to handle them can be much slower than the addition or concatenation itself.

When the data types are known in advance, the ambiguity can be removed by explicitly specifying them, allowing the Groovy compiler to create more efficient code:

xPlusY = { String x, String y ->
    return x + y
}
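
The runtime dispatch described above can be observed directly in plain Groovy. In this sketch, plusUntyped and plusTyped are hypothetical names. It shows only the behavioral difference between the two closures and makes no timing claims.

```groovy
// Untyped: the meaning of "+" is resolved from the runtime argument types.
plusUntyped = { x, y -> x + y }
println plusUntyped.call(1, 2)      // numeric addition
println plusUntyped.call("1", 2)    // string concatenation

// Typed: only int arguments are accepted, so "+" can only mean integer addition.
plusTyped = { int x, int y -> x + y }
println plusTyped.call(1, 2)

// Passing a String to the typed closure fails instead of silently concatenating.
typedRejectedString = false
try {
    plusTyped.call("1", 2)
} catch (MissingMethodException e) {
    typedRejectedString = true
}
println typedRejectedString
```

Besides the performance benefit described above, explicit types surface mismatched arguments as errors rather than as unexpected results.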

Synchronization

Care must be taken when adding, modifying, or removing binding variables within a closure, as the binding is an unsynchronized global map. A closure that uses binding variables — such as the calcEma closure above — must only be called with update(), never updateView(), to ensure that it does not alter the binding without appropriate synchronization.

If a closure must mutate data that is modified or read outside the LiveTableMonitor lock, external synchronization must be used, such as with an AtomicInteger or an explicit synchronized block. (Note that implementing caches in closures is typically unnecessary, as caching of expensive calculations is possible with the lazyUpdate method.)
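
As a minimal sketch of the AtomicInteger approach, the plain-Groovy example below counts closure invocations from several threads. The names callCount and countingDouble are hypothetical, and the threads merely simulate concurrent callers. How Deephaven schedules closure calls is not modeled here.

```groovy
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical shared counter that may be updated from several threads.
callCount = new AtomicInteger(0)

// The closure touches shared state only through the atomic counter.
countingDouble = { int x ->
    callCount.incrementAndGet()
    return x * 2
}

// Simulate concurrent callers. Both variables above are defined before the
// threads start, and nothing is added to or removed from the binding while they run.
def workers = (1..4).collect {
    Thread.start { 1000.times { i -> countingDouble.call(i) } }
}
workers.each { it.join() }

println callCount.get()
```

Four threads making 1000 calls each leave the counter at 4000. A plain int incremented without synchronization could lose updates under the same load.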

Synchronization concerns are not relevant for closures that only access variables within the closure scope.