Work with arrays

This guide will show you how to work with arrays in your query strings.

When performing complex analyses, arrays are an invaluable tool. Arrays group related data together and provide an easy way to access offset data from a time series. Arrays are built into the Deephaven Query Language.

Create an array from a column

Every column in a table has an associated array variable, which can be accessed by adding an underscore after the column name. For example, a column called X can be accessed as an array by using the column name X_.

In the following example, the data in column X can be accessed through the X_ array.

from deephaven import empty_table

source = empty_table(10).update(formulas=["X = ii"])
result = source.update(formulas=["A = X_"])

Let's walk through the query step-by-step.

First, we create a simple table with ten rows, each populated with a numeric value ranging from 0 to 9.
- Variable ii is a special, built-in variable that contains the row number and is useful for accessing arrays.
- As a consequence, each row in column X contains a value that indicates its index within the column. For example, the first row has a value of 0 while the fifth row has a value of 4.
Next, we create a new column A, using the data from the first column accessed as an array. Since A accessed as an array contains its values from each row, this puts a 10-element array in each row for column B.

Warning

The special variables, i and ii, are unreliable within a ticking table. Inconsistent results occur since previously created row indexes do not automatically update.

Access array elements

You can use bracket [] syntax to access elements in the array at specific indexes for your queries. The special variables i and ii can be useful for indexing based on row number.

In the following example, we access various elements from the data in column X to add columns to the result table.

from deephaven import empty_table

source = empty_table(10).update(formulas=["X = ii"])
result = source.update(formulas=["A = X_[ii - 1]", "B = X_[ii + 1]", "C = X_.size()"])

Column X is referred to as X_ to access it as an array, and then the values of specific rows within the column are accessed by using X_[ii-1] and X_[ii+1].
Columns A and B are created as offset values of column X by using this array access.
Column C contains the size of column X, defined by the X_.size() function.
Variable ii is a special, built-in variable that contains the row number and is useful for accessing arrays.
When you index an out-of-bounds element in the array, you get a null result.

Create arrays by grouping

Arrays can also be created using the group_by method to group data.

from deephaven import empty_table

source = empty_table(10).update(formulas=["X = (ii % 2 == 0) ? `A` : `B` ", "Y = ii"])

result = source.group_by(by=["X"])

The source table contains two columns:
- X with alternating values of A and B in each row,
- Y containing each row's index.
The result table uses the group_by operation to group based on the values in column X. The resulting grouped column contains arrays of values.

Access specific array elements

In the following example, we start by creating the result table from the previous example, then access specific array indexes from the Y column.

from deephaven import empty_table

result = (
    empty_table(10)
    .update(formulas=["X = (ii % 2 == 0) ? `A` : `B` ", "Y = ii"])
    .group_by(by=["X"])
)
indexing_result = result.update(formulas=["Element2 = Y[2]", "Element3 = Y[3]"])

Slice arrays

You can grab slices of an array by using the Array.subVector(fromIndexInclusive, toIndexExclusive) method. fromIndexInclusive and toIndexExclusive are both longs showing where the slice starts (inclusive) and ends (exclusive).

In the following example, we can make subarrays and also access specific elements from the subarrays.

from deephaven import empty_table

result = (
    empty_table(10)
    .update(formulas=["X = (ii % 2 == 0) ? `A` : `B` ", "Y = ii"])
    .group_by(by=["X"])
)
slice = result.update(
    formulas=[
        "SubArray = Y.subVector(2, 4)",
        "SubSlice = SubArray[1]",
    ]
)

Use aggregations on arrays

You can use aggregation functions on arrays. The following example uses the sum and avg functions on a column containing arrays.

from deephaven import empty_table

result = (
    empty_table(10)
    .update(formulas=["X = (ii % 2 == 0) ? `A` : `B` ", "Y = ii"])
    .group_by(by=["X"])
)
sum_result = result.update(formulas=["ArraySum = sum(Y)", "ArrayAvg = avg(Y)"])

Get array length

The len method returns the length of the given input. This is useful in query strings where a user needs to get the size of a Vector or a Java array.

from deephaven import empty_table

source = empty_table(10).update(formulas=["X = i"]).group_by()
result = source.update("LenX = len(X)")

Use the power of arrays in the Deephaven Query Language to make your queries more powerful and concise.