Work with arrays
This guide will show you how to work with arrays in your query strings.
When performing complex analyses, arrays are an invaluable tool. Arrays group related data together and provide an easy way to access offset data from a time series. Arrays are built into the Deephaven Query Language.
Create an array from a column
Every column in a table has an associated array variable, which can be accessed by adding an underscore after the column name. For example, a column called X
can be accessed as an array by using the column name X_
.
In the following example, the data in column X
can be accessed through the X_
array.
from deephaven import empty_table
source = empty_table(10).update(formulas=["X = ii"])
result = source.update(formulas=["A = X_"])
- source
- result
Let's walk through the query step-by-step.
- First, we create a simple table with ten rows, each populated with a numeric value ranging from 0 to 9.
- Variable
ii
is a special, built-in variable that contains the row number and is useful for accessing arrays. - As a consequence, each row in column
X
contains a value that indicates its index within the column. For example, the first row has a value of 0 while the fifth row has a value of 4.
- Variable
- Next, we create a new column
A
, using the data from the first column accessed as an array. SinceA
accessed as an array contains its values from each row, this puts a 10-element array in each row for column B.
The special variables, i
and ii
, are unreliable within a ticking table. Inconsistent results occur since previously created row indexes do not automatically update.
Access array elements
You can use bracket []
syntax to access elements in the array at specific indexes for your queries. The special variables i
and ii
can be useful for indexing based on row number.
In the following example, we access various elements from the data in column X
to add columns to the result table.
from deephaven import empty_table
source = empty_table(10).update(formulas=["X = ii"])
result = source.update(formulas=["A = X_[ii - 1]", "B = X_[ii + 1]", "C = X_.size()"])
- source
- result
- Column
X
is referred to asX_
to access it as an array, and then the values of specific rows within the column are accessed by usingX_[ii-1]
andX_[ii+1]
. - Columns
A
andB
are created as offset values of columnX
by using this array access. - Column
C
contains the size of columnX
, defined by theX_.size()
function. - Variable
ii
is a special, built-in variable that contains the row number and is useful for accessing arrays. - When you index an out-of-bounds element in the array, you get a null result.
Create arrays by grouping
Arrays can also be created using the group_by
method to group data.
from deephaven import empty_table
source = empty_table(10).update(formulas=["X = (ii % 2 == 0) ? `A` : `B` ", "Y = ii"])
result = source.group_by(by=["X"])
- source
- result
- The source table contains two columns:
X
with alternating values ofA
andB
in each row,Y
containing each row's index.
- The result table uses the
group_by
operation to group based on the values in column X. The resulting grouped column contains arrays of values.
Access specific array elements
In the following example, we start by creating the result table from the previous example, then access specific array indexes from the Y
column.
from deephaven import empty_table
result = (
empty_table(10)
.update(formulas=["X = (ii % 2 == 0) ? `A` : `B` ", "Y = ii"])
.group_by(by=["X"])
)
indexing_result = result.update(formulas=["Element2 = Y[2]", "Element3 = Y[3]"])
- result
- indexing_result
Slice arrays
You can grab slices of an array by using the Array.subVector(fromIndexInclusive, toIndexExclusive)
method. fromIndexInclusive
and toIndexExclusive
are both longs showing where the slice starts (inclusive) and ends (exclusive).
In the following example, we can make subarrays and also access specific elements from the subarrays.
from deephaven import empty_table
result = (
empty_table(10)
.update(formulas=["X = (ii % 2 == 0) ? `A` : `B` ", "Y = ii"])
.group_by(by=["X"])
)
slice = result.update(
formulas=[
"SubArray = Y.subVector(2, 4)",
"SubSlice = SubArray[1]",
]
)
- result
- slice
Use aggregations on arrays
You can use aggregation functions on arrays. The following example uses the sum
and avg
functions on a column containing arrays.
from deephaven import empty_table
result = (
empty_table(10)
.update(formulas=["X = (ii % 2 == 0) ? `A` : `B` ", "Y = ii"])
.group_by(by=["X"])
)
sum_result = result.update(formulas=["ArraySum = sum(Y)", "ArrayAvg = avg(Y)"])
- result
- sum_result
Get array length
The len
method returns the length of the given input. This is useful in query strings where a user needs to get the size of a Vector
or a Java array.
from deephaven import empty_table
source = empty_table(10).update(formulas=["X = i"]).group_by()
result = source.update("LenX = len(X)")
- source
- result
Use the power of arrays in the Deephaven Query Language to make your queries more powerful and concise.