How to programmatically generate query strings with Python
The Deephaven Query Language allows users to write very powerful queries to filter and modify tables of data. As an example, consider this case that uses a formula to compute a new column and another formula to filter the resulting table.
from deephaven import new_table
from deephaven.column import int_col
source = new_table([int_col("Value", [0, 1, 2, 3, 4, 5, 6])])
result = source.update("X = sqrt(Value) + i").where("2 < X && X < 8")
- result
- source
At their heart, Deephaven query strings are just Python strings. As such, all of the power of Python can be used to generate query strings. This can be convenient when working with complex queries. Let's work though a few examples that are simplified by using Python to generate query strings.
This guide assumes you are familiar with the use of strings, f-strings, loops, and list comprehension in Python. If not, please refer to the Python documentation for more information.
Many columns
In practice, queries may have a large number of inputs, making it inconvenient to type in each column name. Other times, the input column names are determined by user inputs and are not known when the query is written. Both of these situations can be addressed by using a list of column names to generate queries.
In the following example, an f-string and str.join
are used to create a query string to sum up all of the columns and then take the square root.
from deephaven import new_table
from deephaven.column import int_col
cols = ["A", "B", "C", "D"]
source = new_table([int_col(c, [0, 1, 2, 3, 4, 5, 6]) for c in cols])
result = source.update(f"X = sqrt(sum({','.join(cols)}))")
- result
- source
If the list of columns changes, the query string programatically adapts:
from deephaven import new_table
from deephaven.column import int_col
cols = ["A", "B", "C", "D", "E"]
source = new_table([int_col(c, [0, 1, 2, 3, 4, 5, 6]) for c in cols])
result = source.update(f"X = sqrt(sum({','.join(cols)}))")
- result
- source
Repeated logic
Some queries repeat the same logic -- with minor tweaks. For example, a query may add columns containing data from 1, 5, and 10 minutes ago. Generated query strings can also help simplify these situations.
In the following example, an f-string is used to create columns of data from 1, 5, and 10 rows before.
from deephaven import empty_table
source = empty_table(100).update("X = ii")
offsets = [1, 5, 10]
result = source
for offset in offsets:
result = result.update(f"X{offset} = X_[ii-{offset}]")
- result
- source
This can be simplified further by using a list comprehension.
from deephaven import empty_table
source = empty_table(100).update("X = ii")
offsets = [1, 5, 10]
result = source.update([f"X{offset} = X_[ii-{offset}]" for offset in offsets])
- result
- source
Be creative!
Programatically generating query strings works for all Deephaven operations, not just update
. For example, this case uses multiple programatically generated query strings while performing a join.
from deephaven import empty_table
source = empty_table(100).update(["X = ii", "Y = X", "Z = sqrt(X)"])
offsets = [1, 5, 10]
result = source
for offset in offsets:
result = result.natural_join(
source.update(f"XOffset = X+{offset}"),
on="X=XOffset",
joins=[f"Y{offset}=Y", f"Z{offset}=Z"],
)
- result
- source