An overview of query strings

This page provides a succinct overview of query strings in Deephaven. Query strings are how you interact with Deephaven's query engine. They are used in almost every query written, so a conceptual understanding of the Deephaven Query Language (DQL) and query strings is essential for writing correct, performant, and effective queries. The guides are organized to help you learn everything you need to know about query strings and the DQL. This page provides a high-level overview of each item in the sidebar. For a more in-depth explanation of each topic, refer to the corresponding page.

What is a query string?

A query string is a string that gets passed into a table operation as input. More specifically, a query string gets passed to certain table operations such as select, where, and more. A single query string is used to either create a new column or filter data in a table.

Query strings utilize the "Deephaven Query Language," or DQL for short. DQL is simple and natural to use. It combines features of Java and Python to allow complex and powerful queries. Don't worry; you don't need to be proficient in Java to write query strings. Query strings can use Python variables, functions, and classes as well as Java variables, functions, and classes.

The following code block provides an example of a simple query string that creates a new column called NewColumn in a table called source. The string passed into update is the query string. The query string NewColumn = 1 defines a formula, where the left-hand side (LHS) is the name of the column to be created and the right-hand side (RHS) defines the values that will be assigned to that column. In this case, each value in NewColumn is 1.

from deephaven import empty_table

source = empty_table(10).update("NewColumn = 1")

Query string examples

A query string defines a formula or a filter:

Query strings can contain a wide array of expressions. Different types of expressions are explored via examples in the following subsections.

Formulas

Formulas contain assignments. These assignments usually come in the form of LHS = RHS, where the left-hand-side (LHS) contains the name of the new column that will be created, and the right-hand-side (RHS) defines the values that will be assigned to that column.

The following example creates a table with five columns. Each column is created via a formula:

from deephaven import empty_table

source = empty_table(20).update(
    [
        "X = 0.2 * ii",
        "Y = sin(X)",
        "Z = (ii % 2 == 0) ? `Even` : `Odd`",
        "RandomNumber = randomDouble(-1, 1)",
        "Timestamp = now() + ii * SECOND",
    ]
)

The formulas used above all leverage query language built-ins including built-in constants, built-in variables, and built-in functions.

Filters

Filters evaluate to True/False and are used to determine which rows are included in the result table. Consider the following example, which uses the table created in the previous section:

result_lessthan = source.where("X <= 1.2")
result_even = source.where("Z == `Even`")
result_rng = source.where("RandomNumber >= 0.21")
result_conjunctive = source.where(["X <= 2.4", "Z == `Even`"])
result_disjunctive = source.where("X > 1.6 || Z == `Even`")

For more on filtering table data, see Use filters.

Literals

A literal is a fixed value that's used in a query string. The previous section used numeric literals in some of the filter statements. The value is defined in the query string itself, not outside of it and then passed in as a parameter. The following example creates a table with several columns, each containing a literal value. The table metadata is included to show the resultant column data types.

from deephaven import empty_table

literals = empty_table(1).update(
    [
        "BooleanLiteral = true",  # Boolean literal - note the lowercase `true`
        "IntLiteral = 42",  # 32-bit integer literal
        "LongLiteral = 42L",  # 64-bit integer literal
        "DoubleLiteral = 3.14",  # 64-bit floating point literal
        "StringLiteral = `Hello, world!`",  # String literal, enclosed in backticks (`)
        "DateTimeLiteral = '2023-01-01T00:00:00Z'",  # Date-time (Instant) literal, enclosed in single quotes (')
    ]
)
literals_meta = literals.meta_table

For more on literals, see any of the following links:

Query language built-ins

The query language offers many constants, variables, and functions that can be called with no additional import statements or setup. These built-ins are generally the most performant way to accomplish tasks in a query string. There are too many built-ins to list here; the following query uses just a few of them:

from deephaven import empty_table

source = empty_table(10).update(
    [
        "X = 0.1 * ii",
        "Y = sin(X)",
        "NoisyY = Y + randomDouble(-0.2, 0.2)",
        "MaxLong = MAX_LONG",
        "NullDouble = NULL_DOUBLE",
        "Letter = (ii % 2 == 0) ? `A` : `B`",
        "Timestamp = now() + ii * SECOND",
    ]
)

The above example uses the following built-ins:

  • ii: A built-in variable representing the current row number as a 64-bit integer.
  • sin: The sine function.
  • randomDouble: A function that generates a random double floating-point precision value between the two input arguments.
  • MAX_LONG: A constant that equals the maximum value of a Java primitive long.
  • NULL_DOUBLE: A constant that equals null for a Java primitive double.
  • now(): A function that returns the current date-time as an Instant.

For more on query language built-in constants, variables, and functions, see:

Python

Deephaven's Python-Java interoperability allows you to use Python code in query strings. This powerful feature is facilitated by jpy, a bidirectional Python-Java bridge usable from both languages.

Caution

Care should be taken when calling Python in query strings for performance reasons.

The following example uses a Python variable, function, and class in a query string.

from deephaven import empty_table
from random import choice
import numpy as np

names_list = ["James", "Jessica", "Albert", "Ophelia", "Sophia", "Mark"]
python_variable = 3


def rand_name() -> str:
    return choice(names_list)


class MyClass:
    a = 1
    my_value = 3.14

    def __init__(self, b):
        self.b = b

    @classmethod
    def change_value(cls, new_value) -> np.intc:
        MyClass.my_value = new_value
        return new_value


my_class = MyClass(2)

source = empty_table(10).update(
    [
        "PythonVariable = python_variable",
        "Name = rand_name()",
        "X = MyClass.a",
        "Y = my_class.b",
        "Z = (int)MyClass.change_value(5)",
    ]
)

Arrays

Columns can contain array data. These array columns are most commonly created with the group_by table operation. Deephaven supports the manipulation of array columns, and provides a suite of built-in array methods for working with them.

The following example creates an array column, then uses some built-in array methods to access elements and subsets of elements in the array.

from deephaven import empty_table

source = empty_table(10).update("X = ii").group_by()
result = source.update(
    [
        "SubsetX = X.subVector(3, 8)",
        "StringX = X.toString(10)",
        "FifthElement = X.get(5)",
    ]
)
result_meta = result.meta_table

Additionally, all columns in tables are backed by arrays. You can leverage this with the underscore operator (_). For example, the following code grabs previous and next elements from a column:

Caution

The special row variable ii is not safe in ticking tables.

from deephaven import empty_table

source = empty_table(10).update("X = ii")
result = source.update(
    [
        "PreviousElement = X_[ii - 1]",
        "NextElement = X_[ii + 1]",
    ]
)
result_meta = result.meta_table

For full coverage on arrays in tables, see Arrays in Deephaven.