Deephaven's Query Strings

Deephaven query strings are the primary way of expressing commands directly to the Deephaven engine. They translate the user's intention into compiled code that the engine can execute. These query strings can contain a mix of Java and Python code and are the entry point to a universe of powerful built-in tools and Python-Java interoperability.

Syntax

Query strings are just Python strings that get passed into table operations. Deephaven highly recommends using double quotes to encapsulate query strings.
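A minimal sketch of this pattern, assuming a running Deephaven Python session. The helper name build_table is hypothetical, and the import sits inside it so the snippet can also be loaded where Deephaven is not installed.

```python
# The query string is an ordinary Python string, written with double quotes.
QUERY = "NewColumn = 1"


def build_table():
    """Create a 5-row table and add a column defined by the query string."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(5).update(QUERY)
```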

The query string NewColumn = 1, for example, defines a formula for the engine to execute. The update table operation treats that formula as a recipe for creating a new column called NewColumn that contains the value 1 in every row.

Literals

Query strings often use literals. How the engine interprets a literal depends on how it's written in the query string.

  • Literals not encapsulated by any special characters are interpreted as booleans, numeric values, column names, or variables.
  • Literals encapsulated in backticks (`) are interpreted as strings.
  • Literals encapsulated in single quotes (') are interpreted as date-time values.

The meta_table attribute is useful for assessing a table's schema. You can use it to confirm that the resulting columns are of the correct type.
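The sketch below puts one literal of each kind into a formula and returns the resulting schema through meta_table. It assumes a Deephaven Python session; the column names and the helper literal_schema are illustrative only.

```python
# Each formula uses a different kind of literal on its right-hand side.
LITERAL_FORMULAS = [
    "BooleanCol = true",  # undelimited: boolean
    "IntCol = 42",  # undelimited: numeric value
    "CopyCol = IntCol",  # undelimited: column name
    "StringCol = `Hello`",  # backticks: string
    "InstantCol = '2024-01-01T00:00:00 ET'",  # single quotes: date-time
]


def literal_schema():
    """Return the schema of a one-row table built from the formulas above."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(1).update(LITERAL_FORMULAS).meta_table
```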

Special variables and constants

Deephaven provides several special variables and constants. The most commonly used are i and ii, which represent the row index as a 32-bit integer (int) and a 64-bit integer (long), respectively.
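As a short illustration, assuming a Deephaven Python session, the following formulas copy both row-index variables into columns. The helper index_table is a hypothetical name.

```python
INDEX_FORMULAS = [
    "IntIndex = i",  # row index as a 32-bit int
    "LongIndex = ii",  # row index as a 64-bit long
]


def index_table():
    """Build a static 5-row table with both row-index columns."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(5).update(INDEX_FORMULAS)
```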

NOTE: The special variables i and ii can only be used in static and append-only tables.

Additionally, Deephaven provides a range of common constants that can be accessed from query strings. These constants are always written in upper snake case, such as NULL_DOUBLE. They include minimum and maximum values for various data types, conversion factors for time types, and more. Of particular interest are the null constants for primitive types.

The null constants are useful for representing and handling null values of a specific type. Built-in query language functions handle null values; for example, sqrt(NULL_DOUBLE) returns NULL_DOUBLE. Custom functions must handle null values explicitly.
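A hedged sketch of null handling, assuming a Deephaven Python session. The column names and the helper null_table are illustrative; isNull is used here as one of the built-in null checks.

```python
NULL_FORMULAS = [
    # Even rows hold the double null constant; odd rows hold the row index.
    "X = (i % 2 == 0) ? NULL_DOUBLE : (double) i",
    # Built-in functions pass nulls through: sqrt(NULL_DOUBLE) is NULL_DOUBLE.
    "Root = sqrt(X)",
    # Flag the rows where X is null.
    "XIsNull = isNull(X)",
]


def null_table():
    """Build a table whose even rows contain null doubles."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(6).update(NULL_FORMULAS)
```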

Common operations

Numeric types support mathematical operations such as +, -, *, /, and %.

String concatenation is also supported via the + operator.

The + and - operators are defined for date-time types, making arithmetic on timestamp data easy.
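The following sketch shows date-time arithmetic, assuming a Deephaven Python session. MINUTE is one of the built-in time constants; the column names and the helper time_table are hypothetical, and the unit noted for the subtraction result is an assumption.

```python
TIME_FORMULAS = [
    # Date-time literal plus a number of nanoseconds.
    "Timestamp = '2024-01-01T00:00:00 ET' + i * MINUTE",
    # Date-time plus a duration literal.
    "OneHourLater = Timestamp + 'PT1H'",
    # Date-time minus date-time; assumed to yield the gap in nanoseconds.
    "ElapsedNanos = OneHourLater - Timestamp",
]


def time_table():
    """Build a table of timestamps one minute apart, plus derived columns."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(5).update(TIME_FORMULAS)
```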

Logical and comparison operators are also supported, and they can be combined into larger expressions.

Deephaven provides an inline conditional operator (ternary-if) that makes writing conditional expressions compact and efficient.
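A small sketch of the ternary-if, assuming a Deephaven Python session. The column names and the helper parity_table are illustrative.

```python
TERNARY_FORMULAS = [
    "X = i",
    # condition ? value-if-true : value-if-false
    "Parity = (X % 2 == 0) ? `even` : `odd`",
]


def parity_table():
    """Label each row index as even or odd with an inline conditional."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(6).update(TERNARY_FORMULAS)
```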

Typecasts use Java-style syntax: the target type goes in parentheses before the expression.
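For instance, the sketch below casts a double column to int and then to long. It assumes a Deephaven Python session; the column names and the helper cast_table are hypothetical.

```python
CAST_FORMULAS = [
    "X = i * 1.5",  # int times double produces a double column
    "AsInt = (int) X",  # cast the double to an int
    "AsLong = (long) AsInt",  # cast the int to a long
]


def cast_table():
    """Build a table showing the same values under three numeric types."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(5).update(CAST_FORMULAS)
```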

There are many more such operators supported in Deephaven. See the guide on operators to learn more.

Built-in functions

Aside from the common operations, Deephaven provides a large library of functions, known as built-in or auto-imported functions, that can be used in query strings.

The numeric subset of this library is full of functions that perform common mathematical operations on numeric types. These include exponentials, trigonometric functions, random number generators, and more.

These functions can be combined with previously discussed literals and operators to generate complex expressions.
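A sketch that combines numeric built-ins with literals and operators, assuming a Deephaven Python session. The column names and the helper math_table are illustrative.

```python
MATH_FORMULAS = [
    "X = 0.1 * i",
    "Root = sqrt(X)",
    "Wave = sin(X) + cos(X)",
    "Growth = exp(X)",
    "Noise = randomDouble(0.0, 1.0)",  # random value between the two bounds
    # Built-ins, comparison, and the ternary-if in a single expression.
    "Combined = (Root > 0.5) ? Wave * Growth : Noise",
]


def math_table():
    """Build a table of derived numeric columns."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(10).update(MATH_FORMULAS)
```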

The built-in library also contains many functions for working with date-time values.

Functions that begin with parse are useful for converting strings to date-time types.

upperBin and lowerBin bin timestamps into buckets. They are particularly useful in aggregation operations, as aggregated statistics are commonly computed over temporal buckets.
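The sketch below bins timestamps into five-minute buckets and averages a value per bucket. It assumes a Deephaven Python session; the column names and the helper binned_average are hypothetical.

```python
BIN_FORMULAS = [
    "Timestamp = '2024-01-01T00:00:00 ET' + i * MINUTE",
    "X = i",
    # Start and end of the 5-minute bucket containing each timestamp.
    "BinStart = lowerBin(Timestamp, 5 * MINUTE)",
    "BinEnd = upperBin(Timestamp, 5 * MINUTE)",
]


def binned_average():
    """Average X within each 5-minute bucket."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import agg, empty_table

    source = empty_table(20).update(BIN_FORMULAS)
    return source.agg_by([agg.avg("AvgX = X")], by=["BinStart"])
```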

The time user guide provides a comprehensive overview of working with date-time data in Deephaven.

These functions provide only a glimpse of what the built-in library offers. There are modules for sorting, searching, string parsing, null handling, and much more. See the document on auto-imported functions for a comprehensive list of what's available or the module summary page for a high-level overview of what's offered.

Java methods

The data structures that underlie Deephaven tables are Java data structures, so many of the literals, operators, and functions discussed so far are, at some level, Java. Java objects have methods that can be called from query strings, which gives Deephaven users additional functionality and efficiency.

To discover these, use meta_table to inspect a column's underlying data type.

A date-time column, for example, is a Java Instant. Java's documentation lists all of the methods that can be called on it. Here are just a few.
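A sketch of calling java.time.Instant methods from query strings, assuming a Deephaven Python session. The column names and the helper instant_tables are illustrative.

```python
INSTANT_FORMULAS = [
    "Timestamp = '2024-01-01T00:00:00 ET' + i * SECOND",
    "EpochSeconds = Timestamp.getEpochSecond()",
    "EpochMillis = Timestamp.toEpochMilli()",
    "NanoOfSecond = Timestamp.getNano()",
    "NextDay = Timestamp.plusSeconds(86400)",
]


def instant_tables():
    """Return the table and its schema, which shows each column's data type."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    t = empty_table(3).update(INSTANT_FORMULAS)
    return t, t.meta_table
```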

Some basic understanding of how to read Javadocs will help you make the most of these built-in methods.

Additionally, there are several ways to create Java objects for use in query strings. The following example uses (1) the new keyword and (2) the Python jpy package to create new instances of Java's URL class.
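A sketch of both approaches, assuming a Deephaven Python session where jpy is available. The fully qualified class name in the first formula, the variable name deephaven_url, and the helper url_table are assumptions made for illustration.

```python
URL_FORMULAS = [
    # (1) Construct the object inside the query string with the new keyword.
    "FromNew = new java.net.URL(`https://deephaven.io`)",
    # (2) Reference an object that was created in Python with jpy.
    "FromJpy = deephaven_url",
    # Java methods can then be called on the resulting column.
    "Host = FromNew.getHost()",
]


def url_table():
    """Build a one-row table holding two java.net.URL instances."""
    # Imported lazily; both imports require a Deephaven session.
    import jpy
    from deephaven import empty_table

    URL = jpy.get_type("java.net.URL")
    # Referenced by name from the second formula above.
    deephaven_url = URL("https://deephaven.io")
    return empty_table(1).update(URL_FORMULAS)
```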

For more information, see the jpy guide.

Arrays

Deephaven tables can have array columns. Array columns often come from a group_by operation.

Many built-in functions support array arguments.

These results can then be ungrouped with ungroup, which is essentially the inverse of group_by.
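The round trip described above might look like the following sketch, which assumes a Deephaven Python session. The column names and the helper grouped_tables are hypothetical.

```python
# Built-in functions applied to an array column.
ARRAY_FORMULAS = [
    "SumX = sum(X)",
    "AvgX = avg(X)",
]


def grouped_tables():
    """Group rows by key, summarize the array column, then ungroup."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    source = empty_table(10).update(["Key = i % 2", "X = i"])
    grouped = source.group_by("Key")  # X becomes an array column
    summarized = grouped.update(ARRAY_FORMULAS)
    flat = summarized.ungroup()  # back to one row per element of X
    return grouped, summarized, flat
```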

NOTE: Aggregations done with deephaven.agg are more performant than those done with array functions.

Deephaven provides array indexing and slicing operations.
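A sketch of indexing and slicing an array column, assuming a Deephaven Python session. Slicing is shown with the subVector method; the column names and the helper sliced_table are illustrative.

```python
SLICE_FORMULAS = [
    "First = X[0]",  # index a single element
    "FirstTwo = X.subVector(0, 2)",  # slice: start inclusive, end exclusive
]


def sliced_table():
    """Group values into arrays, then index and slice them."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    grouped = empty_table(10).update(["Key = i % 2", "X = i"]).group_by("Key")
    return grouped.update(SLICE_FORMULAS)
```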

Lastly, the columns of a Deephaven table can be interpreted as arrays.
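A sketch of treating a column as an array by appending an underscore to its name, assuming a Deephaven Python session. The column names and the helper neighbor_table are hypothetical.

```python
NEIGHBOR_FORMULAS = [
    # X_ refers to the whole X column as an array, indexed by row.
    "Previous = X_[i - 1]",
    "Next = X_[i + 1]",
]


def neighbor_table():
    """Look up the previous and next value of X for every row."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    source = empty_table(5).update("X = i * 10")
    return source.update(NEIGHBOR_FORMULAS)
```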

This functionality is only supported for static and append-only ticking tables. See working with arrays for more information.

Python in query strings

Python objects can be used in query strings. Variables are the simplest case.

Python functions can also be used.
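A sketch of a Python variable and a Python function referenced by name from query strings. It assumes a Deephaven Python session; offset, double_it, and python_table are illustrative names.

```python
offset = 100


def double_it(x):
    """Plain Python function without type hints."""
    return 2 * x


PYTHON_FORMULAS = [
    "X = i",
    "Shifted = X + offset",  # Python variable
    "Doubled = double_it(X)",  # Python function
]


def python_table():
    """Build a table whose formulas reference the Python names above."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(5).update(PYTHON_FORMULAS)
```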

So can classes.

Without any type casts or type hints, the Deephaven query engine cannot infer what datatype results from a Python function. It stores the result as a Java PyObject or Object.

This isn't ideal, as neither of these data types supports many of the Deephaven Query Language (DQL) features covered so far. To rectify this, Python functions should use type hints. The engine infers the correct column types for functions that use type hints. Class methods don't support type hints yet, so a typecast in the query string is required.
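The sketch below contrasts an unhinted function, a hinted function, and a class method whose result is cast in the query string. It assumes a Deephaven Python session; every name in it is hypothetical.

```python
def add_untyped(a, b):
    """No type hints: the engine cannot infer the result type."""
    return a + b


def add_typed(a: int, b: int) -> int:
    """Type hints let the engine infer the result column type."""
    return a + b


class Scaler:
    @staticmethod
    def scale(x):
        return 3 * x


TYPED_FORMULAS = [
    "X = i",
    "Untyped = add_untyped(X, 1)",
    "Typed = add_typed(X, 1)",
    "Scaled = (int) Scaler.scale(X)",  # typecast supplies the type
]


def typed_table():
    """Return the table's schema to compare the inferred column types."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(5).update(TYPED_FORMULAS).meta_table
```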

To learn more about using Python in query strings, see Python in query strings.

Scoping in Deephaven follows Python's LEGB scoping rules. Functions that return tables or otherwise make use of query strings should pay careful attention to scoping details.
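As an illustration, the function below returns a table whose query string refers to the function's own parameter. This is a sketch that assumes a Deephaven Python session; scaled_table and factor are hypothetical names.

```python
def scaled_table(factor: int):
    """Return a table whose formula uses the local name `factor`."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    # `factor` lives in this function's local scope. The query string
    # refers to it by name, so it must be resolvable from here.
    return empty_table(5).update("Scaled = i * factor")
```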

For more information, see scoping rules.

Be mindful of whether Python functions are stateless or stateful. Generally, stateless functions have no side effects: they don't modify any objects outside of their scope. They are also invariant to execution order, so function calls can be evaluated in any order without affecting the result. This stateless function extracts elements from a list in a query string.
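A sketch of such a function, assuming a Deephaven Python session for the table itself. The list values and the helper stateless_table are illustrative.

```python
values = [10, 20, 30]


def get_element(idx: int) -> int:
    """Return values[idx] without modifying anything outside the function."""
    return values[idx]


def stateless_table():
    """Use the row index to pick one list element per row."""
    # Imported lazily; this call requires a Deephaven session.
    from deephaven import empty_table

    return empty_table(3).update("X = get_element(i)")
```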

get_element is stateless because it does not modify any objects outside its local scope. It could be evaluated in any order and give the same result.

Stateful functions modify objects outside their local scope; they do not leave the world as they found it. They may also depend on execution order. This stateful function produces the same table.

Print idx to verify it's been changed.

Now that get_element is stateful, it must be evaluated in the correct order to give the correct result.

Queries should use stateless functions whenever possible because:

  • They minimize side effects when called.
  • They are deterministic.
  • They can be efficiently parallelized.