Skip to main content
Version: Python

Deephaven's Query Strings

Deephaven query strings are the primary way of expressing commands directly to the Deephaven engine. They translate the user's intention into compiled code that the engine can execute. These query strings can contain a mix of Java and Python code and are the entry point to a universe of powerful built-in tools and Python-Java interoperability.

Syntax

Query strings are just Python strings that get passed into table operations. Deephaven highly recommends using double quotes to encapsulate query strings.

from deephaven import empty_table

t = empty_table(10).update("NewColumn = 1")

Here, the query string NewColumn = 1 defines a formula for the engine to execute, and the update table operation understands that formula as a recipe for creating a new column called NewColumn that contains the value of 1 in each row.

Literals

Query strings often use literals. How the engine interprets a literal depends on how it's written in the query string.

  • Literals not encapsulated by any special characters are interpreted as booleans, numeric values, column names, or variables.
  • Literals encapsulated in backticks (`) are interpreted as strings.
  • Literals encapsulated in single quotes (') are interpreted as date-time values.
literals = empty_table(10).update(
[
# Booleans are represented by 'true' and 'false'
"BooleanCol = true",
# 32b integers are represented by integers such as '1'
"IntCol = 1",
# 64b integers are represented by integers with an 'L' suffix such as '10L'
"LongCol = 10L",
# 32b floating point numbers are represented by numbers with an 'f' suffix such as '1.2345f'
"FloatCol = 1.2345f",
# 64b floating point numbers are represented by numbers with a decimal point such as '2.3456'
"DoubleCol = 2.3456",
# Strings must be enclosed with backticks
"StringCol = `Hello!`",
# Date-time values must be enclosed with single quotes
"DateCol = '2020-01-01'",
"TimeCol = '12:30:45.000'",
"DateTimeCol = '2020-01-01T12:30:45.000Z'",
# Durations must be enclosed with single quotes and have a leading PT
"DurationCol = 'PT1m'",
# For object types, 'null' is the literal for null values
"NullCol = null",
]
)

The meta_table attribute is useful for assessing a table's schema. You can use it to confirm that the resulting columns are of the correct type.

literals_meta = literals.meta_table

Special variables and constants

Deephaven provides several special variables and constants. The most commonly used of these are i and ii. They represent the row indices as 32-bit and 64-bit integers, int and long, respectively.

special_vars = empty_table(10).update(["IdxInt = i", "IdxLong = ii"])

special_meta = special_vars.meta_table

NOTE: The special variables i and ii can only be used in append-only tables.

Additionally, Deephaven provides a range of common constants that can be accessed from query strings. These constants are always written with snake case in capital letters. They include minimum and maximum values for various data types, conversion factors for time types, and more. Of particular interest are the null constants for primitive types.

null_values = empty_table(1).update(
[
"StandardNull = null",
"BoolNull = NULL_BOOLEAN",
"CharNull = NULL_CHAR",
"ByteNull = NULL_BYTE",
"ShortNull = NULL_SHORT",
"IntNull = NULL_INT",
"LongNull = NULL_LONG",
"FloatNull = NULL_FLOAT",
"DoubleNull = NULL_DOUBLE",
]
)

null_values_meta = null_values.meta_table

These are useful for representing and handling null values of a specific type. Built-in query language functions handle null values. For example, sqrt(NULL_DOUBLE) returns NULL_DOUBLE. Custom functions need to handle null values appropriately.

Common operations

Numeric types support mathematical operations such as +, -, *, /, and %.

math_ops = empty_table(10).update(
[
"Col1 = ii",
# Math operations work between columns and literals
"Col2 = 10 * (Col1 + 1.2345) / 4",
# And they work between columns
"Col3 = (Col1 - Col2) % 6",
]
)

String concatenation is also supported via the + operator.

add_strings = empty_table(10).update("StringCol = `Hello ` + `there!`")

The + and - operators are defined for date-time types, making arithmetic on timestamp data easy.

time_ops = empty_table(10).update(
[
# Times in nanoseconds can be added or subtracted from date-times
"Timestamp = '2021-07-11T12:00:00.000Z' + (ii * HOUR)",
# Durations or Periods can be added or subtracted from date-times
"TimestampPlusOneYear = Timestamp + 'P365d'",
"TimestampMinusOneHour = Timestamp - 'PT1h'",
# Timestamps can be subtracted to get their difference in nanoseconds
# Use constants for unit conversion
"DaysBetween = ('2023-01-02T20:00:00.000Z' - Timestamp) / DAY",
]
)

Logical operations, expressions, and comparison operators are supported.

logical_ops = empty_table(10).update(
[
"IdxCol = ii",
"IsEven = IdxCol % 2 == 0",
"IsDivisibleBy6 = IsEven && (IdxCol % 3 == 0)",
"IsGreaterThan6 = IdxCol > 6",
]
)

Deephaven provides an inline conditional operator (ternary-if) that makes writing conditional expressions compact and efficient.

conditional = empty_table(10).update(
[
# Read as '<condition> ? <if-true> : <if-false>'
"Parity = ii % 2 == 0 ? `Even!` : `Odd...`",
# Any logical expression is a valid condition
"IsDivisibleBy6 = ((ii % 2 == 0) && (ii % 3 == 0)) ? true : false",
# In-line conditionals can be chained together
"RandomNumber = randomGaussian(0.0, 1.0)",
"Score = RandomNumber < -1.282 ? `Bottom 10%` : RandomNumber > 1.282 ? `Top 10%` : `Middle of the pack`",
]
)

In Deephaven, typecasting is easy.

typecasted = empty_table(10).update(
[
"LongCol = ii",
# Use (type) to cast a column to the desired type
"IntCol = (int) LongCol",
"FloatCol = (float) LongCol",
# Or, use a cast as an intermediate step
"IntSquare = i * (int) LongCol",
]
)

There are many more such operators supported in Deephaven. See the guide on operators to learn more.

Built-in functions

Aside from the common operations, Deephaven hosts a large library of functions known as built-in or auto-imported functions that can be used in query strings.

The numeric subset of this library is full of functions that perform common mathematical operations on numeric types. These include exponentials, trigonometric functions, random number generators, and more.

math_functions = empty_table(10).update(
[
# Each function can be applied to many different types
"Exponential = exp(ii)",
"NatLog = log(i)",
"Square = pow((float)ii, 2)",
"Sqrt = sqrt((double)i)",
"Trig = sin(ii)",
# Functions can be composed
"AbsVal = abs(cos(ii))",
"RandInt = randomInt(0, 5)",
"RandNormal = randomGaussian(0, 1)",
]
)

These functions can be combined with previously discussed literals and operators to generate complex expressions.

fake_data = empty_table(100).update(
[
"Group = randomInt(1, 4)",
"GroupIntercept = Group == 1 ? 0 : Group == 2 ? 5 : 10",
"GroupSlope = abs(GroupIntercept * randomGaussian(0, 1))",
"GroupVariance = pow(sin(Group), 2)",
"Data = GroupIntercept + GroupSlope * ii + randomGaussian(0.0, GroupVariance)",
]
)

The built-in library also contains many functions for working with date-time values.

time_functions = empty_table(10).update(
[
# The now() function provides the current time according to the Deephaven clock
"CurrentTime = now()",
"Timestamp = CurrentTime + (ii * DAY)",
# Many time functions require timezone information, typically provided as literals
"DayOfWeek = dayOfWeek(Timestamp, 'ET')",
"DayOfMonth = dayOfMonth(Timestamp, 'ET')",
"Weekend = DayOfWeek == 6 || DayOfWeek == 7 ? true : false",
"SecondsSinceY2K = nanosToSeconds(Timestamp - '2000-01-01T00:00:00 ET')",
]
)

Functions that begin with parse are useful for converting strings to date-time types.

string_date_times = empty_table(10).update(
[
"DateTime = `2022-04-03T19:34:22.000 UTC`",
"Date = `2013-07-07`",
"Time = `14:04:39.123`",
"Duration = `PT6m`",
"Period = `P10d`",
"TimeZone = `America/Chicago`",
]
)

converted_date_times = string_date_times.update(
[
"DateTime = parseInstant(DateTime)",
"Date = parseLocalDate(Date)",
"Time = parseLocalTime(Time)",
"Duration = parseDuration(Duration)",
"Period = parsePeriod(Period)",
"TimeZone = parseTimeZone(TimeZone)",
]
)

string_meta = string_date_times.meta_table
converted_meta = converted_date_times.meta_table

upperBin and lowerBin bin timestamps into buckets. They are particularly useful in aggregation operations, as aggregated statistics are commonly computed over temporal buckets.

binned_timestamps = empty_table(60).update(
[
"Timestamp = now() + ii * MINUTE",
# Rounds Timestamp to the last 5 minute mark
"Last5Mins = lowerBin(Timestamp, 5 * MINUTE)",
# Rounds Timestamp to the next 10 minute mark
"Next10Mins = upperBin(Timestamp, 10 * MINUTE)",
"Value = random()",
]
)

last_by_bin = binned_timestamps.last_by("Last5Mins")

The time user guide provides a comprehensive overview of working with date-time data in Deephaven.

These functions provide only a glimpse of what the built-in library offers. There are modules for sorting, searching, string parsing, null handling, and much more. See the document on auto-imported functions for a comprehensive list of what's available or the module summary page for a high-level overview of what's offered.

Java methods

The data structures that underlie Deephaven tables are Java data structures. So, many of the literals, operators, and functions we've spoken about are, at some level, Java. Java objects have methods attached to them that can be called from query strings, unlocking new levels of functionality and efficiency for Deephaven users.

To discover these, use meta_table to inspect a column's underlying data type.

call_methods = empty_table(1).update("Timestamp = '2024-03-03T15:00:00.000 UTC'")

call_methods_meta = call_methods.meta_table

This column is a Java Instant. Java's documentation provides all of the available methods that can be called. Here are just a few.

call_methods = call_methods.update(
[
"SecondsSinceEpoch = Timestamp.getEpochSecond()",
"TimestampInTokyo = Timestamp.atZone('Asia/Tokyo')",
"StringRepresentation = Timestamp.toString()",
]
)

Some basic understanding of how to read Javadocs will help you make the most of these built-in methods.

Additionally, there are several ways to create Java objects for use in query strings. The following example uses (1) the new keyword and (2) the Python jpy package to create new instances of Java's URL class.

import jpy

# Use 'new' keyword to create a new URL object, then call methods
t1 = empty_table(2).update(
[
"URL = new java.net.URL(`https://deephaven.io:` + i)",
"Protocol = URL.getProtocol()",
"Host = URL.getHost()",
"Port = URL.getPort()",
"URI = URL.toURI()",
]
)
m1 = t1.meta_table

# Use 'jpy' to create a new URL object, pass object to query string, then call methods
URL = jpy.get_type("java.net.URL")
url = URL("https://deephaven.io")

t2 = empty_table(1).update(
[
"URL = url",
"Protocol = URL.getProtocol()",
"Host = URL.getHost()",
"Port = URL.getPort()",
"URI = URL.toURI()",
]
)
m2 = t2.meta_table

# Use 'jpy' to create a new URL object, call methods, then pass results to query string
URL = jpy.get_type("java.net.URL")
url = URL("https://deephaven.io")
protocol = url.getProtocol()
host = url.getHost()
port = url.getPort()
uri = url.toURI()

t3 = empty_table(1).update(
[
"URL = url",
"Protocol = protocol",
"Host = host",
"Port = port",
"URI = uri",
]
)
m3 = t3.meta_table

For more information, see the jpy guide.

Arrays

Deephaven tables can have array columns. Array columns often come from a group_by operation.

t = empty_table(10).update(["Group = ii % 2 == 0 ? `A` : `B`", "X = ii"])

t_grouped = t.group_by("Group")

Many built-in functions support array arguments.

t_array_funcs = t_grouped.update(
[
"IsEven = X % 2 == 0",
"PlusOne = X + 1",
"GroupSum = sum(X)",
"GroupAvg = avg(X)",
"NumUnique = countDistinct(X)",
]
)

These results can then be ungrouped with ungroup, which is essentially the inverse of group_by.

t_array_funcs_ungrouped = t_array_funcs.ungroup()

NOTE: Aggregations done with deephaven.agg are more performant than with array functions.

Deephaven provides array indexing and slicing operations.

t_indexed = t_grouped.update(
[
"First = X[0]",
"Third = X[2]",
# Note that .subVector() includes the first arg and does not include the second
"FirstThree = X.subVector(0, 3)",
"MiddleThree = X.subVector(1, 4)",
# Indexing outside the range returns null
"OffTheFront = X[-1]",
"OffTheEnd = X.subVector(3,6)",
]
)

Lastly, the columns of a Deephaven table can be interpreted as arrays.

column_as_array = empty_table(10).update(
[
"X = ii",
# To interpret X as an array, use a trailing underscore
"ArrX = X_",
# Functions that take array inputs can now be used
"SumX = sum(X_)",
"AvgX = avg(X_)",
# Indexing and slicing work just the same
"XMinus2 = X_[ii-2]",
"RollingGroup = X_.subVector(i, i+3)",
"RollingAvg = avg(RollingGroup)",
]
)

This functionality is only supported for static and append-only ticking tables. See working with arrays for more information.

Python in query strings

Python objects can be used in query strings. Variables are the simplest case.

a = 1
b = 2

add_vars = empty_table(1).update("Sum = a + b")

Python functions can also be used.

def my_sum(x, y):
return x + y


a = 1
b = 2

add_vars_func = empty_table(1).update(["Sum1 = my_sum(1, 2)", "Sum2 = my_sum(a, b)"])

So can classes.

class MyMathClass:

def __init__(self, x):
self.x = x

def sum(self, y):
return self.x + y

@classmethod
def class_sum(self, x, y):
return x + y

@staticmethod
def static_sum(x, y):
return x + y


a = 1
b = 2
class_instance = MyMathClass(a)

add_vars_class = empty_table(1).update(
[
"Sum1 = class_instance.sum(b)",
"Sum2 = MyMathClass.class_sum(a, b)",
"Sum3 = MyMathClass.static_sum(a, b)",
]
)

Without any typecasts or type hints, the Deephaven query engine cannot infer what datatype results from a Python function. It stores the result as a Java PyObject or Object.

add_vars_meta = add_vars.meta_table
add_vars_func_meta = add_vars_func.meta_table
add_vars_class_meta = add_vars_class.meta_table

This isn't ideal, as neither of these data types support many of the DQL features we've covered. To rectify this, Python functions should utilize type hints. The engine will infer the correct column types for functions that use type hints. Class methods don't support type hints yet, so a typecast in the query string is required.

from deephaven import empty_table


def my_sum(x, y) -> int:
return x + y


class MyMathClass:

def __init__(self, x):
self.x = x

def sum(self, y) -> int:
return self.x + y

@classmethod
def class_sum(self, x, y) -> int:
return x + y

@staticmethod
def static_sum(x, y) -> int:
return x + y


a = 1
b = 2
class_instance = MyMathClass(a)

add_vars_func = empty_table(1).update(["Sum1 = my_sum(1, 2)", "Sum2 = my_sum(a, b)"])

# Note the (int) casts
add_vars_class = empty_table(1).update(
[
"Sum1 = (int) class_instance.sum(b)",
"Sum2 = (int) MyMathClass.class_sum(a, b)",
"Sum3 = (int) MyMathClass.static_sum(a, b)",
]
)

add_vars_func_meta = add_vars_func.meta_table
add_vars_class_meta = add_vars_class.meta_table

To learn more about using Python in query strings, see the user guides on functions and classes.

Scoping in Deephaven follows Python's LEGB scoping rules. Functions that return tables or otherwise make use of query strings should pay careful attention to scoping details.

def f(a, b) -> int:
return a * b


def compute(source, a):
return source.update("X = f(a, A)")


source = empty_table(5).update("A = i")

result1 = compute(source, 10)
result2 = compute(source, 3)

For more information, see the scoping user guide.

Be mindful of whether or not Python functions are stateless or stateful. Generally, stateless functions have no side effects - they don't modify any objects outside of their scope. Also, they are invariant to execution order, so function calls can be evaluated in any order without affecting the result. This stateless function extracts elements from a list in a query string.

my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def get_element_stateless(idx) -> int:
return my_list[idx]


t_stateless = empty_table(10).update("X = get_element_stateless(ii)")

get_element is stateless because it does not modify any objects outside its local scope. It could be evaluated in any order and give the same result.

Stateful functions modify objects outside their local scope - they do not leave the world as they found it. They also may depend on execution order. This stateful function achieves the same resulting table.

my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
idx = 0


def get_element_stateful() -> int:
global idx
idx += 1 # This modifies idx!
return my_list[idx - 1]


t_stateful = empty_table(10).update("X = get_element_stateful()")

Print idx to verify it's been changed.

print(idx)

Now that get_element is stateful, it must be evaluated in the correct order to give the correct result.

Queries should use stateless functions whenever possible because:

  • They minimize side effects when called.
  • They are deterministic.
  • They can be efficiently parallelized.