Deephaven's Query Strings
Deephaven query strings are the primary way of expressing commands directly to the Deephaven engine. They translate the user's intention into compiled code that the engine can execute. These query strings can contain a mix of Java and Python code and are the entry point to a universe of powerful built-in tools and Python-Java interoperability.
Syntax
Query strings are just Python strings that get passed into table operations. Deephaven highly recommends using double quotes to encapsulate query strings.
from deephaven import empty_table
t = empty_table(10).update("NewColumn = 1")
- t
Here, the query string NewColumn = 1
defines a formula for the engine to execute, and the update
table operation understands that formula as a recipe for creating a new column called NewColumn
that contains the value of 1
in each row.
Literals
Query strings often use literals. How the engine interprets a literal depends on how it's written in the query string.
- Literals not encapsulated by any special characters are interpreted as booleans, numeric values, column names, or variables.
- Literals encapsulated in backticks (
`
) are interpreted as strings. - Literals encapsulated in single quotes (
'
) are interpreted as date-time values.
literals = empty_table(10).update(
[
# Booleans are represented by 'true' and 'false'
"BooleanCol = true",
# 32b integers are represented by integers such as '1'
"IntCol = 1",
# 64b integers are represented by integers with an 'L' suffix such as '10L'
"LongCol = 10L",
# 32b floating point numbers are represented by numbers with an 'f' suffix such as '1.2345f'
"FloatCol = 1.2345f",
# 64b floating point numbers are represented by numbers with a decimal point such as '2.3456'
"DoubleCol = 2.3456",
# Strings must be enclosed with backticks
"StringCol = `Hello!`",
# Date-time values must be enclosed with single quotes
"DateCol = '2020-01-01'",
"TimeCol = '12:30:45.000'",
"DateTimeCol = '2020-01-01T12:30:45.000Z'",
# Durations must be enclosed with single quotes and have a leading PT
"DurationCol = 'PT1m'",
# For object types, 'null' is the literal for null values
"NullCol = null",
]
)
- literals
The meta_table
attribute is useful for assessing a table's schema. You can use it to confirm that the resulting columns are of the correct type.
literals_meta = literals.meta_table
- literals_meta
Special variables and constants
Deephaven provides several special variables and constants. The most commonly used of these are i
and ii
. They represent the row indices as 32-bit and 64-bit integers, int
and long
, respectively.
special_vars = empty_table(10).update(["IdxInt = i", "IdxLong = ii"])
special_meta = special_vars.meta_table
- special_meta
- special_vars
NOTE: The special variables
i
andii
can only be used in append-only tables.
Additionally, Deephaven provides a range of common constants that can be accessed from query strings. These constants are always written with snake case in capital letters. They include minimum and maximum values for various data types, conversion factors for time types, and more. Of particular interest are the null constants for primitive types.
null_values = empty_table(1).update(
[
"StandardNull = null",
"BoolNull = NULL_BOOLEAN",
"CharNull = NULL_CHAR",
"ByteNull = NULL_BYTE",
"ShortNull = NULL_SHORT",
"IntNull = NULL_INT",
"LongNull = NULL_LONG",
"FloatNull = NULL_FLOAT",
"DoubleNull = NULL_DOUBLE",
]
)
null_values_meta = null_values.meta_table
- null_values_meta
- null_values
These are useful for representing and handling null values of a specific type. Built-in query language functions handle null values. For example, sqrt(NULL_DOUBLE)
returns NULL_DOUBLE
. Custom functions need to handle null values appropriately.
Common operations
Numeric types support mathematical operations such as +
, -
, *
, /
, and %
.
math_ops = empty_table(10).update(
[
"Col1 = ii",
# Math operations work between columns and literals
"Col2 = 10 * (Col1 + 1.2345) / 4",
# And they work between columns
"Col3 = (Col1 - Col2) % 6",
]
)
- math_ops
String concatenation is also supported via the +
operator.
add_strings = empty_table(10).update("StringCol = `Hello ` + `there!`")
- add_strings
The +
and -
operators are defined for date-time types, making arithmetic on timestamp data easy.
time_ops = empty_table(10).update(
[
# Times in nanoseconds can be added or subtracted from date-times
"Timestamp = '2021-07-11T12:00:00.000Z' + (ii * HOUR)",
# Durations or Periods can be added or subtracted from date-times
"TimestampPlusOneYear = Timestamp + 'P365d'",
"TimestampMinusOneHour = Timestamp - 'PT1h'",
# Timestamps can be subtracted to get their difference in nanoseconds
# Use constants for unit conversion
"DaysBetween = ('2023-01-02T20:00:00.000Z' - Timestamp) / DAY",
]
)
- time_ops
Logical operations, expressions, and comparison operators are supported.
logical_ops = empty_table(10).update(
[
"IdxCol = ii",
"IsEven = IdxCol % 2 == 0",
"IsDivisibleBy6 = IsEven && (IdxCol % 3 == 0)",
"IsGreaterThan6 = IdxCol > 6",
]
)
- logical_ops
Deephaven provides an inline conditional operator (ternary-if) that makes writing conditional expressions compact and efficient.
conditional = empty_table(10).update(
[
# Read as '<condition> ? <if-true> : <if-false>'
"Parity = ii % 2 == 0 ? `Even!` : `Odd...`",
# Any logical expression is a valid condition
"IsDivisibleBy6 = ((ii % 2 == 0) && (ii % 3 == 0)) ? true : false",
# In-line conditionals can be chained together
"RandomNumber = randomGaussian(0.0, 1.0)",
"Score = RandomNumber < -1.282 ? `Bottom 10%` : RandomNumber > 1.282 ? `Top 10%` : `Middle of the pack`",
]
)
- conditional
In Deephaven, typecasting is easy.
typecasted = empty_table(10).update(
[
"LongCol = ii",
# Use (type) to cast a column to the desired type
"IntCol = (int) LongCol",
"FloatCol = (float) LongCol",
# Or, use a cast as an intermediate step
"IntSquare = i * (int) LongCol",
]
)
- typecasted
There are many more such operators supported in Deephaven. See the guide on operators to learn more.
Built-in functions
Aside from the common operations, Deephaven hosts a large library of functions known as built-in or auto-imported functions that can be used in query strings.
The numeric subset of this library is full of functions that perform common mathematical operations on numeric types. These include exponentials, trigonometric functions, random number generators, and more.
math_functions = empty_table(10).update(
[
# Each function can be applied to many different types
"Exponential = exp(ii)",
"NatLog = log(i)",
"Square = pow((float)ii, 2)",
"Sqrt = sqrt((double)i)",
"Trig = sin(ii)",
# Functions can be composed
"AbsVal = abs(cos(ii))",
"RandInt = randomInt(0, 5)",
"RandNormal = randomGaussian(0, 1)",
]
)
- math_functions
These functions can be combined with previously discussed literals and operators to generate complex expressions.
fake_data = empty_table(100).update(
[
"Group = randomInt(1, 4)",
"GroupIntercept = Group == 1 ? 0 : Group == 2 ? 5 : 10",
"GroupSlope = abs(GroupIntercept * randomGaussian(0, 1))",
"GroupVariance = pow(sin(Group), 2)",
"Data = GroupIntercept + GroupSlope * ii + randomGaussian(0.0, GroupVariance)",
]
)
- fake_data
The built-in library also contains many functions for working with date-time values.
time_functions = empty_table(10).update(
[
# The now() function provides the current time according to the Deephaven clock
"CurrentTime = now()",
"Timestamp = CurrentTime + (ii * DAY)",
# Many time functions require timezone information, typically provided as literals
"DayOfWeek = dayOfWeek(Timestamp, 'ET')",
"DayOfMonth = dayOfMonth(Timestamp, 'ET')",
"Weekend = DayOfWeek == 6 || DayOfWeek == 7 ? true : false",
"SecondsSinceY2K = nanosToSeconds(Timestamp - '2000-01-01T00:00:00 ET')",
]
)
- time_functions
Functions that begin with parse
are useful for converting strings to date-time types.
string_date_times = empty_table(10).update(
[
"DateTime = `2022-04-03T19:34:22.000 UTC`",
"Date = `2013-07-07`",
"Time = `14:04:39.123`",
"Duration = `PT6m`",
"Period = `P10d`",
"TimeZone = `America/Chicago`",
]
)
converted_date_times = string_date_times.update(
[
"DateTime = parseInstant(DateTime)",
"Date = parseLocalDate(Date)",
"Time = parseLocalTime(Time)",
"Duration = parseDuration(Duration)",
"Period = parsePeriod(Period)",
"TimeZone = parseTimeZone(TimeZone)",
]
)
string_meta = string_date_times.meta_table
converted_meta = converted_date_times.meta_table
- string_meta
- string_date_times
- converted_meta
- converted_date_times
upperBin
and lowerBin
bin timestamps into buckets. They are particularly useful in aggregation operations, as aggregated statistics are commonly computed over temporal buckets.
binned_timestamps = empty_table(60).update(
[
"Timestamp = now() + ii * MINUTE",
# Rounds Timestamp to the last 5 minute mark
"Last5Mins = lowerBin(Timestamp, 5 * MINUTE)",
# Rounds Timestamp to the next 10 minute mark
"Next10Mins = upperBin(Timestamp, 10 * MINUTE)",
"Value = random()",
]
)
last_by_bin = binned_timestamps.last_by("Last5Mins")
- binned_timestamps
- last_by_bin
The time user guide provides a comprehensive overview of working with date-time data in Deephaven.
These functions provide only a glimpse of what the built-in library offers. There are modules for sorting, searching, string parsing, null handling, and much more. See the document on auto-imported functions for a comprehensive list of what's available or the module summary page for a high-level overview of what's offered.
Java methods
The data structures that underlie Deephaven tables are Java data structures. So, many of the literals, operators, and functions we've spoken about are, at some level, Java. Java objects have methods attached to them that can be called from query strings, unlocking new levels of functionality and efficiency for Deephaven users.
To discover these, use meta_table
to inspect a column's underlying data type.
call_methods = empty_table(1).update("Timestamp = '2024-03-03T15:00:00.000 UTC'")
call_methods_meta = call_methods.meta_table
- call_methods_meta
- call_methods
This column is a Java Instant. Java's documentation provides all of the available methods that can be called. Here are just a few.
call_methods = call_methods.update(
[
"SecondsSinceEpoch = Timestamp.getEpochSecond()",
"TimestampInTokyo = Timestamp.atZone('Asia/Tokyo')",
"StringRepresentation = Timestamp.toString()",
]
)
- call_methods
Some basic understanding of how to read Javadocs will help you make the most of these built-in methods.
Additionally, there are several ways to create Java objects for use in query strings. The following example uses (1) the new
keyword and (2) the Python jpy package to create new instances of Java's URL class.
import jpy
# Use 'new' keyword to create a new URL object, then call methods
t1 = empty_table(2).update(
[
"URL = new java.net.URL(`https://deephaven.io:` + i)",
"Protocol = URL.getProtocol()",
"Host = URL.getHost()",
"Port = URL.getPort()",
"URI = URL.toURI()",
]
)
m1 = t1.meta_table
# Use 'jpy' to create a new URL object, pass object to query string, then call methods
URL = jpy.get_type("java.net.URL")
url = URL("https://deephaven.io")
t2 = empty_table(1).update(
[
"URL = url",
"Protocol = URL.getProtocol()",
"Host = URL.getHost()",
"Port = URL.getPort()",
"URI = URL.toURI()",
]
)
m2 = t2.meta_table
# Use 'jpy' to create a new URL object, call methods, then pass results to query string
URL = jpy.get_type("java.net.URL")
url = URL("https://deephaven.io")
protocol = url.getProtocol()
host = url.getHost()
port = url.getPort()
uri = url.toURI()
t3 = empty_table(1).update(
[
"URL = url",
"Protocol = protocol",
"Host = host",
"Port = port",
"URI = uri",
]
)
m3 = t3.meta_table
- t1
- m1
- t2
- m2
- t3
- m3
For more information, see the jpy guide.
Arrays
Deephaven tables can have array columns. Array columns often come from a group_by
operation.
t = empty_table(10).update(["Group = ii % 2 == 0 ? `A` : `B`", "X = ii"])
t_grouped = t.group_by("Group")
- t
- t_grouped
Many built-in functions support array arguments.
t_array_funcs = t_grouped.update(
[
"IsEven = X % 2 == 0",
"PlusOne = X + 1",
"GroupSum = sum(X)",
"GroupAvg = avg(X)",
"NumUnique = countDistinct(X)",
]
)
- t_array_funcs
These results can then be ungrouped with ungroup
, which is essentially the inverse of group_by
.
t_array_funcs_ungrouped = t_array_funcs.ungroup()
- t_array_funcs_ungrouped
NOTE: Aggregations done with
deephaven.agg
are more performant than with array functions.
Deephaven provides array indexing and slicing operations.
t_indexed = t_grouped.update(
[
"First = X[0]",
"Third = X[2]",
# Note that .subVector() includes the first arg and does not include the second
"FirstThree = X.subVector(0, 3)",
"MiddleThree = X.subVector(1, 4)",
# Indexing outside the range returns null
"OffTheFront = X[-1]",
"OffTheEnd = X.subVector(3,6)",
]
)
- t_indexed
Lastly, the columns of a Deephaven table can be interpreted as arrays.
column_as_array = empty_table(10).update(
[
"X = ii",
# To interpret X as an array, use a trailing underscore
"ArrX = X_",
# Functions that take array inputs can now be used
"SumX = sum(X_)",
"AvgX = avg(X_)",
# Indexing and slicing work just the same
"XMinus2 = X_[ii-2]",
"RollingGroup = X_.subVector(i, i+3)",
"RollingAvg = avg(RollingGroup)",
]
)
- column_as_array
This functionality is only supported for static and append-only ticking tables. See working with arrays for more information.
Python in query strings
Python objects can be used in query strings. Variables are the simplest case.
a = 1
b = 2
add_vars = empty_table(1).update("Sum = a + b")
- add_vars
Python functions can also be used.
def my_sum(x, y):
return x + y
a = 1
b = 2
add_vars_func = empty_table(1).update(["Sum1 = my_sum(1, 2)", "Sum2 = my_sum(a, b)"])
- add_vars_func
So can classes.
class MyMathClass:
def __init__(self, x):
self.x = x
def sum(self, y):
return self.x + y
@classmethod
def class_sum(self, x, y):
return x + y
@staticmethod
def static_sum(x, y):
return x + y
a = 1
b = 2
class_instance = MyMathClass(a)
add_vars_class = empty_table(1).update(
[
"Sum1 = class_instance.sum(b)",
"Sum2 = MyMathClass.class_sum(a, b)",
"Sum3 = MyMathClass.static_sum(a, b)",
]
)
- add_vars_class
Without any typecasts or type hints, the Deephaven query engine cannot infer what datatype results from a Python function. It stores the result as a Java PyObject or Object.
add_vars_meta = add_vars.meta_table
add_vars_func_meta = add_vars_func.meta_table
add_vars_class_meta = add_vars_class.meta_table
- add_vars_meta
- add_vars_func_meta
- add_vars_class_meta
This isn't ideal, as neither of these data types support many of the DQL features we've covered. To rectify this, Python functions should utilize type hints. The engine will infer the correct column types for functions that use type hints. Class methods don't support type hints yet, so a typecast in the query string is required.
from deephaven import empty_table
def my_sum(x, y) -> int:
return x + y
class MyMathClass:
def __init__(self, x):
self.x = x
def sum(self, y) -> int:
return self.x + y
@classmethod
def class_sum(self, x, y) -> int:
return x + y
@staticmethod
def static_sum(x, y) -> int:
return x + y
a = 1
b = 2
class_instance = MyMathClass(a)
add_vars_func = empty_table(1).update(["Sum1 = my_sum(1, 2)", "Sum2 = my_sum(a, b)"])
# Note the (int) casts
add_vars_class = empty_table(1).update(
[
"Sum1 = (int) class_instance.sum(b)",
"Sum2 = (int) MyMathClass.class_sum(a, b)",
"Sum3 = (int) MyMathClass.static_sum(a, b)",
]
)
add_vars_func_meta = add_vars_func.meta_table
add_vars_class_meta = add_vars_class.meta_table
- add_vars_func_meta
- add_vars_func
- add_vars_class_meta
- add_vars_class
To learn more about using Python in query strings, see the user guides on functions and classes.
Scoping in Deephaven follows Python's LEGB scoping rules. Functions that return tables or otherwise make use of query strings should pay careful attention to scoping details.
def f(a, b) -> int:
return a * b
def compute(source, a):
return source.update("X = f(a, A)")
source = empty_table(5).update("A = i")
result1 = compute(source, 10)
result2 = compute(source, 3)
- source
- result1
- result2
For more information, see the scoping user guide.
Be mindful of whether or not Python functions are stateless or stateful. Generally, stateless functions have no side effects - they don't modify any objects outside of their scope. Also, they are invariant to execution order, so function calls can be evaluated in any order without affecting the result. This stateless function extracts elements from a list in a query string.
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
def get_element_stateless(idx) -> int:
return my_list[idx]
t_stateless = empty_table(10).update("X = get_element_stateless(ii)")
- t_stateless
get_element
is stateless because it does not modify any objects outside its local scope. It could be evaluated in any order and give the same result.
Stateful functions modify objects outside their local scope - they do not leave the world as they found it. They also may depend on execution order. This stateful function achieves the same resulting table.
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
idx = 0
def get_element_stateful() -> int:
global idx
idx += 1 # This modifies idx!
return my_list[idx - 1]
t_stateful = empty_table(10).update("X = get_element_stateful()")
- t_stateful
Print idx
to verify it's been changed.
print(idx)
- Log
Now that get_element
is stateful, it must be evaluated in the correct order to give the correct result.
Queries should use stateless functions whenever possible because:
- They minimize side effects when called.
- They are deterministic.
- They can be efficiently parallelized.