User-defined functions
User-defined functions in Python can be used in Deephaven queries in several ways. This guide explains when and how they are used in Deephaven.
In Python, a user-defined function is defined using the def keyword. User-defined functions can take any number of input parameters and return any number of values. The function f below returns the sum of its two input values.
def f(a, b):
    return a + b
User-defined functions in table operations
User-defined functions can be called in query strings just like any other method. The query below uses the function f in a table operation.
from deephaven import empty_table
source = empty_table(10).update(["X = i", "Y = 2 * i", "Z = f(X, Y)"])
When using user-defined Python functions in table operations, there are several factors to consider. These are explored in each subsection below.
Returned data type
The Deephaven query engine is implemented largely in Java. Because user-defined functions are written in Python, the engine must call from Java into Python code. Unless the query engine is given some information about the returned data type, it will store the return value as a Java Object or a jpy PyObject.
Look at the output type of the function f, called above.
source_meta = source.meta_table
Object columns and PyObject columns are incompatible with a large number of operations and are generally slow.
There are two ways to avoid these column types: type hints and typecasts.
See data types in Python and Java for more information on this topic.
Type hints
Type hints in Python declare what data types a function expects as input and produces as output. Python itself does not enforce them, but the Deephaven query engine uses them to choose column types. Type hints for input arguments are written with a colon (:) after the variable name. For function outputs, they are written with ->. The following example uses type hints in the function f, which causes the update operation that calls it to return a proper data type.
from deephaven import empty_table
def f(a: int, b: int) -> int:
    return a + b
source = empty_table(10).update(["X = i", "Y = 2 * i", "Z = f(X, Y)"])
source_meta = source.meta_table
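As a rough illustration of why hints help, annotations are stored on the function object and can be read at runtime. The sketch below uses the standard-library typing.get_type_hints to show what information is available to a caller. It is not a description of how Deephaven itself reads hints, and the unannotated function g is a hypothetical name added for comparison.

```python
from typing import get_type_hints


def f(a: int, b: int) -> int:
    return a + b


def g(a, b):
    # Hypothetical comparison function: same logic, no annotations
    return a + b


# Annotated functions expose their parameter and return types
f_hints = get_type_hints(f)
# Unannotated functions expose nothing, so a caller cannot infer the result type
g_hints = get_type_hints(g)

print(f_hints)
print(g_hints)
```

Without annotations there is nothing for a caller to inspect, which is consistent with the untyped function producing a generic object column.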
Deephaven supports the following data types in type hints:
- Python bool and NumPy np.bool_ to Java primitive boolean
- Python int and NumPy np.int_ to Java primitive long
- Python float and NumPy np.double to Java primitive double
- NumPy np.intc to Java primitive int
- NumPy np.single to Java primitive float
- NumPy np.byte to Java primitive byte
- NumPy np.short to Java primitive short
- Python str and NumPy np.str_ to Java java.lang.String
- Python datetime.datetime, Pandas pd.Timestamp, and np.datetime64 to Java java.time.Instant
- Python Sequence and NumPy ndarray to Java arrays
- Python objects to Java java.lang.Object
Typecasts
Alternatively, a typecast can be used in place of a type hint. A typecast precedes a function call in the query language using parentheses that enclose the data type:
from deephaven import empty_table
def f(a, b):
    return a + b
source = empty_table(10).update(["X = i", "Y = 2 * i", "Z = (int)f(X, Y)"])
source_meta = source.meta_table
Query language methods
The second thing to consider when calling a user-defined function from the query language is:
- Can I perform the same operation using a built-in query language method?
For instance, the function sine is called in the query below:
from deephaven import empty_table
import numpy as np
def sine(x):
    return np.sin(x)
source = empty_table(100).update(["X = 0.1 * i", "Y = sine(X)"])
Deephaven has a built-in sin method that can be used from the query language with no imports. Because it runs inside the query engine without calling into Python, it will be faster than an equivalent Python function in a table operation, and it returns the correct data type.
from deephaven import empty_table
source = empty_table(100).update(["X = 0.1 * i", "Y = sin(X)"])
Deephaven has a large number of built-in methods that can be called from query strings without any imports or classpaths. For more information and a complete list of what's available, see auto-imported functions in Deephaven.
Multiple return values
Functions in Python can return an arbitrary number of values. The following function returns both the sum and the difference of the two input values.
def g(a, b):
    return a + b, a - b
While this function can be called from Python without any problem, it will not work in a Deephaven table operation. The query engine is implemented largely in Java, which does not automatically unpack output values from functions.
Do not call functions with multiple outputs in table operations.
Instead of returning multiple values from a Python function, wrap the returned values in a single iterable such as a list or a NumPy array and unpack them manually in a query string.
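The difference can be seen in plain Python. In the sketch below, g returns two comma-separated values, which Python packs into a tuple and unpacks for the caller automatically. The variant g_list (a hypothetical name) returns a single list instead, so the caller retrieves each element by index, much as a query string would have to do.

```python
def g(a, b):
    # Two comma-separated values are packed into one tuple
    return a + b, a - b


def g_list(a, b):
    # Hypothetical variant: return a single list holding both results
    return [a + b, a - b]


# Python unpacks the tuple automatically at the call site
total, diff = g(5, 3)

# With a single container, the caller unpacks by index instead
packed = g_list(5, 3)

print(type(g(5, 3)).__name__, total, diff)
print(packed[0], packed[1])
```

How the elements are indexed inside a query string depends on the column type the engine assigns, so adding a return type hint to the wrapping function is likely to help.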
Passing tables to functions
Python functions can take tables as input and return tables as output. Some Deephaven operations specifically require this usage pattern.
The following example uses the function do_agg to perform multiple aggregations on a table.
from deephaven.agg import sum_, avg
from deephaven import empty_table
source = empty_table(20).update(
    ["Letter = (i % 2 == 0) ? `A` : `B`", "X = i", "Y = randomDouble(0.0, 10.0)"]
)
def do_agg(t):
    return t.agg_by(
        aggs=[sum_(cols=["SumX = X", "SumY = Y"]), avg(cols=["AvgX = X", "AvgY = Y"])],
        by="Letter",
    )
result = do_agg(source)
Deephaven tables are immutable. A function that modifies a table will have no effect unless a table is returned from it.
Partitioned tables
Some partitioned table operations require the use of user-defined Python functions that take tables as input and return new tables.
Partitioned tables are beyond the scope of this guide. See the partitioned table documentation for more information on user-defined functions and partitioned tables.
Import modules with user-defined functions
It's common in Python to use custom user-defined functions from a separate Python file or module. There are two ways to import these files:
When running Deephaven from Docker, the path must be visible inside of the Docker container. For more information, see Docker volumes.
The following example appends /data/storage/modules to sys.path so that Python files in it can be imported.
import sys
sys.path.append("/data/storage/modules")
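To show the full round trip, the sketch below writes a small module into a temporary directory, adds that directory to sys.path, and imports a function from it. The temporary directory stands in for a path such as /data/storage/modules, and the module name my_udfs and its add function are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a directory such as /data/storage/modules
module_dir = Path(tempfile.mkdtemp())

# Write a small module (hypothetical name) containing a user-defined function
(module_dir / "my_udfs.py").write_text(
    "def add(a: int, b: int) -> int:\n    return a + b\n"
)

# Make the directory importable, then import the module by name
sys.path.append(str(module_dir))
importlib.invalidate_caches()
my_udfs = importlib.import_module("my_udfs")

result = my_udfs.add(2, 3)
print(result)
```

Once a module is imported this way, its functions can be called like any other user-defined function.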