Handle PyObjects in tables
This guide provides an overview of the org.jpy.PyObject data type, including how it appears in tables, its uses, its limitations, and best practices.
This data type most commonly arises from Python functions called in query strings without type hints or type casts.
Because a column of this type can hold any Python object, the Deephaven engine can infer very little about its contents, so the operations it supports are limited. These columns are also less performant than Java primitive columns. Despite these drawbacks, columns of this type are useful in some situations, which are outlined below.
For brevity, the remainder of this guide refers to this data type as PyObject.
What is a PyObject?
A PyObject is a Java wrapper around an arbitrary Python object. It is a product of jpy, the bidirectional Python-Java bridge that Deephaven uses to call Python from query strings. It is a highly flexible data type because it can hold any Python object, but this flexibility comes at the cost of compatibility and speed.
Consider the following example, which calls three Python functions without type hints in query strings. They return an int, a list, and a dict, respectively.
Warning
It is best practice to use type hints in Python functions, especially those called in query strings.
from deephaven import empty_table
def func_return_scalar():
    return 3
def func_return_list():
    return [1, 2, 3]
def func_return_dict():
    return {"A": 1, "B": 2, "C": 3}
source = empty_table(1).update(
    ["A = func_return_scalar()", "B = func_return_list()", "C = func_return_dict()"]
)
source_meta = source.meta_table
The source_meta table shows that all three columns are PyObject columns. Because none of the functions has a type hint, the query engine does not have enough information to safely infer the return types, so it falls back to the PyObject type.
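The following pure-Python sketch, which uses only the standard-library inspect module, shows the information available to the engine: the three functions above expose no return annotation, while a hypothetical annotated variant, func_return_scalar_hinted, does. It does not call Deephaven or reproduce Deephaven's own type-inference logic.

```python
import inspect


def func_return_scalar():
    return 3


def func_return_list():
    return [1, 2, 3]


def func_return_dict():
    return {"A": 1, "B": 2, "C": 3}


# Hypothetical annotated variant, included only for comparison.
def func_return_scalar_hinted() -> int:
    return 3


for func in (func_return_scalar, func_return_list, func_return_dict, func_return_scalar_hinted):
    annotation = inspect.signature(func).return_annotation
    # Signature.empty means the function declares no return type at all.
    if annotation is inspect.Signature.empty:
        print(f"{func.__name__}: no return type hint")
    else:
        print(f"{func.__name__}: returns {annotation.__name__}")
```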
When to use PyObjects
PyObject columns have limitations, but they are useful in some cases. They are best used when:
- The Python function performs operations not supported by any built-in query language functions.
- A Python function returns a data type with no Java analogue.
- A Python function may return different data types depending on the input.
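As an illustration of the last case, the hypothetical parse_value function below returns an int, a float, or a str depending on its input. No single Java primitive column type could hold all three results, so a PyObject column is the natural way to store them if such a function is called in a query string. The sketch is plain Python and does not call Deephaven.

```python
def parse_value(text: str):
    """Hypothetical helper: returns an int, a float, or the original str."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        # Not numeric, so the input string is returned unchanged.
        return text


results = [parse_value(s) for s in ["42", "3.14", "abc"]]
print([type(r).__name__ for r in results])  # ['int', 'float', 'str']
```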
The following example uses Python's tuple data type, which has no direct Java equivalent. PyObject columns make it simple to call functions that return tuples in query strings and to pass the results to other Python functions, such as func_sum:
from typing import Tuple
from deephaven import empty_table
def func_tuple_typehint() -> Tuple[int, int, int]:
    return 1, 2, 3
def func_tuple_no_typehint():
    return 1, 2, 3
def func_sum(x) -> int:
    return sum(x)
source = empty_table(1).update(
    [
        "X1 = func_tuple_typehint()",
        "Y1 = func_sum(X1)",
        "X2 = func_tuple_no_typehint()",
        "Y2 = func_sum(X2)",
    ]
)
source_meta = source.meta_table
Limitations of PyObject columns
Compatibility
Many of Deephaven's built-in query language functions do not support PyObject columns. Methods that take java.lang.Object as input do support them, since PyObject extends Object.
The following code attempts to call the built-in sin function on a PyObject column. It fails because no overload of sin accepts a PyObject argument:
from deephaven import empty_table
def func_without_type_hint(value):
    return float(value)
source = empty_table(40).update("X = func_without_type_hint(ii)")
result = source.update("Y = sin(X)")
The stack trace contains the following line:
Value: table update operation failed. : Cannot find method sin(org.jpy.PyObject)
Because the query language has no sin method that accepts a PyObject, the query engine raises an error.
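The failure is an overload-resolution problem: sin is defined for numeric types, and PyObject is not one of them. The following pure-Python analogy models that lookup with functools.singledispatch. OpaqueWrapper is a hypothetical stand-in for PyObject, and Deephaven's actual Java method resolution works differently. The analogy also suggests why adding a return type hint such as -> float to func_without_type_hint, or casting its result, avoids the error: the value then reaches sin as a concrete numeric type.

```python
import math
from functools import singledispatch


class OpaqueWrapper:
    """Hypothetical stand-in for org.jpy.PyObject: holds a value but is not a numeric type."""

    def __init__(self, value):
        self.value = value


@singledispatch
def sin(x):
    # Fallback used when no registered overload matches the argument's type,
    # analogous to "Cannot find method sin(org.jpy.PyObject)".
    raise TypeError(f"Cannot find method sin({type(x).__name__})")


@sin.register(float)
@sin.register(int)
def _(x):
    # Overloads for concrete numeric types, analogous to sin(double).
    return math.sin(x)


wrapped = OpaqueWrapper(0.5)
try:
    sin(wrapped)
except TypeError as err:
    print(err)  # Cannot find method sin(OpaqueWrapper)

# Passing the underlying value as a concrete type (the role a type hint or cast plays) succeeds.
print(sin(wrapped.value))
```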
Memory management
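Because PyObject is a boxed type, each value in a PyObject column is a reference to a separate object rather than a raw primitive value. As a result, these columns use more memory than Java primitive columns.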
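Java object sizes are not easy to inspect from a script, so the following sketch illustrates the same trade-off with a Python analogy: an array.array stores doubles as raw 8-byte values, like a primitive column, while a list stores a reference to a separate float object per element, like a boxed column. The exact byte counts depend on the Python build and are not Deephaven measurements.

```python
import sys
from array import array

n = 1_000

# Primitive-style storage: one contiguous buffer of 8-byte doubles.
primitive = array("d", (float(i) for i in range(n)))
primitive_bytes = sys.getsizeof(primitive)

# Boxed-style storage: a list of references, each pointing to its own float object.
boxed = [float(i) for i in range(n)]
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)

print(f"primitive: {primitive_bytes} bytes, boxed: {boxed_bytes} bytes")
```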
Performance
PyObject columns are less performant than Java primitive columns for two reasons:
- They are boxed types, so each value requires additional memory and an extra level of indirection to access.
- They are used almost exclusively with Python functions in query strings, and every such call must cross the Python-Java boundary.
How to avoid PyObject columns
For better performance and type safety, minimize the use of PyObject columns in your tables. These columns are only necessary when you need to work with complex Python objects that have no Java equivalent.
To prevent automatic PyObject creation when using Python functions in query strings:
- Add type hints to your functions: this allows Deephaven to convert return values to appropriate Java types.
- Use explicit type casts: when type hints aren't possible, use Java-style casts such as (int) or (String).
The following code block uses both approaches to produce double columns instead of PyObject columns:
from deephaven import empty_table
def my_func(x: float) -> float:
    return x**0.5
def my_func_without_type_hint(x):
    return x**0.5
source = empty_table(10).update(
    [
        "X = 0.1 * ii",
        "Y = my_func(X)",
        "Z = (double)my_func_without_type_hint(X)",
    ]
)
source_meta = source.meta_table
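To summarize the type-hint rule, the following plain-Python sketch models how a return type hint could decide a column's type. The mapping _HINT_TO_JAVA and the helper inferred_column_type are hypothetical simplifications written for this guide, not part of the Deephaven API. Deephaven's real conversion supports more types and also honors explicit casts, which this model ignores.

```python
import inspect

# Hypothetical, simplified mapping from Python return type hints to Java column types.
_HINT_TO_JAVA = {
    int: "long",
    float: "double",
    bool: "java.lang.Boolean",
    str: "java.lang.String",
}


def inferred_column_type(func) -> str:
    """Return the Java type this simplified model assigns to func's result column."""
    hint = inspect.signature(func).return_annotation
    # A missing or unmapped hint falls back to the catch-all PyObject type.
    return _HINT_TO_JAVA.get(hint, "org.jpy.PyObject")


def my_func(x: float) -> float:
    return x**0.5


def my_func_without_type_hint(x):
    return x**0.5


print(inferred_column_type(my_func))  # double
print(inferred_column_type(my_func_without_type_hint))  # org.jpy.PyObject
```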