Handle PyObjects in tables

This guide provides a comprehensive overview of the org.jpy.PyObject data type, including its appearance in tables, its uses, limitations, and best practices.

This data type most commonly arises from Python functions called in query strings without type hints or type casts.

Because a PyObject can hold any Python object, the Deephaven engine can infer very little about it, so only a limited set of operations is supported. PyObject columns are also less performant than Java primitive columns. Despite this, these column types are still useful in some situations, which are outlined below.

This data type is generally referred to as PyObject for the remainder of this guide for brevity.

What is a PyObject?

A PyObject is a Java wrapper around an arbitrary Python object. It is a product of jpy, the bidirectional Python-Java bridge used by Deephaven to facilitate calling Python from query strings. It is a highly flexible data type because it can hold many different Python data types. This flexibility comes at the cost of compatibility and speed.

Consider the following example, which calls three Python functions in query strings without any type hints. They return an int, list, and dict, respectively.

Warning

It is best practice to use type hints in Python functions, especially those called in query strings.

from deephaven import empty_table


def func_return_scalar():
    return 3


def func_return_list():
    return [1, 2, 3]


def func_return_dict():
    return {"A": 1, "B": 2, "C": 3}


source = empty_table(1).update(
    ["A = func_return_scalar()", "B = func_return_list()", "C = func_return_dict()"]
)
source_meta = source.meta_table

The source_meta table shows that all three columns are PyObject columns. Because none of the three functions has type hints, the query engine does not have enough information to safely infer the returned data type, so it falls back to the PyObject type.
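To see what information is missing, the functions can be inspected with Python's standard inspect module. The sketch below is not Deephaven's inference logic. It only illustrates that an unhinted function exposes no return annotation, while a hinted variant does. The names func_return_scalar_hinted and has_return_hint are hypothetical and added here for comparison.

```python
import inspect


def func_return_scalar():
    return 3


# Hypothetical hinted variant of the function above, added for comparison.
def func_return_scalar_hinted() -> int:
    return 3


def has_return_hint(func) -> bool:
    """Return True if func declares a return annotation."""
    return inspect.signature(func).return_annotation is not inspect.Signature.empty


print(has_return_hint(func_return_scalar))  # False
print(has_return_hint(func_return_scalar_hinted))  # True
```

Without a return annotation, nothing in the function's signature says which type the caller should expect.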

When to use PyObjects

PyObject columns have limitations, but are useful in some cases. They are best used when:

  • The Python function performs operations not supported by any built-in query language functions.
  • A Python function returns a data type with no Java analogue.
  • A Python function may return different data types dependent on the input.
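As an illustration of the last case, here is a minimal sketch of a function whose return type depends on its input. The name parse_token and the column names in the comment are hypothetical. Because no single Java type fits all three outcomes, such a function is a plausible candidate for a PyObject column.

```python
def parse_token(token: str):
    """Return an int, a float, or the original string, depending on the input.

    The return type hint is deliberately omitted because no single type
    describes every possible result.
    """
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token


# In a query string, this might be called as "Parsed = parse_token(Raw)",
# assuming a string column named Raw exists (hypothetical column names).
for raw in ["42", "3.5", "abc"]:
    value = parse_token(raw)
    print(type(value).__name__, value)
```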

The following example uses Python's tuple data type, which has no direct Java equivalent. PyObject columns are useful here because they make it simple to call such functions in query strings and pass their results to other Python functions:

from typing import Tuple
from deephaven import empty_table


def func_tuple_typehint() -> Tuple[int, int, int]:
    return 1, 2, 3


def func_tuple_no_typehint():
    return 1, 2, 3


def func_sum(x) -> int:
    return sum(x)


source = empty_table(1).update(
    [
        "X1 = func_tuple_typehint()",
        "Y1 = func_sum(X1)",
        "X2 = func_tuple_no_typehint()",
        "Y2 = func_sum(X2)",
    ]
)
source_meta = source.meta_table

Limitations of PyObject columns

Compatibility

Many of Deephaven's built-in query language functions do not support PyObject columns. Methods that take java.lang.Object as input support PyObject columns, since PyObject extends Object.

The following code attempts to call the built-in sin function on a PyObject column, but fails, since there is no overloaded method with that name that takes the proper data type as input:

from deephaven import empty_table


def func_without_type_hint(value):
    return float(value)


source = empty_table(40).update("X = func_without_type_hint(ii)")
result = source.update("Y = sin(X)")

The stack trace contains the following line:

Value: table update operation failed. : Cannot find method sin(org.jpy.PyObject)

There is no method that can handle an input data type of PyObject. Therefore, the query engine raises an error.

Memory management

Because PyObject is a boxed type, each value is stored as a reference to a wrapper object rather than as a primitive. Columns of this type therefore use more memory than Java primitive columns.

Performance

PyObject columns are less performant than Java primitive columns. This is the cost of the flexibility described above.

How to avoid PyObject columns

For better performance and type safety, minimize the use of PyObject columns in your tables. These columns are only necessary when you specifically need to work with complex Python objects that don't have Java equivalents.

To prevent automatic PyObject creation when using Python functions in query strings:

  1. Add type hints to your functions. This allows Deephaven to convert return values to appropriate Java types.
  2. Use explicit type casts. When type hints aren't possible, use Java-style casts like (int) or (String).

The following code block uses both approaches to produce columns of appropriate Java primitive types:

from deephaven import empty_table


def my_func(x: float) -> float:
    return x**0.5


def my_func_without_type_hint(x):
    return x**0.5


source = empty_table(10).update(
    [
        "X = 0.1 * ii",
        "Y = my_func(X)",
        "Z = (double)my_func_without_type_hint(X)",
    ]
)
source_meta = source.meta_table
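When the unhinted function comes from code you cannot edit, one option is a thin typed wrapper. This is a minimal sketch of that idea rather than a documented Deephaven pattern. The names legacy_sqrt and typed_sqrt are hypothetical, and legacy_sqrt stands in for any function without type hints.

```python
from typing import get_type_hints


def legacy_sqrt(x):
    # Stand-in for an unhinted function that cannot be modified.
    return x**0.5


def typed_sqrt(x: float) -> float:
    """Typed wrapper: same computation, with explicit parameter and return hints."""
    return float(legacy_sqrt(x))


print(get_type_hints(legacy_sqrt))  # {}
print(get_type_hints(typed_sqrt))  # {'x': <class 'float'>, 'return': <class 'float'>}
print(typed_sqrt(4.0))  # 2.0
```

A query string would then call typed_sqrt instead of legacy_sqrt, so the hints are available without a cast.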