Data types in Deephaven and Python

For performance reasons, the Deephaven engine is implemented in Java. As such, Deephaven tables use Java data types for columns. These include both Java primitive types and Java objects.

Deephaven Python queries combine the power of Python with Deephaven's tables. This mixing of Python and Java has some important ramifications for data types. A conceptual understanding of how Python and Java types relate is critical to writing effective queries.

Python data types

In Python, everything is an object. That includes even scalar types like int and float. This means that all Python objects have additional properties and methods. This is different from Java, which has primitive data types as well as objects (also known as boxed types). The mapping of Python data types to Java data types is given in the Java data types section.

Java data types

Primitive types

Java primitive types are tuned to physical hardware and are fixed sizes, unlike Python data types. This is the main reason they are preferred in tables - they are faster and more memory efficient.

The following table shows the mapping between Java primitive types, Java primitive type sizes, Python types, and NumPy types:

Java primitive typeJava primitive type sizePython typeNumPy type
boolean1 byteboolnp.bool_
byte1 byteN/Anp.byte
short2 bytesN/Anp.short
int4 bytesN/Anp.intc
long8 bytesintnp.int_
float4 bytesN/Anp.single
double8 bytesfloatnp.double
char2 bytesN/AN/A

Object types

Object types are data types that cannot be represented as primitive data types. This means they consume more memory and are generally less performant than their primitive counterparts. The concept is similar in both Java and Python.

Some object types are commonly used despite the overhead. For instance, java.lang.String is the best way to store text in tables.

The following table shows a mapping between some of the most commonly used Java object types and their Python and NumPy equivalents:

Array types

In Python, sequences are the overarching term used to define iterable, ordered collections of data. In Deephaven tables, there are two commonly used array types:

PyObjects

org.jpy.PyObject is a Java wrapper around arbitrary Python objects used by Deephaven's jpy bridge. They typically appear when Python functions without type hints or type casts are called in query strings.

While flexible enough to represent any Python object (including those without Java equivalents, like tuples or dictionaries), PyObjects have significant limitations:

  • Limited compatibility with Deephaven's built-in functions
  • Reduced performance compared to Java primitive types
  • Higher memory usage due to their boxed nature

Use PyObjects only when working with complex Python objects that have no Java equivalent. Otherwise, prefer type hints or explicit type casts to ensure better performance and type safety.

Data type conversions

There are two ways to ensure that columns in tables are of the appropriate type: type hints and type casts. The following code block shows a simple example of both:

from deephaven import empty_table


def func_without_type_hint(value):
    return value**0.5


def func_with_type_hint(value: int) -> float:
    return value**0.5


source = empty_table(10).update(
    [
        "Typecast = (double)func_without_type_hint(ii)",
        "TypeHint = func_with_type_hint(ii)",
    ]
)
source_meta = source.meta_table