Data types in Deephaven and Python
For performance reasons, the Deephaven engine is implemented in Java. As such, Deephaven tables use Java data types for columns. These include both Java primitive types and Java objects.
Deephaven Python queries combine the power of Python with Deephaven's tables. This mixing of Python and Java has some important ramifications for data types. A conceptual understanding of how Python and Java types relate is critical to writing effective queries.
Python data types
In Python, everything is an object. That includes even scalar types like int and float. This means that all Python objects have additional properties and methods. This is different from Java, which has primitive data types as well as objects (also known as boxed types). The mapping of Python data types to Java data types is given in the Java data types section.
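As a quick illustration of the point above, the following standard-library snippet shows that even a plain number is a full object with a type and methods, and that a Python int has no fixed width. The variable names are illustrative only.

```python
# Scalars in Python are objects: they have a type, methods, and attributes.
n = 255
x = 0.5

print(type(n).__name__)       # int
print(isinstance(n, object))  # True
print(n.bit_length())         # 8: 255 needs 8 bits
print(x.is_integer())         # False
print(x.as_integer_ratio())   # (1, 2)

# Python ints are arbitrary precision, so they have no fixed size.
big = 2**100
print(big.bit_length())       # 101
```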
Java data types
Primitive types
Java primitive types are tuned to physical hardware and have fixed sizes, unlike Python data types. This is the main reason they are preferred in tables: they are faster and more memory efficient.
The following table shows the mapping between Java primitive types, Java primitive type sizes, Python types, and NumPy types:
Java primitive type | Java primitive type size | Python type | NumPy type |
---|---|---|---|
boolean | 1 byte | bool | np.bool_ |
byte | 1 byte | N/A | np.byte |
short | 2 bytes | N/A | np.short |
int | 4 bytes | N/A | np.intc |
long | 8 bytes | int | np.int_ |
float | 4 bytes | N/A | np.single |
double | 8 bytes | float | np.double |
char | 2 bytes | N/A | N/A |
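The fixed sizes in the table above can be checked from Python without Deephaven. This is a minimal sketch that uses the standard library's struct and ctypes modules as stand-ins for Java primitives. The `JAVA_SIZES` mapping is a name made up for this example, and the struct format codes are fixed-width analogues, not something Deephaven itself uses.

```python
import ctypes
import struct

# Standard-size struct format codes ("<" disables platform-dependent sizing)
# with the same widths as the Java primitives in the table above.
JAVA_SIZES = {
    "byte": struct.calcsize("<b"),
    "short": struct.calcsize("<h"),
    "int": struct.calcsize("<i"),
    "long": struct.calcsize("<q"),
    "float": struct.calcsize("<f"),
    "double": struct.calcsize("<d"),
}
for name, size in JAVA_SIZES.items():
    print(f"{name}: {size} byte(s)")

# A fixed-width 32-bit integer wraps around when it overflows...
wrapped = ctypes.c_int32(2**31 - 1 + 1).value
print(wrapped)  # -2147483648

# ...whereas a Python int simply grows.
print(2**31 - 1 + 1)  # 2147483648
```

The wraparound is the practical consequence of a fixed size: a value that fits in a Python int may not fit in a 4-byte int column.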
Object types
Object types are data types that cannot be represented as primitive data types. This means they consume more memory and are generally less performant than their primitive counterparts. The concept is similar in both Java and Python.
Some object types are commonly used despite the overhead. For instance, java.lang.String is the best way to store text in tables.
The following table shows a mapping between some of the most commonly used Java object types and their Python and NumPy equivalents:
Java object type | Python type | NumPy type |
---|---|---|
java.lang.String | str | np.str_ |
java.time.Instant | datetime.datetime | np.datetime64 |
Array | Sequence | np.ndarray |
java.lang.Object | object | np.object_ |
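To make the overhead of object types concrete, the sketch below compares Python float objects with a packed array of 8-byte doubles from the standard library's array module. It is only an analogy for boxed versus primitive storage, not a measurement of Deephaven itself. It assumes CPython, and the exact per-object byte count depends on the interpreter build.

```python
import sys
from array import array

values = [float(i) for i in range(1_000)]

# Each Python float is a full object: the value plus object bookkeeping.
per_object = sys.getsizeof(values[0])

# array("d") stores raw 8-byte doubles back to back, with no per-value object.
packed = array("d", values)
per_primitive = packed.itemsize

print(f"bytes per float object: {per_object}")
print(f"bytes per packed double: {per_primitive}")
```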
Array types
In Python, "sequence" is the overarching term for iterable, ordered collections of data. In Deephaven tables, there are two commonly used array types:
PyObjects
org.jpy.PyObject is a Java wrapper around arbitrary Python objects used by Deephaven's jpy bridge. PyObjects typically appear when Python functions without type hints or type casts are called in query strings.
While flexible enough to represent any Python object (including those without Java equivalents, like tuples or dictionaries), PyObjects have significant limitations:
- Limited compatibility with Deephaven's built-in functions
- Reduced performance compared to Java primitive types
- Higher memory usage due to their boxed nature
Use PyObjects only when working with complex Python objects that have no Java equivalent. Otherwise, prefer type hints or explicit type casts to ensure better performance and type safety.
Data type conversions
There are two ways to ensure that columns in tables are of the appropriate type: type hints and type casts. The following code block shows a simple example of both:
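from deephaven import empty_table


# No type hints: the result must be cast in the query string.
def func_without_type_hint(value):
    return value**0.5


# Type hints declare the return type, so no cast is needed.
def func_with_type_hint(value: int) -> float:
    return value**0.5


source = empty_table(10).update(
    [
        "Typecast = (double)func_without_type_hint(ii)",
        "TypeHint = func_with_type_hint(ii)",
    ]
)

# The meta table lists each column's name and data type.
source_meta = source.meta_table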
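Type hints are ordinary runtime metadata on a Python function, which is what makes them usable for choosing a column type. The sketch below uses only the standard library's typing module to show the difference between the two functions above. It does not reproduce how Deephaven inspects functions internally, which is not described here.

```python
from typing import get_type_hints


def func_without_type_hint(value):
    return value**0.5


def func_with_type_hint(value: int) -> float:
    return value**0.5


# No hints: nothing tells a caller what type comes back.
print(get_type_hints(func_without_type_hint))  # {}

# With hints: parameter and return types can be read at runtime.
hints = get_type_hints(func_with_type_hint)
print(hints["value"].__name__)   # int
print(hints["return"].__name__)  # float

# Both functions compute the same value; only the metadata differs.
print(func_without_type_hint(16) == func_with_type_hint(16))  # True
```

Without that metadata, the only remaining way to state the result type is the explicit cast in the query string, as in the Typecast column.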