Reading and Writing Tables

While most use cases are satisfied using the db interfaces, on occasion you may need to read or write individual tables from disk. Tables written this way can only be read back as individual tables and are not accessible from db.liveTable() or db.historicalTable().

Formats

Core+ supports two table formats: Apache Parquet and Deephaven.

The Apache Parquet format is a standard format supported by many other tools and libraries. It supports compression and has advanced metadata that tools can use to make access fast. Unless you have a specific need for the additional append or read/write performance provided by the Deephaven format, this is the preferred static data format.

The Deephaven format was developed for fast columnar storage before Parquet was ubiquitous. It maximizes parallelism, allows arbitrary random reads, and is optimized for fast live append operations, but it does not support any form of compression and is not interoperable with other standard tools. Choose this format only if you expect to append to the table very frequently or specifically require the additional read/write performance.

Parquet API

Reading and writing tables in Parquet format can be done directly through the Deephaven Core interfaces, which support reading and writing either a single table or a directory hierarchy of tables.
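
For example, a Core+ Python worker can use Deephaven Core's deephaven.parquet module directly; the file path below is illustrative:

```python
from deephaven import empty_table
from deephaven import parquet

# Build a small static table and write it to a single Parquet file.
source = empty_table(10).update(["X = i", "Y = i * 2"])
parquet.write(source, "/data/example/source.parquet")

# Read the file back as a static table. Tables read this way are not
# visible through db.liveTable() or db.historicalTable().
result = parquet.read("/data/example/source.parquet")
```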

Deephaven API

Reading and writing Deephaven-format tables is done with the io.deephaven.enterprise.table.EnterpriseTableTools class in Groovy or the deephaven_enterprise.table_tools module in Python.

For example:
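The following Python sketch assumes the table_tools module exposes write_table and read_table counterparts to the Groovy EnterpriseTableTools methods; verify the exact names and signatures against your installed version's Pydoc. The directory path is illustrative.

```python
from deephaven import empty_table
from deephaven_enterprise import table_tools

# Write a table to a directory on disk in the Deephaven format.
source = empty_table(10).update(["X = i", "Y = i * 2"])
table_tools.write_table(source, "/data/example/deephaven_table")

# Read the directory back as a static table.
result = table_tools.read_table("/data/example/deephaven_table")
```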

Logging system tables from Core+

Core+ workers can log Table objects to a System table using the io.deephaven.enterprise.database.SystemTableLogger class in Groovy or the system_table_logger module in Python. As always, logging to a system table requires that a schema be defined in advance for the table.

You must specify what column partition to write to. This can be a fixed column partition or the current date (at the time the row was written; data is not introspected for a Timestamp).

The default behavior is to write via the Log Aggregator Service (LAS), but you can also write via binary logs. Because no code generation or listener versioning is performed, you must write columns in the format the listener expects. Logging tables for a non-default ZoneId (timezone) may only be done via direct binary logging, and cannot be done via LAS.

Row-by-row logging is not yet supported by Core+ workers. Existing binary loggers cannot be executed in the context of a Core+ worker because they reference classes that are shadowed (renamed). If row-level logging is required, you must use io.deephaven.shadow.enterprise.com.illumon.iris.binarystore.BinaryStoreWriterV2 directly.

Logging complex types

If you want to log complex types such as arrays or custom objects, you may use one of the built-in codecs or write your own by implementing the io.deephaven.util.codec.ObjectCodec interface and placing the compiled jar file on both the Enterprise and Core+ classpaths. Deephaven provides codecs for several common types:

| Data Type | Codec class | Description |
|-----------|-------------|-------------|
| byte[] | io.deephaven.enterprise.codec.ByteArrayCodec | Encodes a byte[] |
| char[] | io.deephaven.enterprise.codec.CharArrayCodec | Encodes a char[] |
| short[] | io.deephaven.enterprise.codec.ShortArrayCodec | Encodes a short[] |
| int[] | io.deephaven.enterprise.codec.IntArrayCodec | Encodes an int[] |
| long[] | io.deephaven.enterprise.codec.LongArrayCodec | Encodes a long[] |
| float[] | io.deephaven.enterprise.codec.FloatArrayCodec | Encodes a float[] |
| double[] | io.deephaven.enterprise.codec.DoubleArrayCodec | Encodes a double[] |
| String[] | io.deephaven.enterprise.codec.StringArrayCodec | Encodes a String[] |

To use codecs, the Schema for your table must define the codec used for each column. For example, the following schema defines a table Example.Data that contains a plain integer column, two array columns, and a custom object column.
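A sketch of such a schema is shown below. The column names, the custom type com.example.CustomObject, and its codec com.example.CustomObjectCodec are illustrative; the array codecs come from the table above. Check your deployment's schema reference for the exact attribute names.

```xml
<Table name="Data" namespace="Example" storageType="NestedPartitionedOnDisk">
  <Partitions keyFormula="${autobalance_single}" />
  <Column name="Date" dataType="String" columnType="Partitioning" />
  <Column name="PlainInt" dataType="int" />
  <Column name="IntArray" dataType="int[]"
          objectCodec="io.deephaven.enterprise.codec.IntArrayCodec" />
  <Column name="DoubleArray" dataType="double[]"
          objectCodec="io.deephaven.enterprise.codec.DoubleArrayCodec" />
  <Column name="Custom" dataType="com.example.CustomObject"
          objectCodec="com.example.CustomObjectCodec" />
</Table>
```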

The code below can be used to write to the system table.
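A minimal Python sketch follows. It assumes the logger resolves the codecs declared in the schema and that the column-partition argument is named columnPartition; consult the Pydoc for the exact signature. The custom object column is omitted for brevity.

```python
from deephaven import empty_table
from deephaven_enterprise import system_table_logger as stl

# Build rows shaped like the Example.Data schema sketched above.
data = empty_table(5).update([
    "PlainInt = i",
    "IntArray = new int[]{i, i + 1}",
    "DoubleArray = new double[]{i * 0.5, i * 1.5}",
])

# Log to the Example.Data system table via the Log Aggregator Service
# (the default behavior). Passing None for the column partition writes
# to the current date.
stl.log_table("Example", "Data", data, columnPartition=None)
```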

Note

See the Javadoc for complete options.

The Python version does not use an options object; instead, it takes named arguments. If you specify None for the column partition, the current date is used. The system_table_logger module provides methods for creating codecs for commonly used types, and custom codecs can be set using the system_table_logger.codec() method. See the Pydoc for more details.

Similarly, if you call log_table_incremental from Python, you must close the returned object (or use it as a context manager in a with statement).
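
For instance, a sketch of incremental logging of a ticking table, assuming log_table_incremental takes the same leading arguments as log_table (check the Pydoc before relying on this):

```python
import time

from deephaven import time_table
from deephaven_enterprise import system_table_logger as stl

# A ticking table whose new rows should be logged as they arrive.
ticking = time_table("PT1S").update(["PlainInt = (int) i"])

# log_table_incremental keeps logging while the returned object is
# open; using it as a context manager guarantees it is closed when done.
with stl.log_table_incremental("Example", "Data", ticking, columnPartition=None):
    time.sleep(60)  # keep logging for one minute (illustrative)
```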