Splayed tables
Deephaven provides a proprietary file format for storing tables on disk, which is used for both persistent and temporary storage. This columnar format is designed to be efficient for both storage and access and is optimized for the types of operations common in data analysis. These splayed tables can store the result of a simple query or act as the leaf nodes of a partitioned table.
Files
A splayed table is stored as a directory containing multiple files. The files in the directory include the following types:
- .tbl files: Store table metadata, including storage layout (e.g., splayed, partitioned), and details about each column such as order, name, data type, and any special functions.
- .dat files: Contain column data stored sequentially, prefixed by a serialized Java object containing metadata.
- .ovr files: Hold overflow metadata for .dat files of the same name.
- .bytes files: Store BLOBs (Binary Large OBjects) referenced by offset and length from their associated .dat files.
- .sym files: Provide a table of strings referenced by index from their associated .dat files.
- .sym.bytes files: Store strings referenced by offset and length from their associated .sym files.
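As a reading aid for the file types above, the following sketch groups the files of a splayed table directory by role. The function name `classify_splayed_files`, the role labels, and any file names passed to it are hypothetical; none of this is part of the Deephaven API.

```python
from pathlib import PurePath

# Longest suffix first, so "X.sym.bytes" is not mistaken for a ".bytes" BLOB file.
# The role labels are illustrative wording, not Deephaven terminology.
SUFFIX_ROLES = [
    (".sym.bytes", "symbol string data"),
    (".sym", "symbol lookup table"),
    (".bytes", "BLOB data"),
    (".ovr", "overflow metadata"),
    (".dat", "column data"),
    (".tbl", "table metadata"),
]


def classify_splayed_files(names):
    """Map each file name to a role based on its extension."""
    roles = {}
    for name in names:
        base = PurePath(name).name
        # Pick the first (longest) suffix that matches, or fall back to "unknown".
        roles[name] = next(
            (role for suffix, role in SUFFIX_ROLES if base.endswith(suffix)),
            "unknown",
        )
    return roles
```

Checking the longest suffix first matters because a .sym.bytes file also ends in .bytes; a naive extension check would misclassify symbol string data as BLOB data.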
Column files
Deephaven column files can store persistent data for all supported types:
- Java primitive types and their boxed representations: Stored directly in the data region of .dat files, one fixed-width value per row. Distinguished values (see the QueryConstants class) represent null, negative infinity, and positive infinity when appropriate.
  - Boolean: 1 byte per row, with values in {-1 (null), 0 (false), 1 (true)}.
  - byte: 1 byte per row.
  - char: 2 bytes per row.
  - double: 8 bytes per row.
  - float: 4 bytes per row.
  - int: 4 bytes per row.
  - long: 8 bytes per row.
  - short: 2 bytes per row.
- java.time.Instant: A nanosecond-resolution UTC timestamp stored as a column of longs. Uses 8 bytes per row in the associated .dat file. Supports dates from 09/25/1677 to 04/11/2262.
- Symbol: Stores String data with an associated lookup table capable of representing 2^31-1 (approximately 2 billion) unique values. Optimized for a small number of unique values. Each row consumes 4 bytes of storage in the .dat file, and each unique non-null value consumes 8 bytes of storage in the .sym file and a variable-length record in the .sym.bytes file.
- SymbolSet: Allows efficient storage of StringSets from a universe of up to 64 symbol values. Uses the same symbol lookup table format as Symbol columns, and uses 8 bytes per row in the .dat file.
- Serializable or Externalizable Java classes: Deephaven can store columns of any Serializable or Externalizable Java class as BLOBs. BLOBs consume 8 bytes of storage per row in the .dat file, and a variable-length record in the associated .bytes file.
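The per-row sizes listed above lend themselves to quick storage estimates. The sketch below is a minimal illustration, not Deephaven code: the helper names are hypothetical, the estimates cover only the fixed-width portions (they exclude the serialized metadata prefix of each .dat file and the variable-length records in .sym.bytes and .bytes files), and the Boolean encoder only demonstrates the -1/0/1 byte values rather than the full on-disk layout.

```python
import struct

# Bytes per row in the .dat file, copied from the list above.
DAT_BYTES_PER_ROW = {
    "Boolean": 1, "byte": 1, "char": 2, "short": 2,
    "int": 4, "float": 4, "long": 8, "double": 8,
    "Instant": 8, "Symbol": 4, "SymbolSet": 8, "BLOB": 8,
}


def dat_data_bytes(column_type, rows):
    """Size of the fixed-width data region, excluding the metadata prefix."""
    return DAT_BYTES_PER_ROW[column_type] * rows


def symbol_fixed_bytes(rows, unique_non_null):
    """Fixed-width storage for a Symbol column: 4 bytes per row in .dat,
    8 bytes per unique non-null value in .sym. The .sym.bytes records are
    variable length and are not estimated here."""
    return {"dat": 4 * rows, "sym": 8 * unique_non_null}


def encode_boolean_column(values):
    """Encode True/False/None as one signed byte each: 1, 0, -1 (null)."""
    codes = [-1 if v is None else int(v) for v in values]
    return struct.pack(f"{len(codes)}b", *codes)
```

For example, under these assumptions a double column with one million rows needs 8,000,000 bytes of data region, while a Symbol column with the same row count and 10 distinct values needs 4,000,000 bytes in its .dat file plus 80 bytes in its .sym file, which is why Symbol columns suit data with few unique values.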