Splayed tables
Deephaven provides a proprietary file format for storing tables on disk, used for both persistent and temporary storage. This columnar format is designed for efficient storage and access, and it is optimized for the operations common in data analysis. Tables stored in this format are called splayed tables. A splayed table can hold the result of a simple query or serve as a leaf node of a partitioned table.
Files
A splayed table is stored as a directory containing multiple files. The files in the directory include the following types:
- .tbl files: Store table metadata, including the storage layout (e.g., splayed, partitioned) and details about each column, such as order, name, data type, and any special functions.
- .dat files: Contain column data stored sequentially, prefixed by a serialized Java object containing metadata.
- .ovr files: Hold overflow metadata for .dat files of the same name.
- .bytes files: Store BLOBs (Binary Large OBjects) referenced by offset and length from their associated .dat files.
- .sym files: Provide a table of strings referenced by index from their associated .dat files.
- .sym.bytes files: Store strings referenced by offset and length from their associated .sym files.
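To make the file roles concrete, the following sketch groups the files of a splayed-table directory by type. The directory listing and the column names (Price, Sym, Blob) are hypothetical, and the grouping uses only the file-name suffixes listed above; it does not open or validate any file. Because a name such as Sym.sym.bytes also ends in .bytes, the suffixes are checked longest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplayedFiles {
    // Suffixes from the file-type list, ordered so ".sym.bytes" is tested before ".bytes".
    private static final List<String> SUFFIXES =
            List.of(".sym.bytes", ".bytes", ".sym", ".tbl", ".dat", ".ovr");

    /** Returns the matching suffix (for example ".dat"), or null if none matches. */
    static String classify(String fileName) {
        for (String suffix : SUFFIXES) {
            if (fileName.endsWith(suffix)) {
                return suffix;
            }
        }
        return null;
    }

    /** Groups file names by suffix; unrecognized names are skipped. */
    static Map<String, List<String>> group(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            String suffix = classify(name);
            if (suffix != null) {
                groups.computeIfAbsent(suffix, k -> new ArrayList<>()).add(name);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical listing of one splayed-table directory.
        List<String> listing = List.of(
                "table.tbl", "Price.dat", "Price.ovr",
                "Sym.dat", "Sym.sym", "Sym.sym.bytes",
                "Blob.dat", "Blob.bytes");
        group(listing).forEach((suffix, names) -> System.out.println(suffix + " -> " + names));
    }
}
```

In this hypothetical listing, a Symbol column named Sym is spread across Sym.dat, Sym.sym, and Sym.sym.bytes, while a BLOB column named Blob uses Blob.dat and Blob.bytes.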
Column files
Deephaven column files can store persistent data for all supported types.
- Java primitive types and their boxed representations: Stored directly in the data region of .dat files, one fixed-width value per row. Distinguished values (see the QueryConstants class) represent null, negative infinity, and positive infinity when appropriate.
  - Boolean: 1 byte per row, with values in {-1 (null), 0 (false), 1 (true)}.
  - byte: 1 byte per row.
  - char: 2 bytes per row.
  - double: 8 bytes per row.
  - float: 4 bytes per row.
  - int: 4 bytes per row.
  - long: 8 bytes per row.
  - short: 2 bytes per row.
- java.time.Instant: A nanosecond-resolution UTC timestamp stored as a column of longs, using 8 bytes per row in the associated .dat file. Supports dates from 09/21/1677 to 04/11/2262, the range of a signed 64-bit count of nanoseconds since the Unix epoch.
- Symbol: Stores String data with an associated lookup table capable of representing 2^31-1 (approximately 2 billion) unique values. Optimized for columns with a small number of unique values. Each row consumes 4 bytes of storage in the .dat file, and each unique non-null value consumes 8 bytes of storage in the .sym file plus a variable-length record in the .sym.bytes file.
- SymbolSet: Allows efficient storage of StringSets drawn from a universe of up to 64 symbol values. Uses the same symbol lookup table format as Symbol columns and 8 bytes per row in the .dat file.
- Serializable or Externalizable Java classes: Deephaven can store columns of any Serializable or Externalizable Java class as BLOBs. BLOBs consume 8 bytes of storage per row in the .dat file, plus a variable-length record in the associated .bytes file.
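Because primitive columns are fixed width, the size of a column's data region follows directly from its row count. The sketch below records the per-row widths listed above and the three-valued Boolean byte encoding. It is a plain-Java illustration, not Deephaven code, and it ignores the serialized metadata object that prefixes each .dat file.

```java
import java.util.Map;

public class FixedWidthColumns {
    // Bytes per row in the .dat data region, as listed above.
    static final Map<String, Integer> BYTES_PER_ROW = Map.of(
            "Boolean", 1, "byte", 1, "char", 2, "short", 2,
            "int", 4, "float", 4, "long", 8, "double", 8);

    /** Size of the data region only; the metadata prefix is not included. */
    static long dataRegionBytes(String type, long rowCount) {
        Integer width = BYTES_PER_ROW.get(type);
        if (width == null) {
            throw new IllegalArgumentException("not a fixed-width type: " + type);
        }
        return Math.multiplyExact(rowCount, (long) width);
    }

    /** Boolean encoding from the list above: null -> -1, false -> 0, true -> 1. */
    static byte encodeBoolean(Boolean value) {
        if (value == null) {
            return -1;
        }
        return value ? (byte) 1 : (byte) 0;
    }

    /** Inverse of encodeBoolean; rejects any other byte value. */
    static Boolean decodeBoolean(byte b) {
        switch (b) {
            case -1: return null;
            case 0: return false;
            case 1: return true;
            default: throw new IllegalArgumentException("invalid Boolean byte: " + b);
        }
    }

    public static void main(String[] args) {
        // A 10-million-row double column needs 80,000,000 bytes of column data.
        System.out.println(dataRegionBytes("double", 10_000_000L));
        System.out.println(decodeBoolean(encodeBoolean(null)));
    }
}
```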
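The Instant date range follows from storing each timestamp as a signed 64-bit count of nanoseconds. The sketch below assumes the count is measured from the Unix epoch and that Long.MIN_VALUE is reserved as the null value (the convention of QueryConstants.NULL_LONG). It decodes stored longs with the standard java.time API and encodes an Instant back to nanoseconds, throwing if the Instant falls outside the representable range.

```java
import java.time.Instant;

public class EpochNanos {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;
    // Assumption: Long.MIN_VALUE is reserved as the null marker.
    static final long NULL_LONG = Long.MIN_VALUE;

    /** Decodes a stored long to an Instant; returns null for the null marker. */
    static Instant decode(long epochNanos) {
        if (epochNanos == NULL_LONG) {
            return null;
        }
        // ofEpochSecond normalizes any nanosecond adjustment, including negative values.
        return Instant.ofEpochSecond(0, epochNanos);
    }

    /** Encodes an Instant as nanoseconds since the epoch; throws if it does not fit. */
    static long encode(Instant instant) {
        long seconds = instant.getEpochSecond();
        long nanos = instant.getNano();
        // For negative seconds, borrow one second so the multiplication does not
        // overflow for instants that are still representable.
        if (seconds < 0 && nanos > 0) {
            seconds += 1;
            nanos -= NANOS_PER_SECOND;
        }
        long result = Math.addExact(Math.multiplyExact(seconds, NANOS_PER_SECOND), nanos);
        if (result == NULL_LONG) {
            throw new ArithmeticException("collides with the null marker: " + instant);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("earliest: " + decode(Long.MIN_VALUE + 1));
        System.out.println("latest:   " + decode(Long.MAX_VALUE));
    }
}
```

Running main prints the earliest and latest representable instants under these assumptions, 1677-09-21T00:12:43.145224193Z and 2262-04-11T23:47:16.854775807Z.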
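A Symbol column is a form of dictionary encoding: each distinct string is stored once in the lookup table, and each row stores only a 4-byte index. The sketch below models that idea in plain Java, together with a SymbolSet encoding. Several details are assumptions, not the documented on-disk layout: the in-memory SymbolTable class and the NULL_INDEX marker are hypothetical, and representing a SymbolSet row as a 64-bit bitmask over table indices is inferred from the 64-value limit and the 8-byte row width rather than stated above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SymbolColumns {
    // Hypothetical null marker for Symbol rows (assumption, by analogy with QueryConstants.NULL_INT).
    static final int NULL_INDEX = Integer.MIN_VALUE;
    // SymbolSet universe limit from the text.
    static final int MAX_SET_UNIVERSE = 64;

    /** Hypothetical in-memory stand-in for one column's lookup table (.sym plus .sym.bytes). */
    static final class SymbolTable {
        final List<String> values = new ArrayList<>();
        final Map<String, Integer> indexOf = new HashMap<>();

        /** Returns the index of s, appending it to the table on first use. */
        int intern(String s) {
            Integer existing = indexOf.get(s);
            if (existing != null) {
                return existing;
            }
            values.add(s);
            indexOf.put(s, values.size() - 1);
            return values.size() - 1;
        }
    }

    /** Symbol column: each row becomes a 4-byte int index into the table. */
    static int[] encodeSymbols(List<String> rows, SymbolTable table) {
        int[] indices = new int[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            String s = rows.get(i);
            indices[i] = (s == null) ? NULL_INDEX : table.intern(s);
        }
        return indices;
    }

    /** Assumed SymbolSet layout: bit i is set if the set contains table entry i. */
    static long encodeSet(Set<String> row, SymbolTable table) {
        long mask = 0L;
        for (String s : row) {
            if (!table.indexOf.containsKey(s) && table.values.size() >= MAX_SET_UNIVERSE) {
                throw new IllegalStateException("SymbolSet universe exceeds 64 values");
            }
            mask |= 1L << table.intern(s);
        }
        return mask;
    }

    /** Inverse of encodeSet. */
    static Set<String> decodeSet(long mask, SymbolTable table) {
        Set<String> result = new TreeSet<>();
        for (int i = 0; i < table.values.size() && i < MAX_SET_UNIVERSE; i++) {
            if ((mask & (1L << i)) != 0) {
                result.add(table.values.get(i));
            }
        }
        return result;
    }

    /** Fixed costs from the text: 4 bytes per row in .dat, 8 bytes per unique value in .sym. */
    static long symbolFixedBytes(long rows, long uniqueNonNull) {
        // The variable-length .sym.bytes records are not counted here.
        return 4 * rows + 8 * uniqueNonNull;
    }

    public static void main(String[] args) {
        SymbolTable tickers = new SymbolTable();
        int[] rows = encodeSymbols(Arrays.asList("AAPL", "MSFT", null, "AAPL"), tickers);
        System.out.println(Arrays.toString(rows) + " " + tickers.values);

        SymbolTable colors = new SymbolTable();
        long mask = encodeSet(Set.of("red", "blue"), colors);
        System.out.println(decodeSet(mask, colors));
    }
}
```

The per-row cost of a Symbol column stays at 4 bytes regardless of string length, so repeated values are cheap. Each additional unique value adds an 8-byte .sym entry and a .sym.bytes record, which is why Symbol columns suit data with a small number of unique values.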