Splayed tables

Deephaven provides a proprietary file format for storing tables on disk, which is used for both persistent and temporary storage. This columnar format is designed to be efficient for both storage and access and is optimized for the types of operations common in data analysis. These splayed tables can store the result of a simple query or act as the leaf nodes of a partitioned table.

Files

A splayed table is stored as a directory containing multiple files. The files in the directory include the following types:

  • .tbl files: Store table metadata, including storage layout (e.g., splayed, partitioned), and details about each column such as order, name, data type, and any special functions.
  • .dat files: Contain column data stored sequentially, prefixed by a serialized Java object containing metadata.
  • .ovr files: Hold overflow metadata for .dat files of the same name.
  • .bytes files: Store BLOBs (Binary Large OBjects) referenced by offset and length from their associated .dat files.
  • .sym files: Provide a table of strings referenced by index from their associated .dat files.
  • .sym.bytes files: Store strings referenced by offset and length from their associated .sym files.
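
As a rough illustration of the file roles listed above, the following Java sketch labels each file in a directory by its extension. The class name SplayedFileKinds, the classify helper, and the label strings are hypothetical and are not part of any Deephaven API; the sketch only assumes the extensions described in this section.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not a Deephaven class: maps a file name to the role
// described in the list above, based only on its extension.
final class SplayedFileKinds {

    static String classify(String fileName) {
        // ".sym.bytes" must be tested before ".bytes", because every
        // ".sym.bytes" name also ends with ".bytes".
        if (fileName.endsWith(".sym.bytes")) return "symbol string bytes";
        if (fileName.endsWith(".tbl")) return "table metadata";
        if (fileName.endsWith(".dat")) return "column data";
        if (fileName.endsWith(".ovr")) return "overflow metadata";
        if (fileName.endsWith(".bytes")) return "BLOB bytes";
        if (fileName.endsWith(".sym")) return "symbol table";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                System.out.println(name + " -> " + classify(name));
            }
        }
    }
}
```

The order of the checks matters only for the .sym.bytes and .bytes pair; the other extensions do not overlap.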

Column files

Deephaven column files can store persistent data for all supported types:

  • Java primitive types and their boxed representations: Stored directly in the data region of .dat files, one fixed-width value per row. Distinguished values (see the QueryConstants class) represent null, negative infinity, and positive infinity when appropriate.
    • Boolean: 1 byte per row, with values in {-1 (null), 0 (false), 1 (true)}.
    • byte: 1 byte per row.
    • char: 2 bytes per row.
    • double: 8 bytes per row.
    • float: 4 bytes per row.
    • int: 4 bytes per row.
    • long: 8 bytes per row.
    • short: 2 bytes per row.
  • java.time.Instant: A nanosecond-resolution UTC timestamp stored as a column of longs. Uses 8 bytes per row in the associated .dat file. Supports dates from 1677-09-25 to 2262-04-11.
  • Symbol: Stores String data with an associated lookup table capable of representing 2^31-1 (approximately 2 billion) unique values. Optimized for a small number of unique values. Each row consumes 4 bytes of storage in the .dat file, and each unique non-null value consumes 8 bytes of storage in the .sym file and a variable-length record in the .sym.bytes file.
  • SymbolSet: Allows efficient storage of StringSets from a universe of up to 64 symbol values. Uses the same symbol lookup table format as Symbol columns, and uses 8 bytes per row in the .dat file.
  • Serializable or Externalizable Java classes: Stored as BLOBs, so a column can hold instances of any Serializable or Externalizable Java class. Each row consumes 8 bytes of storage in the .dat file and a variable-length record in the associated .bytes file.
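
The per-row sizes above lend themselves to quick back-of-the-envelope estimates. The following Java sketch encodes the widths from the list and the Boolean byte encoding. The class name ColumnStorageEstimate, its method names, and the type labels (including "Blob") are hypothetical and are not Deephaven APIs. The estimates cover only the fixed-width data, and exclude the serialized metadata prefix of each .dat file and the variable-length records in .bytes and .sym.bytes files.

```java
// Hypothetical helper, not a Deephaven class: storage arithmetic based only
// on the per-row widths listed in this section.
final class ColumnStorageEstimate {

    // Fixed number of bytes each row occupies in the column's .dat file.
    static int bytesPerRow(String type) {
        switch (type) {
            case "Boolean":
            case "byte":
                return 1;
            case "char":
            case "short":
                return 2;
            case "float":
            case "int":
            case "Symbol":
                return 4;
            case "double":
            case "long":
            case "Instant":
            case "SymbolSet":
            case "Blob":
                return 8;
            default:
                throw new IllegalArgumentException("Unknown column type: " + type);
        }
    }

    // Size of the data region of a .dat file, excluding its metadata prefix.
    static long datRegionBytes(String type, long rows) {
        return Math.multiplyExact(rows, (long) bytesPerRow(type));
    }

    // Size of the .sym entries: 8 bytes per unique non-null symbol value.
    static long symFileBytes(long uniqueNonNullValues) {
        return Math.multiplyExact(uniqueNonNullValues, 8L);
    }

    // Boolean columns use one byte per row: -1 (null), 0 (false), 1 (true).
    static byte encodeBoolean(Boolean value) {
        if (value == null) {
            return (byte) -1;
        }
        return value ? (byte) 1 : (byte) 0;
    }

    static Boolean decodeBoolean(byte stored) {
        switch (stored) {
            case -1:
                return null;
            case 0:
                return Boolean.FALSE;
            case 1:
                return Boolean.TRUE;
            default:
                throw new IllegalArgumentException("Invalid Boolean byte: " + stored);
        }
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println("Symbol .dat data bytes: " + datRegionBytes("Symbol", rows));
        System.out.println("Symbol .sym bytes for 50 unique values: " + symFileBytes(50));
        System.out.println("double .dat data bytes: " + datRegionBytes("double", rows));
    }
}
```

For example, under these assumptions a Symbol column with 1,000,000 rows and 50 unique non-null values needs 4,000,000 bytes of row data in its .dat file and 400 bytes in its .sym file, plus the variable-length string records in the .sym.bytes file.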