Schema overview

All Deephaven tables stored in the database (e.g., those that can be read with db.live_table in a query) have a schema that defines the table's namespace, name, column names, and data types. In addition to specifying the structure of the data, a schema can also include:

Columns

Deephaven schemas define the names and data types for each column in a table. Below are some columns from the DbInternal.AuditEventLog table:

<Column name="Date" dataType="String" columnType="Partitioning" />
<Column name="Timestamp" dataType="DateTime" />
<Column name="ClientHost" dataType="String" />
<Column name="ClientPort" dataType="int" />
<Column name="Details" dataType="String" symbolTable="None" encoding="UTF_8" />

The Column element can also specify how the data is stored on disk. For example, the DbInternal.AuditEventLog table is partitioned on Date. See the full list of available column attributes in the table and schemas concept guide.
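As a rough illustration of how the Column attributes above can be read programmatically, the following sketch parses the excerpt with Python's standard xml.etree.ElementTree module and reports each column's data type and the partitioning column. The `<Columns>` wrapper element and the `summarize_columns` helper are assumptions made for this example only; they are not part of the Deephaven schema format or API.

```python
import xml.etree.ElementTree as ET

# Column elements copied from the excerpt above. The <Columns> wrapper is NOT
# part of a Deephaven schema; it is added only so the fragment has a single
# root element and can be parsed on its own.
COLUMNS_XML = """
<Columns>
  <Column name="Date" dataType="String" columnType="Partitioning" />
  <Column name="Timestamp" dataType="DateTime" />
  <Column name="ClientHost" dataType="String" />
  <Column name="ClientPort" dataType="int" />
  <Column name="Details" dataType="String" symbolTable="None" encoding="UTF_8" />
</Columns>
"""


def summarize_columns(xml_text):
    """Hypothetical helper: return (name -> dataType dict, partitioning column names)."""
    root = ET.fromstring(xml_text)
    data_types = {}
    partitioning = []
    for column in root.iter("Column"):
        name = column.get("name")
        data_types[name] = column.get("dataType")
        # Columns without a columnType attribute return None here and are skipped.
        if column.get("columnType") == "Partitioning":
            partitioning.append(name)
    return data_types, partitioning


data_types, partitioning = summarize_columns(COLUMNS_XML)
for name, data_type in data_types.items():
    print(f"{name}: {data_type}")
print("Partitioning columns:", partitioning)
```

A check like this can be handy for confirming which column a table is partitioned on before writing a query, though the schema itself remains the authoritative source.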

Data types

A column's data type can generally be any Java type, including primitive types, arrays of primitive types, and Strings. Column codecs provide custom serialization logic for complex data types. See dataType for more information.

Historical data

There are two main categories of data storage in Deephaven: intraday and historical. Some historical storage options can be configured in the schema.

Merge attributes

Intraday data can be merged to historical storage in either Deephaven or Apache Parquet format. When merging data to Parquet, you can choose a default compression codec by adding a MergeAttributes element that names a Parquet-supported codec.

Extended layouts

Extended layouts support complex Parquet layouts created by other tools, such as Apache Hadoop. Extended layouts also allow you to use multiple partitioning columns.

Data ingestion

Schemas can be extended with metadata to control how data is ingested into Deephaven. This includes:

Managing schema files

Schemas are stored in etcd and can be imported to or exported from Deephaven using dhconfig schemas. Special care must be taken when updating a schema during the ingestion window.

Schema inference

Deephaven provides tools that make writing new schemas easier by automatically inferring the schema from the data source. Schema inference is available for the following data sources:

CopyTable schemas

One table layout may be used for multiple system tables. When this is required, it is not necessary to replicate the entire source schema definition for each new table. See CopyTable for more information.