Schema overview
All Deephaven tables stored in the database (for example, tables that can be read through db.live_table in a query) have a schema that defines the table's namespace, name, column names, and data types. In addition to specifying the structure of the data, a schema can also include the following (a sketch of a complete schema appears after this list):
- Directives controlling how data is imported and stored, such as encoding formats for String columns and codecs for custom serialization of complex data types.
- Metadata for data ingestion, such as custom DateTime converters.
- Data validation rules for ensuring data quality during a merge.
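To make these pieces concrete, below is a minimal sketch of a schema for a hypothetical ExampleNamespace.Orders table. The namespace, table name, columns, and partition keyFormula are invented for illustration; the elements and attributes they use are described in the sections that follow.
<Table name="Orders" namespace="ExampleNamespace" storageType="NestedPartitionedOnDisk">
  <Partitions keyFormula="${autobalance_single}" />
  <Column name="Date" dataType="String" columnType="Partitioning" />
  <Column name="Timestamp" dataType="DateTime" />
  <Column name="Symbol" dataType="String" columnType="Grouping" />
  <Column name="Price" dataType="double" />
  <Column name="Size" dataType="int" />
</Table>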
Columns
Deephaven schemas define the names and data types for each column in a table. Below are some columns from the DbInternal.AuditEventLog table:
<Column name="Date" dataType="String" columnType="Partitioning" />
<Column name="Timestamp" dataType="DateTime" />
<Column name="ClientHost" dataType="String" />
<Column name="ClientPort" dataType="int" />
<Column name="Details" dataType="String" symbolTable="None" encoding="UTF_8" />
The Column element can also specify how the data is stored on disk. For example, the DbInternal.AuditEventLog table is partitioned on Date. See the full list of available column attributes in the table and schemas concept guide.
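Another storage-related attribute is columnType="Grouping", which marks a column as a grouping column so that merged data is grouped by its values on disk. The column name in this sketch is hypothetical:
<Column name="EventType" dataType="String" columnType="Grouping" />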
Data types
Data types can generally be Java primitive types, arrays of primitive types, Strings, or other Java classes. Column codecs provide custom serialization logic for complex data types. See dataType for more information.
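For instance, a complex type can be stored with a custom codec via the objectCodec attribute. This is a sketch; the column name, codec class, and arguments below are illustrative and should be replaced with a codec appropriate for your data type:
<Column name="Value" dataType="java.math.BigDecimal" objectCodec="io.deephaven.util.codec.BigDecimalCodec" objectCodecArguments="20,10" />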
Historical data
There are two main categories of data storage in Deephaven: intraday and historical. Some historical storage options can be configured in the schema.
Merge attributes
Intraday data can be merged to historical storage in Deephaven or Apache Parquet formats. When merging data to Parquet, a default compression codec can be chosen by adding a MergeAttributes element with an appropriate Parquet-supported codec.
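For example, a schema could request Parquet storage with SNAPPY compression for merged data. This is a sketch; check the merge attributes documentation for the exact element and supported codec names:
<MergeAttributes format="Parquet" codec="SNAPPY" />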
Extended layouts
Extended layouts are available for users with complex Parquet layouts created by other tools, such as Apache Hadoop. They also allow you to use multiple partitioning columns.
Data ingestion
Schemas can be extended with metadata to control how data is ingested into Deephaven (see the sketch after this list). This includes:
- DateTime converters for parsing date strings
- Custom field writers for importing data from CSV, JSON, JDBC, and XML files
- Data validation rules for ensuring data quality during a merge
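As an illustration, ingestion metadata is commonly expressed in an ImportSource block inside the Table element. The sketch below is for a hypothetical CSV import; the source column names and the conversion formula are invented for the example:
<ImportSource name="CsvImport" type="CSV">
  <!-- Convert a String timestamp from the source file into the DateTime Timestamp column -->
  <ImportColumn name="Timestamp" sourceName="event_time" sourceType="String" formula="DBTimeUtils.convertDateTime(event_time)" />
  <!-- Map a differently named source column onto the schema's column name -->
  <ImportColumn name="ClientPort" sourceName="client_port" sourceType="int" />
</ImportSource>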
Managing schema files
Schemas are stored in etcd and can be imported to or exported from Deephaven using dhconfig schemas. Special care must be taken when updating a schema during the ingestion window.
Schema inference
Deephaven provides tools that make writing new schemas easier by automatically inferring the schema from the data source. Schema inference is available for data sources such as CSV, JSON, JDBC, and XML.
CopyTable schemas
One table layout may be used for multiple system tables. When this is required, it is not necessary to replicate the entire source schema definition for each new table. See CopyTable for more information.
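As a sketch, a CopyTable definition points a new table at an existing schema rather than repeating it. The namespaces and table names here are hypothetical; confirm the attribute names against the CopyTable documentation:
<CopyTable namespace="ExampleNamespace" sourceNamespace="ExampleNamespace" name="OrdersBackup" sourceName="Orders" />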