Iceberg

Iceberg tables can be referenced from Deephaven Schemas using the Extended Storage feature. This lets Deephaven read existing Iceberg tables as Deephaven tables right away, including support for Iceberg schema evolution and nested Iceberg structures.

Configuration

The easiest way to configure Iceberg tables for Deephaven is to use the built-in inference provided by LoadTableOptions. Advanced users can customize the mapping with a Resolver, or with custom inference options.

An Iceberg catalog requires either a type or a catalog-impl key. Additional parameters may be necessary depending on the type of catalog; commonly configured properties include warehouse and uri. See Iceberg Catalog properties and your specific catalog implementation for more details on configuration. An Iceberg table identifier is also required.
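As a minimal sketch of the requirement above, the following Python helper checks a catalog property map before it is handed to any catalog-building code. The function name check_catalog_properties and the example values (the URI and warehouse location) are hypothetical and are not part of the Deephaven or Iceberg APIs.

```python
def check_catalog_properties(properties: dict) -> list:
    """Return a list of problems found in a catalog property map (empty if none)."""
    problems = []
    # Per the text above, a "type" or "catalog-impl" key is required.
    if "type" not in properties and "catalog-impl" not in properties:
        problems.append("missing required key: 'type' or 'catalog-impl'")
    # Catalog properties are string-to-string, so flag any non-string value.
    for key, value in properties.items():
        if not isinstance(value, str):
            problems.append(f"value for {key!r} is not a string")
    return problems


# Hypothetical example values, for illustration only.
example_properties = {
    "type": "rest",
    "uri": "https://catalog.example.com",
    "warehouse": "s3://example-bucket/warehouse",
}
example_problems = check_catalog_properties(example_properties)
```

A check like this only catches the structural requirement. Whether warehouse, uri, or other keys are needed still depends on the catalog implementation in use.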

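If you have a working Spark configuration, it can typically be translated into the necessary catalog properties by removing the Spark prefix from each key. For Iceberg catalogs configured in Spark, that prefix usually has the form spark.sql.catalog.<catalog-name> followed by a dot.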

For example, the following Spark properties:

Translate into the given BuildCatalogOptions:

See BuildCatalogOptions and LoadTableOptions for more details on these structures.
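The prefix-stripping translation described above can be sketched in a few lines of Python. This is an illustration only: the helper name spark_to_catalog_properties, the catalog name mycatalog as used here, and the property values are assumptions, and the resulting plain dictionary stands in for the properties you would pass to BuildCatalogOptions rather than being that structure itself.

```python
def spark_to_catalog_properties(spark_conf: dict, catalog_name: str) -> dict:
    """Keep only the keys under spark.sql.catalog.<catalog_name>. and strip that prefix."""
    prefix = f"spark.sql.catalog.{catalog_name}."
    return {
        key[len(prefix):]: value
        for key, value in spark_conf.items()
        if key.startswith(prefix)
    }


# Hypothetical Spark configuration, for illustration only.
spark_conf = {
    # Names the Spark catalog class; it has no trailing dot, so it is dropped.
    "spark.sql.catalog.mycatalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.mycatalog.type": "rest",
    "spark.sql.catalog.mycatalog.uri": "https://catalog.example.com",
    # Unrelated Spark setting; it does not match the prefix, so it is dropped.
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
}
catalog_properties = spark_to_catalog_properties(spark_conf, "mycatalog")
```

Matching on the prefix with its trailing dot keeps a catalog named my from accidentally picking up keys that belong to mycatalog.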

Deployment

A user who is a member of the iris-schemamanagers group is required to deploy the Schema.

Serialization format

Caution

The creation and deployment of a Deephaven Iceberg schema is typically performed programmatically, as shown in the previous sections. Exercise caution when manually creating or editing a schema.

An Iceberg table is referenced in a Deephaven table's schema using an ExtendedStorage element with the attribute type set to iceberg.

Catalog element

The Catalog element is a serialization of the core BuildCatalogOptions. It is composed of a Name element, a Properties element, and an optional HadoopConfig element. The Properties element is a map of string keys to string values. The HadoopConfig element, when present, is an additional map for Hadoop catalogs. For example:

The injection attribute on the Properties element controls whether Deephaven may automatically add properties that work around known upstream issues and/or supply defaults needed for Deephaven's Iceberg usage. The valid values are enabled and disabled. It is recommended to set this to enabled.
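To make the shape of the Catalog element more concrete, here is a rough Python sketch that assembles one with the standard library. The element names Catalog, Name, Properties, and HadoopConfig and the injection attribute come from the description above. How individual map entries are serialized is an assumption: the Property child with name and value attributes is hypothetical, as are the helper name and the example values. Treat the output as an illustration and not as the exact Deephaven serialization.

```python
import xml.etree.ElementTree as ET


def build_catalog_element(name, properties, hadoop_config=None, injection="enabled"):
    """Assemble a Catalog element from a name and string-to-string maps."""
    # The only valid values for the injection attribute are enabled and disabled.
    if injection not in ("enabled", "disabled"):
        raise ValueError(f"invalid injection value: {injection!r}")
    catalog = ET.Element("Catalog")
    ET.SubElement(catalog, "Name").text = name
    props = ET.SubElement(catalog, "Properties", {"injection": injection})
    for key, value in properties.items():
        # ASSUMPTION: this <Property name="..." value="..."/> entry shape is hypothetical.
        ET.SubElement(props, "Property", {"name": key, "value": value})
    # HadoopConfig is optional, so it is only emitted when a map is supplied.
    if hadoop_config:
        hadoop = ET.SubElement(catalog, "HadoopConfig")
        for key, value in hadoop_config.items():
            # ASSUMPTION: same hypothetical entry shape as above.
            ET.SubElement(hadoop, "Property", {"name": key, "value": value})
    return catalog


# Hypothetical catalog name and properties, for illustration only.
catalog = build_catalog_element(
    "example-catalog",
    {"type": "rest", "uri": "https://catalog.example.com"},
)
catalog_xml = ET.tostring(catalog, encoding="unicode")
```

The sketch defaults injection to enabled, in line with the recommendation above.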

Table element

The Table element is a serialization of the core LoadTableOptions. It is composed of a TableIdentifier, Resolver, and NameMapping element.

The Resolver element contains a ColumnInstructions element, a Schema element, and an optional PartitionSpec element. The ColumnInstructions element maps each Deephaven column name to an Iceberg fieldId, to an Iceberg partitionFieldId, or to the type unmapped. The Schema element contains the Iceberg Schema JSON. The optional PartitionSpec element contains the Iceberg Partition Spec JSON.
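One way to picture the column mapping is as a small data model in which every Deephaven column carries exactly one of three instructions. The Python sketch below is such a model, not the Deephaven API: the class name ColumnInstruction, the column names, and the ids are hypothetical, and treating fieldId and partitionFieldId as mutually exclusive is a reading of the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnInstruction:
    """How one Deephaven column relates to the Iceberg table."""
    field_id: Optional[int] = None
    partition_field_id: Optional[int] = None

    def __post_init__(self):
        # ASSUMPTION: a column is mapped by fieldId or by partitionFieldId, never both.
        if self.field_id is not None and self.partition_field_id is not None:
            raise ValueError("set field_id or partition_field_id, not both")

    @property
    def kind(self) -> str:
        """Return which of the three instruction kinds this is."""
        if self.field_id is not None:
            return "fieldId"
        if self.partition_field_id is not None:
            return "partitionFieldId"
        return "unmapped"


# Hypothetical Deephaven column names and Iceberg ids, for illustration only.
column_instructions = {
    "City": ColumnInstruction(field_id=1),
    "Region": ColumnInstruction(partition_field_id=1000),
    "Notes": ColumnInstruction(),
}
kinds = {name: instruction.kind for name, instruction in column_instructions.items()}
```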

The NameMapping element provides fallback field ids to use when a data file does not contain field id information. It supports three types, selected via the type attribute.

The table type reads the Name Mapping from the Iceberg table property schema.name-mapping.default (see https://iceberg.apache.org/spec/#column-projection).

The empty type disables name mapping.

The json type uses Iceberg Name Mapping JSON.
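The three types can be read as a small dispatch, sketched below in Python. The function names, the example column names, and the choice to return None when the table property is absent are assumptions made for illustration. The JSON layout (a list of entries with field-id and names keys) follows the Name Mapping serialization in the Iceberg specification. For brevity the lookup only considers top-level fields and ignores nested fields entries.

```python
import json
from typing import Optional


def resolve_name_mapping(mapping_type: str, table_properties: dict,
                         inline_json: Optional[str] = None):
    """Return the parsed name mapping to use, or None when there is none."""
    if mapping_type == "empty":
        # No name mapping is used at all.
        return None
    if mapping_type == "table":
        # Read the mapping from the Iceberg table property.
        raw = table_properties.get("schema.name-mapping.default")
        # ASSUMPTION: a missing property is treated as "no mapping" in this sketch.
        return json.loads(raw) if raw is not None else None
    if mapping_type == "json":
        if inline_json is None:
            raise ValueError("the json type requires Name Mapping JSON")
        return json.loads(inline_json)
    raise ValueError(f"unknown NameMapping type: {mapping_type!r}")


def fallback_field_id(mapping, column_name: str) -> Optional[int]:
    """Look up the fallback field id for a top-level column name."""
    if mapping is None:
        return None
    for entry in mapping:
        if column_name in entry.get("names", []):
            return entry.get("field-id")
    return None


# Hypothetical Name Mapping JSON, for illustration only.
mapping_json = (
    '[{"field-id": 1, "names": ["city", "city_name"]},'
    ' {"field-id": 2, "names": ["population"]}]'
)
mapping = resolve_name_mapping("json", {}, mapping_json)
city_field_id = fallback_field_id(mapping, "city_name")
```

Because each entry can list several names, a data file written with an older column name such as city_name still resolves to the same field id as city.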

Full example

Let's assume that we have an existing Iceberg Glue Catalog that contains a table mycatalog.cities with the Iceberg Schema:

To create a new Deephaven Schema with namespace DhExample and table name Cities that references this Iceberg Table, we would execute the following once:

This would result in the following Deephaven Schema: