Barrage Schema Annotation

Deephaven tables support Object-typed columns that can hold arbitrary Java objects. When exporting these tables over Arrow Flight using the Barrage format, Deephaven describes the data with an Apache Arrow schema. For an Object-typed column, the default schema may not capture the intended structure of the data, which can lead to inefficient serialization or loss of type information. Setting the Table.BARRAGE_SCHEMA_ATTRIBUTE attribute lets you attach an explicit Arrow schema so that the Flight export uses the wire format you intend.

Use this attribute when your Deephaven column type is too generic for the intended wire type (for example, Object columns that should be exported as Union or Map), or when you want to opt into a wire-level compression such as Run-End Encoding. This guide includes examples of the Union, Map, and RunEndEncoded Arrow types, all of which Deephaven supports.

How It Works

  1. Extract a base schema with BarrageUtil.schemaFromTable(...). This handles the basic type mapping for primitive types and collections of primitives.
  2. Replace the target field with a field that declares the explicit Arrow type, including any child fields that type requires.
  3. Attach the schema using withAttributes(Map.of(Table.BARRAGE_SCHEMA_ATTRIBUTE, newSchema)).
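Step 2 is the only step that needs custom logic, so here is a plain-Java sketch of it. To stay self-contained, the sketch does not call BarrageUtil or Arrow Java: the hypothetical FieldSpec record stands in for an Arrow Field, and SchemaEdit.replaceField is a hypothetical helper. It shows one assumed way to swap a single field while leaving the other fields, and the base schema itself, unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an Arrow Field (name, Arrow type label, nullability, children).
// Real code would use org.apache.arrow.vector.types.pojo.Field instead.
record FieldSpec(String name, String arrowType, boolean nullable, List<FieldSpec> children) {}

final class SchemaEdit {
    private SchemaEdit() {}

    // Step 2: return a new field list in which the field called `name` is swapped for
    // `replacement`. Field order is kept and the base list is never modified.
    static List<FieldSpec> replaceField(List<FieldSpec> base, String name, FieldSpec replacement) {
        // Assumption: the annotated field keeps the table's column name.
        if (!replacement.name().equals(name)) {
            throw new IllegalArgumentException("Replacement must keep the column name " + name);
        }
        List<FieldSpec> result = new ArrayList<>(base.size());
        boolean replaced = false;
        for (FieldSpec field : base) {
            if (field.name().equals(name)) {
                result.add(replacement);
                replaced = true;
            } else {
                result.add(field);
            }
        }
        // Fail loudly on a misspelled column name rather than returning the base fields unchanged.
        if (!replaced) {
            throw new IllegalArgumentException("No field named " + name);
        }
        return List.copyOf(result);
    }
}
```

In real code, the returned fields would form the new Arrow schema that step 3 attaches with withAttributes(Map.of(Table.BARRAGE_SCHEMA_ATTRIBUTE, newSchema)). Throwing on an unknown name is a design choice: without it, a typo would leave the annotation silently unapplied.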

Note

withAttributes(...) returns a new table. If you later transform that table (for example, with select, view, or update), the attributes may not be preserved, and you may need to re-apply the schema. To minimize this risk, apply the schema as late as possible before export.

Example: Annotate Union<String, Double> Columns

The following example creates a table with a column of Object values, limited for this example to String and Double. The Arrow schema annotates the column as a dense union with String and Double branches. The final table can be exported over Flight / Barrage without error.
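To make the dense union wire format concrete, the sketch below lays out a column of String and Double values the way a dense union does. It is not Deephaven's serializer: DenseUnionSketch is hypothetical, and the type ids 0 (String) and 1 (Double) are assumptions standing in for whatever ids the union field declares. Each row gets a type id naming its branch plus an offset into that branch's child array, so each child holds only values of its own type.

```java
import java.util.ArrayList;
import java.util.List;

final class DenseUnionSketch {
    private DenseUnionSketch() {}

    // Assumed type ids, in the order the branches are declared in the union field.
    static final byte STRING_ID = 0;
    static final byte DOUBLE_ID = 1;

    // Buffers of a dense union with one String child and one Double child.
    record Encoded(byte[] typeIds, int[] offsets, List<String> strings, List<Double> doubles) {}

    static Encoded encode(List<?> column) {
        byte[] typeIds = new byte[column.size()];
        int[] offsets = new int[column.size()];
        List<String> strings = new ArrayList<>();
        List<Double> doubles = new ArrayList<>();
        for (int i = 0; i < column.size(); i++) {
            Object value = column.get(i);
            if (value instanceof String s) {
                typeIds[i] = STRING_ID;
                offsets[i] = strings.size(); // position within the String child
                strings.add(s);
            } else if (value instanceof Double d) {
                typeIds[i] = DOUBLE_ID;
                offsets[i] = doubles.size(); // position within the Double child
                doubles.add(d);
            } else {
                // Anything else, including null, is rejected to keep the sketch small.
                throw new IllegalArgumentException("Unsupported value at row " + i + ": " + value);
            }
        }
        return new Encoded(typeIds, offsets, strings, doubles);
    }
}
```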

Example: Annotate Map<String, String> Columns

The following example creates a table with a column of Map<String, String>. The Arrow schema annotates the column as an Arrow Map with the correct key and value types. The final table can be exported over Flight / Barrage without error.
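An Arrow Map is a list of key/value entries, so on the wire a Map column becomes an offsets array plus flattened key and value children. The hypothetical MapLayoutSketch below illustrates that layout for Map<String, String> rows. It is a sketch, not Deephaven's encoder; it rejects null keys because Arrow requires map keys to be non-null, while values may be null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapLayoutSketch {
    private MapLayoutSketch() {}

    // Flattened buffers of a Map column: row i owns entries [offsets[i], offsets[i + 1]).
    record Encoded(int[] offsets, boolean[] rowValid, List<String> keys, List<String> values) {}

    static Encoded encode(List<Map<String, String>> column) {
        int[] offsets = new int[column.size() + 1];
        boolean[] rowValid = new boolean[column.size()];
        List<String> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (int i = 0; i < column.size(); i++) {
            Map<String, String> row = column.get(i);
            rowValid[i] = row != null; // this sketch gives a null row zero entries
            if (row != null) {
                for (Map.Entry<String, String> entry : row.entrySet()) {
                    if (entry.getKey() == null) {
                        throw new IllegalArgumentException("Map keys must be non-null (row " + i + ")");
                    }
                    keys.add(entry.getKey());
                    values.add(entry.getValue()); // a null value is allowed
                }
            }
            offsets[i + 1] = keys.size();
        }
        return new Encoded(offsets, rowValid, keys, values);
    }
}
```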

Example: Annotate Map<String, Integer> Columns

The following example creates a table with a column of Map<String, Integer>. The Arrow schema annotates the column as an Arrow Map with String keys and Integer values. The final table can be exported over Flight / Barrage without error.
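Writing the value field of a Map (or a branch of a Union) means choosing an Arrow type for each Java class. The hypothetical helper below records one such choice, using descriptive labels rather than Arrow Java's own toString output. The widths follow from the Java types: Integer is a 32-bit signed integer and Long a 64-bit signed integer, so Map<String, Integer> values correspond to a 32-bit signed Arrow Int.

```java
import java.util.Map;

final class ArrowTypeNames {
    private ArrowTypeNames() {}

    // Assumed mapping from boxed Java value classes to the Arrow type a Map value field or a
    // Union branch would declare. The strings are descriptive labels, not Arrow Java output.
    private static final Map<Class<?>, String> ARROW_TYPES = Map.of(
            String.class, "Utf8",
            Integer.class, "Int(32, signed)",
            Long.class, "Int(64, signed)",
            Double.class, "FloatingPoint(DOUBLE)");

    static String arrowTypeFor(Class<?> javaType) {
        String arrowType = ARROW_TYPES.get(javaType);
        if (arrowType == null) {
            throw new IllegalArgumentException("No Arrow type chosen for " + javaType.getName());
        }
        return arrowType;
    }
}
```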

Example: Annotate Map<String, Union> Columns

This example demonstrates the use of Union for values in a Map with String keys. The Union can contain a Double, String, Long, or Integer.
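With a Union as the Map value type, every value must fall into one of the declared branches. The hypothetical sketch below assigns a branch to each value in one map row, assuming the branches are declared in the order Double, String, Long, Integer with type ids 0 to 3. Matching is on the exact boxed class, so an Integer 7 and a Long 7L land in different branches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapOfUnionSketch {
    private MapOfUnionSketch() {}

    // Assumed declaration order of the value union's branches; the list index is the type id.
    private static final List<Class<?>> BRANCHES =
            List.of(Double.class, String.class, Long.class, Integer.class);

    // Returns the union type id for every value in one map row, in the map's iteration order.
    static List<Integer> valueTypeIds(Map<String, ?> row) {
        List<Integer> ids = new ArrayList<>(row.size());
        for (Map.Entry<String, ?> entry : row.entrySet()) {
            Object value = entry.getValue();
            // Exact class match: all four branch classes are final, so no subclass can appear.
            // This sketch rejects null values for simplicity.
            int id = value == null ? -1 : BRANCHES.indexOf(value.getClass());
            if (id < 0) {
                throw new IllegalArgumentException(
                        "Key " + entry.getKey() + " holds an unsupported value: " + value);
            }
            ids.add(id);
        }
        return ids;
    }
}
```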

Example: Run-End Encoded (REE) Columns

Run-End Encoding is a wire-level optimization for columns with long runs of consecutive repeated values. Instead of sending every value, each run is sent once, and the column is serialized as two child arrays:

  • run_ends: a non-nullable integer array of cumulative 1-based end indices, one per run. The last value always equals the logical row count.
  • values: the value repeated in each run, one entry per run.
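A minimal sketch of the two child arrays described above, for a primitive int column (so there are no nulls to handle), follows. RunEndEncoding is a hypothetical helper, not Deephaven's implementation; it emits one run_ends entry and one values entry per run, and includes a decoder that expands them back to the flat column.

```java
import java.util.Arrays;

final class RunEndEncoding {
    private RunEndEncoding() {}

    // The two child arrays of a run-end encoded column, with one entry per run in each.
    record Encoded(int[] runEnds, int[] values) {}

    // Encodes an int column as cumulative 1-based run ends plus the value of each run.
    static Encoded encode(int[] column) {
        int[] runEnds = new int[column.length];
        int[] values = new int[column.length];
        int runs = 0;
        for (int i = 0; i < column.length; i++) {
            boolean lastInRun = i == column.length - 1 || column[i + 1] != column[i];
            if (lastInRun) {
                runEnds[runs] = i + 1; // 1-based index of the run's last row
                values[runs] = column[i];
                runs++;
            }
        }
        return new Encoded(Arrays.copyOf(runEnds, runs), Arrays.copyOf(values, runs));
    }

    // Expands the encoding back to the flat column; the last run end is the row count.
    static int[] decode(Encoded encoded) {
        int[] runEnds = encoded.runEnds();
        int length = runEnds.length == 0 ? 0 : runEnds[runEnds.length - 1];
        int[] column = new int[length];
        int start = 0;
        for (int r = 0; r < runEnds.length; r++) {
            Arrays.fill(column, start, runEnds[r], encoded.values()[r]);
            start = runEnds[r];
        }
        return column;
    }
}
```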

A column of 1,000 rows made of 10 runs, each repeating one integer 100 times, costs 10 run_ends entries plus 10 values entries instead of 1,000 integers. Deephaven still stores the column flat, with its type unchanged; REE is a transport-only optimization.

The run_ends integer width is determined by the Arrow field structure you supply via BARRAGE_SCHEMA_ATTRIBUTE. Use Int32 unless you have a specific reason to use Int16. Note that Int16 run_ends limit each record batch to at most Short.MAX_VALUE (32,767) rows.
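The arithmetic above can be checked with a small sketch. countRuns gives the number of run_ends (and values) entries, and fitsInt16RunEnds encodes the reasoning behind the Int16 limit: run ends are cumulative, so the last one equals the row count of the batch and must fit in a short. Both helpers are hypothetical.

```java
final class RunEndWidth {
    private RunEndWidth() {}

    // Number of runs of equal consecutive values, i.e. the length of both run_ends and values.
    static int countRuns(int[] column) {
        int runs = 0;
        for (int i = 0; i < column.length; i++) {
            if (i == 0 || column[i] != column[i - 1]) {
                runs++;
            }
        }
        return runs;
    }

    // Run ends are cumulative, so the last one equals the batch's row count. With Int16
    // run_ends, that count must therefore fit in a short.
    static boolean fitsInt16RunEnds(int rowsInBatch) {
        return rowsInBatch <= Short.MAX_VALUE;
    }
}
```

For the 1,000-row example, where row i holds i / 100, countRuns returns 10, so the column travels as 10 run_ends entries and 10 values entries.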