Schema management

Deephaven clusters use etcd to centralize storage of schema files. Once schemas have been defined or changed, they must be deployed to etcd before they can be used. The dhconfig utility both deploys schemas to a Deephaven database (making the tables they describe available for use) and exports schemas from etcd to a specified directory.

Deploy schemas to Deephaven

Once schemas are defined, or after they are modified, they must be imported into Deephaven's centralized configuration database (backed by etcd). This is done with the dhconfig utility, which locates the schema files based on the parameters given on the command line. The Schema Editor also has an option to deploy schemas.

Until this step is done, schema changes are not visible to Deephaven.

Use the following command to perform schema deployment. Note that this command must be run as a user that has access to the Deephaven configuration server (e.g., sudo -u irisadmin).

/usr/illumon/latest/bin/dhconfig schemas import <options>

See dhconfig schemas for more details about the command options.

For example, on most Deephaven installations, the following command will redeploy the LearnDeephaven schemas:

/usr/illumon/latest/bin/dhconfig schemas import --directory /etc/sysconfig/illumon.d/schema/LearnDeephaven --force <auth options>

This results in the following output:

Updating System.LearnDeephaven.EODTrades
Updating System.LearnDeephaven.StockQuotes
Updating System.LearnDeephaven.StockTrades

Edit schemas of existing tables

If a table already has data, editing its schema may require multiple steps. Schema updates should normally be deployed after hours.

Deploy schemas during Intraday Data ingestion

Schema changes made while intraday data is being written to a table (for instance, through the tailer and Data Import Server) require special attention, because Deephaven does not automatically handle schema changes during ingestion.

First, examine the tailer's log file (usually /var/log/deephaven/tailer/LogtailerMain1.log.current) to confirm it is not sending data for the changed table, then stop the tailer.
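
For example, here is a minimal sketch of that check and shutdown. It assumes the tailer runs under monit with a process name like tailer1 (the actual name varies by installation; confirm it with sudo monit summary), and <Namespace>.<TableName> is a placeholder for the changed table:

# Look for recent tailer activity for the changed table
grep "<Namespace>.<TableName>" /var/log/deephaven/tailer/LogtailerMain1.log.current | tail -n 20

# Stop the tailer; "tailer1" is an assumed monit process name
sudo monit stop tailer1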

There are two options for handling existing binary log files for the day of the schema change:

  1. Move or rename any existing binary log files for that table for the day so they won't be found by the tailer. For example, if the UpdatePerformanceLogCoreV2 schema was being changed:
sudo mkdir /db/TempFiles/save
sudo mv /var/log/deephaven/binlogs/perflogs/DbInternal.UpdatePerformanceLogCoreV2*.bin.* /db/TempFiles/save

Then specify a new internal partition when initializing your generated logger so that new intraday data is written to a different location. On older installations, the internal partition may instead be determined by the tailer configuration: if so, update the tailer's XML configuration file and change (or add) the internalPartitionSuffix attribute for the changed table, for example internalPartitionSuffix="TableVersion2". If the internal partition is determined by the internalPartitionRegex attribute, update the logger so that it logs to a different filename.

  2. Add logFormat attributes to the Listener sections of the schema, along with column mappings, so that "old" version log files can populate the "new" version schema columns.

Once the new schema is deployed, the tailer can be restarted.
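
For example, assuming the same monit process name as in the sketch above:

sudo monit start tailer1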

Add a column

After a new column is added to a schema, that column's values appear as null when viewing partitions that were written before the column was added.

The ColumnTools utility can be used to populate values for the new column in historical partitions stored in Deephaven format.

Note

ColumnTools will not work on historical partitions that are stored in Parquet. You must remerge the table to change table data in Parquet format.

Remove a column

Removing a column from a schema does not by itself remove data from disk, but the column will no longer be visible in queries.

The ColumnTools utility can be used to delete a column from historical partitions stored in the Deephaven format.

Change a column's data type

Changing the data type of a column requires rewriting the data in that column for all partitions. Until this is done, queries will fail when they attempt to read the column.

For intraday data, this can be done after deleting the intraday partition with the data control tool. For historical data stored in Deephaven format, the ColumnTools utility can be used to rewrite the column data.

Update loggers

Loggers that stream data to a table must match the current version of the schema. Whenever a table schema is modified and redeployed (e.g., with the dhconfig schema import command or the schema editor), any related loggers must be recreated.

Export schemas

The dhconfig schemas export command exports schemas from etcd to a specified directory as .schema files, named <namespace>.<table name>.schema. These files can then be examined and edited, and re-imported with dhconfig schemas import.

For example, the following command will export the LearnDeephaven schema files to the directory /tmp/export, overwriting any files that are already there.

/usr/illumon/latest/bin/dhconfig schemas export --directory /tmp/export --namespace LearnDeephaven --force
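
After editing an exported file, the same directory can be re-imported with the import command shown earlier. The following sketch assumes you edited the StockTrades schema; the editor and paths are illustrative:

vi /tmp/export/LearnDeephaven.StockTrades.schema
/usr/illumon/latest/bin/dhconfig schemas import --directory /tmp/export --force <auth options>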

List schemas

The dhconfig schemas list command simply lists schemas rather than exporting them.

For example, the following command will list the LearnDeephaven schema files.

/usr/illumon/latest/bin/dhconfig schemas list --namespace LearnDeephaven

Delete schemas

The dhconfig schemas delete command deletes schemas from etcd.

You can also delete a schema from a query. In a Groovy console, run:

ss = io.deephaven.shadow.enterprise.com.illumon.iris.db.schema.SchemaServiceFactory.getDefault()
ss.deleteSchema("Namespace", "Table")

In a Python console, run:

import jpy

ss = jpy.get_type(
    "io.deephaven.shadow.enterprise.com.illumon.iris.db.schema.SchemaServiceFactory"
).getDefault()
ss.deleteSchema("Namespace", "Table")

General best practices

Because etcd does not keep old versions of schema files, we recommend you keep schema files in source control (such as Git). This allows you to track changes and revert to previous versions if necessary. When editing an existing schema, exporting from etcd first ensures you are editing the latest version.
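
For example, here is a minimal sketch of this workflow, assuming a Git working copy at /home/irisadmin/schema-repo (the path is illustrative) and using the export command shown above:

# Export the current schemas from etcd directly into the working copy
/usr/illumon/latest/bin/dhconfig schemas export --directory /home/irisadmin/schema-repo --namespace LearnDeephaven --force

# Record the current versions in source control
cd /home/irisadmin/schema-repo
git add LearnDeephaven.*.schema
git commit -m "Update LearnDeephaven schemas"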