Schema management
Deephaven clusters use etcd to centralize storage of schema files. Once schemas have been defined or changed, they must be deployed to etcd before they can be used. The dhconfig utility both deploys schemas to a Deephaven database, making their described tables available for use, and exports schemas from etcd to a specified directory.
Deploy schemas to Deephaven
Once schemas are defined, or after they are modified, they must be imported into Deephaven's centralized configuration database (backed by etcd). This is done using the dhconfig utility, which uses the parameters defined at the command line to find the schema files. The Schema Editor also has an option to deploy schemas.
Until this step is done, schema changes are not visible to Deephaven.
Use the following command to perform schema deployment. Note that this command must be run as a user that has access to the Deephaven configuration server (e.g., sudo -u irisadmin).
/usr/illumon/latest/bin/dhconfig schemas import <options>
See dhconfig schemas for more details about the command options.
For example, on most Deephaven installations, the following command will redeploy the LearnDeephaven schemas:
/usr/illumon/latest/bin/dhconfig schemas import --directory /etc/sysconfig/illumon.d/schema/LearnDeephaven --force <auth options>
This results in the following output:
Updating System.LearnDeephaven.EODTrades
Updating System.LearnDeephaven.StockQuotes
Updating System.LearnDeephaven.StockTrades
Edit schemas of existing tables
If a table already has data, multiple steps may be necessary to edit its schema. Schema updates should normally be deployed after hours.
Deploy schemas during Intraday Data ingestion
Schema changes made while intraday data is being written to a table (for instance, through the tailer and Data Import Server) will require special attention as Deephaven does not automatically handle schema changes while data is being ingested.
First, ensure the tailer is not sending data for the changed table by examining its log file (usually /var/log/deephaven/tailer/LogtailerMain1.log.current), and then stop the tailer.
There are two options for handling existing binary log files for the day of the schema change:
- Move or rename any existing binary log files for that table for the day so they won't be found by the tailer. For example, if the UpdatePerformanceLogCoreV2 schema was being changed:
sudo mkdir /db/TempFiles/save
sudo mv /var/log/deephaven/binlogs/perflogs/DbInternal.UpdatePerformanceLogCoreV2*.bin.* /db/TempFiles/save
Specify a new internal partition when initializing your generated logger so that new intraday data will be written to a different location.
Historically, tailer configuration may have been used to specify new internal partitions. If the table's internal partition is determined by the tailer configuration, update the tailer's XML configuration file and change (or add) the internalPartitionSuffix attribute for the changed table. For example: internalPartitionSuffix="TableVersion2". If the internal partition is instead determined by the internalPartitionRegex attribute, update the logger to ensure it's logging with a different filename.
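As a rough sketch, a tailer configuration entry using this attribute might look like the following. Only the internalPartitionSuffix attribute and the table name come from the text above; the element name and other attributes are illustrative assumptions, so check your installation's actual tailer XML for the exact structure.

```xml
<!-- Illustrative sketch only: the element name and attributes other than
     internalPartitionSuffix are assumptions, not taken from this document. -->
<Log namespace="DbInternal"
     tableName="UpdatePerformanceLogCoreV2"
     internalPartitionSuffix="TableVersion2" />
```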
- Add logFormat attributes to Listener sections of the schema, as well as mappings, so "old" version log files can populate the "new" version schema columns.
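For illustration, a versioned Listener section might look like the sketch below. Only the logFormat attribute itself comes from the text above; the element layout, column names, and types are hypothetical, so consult your table's existing schema file for the real structure.

```xml
<!-- Hypothetical sketch: a Listener section versioned with logFormat so
     binary logs written in the old format can still populate the new
     schema's columns. Column names and types are illustrative. -->
<Listener logFormat="2">
    <Column name="ExistingCol" dataType="String" />
    <Column name="NewCol" dataType="double" />
</Listener>
```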
Once the new schema is deployed, the tailer can be restarted.
Add a column
After a new column is added to a schema, queries will show null values for that column in partitions that already existed at the time the column was added.
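Concretely, adding a column amounts to adding a Column element to the table's schema file and redeploying it. The name and type below are made up for the example:

```xml
<!-- Hypothetical example: a new column added to an existing schema.
     The column name and data type are illustrative. -->
<Column name="ExchangeFee" dataType="double" />
```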
The ColumnTools utility can be used to populate values for the new column in historical partitions stored in Deephaven format.
Note
ColumnTools will not work on historical partitions that are stored in Parquet format. You must remerge the table to change table data in Parquet format.
Remove a column
Removing a column from a schema does not by itself remove data from disk, but the column will no longer be visible in queries.
The ColumnTools utility can be used to delete a column from historical partitions stored in Deephaven format.
Change a column's data type
Changing the data type of a column requires rewriting the data in that column for all partitions. Until this is done, queries will fail when they attempt to read the column.
For intraday data, this can be done after deleting the intraday partition with the data control tool. For historical data stored in Deephaven format, the ColumnTools utility can be used to rewrite the column data.
Update loggers
Loggers that stream data to a table must match the current version of the schema. Whenever a table schema is modified and redeployed (e.g., with the dhconfig schemas import command or the Schema Editor), any related loggers must be recreated.
Export schemas
The dhconfig schemas export command exports schemas from etcd to a specified directory as .schema files, named <namespace>.<table name>.schema. These files can then be examined and edited, and later re-imported with dhconfig schemas import.
For example, the following command will export the LearnDeephaven schema files to the directory /tmp/export, overwriting any files that are already there.
/usr/illumon/latest/bin/dhconfig schemas export --directory /tmp/export --namespace LearnDeephaven --force
List schemas
The dhconfig schemas list command lists schemas rather than exporting them.
For example, the following command will list the LearnDeephaven schema files.
/usr/illumon/latest/bin/dhconfig schemas list --namespace LearnDeephaven
Delete schemas
The dhconfig schemas delete command deletes schemas from etcd.
You can also delete a schema in a query. In a Groovy console, run:
ss = io.deephaven.shadow.enterprise.com.illumon.iris.db.schema.SchemaServiceFactory.getDefault()
ss.deleteSchema("Namespace", "Table")
In a Python console, run:
import jpy

ss = jpy.get_type(
    "io.deephaven.shadow.enterprise.com.illumon.iris.db.schema.SchemaServiceFactory"
).getDefault()
ss.deleteSchema("Namespace", "Table")
General best practices
Because etcd does not keep old versions of schema files, we recommend you keep schema files in source control (such as Git). This allows you to track changes and revert to previous versions if necessary. When editing an existing schema, exporting from etcd first ensures you are editing the latest version.
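One possible workflow combining the commands above with Git is sketched below. The repository path, namespace, and authentication options are illustrative and will vary by installation:

```sh
# Hypothetical workflow: keep deployed schemas under version control.
# The repository path and namespace are assumptions for this example.
cd /path/to/schema-repo

# Export current schemas from etcd so edits start from the latest version.
/usr/illumon/latest/bin/dhconfig schemas export \
    --directory . --namespace LearnDeephaven --force <auth options>

# Record the pre-edit state in Git.
git add *.schema
git commit -m "Snapshot LearnDeephaven schemas before editing"

# After editing the .schema files, redeploy them to etcd.
/usr/illumon/latest/bin/dhconfig schemas import \
    --directory . --force <auth options>
```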