Schema management
Deephaven clusters use etcd to centralize storage of schema files. Once schemas have been defined or changed, they must be deployed to etcd before they can be used. The dhconfig tool both deploys schemas to a Deephaven database, making their described tables available for use, and exports schemas from etcd to a specified directory.
Deploy schemas to Deephaven
Once schemas are defined, or after they are modified, they must be imported into Deephaven's centralized configuration database (backed by etcd). This is done using the dhconfig utility, which uses the parameters defined at the command line to find the schema files. The Schema Editor also has an option to deploy schemas.
Until this step is done, schema changes are not visible to Deephaven.
Use the following command to perform schema deployment. Note that this command must be run as a user with access to the Deephaven configuration server (e.g., sudo -u irisadmin).
/usr/illumon/latest/bin/dhconfig schemas import <options>
See dhconfig schemas for more details about the command options.
For example, on most Deephaven installations, the following command redeploys the LearnDeephaven schemas:
/usr/illumon/latest/bin/dhconfig schemas import --directory /etc/sysconfig/illumon.d/schema/LearnDeephaven --force <auth options>
This results in the following output:
Updating System.LearnDeephaven.EODTrades
Updating System.LearnDeephaven.StockQuotes
Updating System.LearnDeephaven.StockTrades
Deploy schemas during intraday data ingestion
Schema changes made while intraday data is being written to a table (for instance, through the tailer and Data Import Server) require special attention, as Deephaven does not automatically handle schema changes while data is being ingested.
- First, ensure the tailer is not sending data for the changed table by examining its log file (usually /var/log/deephaven/tailer/LogtailerMain1.log.current), then stop the tailer.
- Move or rename any existing binary log files for that table for the day so they won't be found by the tailer. For example, if the UpdatePerformanceLogCoreV2 schema was being changed:
  sudo mkdir /db/TempFiles/save
  sudo mv /var/log/deephaven/binlogs/perflogs/DbInternal.UpdatePerformanceLogCoreV2*.bin.* /db/TempFiles/save
- Specify a new internal partition when initializing your generated logger so that new intraday data will be written to a different location. Historically, tailer configuration may have been used to specify new internal partitions. If the table's internal partition is determined by the tailer configuration, update the tailer's XML configuration file and change (or add) the internalPartitionSuffix attribute for the changed table, for example: internalPartitionSuffix="TableVersion2". If the internal partition is instead determined by the internalPartitionRegex attribute, update the logger to ensure it is logging with a different file name.
- Finally, once the new schema is deployed, the tailer can be restarted.
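The binary-log step above can be rehearsed safely before touching a production host. The sketch below simulates it with temporary directories and placeholder file names (the real logs live under /var/log/deephaven/binlogs/perflogs, and the save directory in the example above is /db/TempFiles/save); only logs for the changed table are set aside, so other tables keep tailing normally.

```shell
# Simulation of "move the changed table's binlogs aside" using temp dirs.
# Directory and file names are illustrative, not real production paths.
BINLOG_DIR=$(mktemp -d)
SAVE_DIR=$(mktemp -d)

# Placeholder binlog files: one for the changed table, one for another table.
touch "$BINLOG_DIR/DbInternal.UpdatePerformanceLogCoreV2.System.2024-01-02.bin.0001"
touch "$BINLOG_DIR/DbInternal.ProcessEventLog.System.2024-01-02.bin.0001"

# Move only the changed table's logs so the tailer cannot re-find them.
mv "$BINLOG_DIR"/DbInternal.UpdatePerformanceLogCoreV2*.bin.* "$SAVE_DIR"/

ls "$SAVE_DIR"
```

Note that the glob pattern matches only the changed table's files, which is why the other table's logs stay in place.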
Edit schemas of existing tables
Once a table has data, there are multiple caveats regarding editing its schema:
- Adding a new column to a schema shows null values for that column in partitions that already existed at the time the column was added.
- Removing a column from a schema does not by itself remove data from disk, but the column will no longer be visible in queries.
- Schema updates should normally be deployed after hours. If a schema update is applied midday, there are two options to maintain compatibility:
- Create a new internal partition to separate the data written with the "old" schema from new data being written with the "new" schema.
- Add logFormat attributes to the Listener sections of the schema, along with column mappings, so that log files in the "old" format can populate the "new" schema's columns.
- Changing the datatype of a column requires rewriting the data in that column for all partitions.
- Loggers that stream data to a table must match the schema in terms of which columns they provide.
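As a schematic illustration of the logFormat option above, the hedged sketch below shows the shape of versioned Listener sections. The element and column names here are placeholders, and real schemas carry additional attributes (types, mappings) that vary by table, so treat this as an outline rather than a copy-paste template: an older listener version (logFormat="1") omits the newly added column, while the current version (logFormat="2") includes it, letting old-format log files continue to feed the table.

```xml
<!-- Sketch only: two Listener versions within one table schema. -->
<!-- logFormat="1": the original log format, without NewCol. -->
<Listener logFormat="1">
  <Column name="Timestamp" />
  <Column name="Price" />
</Listener>
<!-- logFormat="2": the current format, adding NewCol. -->
<Listener logFormat="2">
  <Column name="Timestamp" />
  <Column name="Price" />
  <Column name="NewCol" />
</Listener>
```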
Edit schemas with Column Tools
Deephaven also offers ColumnTools as a way to automate and simplify schema-editing tasks for tables that already have data. The tool is only available in the legacy Groovy API.
Export schemas
The dhconfig schemas export command exports schemas from etcd to a specified directory as .schema files, named <namespace>.<table name>.schema. These files can then be examined, edited, and re-imported with dhconfig schemas import.
For example, the following command exports the LearnDeephaven schema files to the directory /tmp/export, overwriting any files that are already there.
/usr/illumon/latest/bin/dhconfig schemas export --directory /tmp/export --namespace LearnDeephaven --force
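Exported file names follow the <namespace>.<table name>.schema convention described above. As a small sketch (the helper names here are illustrative, not part of any Deephaven API), the convention can be built and parsed like this:

```python
def schema_file_name(namespace: str, table: str) -> str:
    """Build the exported file name for a schema, per the documented convention."""
    return f"{namespace}.{table}.schema"


def parse_schema_file_name(file_name: str) -> tuple[str, str]:
    """Recover (namespace, table) from an exported .schema file name."""
    namespace, table, suffix = file_name.rsplit(".", 2)
    if suffix != "schema":
        raise ValueError(f"not a .schema file: {file_name}")
    return namespace, table


print(schema_file_name("LearnDeephaven", "StockTrades"))
# → LearnDeephaven.StockTrades.schema
```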
List schemas
The dhconfig schemas list command simply lists schemas rather than exporting them.
For example, the following command lists the LearnDeephaven schema files.
/usr/illumon/latest/bin/dhconfig schemas list --namespace LearnDeephaven
Delete schemas
The dhconfig schemas delete command deletes schemas from etcd.
You can also delete a schema from a query. In a Groovy console, run:
ss = io.deephaven.shadow.enterprise.com.illumon.iris.db.schema.SchemaServiceFactory.getDefault()
ss.deleteSchema("Namespace", "Table")
In a Python console, run:
import jpy

ss = jpy.get_type(
    "io.deephaven.shadow.enterprise.com.illumon.iris.db.schema.SchemaServiceFactory"
).getDefault()
ss.deleteSchema("Namespace", "Table")
General best practices
Because etcd does not keep old versions of schema files, we recommend you keep schema files in source control (such as Git). This allows you to track changes and revert to previous versions if necessary. When editing an existing schema, exporting from etcd first ensures you are editing the latest version.
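A minimal sketch of that workflow, assuming a local Git repository for schema files. The repository path and file contents below are placeholders, and the dhconfig export step (which needs a live Deephaven install) is shown as a comment:

```shell
REPO=$(mktemp -d)          # stand-in for your real schema repository
cd "$REPO" && git init -q

# On a Deephaven server, first export the current schemas from etcd, e.g.:
#   /usr/illumon/latest/bin/dhconfig schemas export --directory "$REPO" \
#       --namespace LearnDeephaven --force

# Placeholder standing in for an exported schema file:
echo '<Table namespace="LearnDeephaven" name="StockTrades"/>' \
  > LearnDeephaven.StockTrades.schema

# Track the snapshot so later edits can be diffed and reverted.
git add -A
git -c user.name=dh -c user.email=dh@example.com \
    commit -qm "Snapshot LearnDeephaven schemas"
```

Exporting into the repository before each edit keeps the working tree in sync with what etcd actually holds, so `git diff` shows exactly what a re-import will change.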