Data control tool
This guide will show you how to use the data control tool, /usr/illumon/latest/bin/dhctl, to manage intraday data and historical metadata indexes from the command line.
The following commands are available for intraday data with the dhctl intraday subcommand:
- truncate: Remove data from and mark a partition as permanently truncated.
- delete: Fully delete (remove) a partition.
- rescan: Instruct each DIS to rescan active intraday locations for new directories.
The following commands are available for historical metadata with the dhctl metadata subcommand:
- list: List the metadata index for the specified tables.
- validate: Report discrepancies between the specified metadata index and each location on disk.
- update: Read table data on disk and produce a new metadata index for each table specified.
Each command above has a --help option that shows syntax and available options. The general form is dhctl <data_type> <command> --help, where <data_type> is either intraday or metadata, and <command> is one of the commands listed above.
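As a quick orientation, the following shell sketch builds the --help invocation for every subcommand listed above. The helper name dhctl_help_commands is hypothetical and not part of dhctl; it only prints the command lines, so it does nothing to a Deephaven installation until those lines are actually run.

```shell
#!/bin/sh
# Path to the tool, as given at the top of this guide.
DHCTL=/usr/illumon/latest/bin/dhctl

# Hypothetical helper: print one "--help" command line per subcommand.
dhctl_help_commands() {
    for cmd in truncate delete rescan; do
        echo "$DHCTL intraday $cmd --help"
    done
    for cmd in list validate update; do
        echo "$DHCTL metadata $cmd --help"
    done
}

dhctl_help_commands
```

On a Deephaven server, any printed line can be copied and run as is to see the syntax for that subcommand.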
Intraday
On any Deephaven server, dhctl intraday commands instruct the Data Import Server (DIS) instances to truncate or delete intraday partitions, or to scan for new data.
The data control tool uses the data routing configuration to locate the DIS(es) responsible for the intraday data. All applicable DISes will be instructed to truncate, delete, or rescan the data location(s) unless specified otherwise.
Authentication is required to remove data or to rescan. dhctl will attempt to use the iris user's default private key for authentication if it is readable. In most installations, if you invoke dhctl with sudo -u irisadmin dhctl, the tool will automatically authenticate as the iris user. Otherwise, you need to provide a username and password, or the location of an authorized private key file. The authenticated user must be a member of the iris-superusers or iris-datamanagers group.
Note
The authenticated user is the user that authenticated to the system to initiate the operation. The effective user is the user that Deephaven enforces permission checks for.
A simple example is the "Operate as" login, where the user who logs in (via password, SAML, etc.) is the Authenticated User, and the user they are operating as is the Effective User.
Intraday data deletion
The truncate and delete commands serve different purposes:
- truncate: Removes intraday data and permanently marks the partition as truncated. The location will no longer accept new data. Use this when you want to permanently delete data and never ingest to this location again.
- delete: Resets a previously truncated partition so it can accept new data. Only use this if you intend to re-ingest data to the same location.
Recommended approach: Truncate only
For most use cases where you simply want to remove intraday data, truncate alone is sufficient:
This removes the data and prevents any future ingestion to that location, which is typically the desired behavior.
Re-enabling a location for new data
If you need to ingest new data to a location that was previously truncated, you must:
- truncate the partition (if not already truncated).
- Remove any binary log (.bin) files for that partition that are still within the tailer's lookback window. Binary log files are stored by default in /var/log/deephaven/binlogs/. Remove them using standard file system commands.
- delete the partition to reset it for new ingestion.
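The file-removal step could be sketched as follows. Both the helper name list_partition_binlogs and the file name pattern are assumptions: the sketch assumes that binary log file names start with the namespace and table name separated by dots, contain the column partition value (here a date), and continue with ".bin." and a timestamp. Check the actual names in your binlogs directory before deleting anything. The helper only lists matches; removal is shown as a commented-out second step.

```shell
#!/bin/sh
# Default binary log directory, per this guide.
BINLOG_DIR=/var/log/deephaven/binlogs

# Hypothetical helper: list binary log files for one table and date.
# Assumed name layout: <Namespace>.<Table>.<...>.<date>.bin.<timestamp>
list_partition_binlogs() {
    dir=$1; namespace=$2; table=$3; date=$4
    find "$dir" -type f -name "${namespace}.${table}.*${date}*.bin.*" | sort
}

# Review the list first (MyNamespace and MyTable are placeholders):
#   list_partition_binlogs "$BINLOG_DIR" MyNamespace MyTable 2021-07-07
# Once the list is confirmed, remove exactly those files:
#   list_partition_binlogs "$BINLOG_DIR" MyNamespace MyTable 2021-07-07 |
#       while IFS= read -r f; do rm -- "$f"; done
```

Listing before deleting keeps the destructive step separate, which matters here because a wrong pattern could remove binary logs for other partitions.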
Caution
If you delete a partition without removing the binary log files, and the tailer restarts, the tailer may re-ingest data from those files — causing the "deleted" data to reappear. Always remove the source binary files before running delete if you want to prevent this.
Note
- These operations make a best-effort attempt on all appropriate Data Import Servers. The operation is not atomic, so partial success is possible. Always check all results.
- It is possible to delete data on one Data Import Server and leave it on another (e.g., a backup). Be extremely careful with this, as it can create confusion.
Truncate
The truncate command removes data from an intraday partition and marks it as permanently truncated. Any tailers for that partition will be disconnected, and any future attempt to tail data for that partition will be rejected.
Run dhctl intraday truncate --help to see the command syntax and available options:
One of --singlePartition or --partitions must be specified.
Examples
This command is a dry run truncating all internal partitions for 2021-07-07, excluding a backup DIS.
This command is similar to the previous one but truncates only a single location, having an internal partition value of "query_server".
Note that in these examples, the results include hasActiveProcessor=true. A value of true here indicates that the partition has an active tailer, so it’s likely this partition is not ready to be truncated yet.
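A cautious workflow could check saved dry-run output for this flag before running the real truncate. The sketch below is a minimal example under stated assumptions: the helper names has_active_processor and check_dry_run are hypothetical, and the only thing assumed about the output is that the literal text hasActiveProcessor=true appears in it, as described above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the text on stdin reports
# an active processor for any partition.
has_active_processor() {
    grep -q 'hasActiveProcessor=true'
}

# Hypothetical wrapper: read dry-run output on stdin and print a verdict.
# Returns 1 when an active tailer is reported, 0 otherwise.
check_dry_run() {
    if has_active_processor; then
        echo "active tailer found: not ready to truncate"
        return 1
    fi
    echo "no active tailer reported"
}
```

For example, after saving dry-run output to a file named dry_run.txt (a placeholder name), running check_dry_run < dry_run.txt prints the verdict and sets the exit status accordingly.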
Delete
The delete command removes a previously truncated partition and resets it so new data can be ingested. Only use this if you intend to re-ingest data to the same location.
Run dhctl intraday delete --help to see the command syntax and available options:
One of --singlePartition or --partitions must be specified.
Examples
The delete does not proceed unless it would succeed for all partitions on all Data Import Servers. The dry run output indicates whether this is the case.
Rescan
If you add data to /db/Intraday without going through the DIS (e.g., batch CSV import, or by copying table data from another server), the DIS will not be aware of the new data if it has already scanned that table. The rescan command instructs each DIS to rescan active intraday locations for new directories.
Run dhctl intraday rescan --help to see the command syntax and available options:
This command instructs the DIS handling table DbInternal.ProcessEventLog to look for new data:
This command instructs all DISes to look for new data for all tables:
Historical
dhctl metadata subcommands can be used to inspect, validate, and update the state of historical metadata indexes. Each subcommand expects a list of one or more namespaces or tables to be specified as either * for everything, Namespace for an entire namespace, or Namespace.TableName for a specific table.
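The three specifier forms can be told apart mechanically, as in the illustrative sketch below. The helper classify_table_spec is hypothetical and not part of dhctl. One practical point it highlights: in a shell, * should be quoted so that it is passed literally rather than expanded to file names.

```shell
#!/bin/sh
# Hypothetical helper: report which form a table specifier uses.
classify_table_spec() {
    case $1 in
        '')   echo "empty specifier"; return 1 ;;
        '*')  echo "all tables" ;;
        *.*)  echo "table ${1#*.} in namespace ${1%%.*}" ;;
        *)    echo "namespace $1" ;;
    esac
}

classify_table_spec '*'                          # quoted to avoid globbing
classify_table_spec LearnDeephaven
classify_table_spec DbInternal.ProcessEventLog
```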
Run dhctl metadata --help to see the command syntax and available options:
List
The dhctl metadata list command lists the metadata index for the specified tables.
Run dhctl metadata list --help to see the command syntax and available options:
Example
This command lists all metadata for the LearnDeephaven namespace:
Validate
The dhctl metadata validate command processes the specified table metadata index, compares it with the state of each location on disk, and reports if there are any discrepancies.
Note
To validate the metadata index, every location must be visited and read from disk. This can take a long time when there are many locations.
Run dhctl metadata validate --help to see the command syntax and available options:
Example
This command validates the metadata for the LearnDeephaven namespace:
Update
The dhctl metadata update command reads table data on disk and produces a new metadata index for each table specified. This command can be used to build a metadata index for a table that never had one, or to repair the index if problems were found using the validate subcommand.
Note
To update the metadata index, every location must be visited and read from disk. This can take a long time when there are many locations.
Run dhctl metadata update --help to see the command syntax and available options:
Example
Update the metadata for the DbInternal.ProcessEventLog table:
Log output
The dhctl script creates a log file in /var/log/deephaven/misc if the current user has write permission there, or in /tmp otherwise. The log file name is based on the date and time the command was run.
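To find where the log for a given run is likely to be, the directory rule above can be mirrored in a small shell sketch. The helper name dhctl_log_dir is hypothetical, and the exact log file name pattern is not assumed; the sketch only picks the directory and lists its most recently modified entries.

```shell
#!/bin/sh
# Hypothetical helper: print the directory where a dhctl log file would be
# expected for the current user, following the rule described in this guide.
dhctl_log_dir() {
    preferred=${1:-/var/log/deephaven/misc}
    fallback=${2:-/tmp}
    if [ -d "$preferred" ] && [ -w "$preferred" ]; then
        echo "$preferred"
    else
        echo "$fallback"
    fi
}

# Show the five most recently modified entries in that directory.
ls -t "$(dhctl_log_dir)" | head -n 5
```

Run the sketch as the same user that ran dhctl, since the write-permission check depends on who is asking.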