Data control tool

This guide shows how to use the data control tool, /usr/illumon/latest/bin/dhctl, to manage intraday data and historical metadata indexes from the command line.

The following commands are available for intraday data with the dhctl intraday subcommand:

  • truncate: Remove a partition's data and mark the partition as permanently truncated.
  • delete: Fully delete (remove) a partition.
  • rescan: Instruct each DIS to rescan active intraday locations for new directories.

The following commands are available for historical metadata with the dhctl metadata subcommand:

  • list: List the metadata index for the specified tables.
  • validate: Report discrepancies between the specified metadata index and each location on disk.
  • update: Read table data on disk and produce a new metadata index for each table specified.

Each command above has a --help option that shows syntax and available options:

dhctl <data_type> <command> --help

Where <data_type> is either intraday or metadata, and <command> is one of the commands listed above.

Intraday

On any Deephaven server, dhctl intraday commands instruct the Data Import Server (DIS) instances to truncate or delete intraday partitions, or to scan for new data.

The data control tool uses the data routing configuration to locate the DIS(es) responsible for the intraday data. All applicable DISes will be instructed to truncate, delete, or rescan the data location(s) unless specified otherwise.

Authentication is required to remove data or to rescan. dhctl will attempt to use the iris user's default private key for authentication if it is readable. In most installations, if you invoke dhctl with sudo -u irisadmin dhctl, the tool will automatically authenticate as the iris user. Otherwise, you need to provide a username and password, or the location of an authorized private key file. The authenticated user must be a member of the iris-superusers or iris-datamanagers group.
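The password-file and key-file routes might be scripted as follows. This is a sketch only: write_pwfile, rescan_with_password, and rescan_with_key are hypothetical helper names, and any file paths you pass are your own. The --user, --pwfile, and --key options and the base64 encoding come from the tool's help text shown later in this guide. Whether dhctl tolerates the trailing newline that base64 emits is an assumption.

```shell
# Path to the tool; set DHCTL beforehand if it is installed elsewhere.
DHCTL="${DHCTL:-/usr/illumon/latest/bin/dhctl}"

# write_pwfile PASSWORD FILE (hypothetical helper)
# Writes the base64-encoded password to FILE. The umask makes a newly
# created file readable by its owner only.
write_pwfile() {
    ( umask 077 && printf '%s' "$1" | base64 > "$2" )
}

# rescan_with_password USER PWFILE
# Authenticates with a username and a base64 password file.
rescan_with_password() {
    "$DHCTL" intraday rescan --user "$1" --pwfile "$2"
}

# rescan_with_key KEYFILE
# Authenticates with an authorized private key file.
rescan_with_key() {
    "$DHCTL" intraday rescan --key "$1"
}
```

For example, write_pwfile 'secret' "$HOME/.dhctl_pw" followed by rescan_with_password iris "$HOME/.dhctl_pw". The user must still belong to iris-superusers or iris-datamanagers.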

Intraday data deletion

Deleting intraday data takes two steps: truncate and delete.

To write new data to the same location (with the same internal partition value), first truncate the partition, then remove any binary log (.bin) files the tailer would send for that partition, and then delete the partition. The system can then accept new data for the location.

Note

Caveats

  • This method makes a best-effort attempt to delete everything on all appropriate Data Import Servers. This cannot be atomic, so the operation might have only partial success. Make sure you check all the results.
  • The truncated partitions are marked as permanently truncated, and further ingestion of data will be disallowed. This is to prevent confusion if loggers produce new data for the partition, or if tailers have not finished all existing data files.
  • Before logging new data for the truncated partitions, remove any existing data files (bin files), and then delete the partitions with dhctl intraday delete ....
  • It is possible to delete data on one Data Import Server and leave it on another (e.g., a backup). Be extremely careful with this, as it can create confusion.
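The truncate, clean up, and delete sequence described above can be sketched as a small shell function. replace_partition is a hypothetical name, authentication options are omitted, and the sketch assumes dhctl returns a non-zero exit status on failure, which this guide does not confirm. Because of the best-effort caveat, read the printed results at each step rather than relying on the exit status alone.

```shell
# Path to the tool; set DHCTL beforehand if it is installed elsewhere.
DHCTL="${DHCTL:-/usr/illumon/latest/bin/dhctl}"

# replace_partition PARTITION   (hypothetical helper)
# PARTITION is namespace.tableName.columnPartition, for example
# DbInternal.ProcessEventLog.2021-07-07.
replace_partition() {
    # 1. Preview the truncate and wait for the operator to review it.
    "$DHCTL" intraday truncate --partitions "$1" --dry-run || return 1
    echo "Review the dry-run results, then press Enter to truncate." >&2
    read -r reply || return 1

    # 2. Truncate: data is removed and further ingestion is disallowed.
    "$DHCTL" intraday truncate --partitions "$1" || return 1

    # 3. The .bin files must be removed by hand; where they live depends
    #    on the installation, so this sketch only pauses.
    echo "Remove leftover .bin files for $1, then press Enter." >&2
    read -r reply || return 1

    # 4. Delete the truncated partition so new data can be ingested.
    "$DHCTL" intraday delete --partitions "$1"
}
```

Call it as replace_partition DbInternal.ProcessEventLog.2021-07-07 from an interactive shell.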

Truncate

The first step is to truncate an intraday partition. This removes the data and marks the partition as permanently truncated. Any tailers for that partition will be disconnected, and any future attempt to tail data for that partition will be rejected.

Run dhctl intraday truncate --help to see the command syntax and available options:

usage: dhctl intraday truncate [-d] [-e <arg>] [-h] [-i <arg>] [-k <arg> | -user <arg>] [-part <arg>] [-pf <arg>] [-s
       <arg>]  [-v]
 -d,--dry-run                 print what actions would be performed without actually doing it
 -e,--exclude-dis <arg>       do not send requests to data import servers specified in an exclude parameter
 -h,--help                    print help for a intraday command
 -i,--include-dis <arg>       send requests only to data import servers specified in an include parameter
 -k,--key <arg>               specify a private key file to use for authentication
 -part,--partitions <arg>     partition to act on (all internal partitions), as namespace.tableName.columnPartition
 -pf,--pwfile <arg>           specify a file containing the base64 encoded password for the user that is set with --user
 -s,--singlePartition <arg>   single partition to act on, as namespace.tableName.internalPartition.columnPartition
 -user,--user <arg>           specify a user for authentication
 -v,--verbose                 print additional logging, progress messages, and full exception text

truncate removes data at the specified partition(s) and disables further data ingestion.
Once truncated, partitions may be deleted with the delete action, and then new data can be ingested.

One of --singlePartition or --partitions must be specified.

Examples

This command performs a dry run of truncating all internal partitions for 2021-07-07, excluding a backup DIS.

$ dhctl intraday truncate --user iris --exclude-dis db_dis_backup --partitions DbInternal.ProcessEventLog.2021-07-07 --dry-run
Authenticating connection using user and password…
...

DIS: db_dis_backup is excluded by command line parameter
DIS: db_dis_backup
    Result: SKIPPED
    Location results: 0
DIS: db_dis
    Result: SUCCESS
    Location results: 3
        {key=DbInternal.ProcessEventLog.I.query_server.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/query_server/2021-07-07/ProcessEventLog, message=Dry run, result=DRY_RUN, hasActiveProcessor=true}
        {key=DbInternal.ProcessEventLog.I.query_server.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/query_server/2021-07-07/ProcessEventLog, message=Dry run, result=DRY_RUN, hasActiveProcessor=true}
        {key=DbInternal.ProcessEventLog.I.db_merge_server_query_server.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/db_merge_server_query_server/2021-07-07/ProcessEventLog, message=Dry run, result=DRY_RUN, hasActiveProcessor=true}

This command is similar to the previous one but truncates only a single location, having an internal partition value of "query_server".

$ dhctl intraday truncate --user iris --exclude-dis db_dis_backup --singlePartition DbInternal.ProcessEventLog.query_server.2021-07-07 --dry-run
Authenticating connection using user and password…
...

DIS: db_dis_backup is excluded by command line parameter
DIS: db_dis_backup
    Result: SKIPPED
    Location results: 0
DIS: db_dis
    Result: SUCCESS
    Location results: 1
        {key=DbInternal.ProcessEventLog.I.query_server.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/query_server/2021-07-07/ProcessEventLog, message=Dry run, result=DRY_RUN, hasActiveProcessor=true}

Note that in these examples, the results include hasActiveProcessor=true. A value of true here indicates that the partition has an active tailer, so it’s likely this partition is not ready to be truncated yet.
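To spot such partitions quickly in a long dry-run report, the output can be filtered. The helper below is a sketch; active_locations is a hypothetical name, and it assumes the location lines keep the {key=..., ..., hasActiveProcessor=true} layout shown in the examples above.

```shell
# active_locations (hypothetical helper)
# Reads dhctl intraday output on stdin and prints, once each, the key of
# every location reported with hasActiveProcessor=true.
active_locations() {
    sed -n 's/.*{key=\([^,]*\),.*hasActiveProcessor=true.*/\1/p' | sort -u
}
```

Pipe a dry run into it, for example dhctl intraday truncate ... --dry-run | active_locations. Empty output means no location in the report was flagged as active.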

Delete

After a partition has been truncated, you may fully remove the partition with the delete command.

Run dhctl intraday delete --help to see the command syntax and available options:

usage: dhctl intraday delete [-d] [-e <arg>] [-h] [-i <arg>] [-k <arg> | -user <arg>] [-part <arg>] [-pf <arg>] [-s
       <arg>]  [-v]
 -d,--dry-run                 print what actions would be performed without actually doing it
 -e,--exclude-dis <arg>       do not send requests to data import servers specified in an exclude parameter
 -h,--help                    print help for a intraday command
 -i,--include-dis <arg>       send requests only to data import servers specified in an include parameter
 -k,--key <arg>               specify a private key file to use for authentication
 -part,--partitions <arg>     partition to act on (all internal partitions), as namespace.tableName.columnPartition
 -pf,--pwfile <arg>           specify a file containing the base64 encoded password for the user that is set with --user
 -s,--singlePartition <arg>   single partition to act on, as namespace.tableName.internalPartition.columnPartition
 -user,--user <arg>           specify a user for authentication
 -v,--verbose                 print additional logging, progress messages, and full exception text

delete completely removes the directories for the specified partition(s), clearing the way for new data to be ingested.
The partitions must have been previously truncated with the truncate action.

One of --singlePartition or --partitions must be specified.

Examples

The delete does not proceed unless it can succeed for every partition on every Data Import Server. The dry-run output shows whether that is the case; here it fails because the locations have not been truncated.

$ dhctl intraday delete --user iris --exclude-dis db_dis_backup --partitions DbInternal.ProcessEventLog.2021-07-07 --dry-run
...
DIS: db_dis2
    Result: SKIPPED
    Location results: 0
DIS: db_dis
    Result: FAILURE
    Location results: 3
        {key=DbInternal.ProcessEventLog.I.query_server.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/query_server/2021-07-07/ProcessEventLog, message=Location has not been truncated, result=FAILED, hasActiveProcessor=true}
        {key=DbInternal.ProcessEventLog.I.query_server.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/query_server/2021-07-07/ProcessEventLog, message=Location has not been truncated, result=FAILED, hasActiveProcessor=true}
        {key=DbInternal.ProcessEventLog.I.db_merge_server_query_server.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/db_merge_server_query_server/2021-07-07/ProcessEventLog, message=Location has not been truncated, result=FAILED, hasActiveProcessor=true}
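In a script, the dry run can act as a gate before the real delete. The function below is a sketch under assumptions: delete_if_dry_run_clean is a hypothetical name, authentication options are omitted, and failures are detected only by the result=FAILED and Result: FAILURE strings that appear in the output above.

```shell
# Path to the tool; set DHCTL beforehand if it is installed elsewhere.
DHCTL="${DHCTL:-/usr/illumon/latest/bin/dhctl}"

# delete_if_dry_run_clean PARTITION   (hypothetical helper)
# Runs a dry-run delete, prints its report, and performs the real delete
# only if the report contains no failure markers.
delete_if_dry_run_clean() {
    preview=$("$DHCTL" intraday delete --partitions "$1" --dry-run 2>&1)
    printf '%s\n' "$preview"
    if printf '%s\n' "$preview" | grep -q -e 'result=FAILED' -e 'Result: FAILURE'; then
        echo "Dry run reported failures; not deleting $1" >&2
        return 1
    fi
    "$DHCTL" intraday delete --partitions "$1"
}
```

The tool already refuses a delete that cannot fully succeed, so the gate mainly gives a calling script a clear exit status and a saved report.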

Rescan

If you add data to /db/Intraday without going through the DIS (e.g., batch CSV import, or by copying table data from another server), the DIS will not be aware of the new data if it has already scanned that table. The rescan command instructs each DIS to rescan active intraday locations for new directories.

Run dhctl intraday rescan --help to see the command syntax and available options:

usage: dhctl intraday rescan [-e <arg>] [-h] [-i <arg>] [-k <arg> | -user <arg>] [-pf <arg>] [-t <arg>]  [-v]
 -e,--exclude-dis <arg>   do not send requests to data import servers specified in an exclude parameter
 -h,--help                print help for a intraday command
 -i,--include-dis <arg>   send requests only to data import servers specified in an include parameter
 -k,--key <arg>           specify a private key file to use for authentication
 -pf,--pwfile <arg>       specify a file containing the base64 encoded password for the user that is set with --user
 -t,--table <arg>         table to act on, as namespace.tableName (assumes SYSTEM) or namespace.tableName.USER or
                          namespace.tableName.SYSTEM. If omitted, all intraday tables will be scanned.
 -user,--user <arg>       specify a user for authentication
 -v,--verbose             print additional logging, progress messages, and full exception text

This command instructs the DIS handling table DbInternal.ProcessEventLog to look for new data:

$ sudo -u irisadmin /usr/illumon/latest/bin/dhctl intraday rescan --table DbInternal.ProcessEventLog
...
Re-scan Result: SUCCESS
    DIS: db_dis(DbInternal.ProcessEventLog)
        Result: SUCCESS

This command instructs all DISes to look for new data for all tables:

$ sudo -u irisadmin /usr/illumon/latest/bin/dhctl intraday rescan
...
Re-scan Result: SUCCESS
    DIS: db_dis(all)
        Result: SUCCESS
    DIS: Ingester1(all)
        Result: SUCCESS
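After copying data for a handful of tables, rescanning only those tables avoids a scan of every intraday table. The help text does not say whether --table accepts several values, so this sketch issues one request per table. rescan_tables is a hypothetical name, and the early return assumes dhctl exits non-zero on failure.

```shell
# Path to the tool; set DHCTL beforehand if it is installed elsewhere.
DHCTL="${DHCTL:-/usr/illumon/latest/bin/dhctl}"

# rescan_tables TABLE...   (hypothetical helper)
# Sends one rescan request per table, stopping at the first failure.
rescan_tables() {
    for table in "$@"; do
        "$DHCTL" intraday rescan --table "$table" || return 1
    done
}
```

For example, rescan_tables DbInternal.ProcessEventLog DbInternal.AuditEventLog, where the second table name is only an illustration.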

Historical

dhctl metadata subcommands can be used to inspect, validate, and update the state of historical metadata indexes. Each subcommand expects a list of one or more namespaces or tables to be specified as either * for everything, Namespace for an entire namespace, or Namespace.TableName for a specific table.
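When passing * on a command line, quote it so the shell hands a literal asterisk to dhctl instead of expanding it to file names in the current directory. A minimal sketch of the three specifier forms, using hypothetical wrapper names and omitting authentication options:

```shell
# Path to the tool; set DHCTL beforehand if it is installed elsewhere.
DHCTL="${DHCTL:-/usr/illumon/latest/bin/dhctl}"

# Everything: single quotes keep the shell from globbing the asterisk.
list_everything() {
    "$DHCTL" metadata list --table-name '*'
}

# One or more namespaces and/or tables, for example:
#   list_tables Namespace1 Namespace2.MyTable
list_tables() {
    "$DHCTL" metadata list --table-name "$@"
}
```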

Run dhctl metadata --help to see the command syntax and available options:

usage: dhctl metadata update [-h] [-k <arg> | -user <arg>] [-pf <arg>] -t <arg>  [-v]

Updates the entire metadata index for the specified namespaces and tables.
 -h,--help               print help for a metadata command
 -k,--key <arg>          specify a private key file to use for authentication
 -pf,--pwfile <arg>      specify a file containing the base64 encoded password for the user that is set with --user
 -t,--table-name <arg>   The table or tables to act upon in Namespace.TableName or Namespace format.
                         A wildcard `*` may be used to select all tables in all system namespaces.
 -user,--user <arg>      specify a user for authentication
 -v,--verbose            print additional logging, progress messages, and full exception text

Example:
Update the metadata for the MarketUs namespace
    dhctl metadata update --table-name MarketUs

List

The dhctl metadata list command lists the metadata index for the specified tables.

Run dhctl metadata list --help to see the command syntax and available options:

usage: dhctl metadata list [-e <arg>] [-f <arg>] [-h] [-k <arg> | -user <arg>] [-p <arg>] [-pf <arg>] -t <arg>  [-v]

Lists all of the metadata snapshots in the specified namespaces and tables
 -e,--partition-filter <arg>   A filter expression to select which column partitions to validate.
                               The Partition column is always `Partition`
 -f,--file <arg>               A file to write the list output to. Optional. Defaults to stdout
 -h,--help                     print help for a metadata command
 -k,--key <arg>                specify a private key file to use for authentication
 -p,--partitions <arg>         The Column partition or partitions to validate.

 -pf,--pwfile <arg>            specify a file containing the base64 encoded password for the user that is set with
                               --user
 -t,--table-name <arg>         The table or tables to act upon in Namespace.TableName or Namespace format.
                               A wildcard `*` may be used to select all tables in all system namespaces.
 -user,--user <arg>            specify a user for authentication
 -v,--verbose                  print additional logging, progress messages, and full exception text

Examples:
List the metadata for every table in all system namespaces into /tmp/locationData.csv
    dhctl metadata list --table-name * --file /tmp/LocationData.csv
List all locations in Namespace1 and the table Namespace2.MyTable to stdout
    dhctl metadata list --table-name Namespace1 Namespace2.MyTable
List all locations with a partition of `2023-06-15` or `2023-07-16` in Namespace1 and the table Namespace2.MyTable to
stdout
    dhctl metadata list --table-name Namespace1 Namespace2.MyTable --partitions 2023-06-15 2023-07-15

Example

This command lists all metadata for the LearnDeephaven namespace:

$ /usr/illumon/latest/bin/dhctl metadata list -v -t LearnDeephaven

LearnDeephaven.StockQuotes: 1 total locations.
LearnDeephaven.EODTrades: 1 total locations.
LearnDeephaven.StockTrades: 5 total locations.
     Namespace|  TableName|ColumnPartition|InternalPartition|                                                          Path|    Format|ColumnVersion|                Size|                 LastModifiedTime
--------------+-----------+---------------+-----------------+--------------------------------------------------------------+----------+-------------+--------------------+---------------------------------
LearnDeephaven|StockQuotes|2017-08-25     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-25/StockQuotes|DEEPHAVEN |            1|             1547437|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|EODTrades  |2017-11-01     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-11-01/EODTrades  |DEEPHAVEN |            1|              656894|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-25     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-25/StockTrades|DEEPHAVEN |            1|              576170|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-21     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-21/StockTrades|DEEPHAVEN |            1|              703883|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-22     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-22/StockTrades|DEEPHAVEN |            1|              674529|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-23     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-23/StockTrades|DEEPHAVEN |            1|              598675|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-24     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-24/StockTrades|DEEPHAVEN |            1|              605396|1969-12-31T19:00:00.000000000 NY
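Because the listing is pipe-delimited, it is easy to post-process. The sketch below totals the Size column per table. size_by_table is a hypothetical name, and it assumes the stdout layout shown above, with Size as the eighth column; the format written by --file may differ.

```shell
# size_by_table (hypothetical helper)
# Reads `dhctl metadata list` output on stdin and prints
# "Namespace.TableName total" for each table, sorted by name.
size_by_table() {
    awk -F'|' '
        # Keep only data rows: the header, the separator, and the
        # "total locations" lines have no numeric eighth field.
        NF >= 8 && $8 ~ /^ *[0-9]+ *$/ {
            table = $1 "." $2
            gsub(/ /, "", table)      # strip column padding
            total[table] += $8
        }
        END { for (t in total) printf "%s %.0f\n", t, total[t] }
    ' | sort
}
```

Use it as dhctl metadata list -t LearnDeephaven | size_by_table.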

Validate

The dhctl metadata validate command processes the specified table metadata index, compares it with the state of each location on disk, and reports if there are any discrepancies.

Note

To validate the metadata index, every location must be visited and read from disk. This can take a long time when there are many locations.

Run dhctl metadata validate --help to see the command syntax and available options:

usage: dhctl metadata validate [-e <arg>] [-h] [-k <arg> | -user <arg>] [-p <arg>] [-pf <arg>] -t <arg>  [-v]

Validates that each metadata snapshot present for the specified namespaces and tables
matches the actual state present at each location and generates a report.
 -e,--partition-filter <arg>   A filter expression to select which column partitions to validate.
                               The Partition column is always `Partition`
 -h,--help                     print help for a metadata command
 -k,--key <arg>                specify a private key file to use for authentication
 -p,--partitions <arg>         The Column partition or partitions to validate.

 -pf,--pwfile <arg>            specify a file containing the base64 encoded password for the user that is set with
                               --user
 -t,--table-name <arg>         The table or tables to act upon in Namespace.TableName or Namespace format.
                               A wildcard `*` may be used to select all tables in all system namespaces.
 -user,--user <arg>            specify a user for authentication
 -v,--verbose                  print additional logging, progress messages, and full exception text

Examples:
Validate the metadata for every table in all system namespaces.  The `*` should be quoted to avoid
    shell file globbing.
    dhctl metadata validate --table-name `*`
Validate all locations in Namespace1 and the table Namespace2.MyTable with verbose details
    dhctl metadata validate --verbose --table-name Namespace1 Namespace2.MyTable
Validate locations with a partition value greater than `2023-06-15` in the table Namespace2.MyTable with verbose details
    dhctl metadata validate --verbose --table-name Namespace1 Namespace2.MyTable --partition-filter "Partition >
`2023-06-15`"

Example

This command validates the metadata for the LearnDeephaven namespace:

$ /usr/illumon/latest/bin/dhctl metadata validate -t LearnDeephaven

Validating LearnDeephaven.StockQuotes: 1 total locations.
Validating LearnDeephaven.EODTrades: 1 total locations.
Validating LearnDeephaven.StockTrades: 5 total locations.
LearnDeephaven.StockTrades.P.0.2017-08-21:
	Location size (674529) does not match checkpoint size (703883)

Update

The dhctl metadata update command reads table data on disk and produces a new metadata index for each table specified. This command can be used to build a metadata index for a table that never had one, or to repair the index if problems were found using the validate subcommand.

Note

To update the metadata index, every location must be visited and read from disk. This can take a long time when there are many locations.
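Validation and repair can be chained so that the index is rebuilt only when a problem is reported. This is a sketch under assumptions: repair_if_invalid is a hypothetical name, authentication options are omitted, and a discrepancy is recognized only by the "does not match" wording seen in the validate example in this guide. Other discrepancy messages may exist and would not trigger the update.

```shell
# Path to the tool; set DHCTL beforehand if it is installed elsewhere.
DHCTL="${DHCTL:-/usr/illumon/latest/bin/dhctl}"

# repair_if_invalid TABLE   (hypothetical helper)
# TABLE is Namespace or Namespace.TableName. Prints the validation report,
# then runs an update if the report mentions a mismatch.
repair_if_invalid() {
    report=$("$DHCTL" metadata validate --table-name "$1" 2>&1)
    printf '%s\n' "$report"
    if printf '%s\n' "$report" | grep -q 'does not match'; then
        "$DHCTL" metadata update --table-name "$1"
    fi
}
```

Both steps read every location from disk, so expect a repair to take roughly as long as two full passes over the table.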

Run dhctl metadata update --help to see the command syntax and available options:

usage: dhctl metadata update [-h] [-k <arg> | -user <arg>] [-pf <arg>] -t <arg>  [-v]

Updates the entire metadata index for the specified namespaces and tables.
 -h,--help               print help for a metadata command
 -k,--key <arg>          specify a private key file to use for authentication
 -pf,--pwfile <arg>      specify a file containing the base64 encoded password for the user that is set with --user
 -t,--table-name <arg>   The table or tables to act upon in Namespace.TableName or Namespace format.
                         A wildcard `*` may be used to select all tables in all system namespaces.
 -user,--user <arg>      specify a user for authentication
 -v,--verbose            print additional logging, progress messages, and full exception text

Example:
Update the metadata for the MarketUs namespace
    dhctl metadata update --table-name MarketUs

Example

Update the metadata for the DbInternal.ProcessEventLog table:

$ sudo -u irisadmin /usr/illumon/latest/bin/dhctl metadata update --table-name DbInternal.ProcessEventLog
Updating table location metadata index for DbInternal.ProcessEventLog

Log output

The dhctl script writes a log file to /var/log/deephaven/misc if the current user has write permission there, or to /tmp otherwise. The log file name is based on the date and time the command was run.