Data control tool

The command line tool dhctl allows you to send commands to Data Import Servers (DIS) and inspect historical metadata indexes. Like the configuration tool, dhconfig, it includes some built-in guidance. All fields and arguments can be abbreviated, as long as they remain unambiguous.

For example, to get help:

dhctl --help
Usage: dhctl [intraday|metadata] [truncate|delete|rescan|help|list|validate|update] [arguments]

dhctl intraday

On any Deephaven server, /usr/illumon/latest/bin/dhctl intraday subcommands can be used to instruct DIS instances to truncate or delete intraday partitions, or to scan for new data.

Deleting intraday data

dhctl intraday truncate --help
usage: dhctl intraday truncate [-d] [-e <arg>] [-h] [-i <arg>] [-k <arg> | --user <arg>] [-p <arg>] [-part <arg>] [-pf
       <arg>] [-s <arg>]
 -d,--dry-run                 print what actions would be performed without actually doing it
 -e,--exclude-dis <arg>       do not send requests to data import servers specified in an exclude parameter
 -h,--help                    print help for a intraday command
 -i,--include-dis <arg>       send requests only to data import servers specified in an include parameter
 -k,--key <arg>               specify a private key file to use for authentication
 -p,--password <arg>          specify a password for the given user
 -part,--partitions <arg>     partition to act on (all internal partitions), as namespace.tableName.columnPartition
 -pf,--pwfile <arg>           specify a file containing the base64 encoded password for the given user
 -s,--singlePartition <arg>   single partition to act on, as namespace.tableName.internalPartition.columnPartition
    --user <arg>              specify a user for authentication

truncate removes data at the specified partition(s) and disables further data ingestion.
Once truncated, partitions may be deleted with the delete action, and then new data can be ingested.

The intraday control tool uses the configured data routing to locate the Data Import Server or Servers responsible for the data specified on the command line. All applicable servers will be instructed to truncate, delete, or rescan the data location(s) unless they are excluded. Any DIS listed in an --exclude-dis argument will be skipped. If any DIS is included with an --include-dis argument, then any DIS not explicitly included will be skipped.
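
The include and exclude rule above can be modeled as a small shell function. This is only an illustration of the rule as described, not dhctl's own code; the function name dis_is_targeted is hypothetical, and the choice that an exclusion wins when a DIS appears in both lists is an assumption.

```shell
#!/bin/sh
# Illustrative model of the --include-dis / --exclude-dis rule; not dhctl code.
# Usage: dis_is_targeted NAME "INCLUDES" "EXCLUDES"
# INCLUDES and EXCLUDES are space-separated lists of DIS names (may be empty).
# Returns 0 if the DIS would receive the request, 1 if it would be skipped.
dis_is_targeted() {
    name=$1
    includes=$2
    excludes=$3

    # An excluded DIS is always skipped.
    # (Assumption: exclusion wins if a name appears in both lists.)
    for e in $excludes; do
        [ "$e" = "$name" ] && return 1
    done

    # With no include list, every remaining DIS is targeted.
    [ -z "$includes" ] && return 0

    # With an include list, only the listed DISs are targeted.
    for i in $includes; do
        [ "$i" = "$name" ] && return 0
    done
    return 1
}
```

For example, dis_is_targeted db_dis "" "db_dis_backup" succeeds, while dis_is_targeted Ingester1 "db_dis" "" fails because Ingester1 is not in the include list.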

Authentication is required to remove data or to rescan. dhctl will attempt to use the iris user default private key for authentication if it is readable. In most installations, if you invoke dhctl with sudo -u irisadmin dhctl, the tool will automatically authenticate as the iris user. Otherwise, you need to provide a username and password, or the location of an authorized private key file.

The authenticated user requires membership in the iris-superusers or iris-datamanagers groups.
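
When a password is required, the --pwfile option keeps it off the command line, where it could otherwise appear in process listings and shell history. The sketch below shows one way to create such a file. The function name write_pwfile is hypothetical. The help text does not say whether a trailing newline in the encoded value matters, so this sketch encodes the password without one.

```shell
#!/bin/sh
# Write a base64-encoded password file for use with --pwfile.
# Usage: write_pwfile PASSWORD OUTPUT_FILE
write_pwfile() {
    # umask 077 makes a newly created file readable only by its owner.
    # printf '%s' avoids encoding a trailing newline (an assumption about
    # what dhctl expects, not documented behavior).
    ( umask 077 && printf '%s' "$1" | base64 > "$2" )
}
```

The resulting file can then be passed as, for example, dhctl intraday truncate --user iris --pwfile /path/to/pwfile followed by the remaining options.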

Deleting intraday data happens in two steps:

  1. The first step is to truncate a location. This removes the data and marks the partition as permanently truncated. Any tailers for that partition will be disconnected, and any future attempt to tail data for that partition will be rejected.
  2. After a partition has been truncated, you may prepare to accept new data into the same partition with the delete command. The delete command is permitted only when all the target partitions have been truncated.

If you want to append new data to the same location (with the same internal partition value), you will need to first truncate the partition, then remove any intraday log (.bin) files the tailer would send for that partition, and then delete the partition. At that point, the system is prepared to accept new data for the location.
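
The sequence described above (truncate, remove leftover .bin files, delete) can be sketched as a shell function. This is a minimal outline under stated assumptions: the function name replace_partition_data is hypothetical, authentication options are omitted on the assumption that default key authentication works, and the cleanup of .bin files is left to a command you supply because their location depends on your logger configuration.

```shell
#!/bin/sh
# Command to run. Override it (for example DHCTL=echo) to preview the
# commands without contacting any Data Import Server.
DHCTL=${DHCTL:-/usr/illumon/latest/bin/dhctl}

# Usage: replace_partition_data NAMESPACE.TABLE.COLUMN_PARTITION CLEANUP_CMD [ARGS...]
# CLEANUP_CMD is whatever removes the intraday .bin files for the partition.
replace_partition_data() {
    if [ $# -lt 2 ]; then
        echo "usage: replace_partition_data KEY CLEANUP_CMD [ARGS...]" >&2
        return 2
    fi
    key=$1
    shift

    # Step 1: truncate. Removes the data and blocks further ingestion.
    "$DHCTL" intraday truncate --partitions "$key" || return 1

    # Step 2: remove the .bin files the tailer would otherwise resend.
    "$@" || return 1

    # Step 3: delete. Permitted only once every target partition is truncated.
    "$DHCTL" intraday delete --partitions "$key"
}
```

Stopping at the first failure matters here: running the delete step after a failed cleanup would let the tailer resend the old .bin files into the partition.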

One of --singlePartition or --partitions must be specified:

  • singlePartition removes a single data directory.
  • partitions removes the data directories for all internal partition values that match the key.
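
The two key formats differ only in whether the internal partition is named. As a small illustration, the hypothetical helpers below assemble each form from its parts, following the formats shown in the help text.

```shell
#!/bin/sh
# Key for --partitions: all internal partitions of one column partition.
# Usage: partitions_key NAMESPACE TABLE_NAME COLUMN_PARTITION
partitions_key() {
    printf '%s.%s.%s\n' "$1" "$2" "$3"
}

# Key for --singlePartition: exactly one internal partition.
# Usage: single_partition_key NAMESPACE TABLE_NAME INTERNAL_PARTITION COLUMN_PARTITION
single_partition_key() {
    printf '%s.%s.%s.%s\n' "$1" "$2" "$3" "$4"
}
```

For example, partitions_key DbInternal ProcessEventLog 2021-07-07 prints DbInternal.ProcessEventLog.2021-07-07.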

Re-scanning locations

If you add data to /db/Intraday without going through the DIS (e.g., batch or CSV import, or by copying files), the DIS will not be aware of the new data if it has already scanned that table.

The rescan command uses the default data routing to determine which DISs to contact for the --table argument, or for all intraday data if tables are not specified. Each DIS is sent a command instructing it to rescan active intraday locations for new directories.
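
After a batch import that touches several tables, a rescan can be requested for each one in turn. The sketch below assumes default key authentication is available (for example, when run as irisadmin). The function name rescan_tables is hypothetical. It issues one dhctl call per table because the text above does not say whether --table accepts several values at once.

```shell
#!/bin/sh
# Command to run. Override it (for example DHCTL=echo) to preview the commands.
DHCTL=${DHCTL:-/usr/illumon/latest/bin/dhctl}

# Usage: rescan_tables NAMESPACE.TABLE [NAMESPACE.TABLE...]
# Returns non-zero if any rescan request failed. With no arguments it does
# nothing, so it never falls back to rescanning all tables by accident.
rescan_tables() {
    status=0
    for table in "$@"; do
        "$DHCTL" intraday rescan --table "$table" || status=1
    done
    return "$status"
}
```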

Examples

Any invocation of truncate or delete may be a “dry run”, in which case the result will be a list of what the command would have done.

truncate

This command performs a dry run of truncating all internal partitions for 2021-07-07, excluding a backup DIS.

$ dhctl intraday truncate --user iris --password iris --exclude-dis db_dis_backup --partitions DbInternal.ProcessEventLog.2021-07-07 --dry-run
Authenticating connection using user and password…
...

DIS: db_dis_backup is excluded by command line parameter
DIS: db_dis_backup
    Result: SKIPPED
    Location results: 0
DIS: db_dis
    Result: SUCCESS
    Location results: 3
        {key=DbInternal.ProcessEventLog.I.db_query_server_servername-bhs-vm_int_illumon_com.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/db_query_server_servername-bhs-vm_int_illumon_com/2021-07-07/ProcessEventLog, message=Dry run, result=DRY_RUN, hasActiveProcessor=true}
        {key=DbInternal.ProcessEventLog.I.servername-bhs-vm_int_illumon_com.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/servername-bhs-vm_int_illumon_com/2021-07-07/ProcessEventLog, message=Dry run, result=DRY_RUN, hasActiveProcessor=true}
        {key=DbInternal.ProcessEventLog.I.db_merge_server_servername-bhs-vm_int_illumon_com.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/db_merge_server_servername-bhs-vm_int_illumon_com/2021-07-07/ProcessEventLog, message=Dry run, result=DRY_RUN, hasActiveProcessor=true}

This command is similar to the previous, but truncates only a single location, having internal partition value "db_query_server_servername-bhs-vm_int_illumon_com".

$ dhctl intraday truncate --user iris --password iris --exclude-dis db_dis_backup --singlePartition DbInternal.ProcessEventLog.db_query_server_servername-bhs-vm_int_illumon_com.2021-07-07 --dry-run
Authenticating connection using user and password…
...

DIS: db_dis_backup is excluded by command line parameter
DIS: db_dis_backup
    Result: SKIPPED
    Location results: 0
DIS: db_dis
    Result: SUCCESS
    Location results: 1
        {key=DbInternal.ProcessEventLog.I.db_query_server_servername-bhs-vm_int_illumon_com.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/db_query_server_servername-bhs-vm_int_illumon_com/2021-07-07/ProcessEventLog, message=Dry run, result=DRY_RUN, hasActiveProcessor=true}

Note that in these examples, the results include hasActiveProcessor=true. A value of true here indicates that the partition has an active tailer, so it’s likely this partition is not ready to be truncated yet.
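
A dry run can be screened for active tailers before the real truncate is issued. The following is a minimal sketch that assumes the location lines appear in the output exactly as in the examples above; the function name active_locations is hypothetical.

```shell
#!/bin/sh
# Read dhctl dry-run output on stdin and print the key of every location
# that reports hasActiveProcessor=true. Prints nothing if there are none.
active_locations() {
    sed -n '/hasActiveProcessor=true/s/.*{key=\([^,]*\),.*/\1/p'
}
```

A possible use is to pipe the output of dhctl intraday truncate with --dry-run into active_locations and proceed only when it prints nothing.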

delete

The delete command is similar. A delete proceeds only if it can succeed for every partition on every Data Import Server, and the dry run output shows whether that is the case.

$ dhctl intraday delete --user iris --password iris --exclude-dis db_dis_backup --partitions DbInternal.ProcessEventLog.2021-07-07 --dry-run
...
DIS: db_dis2
    Result: SKIPPED
    Location results: 0
DIS: db_dis
    Result: FAILURE
    Location results: 3
        {key=DbInternal.ProcessEventLog.I.db_query_server_servername-bhs-vm_int_illumon_com.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/db_query_server_servername-bhs-vm_int_illumon_com/2021-07-07/ProcessEventLog, message=Location has not been truncated, result=FAILED, hasActiveProcessor=true}
        {key=DbInternal.ProcessEventLog.I.servername-bhs-vm_int_illumon_com.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/servername-bhs-vm_int_illumon_com/2021-07-07/ProcessEventLog, message=Location has not been truncated, result=FAILED, hasActiveProcessor=true}
        {key=DbInternal.ProcessEventLog.I.db_merge_server_servername-bhs-vm_int_illumon_com.2021-07-07, dir=/db/Intraday/DbInternal/ProcessEventLog/db_merge_server_servername-bhs-vm_int_illumon_com/2021-07-07/ProcessEventLog, message=Location has not been truncated, result=FAILED, hasActiveProcessor=true}
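
The per-DIS result lines can be summarized in a script, which helps when deciding whether a delete would proceed. This sketch assumes the "DIS:" and "Result:" lines are laid out as in the examples above; the function names dis_results and no_dis_failures are hypothetical.

```shell
#!/bin/sh
# Read dhctl truncate/delete output on stdin and print "<dis> <result>"
# for each DIS block.
dis_results() {
    awk '
        $1 == "DIS:" && NF == 2 { dis = $2; next }   # "DIS: name" header line
        $1 == "Result:" && dis != "" { print dis, $2; dis = "" }
    '
}

# Read the same output on stdin; succeed only if no DIS reported FAILURE.
no_dis_failures() {
    ! dis_results | grep -q ' FAILURE$'
}
```

The notice line "DIS: db_dis_backup is excluded by command line parameter" has more than two fields, so it is not mistaken for a block header.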

rescan

This command instructs the DIS handling table DbInternal.ProcessEventLog to look for new data:

$ sudo -u irisadmin /usr/illumon/latest/bin/dhctl intraday rescan --table DbInternal.ProcessEventLog
...
Re-scan Result: SUCCESS
    DIS: db_dis(DbInternal.ProcessEventLog)
        Result: SUCCESS

This command instructs all DISs to look for new data for all tables:

$ sudo -u irisadmin /usr/illumon/latest/bin/dhctl intraday rescan
...
Re-scan Result: SUCCESS
    DIS: db_dis(all)
        Result: SUCCESS
    DIS: Ingester1(all)
        Result: SUCCESS

Scripting API

This functionality is available from any Legacy Groovy worker with sufficient permissions.

Truncate and delete intraday partitions

Full control over parameters is provided through a builder, described below. For common cases, helper methods offer a shorter form. These helper methods do not support dry runs or selecting which Data Import Servers are included.

import io.deephaven.configuration.IntradayControlImpl

// truncate the location(s) specified by the key
IntradayControlImpl.truncateIntradayPartition(FullTableLocationKey)

// truncate the partitions for this table and column partition value (like --partitions)
IntradayControlImpl.truncateIntradayPartition(namespace, tableName, columnPartitionValue)

// truncate the single partition for this table and partition values (like --singlePartition)
IntradayControlImpl.truncateIntradayPartition(namespace, tableName, internalPartitionValue, columnPartitionValue)

// truncate partition(s) as indicated by the options argument (see Options Builder below)
IntradayControlImpl.truncateIntradayPartition(options)

// delete the location(s) specified by the key
IntradayControlImpl.deleteIntradayPartition(FullTableLocationKey)

// delete the partitions for this table and column partition value (like --partitions)
IntradayControlImpl.deleteIntradayPartition(namespace, tableName, columnPartitionValue)

// delete the single partition for this table and partition values (like --singlePartition)
IntradayControlImpl.deleteIntradayPartition(namespace, tableName, internalPartitionValue, columnPartitionValue)

// delete partition(s) as indicated by the options argument (see Options Builder below)
IntradayControlImpl.deleteIntradayPartition(options)

Tip

You will want to check the results of these commands.

result = IntradayControlImpl.truncateIntradayPartition(...)
println result

println IntradayControlImpl.truncateIntradayPartition(...)

Options Builder

The options builder can be useful when programmatically constructing complex truncate or delete commands.

import io.deephaven.configuration.IntradayControlImpl.Options
import io.deephaven.configuration.IntradayControlImpl
builder = Options.builder()

Modify the builder with the desired options, much like the dhctl command line options.

Set the partition to truncate or delete:

builder.key("namespace", "tableName", "columnPartition")
builder.key("namespace", "tableName", "internalPartition", "columnPartition")
builder.key(key)

For example:

key = new FullTableLocationKey.AggregateTableLocationKey("DbInternal", "ProcessEventLog", SYSTEM_INTRADAY, lastBusinessDateNy())
builder.key(key)

builder.key("DbInternal", "ProcessEventLog", lastBusinessDateNy())

Note

Only one key is permitted at this time. You can call one of the key() methods again, but you must call builder.clearKey() in between.

Dry run options

Change the dry run option:

builder.dryRun()     // delegates to builder.dryRun(true)
builder.dryRun(true) // perform a dry run, and do not perform the truncate or delete action

Authentication

Change the authentication (by default, the command will be run using the default authentication of the worker):

builder.authenticate("username", "password")
builder.authenticate("username", "password", "operateAsUser")
builder.authenticate("path_to_keyfile") // keyfile must be readable by the worker
builder.authenticate() // do default authentication, according to process environment and properties

Add a DIS

Add a DIS to the include or exclude list:

builder.include("dis_name")
builder.exclude("dis_name")

Build options

Build the options as configured in the builder. You may call build() multiple times. This allows you to use a builder to check a dry run and then perform the delete, or to change the key in a loop.

opts = builder.build()

You can invoke the truncate or delete methods directly from the builder or options:

result = builder.doTruncate()
result = opts.doTruncate()
println result
result = builder.doDelete()
result = opts.doDelete()
println result

You can pass the options to the Intraday Control tool and check the results:

result = IntradayControlImpl.truncateIntradayPartition(opts)
println result

You can display the contents of the builder or options:

println builder
println opts

Check the results

Truncate and delete commands return result objects containing detailed information about the operation. The intraday operations can be complex, so the result object is also necessarily complex. The DISCommandUtil.ActionResult object has an overall result that indicates success or failure of the operation as a whole. It also contains a map of results for each DIS that was involved in the operation. There is an overall result code for each DIS, and a collection of results for the individual locations that were processed. The code examples below illustrate how to check the results at various levels of detail. These are intended as examples; adjust the code to suit your needs.

Example: execute a truncate dry run, and then execute the truncate only if none of the locations to be truncated are still in use:

import io.deephaven.configuration.IntradayControlImpl
import com.illumon.iris.db.tables.dataimport.logtailer.DISCommandUtil

namespace="DbInternal"
tableName="ProcessEventLog"
date=currentDateNy()

def builder = IntradayControlImpl.Options.builder()
        .key(namespace, tableName, date)
        .dryRun(true)

dryRunResult = builder.doTruncate()

// verify the overall result
okToTruncate = dryRunResult.getSummaryResult() == DISCommandUtil.ActionResult.Result.SUCCESS

// verify that every DIS result is SUCCESS
okToTruncate = okToTruncate &&
               dryRunResult.getDisResultsMap().values().every { disResult -> disResult.getResult() == DISCommandUtil.ActionResult.Result.SUCCESS }

// check for locations being actively tailed
hasActiveProcessor = dryRunResult.getDisResultsMap().values().any { disResult -> disResult.getLocationResults().any { locationResult -> locationResult.hasActiveProcessor() } }
okToTruncate = okToTruncate && !hasActiveProcessor
println okToTruncate

if (okToTruncate) {
    builder.dryRun(false)
    commandResult = builder.doTruncate()
    println commandResult
}

Example: execute a truncate command, and then delete only if the truncate was successful:

import io.deephaven.configuration.IntradayControlImpl
import com.illumon.iris.db.tables.dataimport.logtailer.DISCommandUtil

namespace="DbInternal"
tableName="ProcessEventLog"
date="2025-03-19"

truncateResult = IntradayControlImpl.truncateIntradayPartition(namespace, tableName, date)
printf "Truncate %s.%s partition %s: \n", namespace, tableName, date
println truncateResult

// verify the overall result
if (truncateResult.getSummaryResult() == DISCommandUtil.ActionResult.Result.SUCCESS &&
    truncateResult.getDisResultsMap().size() >= 1 &&
    truncateResult.getDisResultsMap().values().every { disResult -> disResult.getResult() == DISCommandUtil.ActionResult.Result.SUCCESS }) {
    deleteResult = IntradayControlImpl.deleteIntradayPartition(namespace, tableName, date)
    printf "Delete %s.%s partition %s: \n", namespace, tableName, date
    println deleteResult
}

Caveats

  • This method makes a best-effort attempt to delete everything on all appropriate Data Import Servers. This cannot be atomic, so the operation might have only partial success. Make sure you check all the results.
  • The truncated partitions are marked as permanently truncated, and further ingestion of data will be disallowed. This is to prevent confusion if loggers produce new data for the partition, or if tailers have not finished all existing data files.
  • Before logging new data for the truncated partitions, remove any existing data files (bin files), and then delete the partitions with dhctl intraday delete ....
  • It is possible to delete data on one Data Import Server and leave it on another (e.g., a backup). Be extremely careful with this, as it can create confusion.

Rescan tables

This command instructs the DIS handling table DbInternal.ProcessEventLog to look for new data:

import io.deephaven.configuration.IntradayControlImpl

result = IntradayControlImpl.rescanTable("DbInternal", "ProcessEventLog")
println result
...
Re-scan Result: SUCCESS
     DIS: db_dis(all)
        Result: SUCCESS

This command instructs all DISs to look for new data for all tables:

import io.deephaven.configuration.IntradayControlImpl

result = IntradayControlImpl.rescanAll()
println result
...
Re-scan Result: SUCCESS
     DIS: db_dis(all)
        Result: SUCCESS
     DIS: Ingester1(all)
        Result: SUCCESS

dhctl metadata

/usr/illumon/latest/bin/dhctl metadata subcommands can be used to inspect, validate, and update the state of historical metadata indexes. Each subcommand expects a list of one or more namespaces or tables, specified as * for everything, Namespace for an entire namespace, or Namespace.TableName for a specific table.

dhctl metadata update --help
usage: dhctl metadata update [-h] [-k <arg> | -user <arg>] [-p <arg>] [-pf <arg>] -t <arg>  [-v]

Updates the entire metadata index for the specified namespaces and tables.
 -h,--help               print help for a metadata command
 -k,--key <arg>          specify a private key file to use for authentication
 -p,--password <arg>     specify a password for the given user
 -pf,--pwfile <arg>      specify a file containing the base64 encoded password for the given user
 -t,--table-name <arg>   The table or tables to act upon in Namespace.TableName or Namespace format.
                         A wildcard `*` may be used to select all tables in all system namespaces.
 -user,--user <arg>      specify a user for authentication
 -v,--verbose            print additional logging, progress messages, and full exception text

Example:
Update the metadata for the MarketUs namespace
    dhctl metadata update --table-name MarketUs

List

The dhctl metadata list command reads the metadata index for the specified tables and produces a listing in tabular format, or in CSV format if the --file option is specified.

For example:

$ /usr/illumon/latest/bin/dhctl metadata list -v -t LearnDeephaven

LearnDeephaven.StockQuotes: 1 total locations.
LearnDeephaven.EODTrades: 1 total locations.
LearnDeephaven.StockTrades: 5 total locations.
     Namespace|  TableName|ColumnPartition|InternalPartition|                                                          Path|    Format|ColumnVersion|                Size|                 LastModifiedTime
--------------+-----------+---------------+-----------------+--------------------------------------------------------------+----------+-------------+--------------------+---------------------------------
LearnDeephaven|StockQuotes|2017-08-25     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-25/StockQuotes|DEEPHAVEN |            1|             1547437|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|EODTrades  |2017-11-01     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-11-01/EODTrades  |DEEPHAVEN |            1|              656894|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-25     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-25/StockTrades|DEEPHAVEN |            1|              576170|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-21     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-21/StockTrades|DEEPHAVEN |            1|              703883|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-22     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-22/StockTrades|DEEPHAVEN |            1|              674529|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-23     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-23/StockTrades|DEEPHAVEN |            1|              598675|1969-12-31T19:00:00.000000000 NY
LearnDeephaven|StockTrades|2017-08-24     |0                |/db/Systems/LearnDeephaven/Partitions/0/2017-08-24/StockTrades|DEEPHAVEN |            1|              605396|1969-12-31T19:00:00.000000000 NY
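
The tabular listing is pipe-delimited, so it can be summarized with standard tools. The sketch below totals the Size column per table. It assumes the nine-column layout shown above; the function name list_sizes is hypothetical.

```shell
#!/bin/sh
# Read tabular `dhctl metadata list` output on stdin and print
# "Namespace.TableName <locations> <total size>" for each table.
list_sizes() {
    awk -F'|' '
        NF == 9 && $8 ~ /^ *[0-9]+ *$/ {     # data rows only; skips header and rule
            table = $1 "." $2
            gsub(/ /, "", table)             # drop column padding
            count[table]++
            total[table] += $8
        }
        END {
            for (t in total) print t, count[t], total[t]
        }
    ' | sort
}
```

Summary lines such as "LearnDeephaven.StockTrades: 5 total locations." contain no pipe characters and are ignored.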

Validate

The dhctl metadata validate command processes the specified table metadata index and compares it with the state of each location on disk, reporting if there are any discrepancies.

Note

To validate the metadata index, every location must be visited and read from disk. When reading many locations, this can take a long time.

For example:

$ /usr/illumon/latest/bin/dhctl metadata validate -t LearnDeephaven

Validating LearnDeephaven.StockQuotes: 1 total locations.
Validating LearnDeephaven.EODTrades: 1 total locations.
Validating LearnDeephaven.StockTrades: 5 total locations.
LearnDeephaven.StockTrades.P.0.2017-08-21:
	Location size (674529) does not match checkpoint size (703883)
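
Validation output can be reduced to a list of mismatched locations, for example to decide which tables need dhctl metadata update. This sketch recognizes only the size-mismatch message shown above; other discrepancy messages, if any exist, are not handled. The function name validate_mismatches is hypothetical.

```shell
#!/bin/sh
# Read `dhctl metadata validate` output on stdin and print
# "<location key> <location size> <checkpoint size>" for each size mismatch.
validate_mismatches() {
    awk '
        /^[^ \t].*:$/ { key = substr($0, 1, length($0) - 1); next }   # "location.key:" line
        /Location size \([0-9]+\) does not match checkpoint size \([0-9]+\)/ {
            gsub(/[()]/, "", $3)    # field 3 is "(location size)"
            gsub(/[()]/, "", $9)    # field 9 is "(checkpoint size)"
            print key, $3, $9
        }
    '
}
```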

Update

The dhctl metadata update command reads the physical data on disk and produces a new metadata index for each table specified. This command can be used to build a metadata index for a table that never had one, or to repair the index if problems were found using the validate subcommand.

Note

To update the metadata index, every location must be visited and read from disk. When reading many locations, this can take a long time.

Logging

The dhctl script creates a log file in /var/log/deephaven/misc if the current user has write permission there, and in /tmp if not. The scripting API (IntradayControlImpl) logs to the DbInternal ProcessEventLog table. In both cases, actions are logged to the DbInternal AuditEventLog table.
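
To find where a given user's dhctl logs will land, the rule above can be mirrored in a few lines of shell. This is an approximation of the described behavior, not the script's actual logic. The function name dhctl_log_dir is hypothetical, and the log file name itself is not stated here, so only the directory is reported.

```shell
#!/bin/sh
# Print the directory dhctl is expected to log to for the current user,
# following the rule described above.
# Usage: dhctl_log_dir [PREFERRED_DIR]
dhctl_log_dir() {
    preferred=${1:-/var/log/deephaven/misc}
    if [ -d "$preferred" ] && [ -w "$preferred" ]; then
        printf '%s\n' "$preferred"
    else
        printf '%s\n' /tmp
    fi
}
```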