Data control from scripts

The data control tool (dhctl) is a command-line tool for managing intraday data. The same functionality is available from any Core+ worker with sufficient permissions, in both Groovy and Python.

Truncate and delete intraday partitions

Truncate and delete operations target intraday partitions. Truncating removes the data and marks the partition as permanently truncated. Deleting removes the partition directories entirely; partitions must be truncated before they can be deleted.

Simple usage

The following patterns use default authentication and target all Data Import Servers configured for the table.

Tip

Always check the results of these commands.

Options

For full control over parameters, including dry runs and DIS filtering, pass additional keyword arguments in Python or use Options.builder() in Groovy.

Dry run options

DIS include/exclude
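
As a rough illustration of include/exclude filtering, the sketch below selects target servers from a list of names. The helper `select_dis_targets`, its parameters, and the DIS names are hypothetical and are not part of the product API. The precedence rule (no include list means all configured servers, and exclude always wins) is an assumption made for this example.

```python
from typing import Iterable, List, Optional


def select_dis_targets(
    configured: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper: pick which DIS names an operation would target.

    Assumed rules (not taken from the product documentation):
    no include list means "all configured", and exclude always wins.
    """
    included = set(include) if include is not None else None
    excluded = set(exclude or ())
    return [
        name
        for name in configured
        if (included is None or name in included) and name not in excluded
    ]


# Made-up server names, for illustration only.
configured_dis = ["dis_primary", "dis_backup", "dis_archive"]
only_primary = select_dis_targets(configured_dis, include=["dis_primary"])
all_but_backup = select_dis_targets(configured_dis, exclude=["dis_backup"])
```

Computing the target list explicitly before a destructive operation makes it easy to log or review which servers will be affected.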

Check the results

Truncate, delete, and rescan operations return a DisCommandResult containing detailed information about the operation. The result has a summary status and a map of per-DIS results, each of which may contain per-location details.

Check overall success
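
The sketch below models the result shape described above (a summary status plus a map of per-DIS results) and fails loudly unless the summary reports full success. The classes `CommandResult`, `DisResult`, and `LocationResult`, their field names, and the status strings are hypothetical stand-ins chosen for this example; the real DisCommandResult accessors may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LocationResult:  # hypothetical stand-in
    location: str
    success: bool
    message: str = ""


@dataclass
class DisResult:  # hypothetical stand-in
    success: bool
    message: str = ""
    locations: List[LocationResult] = field(default_factory=list)


@dataclass
class CommandResult:  # hypothetical stand-in for DisCommandResult
    status: str  # assumed values: "SUCCESS", "PARTIAL", "FAILURE"
    dis_results: Dict[str, DisResult] = field(default_factory=dict)


def require_success(result: CommandResult) -> None:
    """Raise unless the summary status reports full success."""
    if result.status != "SUCCESS":
        failed = sorted(n for n, r in result.dis_results.items() if not r.success)
        raise RuntimeError(f"operation status {result.status}; failed DIS: {failed}")


# Example: one server succeeded and one did not.
partial = CommandResult(
    status="PARTIAL",
    dis_results={
        "dis_primary": DisResult(success=True),
        "dis_backup": DisResult(success=False, message="unreachable"),
    },
)
try:
    require_success(partial)
    outcome = "ok"
except RuntimeError as err:
    outcome = str(err)
```

Raising on anything other than full success keeps a partial failure from being silently ignored by a script.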

Check per-DIS results

Check per-location results

For truncate and delete operations, each DIS result contains location-level detail:
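
As a rough illustration of walking that detail, the sketch below collects every failed location across all servers. It assumes each per-DIS result is a plain dict with hypothetical keys (`success`, `message`, `locations`); the server names, locations, and messages are made up, and the real result objects may expose this information differently.

```python
def collect_failures(dis_results):
    """Return (dis_name, location, message) for every failed entry.

    A DIS-level failure with no failed location is reported with location None.
    """
    failures = []
    for dis_name, dis in dis_results.items():
        bad_locations = [loc for loc in dis.get("locations", []) if not loc["success"]]
        for loc in bad_locations:
            failures.append((dis_name, loc["location"], loc.get("message", "")))
        if not dis["success"] and not bad_locations:
            failures.append((dis_name, None, dis.get("message", "")))
    return failures


# Hypothetical per-DIS results keyed by server name.
sample = {
    "dis_primary": {
        "success": True,
        "locations": [{"location": "host1/2024-01-02", "success": True}],
    },
    "dis_backup": {
        "success": False,
        "message": "partial",
        "locations": [
            {"location": "host1/2024-01-02", "success": True},
            {"location": "host2/2024-01-02", "success": False, "message": "still being tailed"},
        ],
    },
    "dis_archive": {"success": False, "message": "unreachable", "locations": []},
}
failures = collect_failures(sample)
```

Reporting the server name alongside each location shows at a glance whether a problem is confined to one server or shared by several.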

Dry run then truncate

This example executes a dry run, verifies no locations are actively being tailed, and then performs the actual truncate:
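
A minimal sketch of that sequence, assuming a hypothetical `truncate` callable that accepts a `dry_run` keyword and returns a dict whose locations carry a `tailing` flag. None of these names come from the product API; `make_fake_truncate` exists only so the sketch runs on its own.

```python
from typing import Any, Callable, Dict, Optional


def dry_run_then_truncate(
    truncate: Callable[..., Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Run a dry run first; perform the real truncate only if nothing is tailed."""
    preview = truncate(dry_run=True)
    tailed = [
        (dis_name, loc["location"])
        for dis_name, dis in preview["dis_results"].items()
        for loc in dis["locations"]
        if loc.get("tailing", False)
    ]
    if tailed:
        print(f"Not truncating; still being tailed: {tailed}")
        return None
    return truncate(dry_run=False)


def make_fake_truncate(tailing: bool):
    """Build a fake operation (hypothetical) that records how it was called."""
    calls = []

    def fake_truncate(dry_run: bool) -> Dict[str, Any]:
        calls.append(dry_run)
        return {
            "status": "SUCCESS",
            "dry_run": dry_run,
            "dis_results": {
                "dis_primary": {
                    "locations": [{"location": "host1/2024-01-02", "tailing": tailing}]
                }
            },
        }

    return fake_truncate, calls


idle_truncate, idle_calls = make_fake_truncate(tailing=False)
busy_truncate, busy_calls = make_fake_truncate(tailing=True)
idle_result = dry_run_then_truncate(idle_truncate)  # dry run, then real truncate
busy_result = dry_run_then_truncate(busy_truncate)  # stops after the dry run
```

The result of the real truncate still needs to be checked afterwards, since the state of a location can change between the dry run and the actual operation.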

Truncate then delete
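
The ordering rule (a partition must be truncated before it can be deleted) can be pictured with a toy state model. `ToyPartition` and `truncate_then_delete` are hypothetical names used only for this sketch, which does not call any real API; treating a repeated truncate as a no-op that still succeeds is an assumption.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    TRUNCATED = "truncated"
    DELETED = "deleted"


class ToyPartition:
    """Toy model of one intraday partition; not part of any real API."""

    def __init__(self) -> None:
        self.state = State.ACTIVE

    def truncate(self) -> bool:
        # Assumption: truncating an already truncated partition is a no-op.
        if self.state is State.ACTIVE:
            self.state = State.TRUNCATED
        return self.state is State.TRUNCATED

    def delete(self) -> bool:
        # Mirrors the stated rule: only truncated partitions can be deleted.
        if self.state is not State.TRUNCATED:
            return False
        self.state = State.DELETED
        return True


def truncate_then_delete(partition: ToyPartition) -> bool:
    """Delete only after the truncate step reports success."""
    if not partition.truncate():
        return False
    return partition.delete()


fresh = ToyPartition()
skipped = ToyPartition().delete()   # delete without truncate is refused
done = truncate_then_delete(fresh)  # truncate, check, then delete
```

In a real script, the check between the two steps would inspect the truncate result for every Data Import Server before the delete is attempted.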

Rescan tables

A rescan instructs the Data Import Server to look for new data.

Caveats

  • These methods make a best-effort attempt to perform the operation on every appropriate Data Import Server. The operation is not atomic across servers, so it may succeed on some and fail on others. Check every result.
  • Truncated partitions are marked as permanently truncated, meaning no further data can be ingested into them. This prevents confusion if loggers produce new data or if tailers have not yet processed all existing files.
  • Before logging new data for truncated partitions, remove any existing data files (bin files), and then delete the partitions.
  • It is possible to delete data on one Data Import Server and leave it on another (e.g., a backup). Be extremely careful with this, because the servers then hold different data for the same partition.