Data Tailer runbook
The Data Tailer is a Deephaven data management service that monitors binary log files on the filesystem and streams changes to the Data Import Server (DIS) for ingestion. Tailers detect new data in binary log files as they're written and forward that data for storage and distribution.
Impact of Data Tailer failure
| Level | Impact |
|---|---|
| Sev 2 - Moderate | Users will not be directly affected, but new data from monitored binary logs will not be ingested until the tailer is restored. Data ingestion will resume from checkpoints when the tailer restarts. |
Note
Tailer failures cause ingestion delays but not data loss, provided the source binary log files are retained.
Data Tailer purpose
The Data Tailer acts as the bridge between binary log file producers and the Data Import Server:
Data flow:
- External system writes binary log files to filesystem.
- Tailer monitors configured directories for new/modified files.
- Tailer reads new data from files.
- Tailer streams data to DIS via gRPC.
- DIS persists data and serves to clients.
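The read step in this flow can be sketched in a few lines of Python. This is an illustration only, not the Deephaven implementation: the function name `read_new_bytes` and the byte-offset bookkeeping are assumptions. It shows the general idea of picking up whatever has been appended to a file since the last read.

```python
import os


def read_new_bytes(path, offset):
    """Return (data, new_offset) for bytes appended to `path` since `offset`.

    Hypothetical helper for illustration; not part of any Deephaven API.
    """
    size = os.path.getsize(path)
    if size < offset:
        # The file is shorter than the recorded position (truncated or replaced).
        raise ValueError(f"{path} shrank from {offset} to {size} bytes")
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size - offset)
    return data, offset + len(data)
```

A caller would invoke this repeatedly with the last returned offset and forward any non-empty result downstream.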
Data Tailer types and deployment
Standard deployment: One tailer per host (tailer1)
Multiple tailers: Several tailers can run on the same machine:
- Named tailer1, tailer2, etc.
- Each monitors different directories or file patterns.
- Configured through separate monit configuration files.
Remote tailers: Tailers can run on machines outside the Deephaven cluster:
- Useful for geographically distributed data sources.
- Reduces network traffic to data centers.
- Requires network connectivity to DIS.
Data Tailer dependencies
The Data Tailer requires:
- Data Import Server — Must be running and accessible to receive data.
- Filesystem access — Must have read access to binary log file directories.
- Binary log files — Must be written in the expected format.
- Network connectivity — Must be able to connect to DIS on configured port.
The tailer does not require a Configuration Server or Authentication Server, making it more resilient to cluster issues.
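When the network connectivity dependency is in doubt, a plain TCP connection test from the tailer host can help separate network problems from tailer problems. The sketch below is a generic check, not a Deephaven tool. The function name `can_connect` is hypothetical, and the host and port must come from your own routing configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Generic reachability check (hypothetical helper). It only proves the port
    accepts connections, not that the service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A result of `False` points to a firewall, routing, or DNS issue, or to a DIS that is not listening. A result of `True` means the next place to look is the tailer log.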
Checking Data Tailer status
Check that the process is running with monit:
For additional tailers:
The output should show the process status as Running.
Viewing Data Tailer logs
View application log:
Tail the log to follow in real time:
List historical log files:
View process stdout/stderr logs:
Restart procedure
Restart tailer1:
Impact: Restarting a tailer temporarily interrupts data flow from that tailer to the DIS. The tailer resumes from its last checkpoint position when restarted.
Verify the restart was successful:
Monitor the log during startup:
Expected startup messages:
- Successfully connected to DIS.
- Loaded checkpoint positions for monitored files.
- Resumed tailing from checkpoint.
- Data streaming to DIS.
Configuring the Data Tailer
Tailer configuration is managed through:
- Binary log schema — Defines file format and structure.
- Routing configuration — Specifies which tailer handles which files.
- Tailer properties — Performance tuning parameters.
Key configuration properties
Tailer properties use the log.tailer.* prefix and are set in Deephaven property files. DIS routing is configured separately in the data routing YAML configuration.
See Data tailer configuration for the full property reference, including log.tailer.defaultDirectories, log.tailer.additionalDirectories, log.tailer.startupLookbackTime, and others.
Binary log file requirements
For a tailer to read files, the files must:
- Match configured pattern — Filename matches the pattern in the tailer's XML configuration.
- Be in monitored directory — Located in a directory listed in log.tailer.defaultDirectories or log.tailer.additionalDirectories.
- Have correct format — Match the binary log schema definition.
- Be readable — Tailer user (typically irisadmin) has read permissions.
- Be complete — Ideally written atomically or with proper locking.
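Three of these requirements (pattern, directory, and readability) can be checked mechanically when a file is not being picked up. The sketch below rests on several assumptions: the helper name `check_tailer_file` is hypothetical, the pattern is treated as a shell-style glob (the tailer's real pattern syntax may differ), and only the file's immediate parent directory is compared against the monitored list. It does not validate the binary log format.

```python
import fnmatch
import os


def check_tailer_file(path, monitored_dirs, pattern):
    """Report which of three requirements `path` meets.

    `pattern` is treated as a shell-style glob, which is an assumption; the
    tailer's real pattern syntax is defined by its own configuration.
    """
    real = os.path.realpath(path)
    parent = os.path.dirname(real)
    return {
        "in_monitored_directory": any(
            os.path.realpath(d) == parent for d in monitored_dirs
        ),
        "matches_pattern": fnmatch.fnmatch(os.path.basename(real), pattern),
        # os.access checks the user running this script, so run it as the tailer user.
        "readable": os.access(real, os.R_OK),
    }
```

Any `False` value in the result identifies the requirement to investigate first.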
Checkpoint and recovery
The Data Tailer maintains checkpoints to track its read position in each file:
Checkpoint storage: Checkpoints are typically stored in:
- Filesystem (in the same directory as the data).
- DIS (as part of the ingestion state).
Recovery behavior:
When a tailer restarts, it:
- Loads checkpoint positions for all monitored files.
- Resumes reading from checkpoint position.
- Sends data to DIS (which may detect duplicates).
- Updates checkpoints as data is successfully ingested.
Duplicate handling: DIS typically has deduplication logic to handle re-transmitted data.
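The interplay between checkpoints and duplicate handling can be sketched as follows. This is a simplified model under stated assumptions, not how the tailer or DIS is implemented: the JSON checkpoint file, the functions `load_checkpoints` and `save_checkpoints`, and the `DedupReceiver` class are all hypothetical. The sketch shows why resending data from an older checkpoint is harmless when the receiver tracks how many bytes it has already accepted per file.

```python
import json
import os


def load_checkpoints(path):
    """Return {file name: byte offset}, or an empty dict on first start."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_checkpoints(path, checkpoints):
    """Write checkpoints via a temp file so a crash cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(checkpoints, f)
    os.replace(tmp, path)


class DedupReceiver:
    """Stand-in for the ingesting side; ignores bytes it has already accepted."""

    def __init__(self):
        self.accepted = {}  # file name -> bytearray of accepted data

    def ingest(self, name, offset, data):
        """Accept `data` starting at `offset`; return the new accepted length."""
        buf = self.accepted.setdefault(name, bytearray())
        if offset > len(buf):
            raise ValueError("gap: sender skipped ahead of accepted data")
        # Drop the prefix that overlaps what was accepted earlier.
        buf.extend(data[len(buf) - offset:])
        return len(buf)
```

In this model the sender would save a checkpoint only after `ingest` returns, so a restart can resend data but never skip it.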
Running multiple tailers
To run multiple tailers on the same host:
- Create additional monit configuration files:
- Edit tailer2.conf to use different:
  - Process name: tailer2.
  - Log directory: /var/log/deephaven/tailer2/.
  - Configuration file: tailer2.prop.
  - Monitored directories.
- Reload monit and start the new tailer:
Remote tailer deployment
Tailers can run on remote machines outside the Deephaven cluster:
Use cases:
- Data sources in different data centers.
- Edge locations with poor network connectivity.
- Security zones that restrict access.
Requirements:
- Network connectivity to DIS.
- Binary log files available locally.
- Tailer software installed.
Configuration:
- Configure DIS routing to point to the remote DIS host in the data routing YAML configuration.
- Ensure firewall allows tailer → DIS connections.
- Use appropriate authentication if required.
Tailer performance tuning
Performance tuning uses log.tailer.* properties and routing-level throttle settings. See Data tailer configuration for the full property reference, including bandwidth throttling via log.tailer.bucketCapacitySeconds and file cleanup via log.tailer.fileCleanup.* properties.
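The property name log.tailer.bucketCapacitySeconds suggests a token-bucket style limiter whose burst capacity is expressed in seconds of bandwidth. That reading is an assumption, and the sketch below is a generic token bucket for building intuition, not Deephaven's throttle. The class name `TokenBucket` and its parameters are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Generic token bucket (illustrative; not Deephaven's implementation)."""

    def __init__(self, rate_bytes_per_sec, capacity_seconds, now=0.0):
        self.rate = rate_bytes_per_sec
        # Burst capacity: how many bytes may be sent at once after an idle period.
        self.capacity = rate_bytes_per_sec * capacity_seconds
        self.tokens = self.capacity
        self.last = now

    def try_send(self, nbytes, now):
        """Return True and consume tokens if `nbytes` may be sent at time `now`."""
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

In this model a larger capacity allows bigger bursts after idle periods, while the long-run average stays bounded by the rate.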
Configuration files and locations
monit configuration: /etc/sysconfig/illumon.d/monit/tailer1.conf
Property files:
- /etc/sysconfig/illumon.d/resources/tailer1.prop
- Binary log schema (managed through dhconfig)
Log directory: /var/log/deephaven/tailer/
Checkpoint storage: Varies by configuration (filesystem or DIS)