Data Tailer Quickstart
The Deephaven Data Tailer plays a crucial role in the live data pipeline by ingesting data from continuously appended binary log files or CSV files directly into a Deephaven DIS.
In a Deephaven cluster, every machine has a tailer, which is sufficient for most ingestion use cases. When data is generated and stored on an external system, you might want to run a tailer on that system to send data directly to the appropriate Data Import Servers.
This guide walks through the process of creating a custom tailer to stream data from an external Linux system to a Deephaven cluster.
This Tailer has the following characteristics:
- It runs on a Linux server that is remote from the Deephaven cluster.
- The process name is customRemoteTailer.
- The working directory is /home/mytaileraccount/deephaven/.
- The binlogs to tail are in /home/mytaileraccount/deephaven/binlogs/.
- The Deephaven cluster's web server is hosted at https://my-deephaven-host.com.
- The Tailer process is owned by the user mytaileraccount on the remote machine.
Prerequisites
Before you begin, ensure:
- You have access to or can create the Tailer user account on the remote machine.
- No firewall rules are blocking the process ports required to connect to the DIS(s) and other services in the Deephaven cluster.
- There is a Java JDK on the remote system that matches the Java version of the Deephaven instance.
Note
The Tailer will stream data to one or more DIS(s) according to the data routing configuration. The claims and filters determine which DIS(s) will receive the data for a given table.
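To verify the firewall prerequisite from the remote machine, a simple reachability check is often enough. The sketch below uses hypothetical host names and assumes a common default DIS port (22021); confirm the actual hosts and ports against your data routing configuration or with your system administrator.
# Hypothetical host names and ports -- substitute the values from your routing configuration
nc -vz my-deephaven-host.com 8443   # Web server used to download the launcher artifacts
nc -vz my-dis-host.com 22021        # DIS port that the Tailer streams data to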
Steps to configure a data tailer
Step 1: Install or Verify Java
Ensure a JDK is installed on the remote system whose version matches the Java version running Deephaven. Verify the installed version with:
java -version
Step 2: Get artifacts from your Deephaven installation
Follow the steps in this guide to download the Deephaven Updater and set up a working directory with the relevant libraries to run the Deephaven Tailer.
The commands below should be sufficient:
# Run as mytaileraccount
cd /home/mytaileraccount
mkdir deephaven
cd deephaven
curl -O https://my-deephaven-host.com:8443/launcher/deephaven-launcher-9.09.1.tgz
tar -xvf ./deephaven-launcher-9.09.1.tgz
./DeephavenLauncher/DeephavenUpdater.sh myDhInstance https://my-deephaven-host.com:8443/iris/
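After the updater completes, the instance artifacts should land under .programfiles/myDhInstance, the layout the startup script in Step 4 relies on. A quick sanity check, assuming that layout:
# Confirm the launcher populated the instance directory
ls /home/mytaileraccount/deephaven/.programfiles/myDhInstance
ls /home/mytaileraccount/deephaven/.programfiles/myDhInstance/java_lib | head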
The DeephavenUpdater.sh script should be run every time the Tailer starts so that it always uses the latest artifacts; for this reason, it is included in the example startup script in Step 4.
Note
If you are unable to connect to the web service, you might need to import the Deephaven certificates into the Java or system trust store.
Alternatively, you can add the --insecure flag to the end of the Deephaven Updater command to bypass certificate errors and download the required security files. This is a security risk and should only be done after you check with your system administrator.
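If you do need to trust the cluster's certificate manually, a minimal keytool sketch is shown below. It assumes the cluster's CA certificate has already been copied to the remote machine as /tmp/deephaven-ca.crt (a hypothetical path) and that the JDK is version 9 or later, where the -cacerts flag targets the JDK's default trust store:
# Hypothetical certificate path and alias; the default trust store password is "changeit"
sudo keytool -importcert -trustcacerts -alias deephaven-ca \
-file /tmp/deephaven-ca.crt -cacerts -storepass changeit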
Step 3: Configure the tailer with an XML file
The following XML configures a Tailer to tail binlogs from /home/mytaileraccount/deephaven/binlogs/. Name this file tailer.xml and place it in /home/mytaileraccount/deephaven/.
<Processes>
<Process name="customRemoteTailer">
<Log fileManager="com.illumon.iris.logfilemanager.StandardBinaryLogFileManager"
logDirectories="/home/mytaileraccount/deephaven/binlogs/"
/>
</Process>
</Processes>
- name: A unique name for this process.
- fileManager: How log files are read and processed; StandardBinaryLogFileManager is used for binary logs.
- logDirectories: Comma-separated list of binlog locations to tail (see the example below).
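For instance, a sketch of the same process tailing a second, hypothetical directory through the comma-separated logDirectories attribute:
<Processes>
<Process name="customRemoteTailer">
<Log fileManager="com.illumon.iris.logfilemanager.StandardBinaryLogFileManager"
logDirectories="/home/mytaileraccount/deephaven/binlogs/,/home/mytaileraccount/deephaven/morelogs/"
/>
</Process>
</Processes>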
For more information on the Tailer XML configuration, see the Data Tailer guide.
Step 4: Create a bash script to start the Tailer
The following example script assumes the working directory is /home/mytaileraccount/deephaven and the instance name is myDhInstance, as in the example commands above. Save it as tailer-start.sh in the working directory.
#!/bin/bash
# Run from the Tailer working directory so the relative paths below resolve
cd /home/mytaileraccount/deephaven || exit 1

instance_root=/home/mytaileraccount/deephaven/.programfiles/myDhInstance
workspace_root=/home/mytaileraccount/deephaven/workspaces/myDhInstance

# Update any Deephaven artifacts before starting the Tailer
./DeephavenLauncher/DeephavenUpdater.sh myDhInstance
# Run the tailer
java -cp "${instance_root}/myDhInstance/resources":"${instance_root}/java_lib/*":"/home/mytaileraccount/deephaven" \
-server -Xmx1024m -DConfiguration.rootFile=iris-common.prop \
--add-opens java.base/jdk.internal.access=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens java.base/jdk.internal.misc=ALL-UNNAMED \
-Dworkspace="${workspace_root}/" \
-Dlog.tailer.configs=tailer.xml \
-Dlog.tailer.defaultDirectories="/home/mytaileraccount/deephaven/binlogs/" \
-Ddevroot="${instance_root}/java_lib/" \
-DlogDir="/home/mytaileraccount/deephaven/tailer_logs/" \
-DpidFileDirectory="/home/mytaileraccount/deephaven/pids/" \
-Dprocess.name=customRemoteTailer \
-Dlog.tailer.processes=customRemoteTailer \
-Ddh.config.client.bootstrap=${instance_root}/dh-config/clients \
-Dintraday.tailerID=1 \
com.illumon.iris.logtailer.LogtailerMain
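Make the script executable so it can be run directly and, later, by systemd in Step 6:
chmod +x /home/mytaileraccount/deephaven/tailer-start.sh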
The Data Tailer configuration guide describes additional properties related to the tailer, along with more information about creating the Tailer startup script.
Warning
The default value of the log.tailer.defaultDirectories property causes every Tailer to monitor /var/log/deephaven/binlogs, /var/log/deephaven/binlogs/pel, and /var/log/deephaven/binlogs/perflogs. To change this behavior, override the property by adding -Dlog.tailer.defaultDirectories=/your/custom/path to your Tailer startup script.
If the default directories do not exist, the Tailer will fail to start. This override is included in the example above.
Ensure the required directories exist:
# Run as mytaileraccount
cd ~/deephaven
mkdir tailer_logs binlogs pids
Step 5: Start the Tailer and verify it is working
Start the Tailer as the mytaileraccount user: ./tailer-start.sh.
If following this guide, the logs can be found in /home/mytaileraccount/deephaven/tailer_logs/.
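To follow the most recent log file while the Tailer runs (log file names vary, so this command simply picks the newest file in the log directory):
# Follow the newest Tailer log file
tail -f "$(ls -t /home/mytaileraccount/deephaven/tailer_logs/* | head -n 1)"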
Look for messages similar to these to confirm success:
[2025-07-03T10:41:20.298014-0400] - INFO - Starting directory watch service for /home/mytaileraccount/deephaven/binlogs
[2025-07-03T10:41:20.310186-0400] - INFO - Tailer looking for files to tail...
[2025-07-03T10:41:20.310301-0400] - INFO - LogtailerMain1 RUNNING
If you see those log lines, your Tailer is ready to stream binary logs!
Step 6: Manage the Tailer Service
Use cronjobs, systemd, or any scheduling software to manage your Tailer.
The example below configures systemd to manage the Tailer.
Create the service file, sudo vi /etc/systemd/system/mytailer.service:
[Unit]
Description=Deephaven Tailer 1
After=network-online.target
Wants=network-online.target systemd-networkd-wait-online.service
StartLimitIntervalSec=100
StartLimitBurst=5
[Service]
User=mytaileraccount
ExecStart=/home/mytaileraccount/deephaven/tailer-start.sh
Restart=on-failure
RestartSec=10s
[Install]
WantedBy=multi-user.target
Enable and start the service:
# Reload systemd so it picks up the new unit file
sudo systemctl daemon-reload
sudo systemctl enable mytailer.service
sudo systemctl start mytailer.service
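To confirm the service is healthy and follow its output through systemd:
sudo systemctl status mytailer.service
# Stream the service's journal output
journalctl -u mytailer.service -f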