Import Binary Log Files

Deephaven provides a tool for importing binary log files directly into Deephaven tables. Binary logs are normally processed in real time by a tailer and ingested by a Data Import Server. The typical use for this tool is the (re)import of old logs after an error occurred during the initial ingestion, or after the original table data was lost.

Quickstart

The following is an example of how to import a specific binary log file. This command assumes a typical Deephaven installation and a log file located at /var/log/deephaven/binlogs/.

sudo -u dbmerge /usr/illumon/latest/bin/iris_exec binary_import -- \
-ns ExNamespace -tn ExTableName \
-sf ExNamespace.ExTableName.System.hostname.2018-10-01.bin.2018-10-01.180353.011-0400 \
-dp localhost/2018-10-01 -om REPLACE

This example imports the given log file into localhost/2018-10-01, replacing any data that already exists in that location. Note that, unlike other imports, the source file(s) must be located relative to the dbmerge user's logs directory.

Schema Inference

There is no schema inference tool for binary imports; it is expected that a table schema consistent with the binary log files to be loaded already exists.

Import from Command Line

Binary imports can be performed directly from the command line, using the iris_exec tool.

Command Reference

iris_exec binary_import <launch args> -- <binary import args>

Binary Import Arguments

  • -dd or --destinationDirectory <path>
  • -dp or --destinationPartition <internal partition name / partitioning value> | <internal partition name>
  • -pc or --intradayPartition <partition column name>
Either a destination directory, a specific partition, or an internal partition plus a partition column must be provided. A directory can be used to write a new set of table files to a specific location on disk, where they can later be read with TableTools. A destination partition is used to write to intraday locations for existing tables. The internal partition value is used to separate data on disk; it does not need to be unique for a table. The name of the import server is a common value for this. The partitioning value is a string data value used to populate the partitioning column in the table during the import. This value must be unique within the table. In summary, there are three ways to specify the destination table partition(s):
  1. Destination directory (e.g., -dd /db/Intraday/<namespace>/<table>/localhost/<date>/<table>/)
  2. Internal partition and destination partition (e.g., -dp localhost/2018-01-01)
  3. Internal partition and partition column - for multi-partition import (e.g., -dp localhost -pc Date)
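As a sketch, the three destination forms above could be invoked as follows. The namespace, table, and date are placeholder values, and the leading echo prints each command instead of running it; drop the echo to perform an actual import.

```shell
# Placeholder values -- substitute your own namespace, table, and partition.
NS=ExNamespace
TN=ExTableName
DATE=2018-10-01

# 1. Destination directory: write table files to an explicit path on disk.
echo sudo -u dbmerge /usr/illumon/latest/bin/iris_exec binary_import -- \
  -ns "$NS" -tn "$TN" -dd "/db/Intraday/$NS/$TN/localhost/$DATE/$TN/"

# 2. Internal partition and partitioning value.
echo sudo -u dbmerge /usr/illumon/latest/bin/iris_exec binary_import -- \
  -ns "$NS" -tn "$TN" -dp "localhost/$DATE"

# 3. Internal partition and partition column (multi-partition import).
echo sudo -u dbmerge /usr/illumon/latest/bin/iris_exec binary_import -- \
  -ns "$NS" -tn "$TN" -dp localhost -pc Date
```

Each real invocation must run as a user with write access to the destination, typically dbmerge.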
-ns or --namespace <namespace>: (Required) Namespace in which to find the target table.
-tn or --tableName <name>: (Required) Name of the target table.
-om or --outputMode <import behavior>: (Optional)
  • SAFE (default): checks whether the target table already contains data matching the provided partitioning value; if it does, the import is aborted.
  • REPLACE: replaces any existing data in the target partition. When developing an import process, REPLACE should be used, because failed import attempts will often write some data to the table, causing the next attempt with SAFE to abort.
  • APPEND: should normally be used only when you are running multiple imports to load a set of data to the same table at one time, possibly from multiple different sources, and the resultant data needs to be kept together as part of one logical partition.
  • -sd or --sourceDirectory <path>
  • -sf or --sourceFile <exact file name>
  • -sg or --sourceGlob <file name pattern>
sourceDirectory, sourceFile, and sourceGlob are all optional. If none of these are provided, the system will attempt to do a multi-file import. Otherwise, sourceDirectory will be used in conjunction with sourceFile or sourceGlob. If sourceDirectory is not provided, but sourceFile is, then sourceFile will be used as a fully qualified file name. If sourceDirectory is not provided, but sourceGlob is, then sourceDirectory will default to the configured log file directory from the prop file being used.
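Under the same assumptions as the Quickstart (placeholder namespace, table, host, and timestamp values), the source arguments might be combined as follows; again, the leading echo prints each command rather than executing it.

```shell
# Illustrative sketch only: paths, host names, and timestamps are placeholders.
SRC_DIR=/var/log/deephaven/binlogs
GLOB='ExNamespace.ExTableName.System.*.2018-10-01.bin.*'

# sourceDirectory plus sourceGlob: import every log file matching the pattern.
echo sudo -u dbmerge /usr/illumon/latest/bin/iris_exec binary_import -- \
  -ns ExNamespace -tn ExTableName -dp localhost/2018-10-01 -om REPLACE \
  -sd "$SRC_DIR" -sg "$GLOB"

# sourceFile without sourceDirectory: treated as a fully qualified file name.
echo sudo -u dbmerge /usr/illumon/latest/bin/iris_exec binary_import -- \
  -ns ExNamespace -tn ExTableName -dp localhost/2018-10-01 -om REPLACE \
  -sf "$SRC_DIR/ExNamespace.ExTableName.System.host1.2018-10-01.bin.2018-10-01.180353.011-0400"
```

Quoting the glob pattern keeps the shell from expanding it locally, so the pattern reaches the import tool intact.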