Working with Binary Log Files
Binary Log File Partitions
Real-time Deephaven data ingestion is typically performed by using an application to log data to Deephaven binary log files, and configuring a tailer process to send these logs to one or more Data Import Server processes. Each binary log file is specific to a given namespace, table, internal partition and column partition.
The filename of each binary log is always used to determine the column partition value (usually the date) for the data within that file, and this must be created correctly by the application. It is recommended that the log filenames be in a standard format of <namespace>.<table>.<System or User>.<internal partition>.<column partition>.bin.<additional sortable value>
. An example for a log generated by the Persistent Query Controller follows:
DbInternal.PersistentQueryStateLog.System.PersistentQueryController_vm1.2023-03-21.bin.2023-03-21.150000.000+0000
This consists of:
DbInternal
- The namespace.PersistentQueryStateLog
- The table name.System
- The table's type.PersistentQueryController_vm1
- The internal partition name as generated by the application; in this case, it is the process name followed by a host name.2023-03-21
- The column partition value, indicating the partition into which this log file's data will be loaded.2023-03-21.151353.472+0000
- Further details used to order the log files; in this case, a millisecond-level timestamp and a timezone offset.
Obviously, it is critical that the file name for these binary log files be generated correctly. In particular, the column partition value and following details are dynamic.
Java Loggers
Creating a logger in Java always consists of two steps:
- The logger instance (of the class generated from the schema) is created.
- The logger instance is initialized, indicating how it is to generate binary log file names.
After initialization is complete, the logger can be used to write data.
Several Deephaven Java factories are available to assist in the creation of binary log file names. These factories all assume the creation of a standard Deephaven logger class from a Deephaven schema. These factories are used during the initialization of the generated Iris logger classes and automatically roll over filenames.
All code examples assume that the following static variables are defined:
private static final int BINARYSTORE_WRITER_VERSION = 2;
BINARYSTORE_WRITER_VERSION
defines the version of the Deephaven binary store writer, and should always be 2.
private static final int BINARYSTORE_QUEUE_SIZE = 10000;
BINARYSTORE_QUEUE_SIZE
is a default value to be used for the queueSize parameter in logger initialization (see below).
They also assume a Deephaven Configuration instance is available, using a statement such as the following:
final Configuration configuration = Configuration.getInstance();
Factory Usage
To use factories, create a logger instance and call the initialization method on the created logger, passing in the factory instance. The initialization method is defined as follows:
init(TableWriterFactory tableWriterBuilder, int queueSize)
tableWriterBuilder
- The factory to be used to build the files; see below.queueSize
- Defines the number of entries created during logger initialization; if this number is exceeded, the application may pause while waiting for entries to be freed.
LongLivedProcessBinaryStoreWriterFactory
LongLivedProcessBinaryStoreWriterFactory
can be used to create filenames that automatically roll over from hour to hour, and from day to day. It always uses the time zone of the system on which the logger is running, but v1.20180629 adds support for user time-zone definition when using this factory.
Every hour the factory rolls the binary log files over, creating a new log file with its name based on the server's current date and time. The rollover time and new filename are based on the time at which the data is written to the log. There is currently no logic to determine the column partition value (date) from the logged data.
If a process is logging data continuously, it may log data without a break as the date changes (midnight). In this case, data at the end of the day may be written slightly after the date change, in which case it will go into the next day's partition.
Ideally, to avoid the rollover issue, the factory will be used with an appropriate time zone where the application is not logging data around midnight.
The LongLivedProcessBinaryStoreWriterFactory
constructor is defined as follows:
LongLivedProcessBinaryStoreWriterFactory(String path, int version, logger log)
path
- The complete path to which the logs will be written. This includes the directory, and the filename up to.bin
.version
- The writer version to be used, usually 2.log
- A Deephaven Logger instance to which errors will be logged.
The following example initializes a logger for the PersistentQueryStateLog shown earlier. It uses the configuration.getLogPath()
method to determine the default log directory location:
final PersistentQueryStateLogLogger stateLogLogger = new PersistentQueryStateLogLogger();
stateLogLogger.init(new LongLivedProcessBinaryStoreWriterFactory(
configuration.getLogPath("DbInternal.PersistentQueryStateLog.PersistentQueryController_vm1.bin"),
BINARYSTORE_WRITER_VERSION,
log),
BINARYSTORE_QUEUE_SIZE);
BinaryStoreWriterFactory
The BinaryStoreWriterFactory
can be used to create files that roll over from hour-to-hour, but not from day-to-day. It allows definition of the time zone to be used to determine the filenames, and even the filename format, although changing the filename format is not recommended. (The recommended format is yyyy-MM-dd.HHmmss.SSSZ
, as this will be handled correctly by the tailer's default configuration.)
The BinaryStoreWriterFactory
does not roll over days: the date is always set to the date when the log was initialized.
The BinaryStoreWriterFactory
constructor is defined as follows:
BinaryStoreWriterFactory(String path, DateFormat rollFormat, int version, Logger log)
path
- The complete path to which the logs will be written. This includes the directory, and the filename up to.bin
.rollFormat
- A DateFormat that specifies the format in which the binary log files will be written.version
- The writer version to be used, usually 2.log
- A Deephaven Logger instance to which errors will be logged.
The following example initializes a logger for the PersistentQueryStateLog shown earlier. It uses the configuration.getLogPath()
method to determine the default log directory location, and writes logs using the date value for the Asia/Tokyo time zone:
final PersistentQueryStateLogLogger stateLogLogger = new PersistentQueryStateLogLogger();
final DateFormat rollFormat = new SimpleDateFormat("yyyy-MM-dd.HHmmss.SSSZ");
rollFormat.setTimeZone(TimeZone.getTimeZone("Asia/Tokyo"));
stateLogLogger.init(new BinaryStoreWriterFactory(
configuration.getLogPath("DbInternal.PersistentQueryStateLog.PersistentQueryController_vm1.bin"),
rollFormat,
BINARYSTORE_WRITER_VERSION,
log),
BINARYSTORE_QUEUE_SIZE);
Manual File Creation
Another option for creating binary log file names is for the application to manage them itself. In this case, when the logger is initialized, the filename is specified including the date partition. A BinaryStoreWriterFactory
will be created for the user, and it will roll over based on the system's current time. This should be used with caution as when the time ticks past midnight, the filename timestamps will also reset.
To use this method, a logger is created and initialized directly. The logger's initialization signature is defined as follows:
init(String path, int queueSize)
path
- The full path for this log.queueSize
- Defines the number of entries created during logger initialization; if this number is exceeded, the application may pause while waiting for entries to be freed.
For example:
final PersistentQueryStateLogLogger stateLogLogger = new PersistentQueryStateLogLogger();
stateLogLogger.init(configuration.getLogPath("DbInternal.PersistentQueryStateLog.PersistentQueryController_vm1.bin.2018-05-30"),
BINARYSTORE_QUEUE_SIZE);
C++ Loggers
The Deephaven C++ logger is exposed in the BinaryStoreWriter.hpp
as BinaryStoreWriter. When constructing the log file name, you may pass in a simple file name, which will be used as the file name of your log file and no further rotation or changes are made to the binary log file name.
You may alternatively pass in a rotation interval, expressed in seconds. The file will be rotated on the rotation interval, rounding to the specified value. For example, if you pass in an interval of 900 seconds, then the file is rotated every fifteen minutes on the hour, and at 15, 30, and 45 past the hour. To generate the log file name, the logger concatenates base file name passed in, a dot ("."), and the time the file was created. For the first file, the actual time is used. For subsequent files, the time is rounded to the beginning of the rotation interval. This provides predictable log file names.
The file dates are formatted using the following strftime format, %F-%H%M%S%z
, producing dates of the format YYYY-MM-DD-HHMMSS-TZ
. For example, with a base file name of "trades-out.bin" the logger would produce a file name of the form "trades-out.bin.2018-05-31-082542-0400". Presently, the time zone of the date format will be the time zone of the machine producing the logs. To use alternative time zones for the column partition, the application must format the column partition value internally, and the Deephaven administrator must update the tailerConfig.xml to appropriately parse the file name format containing both the column partition and the local date.