Intraday Binary Log format configuration
The binary log format configuration is used to generate loggers and listeners for the production and consumption of streaming data.
The configuration is contained inside the schema as a LogFormat XML element.
LogFormat element
The LogFormat element has several attributes and child elements.
| Attribute | Meaning | Default | Notes |
|---|---|---|---|
| version | The format of the generated log, which must match the format used in the listener. | Required | |
| loggerType | The type of logger to generate. | THREAD_SAFE | Must be one of THREAD_SAFE, THREAD_UNSAFE, ENCODERS, or NONE. |
| loggerClass | The name of the output class. | Required when loggerType != NONE. | |
| loggerInterface | The name of an interface to be implemented by the generated class. | | |
| columnPartitionArgument | The name of the argument for passing in the column partition to write to. | | |
| timePartitionColumn | The name of the Column used for generating the column partition to write to. | | The columnPartitionArgument and timePartitionColumn attributes are mutually exclusive. If either attribute is specified, the logger uses dynamic partitions. If neither is specified, the logger does not manage column partitions and leaves that to the buffer writer used to initialize it. |
| includeRowFlags | Include row flags in the log method definition. If not included, all rows are logged with the single row flag. | false | |
| argumentOrder | Whether generated methods use the column order specified in the Table element or the column order specified in the LogFormat element (followed by the remaining columns in the Table element). | table | Must be one of table or logger. |
| maxEntrySize | The maximum size of a single entry. | 1 MiB | Must be at most the BinaryStoreMaxEntrySize configuration property. The default is inherited from BinaryStoreMaxEntrySize. |
| bufferSize | The buffer size each Logger maintains. | 2 MiB | Must be at least 2x maxEntrySize. The default is inherited as 2x the BinaryStoreMaxEntrySize configuration property. |
| maxHeaderSize | The maximum size of a header entry. | 4 KiB | |
| listenerType | The type of listener to generate. | DEFAULT | Must be one of DEFAULT or NONE. loggerType and listenerType can't both be NONE. |
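Several rows above impose cross-attribute constraints. The Java sketch below checks a few of them. LogFormatAttributeCheck, validate, usesDynamicPartitions, and the sample class name "MyLogger" are hypothetical and not part of the factory, and BinaryStoreMaxEntrySize is passed in as a plain argument instead of being read from configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a few LogFormat attribute rules; not the factory's own validation. */
class LogFormatAttributeCheck {

    /** Returns the rule violations found; an empty list means these checks passed. */
    static List<String> validate(String loggerType, String listenerType, String loggerClass,
                                 String columnPartitionArgument, String timePartitionColumn,
                                 long maxEntrySize, long bufferSize, long binaryStoreMaxEntrySize) {
        List<String> problems = new ArrayList<>();
        if ("NONE".equals(loggerType) && "NONE".equals(listenerType)) {
            problems.add("loggerType and listenerType can't both be NONE");
        }
        if (!"NONE".equals(loggerType) && loggerClass == null) {
            problems.add("loggerClass is required when loggerType != NONE");
        }
        if (columnPartitionArgument != null && timePartitionColumn != null) {
            problems.add("columnPartitionArgument and timePartitionColumn are mutually exclusive");
        }
        if (maxEntrySize > binaryStoreMaxEntrySize) {
            problems.add("maxEntrySize must be at most BinaryStoreMaxEntrySize");
        }
        if (bufferSize < 2 * maxEntrySize) {
            problems.add("bufferSize must be at least 2x maxEntrySize");
        }
        return problems;
    }

    /** Either partition attribute switches the logger to dynamic partitions. */
    static boolean usesDynamicPartitions(String columnPartitionArgument, String timePartitionColumn) {
        return columnPartitionArgument != null || timePartitionColumn != null;
    }

    public static void main(String[] args) {
        long oneMiB = 1L << 20;
        // The table's defaults: maxEntrySize = 1 MiB, bufferSize = 2 MiB.
        System.out.println(validate("THREAD_SAFE", "DEFAULT", "MyLogger", "Date", null,
                oneMiB, 2 * oneMiB, oneMiB)); // []
        // 2 MiB entries exceed a 1 MiB store limit, and a 3 MiB buffer is below 2 x 2 MiB.
        System.out.println(validate("THREAD_SAFE", "DEFAULT", "MyLogger", null, null,
                2 * oneMiB, 3 * oneMiB, oneMiB).size()); // 2
    }
}
```

With the defaults, the buffer holds exactly two maximum-size entries, which is the smallest size the 2x rule allows.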
LogFormat loggerType attribute
| loggerType | Meaning |
|---|---|
| THREAD_SAFE (default) | Generates an io.deephaven.enterprise.binlog.buffered.Logger that is thread-safe with respect to the log, flush, and close methods. |
| THREAD_UNSAFE | Generates an io.deephaven.enterprise.binlog.buffered.Logger that is not thread-safe with respect to the log, flush, and close methods. This is useful in performance-critical code where the logger is used by a single thread or thread safety is managed externally. |
| ENCODERS | Generates only the low-level encoders needed to encode into the Deephaven binary log format; no log, flush, or close methods are generated. |
| NONE | No loggerClass is generated. |
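The difference between THREAD_SAFE and THREAD_UNSAFE is whether concurrent calls to log, flush, and close are guarded. The toy classes below (ToyLogger and ToyLogger.Synchronized) are hypothetical and unrelated to io.deephaven.enterprise.binlog.buffered.Logger; an in-memory list stands in for the log file, and the generated loggers' actual locking strategy may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy logger; an in-memory list stands in for the log file. */
class ToyLogger {
    private final List<Long> buffer = new ArrayList<>(); // entries logged but not yet flushed
    private final List<Long> sink = new ArrayList<>();   // entries "written" to the log
    private boolean closed;

    // THREAD_UNSAFE style: no locking, so callers must confine the logger to one
    // thread or synchronize externally.
    void log(long value) {
        if (closed) {
            throw new IllegalStateException("logger is closed");
        }
        buffer.add(value);
    }

    void flush() {
        sink.addAll(buffer);
        buffer.clear();
    }

    void close() {
        flush();
        closed = true;
    }

    int written() {
        return sink.size();
    }

    /** THREAD_SAFE style: the same behavior, with every method guarded by one lock. */
    static class Synchronized extends ToyLogger {
        @Override synchronized void log(long value) { super.log(value); }
        @Override synchronized void flush() { super.flush(); }
        @Override synchronized void close() { super.close(); }
        @Override synchronized int written() { return super.written(); }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyLogger logger = new ToyLogger.Synchronized();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    logger.log(i);
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        logger.close();
        System.out.println(logger.written()); // 40000
    }
}
```

Running the same four threads against a plain ToyLogger may lose entries or throw, because ArrayList is not safe for concurrent writes. The unsynchronized version avoids lock overhead, which is the trade-off THREAD_UNSAFE makes for single-threaded callers.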
LogFormat listenerType attribute
In addition to loggers, the new factory can generate the corresponding listener used within a Data Import Server. Using one LogFormat block for both the logger and the listener simplifies schema setup. The listenerType attribute controls listener generation:
| listenerType | Meaning |
|---|---|
| DEFAULT (default) | This LogFormat block is used for listener generation. |
| NONE | No listener is generated from this LogFormat block. |
For each log format there must be an unambiguous listener. If an old-style LoggerListener or Listener block with the desired logFormat is present, that block is used. If no such old-style block is present, a new-style LogFormat block with the desired format is used. Unlike loggers generated by IntradayLoggerFactory, the version attribute is required; this eliminates any ambiguity about which Listener or LogFormat element should be used to process a file. When the input log format is zero, the old- or new-style block with the highest logFormat or version is used. If two LogFormat blocks have the same version, an error is thrown. To allow multiple logger definitions, the listenerType="NONE" setting prevents a LogFormat block from being considered for listener generation. To preserve legacy log formats, you may tag a LogFormat block with loggerType="NONE", which lets you read old formats without generating a logger.
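The selection rules above amount to a small lookup procedure, modeled in the Java sketch below. ListenerSelection and Block are hypothetical, each block is reduced to the few properties the rules mention, and the sketch assumes that the duplicate-version error applies only to LogFormat blocks still eligible for listener generation (which is what makes listenerType="NONE" useful for multiple logger definitions).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the listener-selection rules; not the factory's real implementation. */
class ListenerSelection {

    /** A schema block that could drive listener generation. */
    static final class Block {
        final String name;
        final boolean oldStyle;     // LoggerListener/Listener (true) or LogFormat (false)
        final int format;           // logFormat for old-style blocks, version for LogFormat
        final boolean listenerNone; // listenerType="NONE"; only meaningful for LogFormat

        Block(String name, boolean oldStyle, int format, boolean listenerNone) {
            this.name = name;
            this.oldStyle = oldStyle;
            this.format = format;
            this.listenerNone = listenerNone;
        }
    }

    static Block select(List<Block> blocks, int inputFormat) {
        List<Block> eligible = new ArrayList<>();
        Set<Integer> seenVersions = new HashSet<>();
        for (Block b : blocks) {
            if (!b.oldStyle && b.listenerNone) {
                continue; // listenerType="NONE": never considered for listeners
            }
            // Assumption: the duplicate-version check covers eligible LogFormat blocks only.
            if (!b.oldStyle && !seenVersions.add(b.format)) {
                throw new IllegalStateException("Two LogFormat blocks with version " + b.format);
            }
            eligible.add(b);
        }
        if (inputFormat == 0) {
            // Format zero: the highest logFormat or version wins.
            return eligible.stream()
                    .max(Comparator.comparingInt(b -> b.format))
                    .orElseThrow(() -> new IllegalStateException("No listener candidates"));
        }
        for (Block b : eligible) { // an old-style block with the desired format takes precedence
            if (b.oldStyle && b.format == inputFormat) {
                return b;
            }
        }
        for (Block b : eligible) { // otherwise a LogFormat block with the desired version
            if (!b.oldStyle && b.format == inputFormat) {
                return b;
            }
        }
        throw new IllegalStateException("No listener for log format " + inputFormat);
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block("Listener logFormat=1", true, 1, false),
                new Block("LogFormat version=1", false, 1, false),
                new Block("LogFormat version=2", false, 2, false),
                new Block("LogFormat version=3 (logger only)", false, 3, true));
        System.out.println(select(blocks, 1).name); // Listener logFormat=1
        System.out.println(select(blocks, 2).name); // LogFormat version=2
        System.out.println(select(blocks, 0).name); // LogFormat version=2
    }
}
```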
LogFormat Column element
You do not need to specify elements for columns. The data types written to the log are automatically the same as the data types in the schema definition. To provide additional control over logger generation, you may add a Column element with various attributes.
| Attribute | Meaning | Default |
|---|---|---|
| name | The name of the column the attributes apply to. | Required |
| constant | The column is a constant value stored in the header. | false |
| ignore | The column is not written to the log. | false |
| inputType | The type of the log method parameter for this column. Only valid for temporal column types; possible values are long, java.time.Instant, java.time.ZonedDateTime, com.illumon.iris.db.tables.utils.DBDateTime, or DateTime. | - |
| source | The ObjectInput that this column is derived from. | - |
| logPrecision | The precision of the timestamp written to the log. May be "seconds", "millis", "micros", or "nanos". Nanos is preferred for newly defined binary logs, but existing logs may use millis or micros. | nanos |
| argumentPrecision | The precision of a long timestamp argument to the log method. May be "seconds", "millis", "micros", or "nanos". | nanos |
| objectCodec | The name of the ObjectCodec to use for this column. Required for object columns that are not String or temporal types. | - |
| objectCodecArguments | The arguments for the ObjectCodec used for this column. | - |
| maxLoggedSize | The maximum size, in bytes, that can be written to the log file for this column. Valid for String and Blob columns that use a codec. When present, the generated logger throws an IOException if an encoded value exceeds the limit. | - |
| renamedFrom | If this column (which exists in the schema) was renamed, the old name that should be used for function arguments in the log files. | - |
| deleted | Whether the column was deleted from the schema. | false |
| deletedDataType | If the column was deleted, the dataType of the column as it previously existed. Required for deleted columns; invalid for columns that are not deleted. | - |
| instrumentation | The type of instrumentation for this column. Must be one of TailerTxTime, DisRxTime, or RowSize. | - |
| stringStrategy | The strategy used to encode String values. Only valid for String columns. Must be one of BYTES or ENCODER. | BYTES |
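For a long timestamp argument, argumentPrecision gives the units of the value passed to the log method and logPrecision gives the units written to the log. The sketch below shows the conversion that relationship implies, using java.util.concurrent.TimeUnit. The class name is hypothetical, and whether the generated logger truncates or rounds when moving to a coarser precision is an assumption here (TimeUnit truncates).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch relating argumentPrecision and logPrecision; not generated code. */
class TimestampPrecision {

    /** Maps the documented precision names to TimeUnit. */
    static TimeUnit unitFor(String precision) {
        switch (precision) {
            case "seconds": return TimeUnit.SECONDS;
            case "millis":  return TimeUnit.MILLISECONDS;
            case "micros":  return TimeUnit.MICROSECONDS;
            case "nanos":   return TimeUnit.NANOSECONDS;
            default: throw new IllegalArgumentException("Unknown precision: " + precision);
        }
    }

    /**
     * Converts an argument expressed at argumentPrecision to the value written at logPrecision.
     * TimeUnit.convert truncates toward a coarser unit and saturates on overflow.
     */
    static long toLogged(long argument, String argumentPrecision, String logPrecision) {
        return unitFor(logPrecision).convert(argument, unitFor(argumentPrecision));
    }

    public static void main(String[] args) {
        // argumentPrecision="millis", logPrecision="nanos" (the default log precision)
        System.out.println(toLogged(1_700_000_000_123L, "millis", "nanos")); // 1700000000123000000
        // argumentPrecision="nanos", logPrecision="micros" drops the sub-microsecond digits
        System.out.println(toLogged(1_700_000_000_123_456_789L, "nanos", "micros")); // 1700000000123456
    }
}
```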
LogFormat ObjectInput Element
You may also provide ObjectInput child elements that define parameters to the log method. This makes it simpler to log many fields from a single object. The ObjectInput element supports the following attributes:
| Attribute | Meaning | Default |
|---|---|---|
| name | The name of the log method parameter, referenced by the Column source attribute. | - |
| type | The type of the object, which must be available to the factory when generating the logger. | - |
| mixin | An additional type that can provide annotations for determining the correct getters. | - |
| nullable | The object passed to the log method may be null, in which case the log method fills all columns derived from this object with null values. | false |
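The nullable attribute determines what happens when the ObjectInput argument is null. The sketch below models the documented behavior for a hypothetical Trade input with two derived columns. It uses Java null in place of whatever null representation the binary log actually writes, and it treats a null input as an error when nullable is false, which the table does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of nullable ObjectInput semantics; not generated code. */
class NullableObjectInput {

    /** Hypothetical input type passed to the log method as an ObjectInput. */
    static final class Trade {
        private final String symbol;
        private final double price;

        Trade(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }

        public String getSymbol() { return symbol; }
        public double getPrice() { return price; }
    }

    /** Columns whose source is the Trade input, mapped to the accessor used for each. */
    static final Map<String, Function<Trade, Object>> DERIVED = new LinkedHashMap<>();
    static {
        DERIVED.put("Symbol", Trade::getSymbol);
        DERIVED.put("Price", Trade::getPrice);
    }

    /** With nullable=true, a null input yields null for every derived column. */
    static Map<String, Object> derive(Trade input, boolean nullable) {
        if (input == null && !nullable) {
            // Assumption of this sketch: a non-nullable input must not be null.
            throw new NullPointerException("ObjectInput is not nullable");
        }
        Map<String, Object> values = new LinkedHashMap<>();
        for (Map.Entry<String, Function<Trade, Object>> column : DERIVED.entrySet()) {
            values.put(column.getKey(), input == null ? null : column.getValue().apply(input));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(derive(new Trade("AAPL", 187.5), false)); // {Symbol=AAPL, Price=187.5}
        System.out.println(derive(null, true));                      // {Symbol=null, Price=null}
    }
}
```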
LogFormat ImportState Element
Listeners can have an additional child element of ImportState, which has the following attribute:
| Attribute | Meaning | Default |
|---|---|---|
| importStateType | The class name of the import state object. | - |
The ImportState element contains child elements named Column, each with a name attribute. These columns are passed to the import state onNewRow call.
Examples
This example is from the Deephaven ProcessEventLog schema. The logger requires
a "Date" input to determine the partition to write to, and uses constant values
for many fields that do not change per worker. These fields are written once
to the header. Only the remaining columns are necessary to pass into each
log method call. The Timestamp column has additional attributes to determine the
type of input to the log method and the precision of the method input and
log output.
The generated log method has the following signature:
The generated static "of" method for construction:
The generated static header method to help with writer construction:
Here is an example from the persistent query state log that uses ObjectInputs. Some fields are derived from the "config" parameter, which uses a mixin for annotations. The PersistentQueryState object supplies most values. ControllerHost and Timestamp are provided as primitive inputs to the log method.
The use of ObjectInputs simplifies calling the log method, which only requires four parameters:
The generated static "of" method for construction takes an additional DateTimeFormatter because of the timePartitionColumn:
Because there are no constant fields, the header method takes no arguments:
ObjectInput Search Rules and Annotations
If a column is derived from an ObjectInput, then the factory automatically selects the most appropriate method or field
from the source object. The io.deephaven.enterprise.binlog.annotations.LogColumn annotation can be added to a field
or method to indicate that the named column should be derived from that field or method.
For a method to be eligible for matching, it must be public and have no parameters. Fields must be public. Priority is given, in order, to:
- Annotated methods or fields. You may only have one annotated method or field for a given name.
- Methods with the same name as the column.
- Methods named "get" followed by the column name. For booleans, methods named "is" or "has" followed by the column name.
If more than one item in the highest-priority matching category matches, the result is ambiguous and the code must be fixed. If the match is ambiguous or the default matching rules do not meet your requirements, use an annotation in the class or mixin to define unambiguously which field or method to use.
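The priority rules above can be sketched with Java reflection. In the example below, SketchLogColumn is a hypothetical stand-in for io.deephaven.enterprise.binlog.annotations.LogColumn (the real annotation's attributes are not described here), AccessorResolver and the sample classes are hypothetical, and method names are compared case-insensitively as described under Casing. Only the three listed tiers are modeled; unannotated fields and methods declared by java.lang.Object are ignored for simplicity.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Member;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical resolver modeling the three priority tiers; not the factory's code. */
class AccessorResolver {

    /** Hypothetical stand-in for the LogColumn annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.FIELD})
    @interface SketchLogColumn {
        String value(); // the column name
    }

    static Member resolve(Class<?> type, String column) {
        List<Member> annotated = new ArrayList<>(); // tier 1: annotated methods or fields
        List<Member> sameName = new ArrayList<>();  // tier 2: method named like the column
        List<Member> prefixed = new ArrayList<>();  // tier 3: get/is/has + column

        for (Field f : type.getFields()) { // public fields only
            SketchLogColumn a = f.getAnnotation(SketchLogColumn.class);
            if (a != null && a.value().equals(column)) {
                annotated.add(f);
            }
        }
        for (Method m : type.getMethods()) { // public methods only
            if (m.getParameterCount() != 0 || m.isBridge() || m.getDeclaringClass() == Object.class) {
                continue;
            }
            SketchLogColumn a = m.getAnnotation(SketchLogColumn.class);
            String name = m.getName();
            boolean bool = m.getReturnType() == boolean.class || m.getReturnType() == Boolean.class;
            if (a != null) {
                if (a.value().equals(column)) {
                    annotated.add(m);
                }
            } else if (name.equalsIgnoreCase(column)) {
                sameName.add(m);
            } else if (name.equalsIgnoreCase("get" + column)
                    || (bool && (name.equalsIgnoreCase("is" + column) || name.equalsIgnoreCase("has" + column)))) {
                prefixed.add(m);
            }
        }
        // The first non-empty tier wins; more than one match in that tier is ambiguous.
        for (List<Member> tier : List.of(annotated, sameName, prefixed)) {
            if (tier.size() == 1) {
                return tier.get(0);
            }
            if (tier.size() > 1) {
                throw new IllegalStateException("Ambiguous accessors for column " + column + ": " + tier);
            }
        }
        throw new IllegalStateException("No accessor for column " + column);
    }

    /** Sample input: the annotation overrides the default name matching. */
    static class QueryState {
        @SketchLogColumn("ScriptLoaderState")
        public String getScriptLoaderStateJson() { return "{}"; }
        public String getStatus() { return "Running"; }
        public boolean isEnabled() { return true; }
    }

    /** Sample input with two tier-3 candidates for "Price" that differ only in case. */
    static class Ambiguous {
        public double getPrice() { return 1.0; }
        public double getprice() { return 2.0; }
    }

    public static void main(String[] args) {
        System.out.println(resolve(QueryState.class, "ScriptLoaderState").getName()); // getScriptLoaderStateJson
        System.out.println(resolve(QueryState.class, "Status").getName());            // getStatus
        System.out.println(resolve(QueryState.class, "Enabled").getName());           // isEnabled
    }
}
```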
An annotation can be provided as follows, which logs the ScriptLoaderState column using the result of the getScriptLoaderStateJson method.
Instead of annotating the input type (for example, because it is a third-party type, different loggers map its getters differently, or multiple input types share the same pattern), you may create an abstract class or interface with the @LogColumn annotations. The generator scans the mixin type for annotations, associates column names with method or field names, and applies them as if the input type itself were annotated. The mixin definition does not even require a dependency on the input type.
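The mixin mechanism can be sketched with Java reflection. Below, VendorQuote plays the role of a third-party input type, and VendorQuoteMixin is an interface that declares matching method names and carries the annotations without referring to VendorQuote. All names, including the SketchLogColumn stand-in for LogColumn, are hypothetical, and only methods (not fields) are handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of mixin-based annotation lookup; not the generator's code. */
class MixinResolver {

    /** Hypothetical stand-in for the LogColumn annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.FIELD})
    @interface SketchLogColumn {
        String value(); // the column name
    }

    /** A third-party input type that we cannot (or prefer not to) annotate. */
    static class VendorQuote {
        public String symbolCode() { return "AAPL"; }
        public double lastPx() { return 187.5; }
    }

    /** The mixin mirrors method names and carries annotations, but never mentions VendorQuote. */
    interface VendorQuoteMixin {
        @SketchLogColumn("Symbol") String symbolCode();
        @SketchLogColumn("Price") double lastPx();
    }

    /** Scans the mixin and maps each annotated column name to a member name. */
    static Map<String, String> columnToMember(Class<?> mixin) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Method m : mixin.getMethods()) {
            SketchLogColumn a = m.getAnnotation(SketchLogColumn.class);
            if (a != null && mapping.put(a.value(), m.getName()) != null) {
                throw new IllegalStateException("Column annotated more than once: " + a.value());
            }
        }
        return mapping;
    }

    /** Reads a column from the input as though the input type itself were annotated. */
    static Object read(Object input, Class<?> mixin, String column) throws ReflectiveOperationException {
        String member = columnToMember(mixin).get(column);
        if (member == null) {
            throw new IllegalArgumentException("No mixin annotation for column " + column);
        }
        // Look up the public, no-argument method with that name on the real input type.
        Method accessor = input.getClass().getMethod(member);
        return accessor.invoke(input);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        VendorQuote quote = new VendorQuote();
        System.out.println(read(quote, VendorQuoteMixin.class, "Symbol")); // AAPL
        System.out.println(read(quote, VendorQuoteMixin.class, "Price"));  // 187.5
    }
}
```

Because the mapping goes through method names only, the same mixin could be applied to any input type that exposes symbolCode() and lastPx().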
Casing
Casing is ignored when determining accessor candidates, so the accessor below is automatically treated as an accessor candidate for column "Price" without a LogColumn annotation.
Also, if both a Price and a price accessor are present, logger generation fails and reports the ambiguity, requiring you to resolve it (for example, with a LogColumn annotation), which leads to a safer result. For example:
Generating a Logger
You may generate a logger from the schema using the dhctl tool's logger subcommand. In this example, the logger
for the internal PersistentQueryStateLog is written to ~/code/project/src/main/java/io/deephaven/binlog/internal/gen/V2PersistentQueryStateLogger.java:
When a logger defines a loggerInterface attribute and that interface is on the class path, logger generation validates
that the generated logger implements that interface. The --interface-validation argument allows the caller to configure interface validation.
The generated V2 logger has very few Deephaven dependencies, and can operate in a Java 8 or higher environment (the Deephaven server requires Java 17). To use the logger, you should include the "support" and "channels" dependencies from Deephaven's "iris" group.