Importing streaming data

While batch data is imported in large chunks, ticking data (or streaming data) is appended to an intraday table on a row-by-row basis as it arrives. Ticking data also differs from batch data in how Deephaven handles it. Normally, batch data imported to a partition with today's date is not available to users; the expectation is that such data will be merged and validated before users access it. Ticking data, on the other hand, is delivered to users immediately as new rows arrive. The quality of this newest data may be lower than that of historical data, but for ticking sources, low-latency access to information is generally more important.

Another difference between batch and ticking data is that there are no general standards for ticking data formats. CSV, JDBC, JSON, and others provide well-known standards for describing and packaging sets of batch data. However, ticking data formats do not have such accepted standards, so there is always some custom work involved in adding a ticking data source to Deephaven.

Typical steps in adding a streaming data source to Deephaven:

  1. Create the base table schema.
  2. Add a LoggerListener section to the schema.
  3. Deploy the schema.
  4. Generate the logger and listener classes.
  5. Create a logger application that will use the generated logger to send new events to Deephaven.
  6. Edit the host config file to add a new service for the tailer to monitor.
  7. Edit the tailer config file to add entries for the new files to monitor.
  8. Restart the tailer and DIS processes to pick up the new schema and configuration information.

The last three steps above apply only if deploying a new tailer. If the standard Deephaven Java logging infrastructure is used (with log file names that include all the details needed for the tailer to determine its destination), the existing system tailer should pick up new tables automatically.

Process Overview

In addition to specifying the structure of the data in a table, a schema can include directives that affect the ingestion of live data. This section provides an overview of this process followed by details on how schemas can control it. Example schemas are provided as well.

Streaming data ingestion is a structured Deephaven process where external updates are received and appended to intraday tables. In most cases, this is accomplished by writing the updates into Deephaven's binary log file format as an intermediate step.

[Diagram: streaming data flows from the application through the logger into binary log files, is tailed to the Data Import Server, and is appended to intraday tables.]

To understand the flexibility and capabilities of Deephaven table schemas, it is important to understand the key components of the import process. Each live table requires two custom pieces of code, both of which are generated by Deephaven from the schema:

  • a logger, which integrates with the application producing the data.
  • a listener, which integrates with the Deephaven Data Import Server (DIS).

The logger is the code that can take an event record and write it into the Deephaven binary log format, appending it to a log file. The listener is the corresponding piece of code that can read a record in Deephaven binary log format and append it to an intraday table.

The customer application receives data from a source such as a market data provider or application component. The application uses the logger to write that data in a row-oriented binary format. For Java and C# applications, a table-specific logger class is generated by Deephaven based on the schema; logging from a C++ application uses variadic template arguments and does not require generated code. This data may be written directly to disk files, or sent to an aggregation service that combines it with similar data streams and then writes it to disk.

The next step in streaming data ingestion is a process called the tailer. The tailer monitors the system for new files matching a configured name and location pattern, and then monitors matching files for new data being appended to them. As data is added, it reads the new bytes and sends them to one or more instances of the Data Import Server (DIS). Other than finding files and "tailing" them to the DIS, the tailer does very little processing. It is not concerned with what data is in a file or how that data is formatted. It simply picks up new data and streams it to the DIS.

The Data Import Server receives the data stream from the tailer and converts it to a column-oriented format for storage using the listener that was generated from the schema. A listener will produce data that matches the names and data types of the data columns declared in the "Column" elements of the schema.

Loggers and listeners are both capable of converting the data types of their inputs, as well as calculating or generating values. Additionally, Deephaven supports multiple logger and listener formats for each table. Together, this allows Deephaven to simultaneously ingest multiple real-time data streams from different sources in different formats to the same table.

Logging Data

The Logger Interfaces

The customer application might have "rows" of data to stream in two main formats:

  1. sets of values using basic data types like double or String, and
  2. complex objects that are instances of custom classes.

The generated logger class provides a log method that is called each time the application has data to send to Deephaven, with its arguments based on the schema. To define the signature of the log method, the logger class always implements a Java interface. Deephaven provides several generic interfaces based on the number of arguments needed; for instance, if three arguments are needed, the generic ThreeArgLogger is used by default. These arguments might be basic types such as double or String, custom class types, or a combination of both.

Deephaven provides generic logger interfaces for up to eight arguments, plus a special MultiArgLogger interface for loggers with more than eight arguments. The MultiArgLogger is more generic than the other interfaces: the fixed-argument interfaces have their arguments typed when the logger code is generated, while the MultiArgLogger simply takes an arbitrarily long list of objects as its arguments. One known limitation of the MultiArgLogger is that it cannot accept generic objects among its arguments; here, "generic objects" means objects other than String or boxed primitive types. The logger interfaces for fixed numbers of arguments do not have this limitation.

In many cases the events to be logged will have a large number of properties. Rather than using the MultiArgLogger, two other approaches are preferred: create a custom logger interface, or pass the event as a custom object.

In most cases the custom object is the easier solution, since such an object probably already exists in the API of the data source from which the custom application is receiving data. For example, if the custom application is receiving Twitter tweets as tweet objects, this object type could be added to the schema as a SystemInput, and Tweet properties could be used in the intradaySetters:

<SystemInput name="tweet" type="com.twitter.event.Tweet" />

...

<Column name="text" dataType="String" intradaySetter="tweet.getText()" />

<Column name="deleted" dataType="Boolean" intradaySetter="tweet.getDeleted()" />

This way, the customer application only needs to pass a Tweet object to the log() method instead of having to pass each of the properties of the object.
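
With such a schema, the application's logging code reduces to one call per event. The sketch below is illustrative only: it assumes a table named TwitterEvent, a generated logger class named TwitterEventFormat1Logger, and the hypothetical com.twitter.event.Tweet type from the snippet above; the actual class name depends on the table name and logFormat in the schema.

// Hypothetical application code; the logger class name and Tweet type are illustrative.
private final TwitterEventFormat1Logger tweetLogger; // generated by Deephaven from the schema

public void onTweet(final com.twitter.event.Tweet tweet) throws java.io.IOException {
    // One log() call per event; the intradaySetter expressions in the schema
    // (tweet.getText(), tweet.getDeleted(), ...) extract the individual column values.
    tweetLogger.log(tweet);
}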

The other option for logging more than eight properties is to define a custom logger interface. A custom logger interface extends IntradayLogger, and specifies the exact names and types of arguments to be passed to the log() method:

public interface TickDataLogFormat1Interface extends IntradayLogger {
   void log(Row.Flags flags,
            long timestamp,
            String name,
            float price,
            int qty);

   default void log(long timestamp,
                    String name,
                    float price,
                    int qty) {
      log(DEFAULT_INTRADAY_LOGGER_FLAGS, timestamp, name, price, qty);
   }
}

Note the timestamp argument. This column is often included in Deephaven tables to track when a logger first "saw" a row of data. By default, this is "now" in epoch milliseconds at the time the event was received. If needed, a custom intradaySetter and dbSetter can be specified to use other formats or precisions for this value.
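
For example, a Timestamp column that records the logger's receipt time in milliseconds and converts it to a DBDateTime when written to the table can be declared as follows (this mirrors the combined table example later in this section):

<Column name="Timestamp" dataType="DateTime" columnType="Normal"
        intradaySetter="System.currentTimeMillis()"
        dbSetter="DBTimeUtils.millisToTime(Timestamp)" />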

A custom logger interface should specify any exceptions the log() method will throw. For instance, a logger that handles BLOB arguments will need to include throws IOException as part of its log(...) method declarations:

void log(<parameters as above>) throws IOException;

In this case, the "1" in the name (TickDataLogFormat1Interface) denotes that this is the first version of the interface, which is helpful if it later needs to be revised. This naming convention is recommended but not an enforced requirement for logger interfaces.

Log Formats

Each logger and listener corresponds to a distinct version number, which is specified in a logFormat attribute when defining a logger or listener in the schema. A listener will only accept data that has been logged with a matching column set and version number.

If a log format version is not specified, it will default to "0".

Listeners in Deephaven Schemas

Listeners are defined in a <Listener> element.

A listener element has the following attributes:

  • logFormat - Optional – specifies the listener's version number; defaults to 0 if this attribute is not present. Required if multiple listeners are defined.
  • listenerPackage - Optional – specifies the Java package for the generated code; defaults to the value of the SchemaConfig.defaultListenerPackagePrefix configuration property with the namespace lowercased and appended; e.g., com.prefix.testnamespace.
  • listenerClass - Optional – specifies the Java class name for the generated code; defaults to a value that includes the table name and log format version (if non-zero); e.g., TestTableListener or TestTableFormat2Listener. The class name must include the word "Listener".

Listener Code Generation Elements

Listener code generation is supplemented by three elements:

  • <ListenerImports> - Optional – specifies a list of extra Java import statements for the generated listener class.
  • <ListenerFields> - Optional – specifies a free-form block of code used to declare and initialize instance or static members of the generated listener class.
  • <ImportState> - Optional – specifies a state object used for producing validation inputs, with two attributes:
    • importStateType - Required – a full class name; must implement the com.illumon.iris.db.tables.dataimport.logtailer.ImportState interface.
    • stateUpdateCall - Required – a code fragment used to update the import state per row, which may reference column names or fields; e.g., newRow(Bravo).

Listener <Column> Elements

Each <Listener> element contains an ordered list of <Column> elements. The <Column> elements declare both the columns expected in the data from the logger and the columns to write to the table. The <Column> elements for a listener support the following attributes:

  • name - Required – The name of the column. A column of this name does not necessarily need to exist in the table itself.
  • dbSetter - Optional, unless intradayType is none – A Java expression to produce the value to be used in this column. This can customize how a raw value from the binary log file is interpreted when writing into a Deephaven column. The expression may use any fields or column names as variables, except columns for which intradayType is none.
  • intradayType - Optional (defaults to dataType) – The data type of this column as written by the logger; it only needs to be specified when the logger uses a different data type than the table itself. Use none if a column is present in the table but not in the logger's output; in that case, a dbSetter attribute is required.

DBDateTime

One special case is the DateTime or DBDateTime data type. (DateTime is an alias for Deephaven's DBDateTime type, which is a zoned datetime with nanosecond precision.) It is expressed internally as nanoseconds from epoch. However, by default, listener code will assume that a DBDateTime is logged as a long value in milliseconds from epoch. If the value provided by the logger is something other than milliseconds from epoch, a custom setter must be specified in the dbSetter attribute.

For example:

dbSetter="com.illumon.iris.db.tables.utils.DBTimeUtils.nanosToTime(LoggedTimeNanos)"

In this case, the logging application is providing a long value of nanoseconds from epoch using the column name of LoggedTimeNanos.

The following is an example listener for the example table defined above:

<Listener logFormat="1" listenerPackage="com.illumon.iris.test.gen" listenerClass="TestTableFormat1Listener">
    <ListenerImports>
    import com.illumon.iris.db.tables.libs.StringSet;
    </ListenerImports>
    <ListenerFields>
    private final String echoValue = "reverberating";
    private final StringSet echoValueStringSet = new com.illumon.iris.db.tables.StringSetArrayWrapper(echoValue);
    </ListenerFields>
    <ImportState importStateType="com.illumon.iris.db.tables.dataimport.logtailer.ImportStateRowCounter" stateUpdateCall="newRow()" />
    <Column name="Alpha" />
    <Column name="Bravo" />
    <Column name="Charlie" />
    <Column name="Delta" dbSetter="5.0" intradayType="none" />
    <Column name="Echo" dbSetter="echoValueStringSet" intradayType="none" />
</Listener>

Loggers in Deephaven Schemas

Loggers are defined in a <Logger> element, which is only required when a Java or C# logger is needed. A logger element has the following attributes:

  • logFormat - Required – specifies the logger's version number.
  • loggerPackage - Required – specifies the Java package for the generated code.
  • loggerClass - Optional – specifies the Java class name for the generated code; defaults to a value that includes the table name and log format version (if non-zero); e.g., TestTableLogger, TestTableFormat2Logger. If specified, the class name must include the word "Logger".
  • loggerInterface - Optional – specifies a Java interface that the generated logger class will implement. Defaults to a generified interface based on the number of system input parameters; e.g., com.illumon.intradaylogger.FourArgLogger. The interface name must include the word "Logger".
  • loggerInterfaceGeneric - Optional - the use of this attribute is deprecated.
  • loggerLanguage - Optional - specifies the logger language. If not specified, a default value of JAVA will be used. Supported values are:
    • JAVA
    • CSHARP or C#
  • tableLogger - Optional - if specified as true, indicates that a logger should be generated that can write data directly to a Deephaven table.
  • verifyChecksum - Optional - if specified as false, the logger loaded by a logging application will not be checked against the latest logger generated from the schema. This configuration is not recommended.

Logger Code Generation Elements

Logger code generation is supplemented by the following elements:

  • <LoggerImports> - Optional – specifies a list of extra Java import statements for the generated logger class.
  • <LoggerFields> - Optional – specifies a free-form block of code used to declare and initialize instance or static members of the generated logger class.
  • <SystemInput> - Required unless the loggerInterface attribute was specified. Declares an argument that the Logger's log() method will take, with two attributes:
    • name - required – the name of the argument, available as a variable to intradaySetter expressions.
    • type - required – the type for the argument; any Java type including those defined in customer code.
  • <ExtraMethods> - Optional - if specified, these are extra methods added to the generated logger class.
  • <SetterFields> - Optional - if specified, these are extra fields (variables) added to the setter classes within the generated logger.

The application should call the log() method once for each row of data, passing the data for the row as arguments.
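
For example, for the BarTestLogger defined later in this section (whose SystemInput elements are Alpha, Bravo, and CharlieSource), each row would be logged with a single call such as the sketch below; the variable name and argument values are illustrative.

// One call per row; the arguments correspond to the schema's SystemInput elements.
barTestLogger.log("alphaValue", 42, "123.45");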

Logger <Column> Elements

The <Column> elements declare the columns contained in the logger's output. A logger's column set typically matches the destination table's column set, but this is not a requirement, as the listener can convert the data from the logger's format to the columns required for the table. Each column in the table's Column set must exist in the Logger Column elements.

The <Column> element for a logger supports the following attributes:

  • name - Required – The name of the column in the logger's output. A corresponding <Column> element with the same name must exist in the listener that corresponds to the logger's logFormat version.
  • intradayType - Optional – The data type for values in the log, which may be different from the dataType for the column. If intradayType is not specified, the column's dataType is used. An intradayType of none indicates that this column should not be included in the log.
  • intradaySetter - Optional – A Java expression to produce the value to be used in this column. This can customize how a value from the logging application should be interpreted or converted before writing it into the binary log file. The expression may use the names of any <SystemInput> elements as variables and perform valid Java operations on these variables. If intradaySetter is not specified, then the column's name is used.
  • datePartitionInput - Optional – specifies that, for a Java logger, the column's value can be used to automatically calculate the column partition value for every logged row. The column's time precision must be specified as the attribute value. Available values include:
    • seconds
    • millis
    • micros
    • nanos
    To use this option, the logging program must initialize the logger with an appropriate time zone name. See the Dynamic Logging Example.
  • functionPartitionInput - Optional – if set to true, specifies that, for a Java logger, the column's value can be used to calculate the column partition value for every logged row by using a lambda provided in the logger initialization. To use this option, the logging program must initialize the logger with an appropriate lambda. See the Dynamic Logging Example.
  • partitionSetter - Optional – This attribute is only valid for the partitioning column. It specifies that the partition value is produced by a dynamic-partitioning function from a system input named by the directSetter attribute. Without this attribute, the value passed to a functionPartitionInput function is taken from a stored column; partitionSetter allows a value to be passed to the function without linking it to another column's data. For example, if DataType were the partitioning column, a system input with the name dataType could be passed to the dynamic partition function as follows:

<SystemInput name="dataType" type="String" />
<Column name="DataType"
        functionPartitionInput="true"
        directSetter="dataType" />

If a logger depends on a customer's classes, those classes must be present in the classpath both for logger generation (as it is required for compilation) and at runtime.

If a logger uses column partition values that are calculated for each row, it will only allow a limited number of open partitions at any time. If this number is exceeded, the least recently used partition is closed before a new one is opened. If a large number of partitions are expected to be written to at any time, the default number of open partitions may not be sufficient, as quick rollover of partitions will cause the creation of many new files or network connections. These limits can be configured with the following properties:

  • MultiPartitionFileManager.maxOpenColumnPartitions - this is for loggers writing directly to binary log files. The default value is 10.
  • logAggregatorService.maxOpenColumnPartitions - this is for loggers writing through the Log Aggregator Service. The default value is 10.
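
For example, to allow more simultaneously open partitions, these properties can be raised in the property file (the values shown are illustrative, not recommendations):

MultiPartitionFileManager.maxOpenColumnPartitions=20
logAggregatorService.maxOpenColumnPartitions=20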

The following is an example logger to go with the example listener:

<Logger logFormat="1" loggerPackage="com.illumon.iris.test.gen" loggerClass="BarTestLogger" loggerInterface="com.illumon.iris.test.ABCLogger">
   <LoggerImports>
   import com.abc.xyz.Helper;
   </LoggerImports>
   <LoggerFields>
   private final Helper helper = new Helper();
   </LoggerFields>
   <SystemInput name="Alpha" type="String" />
   <SystemInput name="Bravo" type="int" />
   <SystemInput name="CharlieSource" type="String" />
   <Column name="Alpha" dataType="String" />
   <!-- The BarTestLogger will perform different input transformations than the TestLogger. -->
   <Column name="Bravo" dataType="int" intradaySetter="Bravo + 1"/>
   <Column name="Charlie" dataType="double" intradaySetter="helper.derive(Double.parseDouble(CharlieSource) * 2.0)" directSetter="matchIntraday" />
   <!-- This column exists in the schema, but not in the V1 log.  Missing columns are not allowed; therefore it must have an intradayType of none. -->
   <Column name="Delta" dataType="double" intradayType="none" />
   <Column name="Echo" dataType="StringSet" intradayType="none" />
</Logger>

Combined Definition of Loggers and Listeners

It is possible to declare a Logger and Listener simultaneously in a <LoggerListener> element. A <LoggerListener> element requires both a listenerPackage attribute (assuming the default listener package name is not to be used) and a loggerPackage attribute. The <Column> elements declared under the <LoggerListener> element will be used for both the logger and the listener. This is useful as it avoids repetition of the <Column> elements.

An example of a <LoggerListener> declaration for the example table is provided below.

<LoggerListener listenerClass="TestTableListenerBaz" listenerPackage="com.illumon.iris.test.gen" loggerPackage="com.illumon.iris.test.gen" loggerClass="TestTableLoggerBaz" logFormat="2">
   <SystemInput name="Alpha" type="String" />
   <SystemInput name="Bravo" type="int" />
   <SystemInput name="Charlie" type="int" />
   <ListenerImports>
   import com.illumon.iris.db.tables.libs.StringSet;
   </ListenerImports>
   <ListenerFields>
   private final String echoValue = "reverberating";
   private final StringSet echoValueStringSet = new com.illumon.iris.db.tables.StringSetArrayWrapper(echoValue);
   </ListenerFields>
   <Column name="Alpha" dataType="String" />
   <Column name="Bravo" dataType="int" />
   <Column name="Charlie" dataType="double" intradayType="Int" dbSetter="(double)Charlie" directSetter="(double)Charlie" />
   <Column name="Delta" dataType="double" dbSetter="6.0" intradayType="none" />
   <Column name="Echo" dataType="StringSet" dbSetter="echoValueStringSet2" intradayType="none" />
</LoggerListener>

Combined Definitions of Table, Loggers, and Listeners

In past Deephaven releases, some schemas controlled code generation from the <Column> elements declared for the <Table>, without explicitly declaring the columns under a <Logger>, <Listener>, or <LoggerListener> element. The attributes normally defined on a <Column> element within a <Logger>, <Listener>, or <LoggerListener> block were placed on the same <Column> elements used to define the table itself. While this type of schema will still work, it has been deprecated, as it does not allow for multiple logger or listener versions, or for easy migration such as the addition of new columns with application backwards-compatibility.

<Table namespace="TestNamespace" name="TestTable3" storageType="NestedPartitionedOnDisk" loggerPackage="com.illumon.iris.test.gen" listenerPackage="com.illumon.iris.test.gen">
    <Partitions keyFormula="__PARTITION_AUTOBALANCE_SINGLE__"/>
    <SystemInput name="Alpha" type="String" />
    <SystemInput name="Bravo" type="int" />
    <SystemInput name="Charlie" type="int" />
    <SystemInput name="Foxtrot" type="double" />
    <Column name="Timestamp" dbSetter="DBTimeUtils.millisToTime(Timestamp)" dataType="DateTime" columnType="Normal" intradaySetter="System.currentTimeMillis()"/>
    <Column name="Alpha" dataType="String" />
    <Column name="Bravo" dataType="int" />
    <Column name="Charlie" dataType="double" intradaySetter="(double)Charlie + Foxtrot" directSetter="matchIntraday" />
    <Column name="Delta" dataType="double" dbSetter="1.5" intradayType="none" />
</Table>

Logger/Listener Generation

Introduction

Streaming data ingestion by Deephaven requires a table structure to receive the data, generated logger classes to format data and write it to binary log files, generated listeners used by Deephaven for ingestion, and processes to receive and translate the data. The main components are:

  • Schemas - XML used to define table structure for various types of tables. These are described in detail in Schemas.
  • Logger - generated class code that will be used by the application to write the data to a binary log, or to the Log Aggregator Service.
  • Log Aggregator Service (LAS) - a Deephaven process that combines binary log entries from several processes and writes them into binary log files. The loggers may instead write directly to log files without using the LAS.
  • Tailer - a Deephaven process that reads the binary log files as they are written, and streams the data to a Data Import Server (DIS).
  • Data Import Server (DIS) - a process that writes near real-time data into the intraday table locations.
  • Listener - generated class code running on the DIS that converts the binary entries sent by the tailer into appended data in the target intraday table.

The logger and the listener work together in that both of them are generated based on the table's schema. The logger converts elements of discrete data into the Deephaven row-oriented binary log format, while the listener converts the data from the binary log format to Deephaven's column-oriented data store.

Although the typical arrangement is to stream events through the logger, tailer, and listener into the intraday tables as they arrive, there are also applications where only the logger is used, and the binary log files are manually imported when needed. The matching listener component will then be used when that later import is run.

Customers can generate custom loggers and listeners based on the definitions contained in schemas; this is required to use the streaming data ingestion described above. Logger and listener generation is normally done using the generate_loggers script, provided as part of the software installation.

When Deephaven uses a logger or listener class, it will first verify that the class matches the current schema definition for the table in question. Therefore, whenever a table schema is modified and redeployed with dhconfig, any related loggers and listeners will also need to be recreated.
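
For example, a typical sequence after changing a schema (using the commands shown elsewhere in this document; the path and namespace are illustrative) is:

# Redeploy the modified schema, then regenerate the matching logger and listener classes
sudo -u irisadmin /usr/illumon/latest/bin/dhconfig schema import --force --directory /tmp/schema/ExampleNamespace
sudo /usr/illumon/latest/bin/iris generate_loggers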

generate_loggers Script

Once the schemas are defined, the generate_loggers script will normally be used to generate logger and listener classes. It finds schemas, generates and compiles Java files based on the definitions in these schemas, and packages the compiled .class files and Java source files into two separate JAR files which can be used by the application and Deephaven processes, or by the customer for application development. The IntradayLoggerFactory class is called to perform the actual code generation, and it uses the properties described above when generating the logger and listener code.

To use it with default behavior, simply call the script without any parameters, and it will generate the loggers and listeners for any customer schemas it finds through the schema service.

The simplest way to call this script is:

sudo /usr/illumon/latest/bin/iris generate_loggers

This call will generate all loggers and listeners that are not internal Deephaven ones, and place a default jar file in a location where it will be accessible to the application.

The script will use several default options based on environment variables in the host configuration file's generate_loggers entry.

  • ILLUMON_JAVA_GENERATION_DIR - the directory into which the generated Java files will be placed. If this is not supplied, then the directory $WORKSPACE/generated_java will be used. This can also be overridden with the javaDir parameter as explained below. Two directories will be created under this directory:
    • build_generated - used to generate the compiled Java class files.
    • java - used to hold the generated Java code.

A typical value for this is: export ILLUMON_JAVA_GENERATION_DIR=/etc/sysconfig/illumon.d/resources

  • ILLUMON_CONFIG_ROOT indicates the customer configuration root directory. If defined, the script copies the generated logger/listener JAR file to the java_lib directory under this. A typical value for this is: export ILLUMON_CONFIG_ROOT=/etc/sysconfig/illumon.d.
  • ILLUMON_JAR_DIR - the directory in which the generated JAR files will be created. If it is not defined, then the workspace directory will be used. This is not the final location of the generated JAR file that contains the compiled .class files, as it is copied based on the ILLUMON_CONFIG_ROOT environment variable. The JAR file that contains the logger Java sources is not copied anywhere.

Several options are available to provide flexibility in logger/listener generation. For example, a user could generate loggers and listeners from non-deployed schema files for use in application development:

  • outputJar - specifies the filename of the JAR file generated by the logger/listener script. If the parameter is not provided, the default JAR file name is IllumonCustomerGeneratedCode.jar.
  • packages - a comma-delimited list which restricts which packages will be generated. If a logger or listener package doesn't start with one of the specified names, generation will be skipped for that logger or listener. If the parameter is not provided, all loggers and listeners for found schema files will be generated. Customer logger and listener packages should never start with com.illumon.iris.controller, com.illumon.iris.db, or io.deephaven.iris.db, as these are reserved for internal use.
  • javaDir - specifies the directory which will be used to write generated Java files. A logs directory must be available one level up from the specified javaDir directory. If the parameter is not provided, the directory generated_java under the workspace will be used. In either case, under this directory a subdirectory build_generated will be used to contain compiled .class files, and this subdirectory will be created if it does not exist.
  • schemaDir - specifies a single directory to search for schema files. If this parameter is provided, the schema service is not used to retrieve schemas; instead, only the specified location is searched. Especially combined with jarDir, this can be useful to test logger and listener generation before deploying a schema and the associated loggers and listeners. Note: This directory must be readable by the user that will run the command, usually irisadmin.
  • jarDir - specifies a directory to hold the generated JAR file. If the parameter is not provided, then the workspace directory (as defined in the host configuration file) will be used. The generated JAR file will always be copied to a location specified by $ILLUMON_CONFIG_ROOT/java_lib, which defaults to a location where the Deephaven application will find it (currently /etc/sysconfig/illumon.d/java_lib).
  • javaJar - specifies a JAR file in which the generated logger source (java) files will be placed. This will be placed in the same directory as the generated JAR file. If the parameter is not specified, then a JAR file with the name "IllumonCustomerGeneratedCodeSources.jar" will be created.

For example, the following command will generate a logger/listener jar using several custom options:

export WORKSPACE=/home/username/workspace
export ILLUMON_JAVA=/usr/bin/java           # or any other path to a valid java executable
export JAVA_HOME=/usr/java/jdk[version]     # or wherever bin/java and bin/javac can be found
/usr/illumon/latest/bin/generate_loggers \
  -d /usr/illumon/latest/ \
  -f iris-common.prop \
  -j -Dlogroot=/home/username/logs \
  outputJar=test.jar \
  packages=com.customer.gen \
  javaDir=/home/username/java_gen_dir \
  schemaDir=/home/username/schema \
  jarDir=/home/username/jars

  • The WORKSPACE environment variable will be used to store temporary files (in this case, /home/username/workspace, which must already exist).
  • The ILLUMON_JAVA environment variable tells it where to find Java.
  • The JAVA_HOME environment variable tells it where to find the JDK.
  • -d /usr/illumon/latest/ indicates where Deephaven is installed.
  • -f iris-common.prop indicates the property file to use (this is a common property file that will exist on most installations).
  • -j -Dlogroot=/home/username/logs indicates the root logging directory; logs will be placed in subdirectories under this directory, which should already exist.
  • The schemaDir parameter tells it to look for schema files in the directory /home/username/schema, instead of using the schema service.
  • The outputJar parameter tells it to generate a JAR with the name "test.jar".
  • The packages parameter tells it to only generate classes that start with packages com.customer.gen.
  • Because it is only operating out of the user's directories, it can be run under the user's account.

Note: in this example we are calling generate_loggers directly, rather than through /usr/illumon/latest/bin/iris.

If the generation process completes correctly, the script will show which Java files it generated and indicate what was added to the JAR files. The final output should be similar to the following:

**********************************************************************
Jar file generated: /db/TempFiles/irisadmin/IllumonCustomerGeneratedCode.jar
Jar file not copied to a Deephaven-available location
Jar file with logger sources created as /db/TempFiles/irisadmin/IllumonCustomerGeneratedCodeSources.jar
********************

Dynamic Logging Example

This section provides a simple example showing how to define schemas that determine column partition values based on the logged data, and how to write Java loggers that use these schemas. This involves several steps:

  1. Define the schemas for the data to be logged.
  2. Deploy the schemas.
  3. Generate the Deephaven logger and listener classes from these schemas.
  4. Create a Java class to log some data for these tables.
  5. Compile and run the Java class.
  6. Query the tables to see the logged data.

All of the examples work within a standard Deephaven installation. In customized installations, changes to some commands may be required.

Define the Schemas

For this example, two schemas are provided to illustrate the two dynamic logging options.

  • ExampleNamespace.LoggingDate.schema (see Appendix A) is for a table where the column partition value will be determined by each row's logged timestamp.
  • ExampleNamespace.LoggingFunction.schema (see Appendix B) is for a table where the column partition value will be determined from each row's logged SomeData column.

The following lines create the schemas:

sudo -u dbmerge mkdir -p /tmp/schema/ExampleNamespace

sudo -u dbmerge vi /tmp/schema/ExampleNamespace/ExampleNamespace.LoggingDate.schema
<Copy and paste the schema from Appendix A, save, and exit>

sudo -u dbmerge vi /tmp/schema/ExampleNamespace/ExampleNamespace.LoggingFunction.schema
<Copy and paste the schema from Appendix B, save, and exit>

Import the Schemas

The schemas must be imported into the schema service so the infrastructure can access them.

sudo -u irisadmin /usr/illumon/latest/bin/dhconfig schema import --force --directory /tmp/schema/ExampleNamespace

Generate the Logger Classes

This step involves running the procedure that generates the Deephaven logger classes from the schemas. The logger classes will be used in the example to log some data, while the listener classes will be generated as-needed (not with this command) by the Deephaven infrastructure to load the data. The command will restrict the logger class generation to just the package named in the schema, and will give the generated JAR file a specific name to distinguish it from other generated JAR files.

sudo /usr/illumon/latest/bin/iris generate_loggers packages=com.deephaven.test.gen outputJar=exampleGeneratedJar.jar

The generated classes are visible in the newly generated file, which has been placed in a location where Deephaven expects to find customer JAR files:

jar -tf /etc/sysconfig/illumon.d/java_lib/exampleGeneratedJar.jar

Create the Java Logger Program

It is easiest to create, compile and run the program under the irisadmin account:

sudo su - irisadmin

The program that runs the generated loggers is a simple Java class, which can be found in Appendix C. In this manual process, we need to put it on the system, compile it, and run it. In a more realistic scenario, the generated JAR files would be copied to a development environment and used with an IDE to develop the application.

Create the Java File

cd ~
mkdir -p java/com/deephaven/test/gen
vi java/com/deephaven/test/gen/LoggingExample.java

You may need to type the following command before pasting to avoid formatting issues:

:set paste
# <Copy and paste the Java class from Appendix C, save, and exit>

When creating the Java program, note the following:

  • The Deephaven EventLoggerFactory class is used to create the intraday logger instances. This factory returns fully initialized loggers, set up to either write binary log files or to send data to the Log Aggregator Service based on properties.
  • When creating the dateLogger (for rows that will have column partition values based on the timestamp field), note the use of a time zone. This will be used to determine the actual date for any given timestamp (a timestamp does not inherently indicate the time zone to which it belongs).
  • When creating the functionLogger, note the lambda passed in the final parameter of the createIntradayLogger call. This will use the first 10 characters of the SomeData column to determine the column partition value. Since a String can be null or empty, it uses a default value to ensure that an exception is not raised in this case.
  • The fixedPartitionLogger is created to write all data in a single column partition value, "2018-01-01". It will use the same table as the date logger.

For the actual log statements, hard-coded values are used.

Compile the Java Logger Program

Compile the Java code into a JAR file as follows:

vi compile.bash
# <Copy and paste the compilation script from Appendix D>
chmod 755 compile.bash
mkdir classfiles
./compile.bash
cd classfiles
jar cvf myClass.jar ./com/deephaven/test/gen/LoggingExample.class
# In a default installation irisadmin has the privileges to copy the jar file
cp ~irisadmin/classfiles/myClass.jar /etc/sysconfig/illumon.d/java_lib

Run the Compiled File

This is best done when logged in as the irisadmin account, which will still be the case if these example steps are being followed in order.

Use the iris_exec script to run the program. It sets up the classpath appropriately and can accept additional parameters. The first parameter specifies the LoggingExample class, while the second tells the EventLoggerFactory to use dynamic (row-based) partitioning. The -j flag tells the script that the following is a Java parameter to be passed through to the JVM.

/usr/illumon/latest/bin/iris_exec com.deephaven.test.gen.LoggingExample \
-j -DLoggingExample.useDynamicPartitions=true

You can see the generated binary log files for the two tables with:

ls -ltr /var/log/deephaven/binlogs/ExampleNamespace*.bin.*

The location into which these logs are generated can be changed by updating the log-related properties as defined in the Log Files section of the Deephaven Operations Guide.

Query the Data

When logged in to a Deephaven console, the data can be queried with statements like the following examples, each of which looks at one of the partitions. The examples assume that the user has privileges to view the table.

dateTable = db.i("ExampleNamespace", "LoggingDate").where("Date=`2018-05-01`")
functionTable = db.i("ExampleNamespace", "LoggingFunction").where("Part=`TrialPart1`")
fixedDateTable = db.i("ExampleNamespace", "LoggingDate").where("Date=`2018-01-01`")

Appendix A: Dynamic Date-Based Schema

<Table namespace="ExampleNamespace" name="LoggingDate" storageType="NestedPartitionedOnDisk" >
  <Partitions keyFormula="${autobalance_single}"/>
  <Column name="Date" dataType="String" columnType="Partitioning" />
  <Column name="Timestamp"  dataType="DateTime" />
  <Column name="SomeData" dataType="String" />

  <LoggerListener logFormat="1" loggerPackage="com.deephaven.test.gen" loggerClass="LoggingDateFormat1Logger"
      listenerPackage="com.deephaven.test.gen" listenerClass="LoggingDateFormat1Listener"
      rethrowLoggerExceptionsAsIOExceptions="false">
    <SystemInput name="timestamp" type="long" />
    <SystemInput name="someData" type="String" />
    <Column name="Date" />
    <Column name="Timestamp" intradaySetter="timestamp" datePartitionInput="millis" />
    <Column name="SomeData" intradaySetter="someData" />
  </LoggerListener>
</Table>

Appendix B: Dynamic Function-Based Schema

<Table namespace="ExampleNamespace" name="LoggingFunction" storageType="NestedPartitionedOnDisk" >
  <Partitions keyFormula="${autobalance_single}"/>
  <Column name="Part" dataType="String" columnType="Partitioning" />
  <Column name="Timestamp"  dataType="DateTime" />
  <Column name="SomeData" dataType="String" />

    <LoggerListener logFormat="1" loggerPackage="com.deephaven.test.gen" loggerClass="LoggingFunctionFormat1Logger"
      listenerPackage="com.deephaven.test.gen" listenerClass="LoggingFunctionFormat1Listener"
      rethrowLoggerExceptionsAsIOExceptions="false" >
    <SystemInput name="timestamp" type="long" />
    <SystemInput name="someData" type="String" />

    <Column name="Part" />
    <Column name="Timestamp" intradaySetter="timestamp" />
    <Column name="SomeData" intradaySetter="someData" functionPartitionInput="true" />
  </LoggerListener>
</Table>

Appendix C: Example Java Class

package com.deephaven.test.gen;
import com.fishlib.configuration.Configuration;
import com.fishlib.io.log.LogLevel;
import com.fishlib.io.logger.Logger;
import com.fishlib.io.logger.StreamLoggerImpl;
import com.illumon.iris.db.util.logging.EventLoggerFactory;

import java.io.IOException;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.function.Function;

/**
 * A simple class to illustrate dynamic logging from pre-defined schemas.
 */
public class LoggingExample {

    /** The logger that illustrates dynamic logging using date rollover */
    private final LoggingDateFormat1Logger dateLogger;

    /** The logger that illustrates dynamic logging using a function */
    private final LoggingFunctionFormat1Logger functionLogger;

    /** The logger that illustrates a fixed partition */
    private final LoggingDateFormat1Logger fixedPartitionLogger;

    private LoggingExample() {

        // A Deephaven Logger instance
        final Logger log = new StreamLoggerImpl(System.out, LogLevel.INFO);

        // Required by the factories that create the dynamic logging instances
        final Configuration configuration = Configuration.getInstance();

        // Timezone for the date-determination logger
        final String timeZone = "America/New_York";

        // This logger will automatically determine the column partition value based on the logged timestamp,
        // using the specified time zone.
        dateLogger = EventLoggerFactory.createIntradayLogger(configuration,
                "LoggingExample",
                log,
                LoggingDateFormat1Logger.class,
                timeZone);

        // This logger requires a lambda to generate the column partition value. Because of the schema definition,
        // the function will be passed the value of the row's SomeData column.
        functionLogger = EventLoggerFactory.createIntradayLogger(configuration,
                "LoggingExample",
                log,
                LoggingFunctionFormat1Logger.class,
                null,
                (Function<String, String>) (value) -> {
                    final int dataLen = value != null ? value.length() : 0;
                    if (dataLen == 0) {
                        return "Unknown";
                    }
                    return value.substring(0, Math.min(10, dataLen));
                });

        // This logger will always log to the column partition value 2018-01-01
        fixedPartitionLogger = EventLoggerFactory.createIntradayLogger(configuration,
                "LoggingExample",
                log,
                LoggingDateFormat1Logger.class,
                null,
                "2018-01-01");
    }

    private void logDateEntry(final long timestamp, final String dataToLog) throws IOException {
        dateLogger.log(timestamp, dataToLog);
    }

    private void logFunctionEntry(final long timestamp, final String dataToLog) throws IOException {
        functionLogger.log(timestamp, dataToLog);
    }

    private void logFixedPartitionEntry(final long timestamp, final String dataToLog) throws IOException {
        fixedPartitionLogger.log(timestamp, dataToLog);
    }

    private void run() throws IOException {
        // Using the date-partitioning logger, log a few entries in two different partitions
        final ZoneId zoneId = ZoneId.of("America/New_York");
        final ZonedDateTime dateTime1 = ZonedDateTime.of(2018, 5, 1, 12, 0, 0, 0, zoneId);
        final ZonedDateTime dateTime2 = dateTime1.plusDays(1);
        logDateEntry(dateTime1.toEpochSecond() * 1_000, "Some data row 1");
        logDateEntry(dateTime1.toEpochSecond() * 1_000, "Some data row 2");
        logDateEntry(dateTime1.toEpochSecond() * 1_000, "Some data row 3");
        logDateEntry(dateTime2.toEpochSecond() * 1_000, "Some data row 4");

        // The function-partition logger will use the first 10 characters of the logged data to determine the column partition value
        logFunctionEntry(System.currentTimeMillis(), "TrialPart1");
        logFunctionEntry(System.currentTimeMillis() + 1000, "TrialPart1 with data");
        logFunctionEntry(System.currentTimeMillis() + 2000, "TrialPart2 with more data");

        // Log a couple of entries which go in a fixed partition
        logFixedPartitionEntry(System.currentTimeMillis(), "FixedEntry1");
        logFixedPartitionEntry(System.currentTimeMillis(), "FixedEntry2");

        // Ensure everything is flushed before shutting down
        dateLogger.shutdown();
        functionLogger.shutdown();
        fixedPartitionLogger.shutdown();
    }

    public static void main(final String... args) throws IOException {
        LoggingExample loggingExample = new LoggingExample();
        loggingExample.run();
    }
}

Appendix D: Java Compilation Script

#!/bin/bash

export DEVROOT=/usr/illumon/latest
proc="iris_exec"
source /etc/sysconfig/illumon
source /usr/illumon/latest/bin/launch_functions
echo sourced
setup_run_environment
export CLASSPATH
javac -parameters -d ./classfiles java/com/deephaven/test/gen/LoggingExample.java