Write custom ingesters

So, you’ve got some data that isn’t in one of the formats Deephaven can ingest “out of the box”. That is, it’s not in CSV, or XML, or JSON, or Deephaven’s binary log format, or available from a database via JDBC, or available as a Deephaven table, or in Quandl, or supported by one of our plugins. The good news is that we can totally support that!

So, what do I do?

The recommended approach is to implement a custom importer for your data in Java using Deephaven’s APIs.

Why should I do that?

Implementing a proper Deephaven importer gives you two advantages over other approaches:

  • Your data can be ingested into your Deephaven instance with the minimum number of copies or transformation steps. This is an important consideration for large data sets spanning years that will need to be backfilled quickly and efficiently, or for mission-critical production data sets that need to be delivered with the most robust possible process.
  • Your ingestion process can easily be integrated into Deephaven’s DBA workflow, allowing scheduled ingestion, merge, validation, and cleanup; semi-automated backfill; and dependent derived import jobs.

What if I really don’t want to write Java code?

We get it: Java isn’t everyone’s cup of tea. (Hah. Hahahaha.)

For users working in C# and C++, we have native implementations of our binary logging API. The Deephaven binary store format is robust and extremely efficient to ingest. Given that you may be implementing binary store loggers as a matter of course in order to integrate with Deephaven’s real-time data ingestion capabilities, using the same data transformers to prepare backfill inputs may impose minimal additional development cost. See Logging From C#, Logging from C++, and Importing Binary Logs.

If you’d prefer to work with the languages and tools you know, we also support a number of language-agnostic formats, such as CSV, XML, and JSON.

OK, you’ve convinced me, how do I write a custom importer?

The following recipe should give you an idea of the work involved:

  1. Start by reading ExampleImporter.java and taking ideas from it. It provides an extensively documented importer implementation that ingests a trivial generated data set, illustrating best practices for working with Deephaven’s APIs (e.g., TableWriter) and with Java language features in general.
  2. Now that you get the idea, create your own subclass of BaseImporter, an abstract class that handles most of the setup. We generally recommend copying parts of ExampleImporter.java and updating them to match your needs.
  3. Your new importer subclass will probably need to find the data it imports, and may need some additional options. Create an Arguments class extending StandardImporterArguments, adding arguments for whatever information your importer requires to locate and process the data. If you’re taking example code from ExampleImporter.java, you’ll need to replace references to ExampleImporter.Arguments with your own class.
  4. Your importer will need a main() method of its own, which uses your arguments class and instantiates your importer class. If you’re taking example code from ExampleImporter.java, you’ll need to replace references to ExampleImporter.Arguments with your Arguments class, and ExampleImporter with your importer class.
  5. Your importer probably needs to ingest some data. Implement processData() in your importer class. This method must set all the column values using the TableWriter instance and call TableWriter.writeRow for each row of output data. ExampleImporter.java includes sample implementations at two extremes: one is completely hardcoded and the other is completely dynamic.
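As a rough illustration of steps 2–5, the sketch below shows the shape of processData(): set every column value, then write the row. Note that RowSetter and StubTableWriter here are hypothetical stand-ins so the example is self-contained, not Deephaven’s real TableWriter or BaseImporter APIs; consult ExampleImporter.java for the actual classes and signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-column setter (see ExampleImporter.java for the real API).
interface RowSetter<T> { void set(T value); }

// Hypothetical stand-in for Deephaven's TableWriter: buffers each written row as a string.
class StubTableWriter {
    final List<String> rows = new ArrayList<>();
    private String symbol;
    private double price;
    RowSetter<String> getSymbolSetter() { return v -> symbol = v; }
    RowSetter<Double> getPriceSetter()  { return v -> price = v; }
    void writeRow() { rows.add(symbol + "," + price); }
}

// The shape of an importer subclass: processData() sets every column, then writes the row.
class MyImporter {
    private final StubTableWriter writer;
    MyImporter(StubTableWriter writer) { this.writer = writer; }

    void processData() {
        // In a real importer this loop would iterate over your source data.
        String[][] source = { {"AAPL", "189.5"}, {"MSFT", "402.1"} };
        for (String[] record : source) {
            writer.getSymbolSetter().set(record[0]);
            writer.getPriceSetter().set(Double.parseDouble(record[1]));
            writer.writeRow(); // one writeRow() call per output row
        }
    }
}

public class ImporterSketch {
    public static void main(String[] args) {
        StubTableWriter writer = new StubTableWriter();
        new MyImporter(writer).processData();
        writer.rows.forEach(System.out::println);
    }
}
```

The key invariant, which the real TableWriter also enforces, is that every column value is set before each writeRow() call.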

Once your importer is implemented, you’ll also need to deploy the schema for your table(s) before you can use it.

Running your importer as a query

This is the recommended approach for most users, as it allows you to integrate your data ingestion with the functionality Deephaven has designed to make data import reliable and easy to monitor.

ExampleImporter and ExampleImporter.Arguments include overrides that bypass the command line. Importers that override these methods in a similar way can be called from a Deephaven console or Persistent Query. In general, you will need to run on a merge server. For more information, see PQ documentation for Batch Query - Import Server. You will also need to make sure your code is available on the classpath of your PQ.

Sample Groovy query code:

import com.illumon.iris.importers.ExampleImporter

// Call the static convenience method
ExampleImporter.importData(log, "ExampleNamespace", "ExampleTable", "hostname/2018-01-01", "REPLACE", 12, 23)

// Create an object and invoke doImport()
e = new com.illumon.iris.importers.ExampleImporter(log, "ExampleNamespace", "ExampleTable", "hostname/2018-03-08", "REPLACE", 12, 3)
e.doImport()

Running your importer as a standalone process

If you prefer to use your own orchestration tooling, you may want to run your data ingestion as a standalone process. In general, you will need to use a server that has configured “intraday” storage to hold ingested data before it is merged.

Classpath

Your ingester’s classpath should include the installed Deephaven files, the default override locations, and the path to your own Java code:

  • /etc/sysconfig/illumon.d/hotfixes
  • /etc/sysconfig/illumon.d/override
  • /etc/sysconfig/illumon.d/resources
  • /etc/sysconfig/illumon.d/java_lib/*
  • /usr/illumon/latest/etc
  • /usr/illumon/latest/java_lib/*

Pass this as a JVM argument:

export EXAMPLECLASSPATH=/etc/sysconfig/illumon.d/hotfixes:/etc/sysconfig/illumon.d/override:/etc/sysconfig/illumon.d/resources:/etc/sysconfig/illumon.d/java_lib/*:/usr/illumon/latest/etc:/usr/illumon/latest/java_lib/*

-cp $EXAMPLECLASSPATH

process.name

Set this to a name that identifies your import process. Only one instance of each process.name can run at any given time. Pass this as a JVM argument:

-Dprocess.name=example_importer

Configuration.rootFile

The root property file specifying configuration properties for your process. Pass this as a JVM argument:

-DConfiguration.rootFile=iris-common.prop

devroot

Can be /usr/illumon/latest, or the resolved target of that link. Pass this as a JVM argument:

-Ddevroot=/usr/illumon/latest

workspace

This governs where log files are written. Matching the workspace used by the other import processes is fine. Pass this as a JVM argument:

-Dworkspace=/db/TempFiles/dbmerge/example_importer

Class to execute

Replace com.illumon.iris.importers.ExampleImporter with your class.

Program Arguments

Standard importer arguments, plus any new arguments you add. Standard arguments are:

  • -dd or --destinationDirectory <path>
  • -dp or --destinationPartition <internal partition name / partitioning value>
  • -ns or --namespace <namespace>
  • -tn or --tableName <name>
  • -om or --outputMode <import behavior>

Note

See also: Tables & Schemas

Result command line

Putting the pieces above together gives a command line (for bash or similar) something like:

export EXAMPLECLASSPATH=/etc/sysconfig/illumon.d/hotfixes:/etc/sysconfig/illumon.d/override:/etc/sysconfig/illumon.d/resources:/etc/sysconfig/illumon.d/java_lib/*:/usr/illumon/latest/etc:/usr/illumon/latest/java_lib/*

java -cp $EXAMPLECLASSPATH -Dprocess.name=example_importer -DConfiguration.rootFile=iris-common.prop -Ddevroot=/usr/illumon/latest -Dworkspace=/db/TempFiles/dbmerge/example_importer com.illumon.iris.importers.ExampleImporter -dp hostname/2018-03-07 -om REPLACE -ns ExampleNamespace -tn ExampleTable [custom arguments]