Importing data examples

The following example schemas demonstrate the various capabilities of Deephaven table schemas.

Example 1 - Logger and Listener elements

This example demonstrates streaming data ingestion with logger and listener elements in a table schema. The schema includes separate Listener and Logger elements, as well as a combined LoggerListener element, illustrating the different ways to define data ingestion paths. For more details on streaming data and loggers/listeners, see Streaming data overview.

<Table name="TestTable" namespace="TestNamespace" storageType="NestedPartitionedOnDisk">

   <Partitions keyFormula="__PARTITION_AUTOBALANCE_SINGLE__"/>
   <Column name="Partition" dataType="String" columnType="Partitioning" />
   <Column name="Alpha" dataType="String" columnType="Grouping"/>
   <Column name="Bravo" dataType="int" />
   <Column name="Charlie" dataType="double" />
   <Column name="Delta" dataType="double" />
   <Column name="Echo" dataType="StringSet" />

   <Listener logFormat="1" listenerPackage="com.illumon.iris.test.gen" listenerClass="TestTableFormat1Listener">
      <ListenerImports>
      import com.illumon.iris.db.tables.libs.StringSet;
      </ListenerImports>
      <ListenerFields>
       private final String echoValue = "reverberating";
       private final StringSet echoValueStringSet = new com.illumon.iris.db.tables.StringSetArrayWrapper(echoValue);
      </ListenerFields>
      <ImportState importStateType="com.illumon.iris.db.tables.dataimport.logtailer.ImportStateRowCounter" stateUpdateCall="newRow()" />
      <Column name="Alpha"/>
      <Column name="Bravo" />
      <Column name="Charlie"/>
      <Column name="Delta" dbSetter="5.0" intradayType="none" />
      <Column name="Echo" dbSetter="echoValueStringSet" intradayType="none" />
   </Listener>

   <Logger logFormat="1" loggerPackage="com.illumon.iris.test.gen" loggerClass="BarTestLogger" >
      <LoggerImports>
      import com.abc.xyz.Helper;
      </LoggerImports>
      <LoggerFields>
      private final Helper helper = new Helper();
      </LoggerFields>
      <SystemInput name="Alpha" type="String" />
      <SystemInput name="Bravo" type="int" />
      <SystemInput name="CharlieSource" type="String" />
      <Column name="Alpha" dataType="String" />
      <!-- The BarTestLogger will perform different input transformations than the TestLogger. -->
      <Column name="Bravo" dataType="int" intradaySetter="Bravo + 1"/>
      <Column name="Charlie" dataType="double" intradaySetter="helper.derive(Double.parseDouble(CharlieSource) * 2.0)" directSetter="matchIntraday" />
      <!-- This column exists in the schema, but not in the V1 log.  Missing columns are not allowed, therefore it must have an intradayType of none. -->
      <Column name="Delta" dataType="double" intradayType="none" />
      <Column name="Echo" dataType="StringSet" intradayType="none" />
   </Logger>

   <LoggerListener listenerClass="TestTableListenerBaz" listenerPackage="com.illumon.iris.test.gen" loggerPackage="com.illumon.iris.test.gen" loggerClass="TestTableLoggerBaz" logFormat="2">
      <SystemInput name="Alpha" type="String" />
      <SystemInput name="Bravo" type="int" />
      <SystemInput name="Charlie" type="int" />
      <ListenerImports>
      import com.illumon.iris.db.tables.libs.StringSet;
      </ListenerImports>
      <ListenerFields>
      private final String echoValue = "reverberating";
      private final StringSet echoValueStringSet = new com.illumon.iris.db.tables.StringSetArrayWrapper(echoValue);
      </ListenerFields>
      <Column name="Alpha" dataType="String" />
      <Column name="Bravo" dataType="int" />
      <Column name="Charlie" dataType="double" intradayType="Int" dbSetter="(double)Charlie" directSetter="(double)Charlie" />
      <Column name="Delta" dataType="double" dbSetter="6.0" intradayType="none" />
      <Column name="Echo" dataType="StringSet" dbSetter="echoValueStringSet" intradayType="none" />
   </LoggerListener>

</Table>

Example 2 - Simplified streaming schema

This example shows a more compact streaming schema where logger and listener configuration is specified directly as attributes on the Table element, rather than as separate nested elements. This approach is simpler when you don't need custom imports, fields, or complex transformations. For more details on streaming data, see Streaming data overview.

<!-- Simple test schema for backwards compatibility. -->

<Table namespace="TestNamespace" name="TestTable3" storageType="NestedPartitionedOnDisk" loggerPackage="com.illumon.iris.test.gen" listenerPackage="com.illumon.iris.test.gen">
   <Partitions keyFormula="__PARTITION_AUTOBALANCE_SINGLE__"/>
   <SystemInput name="Alpha" type="String" />
   <SystemInput name="Bravo" type="int" />
   <SystemInput name="Charlie" type="int" />
   <SystemInput name="Foxtrot" type="double" />
   <Column name="Partition" dataType="String" columnType="Partitioning" />
   <Column name="Timestamp" dbSetter="DBTimeUtils.millisToTime(Timestamp)" dataType="DateTime" columnType="Normal" intradaySetter="System.currentTimeMillis()"/>
   <Column name="Alpha" dataType="String" />
   <Column name="Bravo" dataType="int" />
   <Column name="Charlie" dataType="double" intradaySetter="(double)Charlie + Foxtrot" directSetter="matchIntraday" />
   <Column name="Delta" dataType="double" dbSetter="1.5" intradayType="none" />
</Table>

Example 3 - Batch data import using the Schema Editor

The previous two examples focused on streaming data ingestion, which uses loggers and listeners to process row-by-row data in real time. This example demonstrates a different workflow: batch data import from CSV files using the Schema Editor UI and Persistent Queries.

The following walkthrough shows the common steps in setting up a schema and import job for a new CSV batch data source.

The example uses U.S. Federal College Scorecard data from www.data.gov. To follow along with this walkthrough exactly, download and unzip the data file from the College Scorecard dataset page. The direct download link for all data files is: https://ed-public-download.app.cloud.gov/downloads/CollegeScorecard_Raw_Data_04192023.zip

Click the Advanced button at the top of the Deephaven Console and then select Schema Editor.

Note

The Schema Editor menu item is only available if you are a member of the iris-schemamanagers group.

The Schema Editor menu item highlighted in the Advanced tab

An options dialog is shown first; after accepting the defaults, the Schema Editor panel will open:

The Schema Editor panel options before the panel opens

The Schema Editor panel

Click the File menu and then select the Discover CSV Schema... option:

Discover CSV Schema highlighted in the File menu

When the Discover Schema panel opens, select the ellipsis button (circled below) to the right of the Source File field:

The ellipsis button highlighted next to the Source File field in the Discover Schema panel

Browse for the CSV file you downloaded earlier:

The file search panel

Options when opening a file

After selecting the file, enter the Table Name and Namespace. (These can also be entered in the main Schema Editor window in the next step.) Format, Skip Lines, and Delimiter should be set here so that the preview of the data looks correct. For very large files, it may be desirable to limit how many rows are analyzed to infer the data types of the columns. The entire file does not need to be scanned, as long as the selected top lines of the file contain enough representative data.

Click OK once the preview looks correct. Depending on the size of the file, processing may take several seconds or longer; a progress bar is shown at the bottom of the window while it runs.

Options when opening a file, with a preview of the resulting table

Once processed, the schema details will be shown in the main Schema Editor window. The schema can then be edited from here to change column names, types, order, and import processing as needed.

The Schema Editor showing the discovered schema details
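For reference, a schema produced by CSV discovery pairs the Column definitions with an ImportSource block that records how source fields map to table columns. The fragment below is an illustrative sketch only, not the exact output of the discovery tool: the namespace, table name, column subset, and renamed column names are assumptions for this example (UNITID, INSTNM, and CITY are field names from the Scorecard data).

<Table name="CollegeScorecard" namespace="Scorecard" storageType="NestedPartitionedOnDisk">
   <ImportSource name="IrisCSV" type="CSV">
      <!-- Rename source fields to friendlier column names during import -->
      <ImportColumn name="UnitId" sourceName="UNITID" />
      <ImportColumn name="InstitutionName" sourceName="INSTNM" />
      <ImportColumn name="City" sourceName="CITY" />
   </ImportSource>
   <Partitions keyFormula="__PARTITION_AUTOBALANCE_SINGLE__"/>
   <!-- The partitioning column value is supplied at import time -->
   <Column name="Date" dataType="String" columnType="Partitioning" />
   <Column name="UnitId" dataType="int" />
   <Column name="InstitutionName" dataType="String" />
   <Column name="City" dataType="String" />
</Table>

In this sketch, the ImportColumn entries handle the renames; columns without an ImportColumn entry are typically matched to source fields by name.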

The expected results of an import to Deephaven can then be previewed using the Preview Table button. This requires that the Source Directory and the Source File or Source Glob fields contain a path to the source files that is valid on the query server where the Schema Editor is running.

The Preview Table window

Once the schema and results of a test import are as desired, the schema can be deployed to Deephaven to create a new table definition.

Open the File menu again and select the Deploy Schema option to deploy the new table to Deephaven. Optionally, the schema file can instead be saved to disk for later use or manual deployment.

The Deploy Schema option highlighted in the File menu

Once the schema is deployed, an Import Query can be created using the Create Import Query button. (You can also use the Create Import Query button to deploy the schema.)

Alternatively, an Import Query can be created later from the Query Config panel of the Deephaven Classic console. At a minimum, it is necessary to configure basic settings on the Settings tab, a timeout on the Scheduling tab, and the source file path and name on the CsvImport Settings tab.

The Persistent Query Configuration Editor panel - Settings tab

The Persistent Query Configuration Editor panel - CsvImport Settings tab

After completing this process, the new Import Query is listed alphabetically in the Query Config panel. It can then be run as desired.

The Import Query visible in the Query Config panel