Import XML using Builder
Deephaven provides tools for inferring a table schema from sample data and importing XML files. Because XML can represent nested/hierarchical data in many different ways, mapping to Deephaven tables is more complex than for a simple format like CSV. The XML importer described here can handle a few common variations (extraction of values from either attributes or element text, and different levels of nesting), but some XML formats may require a custom importer.
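To make the two value styles concrete, the sketch below reads the same two records written once with attribute values and once with element text. It uses Python's standard xml.etree.ElementTree, not the Deephaven importer. The sample documents, the Record element name, and both helper functions are illustrative assumptions and are not part of the Deephaven API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same records in two common XML layouts.
ATTRIBUTE_STYLE = """
<Records>
  <Record ID="XYZ" Price="10.25" />
  <Record ID="ABC" Price="7.50" />
</Records>
"""

ELEMENT_STYLE = """
<Records>
  <Record><ID>XYZ</ID><Price>10.25</Price></Record>
  <Record><ID>ABC</ID><Price>7.50</Price></Record>
</Records>
"""


def rows_from_attributes(xml_text, element_type):
    """Return one dict per matching element, with values taken from attributes."""
    root = ET.fromstring(xml_text)
    return [dict(elem.attrib) for elem in root.iter(element_type)]


def rows_from_elements(xml_text, element_type):
    """Return one dict per matching element, with values taken from child element text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in elem} for elem in root.iter(element_type)]


attribute_rows = rows_from_attributes(ATTRIBUTE_STYLE, "Record")
element_rows = rows_from_elements(ELEMENT_STYLE, "Record")

# Both layouts carry the same field names and (string) values.
print(attribute_rows == element_rows)
```

Either layout yields the same named fields. The importer options described later on this page select which of the two styles is read.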
Example
The following script imports a single XML file to a specified partition. This import uses options consistent with the XML Quickstart example.
// Groovy
import com.illumon.iris.importers.util.XmlImport
import com.illumon.iris.importers.ImportOutputMode

// Import one XML file into table Sample in namespace Test.
rows = new XmlImport.Builder("Test","Sample")
    .setSourceFile("/db/TempFiles/dbquery/staging/data1.xml")
    .setDestinationPartitions("localhost/2018-04-01")
    .setElementType("Record")        // element that holds each row of data
    .setUseAttributeValues(true)     // read field values from attributes
    .setUseElementValues(false)      // do not read field values from element text
    .setStartDepth(0)
    .setOutputMode(ImportOutputMode.REPLACE)
    .build()
    .run()

println "Imported " + rows + " rows."
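# Python
from deephaven import *

# Import one XML file into table Sample in namespace Test.
rows = (
    XmlImport.builder("Test", "Sample")
    .setSourceFile("/db/TempFiles/dbquery/staging/data1.xml")
    .setDestinationPartitions("localhost/2018-04-01")
    .setElementType("Record")        # element that holds each row of data
    .setUseAttributeValues(True)     # read field values from attributes
    .setUseElementValues(False)      # do not read field values from element text
    .setStartDepth(0)
    .setOutputMode("REPLACE")
    .build()
    .run()
)

print("Imported {} rows.".format(rows))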
Import API Reference
The XML import class provides a static builder method, which produces an object used to set parameters for the import. The builder returns an import object from the build() method. Imports are executed via the run() method and if successful, return the number of rows imported. All other parameters and options for the import are configured via the setter methods described below. The general pattern when scripting an import is:
nRows = XmlImport.builder(<namespace>,<table>)
.set<option>(<option value>)
…
.build()
.run()
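When many options vary between imports, the pattern above can be driven from a dictionary of option names and values. The following is a minimal sketch of that idea, not part of the Deephaven API: `apply_options` is a hypothetical helper, and `StubBuilder` is only a stand-in so the example runs outside a Deephaven console. It assumes that setter methods can be looked up by name on the builder and that each setter returns the builder.

```python
def apply_options(builder, options):
    """Call builder.set<Option>(value) for each entry and return the builder."""
    for name, value in options.items():
        setter = getattr(builder, "set" + name)
        builder = setter(value)
    return builder


class StubBuilder:
    """Stand-in for an import builder; it only records the setter calls it receives."""

    def __init__(self, namespace, table):
        self.namespace = namespace
        self.table = table
        self.calls = []

    def __getattr__(self, method_name):
        # Reached only for attributes not found normally, i.e. the set<Option> methods.
        if not method_name.startswith("set"):
            raise AttributeError(method_name)

        def setter(value):
            self.calls.append((method_name, value))
            return self

        return setter


# Option names mirror the setter names with the "set" prefix removed.
options = {
    "SourceFile": "/db/TempFiles/dbquery/staging/data1.xml",
    "ElementType": "Record",
    "UseAttributeValues": True,
    "UseElementValues": False,
}
builder = apply_options(StubBuilder("Test", "Sample"), options)
print(builder.calls)
```

In a Deephaven console the stub would presumably be replaced by the real builder, with build() and run() called on the result as in the pattern above.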
XML Import Options
Setter Method | Type | Req? | Default | Description |
---|---|---|---|---|
setSourceDirectory | String | No* | N/A | Directory from which to read source file(s). |
setSourceFile | String | No* | N/A | Source file name (either full path on server filesystem or relative to specified source directory). |
setSourceGlob | String | No* | N/A | Source file(s) wildcard expression. |
setDelimiter | char | No | , | Delimiter character to use when parsing string representations of long or double arrays. |
setElementType | String | Yes | N/A | The name or path of the element that will contain data elements. This will be the name of the element which holds your data. |
setStartIndex | int | No | 0 | Starting from the root of the document, the index (1 being the first top-level element in the document after the root) of the element under which data can be found. |
setStartDepth | int | No | 1 | Under the element indicated by Start Index, how many levels of first children to traverse to find an element that contains data to import. |
setMaxDepth | int | No | 1 | Starting from Start Depth, how many levels of element paths to traverse and concatenate to provide a list that can be selected under Element Name. |
setUseAttributeValues | boolean | No | false | Indicates that field values will be taken from attribute values; e.g., <Record ID="XYZ" Price="10.25" /> |
setUseElementValues | boolean | No | true | Indicates that field values will be taken from element values; e.g., <Price>10.25</Price> |
setPositionValues | boolean | No | false | When false, field values within the document will be named; e.g., a value called Price might be contained in an element named Price, or an attribute named Price. When true, field names (column names) will be taken from the table schema, and the data values will be parsed into them by matching the position of the value with the position of the column in the schema. |
setConstantColumnValue | String | No | N/A | A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT. |
* The sourceDirectory parameter will be used in conjunction with sourceFile or sourceGlob. If sourceDirectory is not provided, but sourceFile is, then sourceFile will be used as a fully qualified file name. If sourceDirectory is not provided, but sourceGlob is, then sourceDirectory will default to the configured log file directory from the prop file being used.
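The rules in the note above can be read as a small decision procedure. The sketch below restates them in plain Python as an aid to reading and is not the importer's actual implementation. The function name `resolve_sources`, the placeholder default directory, the preference for sourceFile when both a file and a glob are given, and the error raised when neither is given are all assumptions.

```python
import posixpath


def resolve_sources(source_directory=None, source_file=None, source_glob=None,
                    default_directory="/configured/log/dir"):
    """Return (directory, file name or glob) following the rules in the note.

    default_directory is a placeholder for the configured log file directory;
    the real value would come from the property file in use.
    """
    if source_file is None and source_glob is None:
        # Assumption: at least one of the two is needed to locate any input.
        raise ValueError("either source_file or source_glob is required")
    if source_directory is not None:
        # Directory given: the file name or glob is used in conjunction with it.
        # Assumption: source_file wins if both a file and a glob are supplied.
        return source_directory, source_file or source_glob
    if source_file is not None:
        # No directory: the file name is treated as fully qualified.
        return posixpath.dirname(source_file), posixpath.basename(source_file)
    # No directory and only a glob: fall back to the default directory.
    return default_directory, source_glob


print(resolve_sources(source_file="/db/TempFiles/dbquery/staging/data1.xml"))
print(resolve_sources(source_directory="/data", source_glob="*.xml"))
print(resolve_sources(source_glob="*.xml"))
```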