Import JSON Files

This guide demonstrates how to generate a schema from a JSON data file, deploy it, and import a file from the command line. These commands assume a typical Deephaven installation and a sample file located at /data/sample.json.

Schema preparation

This section outlines the necessary steps to prepare a schema, which must be completed before attempting to import data using the command-line interface.

Generate a schema

See the Schema inference page for a guide on generating a schema from a JSON file.

Deploy the schema

sudo -u irisadmin /usr/illumon/latest/bin/dhconfig schema import --file /tmp/JSONExampleNamespace.JSONExampleTableName.schema
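
Before deploying, it can help to confirm that the schema file is actually where the command expects it. The following is a minimal sketch, not part of the Deephaven tooling: the `check_schema_file` function and the `SCHEMA_FILE` variable are hypothetical names, and the script only prints the deploy command rather than running it.

```shell
#!/bin/sh
# Hypothetical pre-flight check; SCHEMA_FILE and check_schema_file are
# illustrative names, not part of Deephaven.
SCHEMA_FILE=${SCHEMA_FILE:-/tmp/JSONExampleNamespace.JSONExampleTableName.schema}

check_schema_file() {
  # Succeeds only if the file is readable by the current user and non-empty.
  [ -r "$1" ] && [ -s "$1" ]
}

if check_schema_file "$SCHEMA_FILE"; then
  # Print the deploy command from this guide instead of executing it.
  echo "sudo -u irisadmin /usr/illumon/latest/bin/dhconfig schema import --file $SCHEMA_FILE"
else
  echo "Schema file missing or empty: $SCHEMA_FILE" >&2
fi
```

Note that the check runs as the current user; the deploy command itself runs as `irisadmin`, which must also be able to read the file.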

Command-line import with `iris_exec json_import`

Note

Ensure you have completed the schema preparation steps before proceeding with a command-line import.

JSON imports can be performed directly from the command line using the `iris_exec` tool.

General syntax

sudo -u dbmerge /usr/illumon/latest/bin/iris_exec json_import <launch args> -- <json import args>

Example import command

The following example imports a single data file into the specified intraday partition, using the schema deployed in the previous step.

sudo -u dbmerge /usr/illumon/latest/bin/iris_exec json_import -- --namespace JSONExampleNamespace --tableName JSONExampleTableName --sourceFile /data/sample.json --destinationPartition localhost/2018-09-26
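
While an import is still being developed, it is common to rerun the same command with a different output mode. The sketch below wraps the example command in a small shell function that prints the assembled command by default and executes it only when `DRY_RUN=0`. The function name `json_import_cmd` and the `DRY_RUN` variable are hypothetical; the flags are the ones documented in the argument table in this guide.

```shell
#!/bin/sh
# Hypothetical helper: assembles the json_import command from this guide.
# It prints the command by default; set DRY_RUN=0 to actually execute it.
IRIS_EXEC=/usr/illumon/latest/bin/iris_exec

json_import_cmd() {
  # Save the arguments before "set --" overwrites the positional parameters.
  namespace=$1 table=$2 source_file=$3 partition=$4
  mode=${5:-SAFE}   # SAFE is the documented default output mode

  set -- sudo -u dbmerge "$IRIS_EXEC" json_import -- \
    --namespace "$namespace" \
    --tableName "$table" \
    --sourceFile "$source_file" \
    --destinationPartition "$partition" \
    --outputMode "$mode"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # show the command without running it
  else
    "$@"                 # run the assembled command
  fi
}

# Development run: REPLACE avoids aborts caused by data left by failed attempts.
json_import_cmd JSONExampleNamespace JSONExampleTableName \
  /data/sample.json localhost/2018-09-26 REPLACE
```

Printing first makes it easy to review the exact arguments before anything is written to the table.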

JSON import arguments

Argument	Description
`-dd` or `--destinationDirectory <path>` `-dp` or `--destinationPartition <internal partition name / partitioning value> | <internal partition name>` `-pc` or `--intradayPartition <partition column name>`	Either a destination directory, a specific partition, or an internal partition plus a partition column must be provided. A directory can be used to write a new set of table files to a specific location on disk, where they can later be read with TableTools. A destination partition is used to write to intraday locations for existing tables. The internal partition value is used to separate data on disk; it does not need to be unique for a table, and the name of the import server is a common choice. The partitioning value is a string used to populate the partitioning column in the table during the import; it must be unique within the table. In summary, there are three ways to specify the destination table partition(s): (1) destination directory (e.g., `-dd /db/Intraday/<namespace>/<table>/localhost/<date>/<table>/`); (2) internal partition and partitioning value (e.g., `-dp localhost/2018-01-01`); (3) internal partition and partition column, for multi-partition imports (e.g., `-dp localhost -pc Date`).
`-ns` or `--namespace <namespace>`	(Required) Namespace in which to find the target table.
`-tn` or `--tableName <name>`	(Required) Name of the target table.
`-om` or `--outputMode <import behavior>`	(Optional) One of `SAFE` (default), `REPLACE`, or `APPEND`. `SAFE` checks whether the target table already contains data matching the provided partitioning value; if it does, the import is aborted. When developing an import process, `REPLACE` should be used, because failed import attempts will often write some data to the table, causing the next attempt with `SAFE` to abort. `APPEND` should normally be used only when you are running multiple imports to load a set of data into the same table at one time, possibly from multiple different sources, and the resulting data needs to be kept together as part of one logical partition.
`-rc` or `--relaxedChecking <TRUE or FALSE>`	(Optional) Defaults to `FALSE`. If `TRUE`, target columns that are missing from the source JSON are allowed to remain null, and the import continues when data conversions fail. In most cases, this should be set to `TRUE` only when developing the import process for a new data source.
`-sd` or `--sourceDirectory <path>` `-sf` or `--sourceFile <exact file name>` `-sg` or `--sourceGlob <file name pattern>`	`sourceDirectory`, `sourceFile`, and `sourceGlob` are all optional. If none of these is provided, the system will attempt to do a multi-file import. Otherwise, `sourceDirectory` will be used in conjunction with `sourceFile` or `sourceGlob`. If `sourceDirectory` is not provided, but `sourceFile` is, then `sourceFile` will be used as a fully qualified file name. If `sourceDirectory` is not provided, but `sourceGlob` is, then `sourceDirectory` will default to the configured log file directory from the prop file being used.
`-sn` or `--sourceName <ImportSource name>`	Specific `ImportSource` to use. If not specified, the importer will use the first `ImportSource` block that it finds that matches the type of the import (CSV/XML/JSON/JDBC).
`-fps` or `--filePathSeparator <file path separator>`	Specifies how many JSON items to examine in the source file(s) prior to import, in order to infer the "columns" that exist in the source data and validate them against the destination table. This inference step is necessary because JSON permits missing values. By default, the importer will read all items. For large files, using a smaller value may improve performance if you know your source data does not contain missing values.
`-mi` or `--maxInferItems <max infer items>`	Specifies how many JSON items to examine in the source file(s) prior to import, in order to infer the "columns" that exist in the source data and validate them against the destination table. This inference step is necessary because JSON permits missing values. By default, the importer will read all items. For large files, using a smaller value may improve performance if you know your source data does not contain missing values.
`-cv` or `--constantColumnValue <constant column value>`	A literal value to use for the import column with `sourceType="CONSTANT"`, if the destination schema requires it.
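
The three ways of specifying a destination described in the table can be compared side by side. The following dry-run sketch only prints commands and does not execute an import. The `dest_args` function and the `/tmp/json_out` directory are hypothetical and used purely for illustration; the short flags (`-ns`, `-tn`, `-sf`, `-dd`, `-dp`, `-pc`) are taken from the table above.

```shell
#!/bin/sh
# Dry-run sketch: prints one json_import command per destination style.
# dest_args and /tmp/json_out are illustrative, not part of Deephaven.
BASE="sudo -u dbmerge /usr/illumon/latest/bin/iris_exec json_import -- -ns JSONExampleNamespace -tn JSONExampleTableName -sf /data/sample.json"

dest_args() {
  case "$1" in
    directory) printf '%s' "-dd $2" ;;          # write table files to a directory
    partition) printf '%s' "-dp $2/$3" ;;       # internal partition / partitioning value
    column)    printf '%s' "-dp $2 -pc $3" ;;   # internal partition + partition column
    *) echo "unknown destination style: $1" >&2; return 1 ;;
  esac
}

echo "$BASE $(dest_args directory /tmp/json_out)"
echo "$BASE $(dest_args partition localhost 2018-01-01)"
echo "$BASE $(dest_args column localhost Date)"
```

The third form is the one intended for multi-partition imports, where the partitioning value is read from the named column rather than given on the command line.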

JSON import query

There is presently no support for JSON imports via a Persistent Query.