Why does Deephaven's batch data importer create temporary files during a multi-partition import?

Can users set the location for temporary files?

When Deephaven performs a single-partition import, the data is read directly from the source, and no temporary files are created.

When Deephaven performs a multi-partition import, in which one of the source columns supplies the column partition value, the importer must first split the source into one temporary file per partition value. The original data source is iterated once, and each row is routed to the temporary file for its partition. These files are written to the Java default temporary folder (e.g., /tmp/deephavenCSV.csv) and are then imported to intraday.
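The split step can be pictured with a short, hypothetical Java sketch. This is not Deephaven's importer code; the source file name, the partition column index, and the naive comma splitting are assumptions for illustration only:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public class PartitionSplitter {
        public static void main(String[] args) throws IOException {
            Path source = Paths.get("source.csv"); // hypothetical input file
            int partitionColumn = 0;               // hypothetical partition-value column

            // One open writer per distinct partition value seen so far.
            Map<String, BufferedWriter> writers = new HashMap<>();
            try (BufferedReader reader = Files.newBufferedReader(source)) {
                String header = reader.readLine();
                String line;
                while ((line = reader.readLine()) != null) {
                    // Naive split; a real importer uses a proper CSV parser.
                    String partition = line.split(",", -1)[partitionColumn];
                    BufferedWriter out = writers.get(partition);
                    if (out == null) {
                        // Temp files land in java.io.tmpdir (e.g., /tmp on Linux).
                        Path tmp = Files.createTempFile("deephavenCSV-" + partition + "-", ".csv");
                        out = Files.newBufferedWriter(tmp);
                        out.write(header);
                        out.newLine();
                        writers.put(partition, out);
                    }
                    out.write(line);
                    out.newLine();
                }
            } finally {
                for (BufferedWriter out : writers.values()) {
                    out.close();
                }
            }
        }
    }

Note that in this sketch each distinct partition value keeps a writer open until the pass completes, so a source with many partition values uses correspondingly more file handles and temporary space.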

Pre-partitioning the CSV file reduces the cost of repeatedly opening and closing table writers, which can be significant when the tables have many columns.

You should ensure that the temporary directory has enough space for this process, especially when loading large bulk data sets. By default, Deephaven uses the local machine's default Java temporary-file directory. You can configure this location with the following JVM property:

  • -Djava.io.tmpdir=/path/to/tmpdir
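For example, launching the import JVM with -Djava.io.tmpdir=/path/to/tmpdir redirects files created through the standard Java temp-file APIs to that directory. The following small Java sketch (the class name and probe prefix are assumptions for illustration) can verify which directory is in effect:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class TmpDirCheck {
        public static void main(String[] args) throws IOException {
            // Reflects -Djava.io.tmpdir if it was set on the command line.
            System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));

            // Files.createTempFile uses the same default directory.
            Path probe = Files.createTempFile("tmpdir-probe-", ".csv");
            System.out.println("Created probe file: " + probe);
            Files.delete(probe);
        }
    }

Running it as java -Djava.io.tmpdir=/path/to/tmpdir TmpDirCheck should print the redirected path.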