Class ParquetInstructions.Builder

java.lang.Object
io.deephaven.parquet.table.ParquetInstructions.Builder
Enclosing class:
ParquetInstructions

public static class ParquetInstructions.Builder extends Object
  • Constructor Details

  • Method Details

    • addColumnNameMapping

      public ParquetInstructions.Builder addColumnNameMapping(String parquetColumnName, String columnName)
    • getTakenNames

      public Set<String> getTakenNames()
    • addColumnCodec

      public ParquetInstructions.Builder addColumnCodec(String columnName, String codecName)
    • addColumnCodec

      public ParquetInstructions.Builder addColumnCodec(String columnName, String codecName, String codecArgs)
    • useDictionary

      public ParquetInstructions.Builder useDictionary(String columnName, boolean useDictionary)
      Set a hint that the writer should use dictionary-based encoding for writing this column; never evaluated for non-String columns.
      Parameters:
      columnName - The column name
      useDictionary - The hint value
    • setFieldId

      public ParquetInstructions.Builder setFieldId(String columnName, int fieldId)
      This is currently only used for writing, allowing the setting of field_id in the proper Parquet SchemaElement.

      Setting multiple field ids for a single column name is not allowed.

      Field ids are not typically configured by end users.

      Parameters:
      columnName - the Deephaven column name
      fieldId - the field id
    • setCompressionCodecName

      public ParquetInstructions.Builder setCompressionCodecName(String compressionCodecName)
    • setMaximumDictionaryKeys

      public ParquetInstructions.Builder setMaximumDictionaryKeys(int maximumDictionaryKeys)
      Set the maximum number of unique keys the writer should add to a dictionary page before switching to non-dictionary encoding. This setting is never evaluated for non-String columns and is ignored if useDictionary has been set for the column.
      Parameters:
      maximumDictionaryKeys - The maximum number of dictionary keys; must be >= 0
    • setMaximumDictionarySize

      public ParquetInstructions.Builder setMaximumDictionarySize(int maximumDictionarySize)
      Set the maximum number of bytes the writer should add to the dictionary before switching to non-dictionary encoding. This setting is never evaluated for non-String columns and is ignored if useDictionary has been set for the column.
      Parameters:
      maximumDictionarySize - The maximum size of dictionary (in bytes); must be >= 0
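      The two limits above amount to a fallback rule: while dictionary-encoding a String column, the writer abandons the dictionary once either the unique-key count or the cumulative key bytes exceeds its configured maximum. A minimal sketch of that decision follows; the `DictionaryBudget` class is hypothetical and illustrative only, not a Deephaven internal.

      ```java
      import java.nio.charset.StandardCharsets;
      import java.util.LinkedHashSet;
      import java.util.Set;

      // Illustrative model of the dictionary-encoding fallback described above.
      // Tracks unique keys and their cumulative byte size; once either limit is
      // exceeded, a writer would switch the column to non-dictionary encoding.
      class DictionaryBudget {
          private final int maxKeys;
          private final int maxBytes;
          private final Set<String> keys = new LinkedHashSet<>();
          private int bytesUsed = 0;

          DictionaryBudget(int maxKeys, int maxBytes) {
              this.maxKeys = maxKeys;
              this.maxBytes = maxBytes;
          }

          /** Returns true while dictionary encoding remains viable after adding value. */
          boolean offer(String value) {
              if (keys.add(value)) {
                  bytesUsed += value.getBytes(StandardCharsets.UTF_8).length;
              }
              return keys.size() <= maxKeys && bytesUsed <= maxBytes;
          }
      }
      ```

      Note that duplicate values do not consume additional budget; only new unique keys do, which is why high-cardinality String columns are the ones that trigger the fallback.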
    • setIsLegacyParquet

      public ParquetInstructions.Builder setIsLegacyParquet(boolean isLegacyParquet)
    • setTargetPageSize

      public ParquetInstructions.Builder setTargetPageSize(int targetPageSize)
    • setIsRefreshing

      public ParquetInstructions.Builder setIsRefreshing(boolean isRefreshing)
    • setSpecialInstructions

      public ParquetInstructions.Builder setSpecialInstructions(Object specialInstructions)
    • setGenerateMetadataFiles

      public ParquetInstructions.Builder setGenerateMetadataFiles(boolean generateMetadataFiles)
      Set whether to generate "_metadata" and "_common_metadata" files while writing parquet files. When this parameter is set:
      • When writing a single parquet file, metadata files will be generated in the same parent directory as the parquet file.
      • When writing multiple parquet files in a single write call, all parquet files must be written to the same parent directory; metadata files are then generated in that shared parent directory.
      • When writing key-value partitioned parquet data, metadata files are generated in the root directory of the partitioned parquet files.
    • setBaseNameForPartitionedParquetData

      public ParquetInstructions.Builder setBaseNameForPartitionedParquetData(String baseNameForPartitionedParquetData)
      Set the base name for partitioned parquet data. This is used to generate the file name for partitioned parquet files, and therefore, this parameter is only used when writing partitioned parquet data. Users can provide the following tokens to be replaced in the base name:
      • The token "{i}" will be replaced with an automatically incremented integer for files in a directory. For example, a base name of "table-{i}" will result in files named like "PC=partition1/table-0.parquet", "PC=partition1/table-1.parquet", etc., where PC is a partitioning column.
      • The token "{uuid}" will be replaced with a random UUID. For example, a base name of "table-{uuid}" will result in files named like "table-8e8ab6b2-62f2-40d1-8191-1c5b70c5f330.parquet".
      • The token "{partitions}" will be replaced with an underscore-delimited, concatenated string of partition values. For example, a base name of "{partitions}-table" will result in files like "PC1=partition1/PC2=partitionA/PC1=partition1_PC2=partitionA-table.parquet", where "PC1" and "PC2" are partitioning columns.
      The default value of this parameter is "{uuid}".
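      The token substitution above can be sketched as a simple string replacement; the `BaseNameExpander` class below is a hypothetical illustration of the documented tokens, not the writer's actual implementation (for instance, the real writer controls the per-directory counter behind "{i}" itself).

      ```java
      import java.util.UUID;

      // Illustrative expansion of the documented base-name tokens:
      // "{i}" -> per-directory file index, "{uuid}" -> random UUID,
      // "{partitions}" -> underscore-delimited partition values.
      class BaseNameExpander {
          static String expand(String baseName, int fileIndex, String partitionsString) {
              return baseName
                      .replace("{i}", Integer.toString(fileIndex))
                      .replace("{uuid}", UUID.randomUUID().toString())
                      .replace("{partitions}", partitionsString);
          }
      }
      ```

      For example, expanding "table-{i}" with file index 0 yields "table-0", matching the "PC=partition1/table-0.parquet" file names described above.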
    • setFileLayout

      Set the expected file layout when reading a parquet file or a directory. Providing this information can skip the computations otherwise needed to deduce the file layout from the source directory structure.
    • setTableDefinition

      public ParquetInstructions.Builder setTableDefinition(TableDefinition tableDefinition)
      • When reading a parquet file, this corresponds to the table definition to use instead of the one implied by the parquet file being read. Providing a definition can help save additional computations to deduce the table definition from the parquet files as well as from the directory layouts when reading partitioned data.
      • When writing a parquet file, this corresponds to the table definition to use instead of the one implied by the table being written.
      This definition can be used to skip some columns or to add additional columns with null values.
    • addIndexColumns

      public ParquetInstructions.Builder addIndexColumns(String... indexColumns)
      Add a list of columns to persist together as indexes. The write operation will store the index info as sidecar tables. This argument is used to narrow the set of indexes to write, or to be explicit about the expected set of indexes present on all sources. Indexes that are specified but missing will be computed on demand.
    • addAllIndexColumns

      public ParquetInstructions.Builder addAllIndexColumns(Iterable<List<String>> indexColumns)
      Adds the provided lists of columns to persist together as indexes. This method accepts an Iterable of lists, where each list represents a group of columns to be indexed together.

      The write operation will store the index info as sidecar tables. This argument is used to narrow the set of indexes to write, or to be explicit about the expected set of indexes present on all sources. Indexes that are specified but missing will be computed on demand. To prevent the generation of index files, provide an empty iterable.
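      The two methods differ only in how index groups are supplied: addIndexColumns adds one group of columns per call, while addAllIndexColumns adds many groups at once. The following is a hypothetical sketch of that bookkeeping, not the real builder's internals.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical bookkeeping for index-column groups: one group per
      // addIndexColumns(...) call, many groups per addAllIndexColumns(...) call.
      class IndexGroups {
          private final List<List<String>> groups = new ArrayList<>();

          IndexGroups addIndexColumns(String... indexColumns) {
              groups.add(List.of(indexColumns));
              return this;
          }

          IndexGroups addAllIndexColumns(Iterable<List<String>> indexColumns) {
              for (List<String> group : indexColumns) {
                  groups.add(List.copyOf(group));
              }
              return this;
          }

          List<List<String>> groups() {
              return List.copyOf(groups);
          }
      }
      ```

      Passing an empty iterable to addAllIndexColumns leaves the group list empty, which mirrors the documented way to prevent index file generation.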

    • setOnWriteCompleted

      public ParquetInstructions.Builder setOnWriteCompleted(ParquetInstructions.OnWriteCompleted onWriteCompleted)
      Adds a callback to be executed upon completing each parquet data file write (excluding index and metadata files).
    • build

      public ParquetInstructions build()
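      All of the setters above return the builder itself, so a typical use chains several configuration calls and finishes with build(). The stand-in below is NOT the real Deephaven class; it only mirrors the fluent shape using two method names from this page, with hypothetical defaults, to show the pattern. With the real API, the same chain applies to a ParquetInstructions.Builder obtained as described in the ParquetInstructions documentation.

      ```java
      import java.util.Objects;

      // Stand-in sketch of the fluent builder pattern used by
      // ParquetInstructions.Builder: setters return the builder,
      // build() returns the finished, immutable instructions object.
      final class Instructions {
          final String compressionCodecName;
          final int maximumDictionaryKeys;

          private Instructions(Builder b) {
              this.compressionCodecName = b.compressionCodecName;
              this.maximumDictionaryKeys = b.maximumDictionaryKeys;
          }

          static final class Builder {
              private String compressionCodecName = "SNAPPY"; // hypothetical default
              private int maximumDictionaryKeys = 1 << 20;    // hypothetical default

              Builder setCompressionCodecName(String compressionCodecName) {
                  this.compressionCodecName = Objects.requireNonNull(compressionCodecName);
                  return this;
              }

              Builder setMaximumDictionaryKeys(int maximumDictionaryKeys) {
                  if (maximumDictionaryKeys < 0) {
                      throw new IllegalArgumentException("maximumDictionaryKeys must be >= 0");
                  }
                  this.maximumDictionaryKeys = maximumDictionaryKeys;
                  return this;
              }

              Instructions build() {
                  return new Instructions(this);
              }
          }
      }
      ```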