Class ParquetInstructions.Builder

java.lang.Object
io.deephaven.parquet.table.ParquetInstructions.Builder
Enclosing class:
ParquetInstructions

public static class ParquetInstructions.Builder extends Object
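Typical usage follows the builder pattern: obtain a Builder, chain the setters documented below, and call build() to produce an immutable ParquetInstructions. A minimal sketch, assuming the ParquetInstructions.builder() static factory on the enclosing class (not documented on this page) as the way to obtain a Builder:

  import io.deephaven.parquet.table.ParquetInstructions;

  // Chain the setters documented below; build() returns the immutable instructions object.
  ParquetInstructions instructions = ParquetInstructions.builder()
          .addColumnNameMapping("part_col", "PartCol") // parquet column name -> Deephaven column name
          .setCompressionCodecName("SNAPPY")
          .build();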
  • Constructor Details

  • Method Details

    • addColumnNameMapping

      public ParquetInstructions.Builder addColumnNameMapping(String parquetColumnName, String columnName)
    • getTakenNames

      public Set<String> getTakenNames()
    • addColumnCodec

      public ParquetInstructions.Builder addColumnCodec(String columnName, String codecName)
    • addColumnCodec

      public ParquetInstructions.Builder addColumnCodec(String columnName, String codecName, String codecArgs)
    • useDictionary

      public ParquetInstructions.Builder useDictionary(String columnName, boolean useDictionary)
      Set a hint that the writer should use dictionary-based encoding for writing this column; never evaluated for non-String columns.
      Parameters:
      columnName - The column name
      useDictionary - The hint value
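      For example, a minimal sketch that hints dictionary encoding for a String column on write; ParquetTools.writeTable, the destination path, and the source table are illustrative assumptions, not part of this class:

      import io.deephaven.engine.table.Table;
      import io.deephaven.parquet.table.ParquetInstructions;
      import io.deephaven.parquet.table.ParquetTools;

      // Hint that the String column "Ticker" should be written with dictionary encoding.
      static void writeWithDictionaryHint(Table table) {
          ParquetInstructions instructions = ParquetInstructions.builder()
                  .useDictionary("Ticker", true)
                  .build();
          ParquetTools.writeTable(table, "/tmp/ticker.parquet", instructions);
      }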
    • setCompressionCodecName

      public ParquetInstructions.Builder setCompressionCodecName(String compressionCodecName)
    • setMaximumDictionaryKeys

      public ParquetInstructions.Builder setMaximumDictionaryKeys(int maximumDictionaryKeys)
      Set the maximum number of unique keys the writer should add to a dictionary page before switching to non-dictionary encoding; never evaluated for non-String columns, and ignored if useDictionary is set for the column.
      Parameters:
      maximumDictionaryKeys - The maximum number of dictionary keys; must be >= 0
    • setMaximumDictionarySize

      public ParquetInstructions.Builder setMaximumDictionarySize(int maximumDictionarySize)
      Set the maximum number of bytes the writer should add to the dictionary before switching to non-dictionary encoding; never evaluated for non-String columns, and ignored if useDictionary is set for the column.
      Parameters:
      maximumDictionarySize - The maximum size of the dictionary (in bytes); must be >= 0
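      A sketch combining this setter with setMaximumDictionaryKeys above; the threshold values are arbitrary examples, not library defaults:

      // Switch a String column to non-dictionary encoding once either threshold is exceeded
      // (unless useDictionary was set for that column, in which case both are ignored).
      ParquetInstructions instructions = ParquetInstructions.builder()
              .setMaximumDictionaryKeys(1 << 16)         // at most 65,536 unique keys per dictionary page
              .setMaximumDictionarySize(4 * 1024 * 1024) // at most 4 MiB of dictionary data
              .build();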
    • setIsLegacyParquet

      public ParquetInstructions.Builder setIsLegacyParquet(boolean isLegacyParquet)
    • setTargetPageSize

      public ParquetInstructions.Builder setTargetPageSize(int targetPageSize)
    • setIsRefreshing

      public ParquetInstructions.Builder setIsRefreshing(boolean isRefreshing)
    • setSpecialInstructions

      public ParquetInstructions.Builder setSpecialInstructions(Object specialInstructions)
    • setGenerateMetadataFiles

      public ParquetInstructions.Builder setGenerateMetadataFiles(boolean generateMetadataFiles)
      Set whether to generate "_metadata" and "_common_metadata" files while writing parquet files. When this parameter is set:
      • When writing a single parquet file, the metadata files are generated in the same parent directory as the parquet file.
      • When writing multiple parquet files in a single write call, all of the parquet files must be written to the same parent directory; the metadata files are then generated in that shared parent directory.
      • When writing key-value partitioned parquet data, the metadata files are generated in the root directory of the partitioned parquet files.
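      A minimal sketch that enables metadata file generation for a single-file write; ParquetTools.writeTable and the destination path are illustrative assumptions:

      import io.deephaven.engine.table.Table;
      import io.deephaven.parquet.table.ParquetInstructions;
      import io.deephaven.parquet.table.ParquetTools;

      // "_metadata" and "_common_metadata" are written next to out.parquet.
      static void writeWithMetadataFiles(Table table) {
          ParquetInstructions instructions = ParquetInstructions.builder()
                  .setGenerateMetadataFiles(true)
                  .build();
          ParquetTools.writeTable(table, "/data/out/out.parquet", instructions);
      }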
    • setBaseNameForPartitionedParquetData

      public ParquetInstructions.Builder setBaseNameForPartitionedParquetData(String baseNameForPartitionedParquetData)
      Set the base name for partitioned parquet data. This is used to generate file names for partitioned parquet files, and therefore only applies when writing partitioned parquet data. The following tokens in the base name will be replaced:
      • The token "{i}" will be replaced with an automatically incremented integer for files in a directory. For example, a base name of "table-{i}" will result in files named like "PC=partition1/table-0.parquet", "PC=partition1/table-1.parquet", etc., where PC is a partitioning column.
      • The token "{uuid}" will be replaced with a random UUID. For example, a base name of "table-{uuid}" will result in files named like "table-8e8ab6b2-62f2-40d1-8191-1c5b70c5f330.parquet".
      • The token "{partitions}" will be replaced with an underscore-delimited, concatenated string of partition values. For example, a base name of "{partitions}-table" will result in files like "PC1=partition1/PC2=partitionA/PC1=partition1_PC2=partitionA-table.parquet", where "PC1" and "PC2" are partitioning columns.
      The default value of this parameter is "{uuid}".
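      A sketch of the "{i}" token in use; the resulting instructions would then be passed to a partitioned-write call, which is outside the scope of this page:

      // With a partitioning column PC, files are named like PC=partition1/trades-0.parquet,
      // PC=partition1/trades-1.parquet, and so on.
      ParquetInstructions instructions = ParquetInstructions.builder()
              .setBaseNameForPartitionedParquetData("trades-{i}")
              .build();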
    • setFileLayout

      Set the expected file layout when reading a parquet file or directory. Providing this information allows the reader to skip the computation otherwise needed to deduce the file layout from the source directory structure.
    • setTableDefinition

      public ParquetInstructions.Builder setTableDefinition(TableDefinition tableDefinition)
      • When reading a parquet file, this corresponds to the table definition to use instead of the one implied by the parquet file being read. Providing a definition can save the additional computation needed to deduce the table definition from the parquet files, as well as from the directory layout when reading partitioned data.
      • When writing a parquet file, this corresponds to the table definition to use instead of the one implied by the table being written.
      This definition can be used to skip some columns or add additional columns with null values.
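      For example, a read sketch that narrows the columns to load; TableDefinition, ColumnDefinition, and ParquetTools.readTable come from the wider Deephaven API and are assumed here, as are the column names and path:

      import io.deephaven.engine.table.ColumnDefinition;
      import io.deephaven.engine.table.Table;
      import io.deephaven.engine.table.TableDefinition;
      import io.deephaven.parquet.table.ParquetInstructions;
      import io.deephaven.parquet.table.ParquetTools;

      // Read only two of the columns present in the file, using an explicit definition.
      TableDefinition definition = TableDefinition.of(
              ColumnDefinition.ofString("Ticker"),
              ColumnDefinition.ofDouble("Price"));
      ParquetInstructions instructions = ParquetInstructions.builder()
              .setTableDefinition(definition)
              .build();
      Table result = ParquetTools.readTable("/data/trades.parquet", instructions);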
    • addIndexColumns

      public ParquetInstructions.Builder addIndexColumns(String... indexColumns)
      Add a list of columns to persist together as indexes. The write operation will store the index info as sidecar tables. This argument is used to narrow the set of indexes to write, or to be explicit about the expected set of indexes present on all sources. Indexes that are specified but missing will be computed on demand.
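      For example, a sketch requesting one single-column index and one multi-column index; the column names are placeholders:

      // Each call adds one index: one on "Sym", and one on the ("Sym", "Exchange") pair.
      ParquetInstructions instructions = ParquetInstructions.builder()
              .addIndexColumns("Sym")
              .addIndexColumns("Sym", "Exchange")
              .build();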
    • addAllIndexColumns

      public ParquetInstructions.Builder addAllIndexColumns(Iterable<List<String>> indexColumns)
      Adds the provided lists of columns to persist together as indexes. This method accepts an Iterable of lists, where each list represents a group of columns to be indexed together.

      The write operation will store the index info as sidecar tables. This argument is used to narrow the set of indexes to write, or to be explicit about the expected set of indexes present on all sources. Indexes that are specified but missing will be computed on demand. To prevent the generation of index files, provide an empty iterable.
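      A sketch passing several index definitions at once, plus the empty iterable that disables index file generation; the column names are placeholders:

      import java.util.List;

      // Two indexes: one on "Sym", one on the ("Sym", "Exchange") pair.
      ParquetInstructions withIndexes = ParquetInstructions.builder()
              .addAllIndexColumns(List.of(List.of("Sym"), List.of("Sym", "Exchange")))
              .build();

      // An empty iterable prevents any index files from being written.
      ParquetInstructions noIndexes = ParquetInstructions.builder()
              .addAllIndexColumns(List.of())
              .build();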

    • build

      public ParquetInstructions build()