Package io.deephaven.parquet.table
Class ParquetInstructions.Builder

java.lang.Object
    io.deephaven.parquet.table.ParquetInstructions.Builder

Enclosing class:
ParquetInstructions
Constructor Summary

Builder()
For each additional field added, make sure to update the copy constructor Builder(ParquetInstructions).

Builder(ParquetInstructions parquetInstructions)
Method Summary
Modifier and TypeMethodDescriptionaddAllIndexColumns
(Iterable<List<String>> indexColumns) Adds provided lists of columns to persist together as indexes.addColumnCodec
(String columnName, String codecName) addColumnCodec
(String columnName, String codecName, String codecArgs) addColumnNameMapping
(String parquetColumnName, String columnName) addIndexColumns
(String... indexColumns) Add a list of columns to persist together as indexes.build()
setBaseNameForPartitionedParquetData
(String baseNameForPartitionedParquetData) Set the base name for partitioned parquet data.setCompressionCodecName
(String compressionCodecName) setFieldId
(String columnName, int fieldId) This is currently only used for writing, allowing the setting offield_id
in the proper ParquetSchemaElement
.setFileLayout
(ParquetInstructions.ParquetFileLayout fileLayout) Set the expected file layout when reading a parquet file or a directory.setGenerateMetadataFiles
(boolean generateMetadataFiles) Set whether to generate "_metadata" and "_common_metadata" files while writing parquet files.setIsLegacyParquet
(boolean isLegacyParquet) setIsRefreshing
(boolean isRefreshing) setMaximumDictionaryKeys
(int maximumDictionaryKeys) Set the maximum number of unique keys the writer should add to a dictionary page before switching to non-dictionary encoding; never evaluated for non-String columns, ignored ifuse dictionary
is set for the column.setMaximumDictionarySize
(int maximumDictionarySize) Set the maximum number of bytes the writer should add to the dictionary before switching to non-dictionary encoding; never evaluated for non-String columns, ignored ifuse dictionary
is set for the column.setOnWriteCompleted
(ParquetInstructions.OnWriteCompleted onWriteCompleted) Adds a callback to be executed when on completing each parquet data file write (excluding the index and metadata files).setSpecialInstructions
(Object specialInstructions) setTableDefinition
(TableDefinition tableDefinition) When reading a parquet file, this corresponds to the table definition to use instead of the one implied by the parquet file being read.setTargetPageSize
(int targetPageSize) useDictionary
(String columnName, boolean useDictionary) Set a hint that the writer should use dictionary-based encoding for writing this column; never evaluated for non-String columns.
Constructor Details

Builder
public Builder()
For each additional field added, make sure to update the copy constructor Builder(ParquetInstructions).

Builder
public Builder(ParquetInstructions parquetInstructions)
Method Details

addColumnNameMapping
public ParquetInstructions.Builder addColumnNameMapping(String parquetColumnName, String columnName)

getTakenNames

addColumnCodec
public ParquetInstructions.Builder addColumnCodec(String columnName, String codecName)

addColumnCodec
public ParquetInstructions.Builder addColumnCodec(String columnName, String codecName, String codecArgs)
useDictionary
public ParquetInstructions.Builder useDictionary(String columnName, boolean useDictionary)
Set a hint that the writer should use dictionary-based encoding for writing this column; never evaluated for non-String columns.
Parameters:
columnName - The column name
useDictionary - The hint value
setFieldId
public ParquetInstructions.Builder setFieldId(String columnName, int fieldId)
This is currently only used for writing, allowing the setting of field_id in the proper parquet SchemaElement. Setting multiple field ids for a single column name is not allowed. Field ids are not typically configured by end users.
Parameters:
columnName - the Deephaven column name
fieldId - the field id
setCompressionCodecName
public ParquetInstructions.Builder setCompressionCodecName(String compressionCodecName)
setMaximumDictionaryKeys
public ParquetInstructions.Builder setMaximumDictionaryKeys(int maximumDictionaryKeys)
Set the maximum number of unique keys the writer should add to a dictionary page before switching to non-dictionary encoding; never evaluated for non-String columns, ignored if useDictionary is set for the column.
Parameters:
maximumDictionaryKeys - The maximum number of dictionary keys; must be >= 0
setMaximumDictionarySize
public ParquetInstructions.Builder setMaximumDictionarySize(int maximumDictionarySize)
Set the maximum number of bytes the writer should add to the dictionary before switching to non-dictionary encoding; never evaluated for non-String columns, ignored if useDictionary is set for the column.
Parameters:
maximumDictionarySize - The maximum size of dictionary (in bytes); must be >= 0
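The two dictionary limits act as an either/or cutoff: encoding falls back to non-dictionary once either the unique-key count or the accumulated dictionary size exceeds its maximum. A minimal stand-alone sketch of that rule (an illustration of the documented behavior, not Deephaven's actual writer code):

```java
// Illustration of the documented cutoff rule, not Deephaven's writer code:
// dictionary encoding is abandoned once EITHER the unique-key count or the
// accumulated dictionary size exceeds its configured maximum.
public class DictionaryFallback {
    private final int maximumDictionaryKeys;
    private final int maximumDictionarySize;
    private int keys;
    private int sizeInBytes;

    public DictionaryFallback(int maximumDictionaryKeys, int maximumDictionarySize) {
        this.maximumDictionaryKeys = maximumDictionaryKeys;
        this.maximumDictionarySize = maximumDictionarySize;
    }

    // Record one new unique key of the given encoded size; returns true while
    // dictionary encoding is still within both limits.
    public boolean addUniqueKey(int encodedSizeInBytes) {
        keys++;
        sizeInBytes += encodedSizeInBytes;
        return keys <= maximumDictionaryKeys && sizeInBytes <= maximumDictionarySize;
    }

    public static void main(String[] args) {
        DictionaryFallback state = new DictionaryFallback(2, 100);
        System.out.println(state.addUniqueKey(10)); // true: 1 key, 10 bytes
        System.out.println(state.addUniqueKey(10)); // true: 2 keys, 20 bytes
        System.out.println(state.addUniqueKey(10)); // false: 3 keys exceeds the key limit
    }
}
```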
setIsLegacyParquet
public ParquetInstructions.Builder setIsLegacyParquet(boolean isLegacyParquet)

setTargetPageSize
public ParquetInstructions.Builder setTargetPageSize(int targetPageSize)

setIsRefreshing
public ParquetInstructions.Builder setIsRefreshing(boolean isRefreshing)

setSpecialInstructions
public ParquetInstructions.Builder setSpecialInstructions(Object specialInstructions)
setGenerateMetadataFiles
public ParquetInstructions.Builder setGenerateMetadataFiles(boolean generateMetadataFiles)
Set whether to generate "_metadata" and "_common_metadata" files while writing parquet files. When this parameter is set:
- When writing a single parquet file, metadata files will be generated in the same parent directory as the parquet file.
- When writing multiple parquet files in a single write call, all parquet files must be written to the same parent directory, and the metadata files will be generated in that parent directory.
- When writing key-value partitioned parquet data, metadata files are generated in the root directory of the partitioned parquet files.
setBaseNameForPartitionedParquetData
public ParquetInstructions.Builder setBaseNameForPartitionedParquetData(String baseNameForPartitionedParquetData)
Set the base name for partitioned parquet data. This is used to generate the file names for partitioned parquet files, and therefore this parameter is only used when writing partitioned parquet data. Users can provide the following tokens to be replaced in the base name:
- The token "{i}" will be replaced with an automatically incremented integer for files in a directory. For example, a base name of "table-{i}" will result in files named like "PC=partition1/table-0.parquet", "PC=partition1/table-1.parquet", etc., where PC is a partitioning column.
- The token "{uuid}" will be replaced with a random UUID. For example, a base name of "table-{uuid}" will result in files named like "table-8e8ab6b2-62f2-40d1-8191-1c5b70c5f330.parquet".
- The token "{partitions}" will be replaced with an underscore-delimited, concatenated string of partition values. For example, a base name of "{partitions}-table" will result in files like "PC1=partition1/PC2=partitionA/PC1=partition1_PC2=partitionA-table.parquet", where "PC1" and "PC2" are partitioning columns.
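The token substitution described above can be sketched in plain Java. Note that `expand` is a hypothetical helper written for illustration, not part of the Deephaven API, and the real writer's naming logic may differ in detail:

```java
import java.util.List;
import java.util.UUID;

// Hypothetical helper illustrating the documented token substitution; the
// actual file-naming logic lives inside Deephaven's writer.
public class BaseNameTokens {
    // 'index' is the per-directory file counter used for "{i}"; 'partitions'
    // holds the "PC=value" strings for this file's partitioning columns.
    public static String expand(String baseName, int index, List<String> partitions) {
        return baseName
                .replace("{i}", Integer.toString(index))
                .replace("{uuid}", UUID.randomUUID().toString())
                .replace("{partitions}", String.join("_", partitions));
    }

    public static void main(String[] args) {
        // "table-{i}" -> table-0.parquet, table-1.parquet, ... per directory
        System.out.println(expand("table-{i}", 0, List.of()) + ".parquet");
        // "{partitions}-table" -> PC1=partition1_PC2=partitionA-table.parquet
        System.out.println(expand("{partitions}-table", 0,
                List.of("PC1=partition1", "PC2=partitionA")) + ".parquet");
    }
}
```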
setFileLayout
public ParquetInstructions.Builder setFileLayout(ParquetInstructions.ParquetFileLayout fileLayout)
Set the expected file layout when reading a parquet file or a directory. This information can be used to skip the computations needed to deduce the file layout from the source directory structure.
setTableDefinition
public ParquetInstructions.Builder setTableDefinition(TableDefinition tableDefinition)
- When reading a parquet file, this corresponds to the table definition to use instead of the one implied by the parquet file being read. Providing a definition can save additional computations needed to deduce the table definition from the parquet files as well as from the directory layouts when reading partitioned data.
- When writing a parquet file, this corresponds to the table definition to use instead of the one implied by the table being written. This definition can be used to skip some columns or add additional columns with null values.
addIndexColumns
public ParquetInstructions.Builder addIndexColumns(String... indexColumns)
Add a list of columns to persist together as indexes. The write operation will store the index info as sidecar tables. This argument is used to narrow the set of indexes to write, or to be explicit about the expected set of indexes present on all sources. Indexes that are specified but missing will be computed on demand.
addAllIndexColumns
public ParquetInstructions.Builder addAllIndexColumns(Iterable<List<String>> indexColumns)
Adds provided lists of columns to persist together as indexes. This method accepts an Iterable of lists, where each list represents a group of columns to be indexed together. The write operation will store the index info as sidecar tables. This argument is used to narrow the set of indexes to write, or to be explicit about the expected set of indexes present on all sources. Indexes that are specified but missing will be computed on demand. To prevent the generation of index files, provide an empty iterable.
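The Iterable<List<String>> argument is simply a collection of column-name groups, one inner list per index. A minimal sketch of assembling it (the column names "Ticker" and "Exchange" are hypothetical, chosen for illustration):

```java
import java.util.List;

// Hypothetical example of building the Iterable<List<String>> argument for
// addAllIndexColumns; the column names are made up for illustration.
public class IndexColumnGroups {
    public static List<List<String>> exampleGroups() {
        return List.of(
                List.of("Ticker"),              // one single-column index
                List.of("Ticker", "Exchange")); // one multi-column index
    }

    public static void main(String[] args) {
        System.out.println(exampleGroups());
        // Passing an empty iterable, List.of(), suppresses index file generation.
    }
}
```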
setOnWriteCompleted
public ParquetInstructions.Builder setOnWriteCompleted(ParquetInstructions.OnWriteCompleted onWriteCompleted)
Adds a callback to be executed on completing each parquet data file write (excluding the index and metadata files).
build
public ParquetInstructions build()