Package io.deephaven.parquet.table
Class ParquetInstructions
java.lang.Object
io.deephaven.parquet.table.ParquetInstructions
- All Implemented Interfaces:
ColumnToCodecMappings
This class provides instructions for Parquet read and write operations, which take it as an optional
argument, and specifies the desired transformations. Examples include mapping column names and using specific
codecs during (de)serialization.
-
Nested Class Summary
Modifier and Type, Class, Description
static class ParquetInstructions.Builder
static enum ParquetInstructions.ParquetFileLayout
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
abstract String baseNameForPartitionedParquetData()
static ParquetInstructions.Builder builder()
abstract boolean generateMetadataFiles()
abstract String getCodecArgs(String columnName)
abstract String getCodecName(String columnName)
abstract String getColumnNameFromParquetColumnName(String parquetColumnName)
final String getColumnNameFromParquetColumnNameOrDefault(String parquetColumnName)
abstract String getCompressionCodecName()
static String getDefaultCompressionCodecName()
static int getDefaultMaximumDictionaryKeys()
static int getDefaultMaximumDictionarySize()
static int getDefaultTargetPageSize()
  Get the current default target page size in bytes.
abstract Optional<Collection<List<String>>> getIndexColumns()
abstract int getMaximumDictionaryKeys()
abstract int getMaximumDictionarySize()
abstract String getParquetColumnNameFromColumnNameOrDefault(String columnName)
abstract Object getSpecialInstructions()
abstract Optional<TableDefinition> getTableDefinition()
abstract int getTargetPageSize()
abstract boolean isLegacyParquet()
abstract boolean isRefreshing()
static boolean sameColumnNamesAndCodecMappings(ParquetInstructions i1, ParquetInstructions i2)
static void setDefaultCompressionCodecName(String name)
  Deprecated.
static void setDefaultMaximumDictionaryKeys(int maximumDictionaryKeys)
  Set the default for getMaximumDictionaryKeys().
static void setDefaultMaximumDictionarySize(int maximumDictionarySize)
  Set the default for getMaximumDictionarySize().
static void setDefaultTargetPageSize(int newDefaultSizeBytes)
  Set the default target page size (in bytes) used to section rows of data into pages during column writing.
abstract boolean useDictionary(String columnName)
abstract ParquetInstructions withLayout(ParquetInstructions.ParquetFileLayout fileLayout)
  Creates a new ParquetInstructions object with the same properties as the current object but with the layout set to the provided ParquetInstructions.ParquetFileLayout.
abstract ParquetInstructions withTableDefinition(TableDefinition tableDefinition)
  Creates a new ParquetInstructions object with the same properties as the current object but with the definition set to the provided TableDefinition.
abstract ParquetInstructions withTableDefinitionAndLayout(TableDefinition tableDefinition, ParquetInstructions.ParquetFileLayout fileLayout)
  Creates a new ParquetInstructions object with the same properties as the current object but with the definition and layout set to the provided values.
-
Field Details
-
MIN_TARGET_PAGE_SIZE
public static final int MIN_TARGET_PAGE_SIZE
-
EMPTY
-
-
Constructor Details
-
ParquetInstructions
public ParquetInstructions()
-
-
Method Details
-
setDefaultCompressionCodecName
Deprecated.
Set the default for getCompressionCodecName().
- Parameters:
name - The new default
-
getDefaultCompressionCodecName
- Returns:
- The default for getCompressionCodecName()
-
setDefaultMaximumDictionaryKeys
public static void setDefaultMaximumDictionaryKeys(int maximumDictionaryKeys)
Set the default for getMaximumDictionaryKeys().
- Parameters:
maximumDictionaryKeys - The new default
- See Also:
-
getDefaultMaximumDictionaryKeys
public static int getDefaultMaximumDictionaryKeys()
- Returns:
- The default for getMaximumDictionaryKeys()
-
setDefaultMaximumDictionarySize
public static void setDefaultMaximumDictionarySize(int maximumDictionarySize)
Set the default for getMaximumDictionarySize().
- Parameters:
maximumDictionarySize - The new default
- See Also:
-
getDefaultMaximumDictionarySize
public static int getDefaultMaximumDictionarySize()
- Returns:
- The default for getMaximumDictionarySize()
-
setDefaultTargetPageSize
public static void setDefaultTargetPageSize(int newDefaultSizeBytes)
Set the default target page size (in bytes) used to section rows of data into pages during column writing. This number should be no smaller than MIN_TARGET_PAGE_SIZE.
- Parameters:
newDefaultSizeBytes - the new default target page size.
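The lower-bound rule above can be pictured with a small, self-contained sketch. It is not the Deephaven implementation: the class name, the placeholder value of MIN_TARGET_PAGE_SIZE, the placeholder initial default, and the choice to throw IllegalArgumentException on a too-small value are all assumptions made for illustration, since this page states neither the constant's value nor the behavior on violation.

```java
// Hypothetical sketch of a validated process-wide default; not Deephaven code.
final class TargetPageSizeSketch {
    // Placeholder lower bound; the real MIN_TARGET_PAGE_SIZE value is not given on this page.
    static final int MIN_TARGET_PAGE_SIZE = 1024;

    // Placeholder initial default, chosen only for the example.
    private static volatile int defaultTargetPageSize = 64 * 1024;

    // Rejects values below the lower bound (assumed behavior), otherwise stores the new default.
    static void setDefaultTargetPageSize(int newDefaultSizeBytes) {
        if (newDefaultSizeBytes < MIN_TARGET_PAGE_SIZE) {
            throw new IllegalArgumentException(
                    "Target page size must be at least " + MIN_TARGET_PAGE_SIZE
                            + " bytes, got " + newDefaultSizeBytes);
        }
        defaultTargetPageSize = newDefaultSizeBytes;
    }

    static int getDefaultTargetPageSize() {
        return defaultTargetPageSize;
    }

    public static void main(String[] args) {
        setDefaultTargetPageSize(2048);
        System.out.println("default target page size: " + getDefaultTargetPageSize());
        try {
            setDefaultTargetPageSize(MIN_TARGET_PAGE_SIZE - 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating in the setter keeps a bad value from silently affecting every later write that relies on the default.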
-
getDefaultTargetPageSize
public static int getDefaultTargetPageSize()
Get the current default target page size in bytes.
- Returns:
- the current default target page size in bytes.
-
getColumnNameFromParquetColumnNameOrDefault
-
getParquetColumnNameFromColumnNameOrDefault
-
getColumnNameFromParquetColumnName
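The three name-mapping accessors listed here carry no description on this page, so the following is only a sketch of how such a two-way mapping might behave. The class, the addMapping helper, the null return for an unmapped name, and the "OrDefault" fallback to the input name are all assumptions for illustration, not documented Deephaven behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a column-name mapping; not the Deephaven implementation.
final class ColumnNameMappingSketch {
    private final Map<String, String> parquetToColumn = new HashMap<>();
    private final Map<String, String> columnToParquet = new HashMap<>();

    // Hypothetical helper: registers a two-way mapping between the two names.
    void addMapping(String parquetColumnName, String columnName) {
        parquetToColumn.put(parquetColumnName, columnName);
        columnToParquet.put(columnName, parquetColumnName);
    }

    // Returns the mapped table column name, or null when no mapping exists (assumed).
    String getColumnNameFromParquetColumnName(String parquetColumnName) {
        return parquetToColumn.get(parquetColumnName);
    }

    // Assumed "OrDefault" behavior: fall back to the input name when unmapped.
    String getColumnNameFromParquetColumnNameOrDefault(String parquetColumnName) {
        return parquetToColumn.getOrDefault(parquetColumnName, parquetColumnName);
    }

    // Reverse direction, with the same assumed fallback.
    String getParquetColumnNameFromColumnNameOrDefault(String columnName) {
        return columnToParquet.getOrDefault(columnName, columnName);
    }

    public static void main(String[] args) {
        ColumnNameMappingSketch mapping = new ColumnNameMappingSketch();
        mapping.addMapping("user_id", "UserId");
        System.out.println(mapping.getColumnNameFromParquetColumnNameOrDefault("user_id"));
        System.out.println(mapping.getColumnNameFromParquetColumnNameOrDefault("price"));
        System.out.println(mapping.getParquetColumnNameFromColumnNameOrDefault("UserId"));
    }
}
```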
-
getCodecName
- Specified by:
- getCodecName in interface ColumnToCodecMappings
-
getCodecArgs
- Specified by:
- getCodecArgs in interface ColumnToCodecMappings
-
useDictionary
- Returns:
- A hint that the writer should use dictionary-based encoding for writing this column; never evaluated for non-String columns, defaults to false
-
getSpecialInstructions
-
getCompressionCodecName
-
getMaximumDictionaryKeys
public abstract int getMaximumDictionaryKeys()
- Returns:
- The maximum number of unique keys the writer should add to a dictionary page before switching to
non-dictionary encoding; never evaluated for non-String columns, ignored if useDictionary(String)
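As a rough illustration of the key-count limit described for getMaximumDictionaryKeys(), the sketch below tries to dictionary-encode a list of strings and gives up once the number of unique keys would exceed a cap. The class and method names are hypothetical, and returning an empty Optional to signal "fall back to non-dictionary encoding" is an assumption of this example, not the way the Deephaven writer is implemented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of dictionary encoding with a key-count cap; not Deephaven code.
final class DictionaryFallbackSketch {
    // Returns one dictionary index per value, or empty when the number of unique
    // keys would exceed maximumDictionaryKeys (the caller would then fall back to
    // a non-dictionary encoding).
    static Optional<int[]> tryDictionaryEncode(List<String> values, int maximumDictionaryKeys) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            Integer key = dictionary.get(value);
            if (key == null) {
                if (dictionary.size() >= maximumDictionaryKeys) {
                    return Optional.empty(); // cap reached: abandon dictionary encoding
                }
                key = dictionary.size();
                dictionary.put(value, key);
            }
            indices[i] = key;
        }
        return Optional.of(indices);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "a", "c");
        System.out.println(tryDictionaryEncode(values, 3).map(Arrays::toString).orElse("fallback"));
        System.out.println(tryDictionaryEncode(values, 2).map(Arrays::toString).orElse("fallback"));
    }
}
```

A byte-size cap such as the one described for getMaximumDictionarySize() could be sketched the same way by tracking the accumulated encoded length of the keys instead of their count.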
-
getMaximumDictionarySize
public abstract int getMaximumDictionarySize()
- Returns:
- The maximum number of bytes the writer should add to a dictionary before switching to non-dictionary
encoding; never evaluated for non-String columns, ignored if useDictionary(String)
-
isLegacyParquet
public abstract boolean isLegacyParquet()
-
getTargetPageSize
public abstract int getTargetPageSize()
-
isRefreshing
public abstract boolean isRefreshing()
- Returns:
- whether the data source is refreshing
-
generateMetadataFiles
public abstract boolean generateMetadataFiles()
- Returns:
- whether to generate "_metadata" and "_common_metadata" files when writing Parquet files
-
getFileLayout
-
getTableDefinition
-
getIndexColumns
-
withTableDefinition
Creates a new ParquetInstructions object with the same properties as the current object but with the definition set to the provided TableDefinition.
-
withLayout
Creates a new ParquetInstructions object with the same properties as the current object but with the layout set to the provided ParquetInstructions.ParquetFileLayout.
-
withTableDefinitionAndLayout
public abstract ParquetInstructions withTableDefinitionAndLayout(TableDefinition tableDefinition, ParquetInstructions.ParquetFileLayout fileLayout)
Creates a new ParquetInstructions object with the same properties as the current object but with the definition and layout set to the provided values.
-
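The three "with" methods above describe a copy-on-modify pattern: the receiver is left untouched and a new object is returned. The sketch below shows that pattern on a hypothetical stand-in class. A String stands in for TableDefinition, the Layout constants are placeholders (the real ParquetFileLayout values are not listed on this page), and none of this is the actual Deephaven implementation.

```java
import java.util.Optional;

// Hypothetical stand-in illustrating copy-on-modify "with" methods; not Deephaven code.
final class InstructionsCopySketch {
    // Placeholder layout constants, invented for this example.
    enum Layout { LAYOUT_A, LAYOUT_B }

    private final String compressionCodecName;
    private final String tableDefinition; // stand-in for TableDefinition; null when unset
    private final Layout fileLayout;      // null when unset

    InstructionsCopySketch(String compressionCodecName, String tableDefinition, Layout fileLayout) {
        this.compressionCodecName = compressionCodecName;
        this.tableDefinition = tableDefinition;
        this.fileLayout = fileLayout;
    }

    String getCompressionCodecName() {
        return compressionCodecName;
    }

    Optional<String> getTableDefinition() {
        return Optional.ofNullable(tableDefinition);
    }

    Optional<Layout> getFileLayout() {
        return Optional.ofNullable(fileLayout);
    }

    // Each "with" method copies every other property and never mutates this object.
    InstructionsCopySketch withTableDefinition(String newDefinition) {
        return withTableDefinitionAndLayout(newDefinition, fileLayout);
    }

    InstructionsCopySketch withLayout(Layout newLayout) {
        return withTableDefinitionAndLayout(tableDefinition, newLayout);
    }

    InstructionsCopySketch withTableDefinitionAndLayout(String newDefinition, Layout newLayout) {
        return new InstructionsCopySketch(compressionCodecName, newDefinition, newLayout);
    }

    public static void main(String[] args) {
        InstructionsCopySketch base = new InstructionsCopySketch("EXAMPLE_CODEC", null, null);
        InstructionsCopySketch derived = base.withTableDefinitionAndLayout("exampleDefinition", Layout.LAYOUT_A);
        System.out.println("base layout present: " + base.getFileLayout().isPresent());
        System.out.println("derived layout: " + derived.getFileLayout().orElse(null));
        System.out.println("codec preserved: " + derived.getCompressionCodecName());
    }
}
```

Because the original object is never modified, one base set of instructions can be shared safely and specialized per read or write call.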
baseNameForPartitionedParquetData
- Returns:
- the base name for partitioned parquet data. See setBaseNameForPartitionedParquetData for more details about the different tokens that can be used in the base name.
-
sameColumnNamesAndCodecMappings
@VisibleForTesting public static boolean sameColumnNamesAndCodecMappings(ParquetInstructions i1, ParquetInstructions i2)
-
builder
-
Deprecation note for setDefaultCompressionCodecName: use ParquetInstructions.Builder.setCompressionCodecName(String) instead.