Class ParquetInstructions

java.lang.Object
io.deephaven.parquet.table.ParquetInstructions
All Implemented Interfaces:
ColumnToCodecMappings

public abstract class ParquetInstructions extends Object implements ColumnToCodecMappings
Instructions for Parquet read and write operations, which accept an instance as an optional argument, specifying the desired transformations. Examples include mapping column names and using specific codecs during (de)serialization.
  • Field Details

    • DEFAULT_COMPRESSION_CODEC_NAME

      public static final String DEFAULT_COMPRESSION_CODEC_NAME
    • DEFAULT_MAXIMUM_DICTIONARY_KEYS

      public static final int DEFAULT_MAXIMUM_DICTIONARY_KEYS
    • DEFAULT_MAXIMUM_DICTIONARY_SIZE

      public static final int DEFAULT_MAXIMUM_DICTIONARY_SIZE
    • MIN_TARGET_PAGE_SIZE

      public static final int MIN_TARGET_PAGE_SIZE
    • DEFAULT_TARGET_PAGE_SIZE

      public static final int DEFAULT_TARGET_PAGE_SIZE
    • EMPTY

      public static final ParquetInstructions EMPTY
  • Method Details

    • getColumnNameFromParquetColumnNameOrDefault

      public final String getColumnNameFromParquetColumnNameOrDefault(String parquetColumnName)
    • getParquetColumnNameFromColumnNameOrDefault

      public abstract String getParquetColumnNameFromColumnNameOrDefault(String columnName)
    • getColumnNameFromParquetColumnName

      public abstract String getColumnNameFromParquetColumnName(String parquetColumnName)
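The three name-mapping accessors above can be illustrated with a minimal sketch. It is a hypothetical stand-in written against the standard library only, not the Deephaven implementation. The class name `ColumnNameMappingSketch` and the `addMapping` helper are invented for illustration, and the behavior shown (a `null` result when no mapping exists, and the "OrDefault" variants falling back to the name passed in) is an assumption inferred from the method names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in (not the Deephaven class): a two-way column-name mapping. */
final class ColumnNameMappingSketch {
    private final Map<String, String> parquetToColumn = new HashMap<>();
    private final Map<String, String> columnToParquet = new HashMap<>();

    /** Illustrative helper: registers one mapping in both directions. */
    ColumnNameMappingSketch addMapping(String parquetColumnName, String columnName) {
        parquetToColumn.put(parquetColumnName, columnName);
        columnToParquet.put(columnName, parquetColumnName);
        return this;
    }

    /** Assumed behavior: returns the mapped table column name, or null if unmapped. */
    String getColumnNameFromParquetColumnName(String parquetColumnName) {
        return parquetToColumn.get(parquetColumnName);
    }

    /** Assumed behavior: falls back to the parquet column name itself if unmapped. */
    String getColumnNameFromParquetColumnNameOrDefault(String parquetColumnName) {
        String mapped = getColumnNameFromParquetColumnName(parquetColumnName);
        return mapped != null ? mapped : parquetColumnName;
    }

    /** Assumed behavior: falls back to the table column name itself if unmapped. */
    String getParquetColumnNameFromColumnNameOrDefault(String columnName) {
        return columnToParquet.getOrDefault(columnName, columnName);
    }

    public static void main(String[] args) {
        ColumnNameMappingSketch mappings =
                new ColumnNameMappingSketch().addMapping("user-id", "UserId");
        // A mapped name is translated; an unmapped name passes through unchanged.
        System.out.println(mappings.getColumnNameFromParquetColumnNameOrDefault("user-id"));
        System.out.println(mappings.getColumnNameFromParquetColumnNameOrDefault("price"));
    }
}
```

Under these assumptions, callers that need to detect a missing mapping would use the non-defaulting accessor, while the "OrDefault" variants suit code that only needs a usable name.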
    • getCodecName

      public abstract String getCodecName(String columnName)
      Specified by:
      getCodecName in interface ColumnToCodecMappings
    • getCodecArgs

      public abstract String getCodecArgs(String columnName)
      Specified by:
      getCodecArgs in interface ColumnToCodecMappings
    • useDictionary

      public abstract boolean useDictionary(String columnName)
      Returns:
      A hint that the writer should use dictionary-based encoding when writing this column. Never evaluated for non-String columns; defaults to false.
    • getFieldId

      public abstract OptionalInt getFieldId(String columnName)
      The field ID for the given columnName.
      Parameters:
      columnName - the Deephaven column name
      Returns:
      the field ID for the column
    • getSpecialInstructions

      public abstract Object getSpecialInstructions()
    • getCompressionCodecName

      public abstract String getCompressionCodecName()
    • getMaximumDictionaryKeys

      public abstract int getMaximumDictionaryKeys()
      Returns:
      The maximum number of unique keys the writer should add to a dictionary page before switching to non-dictionary encoding. Never evaluated for non-String columns; ignored if useDictionary(String) is set for the column.
    • getMaximumDictionarySize

      public abstract int getMaximumDictionarySize()
      Returns:
      The maximum number of bytes the writer should add to a dictionary before switching to non-dictionary encoding. Never evaluated for non-String columns; ignored if useDictionary(String) is set for the column.
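The two dictionary limits above describe a fallback rule: keep adding unique String values to a dictionary until either the key count or the byte size is exceeded, then switch to non-dictionary encoding. The sketch below illustrates that rule with the standard library only. It is not the Deephaven writer: the class and method names are hypothetical, and counting the UTF-8 length of each unique key as the dictionary size is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in (not the Deephaven writer): dictionary encoding with limits. */
final class DictionaryFallbackSketch {
    /**
     * Tries to build a dictionary (value -> key index) for non-null String values.
     * Returns Optional.empty() once either limit would be exceeded, which signals
     * that the caller should fall back to non-dictionary encoding.
     */
    static Optional<Map<String, Integer>> tryBuildDictionary(
            List<String> values, int maxKeys, int maxBytes) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        int dictionaryBytes = 0;
        for (String value : values) {
            if (dictionary.containsKey(value)) {
                continue; // repeated values reuse their existing key
            }
            // Assumption: dictionary size is the sum of the keys' UTF-8 lengths.
            dictionaryBytes += value.getBytes(StandardCharsets.UTF_8).length;
            if (dictionary.size() + 1 > maxKeys || dictionaryBytes > maxBytes) {
                return Optional.empty();
            }
            dictionary.put(value, dictionary.size());
        }
        return Optional.of(dictionary);
    }

    public static void main(String[] args) {
        List<String> column = List.of("a", "b", "a", "c");
        System.out.println(tryBuildDictionary(column, 3, 100).isPresent());
        System.out.println(tryBuildDictionary(column, 2, 100).isPresent());
    }
}
```

In this sketch a column with many repeated values stays within the limits and keeps its dictionary, while a high-cardinality column trips the key limit early and avoids building a dictionary that would not pay for itself.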
    • isLegacyParquet

      public abstract boolean isLegacyParquet()
    • getTargetPageSize

      public abstract int getTargetPageSize()
    • isRefreshing

      public abstract boolean isRefreshing()
      Returns:
      Whether the data source is refreshing
    • generateMetadataFiles

      public abstract boolean generateMetadataFiles()
      Returns:
      Whether to generate "_metadata" and "_common_metadata" files while writing parquet files
    • getFileLayout

      public abstract Optional<ParquetInstructions.ParquetFileLayout> getFileLayout()
    • getTableDefinition

      public abstract Optional<TableDefinition> getTableDefinition()
    • getIndexColumns

      public abstract Optional<Collection<List<String>>> getIndexColumns()
    • withTableDefinition

      public abstract ParquetInstructions withTableDefinition(TableDefinition tableDefinition)
      Creates a new ParquetInstructions object with the same properties as the current object, but with the definition set to the provided TableDefinition.
    • withLayout

      public abstract ParquetInstructions withLayout(ParquetInstructions.ParquetFileLayout fileLayout)
      Creates a new ParquetInstructions object with the same properties as the current object, but with the layout set to the provided ParquetInstructions.ParquetFileLayout.
    • withTableDefinitionAndLayout

      public abstract ParquetInstructions withTableDefinitionAndLayout(TableDefinition tableDefinition, ParquetInstructions.ParquetFileLayout fileLayout)
      Creates a new ParquetInstructions object with the same properties as the current object, but with the definition and layout set to the provided values.
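The three `with...` methods above follow a copy-on-modify pattern: each returns a new object and leaves the original unchanged. The sketch below shows that pattern on a hypothetical stand-in class, `InstructionsSketch`, using plain Strings in place of `TableDefinition` and `ParquetInstructions.ParquetFileLayout`. It is an illustration of the pattern only and makes no claim about how the Deephaven classes are implemented.

```java
import java.util.Optional;

/** Hypothetical stand-in (not the Deephaven class): immutable copy-on-modify. */
final class InstructionsSketch {
    private final String compressionCodecName;
    private final String tableDefinition; // stand-in for TableDefinition; may be null
    private final String fileLayout;      // stand-in for ParquetFileLayout; may be null

    InstructionsSketch(String compressionCodecName, String tableDefinition, String fileLayout) {
        this.compressionCodecName = compressionCodecName;
        this.tableDefinition = tableDefinition;
        this.fileLayout = fileLayout;
    }

    String getCompressionCodecName() {
        return compressionCodecName;
    }

    Optional<String> getTableDefinition() {
        return Optional.ofNullable(tableDefinition);
    }

    Optional<String> getFileLayout() {
        return Optional.ofNullable(fileLayout);
    }

    /** Copies every property, replacing only the definition. */
    InstructionsSketch withTableDefinition(String newDefinition) {
        return withTableDefinitionAndLayout(newDefinition, fileLayout);
    }

    /** Copies every property, replacing only the layout. */
    InstructionsSketch withLayout(String newLayout) {
        return withTableDefinitionAndLayout(tableDefinition, newLayout);
    }

    /** Copies every property, replacing both the definition and the layout. */
    InstructionsSketch withTableDefinitionAndLayout(String newDefinition, String newLayout) {
        return new InstructionsSketch(compressionCodecName, newDefinition, newLayout);
    }

    public static void main(String[] args) {
        InstructionsSketch base = new InstructionsSketch("SNAPPY", null, null);
        InstructionsSketch derived = base.withTableDefinition("myDefinition");
        // The original is untouched; the copy carries the new definition.
        System.out.println(base.getTableDefinition().isPresent());
        System.out.println(derived.getTableDefinition().isPresent());
    }
}
```

Because no instance is ever mutated, a shared instance such as a default can be reused safely as the starting point for derived configurations.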
    • baseNameForPartitionedParquetData

      public abstract String baseNameForPartitionedParquetData()
      Returns:
      The base name for partitioned parquet data. See setBaseNameForPartitionedParquetData for details on the tokens that can be used in the base name.
    • onWriteCompleted

      public abstract Optional<ParquetInstructions.OnWriteCompleted> onWriteCompleted()
      Returns:
      A callback to be executed on completion of each parquet data file write (excluding the index and metadata files). The callback is invoked by the writing thread, sequentially.
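The callback contract described above (one invocation per completed data file, made on the writing thread, with index and metadata files excluded) can be sketched as follows. This is a hypothetical stand-in using only the standard library: `WriteCallbackSketch`, `writeAll`, and the use of `Consumer<String>` with a file path argument are all assumptions for illustration, and the real `ParquetInstructions.OnWriteCompleted` type may carry different information.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical stand-in (not the Deephaven writer): per-file completion callback. */
final class WriteCallbackSketch {
    /**
     * Pretends to write each data file in order and, after each one, invokes the
     * optional callback on the calling (writing) thread.
     */
    static void writeAll(List<String> dataFilePaths, Optional<Consumer<String>> onWriteCompleted) {
        for (String path : dataFilePaths) {
            // A real writer would serialize table data to 'path' here.
            onWriteCompleted.ifPresent(callback -> callback.accept(path));
        }
        // Index and metadata files would be written here without invoking the callback.
    }

    public static void main(String[] args) {
        List<String> completed = new ArrayList<>();
        writeAll(List.of("part-0.parquet", "part-1.parquet"), Optional.of(completed::add));
        System.out.println(completed);
    }
}
```

Since the callback runs on the writing thread in this sketch, a slow callback delays the next file write, so long-running work is better handed off to another thread.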
    • sameColumnNamesAndCodecMappings

      @VisibleForTesting public static boolean sameColumnNamesAndCodecMappings(ParquetInstructions i1, ParquetInstructions i2)
    • builder

      public static ParquetInstructions.Builder builder()