Class ParquetTableLocationKey

All Implemented Interfaces:
LogOutputAppendable, ImmutableTableLocationKey, TableLocationKey, NamedImplementation, Comparable<TableLocationKey>

public class ParquetTableLocationKey extends URITableLocationKey
TableLocationKey implementation for use with data stored in the parquet format.
  • Constructor Details

    • ParquetTableLocationKey

      public ParquetTableLocationKey(@NotNull File file, int order, @Nullable Map<String,Comparable<?>> partitions, @NotNull ParquetInstructions readInstructions)
      Construct a new ParquetTableLocationKey for the supplied file and partitions.
      Parameters:
      file - The parquet file that backs the keyed location. Will be adjusted to an absolute path.
      order - Explicit ordering index, taking precedence over other fields
      partitions - The table partitions enclosing the table location keyed by this. Note that if this parameter is null, the location will be a member of no partitions. An ordered copy of the map will be made, so the calling code is free to mutate the map after this call.
      readInstructions - the instructions for customizations while reading
    • ParquetTableLocationKey

      public ParquetTableLocationKey(@NotNull URI parquetFileUri, int order, @Nullable Map<String,Comparable<?>> partitions, @NotNull ParquetInstructions readInstructions)
      Construct a new ParquetTableLocationKey for the supplied parquetFileUri and partitions.
      Parameters:
      parquetFileUri - The URI of the parquet file that backs the keyed location. Will be adjusted to an absolute path.
      order - Explicit ordering index, taking precedence over other fields
      partitions - The table partitions enclosing the table location keyed by this. Note that if this parameter is null, the location will be a member of no partitions. An ordered copy of the map will be made, so the calling code is free to mutate the map after this call.
      readInstructions - the instructions for customizations while reading
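      The copy semantics described for the partitions parameter of both constructors can be illustrated with a minimal sketch. PartitionedKeySketch is a hypothetical stand-in, not the Deephaven class, and modeling the "ordered copy" as an insertion-ordered LinkedHashMap is an assumption made for illustration only.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not the Deephaven class: it shows only the copy-on-construct behavior.
final class PartitionedKeySketch {
    private final Map<String, Comparable<?>> partitions;

    PartitionedKeySketch(Map<String, Comparable<?>> partitions) {
        if (partitions == null) {
            // A null argument means the location is a member of no partitions.
            this.partitions = Collections.emptyMap();
        } else {
            // Assumption: the "ordered copy" is modeled as an insertion-ordered LinkedHashMap.
            Map<String, Comparable<?>> copy = new LinkedHashMap<>(partitions);
            this.partitions = Collections.unmodifiableMap(copy);
        }
    }

    Map<String, Comparable<?>> getPartitions() {
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, Comparable<?>> partitions = new LinkedHashMap<>();
        partitions.put("Date", "2024-01-01");
        PartitionedKeySketch key = new PartitionedKeySketch(partitions);
        partitions.put("Region", "EU"); // the caller mutates its own map after construction
        System.out.println(key.getPartitions().keySet()); // prints [Date]
    }
}
```

      Because the key holds its own copy, later changes to the caller's map cannot alter the partitions the key reports.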
  • Method Details

    • getImplementationName

      public String getImplementationName()
      Description copied from interface: NamedImplementation

      Get a name for the implementing class. Useful for abstract classes that implement LogOutputAppendable or override toString.

      The default implementation is correct, but not suitable for high-frequency usage.

      Specified by:
      getImplementationName in interface NamedImplementation
      Overrides:
      getImplementationName in class URITableLocationKey
      Returns:
      A name for the implementing class
    • verifyFileReader

      public boolean verifyFileReader()
      Returns true if a previous ParquetFileReader has been created, or if one was successfully created on-demand.

      When false, this may mean that the file:

      1. does not exist, or is otherwise inaccessible
      2. is in the process of being written, and is not yet a valid parquet file
      3. is not a parquet file
      4. is a corrupt parquet file
      Callers wishing to handle these cases more explicitly may call ParquetTools.getParquetFileReaderChecked(URI, ParquetInstructions).
      Returns:
      true if the file reader exists or was successfully created
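      As a rough illustration of how a caller might tell some of the failure cases above apart for a local file, the sketch below checks for the 4-byte "PAR1" magic that the Parquet format places at the start and end of a file. ParquetMagicCheck and its Result enum are hypothetical names, and this is not how verifyFileReader() or ParquetTools is implemented. It does not parse the footer, so a corrupt file can still pass, and it recognizes only the plain "PAR1" magic.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Deephaven: a cheap pre-check on a local file.
final class ParquetMagicCheck {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    enum Result { OK, INACCESSIBLE, TOO_SHORT, BAD_HEADER, BAD_FOOTER }

    static Result check(Path path) {
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            return Result.INACCESSIBLE; // case 1: missing or otherwise inaccessible
        }
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
            long length = file.length();
            // Smallest plausible layout: header magic, 4-byte footer length, footer magic.
            if (length < 12) {
                return Result.TOO_SHORT;
            }
            byte[] head = new byte[4];
            byte[] tail = new byte[4];
            file.readFully(head);
            file.seek(length - 4);
            file.readFully(tail);
            if (!Arrays.equals(head, MAGIC)) {
                return Result.BAD_HEADER; // likely case 3: not a parquet file
            }
            if (!Arrays.equals(tail, MAGIC)) {
                return Result.BAD_FOOTER; // possibly case 2 (still being written) or case 4
            }
            return Result.OK; // magic bytes present; the footer itself is not validated
        } catch (IOException e) {
            return Result.INACCESSIBLE;
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + ": " + check(Paths.get(arg)));
        }
    }
}
```

      A check like this only narrows down the cause. For a definitive answer, the checked ParquetTools method named above remains the documented route.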
    • getFileReader

      public ParquetFileReader getFileReader()
      Get a previously-set or on-demand created ParquetFileReader for this location key's file.
      Returns:
      A ParquetFileReader for this location key's file.
    • setFileReader

      public void setFileReader(ParquetFileReader fileReader)
      Set the ParquetFileReader that will be returned by getFileReader(). Pass null to force on-demand construction at the next invocation. Always clears cached ParquetMetadata and RowGroup indices.
      Parameters:
      fileReader - The new ParquetFileReader
    • getMetadata

      public org.apache.parquet.hadoop.metadata.ParquetMetadata getMetadata()
      Get a previously-set or on-demand created ParquetMetadata for this location key's file.
      Returns:
      A ParquetMetadata for this location key's file.
    • setMetadata

      public void setMetadata(org.apache.parquet.hadoop.metadata.ParquetMetadata metadata)
      Set the ParquetMetadata that will be returned by getMetadata(). Pass null to force on-demand construction at the next invocation.
      Parameters:
      metadata - The new ParquetMetadata
    • getRowGroupIndices

      public int[] getRowGroupIndices()
      Get previously-set or on-demand created RowGroup indices for this location key's current ParquetFileReader.
      Returns:
      RowGroup indices for this location key's current ParquetFileReader.
    • setRowGroupIndices

      public void setRowGroupIndices(int[] rowGroupIndices)
      Set the RowGroup indices that will be returned by getRowGroupIndices().
      Parameters:
      rowGroupIndices - The new RowGroup indices
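      The getter and setter pairs above describe a lazily populated cache in which setFileReader also clears the values derived from the reader. The sketch below models that contract with stand-in types. ReaderCacheSketch is hypothetical, Strings stand in for ParquetFileReader and ParquetMetadata, and both the synchronization and the way each value is derived are assumptions rather than the actual implementation.

```java
// Hypothetical stand-in, not the Deephaven class: it models only the documented caching contract.
final class ReaderCacheSketch {
    private String fileReader;     // stands in for ParquetFileReader
    private String metadata;       // stands in for ParquetMetadata
    private int[] rowGroupIndices;
    private int readersCreated;    // counts on-demand constructions, for illustration

    synchronized String getFileReader() {
        if (fileReader == null) {
            // On-demand construction when no reader was set (or null was passed).
            fileReader = "on-demand-reader-" + (++readersCreated);
        }
        return fileReader;
    }

    synchronized void setFileReader(String newReader) {
        fileReader = newReader;
        // Documented behavior: always clear cached metadata and RowGroup indices.
        metadata = null;
        rowGroupIndices = null;
    }

    synchronized String getMetadata() {
        if (metadata == null) {
            // Assumption: metadata is derived from the current reader.
            metadata = "metadata-from-" + getFileReader();
        }
        return metadata;
    }

    synchronized void setMetadata(String newMetadata) {
        metadata = newMetadata; // null forces on-demand construction at the next call
    }

    synchronized int[] getRowGroupIndices() {
        if (rowGroupIndices == null) {
            getFileReader(); // indices belong to the current reader
            rowGroupIndices = new int[] {0}; // placeholder: one row group
        }
        return rowGroupIndices;
    }

    synchronized void setRowGroupIndices(int[] newIndices) {
        rowGroupIndices = newIndices;
    }

    public static void main(String[] args) {
        ReaderCacheSketch key = new ReaderCacheSketch();
        key.setMetadata("custom-metadata");
        key.setFileReader("explicit-reader"); // discards the metadata set above
        System.out.println(key.getMetadata()); // prints metadata-from-explicit-reader
    }
}
```

      One consequence of this contract is ordering: custom metadata or RowGroup indices would need to be set after setFileReader, since setting the reader discards them.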