Package io.deephaven.parquet.base
Interface ColumnChunkReader
public interface ColumnChunkReader
-
Nested Class Summary
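Modifier and Type | Interface | Description
static interface | ColumnChunkReader.ColumnPageDirectAccessor |
static interface | ColumnChunkReader.ColumnPageReaderIterator | Used to iterate over column page readers for each page, with the capability to set the channel context used for reading the pages.
static final class | |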
-
Field Summary
Modifier and Type | Field | Description
static final org.apache.parquet.column.Dictionary | NULL_DICTIONARY |
-
Method Summary
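Modifier and Type | Method | Description
String | columnName() |
SeekableChannelsProvider | getChannelsProvider() |
Function<SeekableChannelContext,org.apache.parquet.column.Dictionary> | getDictionarySupplier() |
int | getMaxRl() |
org.apache.parquet.internal.column.columnindex.OffsetIndex | getOffsetIndex(SeekableChannelContext context) |
ColumnChunkReader.ColumnPageDirectAccessor | getPageAccessor(org.apache.parquet.internal.column.columnindex.OffsetIndex offsetIndex, PageMaterializerFactory pageMaterializerFactory) |
ColumnChunkReader.ColumnPageReaderIterator | getPageIterator(PageMaterializerFactory pageMaterializerFactory) |
org.apache.parquet.schema.PrimitiveType | getType() |
URI | getURI() |
@Nullable String | getVersion() |
boolean | hasOffsetIndex() |
long | numRows() |
long | numValues() |
boolean | usesDictionaryOnEveryPage() |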
-
Field Details
-
NULL_DICTIONARY
static final org.apache.parquet.column.Dictionary NULL_DICTIONARY
-
-
Method Details
-
columnName
String columnName()
- Returns:
- The name of the column this ColumnChunk represents.
-
getURI
URI getURI()
- Returns:
- The URI of the file this column chunk reader is reading from.
-
numRows
long numRows()
- Returns:
- The number of rows in this ColumnChunk, or -1 if it's unknown.
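Because -1 is a sentinel rather than a real count, callers may want to convert it to an explicit "unknown" value before doing arithmetic with it. The following is a minimal sketch of that idea; `RowCountSketch` and `knownRowCount` are hypothetical names introduced for illustration and are not part of this interface.

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of ColumnChunkReader.
final class RowCountSketch {

    // Maps the documented -1 sentinel to an empty OptionalLong;
    // any other value is passed through unchanged.
    static OptionalLong knownRowCount(long numRows) {
        return numRows == -1L ? OptionalLong.empty() : OptionalLong.of(numRows);
    }

    public static void main(String[] args) {
        // In real code the argument would come from reader.numRows().
        System.out.println(knownRowCount(-1L).isPresent());
        System.out.println(knownRowCount(1024L).getAsLong());
    }
}
```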
-
numValues
long numValues()
- Returns:
- The value stored under the corresponding ColumnMetaData.num_values field.
-
getMaxRl
int getMaxRl()
- Returns:
- The maximum repetition level, that is, the number of nested repeated fields this column is a part of. 0 means this is a simple (non-repeating) field, 1 means this is a flat array.
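As an illustration of how the returned value might be interpreted, here is a small sketch that maps a repetition depth to a description. The cases for 0 and 1 follow the text above; treating values greater than 1 as deeper nesting is an assumption drawn from the "nested repeated fields" wording. `RepetitionLevelSketch` and `describe` are hypothetical names.

```java
// Hypothetical helper, not part of ColumnChunkReader.
final class RepetitionLevelSketch {

    // Describes a column based on the value returned by getMaxRl().
    static String describe(int maxRl) {
        if (maxRl < 0) {
            throw new IllegalArgumentException("maxRl must be non-negative: " + maxRl);
        }
        switch (maxRl) {
            case 0:
                return "simple (non-repeating) field";
            case 1:
                return "flat array";
            default:
                // Assumption: larger values mean arrays nested inside arrays.
                return "nested repeated field, depth " + maxRl;
        }
    }

    public static void main(String[] args) {
        for (int level = 0; level <= 2; level++) {
            System.out.println(level + ": " + describe(level));
        }
    }
}
```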
-
hasOffsetIndex
boolean hasOffsetIndex()
- Returns:
- Whether the column chunk has offset index information set in the metadata or not.
-
getOffsetIndex
org.apache.parquet.internal.column.columnindex.OffsetIndex getOffsetIndex(SeekableChannelContext context)
- Parameters:
context
- The channel context to use for reading the offset index.
- Returns:
- The offset index for this column chunk.
- Throws:
UnsupportedOperationException
- If the column chunk does not have an offset index.
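Since getOffsetIndex throws when no offset index exists, a caller would normally check hasOffsetIndex first. The sketch below shows that guard pattern against a hypothetical stand-in interface, `OffsetIndexSource`, which mirrors only the two methods documented here with generic placeholder types; it is not the real ColumnChunkReader, and `OffsetIndexGuard` is likewise an illustrative name.

```java
import java.util.Optional;

// Hypothetical stand-in for hasOffsetIndex()/getOffsetIndex(context).
// C plays the role of SeekableChannelContext, I the role of OffsetIndex.
interface OffsetIndexSource<C, I> {
    boolean hasOffsetIndex();

    I getOffsetIndex(C context);
}

final class OffsetIndexGuard {

    // Returns the offset index when one exists, otherwise an empty Optional,
    // so the UnsupportedOperationException path is never reached.
    static <C, I> Optional<I> offsetIndexIfPresent(OffsetIndexSource<C, I> reader, C context) {
        if (!reader.hasOffsetIndex()) {
            return Optional.empty();
        }
        return Optional.of(reader.getOffsetIndex(context));
    }

    public static void main(String[] args) {
        OffsetIndexSource<String, String> withoutIndex = new OffsetIndexSource<String, String>() {
            @Override
            public boolean hasOffsetIndex() {
                return false;
            }

            @Override
            public String getOffsetIndex(String context) {
                throw new UnsupportedOperationException("no offset index");
            }
        };
        System.out.println(offsetIndexIfPresent(withoutIndex, "ctx").isPresent());
    }
}
```

A caller could use the same check to decide between getPageAccessor, which needs an offset index, and getPageIterator, which does not.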
-
getPageIterator
ColumnChunkReader.ColumnPageReaderIterator getPageIterator(PageMaterializerFactory pageMaterializerFactory) throws IOException
- Parameters:
pageMaterializerFactory
- The factory to use for constructing page materializers.
- Returns:
- An iterator over individual Parquet pages.
- Throws:
IOException
-
getPageAccessor
ColumnChunkReader.ColumnPageDirectAccessor getPageAccessor(org.apache.parquet.internal.column.columnindex.OffsetIndex offsetIndex, PageMaterializerFactory pageMaterializerFactory)
- Parameters:
pageMaterializerFactory
- The factory to use for constructing page materializers.
- Returns:
- An accessor for individual Parquet pages which uses the provided offset index.
-
usesDictionaryOnEveryPage
boolean usesDictionaryOnEveryPage()
- Returns:
- Whether this column chunk uses a dictionary-based encoding on every page.
-
getDictionarySupplier
Function<SeekableChannelContext,org.apache.parquet.column.Dictionary> getDictionarySupplier()
- Returns:
- Supplier for a Parquet dictionary for this column chunk.
- ApiNote:
- The returned function never returns null. It supplies NULL_DICTIONARY instead.
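The note above implies that callers should test for the NULL_DICTIONARY sentinel rather than for null. A minimal sketch of that check follows, using a plain Object as a hypothetical stand-in for the real sentinel and a generic context type; comparing by identity is an assumption about how the sentinel is meant to be used, and `DictionarySentinelSketch` is an illustrative name.

```java
import java.util.function.Function;

// Hypothetical sketch, not part of ColumnChunkReader.
final class DictionarySentinelSketch {

    // Stand-in for ColumnChunkReader.NULL_DICTIONARY.
    static final Object NULL_DICTIONARY = new Object();

    // Applies the supplier and reports whether a real dictionary came back.
    // Assumption: the sentinel is a single shared instance, so identity comparison is enough.
    static <C> boolean hasDictionary(Function<C, Object> supplier, C context) {
        Object dictionary = supplier.apply(context);
        return dictionary != NULL_DICTIONARY;
    }

    public static void main(String[] args) {
        Function<String, Object> noDictionary = context -> NULL_DICTIONARY;
        Function<String, Object> someDictionary = context -> "dictionary read via " + context;
        System.out.println(hasDictionary(noDictionary, "ctx"));
        System.out.println(hasDictionary(someDictionary, "ctx"));
    }
}
```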
-
getType
org.apache.parquet.schema.PrimitiveType getType()
-
getVersion
@Nullable String getVersion()
- Returns:
- The "version" string from Deephaven-specific Parquet metadata, or null if it's not present.
-
getChannelsProvider
SeekableChannelsProvider getChannelsProvider()
- Returns:
- The channel provider for this column chunk reader.
-