Class ParquetTools
-
Field Summary
Modifier and Type | Field | Description
static final ParquetInstructions | BROTLI |
static final ParquetInstructions | GZIP |
static final ParquetInstructions | LEGACY |
static final ParquetInstructions | LZ4 | Deprecated. Use LZ4_RAW instead.
static final ParquetInstructions | LZ4_RAW |
static final ParquetInstructions | LZO |
static final ParquetInstructions | SNAPPY |
static final ParquetInstructions | UNCOMPRESSED |
static final ParquetInstructions | ZSTD |
-
Method Summary
Modifier and Type | Method | Description
static void | deleteTable(String path) | Deletes a table on disk.
static String | legacyGroupingFileName(@NotNull File tableDest, @NotNull String columnName) | Legacy method for generating a grouping file name.
static Table | readParquetSchemaAndTable(@NotNull File source, @NotNull ParquetInstructions readInstructionsIn, @Nullable org.apache.commons.lang3.mutable.MutableObject<ParquetInstructions> mutableInstructionsOut) |
static Table | readTable(@NotNull TableLocationKeyFinder<ParquetTableLocationKey> locationKeyFinder, @NotNull ParquetInstructions readInstructions) | Reads in a table from files discovered with locationKeyFinder using a definition either provided using ParquetInstructions or built from the highest (by location key order) location found, which must have non-null partition values for all partition keys.
static Table | readTable(@NotNull String source) | Reads in a table from a single parquet file, metadata file, or directory with recognized layout.
static Table | readTable(@NotNull String source, @NotNull ParquetInstructions readInstructions) | Reads in a table from a single parquet file, metadata file, or directory with recognized layout.
static void | writeKeyValuePartitionedTable(@NotNull PartitionedTable partitionedTable, @NotNull String destinationDir, @NotNull ParquetInstructions writeInstructions) | Write a partitioned table to disk in parquet format with all the key columns as "key=value" format in a nested directory structure.
static void | writeKeyValuePartitionedTable(@NotNull Table sourceTable, @NotNull String destinationDir, @NotNull ParquetInstructions writeInstructions) | Write table to disk in parquet format with partitioning columns written as "key=value" format in a nested directory structure.
static void | writeTable(@NotNull Table sourceTable, @NotNull String destination) | Write a table to a file.
static void | writeTable(@NotNull Table sourceTable, @NotNull String destination, @NotNull ParquetInstructions writeInstructions) | Write a table to a file.
static void | writeTables(@NotNull Table[] sources, @NotNull String[] destinations, @NotNull ParquetInstructions writeInstructions) | Write out tables to disk.
-
Field Details
-
UNCOMPRESSED
-
LZ4
Deprecated. Use LZ4_RAW instead.
-
LZ4_RAW
-
LZO
-
GZIP
-
ZSTD
-
SNAPPY
-
BROTLI
-
LEGACY
-
-
Method Details
-
readTable
public static Table readTable(@NotNull String source)
Reads in a table from a single parquet file, metadata file, or directory with recognized layout. The source provided can be a local file path or a URI to be resolved.
This method attempts to "do the right thing." It examines the source to determine whether it is a single parquet file, a metadata file, or a directory. If it is a directory, the method additionally tries to guess the layout to use. Unless a metadata file is supplied or discovered in the directory, the highest (by location key order) location found will be used to infer the schema.
Parameters:
source - The path or URI of the file or directory to examine
Returns:
table
-
readTable
public static Table readTable(@NotNull String source, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet file, metadata file, or directory with recognized layout. The source provided can be a local file path or a URI to be resolved.
If the ParquetInstructions.ParquetFileLayout is not provided in the instructions, this method attempts to "do the right thing." It examines the source to determine whether it is a single parquet file, a metadata file, or a directory. If it is a directory, the method additionally tries to guess the layout to use. Unless a metadata file is supplied or discovered in the directory, the highest (by location key order) location found will be used to infer the schema.
Parameters:
source - The path or URI of the file or directory to examine
readInstructions - Instructions for customizations while reading
Returns:
table
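The inference described above can be pictured with a small stand-alone sketch. It is not the ParquetTools implementation: the `SourceKind` enum, the method names, and the metadata file names `_metadata` and `_common_metadata` are assumptions made for illustration, and the real method goes further by guessing a directory layout.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Illustrative only: hypothetical names, not the ParquetTools implementation. */
final class SourceLayoutSketch {

    /** Hypothetical stand-in for the kinds of source the method distinguishes. */
    enum SourceKind { SINGLE_FILE, METADATA_FILE, DIRECTORY }

    /** Classifies a source by its name and by whether it is a directory. */
    static SourceKind infer(Path source) {
        if (Files.isDirectory(source)) {
            // Assumption: a metadata file discovered in the directory is named "_metadata".
            return Files.exists(source.resolve("_metadata"))
                    ? SourceKind.METADATA_FILE
                    : SourceKind.DIRECTORY;
        }
        Path fileName = source.getFileName();
        String name = fileName == null ? "" : fileName.toString();
        if (name.equals("_metadata") || name.equals("_common_metadata")) {
            return SourceKind.METADATA_FILE;
        }
        if (name.endsWith(".parquet")) {
            return SourceKind.SINGLE_FILE;
        }
        throw new IllegalArgumentException("Unrecognized source: " + source);
    }

    /** An explicitly requested kind wins; inference is only the fallback. */
    static SourceKind resolve(Optional<SourceKind> requested, Path source) {
        return requested.orElseGet(() -> infer(source));
    }
}
```

The `resolve` helper mirrors the rule that inference only happens when no layout is supplied in the instructions.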
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destination)
Write a table to a file. Data indexes to write are determined by those present on sourceTable.
Parameters:
sourceTable - source table
destination - destination path or URI; the file name should end with the ".parquet" extension. If the path includes non-existing directories, they are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use
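To see why removing intermediate directories on error is unsafe under concurrency, consider the following stand-alone sketch of the create-then-roll-back pattern. The class and method names are hypothetical and the code does not reproduce how ParquetTools manages directories; it only shows the general idea.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative only: hypothetical helper, not the ParquetTools implementation. */
final class DestinationDirsSketch {

    /** Creates the missing parents of destination; returns them deepest first. */
    static Deque<Path> createMissingParents(Path destination) {
        // Walk upwards, collecting missing directories; the shallowest ends up at the head.
        Deque<Path> missing = new ArrayDeque<>();
        Path dir = destination.toAbsolutePath().getParent();
        while (dir != null && !Files.exists(dir)) {
            missing.push(dir);
            dir = dir.getParent();
        }
        Deque<Path> created = new ArrayDeque<>();
        try {
            for (Path next : missing) {
                Files.createDirectory(next);
                created.push(next); // deepest created directory ends up at the head
            }
        } catch (IOException e) {
            rollback(created);
            throw new UncheckedIOException(e);
        }
        return created;
    }

    /** Removes directories created for a failed write, deepest first. */
    static void rollback(Deque<Path> created) {
        try {
            for (Path next : created) {
                Files.deleteIfExists(next);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If two writers target destinations that share a newly created parent, a rollback by one of them can remove or fail on a directory the other still needs, which is the hazard the note above warns about.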
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destination, @NotNull ParquetInstructions writeInstructions)
Write a table to a file. Data indexes to write are determined by those present on sourceTable.
Parameters:
sourceTable - source table
destination - destination path or URI; the file name should end with the ".parquet" extension. If the path includes non-existing directories, they are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use
writeInstructions - instructions for customizations while writing
-
legacyGroupingFileName
@VisibleForTesting public static String legacyGroupingFileName(@NotNull File tableDest, @NotNull String columnName)
Legacy method for generating a grouping file name. We used to place grouping files right next to the original table destination.
Parameters:
tableDest - Destination path for the main table containing these grouping columns
columnName - Name of the grouping column
Returns:
The relative grouping file path. For example, for a table with destination "table.parquet" and grouping column "GroupingColName", the method will return "table_GroupingColName_grouping.parquet"
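The naming rule in the example above can be reproduced with a few lines of plain Java. This is a sketch derived only from that single documented example; the class name is hypothetical, and the handling of destinations that do not end in ".parquet" is an assumption.

```java
import java.io.File;

/** Illustrative only: reproduces the documented example, not the library code. */
final class LegacyGroupingNameSketch {

    private static final String PARQUET_EXTENSION = ".parquet";

    /** Builds "<base>_<column>_grouping.parquet" from the destination's file name. */
    static String groupingFileName(File tableDest, String columnName) {
        String name = tableDest.getName();
        // Assumption: a name without the ".parquet" extension is used unchanged as the base.
        String base = name.endsWith(PARQUET_EXTENSION)
                ? name.substring(0, name.length() - PARQUET_EXTENSION.length())
                : name;
        return base + "_" + columnName + "_grouping" + PARQUET_EXTENSION;
    }
}
```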
-
writeKeyValuePartitionedTable
public static void writeKeyValuePartitionedTable(@NotNull Table sourceTable, @NotNull String destinationDir, @NotNull ParquetInstructions writeInstructions)
Write table to disk in parquet format with partitioning columns written as "key=value" format in a nested directory structure. To generate these individual partitions, this method will call partitionBy on all the partitioning columns of the provided table. The generated parquet files will have names of the format provided by ParquetInstructions.baseNameForPartitionedParquetData(). By default, any indexing columns present on the source table will be written as sidecar tables. To write only a subset of the indexes or add additional indexes while writing, use ParquetInstructions.Builder.addIndexColumns(java.lang.String...).
Parameters:
sourceTable - The table to partition and write
destinationDir - The path or URI to the destination root directory to store partitioned data in nested format. Non-existing directories are created.
writeInstructions - Write instructions for customizations while writing
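As an illustration of the nested "key=value" layout, the sketch below builds the path of one partition's file from an ordered map of partitioning column values. The class name, the column names, and the base name `part` are hypothetical; the actual file names come from ParquetInstructions.baseNameForPartitionedParquetData().

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Illustrative only: shows the directory shape, not how ParquetTools writes files. */
final class KeyValueLayoutSketch {

    /**
     * Nests one "key=value" directory per partitioning column, in the iteration
     * order of the map, and places the data file in the innermost directory.
     */
    static Path partitionFile(String destinationDir, Map<String, String> partitionValues, String baseName) {
        Path dir = Paths.get(destinationDir);
        for (Map.Entry<String, String> entry : partitionValues.entrySet()) {
            dir = dir.resolve(entry.getKey() + "=" + entry.getValue());
        }
        // Assumption: the hypothetical base name is completed with a ".parquet" extension.
        return dir.resolve(baseName + ".parquet");
    }
}
```

Passing a LinkedHashMap keeps the nesting order equal to the order in which the partitioning columns were added.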
-
writeKeyValuePartitionedTable
public static void writeKeyValuePartitionedTable(@NotNull PartitionedTable partitionedTable, @NotNull String destinationDir, @NotNull ParquetInstructions writeInstructions)
Write a partitioned table to disk in parquet format with all the key columns as "key=value" format in a nested directory structure. To generate the partitioned table, users can call partitionBy on the required columns. The generated parquet files will have names of the format provided by ParquetInstructions.baseNameForPartitionedParquetData(). By default, this method does not write any indexes as sidecar tables to disk. To write such indexes, use ParquetInstructions.Builder.addIndexColumns(java.lang.String...).
Parameters:
partitionedTable - The partitioned table to write
destinationDir - The path or URI to the destination root directory to store partitioned data in nested format. Non-existing directories are created.
writeInstructions - Write instructions for customizations while writing
-
writeTables
public static void writeTables(@NotNull Table[] sources, @NotNull String[] destinations, @NotNull ParquetInstructions writeInstructions)
Write out tables to disk. Data indexes to write are determined by those already present on the first source or those provided through ParquetInstructions.Builder.addIndexColumns(java.lang.String...). If all source tables have the same definition, this method will use the common definition for writing. Otherwise, a definition must be provided through the writeInstructions.
Parameters:
sources - The tables to write
destinations - The destination paths or URIs. Any non-existing directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
writeInstructions - Write instructions for customizations while writing
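The rule for choosing the definition to write can be sketched as follows. Here a table definition is modelled as an ordered list of "name:type" strings, which is a simplification made for this example; the class and method names are hypothetical and the code is not taken from ParquetTools.

```java
import java.util.List;
import java.util.Optional;

/** Illustrative only: models definitions as ordered lists of "name:type" strings. */
final class CommonDefinitionSketch {

    /**
     * Returns the shared definition when every source has the same one;
     * otherwise falls back to an explicitly provided definition, or fails.
     */
    static List<String> definitionToWrite(List<List<String>> sourceDefinitions,
                                          Optional<List<String>> provided) {
        if (sourceDefinitions.isEmpty()) {
            throw new IllegalArgumentException("No source tables");
        }
        List<String> first = sourceDefinitions.get(0);
        // List.equals is order-sensitive, so column order counts as part of the definition.
        boolean allSame = sourceDefinitions.stream().allMatch(first::equals);
        if (allSame) {
            return first;
        }
        return provided.orElseThrow(() -> new IllegalArgumentException(
                "Sources have different definitions; a definition must be provided"));
    }
}
```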
-
deleteTable
public static void deleteTable(String path)
Deletes a table on disk.
Parameters:
path - path to delete
-
readTable
public static Table readTable(@NotNull TableLocationKeyFinder<ParquetTableLocationKey> locationKeyFinder, @NotNull ParquetInstructions readInstructions)
Reads in a table from files discovered with locationKeyFinder using a definition either provided using ParquetInstructions or built from the highest (by location key order) location found, which must have non-null partition values for all partition keys.
Callers may prefer the simpler method readTable(String, ParquetInstructions) with the layout provided using ParquetInstructions.Builder.setFileLayout(io.deephaven.parquet.table.ParquetInstructions.ParquetFileLayout).
Parameters:
locationKeyFinder - The source of location keys to include
readInstructions - Instructions for customizations while reading
Returns:
The table
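The "highest location" rule can be illustrated with a stand-alone sketch. The `LocationKey` class below is a hypothetical stand-in for ParquetTableLocationKey, and ordering keys by their path string is an assumption made only so the example has some ordering to work with.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical stand-in types, not the Deephaven classes. */
final class HighestLocationSketch {

    /** Hypothetical location key; ordered by path, which is an assumption. */
    static final class LocationKey implements Comparable<LocationKey> {
        final String path;
        final Map<String, String> partitions;

        LocationKey(String path, Map<String, String> partitions) {
            this.path = path;
            this.partitions = partitions;
        }

        @Override
        public int compareTo(LocationKey other) {
            return path.compareTo(other.path);
        }
    }

    /** Picks the highest key and checks it has a value for every partition key. */
    static LocationKey schemaSource(List<LocationKey> discovered, List<String> partitionKeys) {
        LocationKey highest = Collections.max(discovered);
        for (String key : partitionKeys) {
            if (highest.partitions.get(key) == null) {
                throw new IllegalStateException("Null partition value for key: " + key);
            }
        }
        return highest;
    }
}
```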
-
readParquetSchemaAndTable
@VisibleForTesting public static Table readParquetSchemaAndTable(@NotNull File source, @NotNull ParquetInstructions readInstructionsIn, @Nullable org.apache.commons.lang3.mutable.MutableObject<ParquetInstructions> mutableInstructionsOut)
-