# read_csv

The `read_csv` method reads a CSV file into an in-memory table.
## Syntax
```python
read_csv(
    path: str,
    header: dict[str, dht.DType] = None,
    headless: bool = False,
    header_row: int = 0,
    skip_rows: int = 0,
    num_rows: int = MAX_LONG,
    ignore_empty_lines: bool = False,
    allow_missing_columns: bool = False,
    ignore_excess_columns: bool = False,
    delimiter: str = ",",
    quote: str = '"',
    ignore_surrounding_spaces: bool = True,
    trim: bool = False,
) -> Table
```
The `deephaven` package's `read_csv` method is identical in function to `deephaven.csv.read`; however, `read_csv` is the preferred method, as its name differentiates it from `deephaven.parquet.read`.
## Parameters
| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | The file to load into a table. Compressed files are accepted: paths ending in `.tar.zip`, `.tar.bz2`, `.tar.gz`, `.tar.7z`, `.tar.zst`, `.zip`, `.bz2`, `.gz`, `.7z`, `.zst`, or `.tar` are automatically decompressed before they are read. |
| `header` (optional) | `dict` | A dictionary mapping column names to Deephaven data types (`dht.DType`), defining the header and the column types. Default is `None`. |
| `headless` (optional) | `bool` | Whether the file is headless (has no header row). Default is `False`. |
| `header_row` (optional) | `int` | The header row number; rows before it are skipped. Default is `0`. |
| `skip_rows` (optional) | `int` | The number of data rows to skip before processing data. Default is `0`. |
| `num_rows` (optional) | `int` | The maximum number of rows to process. Default is all rows in the file. |
| `ignore_empty_lines` (optional) | `bool` | Whether to ignore empty lines in the file. Default is `False`. |
| `allow_missing_columns` (optional) | `bool` | Whether to allow rows with fewer columns than the header. Default is `False`. |
| `ignore_excess_columns` (optional) | `bool` | Whether to ignore columns beyond those in the header. Default is `False`. |
| `delimiter` (optional) | `str` | The delimiter for the file. Default is `,`. |
| `quote` (optional) | `str` | The character surrounding a quoted string value. Default is `"`. |
| `ignore_surrounding_spaces` (optional) | `bool` | Trim leading and trailing blanks from non-quoted values. Default is `True`. |
| `trim` (optional) | `bool` | Trim leading and trailing blanks from inside quoted values. Default is `False`. |
| `charset` (optional) | `str` | The character set of the file. |
| `csvSpecs` (optional) | `CsvSpecs` | Specifications for how to load the CSV file. |
Only one format parameter can be used at a time.
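The `delimiter`, `quote`, and `ignore_surrounding_spaces` options follow standard CSV parsing conventions. As a rough illustration of that behavior — using plain Python's `csv` module on hypothetical in-memory data, not the Deephaven engine — `skipinitialspace` plays a role similar to `ignore_surrounding_spaces` for blanks after the delimiter:

```python
import csv
import io

# A pipe-delimited sample with quoted values and stray spaces after the delimiter.
data = io.StringIO('name| value\n"Ada"| 42\n"Grace"| 37\n')

# delimiter and quotechar mirror read_csv's delimiter and quote parameters;
# skipinitialspace drops the blanks that follow each delimiter.
rows = list(csv.reader(data, delimiter="|", quotechar='"', skipinitialspace=True))
print(rows)  # [['name', 'value'], ['Ada', '42'], ['Grace', '37']]
```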
## Returns

A new in-memory table from a CSV file.
## Examples
In this guide, we read data from locations relative to the base of the Docker container. See Docker data volumes to learn more about the relation between locations in the container and the local file system.
In the following example, `write_csv` writes the source table to `/data/file.csv`, and `read_csv` loads the file into a Deephaven table.
```python
from deephaven import new_table
from deephaven.column import string_col, int_col, double_col
from deephaven import read_csv, write_csv
from deephaven.constants import NULL_INT

source = new_table(
    [
        string_col("X", ["A", "B", None, "C", "B", "A", "B", "B", "C"]),
        int_col("Y", [2, 4, 2, 1, 2, 3, 4, NULL_INT, 3]),
        int_col("Z", [55, 76, 20, NULL_INT, 230, 50, 73, 137, 214]),
    ]
)

write_csv(source, "/data/file.csv")
result = read_csv("/data/file.csv")
```
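The `header` parameter described above maps column names to types rather than letting them be inferred. A plain-Python sketch of that idea — applying a hypothetical name-to-type mapping while reading rows, standing in for the `dht` types a real `header` dictionary would use:

```python
import csv
import io

# Hypothetical mapping, analogous to read_csv's `header` parameter
# (a real call would use Deephaven types such as dht.string or dht.int32).
header = {"X": str, "Y": int, "Z": float}

text = "X,Y,Z\nA,2,55\nB,4,76\n"
reader = csv.reader(io.StringIO(text))
names = next(reader)

# Convert each value with the type named for its column.
rows = [
    {name: header[name](value) for name, value in zip(names, row)}
    for row in reader
]
print(rows)  # [{'X': 'A', 'Y': 2, 'Z': 55.0}, {'X': 'B', 'Y': 4, 'Z': 76.0}]
```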
In the following examples, the example data found in Deephaven's example repository will be used. Follow the instructions in the README to download the data to the proper location for use with Deephaven.
In the following example, `read_csv` is used to load the DeNiro CSV file into a Deephaven table.
```python
from deephaven import read_csv

result = read_csv(
    "https://media.githubusercontent.com/media/deephaven/examples/main/DeNiro/csv/deniro.csv"
)
```
Any character can be used as a delimiter; the pipe and tab characters (`|` and `\t`) are common. In the following example, the `delimiter` parameter is used to read pipe- and tab-delimited files into memory.
```python
from deephaven import read_csv

result_psv = read_csv(
    "https://raw.githubusercontent.com/deephaven/examples/main/DeNiro/csv/deniro.psv",
    delimiter="|",
)
result_tsv = read_csv(
    "https://raw.githubusercontent.com/deephaven/examples/main/DeNiro/csv/deniro.tsv",
    delimiter="\t",
)
```
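Finally, `skip_rows` and `num_rows` together select a window of data rows. A plain-Python sketch of that windowing — hypothetical in-memory data, not the Deephaven parser:

```python
import csv
import io
from itertools import islice

text = "X,Y\nA,1\nB,2\nC,3\nD,4\n"
skip_rows, num_rows = 1, 2  # hypothetical values mirroring read_csv's parameters

reader = csv.reader(io.StringIO(text))
header = next(reader)  # the header row is consumed before any skipping

# Skip the first `skip_rows` data rows, then take at most `num_rows` rows.
window = list(islice(reader, skip_rows, skip_rows + num_rows))
print(header, window)  # ['X', 'Y'] [['B', '2'], ['C', '3']]
```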