read_csv

The read_csv method reads a CSV file into an in-memory table.

Syntax

read_csv(
    path: str,
    header: dict[str, dht.DType] = None,
    headless: bool = False,
    header_row: int = 0,
    skip_rows: int = 0,
    num_rows: int = MAX_LONG,
    ignore_empty_lines: bool = False,
    allow_missing_columns: bool = False,
    ignore_excess_columns: bool = False,
    delimiter: str = ",",
    quote: str = '"',
    ignore_surrounding_spaces: bool = True,
    trim: bool = False,
) -> Table:

Note

The deephaven package's read_csv method is functionally identical to deephaven.csv.read. However, read_csv is preferred because its name distinguishes it from deephaven.parquet.read.

Parameters

Parameter	Type	Description
path	str	The file to load into a table. Note that compressed files are accepted: paths ending in ".tar.zip", ".tar.bz2", ".tar.gz", ".tar.7z", ".tar.zst", ".zip", ".bz2", ".gz", ".7z", ".zst", or ".tar" will automatically be decompressed before they are read.
header optional	dict	A dictionary that maps column names (`str`) to column data types (`dht.DType`). Default is `None`.
headless optional	bool	`False` (default) - the first row contains the column names. `True` - the first row is read as data.
header_row optional	int	The row number of the header; all rows before it are skipped. Default is 0. Must be 0 if headless is `True`; otherwise, an exception is raised.
skip_rows optional	int	The number of rows to skip before processing data. Default is 0.
num_rows optional	int	The maximum number of rows to process. Default is all rows in the file.
ignore_empty_lines optional	bool	`False` (default) - Empty lines are treated as errors. `True` - Empty lines in the CSV file are ignored.
allow_missing_columns optional	bool	`False` (default) - Missing columns are treated as errors. `True` - Missing columns in rows are treated as empty strings.
ignore_excess_columns optional	bool	`False` (default) - Extra columns in rows are treated as errors. `True` - Excess columns in rows are ignored.
delimiter optional	str	The character that separates values in the file. Any single non-newline character can be specified (e.g., `,`, `;`, `:`, `\`, `|`). Default is `,`.
quote optional	str	The character that surrounds quoted string values. Default is `"`.
ignore_surrounding_spaces optional	bool	Trim leading and trailing blanks from non-quoted values. Default is `True`.
trim optional	bool	Trim leading and trailing blanks from inside quoted values. Default is `False`.
charset optional	str	The character set. Default is `utf-8`.
csvSpecs optional	CsvSpecs	Specifications for how to load the CSV file.
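The row-selection parameters (headless, header_row, skip_rows, num_rows) interact, so a small model can help. The sketch below is plain Python built on the standard csv module, not Deephaven's implementation. It assumes that skip_rows and num_rows count data rows after the header, and the generated column names for headless files (`Column1`, `Column2`, ...) are hypothetical placeholders.

```python
import csv
import io


def select_rows(text, headless=False, header_row=0, skip_rows=0, num_rows=None):
    """Model of headless/header_row/skip_rows/num_rows; returns (names, rows).

    Plain Python for illustration only; this is not Deephaven's implementation.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if headless:
        # Documented rule: header_row must be 0 when headless is True.
        if header_row != 0:
            raise ValueError("header_row must be 0 when headless is True")
        # Hypothetical placeholder names; Deephaven may name columns differently.
        width = len(rows[0]) if rows else 0
        names = [f"Column{i + 1}" for i in range(width)]
        data = rows
    else:
        # Every row before header_row is skipped; header_row holds the names.
        names = rows[header_row]
        data = rows[header_row + 1:]
    # Assumption: skip_rows and num_rows count data rows that follow the header.
    stop = None if num_rows is None else skip_rows + num_rows
    return names, data[skip_rows:stop]


text = "generated by a hypothetical exporter\nX,Y\nA,2\nB,4\nC,1\nD,3\n"
names, data = select_rows(text, header_row=1, skip_rows=1, num_rows=2)
print(names, data)  # ['X', 'Y'] [['B', '4'], ['C', '1']]
```

Here the first line is skipped because header_row is 1, the row `A,2` is skipped by skip_rows, and num_rows stops after two data rows.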

Note

Only one format parameter can be used at a time.
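The quote, ignore_surrounding_spaces, and trim parameters all affect how a single field's text becomes a value. The following is a plain-Python sketch of those rules as the parameter table describes them, not Deephaven code. It assumes the field has already been split at the delimiter, treats only spaces as blanks, assumes RFC 4180-style escaping (a doubled quote inside a quoted value stands for one literal quote), and does not handle blanks that sit outside the quotes.

```python
def parse_field(raw, quote='"', ignore_surrounding_spaces=True, trim=False):
    """Model of how quote, ignore_surrounding_spaces, and trim shape one field.

    `raw` is the text between two delimiters. Illustration only, not Deephaven code.
    This sketch does not handle blanks outside the quotes.
    """
    if len(raw) >= 2 and raw[0] == quote and raw[-1] == quote:
        # Quoted value. Assumption: a doubled quote inside stands for one
        # literal quote, as in RFC 4180.
        inner = raw[1:-1].replace(quote * 2, quote)
        # trim strips blanks that sit inside the quotes.
        return inner.strip(" ") if trim else inner
    # Non-quoted value: ignore_surrounding_spaces strips its outer blanks.
    return raw.strip(" ") if ignore_surrounding_spaces else raw


print(repr(parse_field("  abc  ")))                # 'abc'
print(repr(parse_field('"  abc  "')))              # '  abc  '
print(repr(parse_field('"  abc  "', trim=True)))   # 'abc'
```

With the defaults, blanks around a non-quoted value are dropped while blanks inside quotes are kept; setting trim to `True` drops those as well.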

Returns

A new in-memory table from a CSV file.

Examples

Note

In this guide, we read data from locations relative to the base of the Docker container. See Docker data volumes to learn more about the relation between locations in the container and the local file system.

In the following example, write_csv writes the source table to /data/file.csv, and read_csv loads the file into a Deephaven table.

from deephaven import new_table
from deephaven.column import string_col, int_col, double_col
from deephaven import read_csv, write_csv
from deephaven.constants import NULL_INT

source = new_table(
    [
        string_col("X", ["A", "B", None, "C", "B", "A", "B", "B", "C"]),
        int_col("Y", [2, 4, 2, 1, 2, 3, 4, NULL_INT, 3]),
        int_col("Z", [55, 76, 20, NULL_INT, 230, 50, 73, 137, 214]),
    ]
)

write_csv(source, "/data/file.csv")

result = read_csv("/data/file.csv")

Note

In the following examples, the example data found in Deephaven's example repository will be used. Follow the instructions in the README to download the data to the proper location for use with Deephaven.

In the following example, read_csv is used to load the DeNiro CSV file into a Deephaven table.

from deephaven import read_csv

result = read_csv(
    "https://media.githubusercontent.com/media/deephaven/examples/main/DeNiro/csv/deniro.csv"
)

Any single non-newline character can be used as a delimiter; the pipe (|) and tab (\t) characters are common. In the following example, the delimiter parameter is used to read pipe- and tab-delimited files into memory.

from deephaven import read_csv

result_psv = read_csv(
    "https://raw.githubusercontent.com/deephaven/examples/main/DeNiro/csv/deniro.psv",
    delimiter="|",
)
result_tsv = read_csv(
    "https://raw.githubusercontent.com/deephaven/examples/main/DeNiro/csv/deniro.tsv",
    delimiter="\t",
)
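Python's standard-library csv module applies the same idea of a single delimiter character, which makes it a convenient way to see that only the separator changes between these formats. This sketch uses csv.reader rather than Deephaven, and the sample rows are made up for illustration.

```python
import csv
import io


def parse(text, delimiter=","):
    """Split CSV-style text into rows using the given single-character delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The same two rows encoded with a comma, a pipe, and a tab.
comma = parse("X,Y\nA,2\n")
pipe = parse("X|Y\nA|2\n", delimiter="|")
tab = parse("X\tY\nA\t2\n", delimiter="\t")
print(comma == pipe == tab)  # True
```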
