Import and Export Data

Data I/O is mission-critical for any real-time data analysis platform. Deephaven supports a wide variety of data sources and formats, including CSV, Parquet, Kafka, and more. This document shows how to import and export data in each of these formats.

CSV

Deephaven can read CSV files stored locally or at a remote URL. The example below reads a local CSV file.
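The file path here is a placeholder; point it at a CSV file of your own.

```python
from deephaven import read_csv

# Read a local CSV file into a Deephaven table
csv_source = read_csv("/data/example.csv")
```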

It can also write data to CSV. The code below writes that same table back to a CSV file.
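This sketch assumes the csv_source table from the previous example.

```python
from deephaven import write_csv

# Write the table back to disk as a CSV file
write_csv(csv_source, "/data/example_out.csv")
```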

Just to show that it's there:
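```python
from deephaven import read_csv

# Read the newly written file back in to confirm the round trip
csv_check = read_csv("/data/example_out.csv")
```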

Parquet

Apache Parquet is a columnar storage format that supports compression to store more data in less space. Deephaven supports reading and writing single, nested, and partitioned Parquet files. Parquet data can be stored locally or in S3.

The example below reads from a local Parquet file.
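As with the CSV examples, the file path is a placeholder.

```python
from deephaven import parquet

# Read a local Parquet file into a Deephaven table
pq_source = parquet.read("/data/example.parquet")
```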

That same table can be written back to a Parquet file:
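```python
from deephaven import parquet

# Write the table to a local Parquet file
parquet.write(pq_source, "/data/example_out.parquet")
```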

Just to show that it worked:
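```python
from deephaven import parquet

# Read the newly written Parquet file back in
pq_check = parquet.read("/data/example_out.parquet")
```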

The example below reads a Parquet file from S3, using MinIO as a local S3-compatible object store. The deephaven.experimental.s3 module provides a way to specify how to connect to the S3 instance.
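The endpoint, credentials, bucket, and key below are placeholders for a local MinIO setup; substitute the values for your own deployment.

```python
from deephaven import parquet
from deephaven.experimental import s3

# Connection details for the S3-compatible object store
instructions = s3.S3Instructions(
    region_name="us-east-1",
    endpoint_override="http://minio:9000",
    credentials=s3.Credentials.basic("example_user", "example_password"),
)

# Read a Parquet file directly from an S3 bucket
s3_source = parquet.read(
    "s3://example-bucket/data.parquet",
    special_instructions=instructions,
)
```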

Kafka

Apache Kafka is a distributed event streaming platform that can be used to publish and subscribe to streams of records. Deephaven can consume and publish to Kafka streams. The code below consumes a stream.
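This sketch assumes a broker at broker:9092 and a topic named example.topic whose JSON values contain Symbol and Price fields; adjust these for your deployment.

```python
from deephaven import kafka_consumer as kc
from deephaven.stream.kafka.consumer import TableType, KeyValueSpec
from deephaven import dtypes as dht

# Consume a Kafka topic into an append-only Deephaven table
consumed = kc.consume(
    {"bootstrap.servers": "broker:9092"},
    "example.topic",
    key_spec=KeyValueSpec.IGNORE,
    value_spec=kc.json_spec(
        [
            ("Symbol", dht.string),
            ("Price", dht.double),
        ]
    ),
    table_type=TableType.append(),
)
```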

Similarly, this code publishes the data in a Deephaven table to a Kafka stream.
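Assuming the consumed table from the previous example, this sketch publishes its Price column as the message value.

```python
from deephaven import kafka_producer as kp
from deephaven.stream.kafka.producer import KeyValueSpec

# Publish a Deephaven table to a Kafka topic; produce returns a
# callable that stops publishing when invoked
cancel_callback = kp.produce(
    consumed,
    {"bootstrap.servers": "broker:9092"},
    "example.topic.out",
    key_spec=KeyValueSpec.IGNORE,
    value_spec=kp.simple_spec("Price"),
    last_by_key_columns=False,
)
```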

Iceberg

Apache Iceberg is a high-performance, open table format for huge analytic datasets. Deephaven's deephaven.experimental.iceberg module allows you to interact with Iceberg tables. For more on Deephaven's Iceberg integration, see the Iceberg user guide.

The following example reads data from an existing Iceberg table into a Deephaven table. It uses a custom Docker deployment found here.
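The catalog URI, warehouse location, credentials, and the nyc.taxis table name below are placeholders that match a typical MinIO-backed REST catalog deployment.

```python
from deephaven.experimental import iceberg

# Connect to an Iceberg REST catalog backed by S3-compatible storage
catalog_adapter = iceberg.adapter_s3_rest(
    name="minio-iceberg",
    catalog_uri="http://rest:8181",
    warehouse_location="s3a://warehouse/wh",
    region_name="us-east-1",
    access_key_id="example_user",
    secret_access_key="example_password",
    end_point_override="http://minio:9000",
)

# Load the Iceberg table and materialize its latest snapshot
table_adapter = catalog_adapter.load_table("nyc.taxis")
taxis = table_adapter.table()
```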

Similarly, this code writes a Deephaven table to an Iceberg table. If the target table does not exist, it will be created.
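The experimental Iceberg API is still evolving, so treat this as a sketch: it reuses the catalog_adapter from the previous example and invents a namespace and table name for illustration.

```python
from deephaven import empty_table, dtypes
from deephaven.experimental import iceberg

# A small table to write
source = empty_table(10).update(["X = i", "Y = 2 * i"])
table_def = {"X": dtypes.int32, "Y": dtypes.int32}

# Create the target Iceberg table, since it does not exist yet
table_adapter = catalog_adapter.create_table(
    table_identifier="example_namespace.example_table",
    table_definition=table_def,
)

# Append the Deephaven table's data to the Iceberg table
writer = table_adapter.table_writer(
    iceberg.TableParquetWriterOptions(table_definition=table_def)
)
writer.append(iceberg.IcebergWriteInstructions(source))
```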

HTML

Deephaven tables can be converted into an HTML representation using the to_html function from the deephaven.html module. This is useful for displaying tables in web pages or for creating simple HTML reports.
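A minimal sketch with a small hand-built table:

```python
from deephaven import new_table
from deephaven.column import int_col, string_col
from deephaven.html import to_html

# Build a small table and render it as an HTML string
simple_table = new_table(
    [
        string_col("Name", ["Ann", "Ben"]),
        int_col("Score", [95, 88]),
    ]
)

html_string = to_html(simple_table)
print(html_string)
```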

Pandas DataFrames

Deephaven provides a seamless way to convert tables to Pandas DataFrames and vice-versa using the deephaven.pandas module. This is particularly useful when you want to leverage Pandas' extensive data manipulation and analysis capabilities or integrate with other Python libraries that operate on DataFrames.

To convert a Deephaven table to a Pandas DataFrame, use the to_pandas() function.
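Here, dh_table is a small illustrative table:

```python
from deephaven import empty_table
from deephaven import pandas as dhpd

dh_table = empty_table(5).update(["X = i", "Y = X * X"])

# Convert the Deephaven table to a Pandas DataFrame (copies the data into memory)
df = dhpd.to_pandas(dh_table)
print(df)
```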

Note: Converting an entire large table to a Pandas DataFrame will load all data into memory. For very large tables, consider filtering or aggregating the data within Deephaven first before converting to a DataFrame to avoid potential memory issues.

You can also convert a Pandas DataFrame back to a Deephaven table using to_table() from the same module.
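For example:

```python
import pandas as pd
from deephaven import pandas as dhpd

df = pd.DataFrame({"X": [1, 2, 3], "Y": [4.0, 5.0, 6.0]})

# Convert the DataFrame back to a Deephaven table
dh_from_df = dhpd.to_table(df)
```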

Function generated tables

Function generated tables are tables populated by a Python function. The function is reevaluated when source tables change or at a regular interval. The following example re-generates data in a table once per second.
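This sketch uses the built-in randomInt and now functions from the Deephaven query language.

```python
from deephaven import empty_table, function_generated_table

def make_table():
    # Produce a fresh table of random values each time the function runs
    return empty_table(5).update(["X = randomInt(0, 10)", "Timestamp = now()"])

# Reevaluate make_table once per second (refresh_interval_ms is in milliseconds)
fgt = function_generated_table(make_table, refresh_interval_ms=1000)
```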

Function generated tables, on their own, don't do any data I/O. However, Python functions evaluated at a regular interval to create a ticking table are a powerful tool for data ingestion from external sources like WebSockets, databases, and much more. Check out this blog post that uses WebSockets to stream data into Deephaven with function generated tables.