Can I integrate custom data sources with Deephaven?

Yes, you can integrate custom data sources with Deephaven. While Deephaven includes a proprietary columnar store for persistent historical and intraday data, you can integrate your own data stores to leverage Deephaven's efficient engine, analytics, and data visualization capabilities.

There are three main integration approaches:

  • Static in-memory tables - Similar to CSV and JDBC imports.
  • Dynamic in-memory tables - For real-time data feeds like multicast distribution systems.
  • Lazily-loaded on-disk tables - For large datasets like Apache Parquet files.

Understanding Deephaven table structure

Each Deephaven table consists of:

  • A RowSet - An ordered set of long keys representing valid row addresses.
  • Named ColumnSources - A map from column name to ColumnSource; each ColumnSource maps row keys to cell values.

In Python, you typically use higher-level APIs like new_table(), DynamicTableWriter, or pandas integration rather than working directly with RowSets and ColumnSources.

Static in-memory tables

For simple static data sources, use new_table() to create tables from Python data structures.

Here's an example of creating a static table from custom data:

When columns are supplied as plain Python sequences, new_table() infers their types; you can also declare types explicitly with column factory functions such as int_col() and string_col().

Alternative: Using pandas

You can also create tables from pandas DataFrames:

Dynamic in-memory tables

Dynamic tables allow you to integrate real-time data feeds. These tables update on each Deephaven update cycle and notify downstream operations of changes.

The easiest way to create dynamic tables in Python is using DynamicTableWriter:

Using table replayer for time-based data

For replaying historical data or simulating real-time feeds:

Advanced: Java interop for custom ColumnSources

For advanced use cases requiring custom data loading logic, you can use Java interop to create custom ColumnSources. This mirrors the equivalent Groovy approach but uses Python's Java integration (jpy).

Warning

This is an advanced technique requiring knowledge of Deephaven's Java internals. For most use cases, new_table(), DynamicTableWriter, or pandas integration are recommended.

Working with external data sources

Reading from custom file formats

For custom file formats, read the data into Python structures and use new_table():

Streaming data from APIs

For streaming data from external APIs, use DynamicTableWriter with a background thread:

Note

These FAQ pages contain answers to questions about Deephaven Community Core that our users have asked in our Community Slack. If you have a question that is not in our documentation, join our Community and we'll be happy to help!