Derived Table Writer

A Derived Table Writer (DTW) enables real-time data transformation and persistence in Deephaven Enterprise. It listens to live tables, applies transformations, and writes the results to persistent storage with exactly-once delivery guarantees.

When to use a Derived Table Writer

Use a Derived Table Writer when you need to:

  • Transform and persist streaming data in real-time with exactly-once guarantees.
  • Resume processing after system restarts without data loss or duplication.
  • Preserve computed results from complex analytics or custom logic.
  • Downsample high-frequency data into time-based aggregations.
  • Ensure exactly-once delivery of processed data to downstream systems.

This guide demonstrates a complete end-to-end example using market data processing to show you how to:

  1. Understand core concepts and partitioning
  2. Set up the necessary infrastructure components
  3. Transform and persist real-time data
  4. Access and query the results

Understanding partitioning in Deephaven

Before working with DTW, it's essential to understand how Deephaven organizes data using a multi-level partitioning system. This organization improves query performance, enables efficient data storage, and facilitates data lifecycle management.

Date partitioning

The most important partition in Deephaven is the Date partition. Every table written with DTW requires a special Date column, which serves as the primary partitioning key.

  • The Date column physically separates data into different directories on disk.
  • Queries that filter on the Date column are significantly faster because they scan only the relevant partitions.
  • Data lifecycle management uses Date partitioning for retention policies.

When using DTW, you specify the Date partition value with the columnPartition parameter, which is typically set to today() for current data:
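
For example (a Groovy fragment; the ingestion calls that consume this value appear later in this guide):

    // today() returns the current date as a String, e.g. "2024-06-01".
    // Every row written in that call lands in this Date partition.
    columnPartition = today()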

Internal partitioning

Within each Date partition, Deephaven further organizes data using internal partitions, specified by the internalPartition parameter.

  • Internal partitions group related data together for better organization.
  • They facilitate parallel processing across different data streams.
  • They enable separate handling of different data categories or sources.

For example, you might use different internal partitions for:

  • Different data feeds ("nyse", "nasdaq")
  • Different processing stages ("raw", "filtered", "aggregated")
  • Different use cases ("live", "backtest", "research")

Internal partitions are specified as strings:
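
For example (the values are illustrative):

    internalPartition = "nyse"   // any plain string: a feed name, a processing stage, a use case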

Partitioned tables

For high-volume data processing across multiple streams (e.g., processing many stock symbols in parallel), Deephaven offers partitioned tables that leverage both date and key-based partitioning:
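
A minimal Groovy sketch, where transformed is the table built in the end-to-end example below (the full ingestion call appears under Partitioned table handling):

    // One constituent table per symbol; each is written to its own internal
    // partition with independent sequence tracking.
    bySymbol = transformed.partitionBy("Symbol")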

Partitioned tables maintain separate sequence tracking for each partition, allowing for independent, parallel processing.

Core concepts

Input table requirements

Derived Table Writer accepts only specific table types:

  • Add-only tables: Tables where rows are only added, never modified or removed.
  • Blink tables: Tables that contain only the rows added in the current update cycle; earlier rows are automatically removed.

Sequence numbers and exactly-once delivery

For guaranteed exactly-once delivery, your source table must include a sequence column with strictly ascending values:
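
For example, the engine's row index ii is the simplest strictly ascending value for an add-only table (myAddOnlyTable is a hypothetical source):

    withSeq = myAddOnlyTable.update("Sequence = ii")   // strictly ascending long values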

Without sequence numbers, rows may be duplicated or lost during system restarts.

Namespaces and system tables

Derived Table Writer writes to system tables within specific namespaces. System tables provide:

  • Persistent storage across restarts.
  • Automatic partitioning by date.
  • Integration with Deephaven's query engine.

End-to-end example: Real-time market data processing

This section demonstrates a complete DTW implementation using market data processing. We'll build a system that ingests real-time market data, applies transformations, and persists the results with exactly-once delivery guarantees.

Create a sample data source

First, let's create a simulated market data feed that mimics real trading data:
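
One way to sketch this in Groovy (the symbols, rates, and value ranges are arbitrary):

    rng = new Random(42)
    symbols = ["AAPL", "MSFT", "GOOG", "AMZN"] as String[]

    // An add-only feed: timeTable emits one Timestamp row per second, and
    // update adds the simulated trade fields.
    source = timeTable("PT1S").update(
            "Symbol = symbols[(int) (ii % symbols.length)]",  // rotate through the symbols
            "Price = 100.0 + rng.nextDouble() * 10.0",        // between $100 and $110
            "Volume = 100 + rng.nextInt(901)",                // 100 to 1,000 shares
            "Sequence = ii")                                  // strictly ascending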

This creates a table with:

  • Symbol: Rotating between four major stocks.
  • Price: Random prices between $100 and $110.
  • Volume: Random trade volumes between 100 and 1,000 shares.
  • Timestamp: Current time for each row.
  • Sequence: Ascending sequence numbers for exactly-once delivery.

Design your data transformation

Next, we'll add calculated fields that provide additional insights:
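
A Groovy sketch of the two derived columns:

    transformed = source.update(
            "DollarVolume = Price * Volume",                        // trade value in dollars
            "PriceChange = ii == 0 ? 0.0 : Price - Price_[ii - 1]") // change vs. previous trade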

Our transformations add:

  • DollarVolume: Total dollar value of each trade.
  • PriceChange: Price movement from the previous trade.

Set up Data Import Server and schema

See the Setup section below for detailed instructions on creating the schema, the storage directory, and the Data Import Server used by this example.

Write the ingestion query

Create your ingestion query that connects the source data to the Derived Table Writer:
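
The sketch below ties the pieces together, using the illustrative names from the Setup section (MarketDataDIS, MarketData.ProcessedTrades). The exact ingestTable argument order may differ by version; consult the DerivedTableWriter Javadoc.

    // Import paths follow Deephaven Enterprise Core+ conventions; confirm them
    // for your version.
    import io.deephaven.enterprise.derivedtablewriter.DerivedTableWriter
    import io.deephaven.enterprise.dataimportserver.DataImportServerTools

    // Acquire the DIS registered in the Setup section, along with its private storage.
    dis = DataImportServerTools.getDisByNameWithStorage(
            "MarketDataDIS", "/db/dataImportServers/MarketDataDIS")

    // The strictly ascending Sequence column drives exactly-once delivery.
    options = new DerivedTableWriter.Options().sequenceColumn("Sequence")

    // Persist the transformed stream into MarketData.ProcessedTrades.
    DerivedTableWriter.ingestTable(
            dis,
            "MarketData", "ProcessedTrades",  // target system table
            today(),                          // columnPartition: the Date partition
            "live",                           // internalPartition
            options,
            transformed)                      // the add-only table built above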

This example demonstrates the core DTW pattern: transform data, configure persistence, and access results.

Setup

Schema preparation

Create a schema that defines your table structure. The simplest approach is to derive it from an existing table:
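
A Groovy sketch: inspect the live table's definition, then write the matching schema XML (schema-derivation helpers vary by version, so the XML below is an illustrative hand-written equivalent):

    println transformed.getDefinition()   // lists the column names and types to mirror

A matching schema might look like this; note the required partitioning Date column:

    <Table namespace="MarketData" name="ProcessedTrades" storageType="NestedPartitionedOnDisk">
      <Partitions keyFormula="${autobalance_single}" />
      <Column name="Date" dataType="String" columnType="Partitioning" />
      <Column name="Timestamp" dataType="DateTime" />
      <Column name="Symbol" dataType="String" />
      <Column name="Price" dataType="double" />
      <Column name="Volume" dataType="int" />
      <Column name="DollarVolume" dataType="double" />
      <Column name="PriceChange" dataType="double" />
      <Column name="Sequence" dataType="long" />
    </Table>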

Deploy the schema using dhconfig:
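
For example (the dhconfig path and exact subcommand spelling can vary by installation and version):

    /usr/illumon/latest/bin/dhconfig schemas import --file /tmp/MarketData.ProcessedTrades.schema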

Alternative: Add schema programmatically:
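
A sketch using the Legacy-style schema service; the class and method names here are assumptions, so confirm them in your version's Javadoc:

    import com.illumon.iris.db.schema.SchemaServiceFactory

    schemaService = SchemaServiceFactory.getDefault()
    schemaService.addSchema(schema)   // "schema" is a Schema object built or parsed elsewhere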

Storage directory setup

Create a dedicated storage directory for your Data Import Server:
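
For example (the service account and group names below are typical defaults; adjust for your installation):

    sudo mkdir -p /db/dataImportServers/MarketDataDIS
    sudo chown -R dbmerge:dbmergegrp /db/dataImportServers/MarketDataDIS
    sudo chmod 750 /db/dataImportServers/MarketDataDIS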

Important considerations:

  • Directory must be exclusively owned by the DIS.
  • The dbmerge account needs read/write access for Persistent Queries.
  • Use the pattern /db/dataImportServers/[DIS_Name]/ for consistency.

Data Import Server configuration

Register your DIS with the routing system:
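
For example, using the names from this guide:

    /usr/illumon/latest/bin/dhconfig dis add --name MarketDataDIS --claim MarketData.ProcessedTrades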

Configuration options:

  • --name: Unique identifier for your DIS.
  • --claim: Namespace.TableName that this DIS will handle.
  • Dynamic ports are used by default (recommended).
  • Private storage requires specifying the directory path in your code.

For detailed configuration options, see Add a Data Import Server.

Writing the ingestion query

With the schema deployed, the storage directory created, and the DIS registered, you can run the ingestion query shown in the end-to-end example above.

Examples

JSON message processing

This example shows how to parse JSON messages and persist structured data:
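
A Groovy sketch, assuming a hypothetical add-only table jsonMessages with a String column Message (for example, fed by a Kafka ingester). A small helper class keeps the Jackson calls usable from query strings:

    import com.fasterxml.jackson.databind.ObjectMapper

    // Groovy-defined classes are callable from query strings.
    class JsonHelper {
        static final ObjectMapper MAPPER = new ObjectMapper()
        static String symbol(String msg) { MAPPER.readTree(msg).get("symbol").asText() }
        static double price(String msg)  { MAPPER.readTree(msg).get("price").asDouble() }
        static long volume(String msg)   { MAPPER.readTree(msg).get("volume").asLong() }
    }

    // Parse each message into typed columns. (Parsing three times per row is
    // simple but not optimal; a single-pass parser is a natural refinement.)
    parsed = jsonMessages.update(
            "Symbol = JsonHelper.symbol(Message)",
            "Price = JsonHelper.price(Message)",
            "Volume = JsonHelper.volume(Message)",
            "Sequence = ii")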

Example JSON message format:
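
An illustrative message matching the fields parsed above:

    {"symbol": "AAPL", "price": 104.25, "volume": 350}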

This approach uses Jackson's tree parsing for flexibility without requiring custom POJOs.

Trade data downsampling

This example demonstrates downsampling high-frequency trade data into minute-level aggregations:
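
A Groovy sketch, building on the transformed table from the end-to-end example. It assumes trades arrive with current timestamps, so a bucket stops changing once its minute has elapsed; under that assumption, the filtered result is effectively add-only:

    // One tick per minute; each tick names the minute that just completed.
    minuteTicks = timeTable("PT1M")
            .view("CompletedMinute = lowerBin(Timestamp, MINUTE) - MINUTE")

    // Aggregate every trade into per-symbol minute buckets.
    allBars = transformed
            .update("Minute = lowerBin(Timestamp, MINUTE)")
            .aggBy([
                    AggFirst("Open = Price"),
                    AggMax("High = Price"),
                    AggMin("Low = Price"),
                    AggLast("Close = Price"),
                    AggSum("Volume", "DollarVolume")
            ], "Symbol", "Minute")

    // Release a bucket only after its minute has fully elapsed, then attach a
    // sequence column for restart-safe ingestion.
    bars = allBars.whereIn(minuteTicks, "Minute=CompletedMinute")
                  .update("Sequence = ii")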

Key features:

  • Time bucketing: Groups trades into 1-minute intervals.
  • OHLC calculation: Computes open, high, low, close prices.
  • Volume aggregation: Sums share and dollar volumes.
  • Complete intervals only: Excludes the current (incomplete) minute.
  • Restart safety: Uses sequence numbers for exactly-once delivery.

Handling restarts and resumption

For production systems, implement restart logic to resume processing from the last checkpoint:
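
A Groovy sketch: recover the high-water mark from the persisted table, then resume the source from the next sequence. Names match the rest of this guide; production code would also pass the recovered value to the external feed:

    // Highest sequence already persisted for today's partition.
    maxSeq = db.liveTable("MarketData", "ProcessedTrades")
            .where("Date = today()")
            .view("Sequence")
            .maxBy()

    // Single-row (or empty) aggregation result; row key 0 holds the max.
    lastSeq = maxSeq.isEmpty() ? -1L : maxSeq.getColumnSource("Sequence").getLong(0)

    // Drop anything at or below the checkpoint; DTW's own sequence tracking
    // also discards duplicates on its side.
    resumed = source.where("Sequence > " + lastSeq)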

Restart benefits:

  • No data loss: Resumes from last processed sequence.
  • No duplicates: Sequence tracking prevents reprocessing.
  • Automatic checkpointing: DTW manages sequence state.
  • Feed integration: Can pass sequence to external data feeds.

Accessing persisted data

Retrieve ingested data like any other Deephaven table:
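
For example:

    // Live view of today's ingested rows.
    live = db.liveTable("MarketData", "ProcessedTrades").where("Date = today()")

    // Earlier partitions, once they have been through the usual merge process.
    hist = db.historicalTable("MarketData", "ProcessedTrades").where("Date < today()")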

Production considerations

Partitioned table handling

When processing data with multiple independent streams, use partitioned tables for better performance and organization:
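
A Groovy sketch; the PartitionedOptions method names come from the Reference section below, but the exact ingestPartitionedTable argument order may vary by version:

    // One constituent per symbol; the global Sequence column remains strictly
    // ascending within each constituent.
    bySymbol = transformed.partitionBy("Symbol")

    partOptions = new DerivedTableWriter.PartitionedOptions()
            .columnPartitionValue(today())     // all constituents share today's Date partition
            .internalPartitionKey("Symbol")    // internal partition named by symbol

    DerivedTableWriter.ingestPartitionedTable(
            dis, "MarketData", "ProcessedTrades", partOptions, bySymbol)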

Benefits of partitioned ingestion:

  • Independent processing: Each partition maintains its own sequence numbers.
  • Parallel execution: Partitions process simultaneously.
  • Scalability: Handles high-volume multi-stream data efficiently.

Monitoring and troubleshooting

Common error patterns

1. Sequence number violations: a new row's sequence value was not strictly greater than the last one written, typically after an upstream replay or a sequence reset. Fix the sequence generation, or resume from the correct checkpoint.

2. Schema mismatches: the source table's column names or types do not match the deployed schema. Redeploy the schema or adjust the transformation so the two definitions agree.

3. Storage directory permissions: the DIS cannot read or write its private storage directory. Verify ownership and permissions on /db/dataImportServers/[DIS_Name]/.

Performance monitoring

Monitor your DTW performance with these queries:
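
For example (both tables below are standard system tables):

    // Recent warnings and errors from your ingestion worker.
    issues = db.liveTable("DbInternal", "ProcessEventLog")
            .where("Date = today()", "Level in `ERROR`, `WARN`")

    // Per-symbol row counts as a quick throughput check.
    counts = db.liveTable("MarketData", "ProcessedTrades")
            .where("Date = today()")
            .countBy("RowCount", "Symbol")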

Debugging ingestion issues

Enable detailed logging: raise the log level for the ingestion query's worker and review its output in the DbInternal.ProcessEventLog table (see the monitoring queries above).

Verify DIS status: confirm that the DIS appears in the data routing configuration, that its claim covers the namespace and table being written, and that its process is running and reachable on its registered ports.

Production deployment considerations

High availability setup

For production environments, consider these deployment patterns:

1. Multiple DIS instances: run a separate DIS for each table or group of related tables so that each ingestion pipeline fails and restarts independently. Each instance needs its own claim and its own exclusive storage directory.

2. Load balancing: spread high-volume tables across DIS instances, and use internal partitions to divide a single table's streams across parallel ingestion queries.

Backup and recovery

Schema backup: keep schema files in version control and export the deployed schemas regularly so table definitions can be redeployed on a rebuilt system.

Data verification: after any recovery, compare row counts and maximum sequence numbers between the upstream feed and the persisted table to confirm that nothing was lost or duplicated.

Operational procedures

Daily maintenance: confirm that each day's Date partition was created as expected and that the standard intraday merge and validation jobs completed for DTW-written tables.

Capacity planning: track partition sizes and ingestion rates over time to project storage growth, and add internal partitions or DIS instances as volume increases.

Reference

API methods

DerivedTableWriter.ingestTable(): ingests a single add-only or blink table into one column partition and one internal partition of a system table.

DerivedTableWriter.ingestPartitionedTable(): ingests each constituent of a partitioned table, maintaining independent sequence tracking per constituent.

Configuration options

DerivedTableWriter.Options:

  • sequenceColumn(String): Specify sequence column for exactly-once delivery.
  • transactionsEnabled(boolean): Enable atomic transactions for multi-row operations.
  • logInitial(boolean): Whether to log initial table state on startup.
  • chunkSize(int): Set processing chunk size for performance tuning.
  • lastBy(String, Collection<String>): Configure last-by state tracking.
  • Additional options available in the Javadoc.

DerivedTableWriter.PartitionedOptions:

  • columnPartitionKey(String): Column containing partition names for dynamic partitioning.
  • columnPartitionValue(String): Fixed partition value for all partitions.
  • internalPartitionKey(String): Column containing internal partition names for dynamic partitioning.
  • internalPartitionValue(String): Fixed internal partition value for all partitions.
  • Additional options available in the Javadoc.

Transaction support

For data that requires atomic operations across multiple rows (e.g., position snapshots), enable transactions:
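
For example, extending the Options used earlier:

    // With transactions enabled, multi-row updates are persisted atomically:
    // readers see all of the rows or none of them.
    txOptions = new DerivedTableWriter.Options()
            .sequenceColumn("Sequence")
            .transactionsEnabled(true)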

When to use transactions:

  • Position snapshots that must be consistent across symbols.
  • Order book updates that span multiple price levels.
  • Any multi-row data where partial updates would be incorrect.