Vectorization and the recipe paradigm

Deephaven's query engine uses vectorized operations and a declarative "recipe" paradigm to achieve high performance on both static and real-time data. This guide explains the technical foundations of this approach and why it matters for your queries.

The recipe paradigm: Instead of writing step-by-step instructions that process data one element at a time, you define what result you want — like a recipe that describes the finished dish. Deephaven's engine then figures out how to compute it efficiently, processing data in optimized batches.

The paradigm shift: Imperative vs declarative

Traditional programming: Imperative SISD

In traditional Python, you write imperative code that processes one data element at a time. This is Single Instruction, Single Data (SISD) — each instruction operates on a single value:
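As a minimal illustration (the function name `double_values` is made up for this sketch), an element-at-a-time loop in plain Python looks like this:

```python
def double_values(xs):
    """Double every element with an explicit, element-at-a-time loop."""
    ys = []
    for x in xs:            # one iteration per element
        ys.append(x * 2)    # each result is a boxed Python object
    return ys


print(double_values([1, 2, 3]))  # prints [2, 4, 6]
```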

This approach:

  • Executes instructions sequentially.
  • Processes one data element per instruction.
  • Requires explicit loops for multiple elements.
  • Creates intermediate Python objects for each value.

Deephaven: Declarative SIMD

Deephaven uses a declarative approach. The engine processes data in chunks — Single Instruction, Multiple Data (SIMD) — applying one operation across many values at once:
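A minimal sketch of the declarative form, assuming a running Deephaven Python session. The helper name `build_doubled_table` and the column names are illustrative, and the import is kept local so the snippet can be loaded outside Deephaven:

```python
# Formulas are plain strings that the engine parses and compiles.
# `i` is the query language's built-in row index.
FORMULAS = ["X = i", "Y = X * 2"]


def build_doubled_table(n_rows=10):
    """Create a table with columns X and Y; needs a Deephaven session."""
    from deephaven import empty_table  # requires the deephaven package

    # No loop here: the engine applies each formula to the whole column.
    return empty_table(n_rows).update(FORMULAS)
```

Inside a session, `t = build_doubled_table(10)` would yield a ten-row table in which every `Y` is twice `X`.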

See empty_table and update for more details.

This approach:

  • Specifies what to compute, not how.
  • Processes data in optimized chunks (vectorization).
  • Enables SIMD-style operations across many values.
  • Avoids intermediate Python objects.

Why the recipe approach is faster

The recipe approach avoids the overhead of the Python interpreter for data-processing work:

  1. Vectorization - Processes multiple values per CPU instruction.
  2. No Python overhead - Computation stays in compiled code.
  3. Better memory access - Sequential columnar reads are cache-friendly.
  4. Parallelization - Engine can split work across cores.

What is vectorization?

CPU-level vectorization

Modern CPUs have special instructions that operate on multiple data elements simultaneously. Instead of adding one pair of numbers per instruction, a CPU with vector (SIMD) instructions can add several pairs, for example four or eight, in a single instruction.
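The difference can be modeled in plain Python by counting one "instruction" per operation. Python itself does not emit SIMD instructions here; the function names and the four-lane width are assumptions made for illustration:

```python
def scalar_add(a, b):
    """Add element pairs one at a time; returns (result, instruction_count)."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1          # one modeled instruction per pair
    return out, instructions


def simd_add(a, b, lanes=4):
    """Model a vector add that handles up to `lanes` pairs per instruction."""
    out, instructions = [], 0
    for start in range(0, len(a), lanes):
        block_a = a[start:start + lanes]
        block_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(block_a, block_b))
        instructions += 1          # one modeled instruction per block
    return out, instructions


a, b = list(range(8)), [10] * 8
print(scalar_add(a, b)[1], simd_add(a, b)[1])  # prints 8 2
```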

How Deephaven enables vectorization

Deephaven's engine is designed to enable CPU vectorization:

  1. Columnar storage - Data for a column is stored contiguously in memory.
  2. Chunk-oriented processing - Operations work on blocks of data at once.
  3. Type-specific operations - Specialized code for each data type avoids type checks in inner loops.
  4. JIT compilation - The JVM can optimize and vectorize hot code paths.

By structuring engine operations as chunk-oriented kernels, Deephaven allows the JVM's JIT compiler to vectorize computations where possible.

The Chunk architecture

Deephaven moves data through the engine in a structure called a Chunk.

When you write an update formula such as "Y = X * 2", the engine:

  1. Reads column X in chunks (e.g., 4096 values at a time).
  2. Applies the operation to each chunk (vectorized multiplication).
  3. Writes results to column Y in chunks.
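The three steps above can be mimicked in plain Python. This is only a model of the control flow, not the engine's implementation; `chunked_double` is a hypothetical name and `array` stands in for a typed column:

```python
from array import array

CHUNK_SIZE = 4096  # example size from the text; the real engine picks its own


def chunked_double(x_column, chunk_size=CHUNK_SIZE):
    """Compute Y = X * 2 by walking column X one chunk at a time."""
    y_column = array(x_column.typecode)
    for start in range(0, len(x_column), chunk_size):
        chunk = x_column[start:start + chunk_size]               # 1. read a chunk of X
        doubled = array(chunk.typecode, (v * 2 for v in chunk))  # 2. apply the operation
        y_column.extend(doubled)                                 # 3. write the chunk to Y
    return y_column


x = array("q", range(10_000))
y = chunked_double(x)
print(len(y), y[0], y[-1])  # prints 10000 0 19998
```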

This approach:

  • Amortizes memory access costs.
  • Enables vectorization.
  • Reduces per-element overhead.
  • Works efficiently with CPU caches.

The recipe paradigm: How it works

Recipes are specifications

When you write a Deephaven query:
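A minimal sketch of such a query, assuming a running Deephaven Python session. The helper name `build_recipe` and the column names are illustrative, and the import is local so the snippet can be loaded outside Deephaven:

```python
def build_recipe():
    """Return (t1, t2): a ticking source and a table derived from it."""
    from deephaven import time_table  # requires the deephaven package

    # t1 gains one row per second; `ii` is the built-in long row index.
    t1 = time_table("PT1S").update(["X = ii"])
    # t2 is defined once; the engine keeps Y in sync as t1 grows.
    t2 = t1.update(["Y = X * 2"])
    return t1, t2
```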

See time_table for more details.

You're creating a specification (recipe) that says "Y should always equal X times 2". You're not executing a loop or directly computing values.

Lazy evaluation and dependency tracking

The engine builds a Directed Acyclic Graph (DAG) of dependencies, in which each derived table is linked to the table it was computed from (for example, t1 → t2).

When data ticks:

  1. New rows arrive in t1.
  2. Engine detects that t2 depends on t1.
  3. Engine automatically computes Y for the new rows.
  4. Updates propagate through the DAG.
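The propagation steps can be pictured with a toy model in plain Python. This is not how the engine is implemented; `ToyTable` and its methods are hypothetical names, and rows are plain dicts:

```python
class ToyTable:
    """A toy stand-in for a table node in a dependency graph."""

    def __init__(self, formula=None):
        self.rows = []           # rows materialized so far
        self.formula = formula   # applied to each incoming row (None for a source)
        self.children = []       # downstream tables that depend on this one

    def derive(self, formula):
        """Create a dependent table and record the parent -> child edge."""
        child = ToyTable(formula)
        self.children.append(child)
        child._accept(self.rows)  # initial snapshot of existing rows
        return child

    def add_rows(self, new_rows):
        """Simulate a tick: new rows arrive at a source table."""
        self._accept(new_rows)

    def _accept(self, incoming):
        if self.formula is None:
            produced = list(incoming)
        else:
            produced = [self.formula(row) for row in incoming]
        self.rows.extend(produced)
        for child in self.children:   # push only the new rows downstream
            child._accept(produced)


t1 = ToyTable()
t2 = t1.derive(lambda row: {**row, "Y": row["X"] * 2})
t1.add_rows([{"X": 1}, {"X": 2}])
print(t2.rows)  # prints [{'X': 1, 'Y': 2}, {'X': 2, 'Y': 4}]
```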

Achieving the same behavior with imperative loops requires significant additional infrastructure: a loop executes once and stops, so you would need to build your own subscription and recomputation logic.

Update propagation example
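A sketch of a query that matches the description in this section, assuming a running Deephaven Python session. The helper name `build_running_tables` is illustrative, and the imports are local so the snippet can be loaded outside Deephaven:

```python
def build_running_tables():
    """Return (source, result) for a one-row-per-second running sum."""
    from deephaven import time_table        # requires the deephaven package
    from deephaven.updateby import cum_sum

    # One new row per second, with X taken from the row index.
    source = time_table("PT1S").update(["X = ii"])
    # XSquared is computed per row; SumX is a cumulative sum over X.
    result = source.update(["XSquared = X * X"]).update_by(
        ops=[cum_sum(cols=["SumX = X"])]
    )
    return source, result
```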

See update_by and cum_sum for more details.

Watch this table in the UI. Every second:

  • A new row arrives in source.
  • XSquared is computed for the new row.
  • SumX is updated for the new row.
  • You wrote the recipe once; it keeps running for as long as the table is live.

Real-world example: Time operations

Here's a more complex example that demonstrates multiple concepts working together - time manipulation, chained operations, and Java function integration:
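The sketch below is one possible query of this shape, not necessarily the original example. It assumes a running Deephaven Python session; the helper name `build_time_tables` and the column names `Nanos` and `OneMinuteLater` are made up for illustration:

```python
def build_time_tables():
    """Return (t1, t2, t3), each derived from the previous table."""
    from deephaven import time_table  # requires the deephaven package

    # t1: one new row per second with an Instant column named Timestamp.
    t1 = time_table("PT1S")
    # t2: convert each Instant to nanoseconds since the epoch.
    t2 = t1.update(["Nanos = epochNanos(Timestamp)"])
    # t3: add one minute (in nanoseconds), then convert back to an Instant.
    t3 = t2.update(["OneMinuteLater = epochNanosToInstant(Nanos + MINUTE)"])
    return t1, t2, t3
```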

This example illustrates several key concepts:

  • Declarative recipes - Each .update() specifies what to compute, not how to loop.
  • Automatic propagation - All three tables (t1, t2, t3) update every second.
  • Chained operations - Tables build on each other through the DAG.
  • Real-time execution - New rows trigger automatic recomputation.
  • Java integration - Using epochNanosToInstant() from DateTimeUtils.
  • Type conversions - Converting between epoch nanos, Instants, and timestamps.

Every second, a new row arrives and all formulas execute automatically. The engine handles:

  • Dependency tracking between t1 → t2 → t3.
  • Type conversions and time arithmetic.
  • Efficient execution of all operations.

Query compilation

Under the hood, Deephaven:

  1. Parses your query string into an Abstract Syntax Tree (AST).
  2. Analyzes the AST to determine dependencies and types.
  3. Generates optimized Java code (or uses pre-compiled classes for simple operations).
  4. Compiles the generated code.
  5. Executes the compiled code on chunks of data.
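As a loose analogy only, the same five stages can be walked through with Python's own `ast` module. The real engine generates and compiles Java, and this sketch keeps all of Python's per-element overhead; `compile_formula` is a hypothetical helper, and it does not handle function calls inside formulas:

```python
import ast


def compile_formula(formula):
    """Turn a formula such as 'Y = X * 2' into (output, inputs, kernel)."""
    tree = ast.parse(formula)                                   # 1. parse to an AST
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        raise ValueError("expected a single 'Name = expression' formula")
    assign = tree.body[0]
    target = assign.targets[0]
    if len(assign.targets) != 1 or not isinstance(target, ast.Name):
        raise ValueError("expected a single output column name")
    # 2. analyze: every name read by the expression is an input column
    inputs = sorted({n.id for n in ast.walk(assign.value) if isinstance(n, ast.Name)})
    if not inputs:
        raise ValueError("formula must read at least one column")
    # 3-4. build and compile code for the right-hand side
    expr = ast.fix_missing_locations(ast.Expression(body=assign.value))
    code = compile(expr, "<formula>", "eval")

    def kernel(chunk):
        """5. execute on a chunk: a dict of column name -> list of values."""
        size = len(chunk[inputs[0]])
        return [
            eval(code, {"__builtins__": {}}, {name: chunk[name][i] for name in inputs})
            for i in range(size)
        ]

    return target.id, inputs, kernel


output, inputs, kernel = compile_formula("Y = X * 2")
print(output, inputs, kernel({"X": [1, 2, 3]}))  # prints Y ['X'] [2, 4, 6]
```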

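For example, the code generated for "Y = X * 2" is essentially a tight loop that reads each value from a chunk of X, multiplies it by 2, and writes the result into a chunk of Y.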

This compiled code:

  • Has no Python overhead.
  • Can be JIT-optimized by the JVM.
  • Can be auto-vectorized by the JIT compiler into SIMD instructions where the loop shape allows.
  • Runs as native machine code once JIT-compiled.

Real-time processing: The killer feature

Why recipes enable real-time

The recipe paradigm makes real-time processing trivial. Compare:

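Loop approach (doesn't work for real-time): the loop computes its result once, from the rows present at that moment, and never sees rows that arrive later.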
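A plain-Python sketch of the problem (the variable names are illustrative):

```python
xs = [1, 2, 3]

# The loop runs once, over the rows that exist right now.
ys = [x * 2 for x in xs]

# A new row arrives later; nothing re-runs the loop.
xs.append(4)

print(ys)                  # prints [2, 4, 6]
print(len(xs), len(ys))    # prints 4 3
```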

Recipe approach (automatically handles updates): the formula is defined once, and the engine re-applies it whenever new rows arrive.

Incremental computation

The engine updates incrementally. It doesn't recompute everything on each change; it processes only what changed.

When a new row arrives:

  • Only the new row is processed.
  • All formulas are evaluated for that row.
  • Results are appended to output columns.
  • Nothing else is recomputed.

For updates or modifications:

  • Only affected rows are recomputed.
  • Dependencies are tracked automatically.
  • Downstream tables update accordingly.
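The append case can be illustrated with a small plain-Python model of a cumulative-sum column. `RunningSum` is a hypothetical class for this sketch, not an engine API:

```python
class RunningSum:
    """Maintain a cumulative-sum column, processing only newly added rows."""

    def __init__(self):
        self.sums = []            # output column
        self.rows_processed = 0   # total rows of work done so far

    def append(self, new_values):
        total = self.sums[-1] if self.sums else 0
        for value in new_values:       # touch only the new rows
            total += value
            self.sums.append(total)
            self.rows_processed += 1


col = RunningSum()
col.append([1, 2, 3])
col.append([4])                        # one new row -> one row of work
print(col.sums, col.rows_processed)    # prints [1, 3, 6, 10] 4
```

Recomputing the whole column on each append would have processed 3 + 4 = 7 rows instead of 4, and the gap widens as the table grows.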

Real-world example: Live aggregations

This query:

  • Processes streaming trade data.
  • Maintains separate rolling averages per symbol.
  • Updates automatically as new data arrives.
  • Would require hand-written per-symbol state and update handling if implemented with loops.
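To give a sense of the bookkeeping a hand-written version needs, here is a plain-Python sketch of just the per-symbol rolling average. The class and method names are hypothetical, and nothing here uses a Deephaven API:

```python
from collections import defaultdict, deque


class RollingAverages:
    """Track a rolling average of the last `window` prices for each symbol."""

    def __init__(self, window):
        self._windows = defaultdict(lambda: deque(maxlen=window))
        self._sums = defaultdict(float)

    def on_trade(self, symbol, price):
        """Record one trade and return the symbol's current rolling average."""
        prices = self._windows[symbol]
        if len(prices) == prices.maxlen:
            self._sums[symbol] -= prices[0]   # oldest price is about to drop out
        prices.append(price)
        self._sums[symbol] += price
        return self._sums[symbol] / len(prices)


averages = RollingAverages(window=2)
print(averages.on_trade("A", 10.0))  # prints 10.0
print(averages.on_trade("A", 20.0))  # prints 15.0
print(averages.on_trade("B", 5.0))   # prints 5.0
print(averages.on_trade("A", 40.0))  # prints 30.0
```

Even this small version has to manage per-symbol state by hand, and it still has no notion of pushing results to downstream consumers.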

Memory efficiency

No intermediate Python objects

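Loop approach creates Python objects: every intermediate value is a separately allocated, boxed object.

Recipe approach stays in native memory: values remain primitives in the engine's column storage and chunks.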
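Deephaven's columns live in the engine, not in Python arrays, but the cost of boxing can be seen with a plain-Python analogy that compares a list of float objects with a packed `array` of doubles. Exact byte counts depend on the Python build, so only the comparison is shown:

```python
import sys
from array import array

values = [float(v) for v in range(1000)]

# Boxed: a list of pointers plus one float object per value.
boxed_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Packed: one contiguous buffer of 8-byte doubles.
packed = array("d", values)
packed_bytes = sys.getsizeof(packed)

print(packed.itemsize)             # prints 8
print(packed_bytes < boxed_bytes)  # prints True
```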

Column sharing and copy-on-write

Deephaven avoids copying data when it derives one table from another, for example when filtering a table.

See where for more details.

Deephaven tables can share their RowSet with other tables in the same update graph that contain the same row keys. This sharing avoids copying data unnecessarily.

Columnar vs row-oriented storage

Row-oriented (like Python lists of dicts):

  • Accessing column X requires skipping Y values.
  • Poor cache locality for column operations.
  • Can't vectorize efficiently.

Columnar (like Deephaven):

  • Column X is contiguous in memory.
  • Excellent cache locality.
  • Enables vectorization.
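The two layouts can be sketched in plain Python, with `array` standing in for a typed, contiguous column (the data values are made up):

```python
from array import array

# Row-oriented: each record carries every field, so X values are interleaved with Y.
rows = [{"X": 1, "Y": 10}, {"X": 2, "Y": 20}, {"X": 3, "Y": 30}]
total_from_rows = sum(row["X"] for row in rows)   # visits every record

# Columnar: all X values sit together in one typed buffer.
columns = {"X": array("q", [1, 2, 3]), "Y": array("q", [10, 20, 30])}
total_from_columns = sum(columns["X"])            # scans one contiguous column

print(total_from_rows, total_from_columns)  # prints 6 6
```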

Common patterns: Technical details

Pattern: Element-wise operations

For a formula that combines columns X and Y into a new column Z, the engine:

  1. Reads X and Y columns in chunks.
  2. Applies vectorized operations chunk-by-chunk.
  3. Writes results to Z column.
  4. No Python overhead, no intermediate objects.

Pattern: Conditional operations

A conditional written with the ternary operator (condition ? valueIfTrue : valueIfFalse) compiles to generated Java code with no Python interpreter overhead.

Pattern: Cross-row operations

Cross-row operations, such as cumulative sums and rolling averages:

  • Maintain state efficiently.
  • Update incrementally when data ticks.
  • Would require significantly more code to implement with loops, and would lose automatic real-time propagation.
  • Are highly optimized in the engine.

When loops ARE appropriate

Valid use case: Data extraction

This is extraction, not transformation. The data is leaving Deephaven.

Valid use case: Control flow

You're using loops to control table creation, not to transform table data.

Invalid use case: Column transformations

Use .update() instead!

Performance best practices

1. Let the engine vectorize

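Good - Vectorizable: simple arithmetic and comparisons on primitive columns, such as "Y = X * 2".

⚠️ Careful - Complex functions may not vectorize: formulas that call user-defined functions for each row give the JIT compiler much less to work with.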

2. Minimize cross-language calls

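Slow - Calls Python for every row: a formula that invokes a Python function crosses the Java/Python boundary once per row.

Fast - Stays in compiled code: the same logic written with query-language operators and built-in functions.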

3. Use appropriate operations

For rolling calculations, use update_by:

For aggregations, use dedicated methods:
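A sketch of both patterns, assuming a running Deephaven Python session and an input table with `Sym` and `Price` columns. The helper name `build_summaries`, the output column name, and the 10-row window are illustrative choices, and the imports are local so the snippet can be loaded outside Deephaven:

```python
def build_summaries(trades):
    """Return (rolling, averages) for a table with Sym and Price columns."""
    from deephaven import agg                        # requires the deephaven package
    from deephaven.updateby import rolling_avg_tick

    # Rolling calculation: average over a 10-row window ending at the
    # current row, tracked separately for each symbol.
    rolling = trades.update_by(
        ops=[rolling_avg_tick(cols=["AvgPrice = Price"], rev_ticks=10)],
        by=["Sym"],
    )
    # Aggregation: one output row per symbol.
    averages = trades.agg_by([agg.avg(cols=["AvgPrice = Price"])], by=["Sym"])
    return rolling, averages
```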

4. Filter early
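In Deephaven this means filtering a table before applying expensive formulas to it. The plain-Python sketch below only counts how often a costly function runs in each order; the function names are made up for illustration:

```python
def expensive(x):
    """Stand-in for a costly per-row formula; counts how often it runs."""
    expensive.calls += 1
    return x * x


def keep(x):
    """Filter that keeps 1 row in 10."""
    return x % 10 == 0


data = list(range(1000))

# Transform first, filter afterwards.
expensive.calls = 0
late = [y for x, y in ((x, expensive(x)) for x in data) if keep(x)]
late_calls = expensive.calls      # transform ran on every row

# Filter first, transform afterwards.
expensive.calls = 0
early = [expensive(x) for x in data if keep(x)]
early_calls = expensive.calls     # transform ran only on kept rows

print(late_calls, early_calls, late == early)  # prints 1000 100 True
```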

Advanced: JVM and vectorization

JIT compilation

The Java Virtual Machine (JVM) uses Just-In-Time (JIT) compilation to optimize hot code paths. For Deephaven queries:

  1. Initial execution - Code is interpreted.
  2. Profiling - JVM identifies hot methods.
  3. Compilation - Hot methods are compiled to native code.
  4. Optimization - Compiler applies vectorization, loop unrolling, etc.

This means:

  • First execution may be slower (compilation overhead).
  • Subsequent executions are much faster.
  • Long-running queries benefit most.

Key takeaways

  1. Think declaratively - Specify what to compute, not how to iterate.
  2. Recipes enable real-time - Declarative queries update automatically.
  3. Vectorization = performance - SIMD-style operations process multiple elements at once.
  4. No Python overhead - Computation stays in compiled code.
  5. Use loops for extraction, not transformation - Get data out, don't transform inside loops.

The paradigm shift:

  • Old way: "For each row, multiply X by 2 and store in Y".
  • Deephaven way: "Y should always equal X times 2".

This shift unlocks:

  • High performance through vectorization.
  • Automatic real-time updates.
  • Cleaner, more maintainable code.
  • Efficient memory usage.