Your query runs fine on one core, but what happens when Deephaven tries to split the work across eight? Or sixteen? In version 41, we've made a fundamental change to how filters and selectables behave - and it's going to make your queries faster without you lifting a finger.
Starting with Deephaven 41, queries run in parallel by default. In previous versions, Deephaven assumed all formulas required sequential processing. Now it assumes they can run in parallel. For most users, this change is invisible — your queries simply run faster. But if you have code that modifies shared variables or depends on row order, you'll need to make a small update.
What changed and why it matters
Deephaven's query engine can process different segments of a column in parallel during both initialization and updates. There's a catch, though. The engine can only parallelize stateless operations — those that produce the same output regardless of execution order or which thread runs them.
Stateless operations don't maintain internal state between invocations, making them safe for concurrent execution across multiple cores.
In versions before 41, Deephaven assumed all formulas required sequential processing by default. This was safe, but conservative — the engine couldn't automatically parallelize work across threads. With stateless as the new default, the engine can automatically parallelize more operations without any configuration.
The performance impact
Consider a simple filter operation on a 10-million-row table:
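As a sketch (the `trades` table and its `Price` column are hypothetical names, not from the release):

```python
from deephaven import empty_table

# Hypothetical 10-million-row table of trades
trades = empty_table(10_000_000).update(["Price = randomDouble(0.0, 100.0)"])

# A stateless filter: each row's result depends only on that row's Price value
expensive = trades.where("Price > 50.0")
```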
In previous versions, Deephaven would process this sequentially. Now it automatically divides the rows among CPU cores, with each core evaluating the filter for its assigned rows simultaneously.
The same applies to column calculations in update and select:
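For instance (again a sketch; `trades`, `Price`, and `Size` are hypothetical):

```python
# Each row's Total depends only on that row's Price and Size,
# so the engine can compute different chunks of rows on different cores
result = trades.update(["Total = Price * Size"])
picked = trades.select(["Price", "Size", "Total = Price * Size"])
```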
Do you need to change your code?
Most users: No. If your filters and column expressions are pure calculations — they depend only on their inputs and don't maintain state between rows — you don't need to do anything. Your queries will simply run faster.
Some users: Yes. If your code relies on stateful behavior, you'll need to update it.
Quick check: Does your code use global variables, depend on row order, or modify external state? If yes, keep reading.
What makes an operation stateless?
An operation is stateless if it:
- Doesn't read or modify global variables.
- Doesn't depend on which row is processed first.
- Produces the same output for the same input, regardless of when or how it runs.
These are all stateless and parallelize safely:
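For example (sketches against a hypothetical `source` table):

```python
# Pure arithmetic: depends only on the row's own Value
t1 = source.update("Doubled = Value * 2")

# Stateless built-in functions: same input, same output, on any thread
t2 = source.where("abs(Value) < 100")

# Row-local string manipulation: no shared state between rows
t3 = source.update("UpperSym = Symbol.toUpperCase()")
```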
Example: A stateful operation that breaks
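Here is a plain-Python simulation of the problem; the `next_id` helper is illustrative, and the thread pool stands in for the engine evaluating a formula like `Id = next_id()` across rows in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

counter = 0  # shared global state

def next_id():
    """Stateful: reads and writes a global, so results depend on thread timing."""
    global counter
    current = counter      # another thread may read the same value here...
    counter = current + 1  # ...and both threads then write the same id
    return current + 1

# Simulate evaluating the formula for 100,000 rows on eight threads
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = list(pool.map(lambda _: next_id(), range(100_000)))

# Sequential execution would yield exactly 1..100,000 with no repeats;
# parallel execution can produce duplicates and gaps instead.
```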
With parallel execution, multiple threads increment `counter` simultaneously. You'll see gaps in the sequence, duplicate values, or values that don't follow the expected pattern.
How to force sequential execution
If you have legitimately stateful operations, use .with_serial() to force rows to be processed one at a time, in order:
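A minimal sketch, assuming `Selectable` is importable from `deephaven.table` and `next_id` is a stateful helper in the query scope:

```python
from deephaven.table import Selectable

# Force the Id column to be computed one row at a time, in row order
result = source.update([Selectable.parse("Id = next_id()").with_serial()])
```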
For filters, construct a Filter object explicitly:
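For example (sketch; `seen_before` is a hypothetical stateful function, and the import path is assumed to be `deephaven.filters`):

```python
from deephaven.filters import Filter

# Build the filter explicitly so .with_serial() can be applied to it
serial_filter = Filter.from_("seen_before(Symbol)").with_serial()
result = source.where(serial_filter)
```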
Caution
You cannot use .with_serial() with view or update_view. These operations compute values on-demand when cells are accessed, so they cannot guarantee processing order. Use select or update instead when you need serial execution.
When operations depend on each other: Barriers
Sometimes you need one operation to complete before another starts — for example, when column A populates data that column B reads. Use barriers to control this ordering:
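As a sketch of the pattern, assuming any unique object can serve as the barrier token (the `populate_cache` and `read_cache` helpers are hypothetical):

```python
from deephaven.table import Selectable

barrier = object()  # barrier token shared by the two columns

result = source.update([
    # Column A declares the barrier: it must finish before the barrier is passed
    Selectable.parse("A = populate_cache(Key)").with_declared_barriers(barrier),
    # Column B respects the barrier: it won't start until A has completed
    Selectable.parse("B = read_cache(Key)").with_respected_barriers(barrier),
])
```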
With this barrier, column A processes all rows completely before column B starts. Both columns can still be parallelized internally — the barrier only controls the ordering between them.
Note
Serial operations automatically create barriers between each other by default. If you have two serial columns in the same update, the first finishes completely before the second starts.
Rethinking stateful patterns
Before marking operations as serial, consider whether you can restructure your logic to be stateless. Stateless operations are:
- Faster: They parallelize automatically.
- Simpler: No hidden dependencies between rows.
- Safer: No race conditions or ordering issues.
Alternative: Use table operations for accumulation
Instead of accumulating state within a filter, use Deephaven's built-in aggregation and windowed operations:
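For example, a running total that might otherwise tempt you into a global accumulator (sketch; `source`, `Amount`, and `Sym` are hypothetical):

```python
from deephaven.updateby import cum_sum

# The engine maintains the running sum per Sym; no mutable globals needed
result = source.update_by(ops=[cum_sum("RunningTotal = Amount")], by="Sym")
```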
This approach keeps each operation stateless while achieving the same result — and it's more explicit about what your query is doing.
Understanding the parallelization model
Deephaven parallelizes queries in two phases:
Initialization parallelism
When you first run a table operation, the engine splits the work across the Operation Initialization Thread Pool. For a where or update on a large table, different chunks of rows are processed simultaneously on different cores.
Update parallelism
For ticking tables, the Update Graph Processor Thread Pool handles ongoing updates. This pool parallelizes in two ways:
- Within each operation: Rows are divided among cores, just like during initialization.
- Across operations: Independent tables in the update graph are computed simultaneously.
Both thread pools benefit from stateless operations. You can configure their sizes:
| Property | Default | Description |
|---|---|---|
| `OperationInitializationThreadPool.threads` | -1 (all cores) | Threads for parallel initialization |
| `PeriodicUpdateGraph.updateThreads` | -1 (all cores) | Threads for parallel update processing |
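For example, to cap both pools at eight threads, set the properties from the table above in your Deephaven configuration:

```
OperationInitializationThreadPool.threads=8
PeriodicUpdateGraph.updateThreads=8
```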
Configuration properties
You can change the default behavior using these properties:
- `QueryTable.statelessSelectByDefault` — controls the default for `select` and `update`.
- `QueryTable.statelessFiltersByDefault` — controls the default for filters.
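For example, to restore the pre-41 serial defaults while you migrate (assuming these are boolean properties):

```
QueryTable.statelessSelectByDefault=false
QueryTable.statelessFiltersByDefault=false
```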
Caution
In Python builds that use the GIL (global interpreter lock), parallelizing filters and selectables can negatively impact query performance. To prevent performance regressions, even stateless operations that use Python objects are not parallelized unless the Python build is free-threaded.
Quick reference
| Scenario | Solution | Why |
|---|---|---|
| Pure column math | Default (parallel) | Thread-safe, no shared state |
| Global counter | .with_serial() | Needs sequential row processing |
| Column A must finish before Column B | Barriers | Controls cross-operation ordering |
| File I/O or logging | .with_serial() | Serialize access to shared resource |
| Non-thread-safe library | .with_serial() | Forces single-threaded access |
Migration checklist
If you're upgrading to Deephaven 41:
- Quick check: Does your code use global variables, depend on row order, or modify external state?
- Test thoroughly: Run your existing queries and verify that results match expectations.
- Add `.with_serial()`: For operations that need sequential processing, use `Selectable.parse(...).with_serial()`.
- Add barriers: If one operation must complete before another starts, use `Barrier` with `.with_declared_barriers()` and `.with_respected_barriers()`.
- Consider refactoring: Where possible, restructure stateful logic to use table operations like `update_by` instead.
The bigger picture
This change is part of Deephaven's ongoing work to maximize performance automatically. By making stateless the default, we're:
- Reducing cognitive load: You don't need to think about parallelization for most queries.
- Improving performance: More operations parallelize out of the box.
- Encouraging best practices: Stateless operations are generally cleaner and more predictable.
The best code change is one you don't have to make. With stateless defaults, your existing queries get faster automatically.
Next steps
- Review the version 41 release notes for the complete list of changes.
- Learn more about parallelization in Deephaven.
- Explore update-by operations for stateless alternatives to accumulation patterns.
- See the ConcurrencyControl API for full details on `.with_serial()` and barriers.
Questions about migrating your queries? Join our Slack community — we're happy to help you take advantage of better parallelism.