Initialization and updates

Deephaven code often looks like it runs sequentially, line by line, just like standard Python. But with ticking tables, that's not quite what happens — and this surprises many developers.

This guide explains:

  • Why ticking table code behaves differently than standard Python.
  • The two phases of execution: initialization and updates.
  • How to safely mix Python code with Deephaven ticking tables.
  • Clear "do this, don't do that" guidelines.

The surprise: code isn't always sequential

Consider this code:
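A minimal sketch of the problem, assuming a hypothetical Kafka setup (the broker address, topic, and column spec are illustrative; this runs in a Deephaven session):

```python
from deephaven.stream.kafka import consumer as kc
from deephaven import pandas as dhpd
import deephaven.dtypes as dht

# Create a ticking table from a Kafka topic (hypothetical config)
trades = kc.consume(
    {"bootstrap.servers": "kafka:9092"},
    "trades",
    key_spec=kc.KeyValueSpec.IGNORE,
    value_spec=kc.json_spec({"Sym": dht.string, "Price": dht.double}),
    table_type=kc.TableType.append(),
)

# Looks sequential, but the table may still be empty here
df = dhpd.to_pandas(trades)  # likely an empty DataFrame
```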

In standard Python, this would work fine — create the data source, then read from it. But with a Kafka consumer (or any external data source), the table might be empty when you try to read from it. The data hasn't arrived yet.

This is the core issue: Deephaven table operations define relationships, not sequential data transformations.

Two phases: initialization and updates

Deephaven execution has two distinct phases:

Phase 1: Initialization

When your script runs:

  1. Table operations execute and build the DAG (dependency graph).
  2. Initial calculations run on any existing data.
  3. External data sources may not have data yet.

Phase 2: Real-time updates

After initialization completes:

  1. Data flows in from sources (Kafka, time tables, etc.).
  2. Changes propagate through the DAG automatically.
  3. Derived tables update incrementally.
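The two phases are visible even with a simple time table (a sketch; runs in a Deephaven session):

```python
from deephaven import time_table

# Initialization: these lines build the DAG. `source` starts (nearly) empty.
source = time_table("PT1s")
derived = source.update_view(["Seconds = ii"])

# Updates: after the script finishes, a row arrives every second and
# `derived` updates automatically; no further Python code runs.
```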

The problem occurs when you try to extract data to Python during initialization, before the real-time data has arrived.

Prerequisites

Before reading this guide, you should understand the basics of creating Deephaven tables and applying table operations.

For details on how Deephaven processes updates internally, see Incremental update model.

Do this, don't do that

Here's the quick reference. Details follow in the sections below.

  • DO: Use table operations — let the engine handle consistency.
    DON'T: Extract data to Python immediately after creating a ticking table.
  • DO: Wait for data with await_update before extracting.
    DON'T: Assume data exists just because you created the table.
  • DO: Use snapshot to get a static copy for Python processing.
    DON'T: Call to_pandas directly on a ticking table.
  • DO: Use listeners for reacting to updates.
    DON'T: Access other tables inside listeners.
  • DO: Initialize state with do_replay=True.
    DON'T: Mix data from different update cycles.

Waiting for data: the await_update pattern

The most common initialization problem is trying to read data before it arrives. Use await_update to pause until the table receives an update:
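A sketch, assuming `trades` is a ticking table fed by an external source:

```python
from deephaven import pandas as dhpd

# Block until the table has processed at least one update
trades.await_update()

# Data has arrived; take a static snapshot before extracting
df = dhpd.to_pandas(trades.snapshot())
```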

Note

await_update blocks until the table receives at least one update. In app mode startup scripts, this ensures your initialization code waits for data to flow in before proceeding.

You can also specify a timeout in milliseconds: trades.await_update(timeout=5000) returns False if the timeout is reached before an update occurs.

When to use await_update

  • App mode startup scripts — When you need to extract initial values from external data sources.
  • Sequential initialization — When later code depends on data from earlier tables.
  • One-time setup — When you need a value once at startup, not continuously.

When NOT to use await_update

  • In listeners — Listeners already receive data when it's ready.
  • For continuous processing — Use table operations or listeners instead.
  • In the console — Console code runs interactively; just wait and re-run.

Safe patterns for initialization

Pattern 1: Let the engine do the work

The safest approach is to express your logic using Deephaven table operations rather than extracting data to Python. The engine guarantees consistency for all table operations.

If you need custom Python logic, consider using it in a formula via Python functions:
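For example (a sketch; the `spread` function and the Bid/Ask column names are illustrative):

```python
def spread(bid, ask) -> float:
    # Custom Python logic, applied row by row inside the engine
    return ask - bid

# `quotes` is an assumed ticking table with Bid and Ask columns.
# The engine calls spread() for each row and keeps the result consistent.
with_spread = quotes.update(["Spread = spread(Bid, Ask)"])
```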

Pattern 2: Use snapshot for point-in-time extraction

When you need a static copy of ticking data for Python processing, use snapshot:
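A sketch, assuming `trades` is a ticking table:

```python
from deephaven import pandas as dhpd

# A static, point-in-time copy of the ticking table
static_trades = trades.snapshot()

# Safe: the snapshot never changes
df = dhpd.to_pandas(static_trades)
```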

The snapshot creates a static table that won't change, making it safe to extract and process in Python.

Pattern 3: Use snapshot_when for controlled timing

When you need to extract data at specific intervals or synchronized with other events, use snapshot_when:
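For example, snapshotting once per minute (a sketch; the cadence is illustrative):

```python
from deephaven import time_table

# The trigger table ticks once per minute
trigger = time_table("PT1m")

# `periodic` is refreshed with a new snapshot of `trades` each time `trigger` ticks
periodic = trades.snapshot_when(trigger)
```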

This is useful for:

  • Reducing update frequency for downstream processing.
  • Synchronizing data extraction with external systems.
  • Batching updates for efficiency.

Safe patterns for reacting to updates

Pattern 4: Use listeners with TableUpdate

When using table listeners, the TableUpdate object provides safe, consistent access to changed data:
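A sketch, assuming `trades` is a ticking table with Sym and Price columns:

```python
from deephaven.table_listener import listen

def on_update(update, is_replay):
    # added() returns a dict of column name -> NumPy array for this cycle
    added = update.added()
    for sym, price in zip(added["Sym"], added["Price"]):
        print(f"New trade: {sym} @ {price}")

# Keep a reference to the handle so the listener is not garbage collected
handle = listen(trades, on_update)
```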

Caution

Keep listener execution fast — listeners block further Update Graph processing while running.

The .added, .modified, .removed, and .modified_prev methods return dictionaries with column names as keys and NumPy arrays as values. This data is a consistent snapshot from the update cycle.

Pattern 5: Initialize state with do_replay

When setting up a listener, use do_replay=True to receive the table's existing data as the first update:
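A sketch, assuming `trades` already contains rows when the listener attaches:

```python
from deephaven.table_listener import listen

totals = {}

def on_update(update, is_replay):
    # With do_replay=True, the first call (is_replay=True) delivers all
    # existing rows as if they had just been added
    added = update.added()
    for sym, price in zip(added["Sym"], added["Price"]):
        totals[sym] = totals.get(sym, 0.0) + price

handle = listen(trades, on_update, do_replay=True)
```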

Pattern 6: Initialize with a snapshot before listening

For complex initialization, take a snapshot first, then attach a listener for ongoing updates:
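A sketch of the snapshot-then-listen sequence (the Price column and the running total are illustrative):

```python
from deephaven.table_listener import listen
from deephaven import pandas as dhpd

# 1. Initialize state from a consistent, static snapshot
initial = dhpd.to_pandas(trades.snapshot())
total = float(initial["Price"].sum())

# 2. Attach a listener for incremental updates
def on_update(update, is_replay):
    global total
    total += float(update.added()["Price"].sum())

handle = listen(trades, on_update)
```

Note that rows arriving between the snapshot and the listener attaching may be missed; when that matters, do_replay=True (Pattern 5) is the simpler choice.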

Pattern 7: Use locking for direct table access (advanced)

For advanced use cases where you need to read directly from a ticking table, use the update graph lock:
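A sketch using the exclusive lock (this assumes the `deephaven.update_graph` module and a table-level `update_graph` property; exact import paths can vary by version):

```python
from deephaven.update_graph import exclusive_lock
from deephaven import pandas as dhpd

# While the lock is held, the update graph cannot tick, so reads are consistent
with exclusive_lock(trades.update_graph):
    row_count = trades.size
    df = dhpd.to_pandas(trades)
```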

Caution

Keep lock duration minimal — it blocks update processing while held.

Unsafe patterns

DON'T read ticking data without synchronization

What can go wrong:

  • The table might be mid-update when you read.
  • Some columns could have new values, others old values.
  • Row keys might have shifted.

Do instead: Use snapshot first, or use locking.

DON'T access other tables inside listeners

What can go wrong:

  • Other tables are not guaranteed to have been consistently updated.
  • You may see partial or stale data from other tables.

Do instead: Only access the table you are listening to within a listener. If you need data from multiple tables, add them as dependencies:
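A sketch, assuming the listener also needs data from a second ticking table `quotes` (declared via listen's dependencies parameter; column names are illustrative):

```python
from deephaven.table_listener import listen

def on_update(update, is_replay):
    # `quotes` is consistent for this cycle because it is a declared dependency
    added = update.added()
    print(f"{len(added['Sym'])} trades added; {quotes.size} quotes total")

handle = listen(trades, on_update, dependencies=[quotes])
```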

DON'T mix data from different update cycles

What can go wrong:

  • Row keys from previous cycles may no longer be valid.
  • Comparisons assume consistency that doesn't exist.

Do instead: Use modified_prev (which reads pre-shift values) for before/after comparisons within the same update, or store only computed results (not raw data) between cycles.

DON'T hold locks longer than necessary

What can go wrong:

  • The entire update graph is blocked.
  • All ticking tables freeze.
  • User interfaces become unresponsive.

Do instead: Extract data quickly, release the lock, then process:
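A sketch of the copy-then-process shape (`expensive_analysis` is a hypothetical function; the `update_graph` property is assumed as in Pattern 7):

```python
from deephaven.update_graph import exclusive_lock
from deephaven import pandas as dhpd

# Hold the lock only long enough to copy the data out
with exclusive_lock(trades.update_graph):
    df = dhpd.to_pandas(trades)

# Process outside the lock; the update graph keeps running
result = expensive_analysis(df)  # hypothetical, slow Python processing
```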

Common scenarios

  • Wait for Kafka/external data before extracting: use table.await_update before extraction.
  • Export ticking data to pandas periodically: use snapshot_when with a trigger table, then to_pandas.
  • React to updates and call an external API: use a listener with added data access.
  • Aggregate ticking data with custom Python logic: use table operations (agg_by, update_by) or Python functions in formulas.
  • Initialize an ML model from table state: use do_replay=True in a listener, or snapshot before listening.
  • Join ticking data with an external database: snapshot the ticking table, then join in Python.
  • Debug why data looks inconsistent: check whether you're reading without synchronization; add logging to the listener.

Understanding the update cycle

For debugging and advanced use cases, it helps to understand the update cycle in more detail:

  1. Idle phase: Tables are consistent. Console commands run here (under exclusive lock automatically).
  2. Update phase: Changes propagate through the DAG. Listeners fire. Data is in flux.

The exclusive lock ensures you're in the idle phase. Listeners fire during the update phase but receive consistent data through TableUpdate.