Building scalable real-time systems: Understanding Deephaven's client-server architecture

A practical guide to architecting distributed data applications

October 28, 2025

Margaret Kennedy, Communications Director @Deephaven
JJ Brosnan, Developer Relations Engineer @Deephaven

When you're building a real-time data application, one of the first questions you face is deceptively simple: Where should the work happen? Should your Python script do the heavy lifting, or should the server handle it? Do you need one powerful machine or several specialized ones? These architectural decisions can make or break your application's performance and scalability.

If you've worked with traditional databases or analytics tools, you might be used to pulling data to your local machine for processing - but when you're dealing with millions of rows updating in real time, that approach quickly falls apart. You need a system designed from the ground up for real-time streaming, able to handle the complexity of distributed architectures without forcing you to become a distributed systems expert.

That's where Deephaven's architecture shines. Understanding how it works - and how to leverage it effectively - unlocks the ability to build applications that scale from prototype to production without fundamental rewrites.

The client-server model in practice

Deephaven follows a straightforward client-server architecture. When you connect a client to a Deephaven server, you establish a communication channel. The server processes your requests and sends responses back. For ticking tables, Deephaven uses its Barrage protocol to efficiently stream data: the client receives an initial snapshot followed by incremental updates containing only what changed.

This is harder than it sounds. Most systems force you to choose between polling (wasteful and high-latency) or building custom streaming infrastructure (complex and error-prone). Deephaven's Barrage protocol handles this automatically - your client code just subscribes to a table and receives updates in real time. No manual state management, no websocket wrangling, no custom serialization logic.
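
To see what that looks like from the client side, here's a minimal Python sketch using the pydeephaven client. It assumes a Deephaven server is listening on localhost:10000 and that a ticking table named `trades` is already bound in the server's global scope (the table name is hypothetical):

```python
# Minimal sketch of a Python client consuming a server-side table.
# Assumes pydeephaven is installed and a Deephaven server is running
# on localhost:10000 with a table named `trades` in its global scope.
from pydeephaven import Session

session = Session(host="localhost", port=10000)

# open_table returns a handle; the data stays on the server until requested.
trades = session.open_table("trades")

# Pull a snapshot into Arrow/pandas for local inspection. Barrage handles
# the transfer; live subscriptions stream only incremental updates.
df = trades.to_arrow().to_pandas()
print(df.head())

session.close()
```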

Here's what makes this powerful: the heavy lifting happens on the server. Your client sends requests and consumes responses, but the real computational work - joins, aggregations, complex calculations - happens where the data lives. This isn't just about performance; it fundamentally changes what's possible. You can join a billion-row historical table with a high-frequency real-time stream, and your client just sees the results flowing in.
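
As a rough server-side sketch (run in a Deephaven console; the Parquet path and column names are hypothetical), the join below executes entirely on the server, and subscribers only ever see the joined output:

```python
# Server-side sketch of joining historical context onto a live stream.
# The Parquet path and column names are hypothetical.
from deephaven import time_table
from deephaven.parquet import read

# Large historical table, read where it lives -- never shipped to the client.
history = read("/data/trades_2024.parquet")  # e.g. Sym, Timestamp, AvgSize columns

# A ticking table of live prices, one new row per second.
live = time_table("PT1S").update(["Sym = `AAPL`", "Price = Math.random() * 200"])

# As-of join: each live row picks up the most recent matching historical row.
# A client subscribing to `enriched` receives only the joined results.
enriched = live.aj(history, on=["Sym", "Timestamp"])
```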

In short: the server does the processing, and the client sends requests and consumes results. This separation of responsibilities is what enables Deephaven applications to handle massive real-time workloads.

Language flexibility

Deephaven supports multiple languages, but they serve different purposes:

Server-side scripting languages - Python and Groovy - run directly on the Deephaven server with full access to the table API.

Client APIs let applications connect to a Deephaven server from languages including Python, Java, C++, JavaScript, Go, and R.

This separation is powerful in practice. Your Python data science team can build analytical workflows on the server while your C++ trading systems consume the results through native clients. A web dashboard written in JavaScript can display the same real-time data that a Go-based microservice is processing - no translation layer needed, no data format mismatches. The client is agnostic to what language is running on the server, and vice versa.
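
As a sketch of that split (using the same hypothetical localhost:10000 server as above), the script body below executes on the server with full access to the table API, and the resulting `signals` table can then be opened by name from any client API - this one just happens to be Python:

```python
# Sketch of the server/client split. The script string runs server-side;
# the table it creates is then available to any connected client by name.
from pydeephaven import Session

session = Session(host="localhost", port=10000)

# Executed on the server with full access to the table API.
session.run_script("""
from deephaven import time_table
signals = time_table("PT1S").update(["Signal = Math.sin(ii / 10.0)"])
""")

# Any client language could now consume `signals`; here we use Python.
signals = session.open_table("signals")
print(signals.to_arrow().to_pandas().tail())

session.close()
```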

When one server isn't enough

So when do you need multiple servers? There's no formula that takes your requirements and spits out a configuration, but several factors signal it's time to scale horizontally.

Data intensity matters more than row count

You'll often hear people ask, "How many rows can Deephaven handle?" That's the wrong question - what matters is data intensity, which depends on the interaction of several factors:

  • Row volume and update frequency: Processing millions of rows with simple updates is different from processing thousands with complex calculations.
  • Column characteristics: Numeric data is compact and fast. String data, especially with high cardinality or long values, consumes significantly more memory. Complex types add further overhead.
  • Query complexity: Simple aggregations have different performance profiles than multi-table joins or window operations.

A single server might easily handle billions of rows of compact numeric data, while struggling with millions of rows containing large strings and complex joins. You can't rely on general thresholds - you need to profile your specific workload.

Resource constraints drive architecture decisions

Your architecture depends on matching computational resources to workload requirements:

Memory (RAM) is often the first constraint you'll hit. Deephaven keeps active tables in memory for fast access. Running out of memory forces frequent garbage collection, tanking performance.

CPU cores improve throughput for complex queries and allow multiple operations to run in parallel. Operations like joins, aggregations, and formula evaluations benefit from higher core counts.

Network bandwidth becomes critical when sharing data between servers or serving many clients. In high-frequency trading or IoT applications, network capacity can become a bottleneck before CPU or memory.

Disk I/O matters for loading historical data, writing logs, or persisting snapshots. SSD storage is strongly preferred.

Specialized workloads benefit from separation

Consider using multiple servers when your application has distinct processing requirements:

  • Data segregation: Different data sets need separate processing for security or compliance reasons.
  • Resource optimization: Some workloads are CPU-intensive while others are memory-intensive.
  • Specialized processing: Different parts of your application require different libraries or configurations.

For example, you might dedicate one server to real-time data ingestion and another to complex analytical queries, allowing each to be optimized for its specific task.

Real-world architecture examples

Let's look at how these considerations play out in practice.

Example: Portfolio risk monitoring

Consider a trading desk that needs real-time risk management. Here's how a Deephaven client-server architecture enables this:

Server side:

  • Ingests real-time market data (prices, volumes, volatility) and continuously updates position valuations.
  • Performs complex calculations including Greeks for options positions, Value at Risk (VaR), portfolio exposure by sector and geography, and stress-testing scenarios.
  • Maintains historical tables for compliance and reporting.
  • Generates real-time alerts when risk limits are breached.

Client side:

  • Portfolio managers access risk dashboards from their workstations via Python clients.
  • Traders query specific positions and run hypothetical scenarios from their terminals.
  • Risk officers request custom reports and filtered views for high-risk positions and specific asset classes.
  • Compliance teams access audit trails and historical snapshots from separate applications.
  • Multiple users simultaneously access the same underlying data without impacting server performance.

This architecture centralizes intensive calculations on the server while allowing diverse users to access exactly the data they need through lightweight clients.
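
To make the server-side half concrete, here's a hedged sketch of the kind of aggregation that would run there. The table and column names are hypothetical, and in production the positions table would be a ticking table fed by the ingestion pipeline rather than a small static one:

```python
# Hypothetical server-side risk rollup. In practice `positions` would be a
# ticking table; the aggregation below recomputes automatically on every tick.
from deephaven import empty_table, agg

positions = empty_table(6).update([
    "Sector = i % 2 == 0 ? `Tech` : `Energy`",
    "Exposure = (i + 1) * 1000000.0",
])

# Exposure rolled up by sector, maintained incrementally by the server.
risk_by_sector = positions.agg_by([agg.sum_(["Exposure"])], by=["Sector"])
```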

Example: Multi-server trading application

For higher-volume applications, Deephaven's peer-to-peer architecture enables distributed processing:

Server A (Data Ingestion & Storage):

  • Dedicated to high-speed market data ingestion from multiple sources: stock markets, cryptocurrency exchanges, futures, options.
  • Maintains connections to historical databases.
  • Performs initial data cleaning and normalization.
  • Exposes ticking tables via URIs for other servers to access.

Server B (Analysis & Client Services):

  • Connects to Server A using URIs to access live market data.
  • Performs complex calculations: risk analytics, trading signals, portfolio optimization.
  • Serves processed data to client applications.
  • Executes automated trading strategies based on signals.

Server C (Load-Balanced Analysis):

  • Configured as a peer to Server B, sharing the same analytical workload.
  • Provides load balancing for high-demand periods.
  • Acts as a fallback if Server B experiences hardware failure or needs maintenance.

Benefits of this architecture:

  • Separation of concerns allows each server to be optimized for its specific task.
  • Data ingestion continues uninterrupted during intensive analysis operations.
  • System scales by adding specialized servers for specific markets or strategies.
  • Load balancing distributes both client requests and computational workload.
  • Fault tolerance: if one analytical server fails, the other continues serving clients.
  • Maintenance can be performed on individual servers without taking the entire system offline.

Setting up this kind of distributed architecture usually requires significant engineering effort - message queues, distributed state management, custom protocols for data sharing. In Deephaven, Server B can access Server A's tables with a simple URI reference. The ticking table on Server A appears on Server B as if it were local, complete with real-time updates. No ETL pipelines, no data duplication, no synchronization logic. You write the same table operations whether the data is local or remote.
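
As a sketch of what that looks like (hostnames and table names are hypothetical), Server A binds a ticking table to a variable, and Server B resolves it by URI and builds on it as if it were local:

```python
# On Server A (hypothetical host "server-a"), a ticking table is bound
# to a variable in the global scope:
#
#     from deephaven import time_table
#     market_data = time_table("PT1S").update(["Price = Math.random() * 100"])
#
# On Server B, the same table is resolved by URI and used like a local table:
from deephaven.uri import resolve

market_data = resolve("dh+plain://server-a:10000/scope/market_data")

# Downstream operations on Server B update as Server A's table ticks.
enriched = market_data.update(["Notional = Price * 100"])
```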

How should I choose my architecture?

The challenge is that there's no one-size-fits-all answer. The right architecture depends entirely on your specific workload characteristics. Here's a practical approach:

Start simple. Begin with a single server running your workload. This provides a baseline for understanding actual resource consumption and performance characteristics.

Measure and profile. Monitor memory usage, CPU utilization, query latency, and update throughput under realistic conditions. These measurements reveal actual bottlenecks rather than theoretical concerns.
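
One hedged place to start: recent Deephaven releases expose built-in performance tables through the deephaven.perfmon module, which you can query on the server like any other table (the function names below assume that module is available):

```python
# Sketch of server-side profiling via Deephaven's built-in performance tables,
# assuming the deephaven.perfmon module available in recent releases.
from deephaven import perfmon

# Per-query timings and memory use, for spotting expensive operations.
query_perf = perfmon.query_performance_log()

# Per-cycle update timings for ticking tables, for spotting operations
# that can't keep up with the incoming data rate.
update_perf = perfmon.update_performance_log()
```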

Identify constraints. Determine which resource (memory, CPU, network, disk I/O) becomes the limiting factor first. This guides whether you need more resources, different resources, or multiple servers.

Scale incrementally. When a single server becomes insufficient, consider whether vertical scaling (more RAM/CPU) or horizontal scaling (additional servers) better addresses your specific constraint.

The examples and considerations in this post provide a framework for thinking about architecture decisions, but testing with your actual data and workload remains the most reliable way to determine appropriate infrastructure.

The iterative approach

Building scalable real-time systems is an iterative process. You can't predict every bottleneck or performance characteristic upfront. Start with the simplest architecture that meets your immediate needs. Measure everything. Let your actual data and usage patterns guide your scaling decisions.

Deephaven's architecture gives you the flexibility to start small and grow as needed - whether that means adding more resources to a single server or distributing work across multiple machines. The key is understanding where computation should happen and why.

Learn more

Want to dive deeper into Deephaven's architecture and capabilities? Check out the Deephaven documentation and community resources.

Have questions about architecting your Deephaven application? Our Slack community is full of people building real-time systems at scale who are happy to share their experiences. Join us!