How can I monitor heap availability across machines during a cluster upgrade?

When upgrading machines in a Deephaven Enterprise cluster to newer hardware, you need to monitor heap usage to ensure optimal resource allocation and prevent performance issues.

Scenario

A common case is migrating a cluster to new machines with different hardware specifications. For example, if each new machine has 500GB of available heap, you need to track how much of that heap is in use throughout the day so you can allocate resources sensibly and complete the migration safely.

Solution

Use the ResourceUtilization system table in the DbInternal namespace to monitor heap usage across your cluster. This table provides real-time and historical data about resource consumption on each machine.

Example: Monitor specific machines

The following example shows how to monitor heap usage for three new machines in your cluster, first in Groovy and then in Python:

// Define the machines you want to monitor
hosts = newTable(stringCol("Host", "Machine1", "Machine2", "Machine3"))

// Query today's heap usage for these machines
host_usage_today = db.liveTable("DbInternal", "ResourceUtilization")
    .where("Date=today()")
    .whereIn(hosts, "ResourceProcessName=Host")
    .select("Date", "Timestamp", "Host=ResourceProcessName", "HeapUsageMB", "WorkerCount")
    .updateView("MaxHeap=500000")  // 500GB of heap per machine, expressed in MB

from deephaven import new_table
from deephaven.column import string_col

# Define the machines you want to monitor
hosts = new_table([string_col("Host", ["Machine1", "Machine2", "Machine3"])])

# Query today's heap usage for these machines
host_usage_today = (
    db.live_table("DbInternal", "ResourceUtilization")
    .where("Date=today()")
    .where_in(hosts, "ResourceProcessName=Host")
    .select(
        ["Date", "Timestamp", "Host=ResourceProcessName", "HeapUsageMB", "WorkerCount"]
    )
    .update_view("MaxHeap=500000")  # 500GB of heap per machine, expressed in MB
)
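
The MaxHeap column exists so that usage can be read against capacity. The arithmetic is sketched below in plain Python, independent of any Deephaven API, assuming HeapUsageMB and MaxHeap are both in MB as in the query above. The helper name heap_percent and the sample values are hypothetical, not real measurements.

```python
MAX_HEAP_MB = 500_000  # 500GB per machine, matching the MaxHeap column above


def heap_percent(heap_usage_mb: float, max_heap_mb: float = MAX_HEAP_MB) -> float:
    """Return heap usage as a percentage of the machine's maximum heap."""
    if max_heap_mb <= 0:
        raise ValueError("max_heap_mb must be positive")
    return 100.0 * heap_usage_mb / max_heap_mb


# Hypothetical HeapUsageMB readings, one per host (illustrative only).
samples = {"Machine1": 125_000, "Machine2": 410_000, "Machine3": 60_000}
percent_by_host = {host: heap_percent(mb) for host, mb in samples.items()}
```

Within a query, the same calculation could presumably be written as a formula column such as HeapPct = 100.0 * HeapUsageMB / MaxHeap; treat that column name as an assumption rather than part of the table's schema.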

Visualize the results

You can plot the heap usage data using either:

  • deephaven.ui: For modern, interactive visualizations
  • Legacy plotting API: For traditional plotting functionality

This allows you to see trends in heap usage throughout the day and identify when machines are approaching their capacity limits.
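
One way to summarize the daily trend before plotting it is to reduce the samples to an hourly peak per host. The sketch below does this in plain Python over (timestamp, host, heap MB) tuples, which is an assumed export format rather than anything the table provides directly. The function name hourly_peaks and all sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_peaks(samples):
    """Map (host, hour of day) to the largest heap usage seen in that hour."""
    peaks = defaultdict(float)
    for timestamp, host, heap_mb in samples:
        key = (host, timestamp.hour)
        peaks[key] = max(peaks[key], heap_mb)
    return dict(peaks)


# Hypothetical samples for a single day (illustrative only).
samples = [
    (datetime(2024, 1, 15, 9, 5), "Machine1", 120_000),
    (datetime(2024, 1, 15, 9, 35), "Machine1", 180_000),
    (datetime(2024, 1, 15, 10, 5), "Machine1", 150_000),
    (datetime(2024, 1, 15, 9, 20), "Machine2", 300_000),
]
peaks = hourly_peaks(samples)
```

Peaks are used here instead of averages because capacity limits are hit by the highest reading in an interval, not the typical one.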

Best practices

  • Monitor heap usage regularly during cluster migrations to catch potential issues early.
  • Set alerts that fire when heap usage exceeds a chosen threshold (e.g., 80% of max heap).
  • Use historical data from ResourceUtilization to understand usage patterns before and after migration.
  • Ensure sufficient headroom in heap allocation to handle peak loads.
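
The threshold and headroom checks in the list above can be sketched as follows. This is a plain-Python illustration, not a Deephaven alerting feature: the function names, the 80% default, and the peak values are all assumptions chosen to match the example figures in this article.

```python
def hosts_over_threshold(peak_usage_mb, max_heap_mb, threshold_pct=80):
    """Return the hosts whose peak usage exceeds threshold_pct of max heap."""
    # Integer comparison avoids floating-point rounding at the boundary.
    return sorted(
        host
        for host, used_mb in peak_usage_mb.items()
        if used_mb * 100 > threshold_pct * max_heap_mb
    )


def headroom_mb(peak_usage_mb, max_heap_mb):
    """Return the heap left over on each host at its peak usage."""
    return {host: max_heap_mb - used_mb for host, used_mb in peak_usage_mb.items()}


# Hypothetical peak HeapUsageMB per host (illustrative only).
peak_usage = {"Machine1": 180_000, "Machine2": 410_000, "Machine3": 60_000}
alerts = hosts_over_threshold(peak_usage, max_heap_mb=500_000)
headroom = headroom_mb(peak_usage, max_heap_mb=500_000)
```

With these sample values only Machine2 is flagged, since 410,000 MB is above 80% of 500,000 MB (400,000 MB), and it has 90,000 MB of headroom left.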