
How to use Deephaven's performance tables for system monitoring

This guide shows how to use Deephaven's internal performance tables.

Deephaven provides several options to enable users to monitor query performance and resolve issues as they arise. Many of these methods are available directly in the user interface. For example, users can see how long a query has been running by opening the History Panel and looking at the command details or hovering over the console command: the tooltip will show the formatted duration of the query. The most common performance issues are slow initialization, a query failing to update quickly, or a sluggish user interface.

If a query seems to be taking too long or throws an error, looking at the process and worker information in Deephaven's internal tables is a great place to start. Performance tables are a powerful tool for analyzing performance and debugging issues in Deephaven. This information is intended to help you troubleshoot queries independently. If you need more help, we encourage you to visit our Community Discussions page or contact us on Slack. You can also send us your logs and/or Exception messages.

Available performance tables

Deephaven's perfmon package contains various methods to create and open performance tables in the Deephaven IDE.

import deephaven.perfmon as pm

upl = pm.update_performance_log()
qopl = pm.query_operation_performance_log()
qpl = pm.query_performance_log()
pml = pm.process_metrics_log()
pil = pm.process_info_log()
ssl = pm.server_state_log()

Update Performance Log

The update performance log (UPL) describes the time and resources used by table operations over a series of one-minute intervals. The interval length can be changed if desired. At the end of each interval, the operations that processed updates are logged, one row per operation.

The EntryDescription column identifies the operations themselves. IntervalStartTime and IntervalEndTime show the beginning and end of the interval over which the operations were logged. UsageNanos and CpuNanos give each operation's total time and CPU time, respectively, in nanoseconds. Several columns detail row usage, several more describe memory usage and garbage collection, and the last two cover the auth context and update graph used by the operations.
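As a hedged sketch (using only the UPL columns described above, and running against a live Deephaven server), the following surfaces operations that consumed a large share of their logging interval:

```python
import deephaven.perfmon as pm

upl = pm.update_performance_log()

# Flag operations that used more than 10% of their logging interval.
# The threshold is an arbitrary example value; tune it to your workload.
upl_hot = (
    upl.update_view(["IntervalNanos = IntervalEndTime - IntervalStartTime"])
    .where(["UsageNanos > 0.1 * IntervalNanos"])
    .sort_descending(["UsageNanos"])
)
```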

Query Operation Performance Log

The query operation performance log (QOPL) contains performance data for individual subqueries. It is similar to the update performance log; however, unlike the UPL, the QOPL records operations as they happen rather than aggregating them over an interval. The QOPL does not have an EntryDescription column, so you'll need to identify operations by their EvaluationNumber, which also appears in the UPL.
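Since the QOPL logs each operation as it completes, a quick way to find the slowest sub-operations is to compute each operation's duration and sort by it. This is a sketch against a live Deephaven server, reusing the StartTime and EndTime columns that appear later in this guide:

```python
import deephaven.perfmon as pm

qopl = pm.query_operation_performance_log()

# Longest-running individual operations first. EvaluationNumber ties
# each row back to the corresponding query in the other performance tables.
qopl_slowest = qopl.update_view(
    ["DurationNanos = EndTime - StartTime"]
).sort_descending(["DurationNanos"])
```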

Query Performance Log

The query performance log (QPL) contains many of the same columns as the UPL and QOPL. This table contains performance data for queries as a whole. For information on individual sub-operations, see the query operation performance log.

Process Metrics Log

This table contains metrics collected for the current Deephaven engine process.

Process Info Log

This static table contains process information for the current Deephaven engine process.

Server State Log

This table contains memory utilization, Periodic Update Graph (UG), and garbage collection statistics sampled on a periodic basis.

Example use case

The tables available in perfmon contain a lot of information, and it can seem overwhelming without some context. Let's explore an example use case, focusing on the update performance log and the query operation performance log.

Below is the entire query.

Performance table query
import deephaven.perfmon as pm

# Key perf tables for this example
upl = pm.update_performance_log()
qopl = pm.query_operation_performance_log()

# Other perf tables that may be relevant in other cases
qpl = pm.query_performance_log()
pml = pm.process_metrics_log()
pil = pm.process_info_log()
ssl = pm.server_state_log()

import time
from deephaven import empty_table, time_table


def fast(x: int) -> int:
    return int(x * (x + 1) / 2)


def slow(x: int) -> int:
    time.sleep(0.25)
    return int(x * (x + 1) / 2)


# Static table and query startup performance go to qopl table
t = empty_table(100).update_view(["A = ii"])
t2 = t.update(["X = fast(A)"])
t3 = t2.update(["Y = slow(A)"])

# Filtered and highlighted perf data to look at for the static or initialization cases
qopl_update = qopl.update_view(["DurationNanos = EndTime - StartTime"]).move_columns_up(
    [
        "Description",
        "StartTime",
        "EndTime",
        "DurationNanos",
        "CpuNanos",
        "UserCpuNanos",
        "FreeMemoryChange",
        "TotalMemoryChange",
    ]
)

# Dynamic table updates go to the upl table
td = time_table("PT00:00:01").update_view(["A = ii"])
td2 = td.update(["X = fast(A)"])
td3 = td2.update(["Y = slow(A)"])

# Filtered and highlighted perf data to look at for the dynamic case
upl_update = (
    upl.update_view(
        [
            "IntervalDurationNanos = IntervalEndTime - IntervalStartTime",
            "Ratio = UsageNanos / IntervalDurationNanos",
        ]
    )
    .move_columns_up(
        [
            "EntryDescription",
            "IntervalStartTime",
            "IntervalEndTime",
            "IntervalDurationNanos",
            "UsageNanos",
            "CpuNanos",
            "UserCpuNanos",
            "Ratio",
        ]
    )
    .sort_descending(["Ratio"])
)

There are a couple of key pieces in the above query worth noting:

  • The functions fast and slow do the same thing. slow is slow because it sleeps for a quarter of a second before performing and returning the calculation.
  • upl_update adds two columns: IntervalDurationNanos and Ratio. Ratio calculates the fraction of the interval spent performing a particular operation. The table is also sorted by Ratio in descending order so that the operations that take the most time show up at the top.
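As a sanity check on the Ratio arithmetic, here is the same calculation in plain Python with hypothetical numbers: an operation that uses 15 seconds of a 60-second interval has a ratio of 0.25.

```python
# Hypothetical numbers: a 60-second logging interval in which one
# operation consumed 15 seconds of usage time (both in nanoseconds).
interval_duration_nanos = 60 * 10**9
usage_nanos = 15 * 10**9

# Same arithmetic as the Ratio column in upl_update above.
ratio = usage_nanos / interval_duration_nanos
print(ratio)  # 0.25
```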

[Image: the upl_update table sorted by Ratio, with the Update([Y]) operation at the top.]

In the above image, Update([Y]) (the slow function) takes a whopping 25% of the total interval time. This makes sense based on what's happening: every second, a new row is added to the ticking tables, and for a quarter of that second, the Deephaven engine sits around waiting for time.sleep(0.25) to finish. Clearly, slow is a bottleneck in this particular code. In real applications, the cause of a slowdown won't be as obvious. Still, this example query is a simple way to identify areas where performance improvements will be most significant.

Get help

If you can’t find an obvious cause for an error or slow performance in your query, you may have uncovered a bug. In this case, you should file a bug report at https://github.com/deephaven/deephaven-core/issues. Be sure to include the following with your bug report:

  • The version of the Deephaven system.
  • The complete stack trace, not just the list of Caused By expressions.
  • A code snippet or a description of the query.
  • Any logs from your user console or, if running Docker, from your terminal.
  • Support logs exported from the browser. In the Settings menu, click the Export Logs button to download a zip with the browser logs and the current state of the application.

You can also get help by asking questions in our GitHub Discussions forum or Slack Community.