Skip to main content
Version: Python

How to use Deephaven's performance tables for system monitoring

This guide shows how to use Deephaven's internal performance tables.

Deephaven provides several options to enable users to monitor query performance and resolve issues as they arise. Many of these methods are available directly in the user interface. For example, users can see how long a query has been running by opening the History Panel and looking at the command details or hovering over the console command: the tooltip will show the formatted duration of the query. The most common performance issues are slow initialization, a query failing to update quickly, or a sluggish user interface.

If a query seems to be taking too long or throws an error, looking at the process and worker information in Deephaven's internal tables is a great place to start. Performance tables are a powerful tool for analyzing performance and debugging issues in Deephaven. This information is intended to help you troubleshoot queries independently. If you need more help, we encourage you to visit our Community Discussions page or contact us on Slack. You can also send us your logs and/or Exception messages.

Available performance tables

Deephaven's perfmon package contains various methods to create and open performance tables in the Deephaven IDE.

import deephaven.perfmon as pm

qpl = pm.query_performance_log()
upl = pm.update_performance_log()
qopl = pm.query_operation_performance_log()
pml = pm.process_metrics_log()
pil = pm.process_info_log()
ssl = pm.server_state_log()
qptt = pm.query_performance_tree_table()
qoptt = pm.query_operation_performance_tree_table()

Query Performance Log

The Query Performance Log (QPL) contains many of the same columns as the Update Performance Log and Query Operation Performance Log. This table contains details on query-level performance. A given Deephaven instance may run multiple queries, each with its own set of query performance log entries. See the QOPL for information on individual queries' sub-operations.

A query, in this context, is the code that is run:

  • Whenever you type a new command in the console and type Return. Each 'return' where you have to wait for an answer from the worker is a new query.
  • When a Persistent Query initializes.
  • As a result of a batch operation (e.g., sort and filter) in the UI.

The most significant columns in this table are:

  • UsageNanos — How long this query ran in nanoseconds.
  • FreeMemory — The free memory at the end of this operation.
  • TotalMemory — The total size of the JVM heap memory at the end of this operation.
  • FreeMemoryChange - The change in free memory during this operation.

The Query Performance Log contains details on query-level performance. A given Deephaven instance may be running multiple queries, each with its own set of query performance log entries.

Column NameData TypeDescription
EvaluationNumberlongAn increasing integral value for requests made to the worker.
ParentEvaluationNumberlongWhen a query is part of a batch, the evaluation number of the enclosing parent.
SessionIdStringA unique identifier for the user-session performing the operation.
DescriptionStringA textual description of the operation, generally including the name and some of the arguments.
StartTimeInstantThe time this operation began.
EndTimeInstantThe time this operation finished.
UsageNanoslongHow long, in nanoseconds, this operation took.
CpuNanoslongHow long, in nanoseconds, this operation used the CPU.
UserCpuNanoslongHow long, in nanoseconds, this operation used the CPU in user-space. System time can be derived by subtracting UserCpuNanos from CpuNanos.
FreeMemorylongThe free memory at the end of this operation.
TotalMemorylongThe total size of the JVM heap memory at the end of this operation.
FreeMemoryChangelongThe change in free memory during this operation. Multiple threads may execute in parallel, including other operations, therefore the change in JVM memory may not be reliable. To definitively determine operations that are using significant heap, the FreeMemoryChange column should be used in concert with profiling and the AllocatedBytes column (which also includes memory allocated but subject to GC during or after the operation is complete).
TotalMemoryChangelongThe change in total JVM heap memory during this operation.
CollectionslongThe number of garbage collections while executing this operation.
CollectionTimeNanoslongThe number of nanoseconds spent in the garbage collector while executing this operation.
AllocatedByteslongThe number of bytes allocated by this operation.
PoolAllocatedByteslongThe number of bytes allocated from pools by this operation.
WasInterruptedbooleanWas this operation interrupted?
ExceptionStringIf this query produced an exception, the Exception information. Otherwise null.
AuthContextStringThe authentication context used for executing this operation.

Update Performance Log

The update performance log (UPL) describes the time and resources used by the UpdateGraphProcessor to maintain incremental results for table operations over a series of one-minute intervals. The interval length can be changed if desired. At the end of the interval, operations where updates happen are logged. Each operation takes one row.

The EntryDescription column identifies the operations themselves. IntervalStartTime and IntervalEndTime show the beginning and end of the interval over which the operations were logged. UsageNanos and CpuNanos describe how long the operation took in total time and CPU time, respectively, in nanoseconds. Several columns detail data about row usage. Next, there are several rows that describe memory usage and garbage collection. Lastly, there are two rows that cover the auth context and update graph used by the operations.

Column NameData TypeDescription
EntryIDlongA numeric identifier for this performance entry, which can be used to identify the same operation across cycles.
EvaluationNumberlongThe evaluation number of the corresponding entry in the QueryOperationPerformanceLogLogger
OperationNumberintThe operation number of the corresponding entry in the QueryOperationPerformanceLogLogger
EntryDescriptionStringA textual description of the operation, generally including the name and some of the arguments.
EntryCallerLineStringThe file and line number where the operation was initiated, if available.
IntervalStartTimeInstantThe beginning of the interval this row represents.
IntervalEndTimeInstantThe end of the interval this row represents.
UsageNanoslongHow long, in nanoseconds, this operation took in this interval.
CpuNanoslongHow long, in nanoseconds, this operation used the CPU in this interval.
UserCpuNanoslongHow long, in nanoseconds, this operation used the CPU in user-space in this interval. System time can be derived by subtracting UserCpuNanos from CpuNanos.
RowsAddedlongHow many rows were added in this interval.
RowsRemovedlongHow many rows were removed in this interval.
RowsModifiedlongHow many rows were modified in this interval.
RowsShiftedlongHow many rows were shifted in this interval (this may not be an exact count, a sparse region may be shifted and the number of present rows is not calculated).
InvocationCountlongHow many times this entry was executed in this interval.
MinFreeMemorylongThe minimum memory at the end of an operation.
MaxTotalMemorylongThe maximum total memory at the end of an operation.
CollectionslongThe number of garbage collections while executing this operation in this interval.
CollectionTimeNanoslongThe number of nanoseconds spent in the garbage collector while executing this operation in this interval.
AllocatedByteslongThe number of bytes allocated by this operation in this interval.
PoolAllocatedByteslongThe number of bytes allocated from pools by this operation in this interval.
AuthContextStringThe authentication context used for executing this operation.
UpdateGraphStringThe name of the update graph for this entry.

Query Operation Performance Log

The Query Operation Performance Log (QOPL) contains performance data for individual subqueries. It is similar to the Update Performance Log, however, unlike the UPL, the QOPL collects information about operations as they happen, rather than over an interval. The QOPL does not have an EntryDescription column, so you'll need to identify operations by their EvaluationNumber, which exists in the UPL.

The most significant columns in this table are:

  • StartTime — The time at which this operation started.
  • EndTime — The time at which this operation ended.
  • OperationNumber — Monotonically increasing sequence numbers for each operation of a query.
  • TimeSecs — The time (in seconds) that the operation took.
  • FreeMemory — The free memory at the end of this operation.
  • TotalMemory — The total size of the JVM heap memory at the end of this operation.
  • FreeMemoryChange - The change in free memory during this operation.
Column NameData TypeDescription
EvaluationNumberlongAn increasing numeric value for requests made to the worker.
`ParentEvaluationNumberlongWhen a query is part of a batch, the evaluation number of the enclosing parent.
OperationNumberintAn identifier for an operation within an evaluation.
ParentOperationNumberintWhen a query is part of a batch, the enclosing operation within the parent.
DepthintThe level of operation within a nested operation, zero when there is no parent.
SessionIdStringA unique identifier for the user-session performing the operation.
DescriptionStringA textual description of the operation, generally including the name and some of the arguments.
CallerLineStringThe file and line number where the operation was initiated, if available.
IsCompilationbooleanDoes this represent a formula compilation operation?
StartTimeInstantThe time this operation began.
EndTimeInstantThe time this operation finished.
UsageNanoslongHow long, in nanoseconds, this operation took.
CpuNanoslongHow long, in nanoseconds, this operation used the CPU.
UserCpuNanoslongHow long, in nanoseconds, this operation used the CPU in user-space. System time can be derived by subtracting UserCpuNanos from CpuNanos.
FreeMemorylongThe free memory at the end of this operation.
TotalMemorylongThe total size of the JVM heap memory at the end of this operation.
FreeMemoryChangelongThe change in free memory during this operation.
TotalMemoryChangelongThe change in total JVM heap memory during this operation.
CollectionslongThe number of garbage collections while executing this operation.
CollectionTimeNanoslongThe number of nanoseconds spent in the garbage collector while executing this operation.
AllocatedByteslongThe number of bytes allocated by this operation.
PoolAllocatedByteslongThe number of bytes allocated from pools by this operation.
InputSizeLonglongThe number of rows in the input for this operation.
WasInterruptedbooleanWas this operation interrupted?
AuthContextStringThe authentication context used for executing this operation.

Process Metrics Log

This table contains metrics collected for the current Deephaven engine process.

Column NameData TypeDescription
ProcessUniqueIdStringThe process's unique identifier String.
TimestampInstantA Java Instant representing the exact time the process ran.
NameStringThe name of the metric being reported.
IntervalStringThe length of the interval being reported (for example, "1s" or "15m").
TypeStringThe type of the process - state, counter, etc.
NlongThe number of samples.
SumlongThe sum of all samples in this interval.
LastlongThe last sample in this interval.
MinlongThe minimum sample in this interval.
MaxlongThe maximum sample in this interval.
AvglongThe mean of all samples in this interval.
Sum2longThe sum of the squares of all samples in this interval.
StdDevlongThe standard deviation of all samples in this interval.

Process Info Log

This static table contains process information for the current Deephaven engine process.

Column NameData TypeDescription
IdStringAn identifier string for this process.
TypeStringThe type of process performed in this row.
KeyStringA key describing the process or property.
ValueStringThe value of the property or process described by the Key.

Server State Log

This table contains JVM info about memory utilization, the Periodic Update Graph (UG), and garbage collection statistics sampled on a periodic basis.

Column NameData TypeDescription
IntervalStartTimeInstantThe start time of the interval.
IntervalDurationMicroslongThe number of microseconds in the interval.
TotalMemoryMiBintThe amount of memory allocated to the JVM.
FreeMemoryMiBintThe amount of free memory in the JVM.
IntervalCollectionsshortThe number of garbage collection events in the interval.
IntervalCollectionTimeMicrosintThe number of microseconds (approximately) spent collecting garbage in the interval.
IntervalUGPCyclesOnBudgetshortThe number of update graph processor cycles on budget in the interval.
IntervalUGPCyclesTimeMicrosStringThe duration of updage graph processor cycles, in microseconds in the interval.
IntervalUGPCyclesSafePointsshortThe number of safe points in the interval.
IntervalUGPCyclesSafePointTimeMicrosintThe number of microseconds spent in safe points in the interval.

Query Performance Tree Table

This tree table contains Deephaven query performance data. It displays information that can also be found in the Query Performance Log, in tree table form. It is similar to a traditional call graph, placing every operation with a parent operation into expandable groups. For example, we can click on the row where EvaluationNumber = 20 to view the sub-operation with EvaluationNumber = 21:

img

Performance data for individual sub-operations as a tree table is available from calling query_operation_performance_tree_table.

Column NameData TypeDescription
EvaluationNumberlongThe evaluation number of the corresponding entry in the QueryOperationPerformanceLogLogger.
ParentEvaluationNumberlongThe evaluation number of this row's parent in the QueryOperationPerformanceLogLogger.
SessionIdStringA unique identifier for the user-session performing the operation.
DescriptionStringA textual description of the operation, generally including the name and some of the arguments.
StartTimeInstantThe time this operation began.
EndTimeInstantThe time this operation finished.
UsageNanoslongHow long, in nanoseconds, this operation took.
CpuNanoslongHow long, in nanoseconds, this operation used the CPU.
UserCpuNanoslongHow long, in nanoseconds, this operation used the CPU in user-space. System time can be derived by subtracting UserCpuNanos from CpuNanos.
FreeMemorylongThe free memory at the end of this operation.
TotalMemorylongThe total size of the JVM heap memory at the end of this operation.
FreeMemoryChangelongThe change in free memory during this operation.
TotalMemoryChangelongThe change in total JVM heap memory during this operation.
CollectionslongThe number of garbage collections while executing this operation in this interval.
CollectionTimeNanoslongThe number of nanoseconds spent in the garbage collector while executing this operation.
AllocatedByteslongThe number of bytes allocated by this operation.
PoolAllocatedByteslongThe number of bytes allocated from pools by this operation.
WasInterruptedbooleanWas this operation interrupted?
ExceptionStringIf this query produced an exception, the Exception information. Otherwise null.
AuthContextStringThe authentication context used for executing this operation.

Query Operation Performance Tree Table

This tree table contains Deephaven performance data on how long each individual operation of a query (where, update, natural_join, etc., as well as internal functions) takes to execute, and the change in resource consumption while each was executing. It displays information that can also be found in the Query Operation Performance Log, in tree table form. It is similar to a traditional call graph, placing every operation with a parent operation into expandable groups.

img

Column NameData TypeDescription
EvalKeyStringA numerical string that corresponds to this row's EvaluationNumber.
ParentEvalKeyStringThe EvalKey of this row's parent, used to construct the tree.
SessionIdStringA unique identifier for the user-session performing the operation.
DescriptionStringA textual description of the operation, generally including the name and some of the arguments.
StartTimeInstantThe time this operation began.
EndTimeInstantThe time this operation finished.
UsageNanoslongHow long, in nanoseconds, this operation took.
CpuNanoslongHow long, in nanoseconds, this operation used the CPU.
UserCpuNanoslongHow long, in nanoseconds, this operation used the CPU in user-space. System time can be derived by subtracting UserCpuNanos from CpuNanos.
FreeMemorylongThe free memory at the end of this operation.
TotalMemorylongThe total size of the JVM heap memory at the end of this operation.
FreeMemoryChangelongThe change in free memory during this operation.
TotalMemoryChangelongThe change in total JVM heap memory during this operation.
CollectionslongThe number of garbage collections while executing this operation in this interval.
CollectionTimeNanoslongThe number of nanoseconds spent in the garbage collector while executing this operation.
AllocatedByteslongThe number of bytes allocated by this operation.
PoolAllocatedByteslongThe number of bytes allocated from pools by this operation.
WasInterruptedbooleanWas this operation interrupted?
ExceptionStringIf this query produced an exception, the Exception information. Otherwise null.
AuthContextStringThe authentication context used for executing this operation.
DepthintHow deep in the tree structure the given row is. A row with no parents has a depth of 0; a row with one parent has a depth of 1; and so on.
CallerLineStringThe file and line number where the operation was initiated, if available.
IsCompilationbooleanWhether the operation on this row was a compile-type operation.
InputSizeLonglongThe number of rows in the input tables.
EvaluationNumberlongThe evaluation number of the corresponding entry in the QueryOperationPerformanceLogLogger
ParentEvaluationNumberlongThe evaluation number of this row's parent in the QueryOperationPerformanceLogLogger.
OperationNumberintAn identifier for an operation within an evaluation.
ParentOperationNumberintWhen a query is part of a batch, the enclosing operation within the parent.

Example use case

The tables available in perfmon contain a lot of information, and it can seem overwhelming without some context. Let's explore an example use case, particularly the Update Performance Log and Query Operation Performance Log.

Below is the entire query.

Performance table query
import deephaven.perfmon as pm

# Key perf tables for this example
upl = pm.update_performance_log()
qopl = pm.query_operation_performance_log()

# Other perf tables that may be relevant in other cases
qpl = pm.query_performance_log()
pml = pm.process_metrics_log()
pil = pm.process_info_log()
ssl = pm.server_state_log()
qptt = pm.query_performance_tree_table()
qoptt = pm.query_operation_performance_tree_table()

import time
from deephaven import empty_table, time_table


def fast(x: int) -> int:
return int(x * (x + 1) / 2)


def slow(x: int) -> int:
time.sleep(0.25)
return int(x * (x + 1) / 2)


# Static table and query startup performance go to qopl table
t = empty_table(100).update_view(["A = ii"])
t2 = t.update(["X = fast(A)"])
t3 = t2.update(["Y = slow(A)"])

# Filtered and highlighted perf data to look at for the static or initialization cases
qopl_update = qopl.update_view(["DurationNanos = EndTime - StartTime"]).move_columns_up(
[
"Description",
"StartTime",
"EndTime",
"DurationNanos",
"CpuNanos",
"UserCpuNanos",
"FreeMemoryChange",
"TotalMemoryChange",
]
)

# Dynamic table updates go to the upl table
td = time_table("PT00:00:01").update_view(["A = ii"])
td2 = td.update(["X = fast(A)"])
td3 = td2.update(["Y = slow(A)"])

# Filtered and highlighted perf data to look at for the dynamic case
upl_update = (
upl.update_view(
[
"IntervalDurationNanos = IntervalEndTime - IntervalStartTime",
"Ratio = UsageNanos / IntervalDurationNanos",
]
)
.move_columns_up(
[
"EntryDescription",
"IntervalStartTime",
"IntervalEndTime",
"IntervalDurationNanos",
"UsageNanos",
"CpuNanos",
"UserCpuNanos",
"Ratio",
]
)
.sort_descending(["Ratio"])
)

There are a couple of key pieces in the above query worth noting:

  • The functions fast and slow do the same thing. slow is slow because it waits for 1/4 of a second to perform and return the calculation.
  • upl_update adds two columns: IntervalDurationNanos and Ratio. Ratio calculates the percent of the interval that was spent performing a particular operation. The table is also sorted by Ratio in descending order so that the operations that take the most time show up at the top.

img

In the above image, the Update([Y]) (the slow function) takes a whopping 25% of the total interval time. This makes sense based on what's happening; every second, a new row is added to the ticking tables. For a quarter of that second, the Deephaven engine sits around waiting for time.sleep(0.25) to be finished. Clearly, slow is a bottleneck in this particular code. In real applications, it won't be as obvious why something causes a slowdown. Still, this example query is a simple way to identify areas where performance improvements will be more significant.

Using performance tree tables

The Query Performance Tree Table and Query Operation Performance Tree Table can be useful tools for pinpointing the cause of unexpected slow performance in your queries. A common use case is to sort the table by the UsageNanos column so that the slowest operations appear at the top of the table:

img

Expanding the parent rows in the tree table displays the sub-operations that contribute to the parent operation's total time, allowing you to identify specifically which computations and sub-operations could be causing performance issues:

img

Get help

If you can’t find an obvious cause for an error or slow performance in your query, you may have uncovered a bug. In this case, you should file a bug report at https://github.com/deephaven/deephaven-core/issues. Be sure to include the following with your bug report:

  • The version of the Deephaven system.
  • The complete stack trace, not just the list of Caused By expressions.
  • A code snippet or a description of the query.
  • Any logs from your user console or, if running Docker, from your terminal.
  • Support logs exported from the browser. In the Settings menu, click the Export Logs button to download a zip with the browser logs and the current state of the application.

You can also get help by asking questions in our Slack Community.