Troubleshoot Java processes
Since Deephaven's core components and user queries operate as Java processes, a deep understanding of Java's diagnostic tools is essential for effective troubleshooting. This guide covers several standard Java utilities that allow you to inspect the state of a running process at a low level. The techniques described here are invaluable for diagnosing complex issues related to process configuration, SSL/TLS connectivity, and performance bottlenecks by analyzing thread activity.
Debug process configuration
Use the sudo jinfo <pid> command to print configuration information for a running Java process, such as its CLASSPATH, other system properties, and JVM flags.
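jinfo also accepts options that narrow its output: -sysprops prints only the Java system properties, and -flags prints only the JVM flags. As a minimal sketch, the hypothetical helper below (classpath_entries is not part of the JDK or Deephaven) reads -sysprops output on standard input and prints one classpath entry per line. It assumes a Unix-style ':' path separator and that the property sits on a single line starting with java.class.path; the exact output layout varies between JDK versions.

```shell
# Hypothetical helper: list classpath entries from `jinfo -sysprops` output.
# Reads the property dump on stdin and prints one entry per line.
classpath_entries() {
  grep '^java\.class\.path' |
    sed -e 's/^java\.class\.path *= *//' -e 's/\\:/:/g' |
    tr ':' '\n'
}

# Usage (needs a running JVM, so it is shown as a comment only):
#   sudo jinfo -sysprops <pid> | classpath_entries
```

The second sed expression removes the backslash that some JDK versions place before each ':' in the property value.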
Debug SSL/TLS connections
Understanding SSL/TLS connection problems can sometimes be difficult, especially when it is not clear what messages are actually being sent and received.
You can get detailed SSL/TLS debug output, including certificate and handshake information, by adding -Djavax.net.debug=all to your Java launch parameters. Because this produces a large volume of log output, it is not enabled by default.
For the Deephaven launcher, this is best accomplished with jvmargs.txt. For server processes, you must add EXTRA_ARGS parameters to the hostconfig file.
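If all produces more output than you need, the javax.net.debug property also accepts narrower values; ssl:handshake, for example, limits the output to handshake messages. The fragment below is a minimal sketch for a standalone JVM. SSL_DEBUG_ARG and app.jar are illustrative names only, and Deephaven processes should receive the flag through jvmargs.txt or EXTRA_ARGS as described above.

```shell
# Narrower alternative to "all": log only the TLS handshake messages.
SSL_DEBUG_ARG="-Djavax.net.debug=ssl:handshake"

# Hypothetical standalone launch (app.jar is a placeholder, not a Deephaven artifact):
#   java "$SSL_DEBUG_ARG" -jar app.jar
```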
Heap dumps
A heap dump is a snapshot of the memory of a Java process. Heap dumps are crucial for diagnosing memory-related problems, such as memory leaks, garbage collection issues, and java.lang.OutOfMemoryError exceptions. The dump file can be analyzed with tools like Eclipse MAT or VisualVM to inspect the objects on the heap.
To create a heap dump, you can use the jmap utility:
sudo jmap -dump:format=b,file=<file-path> <pid>
file-path is the path where the heap dump file will be saved (e.g., /tmp/worker_heap.bin).
pid is the process ID of the Deephaven process.
For example, to create a heap dump of worker_1:
sudo jmap -dump:format=b,file=/tmp/worker_1.bin $(ps -ef | grep worker_1 | grep -v grep | awk '{print $2}' | head -n 1)
Caution
Generating a heap dump will pause the Java process, sometimes for a significant amount of time depending on the heap size. This will make the process unresponsive. Perform this action with caution on production systems.
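Because a heap dump file can be roughly as large as the heap in use, it can help to confirm that the target directory has enough free space before running jmap. The sketch below is a hypothetical helper (enough_space_for_dump is not a JDK or Deephaven tool) that compares the free space reported by df against a size you supply in KiB.

```shell
# Hypothetical helper: succeed only if <dir> has at least <needed-KiB> free.
# usage: enough_space_for_dump <dir> <needed-KiB>
enough_space_for_dump() {
  dump_dir=$1
  needed_kb=$2
  # df -Pk prints one line per filesystem; column 4 is the available space in KiB.
  avail_kb=$(df -Pk "$dump_dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge "$needed_kb" ]
}

# Example (8388608 KiB = 8 GiB is an arbitrary illustrative size):
#   enough_space_for_dump /tmp 8388608 && echo "enough space for the dump"
```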
Thread dumps
There are times during incident management when the Deephaven Support team may ask you to perform a thread dump of a particular Deephaven process. This does not impact the system in any way, but records valuable diagnostic information into the logs to be used in troubleshooting.
Perform a thread dump with jstack
To perform a thread dump with jstack, simply run:
sudo jstack -F <pid> > <file-path>
pid is the process ID of the Deephaven process.
file-path is the file path where the thread dump will be written.
For example, to perform a thread dump of worker_1, use ps, grep, and awk to capture the process ID:
sudo jstack -F $(ps -ef | grep worker_1 | grep -v grep | awk '{print $2}' | head -n 1) > /tmp/threadDumpWorker1.txt
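A single thread dump shows one instant; comparing several dumps taken a few seconds apart can make it easier to tell a thread that is stuck from one that is merely busy. The following is a minimal sketch of that pattern. capture_thread_dumps and the JSTACK_CMD override are hypothetical names introduced for illustration, and the default command is the same sudo jstack -F shown above.

```shell
# Hypothetical helper: write <count> thread dumps, <interval> seconds apart.
# usage: capture_thread_dumps <pid> <count> <interval-seconds> <output-dir>
capture_thread_dumps() {
  pid=$1
  count=$2
  interval=$3
  out_dir=$4
  i=1
  while [ "$i" -le "$count" ]; do
    # JSTACK_CMD is an illustrative override hook; unset, it runs sudo jstack -F.
    ${JSTACK_CMD:-sudo jstack -F} "$pid" > "$out_dir/threadDump_${pid}_${i}.txt"
    # Wait between dumps, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example: three dumps of process 12345, five seconds apart, written to /tmp
#   capture_thread_dumps 12345 3 5 /tmp
```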
Perform a thread dump with kill -3 (SIGQUIT)
In cases where you want the thread dump to go to the process logs, kill -3 can be used and the thread dump will be sent to the standard output stream of the process.
To perform a thread dump with kill -3, simply run:
sudo kill -3 <pid>
pid is the process ID of the Deephaven process.
For example, to perform a thread dump of worker_1, use ps, grep, and awk to capture the process ID:
sudo kill -3 $(ps -ef | grep worker_1 | grep -v grep | awk '{print $2}' | head -n 1)
By default, worker logs are sent to the Deephaven database along with the process's stdout and stderr output, with stderr redirected to stdout.
To view stdout logs for worker_d0498a61, the following query can be used (shown first in Groovy, then in Python):
t=db.liveTable("DbInternal", "ProcessEventLog").where("Date=today()", "Process=`worker_d0498a61`", "Level=`STDOUT`").sort("Timestamp")
t = (
    db.live_table("DbInternal", "ProcessEventLog")
    .where(["Date=today()", "Process=`worker_d0498a61`", "Level=`STDOUT`"])
    .sort("Timestamp")
)
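Once the thread dump text is available, for example after saving the relevant log output to a file, a quick tally of thread states can show where to look first. The sketch below assumes the standard HotSpot dump layout, in which each thread entry includes a line of the form java.lang.Thread.State: followed by the state name. thread_state_counts is a hypothetical helper name, not a Deephaven or JDK tool.

```shell
# Hypothetical helper: count threads per state in a HotSpot thread dump.
# Reads the dump text on stdin; prints "<STATE> <count>", most frequent first.
thread_state_counts() {
  grep 'java\.lang\.Thread\.State:' |
    sed 's/.*java\.lang\.Thread\.State: *//' |
    awk '{ print $1 }' |
    sort | uniq -c | sort -rn |
    awk '{ print $2, $1 }'
}

# Usage, assuming the dump was saved to a file of your choosing:
#   thread_state_counts < /tmp/threadDumpWorker1.txt
```

A large number of BLOCKED threads, for instance, may point to lock contention worth examining in the full dump.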
Analyzing garbage collection
Analyzing garbage collection (GC) logs is a powerful technique for diagnosing performance issues, latency spikes, and memory pressure in a Java application. By enabling GC logging, you can get detailed insights into heap usage, pause times, and the frequency of collection events.
To enable GC logging, you need to add specific flags to the Java launch command. A recommended configuration for modern Java versions (JDK 9+) is:
-Xlog:gc*:file=<gc-log-path>::filecount=<count>,filesize=<size>
gc-log-path: The path to the log file (e.g., /var/log/deephaven/gc.log).
count: The number of log files to rotate.
size: The size of each log file before it's rotated (e.g., 10M for 10 megabytes).
For example:
-Xlog:gc*:file=/var/log/deephaven/worker-gc.log::filecount=10,filesize=10M
Refer to the Modify Java launch parameters section for instructions on how to add these flags to the appropriate Deephaven process.
Once you have captured the GC logs, you can use tools like GCeasy or GCViewer to analyze and visualize them. Java Flight Recorder (JFR), which is built into the JDK, records GC events itself and can serve as a complementary source of GC data.
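For a quick first look before loading the log into a dedicated tool, the longest pause can be pulled out on the command line. The sketch below is a hypothetical helper (max_gc_pause_ms is an illustrative name) that assumes the unified logging layout used by JDK 9+, where each completed pause is reported on a line containing the word Pause and ending in a duration such as 12.345ms. Collectors and locales that format these lines differently would need an adjusted pattern.

```shell
# Hypothetical helper: print the longest GC pause, in milliseconds, from a
# unified-logging GC log read on stdin. Prints 0 if no pause lines are found.
max_gc_pause_ms() {
  awk '/ Pause / && /ms$/ {
         v = $NF               # last field, e.g. "12.345ms"
         sub(/ms$/, "", v)     # strip the unit
         if (v + 0 > max) max = v + 0
       }
       END { print max + 0 }'
}

# Usage with the example log path from this section:
#   max_gc_pause_ms < /var/log/deephaven/worker-gc.log
```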
Modify Java launch parameters
Many of the troubleshooting techniques described in this guide, such as enabling detailed logging, require adding arguments to the Java command line that launches a process. The method for setting these parameters varies depending on whether you are targeting a service, a worker, or an interactive script.
For detailed instructions on how to modify launch parameters for each process type, see the Java process launch configuration guide.