Process startup troubleshooting
All Deephaven processes create log files. If a process does not start correctly, these log files might show the cause.
Log files are written to /var/log/deephaven/<process_name> or /var/log/deephaven/misc. If a process lacks write permission or there is insufficient disk space in those folders, logs are written to /tmp instead. This is a common cause of logs unexpectedly appearing in /tmp.
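Because a missing write permission or a full disk redirects logs to /tmp, it can help to test a log directory directly. The following is a minimal sketch, not a Deephaven tool: check_log_dir is a hypothetical helper name, and the 10240 KB default threshold is an arbitrary assumption. Run it as the account that runs the process, since the writability test reflects the current user.

```shell
# check_log_dir DIR [MIN_FREE_KB]
# Hypothetical helper: succeeds only if DIR exists, is writable by the
# current user, and has at least MIN_FREE_KB kilobytes available.
# The 10240 KB default is an arbitrary assumption, not a Deephaven value.
check_log_dir() {
    dir=$1
    min_free_kb=${2:-10240}
    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "$dir: not writable by $(id -un)" >&2
        return 1
    fi
    # df -Pk prints POSIX-format output in 1 KB blocks; the fourth
    # column of the second line is the available space.
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$free_kb" -lt "$min_free_kb" ]; then
        echo "$dir: only ${free_kb} KB free" >&2
        return 1
    fi
    echo "$dir: ok (${free_kb} KB free)"
}
```

For example, check_log_dir /var/log/deephaven/misc prints either the available space or the reason the directory is unusable.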
All processes create a log file named <MainJavaClass>.log.<datetime> or <proc_name>.log.<datetime>. These log files roll over periodically, typically every 15 minutes; the rollover interval and retention can be customized via logging configuration. For convenience, look for the symbolic link <prefix>.log.current, which points to the most current file.
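Since log files roll over frequently, locating the active one by hand is error-prone. The sketch below prefers the <prefix>.log.current link and, if that link is missing or dangling, falls back to the most recently modified file with the same prefix. latest_log is a hypothetical helper name, and the fallback assumes file names without embedded newlines.

```shell
# latest_log DIR PREFIX
# Hypothetical helper: prints the path of the most current log file
# for PREFIX in DIR, or returns non-zero if none is found.
latest_log() {
    log_dir=$1
    prefix=$2
    current="$log_dir/$prefix.log.current"
    # -e follows symbolic links, so a dangling link is treated as absent.
    if [ -e "$current" ]; then
        echo "$current"
        return 0
    fi
    # Fallback: newest file by modification time, ignoring the link itself.
    newest=$(ls -t "$log_dir/$prefix".log.* 2>/dev/null \
        | grep -v '\.log\.current$' \
        | head -n 1)
    [ -n "$newest" ] && echo "$newest"
}
```

A typical use is tail -n 50 "$(latest_log /var/log/deephaven/<process_name> <prefix>)", with the placeholders replaced by the real process name and prefix.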
Starting a Deephaven service creates a launch log file named <proc_name>.log.<date>. This file contains the logging that precedes the start of the Java process, including the standard output and standard error of the process, as well as the actual command line and all options. If the Java process fails to start, this will be the only log file. All launch activity for a given date is logged to the same file.
Some command line utilities and startup scripts support a --verbose argument for more detailed logging. For example, deephaven-jetty.sh and other service launch scripts may accept --verbose to increase log detail. Refer to the specific utility's help message or documentation for details.
Common Startup Failures
Below are some typical reasons why a Deephaven process may fail to start. For each, check the log files described above for relevant error messages.
- Port conflicts: Another process is using a required port. Look for errors such as "Address already in use".
- Missing dependencies: Required libraries or files are not present. Look for "No such file or directory" or "ClassNotFoundException".
- Configuration errors: Syntax errors or missing fields in configuration files. Logs may show "Invalid configuration" or "Parse error".
- Insufficient system resources: Not enough memory, CPU, or disk. Errors may include "OutOfMemoryError" or "No space left on device".
- Permission issues: The process cannot access necessary files or directories. Look for "Permission denied" errors.
- Environment variables unset: Required environment variables are missing. Logs may show "Environment variable not set" or similar messages.
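The messages listed above can be searched for in a single pass over one or more log files. This sketch assumes only standard grep; scan_startup_errors is a hypothetical helper name, and the patterns are the example strings from the list, so the wording in real logs may differ.

```shell
# scan_startup_errors FILE...
# Hypothetical helper: prints every line (with its line number) that
# contains one of the example error strings from the list above.
# Returns non-zero when nothing matches.
scan_startup_errors() {
    grep -nE \
        -e 'Address already in use' \
        -e 'No such file or directory|ClassNotFoundException' \
        -e 'Invalid configuration|Parse error' \
        -e 'OutOfMemoryError|No space left on device' \
        -e 'Permission denied' \
        -e 'Environment variable not set' \
        "$@"
}
```

With a single file, each match is printed as line-number:text; with several files, grep also prefixes the file name.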
Example Log Snippets
Below are examples of common startup errors and how to interpret them:
- Port conflict:
  java.net.BindException: Address already in use
  Action: Identify and stop the conflicting process, or reconfigure the Deephaven service to use a different port.
- Missing dependency:
  java.lang.ClassNotFoundException: com.deephaven.SomeRequiredClass
  Action: Verify the installation integrity and ensure all necessary libraries are present in the correct locations (e.g., classpath, module path).
- Configuration error:
  ERROR Invalid configuration: missing field 'auth'
  Action: Review the relevant configuration file, correct the syntax or missing field, and ensure it conforms to the expected structure.
- Out of memory:
  java.lang.OutOfMemoryError: Java heap space
  Action: Increase the JVM heap size allocated to the process (e.g., via -Xmx in startup parameters) or ensure the host system has sufficient available memory.
- Permission denied:
  java.io.FileNotFoundException: /var/log/deephaven/myservice.log (Permission denied)
  Action: Verify that the user account running the Deephaven process has the necessary read/write/execute permissions for the specified file or directory.
Troubleshooting Checklist
Follow these steps if a Deephaven process does not start:
- Check the log files for errors or warnings (see above for locations).
- Verify configuration files for syntax and completeness.
- Ensure required ports are free. Key Deephaven ports include etcd (default: 2379, 2380). For a comprehensive list, refer to the Deephaven process ports section in the Ops Guide.
- Check system resources (memory, CPU, disk space).
- Verify permissions on log and configuration directories.
- Confirm all required environment variables are set.
- Restart the process after addressing any issues found.
- Consult related documentation for component-specific troubleshooting (see below).
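Two of the checklist steps, checking ports and checking environment variables, lend themselves to small scripts. The sketch below is illustrative only: ports_in_use and missing_env are hypothetical helper names, ports_in_use assumes the column layout of ss -ltn from Linux iproute2 (local address in the fourth column), and this document does not name the required environment variables, so those must be supplied by the reader.

```shell
# ports_in_use PORT...
# Hypothetical helper: reads `ss -ltn` output on stdin and prints each
# PORT from the argument list that appears as a local listening port.
ports_in_use() {
    listing=$(cat)
    for port in "$@"; do
        # Match ":PORT" at the end of the local address column,
        # e.g. 0.0.0.0:2379 or [::]:2379, but not :12379.
        if printf '%s\n' "$listing" |
            awk -v p="$port" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'
        then
            echo "$port"
        fi
    done
}

# missing_env NAME...
# Hypothetical helper: prints each NAME that is not present in the
# environment. A variable set to an empty value counts as present.
missing_env() {
    for name in "$@"; do
        printenv "$name" >/dev/null || echo "$name"
    done
}
```

For the etcd defaults mentioned above, ss -ltn | ports_in_use 2379 2380 prints whichever of the two ports already has a listener. Empty output from either helper means no problem was found.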
Environment-Specific Considerations
- Docker:
  - Logs may be accessed via docker logs <container> or by mounting a host directory.
  - Ensure volume mounts have correct permissions.
- Kubernetes:
  - Use kubectl logs <pod> to access logs.
  - Log files may be stored in ephemeral storage unless persistent volumes are configured.
- Bare-metal:
  - Logs are written directly to the file system as described above.
  - Ensure the process runs as a user with appropriate permissions.
Deephaven services
When Deephaven services managed by M/Monit fail to start, the output of /usr/illumon/latest/bin/dh_monit summary usually shows a status that cycles between Does not exist, Initializing, and Execution failed. When investigating the cause of a startup failure, start with the process(es) that the most other processes depend on (see Process dependencies). For example, if the configuration_server does not start, nothing else can start.
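To pick out the services that are stuck, the summary output can be filtered for the three statuses named above. This is a minimal sketch: failing_services is a hypothetical helper name, and it assumes only that each service's status text appears on the same line as its name; the exact layout of the summary output is not specified here.

```shell
# failing_services
# Hypothetical helper: reads a status summary on stdin and prints only
# the lines that show one of the failure-cycle statuses.
# Returns non-zero when no line matches.
failing_services() {
    grep -E 'Does not exist|Initializing|Execution failed'
}
```

For example, /usr/illumon/latest/bin/dh_monit summary | failing_services lists the candidates; running it a few times in a row shows whether a service is cycling between statuses or settling.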
For the particular service being investigated:
- Start with the process's launch log:
/var/log/deephaven/<process_name>/<process_name>.log.<date>
- If the process starts (as indicated by the launch log), but then dies or restarts, check the process's detailed log:
/var/log/deephaven/<process_name>/<class or process_name>.log.current
In the rare case where a process fails to launch and there is little or no information in the launch log, it may be possible to learn more by interactively launching the process. This is an advanced troubleshooting step that bypasses M/Monit's automatic management for this specific attempt, providing direct console output.
Process launch commands are stored in .conf files in /etc/sysconfig/illumon.d/monit. For example:
cat /etc/sysconfig/illumon.d/monit/iris_controller.conf
check process iris_controller with pidfile /etc/deephaven/run/iris_controller.pid
start program = "/usr/illumon/latest/bin/iris start iris_controller"
stop program = "/usr/illumon/latest/bin/iris stop iris_controller"
Running as irisadmin (e.g., via sudo su - irisadmin), the iris_controller can be started interactively with:
/usr/illumon/latest/bin/iris --debug start iris_controller
This echoes to stdout all the details as the iris script attempts to launch the iris_controller process. Observe this output carefully for errors not captured in the standard logs.
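Debug output scrolls by quickly, so it is often worth saving a copy while still watching it live. The sketch below is one generic way to do that and is not part of the iris script: run_and_capture is a hypothetical helper name, and the output file path is chosen by the caller.

```shell
# run_and_capture OUT_FILE COMMAND [ARG...]
# Hypothetical helper: runs COMMAND, shows its combined stdout and
# stderr on the terminal, saves a copy to OUT_FILE, and returns the
# command's own exit status (not tee's).
run_and_capture() {
    out_file=$1
    shift
    status_file=$(mktemp) || return 1
    # The exit status is written to a temporary file because the
    # status of a pipeline is that of its last command (tee).
    { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$out_file"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}
```

For example, run_and_capture /tmp/iris_controller_debug.log /usr/illumon/latest/bin/iris --debug start iris_controller keeps the full debug trace in a file that can be searched or shared afterwards.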