Process startup troubleshooting
All Deephaven processes create log files. If a process does not start correctly, these log files might show the cause.
Log files will be written to /var/log/deephaven/<process_name> or /var/log/deephaven/misc. If a process lacks write permission or there is insufficient disk space in those folders, logs will be written to /tmp instead. This is a common cause of logs unexpectedly appearing in /tmp.
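The fallback to /tmp described above can be anticipated by checking the log directory before a process is started. The following is a minimal Python sketch, not a Deephaven utility; the function name log_dir_status and the 100 MB free-space threshold are illustrative assumptions.

```python
import os
import shutil


def log_dir_status(path, min_free_bytes=100 * 1024 * 1024):
    """Report whether `path` looks usable as a log directory.

    The 100 MB default threshold is an arbitrary illustration, not a
    Deephaven requirement.
    """
    if not os.path.isdir(path):
        return {"exists": False, "writable": False, "free_bytes": 0, "ok": False}
    # os.access checks the permissions of the user running this script,
    # so run it as the same account that runs the Deephaven process.
    writable = os.access(path, os.W_OK | os.X_OK)
    free_bytes = shutil.disk_usage(path).free
    return {
        "exists": True,
        "writable": writable,
        "free_bytes": free_bytes,
        "ok": writable and free_bytes >= min_free_bytes,
    }


if __name__ == "__main__":
    for directory in ("/var/log/deephaven/misc", "/tmp"):
        print(directory, log_dir_status(directory))
```

A result with "ok" set to False for the expected log directory suggests that new log files may be appearing in /tmp instead.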
All processes create a log file named <MainJavaClass>.log.<datetime> or <proc_name>.log.<datetime>. These log files will periodically roll over, typically every 15 minutes. This interval and retention can be customized via logging configuration. For convenience, look for a symbolic link to the most current file, <prefix>.log.current.
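As a rough illustration of locating the newest log, the sketch below prefers the <prefix>.log.current symbolic link and falls back to the most recently modified <prefix>.log.* file. The function name current_log is hypothetical, and the fallback by modification time is an assumption about what "most current" means when the link is absent.

```python
from pathlib import Path


def current_log(log_dir, prefix):
    """Return the path of the most current log file for `prefix`, or None.

    Prefers the <prefix>.log.current symbolic link; if it is missing or
    dangling, falls back to the most recently modified <prefix>.log.* file.
    """
    directory = Path(log_dir)
    link = directory / f"{prefix}.log.current"
    if link.exists():  # False for a dangling symlink
        return link.resolve()
    candidates = [p for p in directory.glob(f"{prefix}.log.*") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, current_log("/var/log/deephaven/<process_name>", "<prefix>") with the placeholders filled in returns the file to open first, or None if no matching log exists.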
Starting a Deephaven service creates a launch log file named <proc_name>.log.<date>. This file contains the logging that precedes the start of the Java process, including the standard output and standard error of the process, as well as the actual command line and all options. If the Java process fails to start, this will be the only log file. All launch activity for a given date is logged to the same file.
Some command line utilities and startup scripts support a --verbose argument for more detailed logging. For example, the deephaven-jetty.sh and other service launch scripts may accept --verbose to increase log detail. Refer to the specific utility's help message or documentation for details.
Common Startup Failures
Below are some typical reasons why a Deephaven process may fail to start. For each, check the log files described above for relevant error messages.
- Port conflicts: Another process is using a required port. Look for errors such as "Address already in use".
- Missing dependencies: Required libraries or files are not present. Look for "No such file or directory" or "ClassNotFoundException".
- Configuration errors: Syntax errors or missing fields in configuration files. Logs may show "Invalid configuration" or "Parse error".
- Insufficient system resources: Not enough memory, CPU, or disk. Errors may include "OutOfMemoryError" or "No space left on device".
- Permission issues: The process cannot access necessary files or directories. Look for "Permission denied" errors.
- Environment variables unset: Required environment variables are missing. Logs may show "Environment variable not set" or similar messages.
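The message fragments listed above can be searched for mechanically. The sketch below is a hedged Python illustration rather than a Deephaven tool: the category names and the functions classify_line and scan_log are hypothetical, and actual log lines may word these errors differently, so a match is a hint rather than a diagnosis.

```python
import re

# Message fragments taken from the list above; real log lines may word
# these differently, so treat a match as a hint rather than a diagnosis.
PATTERNS = {
    "port conflict": r"Address already in use",
    "missing dependency": r"No such file or directory|ClassNotFoundException",
    "configuration error": r"Invalid configuration|Parse error",
    "insufficient resources": r"OutOfMemoryError|No space left on device",
    "permission issue": r"Permission denied",
    "environment variable unset": r"Environment variable not set",
}


def classify_line(line):
    """Return the categories whose pattern occurs in `line`."""
    return [name for name, pattern in PATTERNS.items() if re.search(pattern, line)]


def scan_log(lines):
    """Yield (line_number, categories, line) for every matching line."""
    for number, line in enumerate(lines, start=1):
        categories = classify_line(line)
        if categories:
            yield number, categories, line.rstrip("\n")
```

Because scan_log accepts any iterable of lines, an open file object can be passed directly, for example a launch log opened with open(path, errors="replace").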
Example Log Snippets
Below are examples of common startup errors and how to interpret them:
- Port conflict:
  Action: Identify and stop the conflicting process, or reconfigure the Deephaven service to use a different port.
- Missing dependency:
  Action: Verify the installation integrity and ensure all necessary libraries are present in the correct locations (e.g., classpath, module path).
- Configuration error:
  Action: Review the relevant configuration file, correct the syntax or missing field, and ensure it conforms to the expected structure.
- Out of memory:
  Action: Increase the JVM heap size allocated to the process (e.g., via -Xmx in startup parameters) or ensure the host system has sufficient available memory.
- Permission denied:
  Action: Verify that the user account running the Deephaven process has the necessary read/write/execute permissions for the specified file or directory.
Troubleshooting Checklist
Follow these steps if a Deephaven process does not start:
- Check the log files for errors or warnings (see above for locations).
- Verify configuration files for syntax and completeness.
- Ensure required ports are free. Key Deephaven ports include etcd (default: 2379, 2380). For a comprehensive list, refer to the Deephaven process ports section in the Ops Guide.
- Check system resources (memory, CPU, disk space).
- Verify permissions on log and configuration directories.
- Confirm all required environment variables are set.
- Restart the process after addressing any issues found.
- Consult related documentation for component-specific troubleshooting (see below).
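The port check in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name port_in_use is hypothetical, and the check only detects a listener that accepts TCP connections on the given host address, so a service bound to a different interface would not be detected.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Default etcd ports mentioned in the checklist above.
    for port in (2379, 2380):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

A port reported as in use before the Deephaven process has started points to a conflicting process that should be identified before the restart.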
Environment-Specific Considerations
- Docker:
  - Logs may be accessed via docker logs <container> or by mounting a host directory.
  - Ensure volume mounts have correct permissions.
- Kubernetes:
  - Use kubectl logs <pod> to access logs.
  - Log files may be stored in ephemeral storage unless persistent volumes are configured.
- Bare-metal:
  - Logs are written directly to the file system as described above.
  - Ensure the process runs as a user with appropriate permissions.
Deephaven services
When Deephaven services managed by M/Monit fail to start, their status in the output of /usr/illumon/latest/bin/dh_monit summary usually cycles between Does not exist, Initializing, and Execution failed. When investigating the cause of a startup failure, start with the process or processes that the most other processes depend on (see Process dependencies). For example, if the configuration_server does not start, nothing else can start.
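To pick out affected services from a captured summary, a simple substring match on the three status strings can be used. The sketch below is a hedged Python illustration: the function name failing_lines is hypothetical, and no particular column layout of the summary output is assumed. A single snapshot cannot show cycling, so comparing several runs taken a few seconds apart is more informative.

```python
FAILING_STATUSES = ("Does not exist", "Initializing", "Execution failed")


def failing_lines(summary_text):
    """Return the lines of `summary_text` that mention a failing status.

    Only substring matching is used; no particular column layout of the
    summary output is assumed.
    """
    return [
        line.strip()
        for line in summary_text.splitlines()
        if any(status in line for status in FAILING_STATUSES)
    ]
```

The input text can be obtained by capturing the output of /usr/illumon/latest/bin/dh_monit summary, for example with subprocess.run and capture_output=True, text=True.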
For the particular service being investigated:
- Start with the process's launch log: /var/log/deephaven/<process_name>/<process_name>.log.<date>
- If the process starts (as indicated by the launch log), but then dies or restarts, check the process's detailed log: /var/log/deephaven/<process_name>/<class or process_name>.log.current
In the rare case where a process fails to launch and there is little or no information in the launch log, it may be possible to learn more by interactively launching the process. This is an advanced troubleshooting step that bypasses M/Monit's automatic management for this specific attempt, providing direct console output.
Process launch commands are stored in .conf files in /etc/sysconfig/illumon.d/monit. For example:
When running as irisadmin (e.g., via sudo su - irisadmin), the iris_controller can be started interactively with:
This will echo to stdout all the details as the iris script attempts to launch the iris_controller process. Observe this output carefully for errors not captured in standard logs.