Process startup troubleshooting

All Deephaven processes create log files. If a process does not start correctly, these log files might show the cause.

Log files are written to /var/log/deephaven/<process_name> or /var/log/deephaven/misc. If a process lacks write permission or there is insufficient disk space in those folders, logs are written to /tmp instead, so logs that unexpectedly appear in /tmp often indicate one of these two problems.

All processes create a log file named <MainJavaClass>.log.<datetime> or <proc_name>.log.<datetime>. These log files roll over periodically, typically every 15 minutes; the interval and retention can be customized via logging configuration. For convenience, look for a symbolic link to the most current file, <prefix>.log.current.

Starting a Deephaven service also creates a launch log file named <proc_name>.log.<date>. This file contains the logging that precedes the start of the Java process, including the actual command line with all options and the standard output and standard error of the process. If the Java process fails to start, this will be the only log file. All launch activity for a date is logged to the same file.
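As a minimal sketch of the lookup described above, the following shell function lists each <prefix>.log.current symbolic link under a log root and shows the file it points to. The function name current_logs is hypothetical and not part of Deephaven, and the sketch assumes find and readlink are available on the host.

```shell
# Hypothetical helper, not part of Deephaven: list every *.log.current
# symbolic link under a log root, with the file each link points to.
# The log root defaults to /var/log/deephaven when no argument is given.
current_logs() {
    log_root=${1:-/var/log/deephaven}
    find "$log_root" -name '*.log.current' -type l 2>/dev/null |
    while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}
```

For example, current_logs /var/log/deephaven prints one line per link. A different directory can be passed as the first argument.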

Some command line utilities and startup scripts support a --verbose argument for more detailed logging. For example, the deephaven-jetty.sh and other service launch scripts may accept --verbose to increase log detail. Refer to the specific utility's help message or documentation for details.

Common Startup Failures

Below are some typical reasons why a Deephaven process may fail to start. For each, check the log files described above for relevant error messages.

  • Port conflicts: Another process is using a required port. Look for errors such as Address already in use.
  • Missing dependencies: Required libraries or files are not present. Look for No such file or directory or ClassNotFoundException.
  • Configuration errors: Syntax errors or missing fields in configuration files. Logs may show Invalid configuration or Parse error.
  • Insufficient system resources: Not enough memory, CPU, or disk. Errors may include OutOfMemoryError or No space left on device.
  • Permission issues: The process cannot access necessary files or directories. Look for Permission denied errors.
  • Environment variables unset: Required environment variables are missing. Logs may show Environment variable not set or similar messages.

Example Log Snippets

Below are examples of common startup errors and how to interpret them:

  • Port conflict:

    java.net.BindException: Address already in use
    

    Action: Identify and stop the conflicting process, or reconfigure the Deephaven service to use a different port.

  • Missing dependency:

    java.lang.ClassNotFoundException: com.deephaven.SomeRequiredClass
    

    Action: Verify the installation integrity and ensure all necessary libraries are present in the correct locations (e.g., classpath, module path).

  • Configuration error:

    ERROR Invalid configuration: missing field 'auth'
    

    Action: Review the relevant configuration file, correct the syntax or missing field, and ensure it conforms to the expected structure.

  • Out of memory:

    java.lang.OutOfMemoryError: Java heap space
    

    Action: Increase the JVM heap size allocated to the process (e.g., via -Xmx in startup parameters) or ensure the host system has sufficient available memory.

  • Permission denied:

    java.io.FileNotFoundException: /var/log/deephaven/myservice.log (Permission denied)
    

    Action: Verify that the user account running the Deephaven process has the necessary read/write/execute permissions for the specified file or directory.
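The error signatures named in this guide can be searched for in one pass. The sketch below assumes a hypothetical function name, scan_startup_errors, and simply wraps grep; the pattern list is taken from the messages mentioned above and is not exhaustive.

```shell
# Hypothetical helper, not part of Deephaven: print matching lines (with line
# numbers) for the startup error signatures named in this guide.
# Arguments: one or more log files. Returns non-zero when nothing matches.
scan_startup_errors() {
    patterns='Address already in use|ClassNotFoundException'
    patterns="$patterns|No such file or directory|Invalid configuration"
    patterns="$patterns|Parse error|OutOfMemoryError|No space left on device"
    patterns="$patterns|Permission denied|Environment variable not set"
    grep -n -E "$patterns" "$@"
}
```

For example, scan_startup_errors /var/log/deephaven/<process_name>/<process_name>.log.<date> scans a launch log. A non-zero exit status means none of the listed signatures were found, not that the log is free of errors.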

Troubleshooting Checklist

Follow these steps if a Deephaven process does not start:

  1. Check the log files for errors or warnings (see above for locations).
  2. Verify configuration files for syntax and completeness.
  3. Ensure required ports are free. Key ports include those used by etcd (defaults: 2379 for client traffic and 2380 for peer traffic). For a comprehensive list, refer to the Deephaven process ports section in the Ops Guide.
  4. Check system resources (memory, CPU, disk space).
  5. Verify permissions on log and configuration directories.
  6. Confirm all required environment variables are set.
  7. Restart the process after addressing any issues found.
  8. Consult related documentation for component-specific troubleshooting (see below).
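Steps 3, 5, and 6 of the checklist can be partly scripted. The following is a rough sketch under stated assumptions: the function names are hypothetical, port_in_use expects the output of ss -ltn on standard input and assumes the local address is the fourth column, and which directories and environment variables to check depends on the installation.

```shell
# Hypothetical helpers, not part of Deephaven.

# Step 3: succeed if a listening socket on the given port appears in
# `ss -ltn` style output read from stdin (assumes column 4 is the local
# address, for example 0.0.0.0:2379).
port_in_use() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Step 5: succeed if the path is an existing directory that the current
# user can write to.
dir_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Step 6: print the name of each listed variable that is unset or empty.
missing_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}
```

For example, ss -ltn | port_in_use 2379 succeeds when something is already listening on the etcd client port, and dir_writable /var/log/deephaven should be run as the user that owns the Deephaven process.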

Environment-Specific Considerations

  • Docker:
    • Logs may be accessed via docker logs <container> or by mounting a host directory.
    • Ensure volume mounts have correct permissions.
  • Kubernetes:
    • Use kubectl logs <pod> to access logs.
    • Log files may be stored in ephemeral storage unless persistent volumes are configured.
  • Bare-metal:
    • Logs are written directly to the file system as described above.
    • Ensure the process runs as a user with appropriate permissions.

Deephaven services

When Deephaven services managed by M/Monit fail to start, their status in the output of /usr/illumon/latest/bin/dh_monit summary usually cycles between Does not exist, Initializing, and Execution failed. When investigating the cause of a startup failure, start with the process(es) that the most other processes depend on (see Process dependencies). For example, if the configuration_server does not start, nothing else can start.
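To narrow a long summary down to the services that are stuck, the output can be filtered for the three statuses named above. This is only a sketch: failing_services is a hypothetical name, and it assumes the status text appears verbatim on each service's line of the summary.

```shell
# Hypothetical helper, not part of Deephaven: keep only summary lines whose
# status is one of the failure states named in this guide.
# Reads the summary from stdin.
failing_services() {
    grep -E 'Does not exist|Initializing|Execution failed'
}
```

A typical invocation would be /usr/illumon/latest/bin/dh_monit summary | failing_services. Because the status cycles, running it a few times gives a more reliable picture than a single snapshot.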

For the particular service being investigated:

  • Start with the process's launch log: /var/log/deephaven/<process_name>/<process_name>.log.<date>
  • If the process starts (as indicated by the launch log), but then dies or restarts, check the process's detailed log: /var/log/deephaven/<process_name>/<class or process_name>.log.current

In the rare case where a process fails to launch and there is little or no information in the launch log, it may be possible to learn more by interactively launching the process. This is an advanced troubleshooting step that bypasses M/Monit's automatic management for this specific attempt, providing direct console output.

Process launch commands are stored in .conf files in /etc/sysconfig/illumon.d/monit. For example:

cat /etc/sysconfig/illumon.d/monit/iris_controller.conf

check process iris_controller with pidfile /etc/deephaven/run/iris_controller.pid
    start program = "/usr/illumon/latest/bin/iris start iris_controller"
    stop program  = "/usr/illumon/latest/bin/iris stop iris_controller"

When running as irisadmin (e.g., via sudo su - irisadmin), the iris_controller can be started interactively with:

 /usr/illumon/latest/bin/iris --debug start iris_controller

This will echo to stdout all the details as the iris script attempts to launch the iris_controller process. Observe this output carefully for errors not captured in standard logs.
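Because the debug output scrolls past quickly, it can help to keep a copy while still watching it live. The sketch below uses tee for that; capture_launch and the output file path are hypothetical choices, not part of the iris script.

```shell
# Hypothetical helper, not part of Deephaven: run a command, show its
# combined stdout and stderr on the console, and save a copy to a file.
# Usage: capture_launch <output_file> <command> [args...]
capture_launch() {
    out_file=$1
    shift
    "$@" 2>&1 | tee "$out_file"
}
```

For example, as irisadmin: capture_launch /tmp/iris_controller_launch.txt /usr/illumon/latest/bin/iris --debug start iris_controller. Note that the exit status reported is that of tee, not of the launched command.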