Troubleshooting Envoy
This guide provides steps for troubleshooting common issues with Envoy when used as a front proxy for Deephaven.
General Diagnostic Checklist
Before diving into specific error codes, start with the checks that require the fewest assumptions. In particular, do not begin with the admin-interface commands unless you already know that Envoy's admin interface is enabled and reachable. If you are not sure how Envoy is installed, start with Determine Envoy installation method, then use the matching process and log commands for that installation type.
A good first pass is: confirm that Envoy is running, check for obvious startup or configuration errors in the logs, and only then move on to admin-interface checks such as /clusters or /config_dump. Those admin endpoints are described in more detail in Using the Admin Interface. If the admin interface is not available yet, you may find your answers in the Envoy request logs.
- Determine how Envoy is installed. If you are not sure whether this is a native, systemd-managed, Docker, or Podman install, follow Determine Envoy installation method first.
- Confirm that Envoy is running. Use the command that matches your installation method.
- Check for obvious startup or configuration errors in the logs. Again, use the command that matches your installation method.
- Check whether backend clusters are healthy using the /clusters admin endpoint. This endpoint requires the admin interface to be enabled and reachable. Look for healthy upstreams, including xds_service members showing health_flags::healthy.
- Check whether the expected configuration is loaded using the /config_dump admin endpoint. This endpoint requires the admin interface to be enabled and reachable. Verify that listeners, routes, and clusters match your expectations. For route-specific problems, you can narrow the output with a resource filter, as described in Using the Admin Interface.
- Inspect the access/request log for failing requests, unexpected routes, or upstream mismatches. The exact collection method depends on how Envoy is installed and how the access log is configured; see Envoy request logs for the supported locations and commands.
When reading request logs, look for:
- the request path and method,
- the returned status code,
- the selected upstream host,
- any timeout, reset, or transport-failure indicators.
Each request to Envoy results in one log line. Among other fields, each line contains the request path (for example, /io.deephaven.proto.auth.grpc.AuthApi/getToken), the HTTP status code (for example, 200), and the backend address (for example, 10.1.2.4:9031).
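The cluster-health step of the checklist can be scripted. The following is a minimal sketch, not an official Deephaven tool. It assumes the admin interface is reachable at http://127.0.0.1:8001 and that /clusters returns Envoy's plain-text format, in which each healthy host contributes a line ending in health_flags::healthy. The ADMIN_URL variable and the unhealthy_hosts function name are illustrative.

```shell
# Assumption: the admin interface is enabled and bound to localhost:8001.
ADMIN_URL="${ADMIN_URL:-http://127.0.0.1:8001}"

# Read /clusters text output on stdin and print only the health_flags
# lines that do NOT report a healthy host. Prints nothing when every
# host is healthy.
unhealthy_hosts() {
  grep 'health_flags::' | grep -v 'health_flags::healthy$' || true
}

# Usage (needs a reachable admin interface):
#   curl -s "$ADMIN_URL/clusters" | unhealthy_hosts
```

Empty output means every listed host reported healthy. Any line that is printed names the cluster and host address to investigate.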
Determine Envoy installation method
To manage your Envoy installation, you need to know how it is installed.
If you do not know this information, you can inspect the processes running on your Envoy host machine:
The above command will show the running process and its parent processes.
A containerized installation will look like this:
A native installation will look something like this:
For clusters using the native Deephaven installer, you can find the expected installation mode from your cluster.cnf:
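A hedged sketch of both lookups follows. It assumes a Linux host with procps ps (for --forest) and a cluster.cnf whose Envoy-related keys contain the word ENVOY, as DH_ENVOY_LOCAL_CERT and DH_ENVOY_PEM_PATH do elsewhere in this guide. The function names and the cluster.cnf path are placeholders, and the second function only filters lines rather than interpreting any particular key.

```shell
# Show the Envoy process with a few lines of the process tree above it.
# Assumption: procps "ps" on Linux. The "[e]nvoy" pattern keeps grep
# from matching its own command line.
show_envoy_process_tree() {
  ps -e -o pid,ppid,user,args --forest | grep -B 3 '[e]nvoy'
}

# List Envoy-related, non-comment settings from a cluster.cnf file ($1).
# Assumption: relevant keys contain "ENVOY"; exact key names depend on
# your installer version.
envoy_settings() {
  grep -i 'envoy' "$1" | grep -v '^[[:space:]]*#'
}

# Usage:
#   show_envoy_process_tree
#   envoy_settings /path/to/cluster.cnf
```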
Reloading Envoy
When you change the envoy.yaml configuration, it is possible to tell the Envoy process to hot-reload the new configuration without restarting the process or cancelling any connections.
Envoy's hot reload process (called hot restart in Envoy's own documentation) is fairly complex, as it requires you to keep track of how many times you have previously performed a hot reload and to pass that count as the restart epoch.
The simplest option is to restart Envoy instead, at the cost of dropping all in-progress connections.
Restarting Envoy
If you have rotated your certificates or made any non-trivial changes to your system, you should restart the Envoy process.
If you have only edited envoy.yaml configuration, you may be able to simply Reload Envoy configuration rather than restarting the process.
The method to restart Envoy depends on your Envoy installation method.
If you used any of the suggested installation methods, one of the following commands will work for you:
When using the Deephaven native installer with Deephaven-managed Envoy, the Envoy process automatically restarts during cluster upgrades. When managing Envoy yourself, you must restart Envoy after the installation completes.
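The exact restart command depends on names chosen at install time. The sketch below assumes a systemd unit called envoy and a container called deephaven_envoy (the container name used elsewhere in this guide); substitute your own names. To stay safe it prints the command by default and runs it only when DRY_RUN=0 is set. The function and variable names are illustrative.

```shell
# Assumptions: systemd unit "envoy"; container "deephaven_envoy".
ENVOY_UNIT="${ENVOY_UNIT:-envoy}"
ENVOY_CONTAINER="${ENVOY_CONTAINER:-deephaven_envoy}"

# restart_envoy <systemd|docker|podman>
# Prints the restart command; runs it only when DRY_RUN=0.
restart_envoy() {
  case "$1" in
    systemd) cmd="sudo systemctl restart $ENVOY_UNIT" ;;
    docker)  cmd="docker restart $ENVOY_CONTAINER" ;;
    podman)  cmd="podman restart $ENVOY_CONTAINER" ;;
    *) echo "usage: restart_envoy <systemd|docker|podman>" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-1}" = "0" ]; then
    $cmd
  else
    echo "$cmd"
  fi
}

# Usage:
#   restart_envoy podman              # print the command only
#   DRY_RUN=0 restart_envoy systemd   # actually restart
```

Containers defined as Podman Quadlets are managed through systemd, so the systemd branch with the generated unit name is likely the better fit for them.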
Using the admin interface
The Envoy admin interface is a useful tool for debugging client interaction with the Deephaven server. When enabled, it listens on port 8001. It also serves a web application for browsing Envoy state, but because it can expose internal state and sensitive information in logs, access should be restricted.
Important
Secure the Envoy admin port to localhost connections only. Sample envoy.yaml:
For containerized Envoy installs, access can be restricted to 127.0.0.1 when you publish the port:
- For containers started from the command line: docker run -p 127.0.0.1:8001:8001 or podman run -p 127.0.0.1:8001:8001.
- For Podman Quadlets: PublishPort=127.0.0.1:8001:8001.
- You may need to destroy and recreate the container for port publishing changes to take effect.
- Check published ports via docker port deephaven_envoy or podman port deephaven_envoy.
Your admin port can also be restricted with firewall or security-group rules.
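The published-port check can be automated. This is a rough sketch that assumes the usual docker port / podman port line format (for example, 8001/tcp -> 127.0.0.1:8001) and an IPv4 loopback binding. The function name is hypothetical.

```shell
# Read "docker port <container>" (or "podman port <container>") output on
# stdin. Succeed only if port 8001 is listed and every listing for it is
# bound to 127.0.0.1. A missing listing or any other bind address fails.
admin_port_is_local() {
  awk '
    $1 ~ /^8001\// {
      seen = 1
      if ($3 !~ /^127\.0\.0\.1:/) bad = 1
    }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}

# Usage:
#   docker port deephaven_envoy | admin_port_is_local \
#     && echo "admin port is localhost-only" \
#     || echo "admin port is missing or exposed beyond localhost"
```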
Key admin endpoints:
- /config_dump: Shows the entire loaded configuration. This is useful for verifying that your envoy3.yaml and dynamic xDS updates have been applied correctly. You can filter it for specific resources, like routes (?resource=rds_config).
- /clusters: Provides a detailed status of all upstream clusters, including IP addresses, health status, and connection statistics. This is the best way to check if Envoy can connect to the backend Deephaven services.
- /stats: Outputs a large number of performance metrics. You can use grep to find specific stats, like upstream_cx_total for connection counts or http.downstream_rq_5xx for server errors.
- /server_info: Displays the running Envoy version and its uptime, which is useful for confirming that a restart was successful.
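As a hedged illustration of filtering /stats, the sketch below reduces the plain-text output, which Envoy emits as name: value lines, to counters whose name ends in a given suffix and whose value is non-zero. The function name is illustrative, and the usage lines assume the admin interface on 127.0.0.1:8001.

```shell
# nonzero_stats <suffix>
# Read /stats text output ("name: value" per line) on stdin and print
# entries whose name ends with <suffix> and whose value is greater than 0.
nonzero_stats() {
  awk -F': ' -v suffix="$1" '
    substr($1, length($1) - length(suffix) + 1) == suffix && $2 + 0 > 0
  '
}

# Usage:
#   curl -s http://127.0.0.1:8001/stats | nonzero_stats downstream_rq_5xx
#   curl -s http://127.0.0.1:8001/server_info | grep -i uptime
```

Filtering out zero-valued counters keeps the output short enough to compare before and after reproducing a problem.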
Envoy application logs
Envoy emits application logs for startup/configuration events, routing decisions, and detailed upstream/downstream error messages.
You can adjust log levels and component-specific logging through the admin interface, which must be enabled for these commands to work. Envoy's documentation on the admin interface logging endpoints includes additional options and examples.
How you view application logs depends on how Envoy is installed:
- systemd / native install:
- Containerized install:
You can temporarily increase logging via the admin interface:
To return to the default level:
When you need details for a specific issue (for example, routing or upstream connection problems), enable debug logging for specific subsystems using paths=...:
The loggers most commonly raised to debug are http, http2, router, upstream, connection, grpc, and filter.
Caution
Debug and trace logging can be very noisy and may include request metadata. Limit the time window, reproduce the issue, then revert the log level.
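The log-viewing and log-level steps in this section can be sketched as follows. The systemd unit name envoy and the admin address 127.0.0.1:8001 are assumptions; deephaven_envoy is the container name used elsewhere in this guide. The helper names are illustrative, and the info level used for reverting is Envoy's stock default, which your deployment may override.

```shell
# Assumption: admin interface on 127.0.0.1:8001.
ADMIN_URL="${ADMIN_URL:-http://127.0.0.1:8001}"

# Build a /logging URL from a query such as "level=debug" or
# "paths=router:debug,upstream:debug".
logging_url() {
  printf '%s/logging?%s\n' "$ADMIN_URL" "$1"
}

# Apply a logging change; the admin interface expects POST for mutations.
set_logging() {
  curl -s -X POST "$(logging_url "$1")"
}

# Usage:
#   journalctl -u envoy --since "15 min ago"        # systemd / native install
#   docker logs --since 15m deephaven_envoy         # or: podman logs --since 15m deephaven_envoy
#   set_logging level=debug                         # raise every logger
#   set_logging paths=router:debug,upstream:debug   # raise selected loggers only
#   set_logging level=info                          # revert after reproducing the issue
```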
Envoy request logs
Envoy request logs are recorded via the access log configured on the route/listener (for Deephaven-managed Envoy, this is typically set in the RDS route configuration). The admin interface must be enabled to use the troubleshooting commands in this guide that inspect live config and cluster health.
Envoy's documentation on access logs covers additional output formats and fields.
Access logs are ideal for quickly answering:
- Which requests are failing (paths/methods), and with what status codes?
- Which route was selected?
- Which upstream cluster/host handled the request?
- Whether Envoy saw a transport failure, timeout, or reset.
What to collect for a troubleshooting bundle
When reporting an Envoy issue, collect the following (keep the window small: typically 5–30 minutes around the failure):
- Access log (request log)
- Why: best high-level signal for request failures and routing/upstream selection.
- Example (common location in these docs):
- Application logs (journal/container logs)
- Why: includes warnings/errors that do not appear in access logs (config reload errors, upstream TLS failures, disconnect reasons).
- Admin interface snapshots
- Why: captures the loaded config and current upstream health at the time of failure.
- Examples:
Important
Logs and admin output may contain sensitive information (hostnames, IPs, request headers, and possibly user identifiers). Collect only the minimum time window needed and review/redact before sharing.
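One possible way to capture the admin snapshots is shown below. It is a sketch under stated assumptions: the admin interface is reachable at 127.0.0.1:8001, and the output directory, file naming, and function names are all made up for illustration. Access-log and application-log collection is left to the commands that match your installation, because their locations vary.

```shell
ADMIN_URL="${ADMIN_URL:-http://127.0.0.1:8001}"

# Fetch one admin endpoint; kept separate so it is easy to swap or stub.
fetch_admin() {
  curl -fsS "$ADMIN_URL$1"
}

# collect_admin_snapshots <output-dir>
# Save /clusters, /config_dump and /server_info into timestamped files.
collect_admin_snapshots() {
  out_dir="$1"
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$out_dir" || return 1
  for endpoint in clusters config_dump server_info; do
    fetch_admin "/$endpoint" > "$out_dir/${endpoint}_${stamp}.txt" \
      || echo "warning: could not fetch /$endpoint" >&2
  done
}

# Usage:
#   collect_admin_snapshots ./envoy-bundle
```

Review the resulting files before sharing them, since /config_dump in particular can include internal hostnames and addresses.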
Improve access logs with JSON lines
For troubleshooting intermittent failures, JSON access logs are easier to search and parse.
The following is an example access log format block that emits one JSON object per line (unescaped JSON lines). The snippet is shown in minimal context; placement may vary depending on whether your configuration uses per-route or per-virtual-host logging.
Key fields to pay attention to:
- request_id: correlates retries and makes it easier to match a single request across logs.
- route_name: confirms which route matched.
- upstream_cluster: identifies which backend service Envoy routed to.
- upstream_transport_failure_reason: often includes a concrete reason when the upstream connection fails (for example, TLS, reset, or connect failures).
- response_flags: compact flags that indicate common failure categories (timeouts, resets, upstream failures).
- response_code_details: additional detail behind the response code.
- grpc_status: for gRPC calls, helps distinguish transport success from application-level status.
Important
Ensure the access log is emitted as JSON lines (one JSON object per line). Avoid double-escaped JSON strings, which are much harder to search and parse.
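Once the access log is one JSON object per line, it can be filtered with jq. The sketch below assumes jq is installed, that the log uses the field names listed above, and that Envoy writes "-" in response_flags when no flag is set. The function name and the log path in the usage line are placeholders.

```shell
# failed_requests
# Read JSON-lines access log entries on stdin and print a compact summary
# of every entry whose response_flags is present and not "-".
failed_requests() {
  jq -c 'select(.response_flags != null and .response_flags != "-")
         | {request_id, route_name, upstream_cluster, response_flags,
            response_code_details, upstream_transport_failure_reason}'
}

# Usage:
#   failed_requests < /path/to/access.log
```

Selecting on response_flags rather than the status code also surfaces requests that Envoy itself terminated, such as timeouts and upstream connection failures.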
Common Issues and Resolutions
Connection Refused
- Symptom: Your browser or client shows a "Connection Refused" error when trying to connect to the Envoy port (e.g., 8000).
- Cause: This typically means the Envoy process is not running or not listening on the correct port.
- Troubleshooting Steps:
  - Verify that the Envoy process or container is running using the checklist above.
  - Check the Envoy logs for startup errors, such as a port conflict or a syntax error in the configuration file.
  - Ensure no firewall rules on the host or network are blocking access to the port.
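To confirm whether anything is listening on the Envoy port, the output of ss can be checked directly. This is a minimal sketch assuming a Linux host with iproute2 ss, whose fourth column is the local address and port. The function name is illustrative, and 8000 is the example port from this section.

```shell
# port_is_listening <port>
# Read "ss -ltn" output on stdin; succeed if any socket listens on <port>.
port_is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")      # handles both 0.0.0.0:8000 and [::]:8000
      if (parts[n] == port) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}

# Usage:
#   ss -ltn | port_is_listening 8000 || echo "nothing is listening on 8000"
```

For a containerized install, run the check on the host, where the published port appears.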
503 Service Unavailable
- Symptom: You receive a 503 Service Unavailable error. This is often accompanied by no healthy upstream messages in the logs.
- Cause: Envoy is running but cannot establish a healthy connection to the backend Deephaven services.
- Troubleshooting Steps:
  - Use the /clusters admin endpoint to identify which cluster is unhealthy.
  - Verify that the backend Deephaven services (e.g., web-api, xds_service) are running and accessible from the Envoy host.
  - Check for network connectivity issues (e.g., firewall rules, incorrect IP addresses in envoy3.yaml).
404 Not Found
- Symptom: You receive a 404 Not Found error for a specific URL.
- Cause: Envoy is running and connected, but the requested URL path does not match any configured route.
- Troubleshooting Steps:
  - Verify the URL you are trying to access is correct.
  - Dump the route configuration to ensure the routes are correctly defined and loaded from the Deephaven RDS.
  - Check the Deephaven Configuration Server logs to ensure it is correctly publishing routes to Envoy.
WebSocket Connection Failures
- Symptom: The Deephaven Web UI loads, but you cannot open a query console, or data does not update in real-time. Browser developer tools show a failed WebSocket handshake.
- Cause: The WebSocket upgrade request is being blocked or misconfigured.
- Troubleshooting Steps:
  - Verify that the upgrade_configs section is present in the http_connection_manager filter in your envoy3.yaml file.
  - Check for any intermediate network devices (like corporate firewalls or other proxies) between the client and Envoy that might be blocking WebSocket traffic.
  - Inspect the Envoy logs for errors related to upgrade failure.
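The first step can be approximated with a text search. The sketch below assumes the configuration spells the setting as an upgrade_configs list containing upgrade_type: websocket, which is how Envoy's HTTP connection manager expresses it. It is a plain grep rather than a YAML parse, so treat a negative result as a prompt to inspect the file by hand. The function name and file path are placeholders.

```shell
# has_websocket_upgrade <config-file>
# Succeed if an "upgrade_type: websocket" entry appears within a few
# lines after an "upgrade_configs:" key in the given file.
has_websocket_upgrade() {
  grep -A 3 'upgrade_configs:' "$1" \
    | grep -qiE 'upgrade_type:[[:space:]]*"?websocket'
}

# Usage:
#   has_websocket_upgrade /path/to/envoy3.yaml \
#     || echo "no websocket upgrade_configs entry found"
```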
TLS/SSL Certificate Issues
- Symptom: The browser shows a security warning (e.g., NET::ERR_CERT_AUTHORITY_INVALID), or connections fail with a TLS handshake error.
- Cause: The TLS certificate is not correctly configured, trusted, or presented by Envoy.
- Troubleshooting Steps:
  - Verify the certificate input that Envoy is expected to read.
    - For installer-managed Envoy, confirm DH_ENVOY_LOCAL_CERT points to the correct PEM bundle and that the runtime path in DH_ENVOY_PEM_PATH matches the generated YAML.
    - For manual container-based Envoy, confirm the volume mount matches the runtime path in the YAML, commonly /envoy.pem.
  - Ensure the PEM bundle contains the certificate chain followed by the private key and is readable by the Envoy process.
  - Use a command-line tool like openssl to inspect the certificate that Envoy is presenting.
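The last two steps can be sketched as shell helpers. The hostname, port, and PEM path in the usage lines are placeholders, and both function names are illustrative. The first helper only reports the order of PEM blocks in a file; the second relies on the standard openssl s_client and openssl x509 subcommands.

```shell
# pem_order <pem-file>
# Print the type of each PEM block in file order, one per line. A chain
# followed by its key shows CERTIFICATE lines first, then a PRIVATE KEY line.
pem_order() {
  sed -n 's/^-----BEGIN \(.*\)-----$/\1/p' "$1"
}

# show_presented_cert <host> <port>
# Print subject, issuer and validity dates of the certificate served
# at host:port, sending <host> as the SNI name.
show_presented_cert() {
  openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Usage:
#   pem_order /path/to/envoy.pem
#   show_presented_cert deephaven.example.com 8000
```

If the dates or issuer printed by show_presented_cert do not match the bundle you expect, Envoy is probably still serving an older certificate and needs a restart.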