Troubleshooting Envoy
This guide provides steps for troubleshooting common issues with Envoy when used as a front proxy for Deephaven.
General Diagnostic Checklist
Before diving into specific error codes, follow these initial steps to quickly assess the state of your Envoy instance. This methodical approach can often pinpoint the issue right away.
- Is the Envoy container running?
sudo docker ps -f name=deephaven_envoy
- Are there any obvious errors in the startup logs?
sudo docker logs deephaven_envoy
- Are all backend clusters healthy?
curl http://localhost:8001/clusters
Look for health_flags::healthy on all xds_cluster members.
- Is the configuration loaded correctly?
Check that the listeners, routes, and clusters match your expectations.
curl http://localhost:8001/config_dump
Using the Admin Interface
The Envoy admin interface is a powerful tool for debugging. By default, it is accessible on port 8001. You can use curl to inspect the configuration, view statistics, and more.
- /config_dump: Shows the entire loaded configuration. This is useful for verifying that your envoy3.yaml and dynamic xDS updates have been applied correctly. You can filter it for specific resources, like routes (?resource=rds_config).
- /clusters: Provides a detailed status of all upstream clusters, including IP addresses, health status, and connection statistics. This is the best way to check if Envoy can connect to the backend Deephaven services.
- /stats: Outputs a large number of performance metrics. You can use grep to find specific stats, like upstream_cx_total for connection counts or http.downstream_rq_5xx for server errors.
- /server_info: Displays the running Envoy version and its uptime, which is useful for confirming that a restart was successful.
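As a rough illustration of the grep approach to /stats, the sketch below keeps only the counters that match a pattern and have a nonzero value. The function name nonzero_stats is made up for this example, and it assumes the plain-text "name: value" lines that /stats prints by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name: value" stat lines on stdin and print those
# whose name matches the given regex and whose value is a nonzero integer.
# Histogram lines (non-integer values) are skipped.
nonzero_stats() {
  local pattern="$1"
  awk -F': ' -v pat="$pattern" \
    '$1 ~ pat && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print }'
}

# Example, assuming the admin interface is on port 8001 as described above:
# curl -s http://localhost:8001/stats | nonzero_stats 'downstream_rq_5xx'
```

Dropping zero-valued counters shrinks a long stats dump to the lines that are actually worth reading.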
Common Issues and Resolutions
Connection Refused
- Symptom: Your browser or client shows a "Connection Refused" error when trying to connect to the Envoy port (e.g., 8000).
- Cause: This typically means the Envoy process is not running or not listening on the correct port.
- Troubleshooting Steps:
  - Verify that the Envoy process or container is running using the checklist above.
  - Check the Envoy logs for startup errors, such as a port conflict or a syntax error in the configuration file.
  - Ensure no firewall rules on the host or network are blocking access to the port.
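To separate a stopped or misbound Envoy from a firewall problem, it can help to test the port first from the Envoy host and then from the client machine. The following is a minimal sketch, assuming bash and the coreutils timeout command are available; port_open is a hypothetical helper name, and 8000 is the example port used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be opened
# within 3 seconds. Relies on bash's /dev/tcp pseudo-device, so bash is required.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example: run on the Envoy host, then repeat from the client with the real hostname.
# port_open localhost 8000 && echo "listening" || echo "refused or filtered"
```

If the check succeeds on the Envoy host but fails from the client, the listener is up and a firewall or network rule is the more likely cause.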
503 Service Unavailable
- Symptom: You receive a 503 Service Unavailable error. This is often accompanied by no healthy upstream messages in the logs.
- Cause: Envoy is running but cannot establish a healthy connection to the backend Deephaven services.
- Troubleshooting Steps:
  - Use the /clusters admin endpoint to identify which cluster is unhealthy.
  - Verify that the backend Deephaven services (e.g., web-api, xds_service) are running and accessible from the Envoy host.
  - Check for network connectivity issues (e.g., firewall rules, incorrect IP addresses in envoy3.yaml).
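The /clusters output can be long, so one way to find the unhealthy cluster is to keep only the health_flags lines that do not end in healthy. This is a sketch under the assumption that each host reports a line of the form cluster::address::health_flags::value; unhealthy_hosts is a name invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read /clusters output on stdin and print only the
# health_flags lines whose value is something other than "healthy".
# Exits non-zero when every host is healthy (no lines printed).
unhealthy_hosts() {
  grep '::health_flags::' | grep -v '::health_flags::healthy$'
}

# Example, assuming the admin interface is on port 8001 as described above:
# curl -s http://localhost:8001/clusters | unhealthy_hosts
```

Each printed line names the cluster and the host address, which tells you which backend service to check next.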
404 Not Found
- Symptom: You receive a 404 Not Found error for a specific URL.
- Cause: Envoy is running and connected, but the requested URL path does not match any configured route.
- Troubleshooting Steps:
  - Verify that the URL you are trying to access is correct.
  - Dump the route configuration to ensure the routes are correctly defined and loaded from the Deephaven RDS. Quote the URL so that the shell does not treat the ? as a glob character.
    curl 'http://localhost:8001/config_dump?resource=rds_config'
  - Check the Deephaven Configuration Server logs to ensure it is correctly publishing routes to Envoy.
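When comparing a failing URL against the loaded routes, a quick list of the configured path prefixes can be easier to scan than the full dump. The sketch below assumes the routes use "prefix" matches in the JSON output; routes that match on an exact path or a regex would not appear. The name route_prefixes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read config_dump JSON on stdin and print each distinct
# value of a "prefix" field, one per line, sorted.
route_prefixes() {
  grep -o '"prefix": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/' | sort -u
}

# Example, assuming the admin interface is on port 8001 as described above:
# curl -s http://localhost:8001/config_dump | route_prefixes
```

If the path you requested does not start with any of the printed prefixes, the 404 is coming from a missing route and not from the backend.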
WebSocket Connection Failures
- Symptom: The Deephaven Web UI loads, but you cannot open a query console, or data does not update in real-time. Browser developer tools show a failed WebSocket handshake.
- Cause: The WebSocket upgrade request is being blocked or misconfigured.
- Troubleshooting Steps:
  - Verify that the upgrade_configs section is present in the http_connection_manager filter in your envoy3.yaml file.
  - Check for any intermediate network devices (like corporate firewalls or other proxies) between the client and Envoy that might be blocking WebSocket traffic.
  - Inspect the Envoy logs for errors related to upgrade failure.
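One way to test the upgrade path without a browser is to send a WebSocket handshake with curl and look at the status line: a 101 Switching Protocols response means the upgrade was accepted. The sketch below is illustrative only; upgrade_accepted is a made-up helper, the WebSocket path is a placeholder you would need to replace, and the key is the sample value from RFC 6455.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an HTTP response on stdin and succeed only if the
# first line is an HTTP/1.1 101 (Switching Protocols) status line.
upgrade_accepted() {
  head -n 1 | grep -q '^HTTP/1\.1 101'
}

# Example (host and path are placeholders; --http1.1 is used because the
# Upgrade header handshake is an HTTP/1.1 mechanism):
# curl --http1.1 -s -i -N --max-time 5 \
#   -H 'Connection: Upgrade' -H 'Upgrade: websocket' \
#   -H 'Sec-WebSocket-Version: 13' \
#   -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
#   https://your-envoy-host:8000/your/websocket/path | upgrade_accepted \
#   && echo "upgrade accepted" || echo "upgrade rejected"
```

Running the same probe from the Envoy host and from a client machine can show whether an intermediate device is dropping the upgrade.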
TLS/SSL Certificate Issues
- Symptom: The browser shows a security warning (e.g., NET::ERR_CERT_AUTHORITY_INVALID), or connections fail with a TLS handshake error.
- Cause: The TLS certificate is not correctly configured, trusted, or presented by Envoy.
- Troubleshooting Steps:
  - Verify that the paths to your TLS certificate (fullchain.pem) and private key (privkey.pem) in the docker run command's volume mounts are correct.
  - Ensure that the files have the correct permissions and are readable by the user ID that Envoy is running as inside the container (e.g., 9002).
  - Use a command-line tool like openssl to inspect the certificate that Envoy is presenting:
    openssl s_client -connect your-envoy-host:8000
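The permissions step can be approximated on the host with a small check of the file's owner and mode bits. This is only a sketch: readable_by_uid is a hypothetical helper, it relies on GNU stat (-c), and it deliberately ignores group membership and root's override, so treat a negative result as a hint, not a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if FILE looks readable by numeric UID, judging
# only by the owner bits (when UID owns the file) or the "other" bits.
# Group permissions are not considered. Requires GNU stat.
readable_by_uid() {
  local file="$1" uid="$2" owner mode
  owner=$(stat -c '%u' "$file") || return 2
  mode=$(stat -c '%a' "$file") || return 2
  if [ "$owner" = "$uid" ]; then
    # The owner is governed by the owner bits alone.
    [ $(( ((mode / 100) % 10) & 4 )) -ne 0 ]
  else
    [ $(( (mode % 10) & 4 )) -ne 0 ]
  fi
}

# Example, using the file names and example user ID mentioned above:
# readable_by_uid fullchain.pem 9002 || echo "fullchain.pem may be unreadable"
# readable_by_uid privkey.pem 9002 || echo "privkey.pem may be unreadable"
#
# To print the subject, issuer, and validity dates of the presented certificate:
# openssl s_client -connect your-envoy-host:8000 </dev/null 2>/dev/null \
#   | openssl x509 -noout -subject -issuer -dates
```

A private key that is owned by another user with mode 600 is a common reason for Envoy failing to load its TLS configuration inside the container.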