Envoy runbook
Envoy is an optional third-party reverse proxy and load balancer that can consolidate all external access to a Deephaven cluster through a single endpoint. It simplifies firewall rules and certificate management, and provides advanced routing, load balancing, and observability for gRPC and HTTP traffic.
Impact of Envoy failure
| Level | Impact |
|---|---|
| Sev 2 - Moderate | External clients cannot access Deephaven through the Envoy endpoint. Clients with direct access to backend services can continue working. Internal traffic between Deephaven services is unaffected. |
Note
Envoy is optional. If not deployed, its failure has no impact. If deployed as the primary ingress point, Envoy failure blocks external access until restored or clients reconfigure to bypass Envoy.
Envoy purpose in Deephaven
Envoy provides several key benefits:
- Unified ingress: Single endpoint for all external traffic (Web UI, gRPC API, worker connections).
- Simplified networking:
  - One port to expose externally (typically 443).
  - Single certificate to manage.
  - Simpler firewall rules.
- Load balancing: Distribute requests across multiple backend servers.
- Advanced routing: Route based on path, headers, or other criteria.
- Observability: Detailed metrics, tracing, and logging for all traffic.
- Protocol support: HTTP/1.1, HTTP/2, gRPC, WebSocket.
Envoy dependencies
Envoy has minimal dependencies on Deephaven:
Optional dependencies:
- Remote Query Dispatcher — Registers workers with Envoy for dynamic routing.
- Web API Server — May register with Envoy.
Envoy can start independently, and Deephaven services register with it dynamically.
Checking Envoy status
Check that the Envoy container is running:
If Envoy is configured as a systemd service:
Expected output shows the deephaven_envoy container listed, or active (running) for the systemd service.
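The two status checks above can be sketched as small shell helpers. This is a minimal sketch, assuming the container name deephaven_envoy and the systemd unit name envoy used elsewhere in this runbook; the function names and the CONTAINER_CLI variable are illustrative, not part of Deephaven.

```shell
# Assumptions: container name and unit name come from this runbook;
# set CONTAINER_CLI=podman on Podman hosts.
CONTAINER_CLI="${CONTAINER_CLI:-docker}"
ENVOY_CONTAINER="${ENVOY_CONTAINER:-deephaven_envoy}"
ENVOY_UNIT="${ENVOY_UNIT:-envoy}"

# List the Envoy container if it is running (prints nothing if it is not).
envoy_container_status() {
  "$CONTAINER_CLI" ps --filter "name=${ENVOY_CONTAINER}" --format '{{.Names}} {{.Status}}'
}

# Show the systemd unit state; look for "active (running)".
envoy_service_status() {
  systemctl status "$ENVOY_UNIT" --no-pager
}

# Usage:
#   envoy_container_status
#   envoy_service_status
```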
Test that Envoy is accepting connections:
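One way to run this test is an HTTPS request against the Envoy endpoint. The sketch below assumes curl is installed; ENVOY_HOST and ENVOY_PORT are placeholders for your own endpoint, and the function name is illustrative.

```shell
# Assumptions: ENVOY_HOST and ENVOY_PORT are placeholders for your Envoy endpoint
# (443 is the typical external port per this runbook).
ENVOY_HOST="${ENVOY_HOST:-localhost}"
ENVOY_PORT="${ENVOY_PORT:-443}"

# Print the HTTP status code returned through Envoy; "000" means no connection.
# -k skips certificate verification so the check also works with self-signed certificates.
envoy_http_code() {
  curl -sk -o /dev/null --max-time 10 -w '%{http_code}' "https://${ENVOY_HOST}:${ENVOY_PORT}/"
}

# Usage:
#   envoy_http_code
```

Any HTTP status code other than 000 indicates that Envoy accepted the connection and completed the TLS handshake.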
Viewing Envoy logs
View logs from the Envoy container:
For Podman:
If Envoy is configured as a systemd service, container stdout is also available in the journal:
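The log commands for these cases can be wrapped as follows. This is a sketch that assumes the deephaven_envoy container and envoy unit names from this runbook; the helper names and the 100-line default are arbitrary choices.

```shell
CONTAINER_CLI="${CONTAINER_CLI:-docker}"   # set to "podman" on Podman hosts
ENVOY_CONTAINER="${ENVOY_CONTAINER:-deephaven_envoy}"

# Print the last N log lines from the container (default 100).
envoy_container_logs() {
  "$CONTAINER_CLI" logs --tail "${1:-100}" "$ENVOY_CONTAINER"
}

# Print the last N journal lines for the systemd unit (default 100).
envoy_journal_logs() {
  journalctl -u "${ENVOY_UNIT:-envoy}" -n "${1:-100}" --no-pager
}

# Usage:
#   envoy_container_logs 200
#   envoy_journal_logs
```

To follow logs live, run the underlying command directly with -f, for example docker logs -f deephaven_envoy or journalctl -u envoy -f.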
Restart procedure
Restart Envoy (drops active connections):
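A minimal sketch of a full restart, assuming the container and unit names used in this runbook. The USE_SYSTEMD switch and the function name are hypothetical; use whichever path matches how Envoy was started on your host, and run it with sufficient privileges.

```shell
CONTAINER_CLI="${CONTAINER_CLI:-docker}"   # set to "podman" on Podman hosts
ENVOY_CONTAINER="${ENVOY_CONTAINER:-deephaven_envoy}"

# Restart Envoy through systemd when USE_SYSTEMD=1 (hypothetical switch),
# otherwise restart the container directly.
envoy_restart() {
  if [ "${USE_SYSTEMD:-0}" = "1" ]; then
    systemctl restart "${ENVOY_UNIT:-envoy}"
  else
    "$CONTAINER_CLI" restart "$ENVOY_CONTAINER"
  fi
}

# Usage:
#   envoy_restart                 # container restart
#   USE_SYSTEMD=1 envoy_restart   # systemd-managed restart
```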
Warning
Restarting Envoy drops all active connections. To apply configuration changes without downtime, use a hot restart instead.
For a hot restart (applies envoy3.yaml changes without dropping connections):
Verify the restart was successful:
Monitor logs during startup:
Expected startup messages:
- Configuration loaded successfully.
- Listeners started on configured ports.
- Admin interface available.
- Clusters initialized.
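A restart can also be verified from a script by polling Envoy's admin /ready endpoint, which returns HTTP 200 once the server is live. The sketch below assumes the admin interface listens on localhost:8001 as described in this runbook; the function name and the retry defaults are illustrative.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Poll /ready until Envoy returns HTTP 200 or the attempts run out.
# Arguments: attempts (default 30), delay in seconds between attempts (default 2).
envoy_wait_ready() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    code="$(curl -s -o /dev/null -w '%{http_code}' "${ENVOY_ADMIN}/ready")"
    if [ "$code" = "200" ]; then
      echo "Envoy ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Envoy not ready after ${attempts} attempt(s)" >&2
  return 1
}

# Usage:
#   envoy_wait_ready          # up to 30 attempts, 2 seconds apart
#   envoy_wait_ready 10 1
```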
Envoy admin console
Envoy provides an admin interface for management and debugging (default port: 8001).
View admin console
Returns links to all admin endpoints.
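From the Envoy host, the landing page can be fetched with curl. This sketch assumes the default admin address from this runbook; the function name is illustrative.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Fetch the admin landing page (HTML with links to the other admin endpoints).
envoy_admin_home() {
  curl -s "${ENVOY_ADMIN}/"
}

# Usage:
#   envoy_admin_home
```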
View current configuration
Shows the complete Envoy configuration, including listeners, routes, and clusters.
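The configuration is served by the admin /config_dump endpoint. A small sketch, assuming the default admin address; the function name and output file name are placeholders.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Print the running configuration as JSON.
envoy_config_dump() {
  curl -s "${ENVOY_ADMIN}/config_dump"
}

# Usage (the dump can be large, so saving it to a file is often easier):
#   envoy_config_dump > envoy_config_dump.json
```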
View log level
Change log level
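Both log-level operations go through the admin /logging endpoint, which expects a POST request. The sketch below assumes the default admin address; the function names are illustrative, and the accepted level names listed in the case statement should be confirmed against your Envoy version.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Show the current level of every logger.
envoy_log_levels() {
  curl -s -X POST "${ENVOY_ADMIN}/logging"
}

# Set all loggers to the given level.
envoy_set_log_level() {
  case "$1" in
    trace|debug|info|warning|error|critical|off) ;;
    *) echo "invalid level: $1" >&2; return 2 ;;
  esac
  curl -s -X POST "${ENVOY_ADMIN}/logging?level=$1"
}

# Usage:
#   envoy_log_levels
#   envoy_set_log_level debug
#   envoy_set_log_level info    # restore a quieter level after debugging
```

A level changed this way is not written to envoy3.yaml, so it lasts only until Envoy is restarted.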
View statistics
Returns all Envoy metrics.
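Because the full statistics output is long, it is often useful to narrow it with the filter query parameter of the /stats endpoint. A sketch, assuming the default admin address; the function name and the example pattern are illustrative.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Print all statistics, or only those whose names match a regular expression.
envoy_stats() {
  if [ -n "$1" ]; then
    # -G sends the URL-encoded filter as a query string on a GET request.
    curl -s -G "${ENVOY_ADMIN}/stats" --data-urlencode "filter=$1"
  else
    curl -s "${ENVOY_ADMIN}/stats"
  fi
}

# Usage:
#   envoy_stats
#   envoy_stats 'upstream_cx_active'
```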
View active clusters
Shows backend servers and their health status.
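The /clusters endpoint prints one line per host attribute, including a health_flags line for each backend host. The sketch below, assuming the default admin address, lists all cluster data and filters it down to hosts that are not reported as healthy; the function names are illustrative.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Print the full cluster and host listing.
envoy_clusters() {
  curl -s "${ENVOY_ADMIN}/clusters"
}

# Print only the health_flags lines for hosts that are not "healthy".
# Prints nothing when every host is healthy.
envoy_unhealthy_hosts() {
  envoy_clusters | grep '::health_flags::' | grep -v '::health_flags::healthy$' || true
}

# Usage:
#   envoy_clusters
#   envoy_unhealthy_hosts
```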
View server info
Shows the running Envoy version and uptime. Useful for confirming a restart completed successfully.
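The /server_info endpoint returns a JSON document. As a sketch, assuming the default admin address, the helper below picks out the version, state, and uptime fields with grep so that no JSON tooling is required; the function name is illustrative.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Print the version, state, and current-epoch uptime lines from /server_info.
envoy_server_summary() {
  curl -s "${ENVOY_ADMIN}/server_info" | grep -E '"(version|state|uptime_current_epoch)"'
}

# Usage:
#   envoy_server_summary
```

After a restart, a state of LIVE and a small uptime value indicate that the new process is serving.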
Configuring Envoy for Deephaven
The Deephaven Envoy configuration file is envoy3.yaml, located at /etc/sysconfig/illumon.d/resources/envoy3.yaml.
This file is generated by the Deephaven installer or created manually. It is mounted into the deephaven_envoy container at /config.yaml when the container is started.
See Configuring Envoy for the full configuration reference and examples.
TLS configuration
Envoy handles TLS termination for the Deephaven cluster. By default, the Deephaven installer configures Envoy to use the Client Update Service PEM file, /etc/sysconfig/illumon.d/client_update_service/lighttpd.pem.
This file is mounted into the container at /lighttpd.pem. For production installations, replace this with a certificate signed by your organization's CA.
See Configuring Envoy for TLS configuration details.
ALPN (Application-Layer Protocol Negotiation)
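ALPN lets a client and server agree, during the TLS handshake, on which application protocol to use (for example, http/1.1 or h2). gRPC runs over HTTP/2, so gRPC connections require h2 to be negotiated.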
Symptoms of ALPN issues
Inbound ALPN failures (client → Envoy):
- Clients cannot establish gRPC connections.
- Connection errors mentioning "ALPN negotiation failed".
- HTTP/2 connections failing.
Outbound ALPN failures (Envoy → backend):
- Envoy cannot connect to backend gRPC services.
- 503 errors from Envoy.
- Connection refused errors in Envoy logs.
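To confirm whether a symptom is really ALPN-related, the negotiated protocol can be inspected with openssl s_client. This is a diagnostic sketch, assuming openssl is installed; the function name is illustrative and the host and port are whatever endpoint you are testing (the Envoy listener for inbound issues, a backend for outbound issues).

```shell
# Print the ALPN result of a TLS handshake that offers h2 and http/1.1.
# Arguments: host, port (default 443).
envoy_alpn_check() {
  host="$1"
  port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" \
    -alpn 'h2,http/1.1' </dev/null 2>/dev/null \
    | grep -E 'ALPN protocol|No ALPN negotiated'
}

# Usage:
#   envoy_alpn_check envoy.example.com 443
```

A result of "ALPN protocol: h2" means HTTP/2, and therefore gRPC, can be negotiated; "No ALPN negotiated" or "ALPN protocol: http/1.1" points to an ALPN configuration problem on the server side.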
Solutions for ALPN issues
Enable ALPN for gRPC in Envoy config:
Disable h2 requirement (not recommended):
Only use in testing environments:
Enable h2c (strongly discouraged):
HTTP/2 without TLS — only for trusted internal networks:
Caution
h2c (HTTP/2 cleartext) should never be used in production. It provides no encryption and should only be used for debugging in completely trusted networks.
Certificate trust configuration
Envoy must trust backend server certificates.
Configure truststore
The CA bundle must include:
- Root CA that signed backend certificates.
- Intermediate CA certificates.
- Envoy's own CA if using mutual TLS.
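Whether a CA bundle actually covers a backend certificate can be checked offline with openssl verify. This is a sketch only: both file paths are placeholders, the function name is illustrative, and it checks the chain of trust, not how Envoy is configured to use the bundle.

```shell
# Verify a backend certificate against a CA bundle.
# Arguments: path to the CA bundle (PEM), path to the backend certificate (PEM).
verify_backend_cert() {
  openssl verify -CAfile "$1" "$2"
}

# Usage (placeholder paths):
#   verify_backend_cert /path/to/ca-bundle.pem /path/to/backend-cert.pem
```

On success, openssl prints the certificate path followed by OK; a failure names the missing or untrusted issuer.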
See Configuring Envoy for Deephaven-specific configuration properties and worker registration.
Testing with grpcurl
grpcurl is useful for testing gRPC connections through Envoy.
Get truststore certificate
Get protoset for AuthApi
Protobuf definitions are needed for grpcurl:
Test connection
Configuration files and locations
Envoy configuration: /etc/sysconfig/illumon.d/resources/envoy3.yaml
TLS certificate: /etc/sysconfig/illumon.d/client_update_service/lighttpd.pem
systemd service: /etc/systemd/system/envoy.service
Logs: Via docker logs deephaven_envoy or podman logs deephaven_envoy; also available in journalctl -u envoy when running as a systemd service.
Admin interface: http://localhost:8001/