Envoy runbook

Envoy is an optional third-party reverse proxy and load balancer that can consolidate all external access to a Deephaven cluster through a single endpoint. It simplifies firewall rules and certificate management, and provides advanced routing, load balancing, and observability features for gRPC and HTTP traffic.

Impact of Envoy failure

Level: Sev 2 - Moderate

Impact: External clients cannot access Deephaven through the Envoy endpoint. Clients with direct access to backend services can continue working. Internal traffic between Deephaven services is unaffected.

Note

Envoy is optional. If not deployed, its failure has no impact. If deployed as the primary ingress point, Envoy failure blocks external access until restored or clients reconfigure to bypass Envoy.

Envoy purpose in Deephaven

Envoy provides several key benefits:

Unified ingress: Single endpoint for all external traffic (Web UI, gRPC API, worker connections)

Simplified networking:

  • One port to expose externally (typically 443).
  • Single certificate to manage.
  • Simpler firewall rules.

Load balancing: Distribute requests across multiple backend servers

Advanced routing: Route based on path, headers, or other criteria

Observability: Detailed metrics, tracing, and logging for all traffic

Protocol support: HTTP/1.1, HTTP/2, gRPC, WebSocket

Envoy dependencies

Envoy has minimal dependencies on Deephaven:

Optional dependencies:

  • Remote Query Dispatcher — Registers workers with Envoy for dynamic routing.
  • Web API Server — May register with Envoy.

Envoy can start independently, and Deephaven services register with it dynamically.

Checking Envoy status

Check that the Envoy container is running:

If Envoy is configured as a systemd service:

Expected output shows the deephaven_envoy container listed, or active (running) for the systemd service.
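A minimal sketch of the two checks described above, assuming the container name deephaven_envoy and the systemd unit envoy.service listed under "Configuration files and locations". The variable names are illustrative; on Podman hosts, substitute podman for docker.

```shell
# Assumed names, taken from this runbook; override if your deployment differs.
ENVOY_CONTAINER="${ENVOY_CONTAINER:-deephaven_envoy}"
ENVOY_UNIT="${ENVOY_UNIT:-envoy}"

# List the container if it is running (use "podman ps" on Podman hosts).
docker ps --filter "name=${ENVOY_CONTAINER}" \
  || echo "docker is unavailable on this host"

# Show the unit state when Envoy is managed by systemd.
systemctl status "${ENVOY_UNIT}" --no-pager \
  || echo "unit ${ENVOY_UNIT} is not active, or systemd is unavailable"
```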

Test that Envoy is accepting connections:
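One way to probe the listener is a plain HTTPS request. This sketch assumes Envoy listens on localhost port 443 (the typical port mentioned earlier); the host and port variables are illustrative and should match your deployment.

```shell
# Assumptions: Envoy listens on localhost:443; adjust host and port as needed.
ENVOY_HOST="${ENVOY_HOST:-localhost}"
ENVOY_PORT="${ENVOY_PORT:-443}"

# -k skips certificate verification so an untrusted certificate does not mask
# a working listener; -w prints only the HTTP status code.
curl -k -sS -o /dev/null -w '%{http_code}\n' --max-time 10 \
  "https://${ENVOY_HOST}:${ENVOY_PORT}/" \
  || echo "no response from ${ENVOY_HOST}:${ENVOY_PORT}"
```

Any HTTP status code indicates that Envoy accepted the TLS connection. A refused connection or a timeout points at the listener, the container, or a firewall in between.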

Viewing Envoy logs

View logs from the Envoy container:

For Podman:

If Envoy is configured as a systemd service, container stdout is also available in the journal:
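The three variants above can be sketched as follows, using the container and unit names from "Configuration files and locations". The 100-line tail is an arbitrary choice to keep output manageable; run only the variant that matches your host.

```shell
ENVOY_CONTAINER="${ENVOY_CONTAINER:-deephaven_envoy}"
TAIL_LINES="${TAIL_LINES:-100}"

# Docker: most recent lines of container output.
docker logs --tail "${TAIL_LINES}" "${ENVOY_CONTAINER}" 2>&1 \
  || echo "docker logs failed"

# Podman: same command shape.
podman logs --tail "${TAIL_LINES}" "${ENVOY_CONTAINER}" 2>&1 \
  || echo "podman logs failed"

# systemd journal, when Envoy runs as envoy.service.
journalctl -u envoy -n "${TAIL_LINES}" --no-pager \
  || echo "journalctl failed"
```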

Restart procedure

Restart Envoy (drops active connections):
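A sketch of a full restart, wrapped in functions so that nothing restarts until one of them is called explicitly. It assumes the container name deephaven_envoy and the unit envoy.service from this runbook; the function names and the CONTAINER_CLI variable are illustrative.

```shell
ENVOY_CONTAINER="${ENVOY_CONTAINER:-deephaven_envoy}"
CONTAINER_CLI="${CONTAINER_CLI:-docker}"   # set to "podman" on Podman hosts

# Restart the container directly.
restart_envoy_container() {
  "${CONTAINER_CLI}" restart "${ENVOY_CONTAINER}"
}

# Restart through systemd when envoy.service manages the container.
restart_envoy_service() {
  sudo systemctl restart envoy
}
```

Call only the function that matches how Envoy is managed on the host, for example `restart_envoy_service`.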

Warning

Restarting Envoy drops all active connections. To apply configuration changes without downtime, use a hot restart instead.

For a hot restart (applies envoy3.yaml changes without dropping connections):

Verify the restart was successful:

Monitor logs during startup:

Expected startup messages:

  • Configuration loaded successfully.
  • Listeners started on configured ports.
  • Admin interface available.
  • Clusters initialized.
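To confirm that startup finished, one option is to poll the admin interface's /ready endpoint, which answers successfully once Envoy is serving. This sketch assumes the admin interface is reachable at http://localhost:8001; the function name and retry count are illustrative.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Poll /ready once per second until it succeeds or the attempts run out.
wait_for_envoy() {
  attempts="${1:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS --max-time 2 "${ENVOY_ADMIN}/ready" >/dev/null 2>&1; then
      echo "Envoy is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Envoy did not become ready after ${attempts} attempts"
  return 1
}
```

For example, `wait_for_envoy 60` waits up to about a minute after a restart.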

Envoy admin console

Envoy provides an admin interface for management and debugging (default port: 8001).

View admin console

Returns links to all admin endpoints.
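A minimal example, assuming the admin interface is reachable at http://localhost:8001 from where the command runs:

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# The root path serves an HTML index of the admin endpoints.
curl -fsS "${ENVOY_ADMIN}/" \
  || echo "admin interface not reachable at ${ENVOY_ADMIN}"

# /help lists the same endpoints as plain text, which reads better in a terminal.
curl -fsS "${ENVOY_ADMIN}/help" \
  || echo "admin interface not reachable at ${ENVOY_ADMIN}"
```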

View current configuration

Shows complete Envoy configuration including listeners, routes, clusters.
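The dump comes from the /config_dump admin endpoint. Because the output is large, this sketch writes it to a file; the output path is an arbitrary choice and the admin address is assumed to be http://localhost:8001.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"
DUMP_FILE="${DUMP_FILE:-/tmp/envoy_config_dump.json}"

# Save the running configuration as JSON for inspection or diffing.
curl -fsS "${ENVOY_ADMIN}/config_dump" -o "${DUMP_FILE}" \
  && echo "wrote ${DUMP_FILE}" \
  || echo "admin interface not reachable at ${ENVOY_ADMIN}"
```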

View log level
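Envoy's /logging admin endpoint lists the active level of each logger when it receives a POST with no query parameters. A minimal example, assuming the admin address http://localhost:8001:

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# POST without parameters lists the current level of every logger.
curl -fsS -X POST "${ENVOY_ADMIN}/logging" \
  || echo "admin interface not reachable at ${ENVOY_ADMIN}"
```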

Change log level
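The same endpoint changes the level for all loggers when given a `level` query parameter. The sketch below wraps the call in an illustrative helper that rejects unknown level names before contacting Envoy; the accepted names are assumed to match Envoy's standard log levels, and the admin address is assumed to be http://localhost:8001.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Set the level for all loggers, e.g. set_envoy_log_level debug
set_envoy_log_level() {
  case "${1:-}" in
    trace|debug|info|warning|error|critical|off) ;;
    *) echo "unsupported level: ${1:-}"; return 2 ;;
  esac
  curl -fsS -X POST "${ENVOY_ADMIN}/logging?level=$1"
}
```

A level set this way applies to the running process only. Return to the normal level (for example `set_envoy_log_level info`) once debugging is finished, since debug and trace output is verbose.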

View statistics

Returns all Envoy metrics.
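Metrics are served by the /stats admin endpoint. Since the full list is long, this sketch narrows it with the endpoint's `filter` parameter, which takes a regular expression. The admin address and the chosen pattern (active upstream connections) are illustrative.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"
STATS_FILTER="${STATS_FILTER:-upstream_cx_active}"

# -G sends the URL-encoded filter as a query string on a GET request.
# Drop the --data-urlencode argument to return every metric.
curl -fsS -G --data-urlencode "filter=${STATS_FILTER}" "${ENVOY_ADMIN}/stats" \
  || echo "admin interface not reachable at ${ENVOY_ADMIN}"
```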

View active clusters

Shows backend servers and their health status.
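Cluster membership and health come from the /clusters admin endpoint. This sketch keeps only the per-host health lines; the admin address is assumed to be http://localhost:8001.

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# Each backend host has a health_flags line; "healthy" means no failure flags are set.
curl -fsS "${ENVOY_ADMIN}/clusters" | grep 'health_flags' \
  || echo "no host health lines returned from ${ENVOY_ADMIN}"
```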

View server info

Shows the running Envoy version and uptime. Useful for confirming a restart completed successfully.
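The /server_info admin endpoint returns JSON. This sketch extracts the fields most relevant after a restart, assuming the admin address http://localhost:8001:

```shell
ENVOY_ADMIN="${ENVOY_ADMIN:-http://localhost:8001}"

# state should read LIVE, and uptime_current_epoch should be small right after a restart.
curl -fsS "${ENVOY_ADMIN}/server_info" \
  | grep -E '"(version|state|uptime_current_epoch)"' \
  || echo "no server info returned from ${ENVOY_ADMIN}"
```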

Configuring Envoy for Deephaven

The Deephaven Envoy configuration file is envoy3.yaml, located at /etc/sysconfig/illumon.d/resources/envoy3.yaml.

This file is generated by the Deephaven installer or created manually. It is mounted into the deephaven_envoy container at /config.yaml when the container is started.
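To confirm which file the running container actually uses, one can inspect its mounts. This sketch assumes Docker, the container name deephaven_envoy, and the configuration path listed under "Configuration files and locations".

```shell
ENVOY_CONTAINER="${ENVOY_CONTAINER:-deephaven_envoy}"
ENVOY_CONFIG="${ENVOY_CONFIG:-/etc/sysconfig/illumon.d/resources/envoy3.yaml}"

# Confirm the file exists and see when it was last modified.
ls -l "${ENVOY_CONFIG}" || echo "${ENVOY_CONFIG} not found"

# Print each mount as "host path -> container path"; expect one ending in /config.yaml.
docker inspect \
  --format '{{range .Mounts}}{{println .Source "->" .Destination}}{{end}}' \
  "${ENVOY_CONTAINER}" \
  || echo "could not inspect ${ENVOY_CONTAINER}"
```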

See Configuring Envoy for the full configuration reference and examples.

TLS configuration

Envoy handles TLS termination for the Deephaven cluster. By default, the Deephaven installer configures Envoy to use the Client Update Service PEM file, /etc/sysconfig/illumon.d/client_update_service/lighttpd.pem.

This file is mounted into the container at /lighttpd.pem. For production installations, replace this with a certificate signed by your organization's CA.
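Before and after replacing the certificate, it can help to check what the PEM file contains. This sketch uses openssl against the default path from "Configuration files and locations". Reading the file may require elevated privileges, and the 30-day expiry window is an arbitrary choice.

```shell
ENVOY_CERT="${ENVOY_CERT:-/etc/sysconfig/illumon.d/client_update_service/lighttpd.pem}"

# Show who the certificate was issued to, who signed it, and when it expires.
openssl x509 -in "${ENVOY_CERT}" -noout -subject -issuer -enddate \
  || echo "could not read a certificate from ${ENVOY_CERT}"

# -checkend exits non-zero if the certificate expires within N seconds (30 days here).
openssl x509 -in "${ENVOY_CERT}" -noout -checkend 2592000 \
  || echo "certificate is unreadable or expires within 30 days"
```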

See Configuring Envoy for TLS configuration details.

ALPN (Application-Layer Protocol Negotiation)

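ALPN is a TLS extension that lets the client and server agree on the application protocol (HTTP/1.1 or HTTP/2) during the TLS handshake. gRPC runs over HTTP/2, so gRPC connections over TLS depend on h2 being negotiated.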

Symptoms of ALPN issues

Inbound ALPN failures (client → Envoy):

  • Clients cannot establish gRPC connections.
  • Connection errors mentioning "ALPN negotiation failed".
  • HTTP/2 connections failing.

Outbound ALPN failures (Envoy → backend):

  • Envoy cannot connect to backend gRPC services.
  • 503 errors from Envoy.
  • Connection refused errors in Envoy logs.
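To see what is actually negotiated, openssl can offer a protocol list and report the server's choice. This sketch probes the Envoy listener, assumed here to be localhost:443. Pointing the same command at a backend address tests the outbound side.

```shell
ENVOY_HOST="${ENVOY_HOST:-localhost}"
ENVOY_PORT="${ENVOY_PORT:-443}"

# Offer h2 and http/1.1, then print the ALPN result line from the handshake summary.
openssl s_client -connect "${ENVOY_HOST}:${ENVOY_PORT}" -alpn h2,http/1.1 \
  </dev/null 2>/dev/null \
  | grep -i 'ALPN' \
  || echo "no ALPN line returned (connection failed before the handshake completed)"
```

A line reading "ALPN protocol: h2" means HTTP/2 was selected. "No ALPN negotiated" means the server did not pick any offered protocol, which is consistent with the gRPC failures listed above.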

Solutions for ALPN issues

Enable ALPN for gRPC in Envoy config:
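In Envoy's v3 TLS configuration, the negotiated protocols are set by the `alpn_protocols` list under `common_tls_context`, in a listener's downstream TLS context for inbound traffic and in a cluster's upstream TLS context for outbound traffic. For gRPC, that list needs to include h2. As a starting point, this sketch shows where the field is currently set in the default configuration path:

```shell
ENVOY_CONFIG="${ENVOY_CONFIG:-/etc/sysconfig/illumon.d/resources/envoy3.yaml}"

# Print each alpn_protocols entry with two lines of context on either side.
grep -n -B2 -A2 'alpn_protocols' "${ENVOY_CONFIG}" \
  || echo "alpn_protocols is not set in ${ENVOY_CONFIG}"
```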

Disable h2 requirement (not recommended):

Only use in testing environments:

Enable h2c (strongly discouraged):

HTTP/2 without TLS — only for trusted internal networks:

Caution

h2c (HTTP/2 cleartext) should never be used in production. It provides no encryption and should only be used for debugging in completely trusted networks.

Certificate trust configuration

Envoy must trust backend server certificates.

Configure truststore

CA bundle must include:

  • Root CA that signed backend certificates.
  • Intermediate CA certificates.
  • Envoy's own CA if using mutual TLS.
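A quick way to sanity-check a CA bundle is to count its certificates and then verify a backend's chain against it. The bundle path and backend address below are hypothetical placeholders, and the function names are illustrative.

```shell
CA_BUNDLE="${CA_BUNDLE:-/path/to/ca-bundle.pem}"     # hypothetical path
BACKEND="${BACKEND:-backend.example.com:8443}"       # hypothetical host:port

# Count the certificates contained in a PEM bundle.
count_bundle_certs() {
  grep -c 'BEGIN CERTIFICATE' "$1"
}

# Handshake with the backend and report how its chain verifies against the bundle.
verify_backend_trust() {
  openssl s_client -connect "${BACKEND}" -CAfile "${CA_BUNDLE}" \
    </dev/null 2>/dev/null | grep 'Verify return code'
}
```

Running `count_bundle_certs "$CA_BUNDLE"` followed by `verify_backend_trust` should end with "Verify return code: 0 (ok)" when the bundle contains the full chain. Any other code names the missing or untrusted certificate.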

See Configuring Envoy for Deephaven-specific configuration properties and worker registration.

Testing with grpcurl

grpcurl is useful for testing gRPC connections through Envoy.

Get truststore certificate

Get protoset for AuthApi

Protobuf definitions are needed for grpcurl:

Test connection
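A sketch of a grpcurl call through Envoy, assuming the truststore certificate and the protoset from the previous steps are already on disk. The file paths, target address, and function names are hypothetical placeholders, and the method name must come from your protoset since none is assumed here.

```shell
CACERT="${CACERT:-/path/to/truststore.pem}"         # hypothetical path
PROTOSET="${PROTOSET:-/path/to/authapi.protoset}"   # hypothetical path
TARGET="${TARGET:-localhost:443}"                   # Envoy host:port (assumed)

# List the services described by the protoset; no network connection is made.
list_protoset_services() {
  grpcurl -protoset "${PROTOSET}" list
}

# Invoke one method through Envoy with an optional JSON request body.
call_through_envoy() {
  if [ -z "${1:-}" ]; then
    echo "usage: call_through_envoy <package.Service/Method> [json]"
    return 2
  fi
  body="${2:-}"
  [ -n "$body" ] || body='{}'
  grpcurl -cacert "${CACERT}" -protoset "${PROTOSET}" -d "$body" "${TARGET}" "$1"
}
```

Use `list_protoset_services` to find a fully qualified method name, then pass it to `call_through_envoy`. A gRPC-level response, even an authentication error, shows that TLS, ALPN, and routing through Envoy are working.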

Configuration files and locations

Envoy configuration: /etc/sysconfig/illumon.d/resources/envoy3.yaml

TLS certificate: /etc/sysconfig/illumon.d/client_update_service/lighttpd.pem

systemd service: /etc/systemd/system/envoy.service

Logs: Via docker logs deephaven_envoy or podman logs deephaven_envoy; also available in journalctl -u envoy when running as a systemd service.

Admin interface: http://localhost:8001/