architecture/reverse-proxies.md
rob thijssen 746c55fe94 docs: add reverse-proxy topology + external-TLS conventions
Capture the cert + edge-proxy conventions worked through deploying the
helexa-bench UI:

- external-tls.md — publicly-trusted certs via Let's Encrypt (certbot,
  Cloudflare DNS-01, ECDSA, /root/.certbot-internal); the external
  counterpart to internal-tls.md. Decision rule: public name → LE,
  *.internal → internal CA.
- reverse-proxies.md — names the per-site edge proxies (oolon for
  kosherinata, hanzalova.internal for the office) and what sits behind
  each, the public-vs-mesh access paths + the "public names don't
  hairpin from inside the mesh" gotcha, per-vhost cert choice, nginx
  conventions, and the bench (bench.helexa.ai + bench.internal) worked
  example.
- readme + generic.md §11 cross-reference both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:50:57 +03:00


Reverse proxies and edge ingress

Extends generic.md §11 (Network / Ingress). That section says "per-site nginx reverse proxy terminates all WAN inbound 443"; this doc names the proxies, maps what sits behind each, and pins down the two access paths and the per-vhost cert choice — plus the one gotcha that bites every time (a public name doesn't work from inside the mesh).


1. The proxies (one per site)

Each WireGuard site has a single nginx edge proxy. All WAN-inbound 443 for that site is port-forwarded by the site's OPNsense router to its proxy, which terminates TLS and fans out to internal upstreams.

| Site | Edge proxy (nginx host) | Notable hosts behind it |
|---|---|---|
| kosherinata (DC) | oolon.kosherinata.internal | magrathea (Postgres primary), nikola, gramathea, … |
| hanzalova (office) | hanzalova.internal | GPU/inference: beast, benjy, quadbrat; bob (helexa-bench API + Agent Zero); frankie (Postgres streaming standby); trillian; the workstation |

Site octet encodes the mesh subnet (10.<site>.0.0/16); see generic.md §11. New office services front on hanzalova.internal; new DC services on oolon.
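The site-octet convention is mechanical enough to sketch. The Python below is a minimal illustration, not tooling that exists in the repo. It builds a site's mesh subnet from its octet and maps a mesh address back to the edge proxy that fronts that site. The proxy hostnames come from the table above. The octet values in EXAMPLE_OCTETS are hypothetical placeholders, because the real assignments live in generic.md §11. The function names are illustrative.

```python
import ipaddress

# Edge proxy per site (hostnames from the table above).
EDGE_PROXIES = {
    "kosherinata": "oolon.kosherinata.internal",
    "hanzalova": "hanzalova.internal",
}


def site_subnet(octet: int) -> ipaddress.IPv4Network:
    """Mesh subnet for a site: 10.<site>.0.0/16."""
    if not 0 <= octet <= 255:
        raise ValueError(f"site octet out of range: {octet}")
    return ipaddress.IPv4Network(f"10.{octet}.0.0/16")


def edge_proxy_for(ip: str, site_octets: dict) -> str:
    """Return the edge proxy that fronts the site owning a mesh address."""
    addr = ipaddress.IPv4Address(ip)
    for site, octet in site_octets.items():
        if addr in site_subnet(octet):
            return EDGE_PROXIES[site]
    raise LookupError(f"{ip} is not in any known site subnet")


# HYPOTHETICAL octets for illustration only; the real ones are in generic.md §11.
EXAMPLE_OCTETS = {"kosherinata": 1, "hanzalova": 2}

if __name__ == "__main__":
    print(site_subnet(EXAMPLE_OCTETS["hanzalova"]))    # 10.2.0.0/16
    print(edge_proxy_for("10.2.7.9", EXAMPLE_OCTETS))  # hanzalova.internal
```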

2. Two access paths — and the mesh hairpin gotcha

A service can be reached two ways, and they are not interchangeable:

  • Public (from the WAN): public DNS (Cloudflare, unproxied by default) → site WAN IP → OPNsense forwards :443 → site nginx → upstream. Cert: Let's Encrypt (external-tls.md).
  • Internal (from the mesh): split-horizon .internal DNS → the host/proxy directly over WireGuard → nginx. Cert: internal CA (internal-tls.md).

Gotcha — public names don't hairpin. From inside the mesh, a public name still resolves (via public DNS) to the site's WAN IP, so the packet hits the OPNsense LAN interface — which only forwards :443 inbound from the WAN, not from the LAN. The connection dead-ends (or worse, gets OPNsense's own default cert). So a service that mesh clients also need must be published under a *.internal name with its own internal-CA vhost, in addition to its public vhost.

This is why dual-audience services get two vhosts on the same proxy — one public (LE), one internal (lair CA) — usually sharing one webroot and one upstream.
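The hairpin rule can be stated as a small classifier. This is a hypothetical Python sketch that assumes only what this section says:

- Mesh addresses fall inside 10.0.0.0/8, because every site subnet is 10.<site>.0.0/16.
- A mesh client that resolves a public name to a site WAN IP dead-ends at OPNsense.

The WAN address in the example is an RFC 5737 documentation address, not a real one.

```python
import ipaddress

# Every site subnet is 10.<site>.0.0/16, so all mesh addresses sit in 10.0.0.0/8.
MESH_SUPERNET = ipaddress.IPv4Network("10.0.0.0/8")


def classify_path(client_ip: str, resolved_ip: str, site_wan_ips: set) -> str:
    """Say which access path a connection takes, flagging the hairpin case.

    client_ip    -- where the connection originates
    resolved_ip  -- what the service name resolved to for that client
    site_wan_ips -- the sites' public WAN addresses
    """
    client_in_mesh = ipaddress.IPv4Address(client_ip) in MESH_SUPERNET
    target = ipaddress.IPv4Address(resolved_ip)
    wan = {ipaddress.IPv4Address(ip) for ip in site_wan_ips}

    if target in MESH_SUPERNET:
        return "internal"  # *.internal via split-horizon DNS, over WireGuard
    if target in wan:
        # Public name -> site WAN IP. OPNsense only forwards :443 arriving
        # from the WAN, so a mesh client dead-ends here.
        return "hairpin" if client_in_mesh else "public"
    return "unrelated"  # not one of our sites


if __name__ == "__main__":
    wan = {"203.0.113.10"}  # RFC 5737 documentation address, hypothetical
    print(classify_path("10.2.7.9", "203.0.113.10", wan))      # hairpin
    print(classify_path("198.51.100.7", "203.0.113.10", wan))  # public
    print(classify_path("10.2.7.9", "10.2.0.1", wan))          # internal
```

A "hairpin" result is the signal to publish the service under a *.internal name as well.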

3. Per-vhost cert choice

| vhost audience | name | cert | doc |
|---|---|---|---|
| Public / WAN | <svc>.<public-zone> (e.g. bench.helexa.ai) | Let's Encrypt (certbot, Cloudflare DNS-01, ECDSA) | external-tls.md |
| Mesh-only | <svc>.internal | internal CA (step ca, lair provisioner, step@ renewal) | internal-tls.md |

Provisioner credentials for the internal CA (~/.step/secrets/provisioner, shipped to the host transiently and removed) are covered in internal-tls.md §4.
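The decision rule behind this table (public name → Let's Encrypt, *.internal → internal CA) fits in one function. This is a minimal Python sketch. CertChoice and cert_for are illustrative names, not part of any existing tooling, and the issuer strings simply restate the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertChoice:
    issuer: str  # who signs the vhost's certificate
    doc: str     # where issuance and renewal are documented


LETS_ENCRYPT = CertChoice("Let's Encrypt (certbot, Cloudflare DNS-01, ECDSA)",
                          "external-tls.md")
INTERNAL_CA = CertChoice("internal CA (step ca, lair provisioner, step@ renewal)",
                         "internal-tls.md")


def cert_for(server_name: str) -> CertChoice:
    """Pick the cert source for a vhost from its server_name alone."""
    name = server_name.rstrip(".").lower()  # tolerate a trailing dot and case
    if not name:
        raise ValueError("empty server_name")
    return INTERNAL_CA if name.endswith(".internal") else LETS_ENCRYPT


if __name__ == "__main__":
    for vhost in ("bench.helexa.ai", "bench.internal"):
        print(vhost, "->", cert_for(vhost).doc)
```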

4. nginx conventions on the proxies

  • Layout: configs live in sites-available/ and are enabled by symlinking them into sites-enabled/, which /etc/nginx/conf.d/sites-enabled.conf pulls in (include /etc/nginx/sites-enabled/*.conf;). One file per server_name; enable with a relative symlink (ln -sf ../sites-available/<name>.conf /etc/nginx/sites-enabled/).
  • Static SPA served from /var/www/<name> with SPA fallback (try_files $uri $uri/ /index.html;); API reverse-proxied to the internal host:port. Internal vhosts add ssl_trusted_certificate <internal root> and pin ssl_protocols TLSv1.3.
  • SELinux (enforcing): webroots must be labelled httpd_sys_content_t or nginx returns 403. After creating/populating /var/www/<name>, run restorecon -R /var/www/<name>; rsynced files inherit the dir's type.
  • Never reference a cert path before the cert exists: nginx -t fails on a missing ssl_certificate, which blocks the whole server from (re)starting. Issue the cert first, then install the TLS vhost (gate on the cert's presence, and serve an http-only bootstrap until then if needed).
  • Config + cert/renewal wiring is installed idempotently from each project's infra-setup.sh (deployment-gitea-actions.md §2); the recurring artifact rsync (e.g. built SPA dist/) rides in the deploy workflow.
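The cert-gating and relative-symlink conventions can be sketched as follows. In practice this logic lives in each project's infra-setup.sh (shell). The Python below only illustrates the same steps, under these assumptions:

- nginx_dir stands in for /etc/nginx.
- The caller supplies both the TLS config text and the http-only bootstrap config text.
- install_vhost is a hypothetical name.

It does not run nginx -t, reload nginx, or handle SELinux labelling; restorecon remains a separate step.

```python
import os
import tempfile
from pathlib import Path


def install_vhost(nginx_dir: Path, name: str, tls_conf: str,
                  bootstrap_conf: str, cert_path: Path) -> str:
    """Install sites-available/<name>.conf and enable it with a relative symlink.

    The TLS variant is written only once cert_path exists; before that an
    http-only bootstrap is written, so `nginx -t` never sees a missing
    ssl_certificate. Safe to re-run. Returns "tls" or "bootstrap".
    """
    available = nginx_dir / "sites-available"
    enabled = nginx_dir / "sites-enabled"
    available.mkdir(parents=True, exist_ok=True)
    enabled.mkdir(parents=True, exist_ok=True)

    variant = "tls" if cert_path.is_file() else "bootstrap"
    conf_file = available / f"{name}.conf"
    conf_file.write_text(tls_conf if variant == "tls" else bootstrap_conf)

    # Same result as: ln -sf ../sites-available/<name>.conf sites-enabled/
    link = enabled / f"{name}.conf"
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(Path("..") / "sites-available" / f"{name}.conf")
    return variant


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)                 # stands in for /etc/nginx
        cert = root / "fullchain.pem"  # hypothetical cert location
        print(install_vhost(root, "demo", "# tls\n", "# http\n", cert))  # bootstrap
        cert.write_text("dummy\n")
        print(install_vhost(root, "demo", "# tls\n", "# http\n", cert))  # tls
        print(os.readlink(root / "sites-enabled" / "demo.conf"))  # ../sites-available/demo.conf
```

The first run writes the bootstrap, and a later run swaps in the TLS variant. As a result, the config on disk is always one that nginx -t will accept.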

5. Worked example: helexa-bench UI

The bench visualisation is reached both ways, fronted by hanzalova.internal:

| vhost | cert | DNS |
|---|---|---|
| bench.helexa.ai (public) | Let's Encrypt | Cloudflare A record → office WAN IP; OPNsense forwards WAN :443 → hanzalova |
| bench.internal (mesh) | internal lair CA, renewed by step@bench.timer | split-horizon bench.internal → hanzalova mesh IP |

Both vhosts share one webroot (/var/www/bench.helexa.ai, the built SPA) and proxy /api to the helexa-bench read API on bob.hanzalova.internal:13132. The internal vhost exists precisely because of §2: from a workstation on the mesh, bench.helexa.ai hairpins to the OPNsense LAN interface and fails, so mesh users hit bench.internal.
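Rendering both bench server blocks from one function makes the shape of the pair explicit. Both share the same webroot and the same /api upstream but use different certificates, and the internal vhost adds the trusted root and the TLS 1.3 pin from §4. This is a hypothetical Python sketch, not the deployed config:

- All cert, key, and root paths are placeholders.
- HTTP-to-HTTPS redirects and proxy headers are omitted.
- It assumes the read API on bob expects the /api prefix. A proxy_pass with no URI part forwards the request path unchanged.

```python
from typing import Optional

WEBROOT = "/var/www/bench.helexa.ai"                  # shared built SPA
API_UPSTREAM = "http://bob.hanzalova.internal:13132"  # helexa-bench read API


def bench_vhost(server_name: str, cert: str, key: str,
                internal_root: Optional[str] = None) -> str:
    """Render one bench server block; pass internal_root for the mesh vhost."""
    lines = [
        "server {",
        "    listen 443 ssl;",
        f"    server_name {server_name};",
        f"    ssl_certificate {cert};",
        f"    ssl_certificate_key {key};",
    ]
    if internal_root is not None:
        # Internal vhosts: trust the internal root and pin TLS 1.3.
        lines += [
            f"    ssl_trusted_certificate {internal_root};",
            "    ssl_protocols TLSv1.3;",
        ]
    lines += [
        f"    root {WEBROOT};",
        "    location / {",
        "        try_files $uri $uri/ /index.html;  # SPA fallback",
        "    }",
        "    location /api {",
        # No URI part on proxy_pass: the original /api/... path is passed through.
        f"        proxy_pass {API_UPSTREAM};",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Cert/key/root paths below are HYPOTHETICAL placeholders.
    print(bench_vhost("bench.helexa.ai", "/path/to/le/fullchain.pem",
                      "/path/to/le/privkey.pem"))
    print(bench_vhost("bench.internal", "/path/to/bench.internal.crt",
                      "/path/to/bench.internal.key",
                      internal_root="/path/to/internal-root.crt"))
```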