docs: add reverse-proxy topology + external-TLS conventions
Capture the cert + edge-proxy conventions worked through deploying the helexa-bench UI:

- external-tls.md: publicly-trusted certs via Let's Encrypt (certbot, Cloudflare DNS-01,
  ECDSA, /root/.certbot-internal); the external counterpart to internal-tls.md.
  Decision rule: public name → LE, *.internal → internal CA.
- reverse-proxies.md: names the per-site edge proxies (oolon for kosherinata,
  hanzalova.internal for the office) and what sits behind each, the public-vs-mesh
  access paths + the "public names don't hairpin from inside the mesh" gotcha,
  per-vhost cert choice, nginx conventions, and the bench (bench.helexa.ai +
  bench.internal) worked example.
- readme + generic.md §11 cross-reference both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
87
reverse-proxies.md
Normal file
@@ -0,0 +1,87 @@
# Reverse proxies and edge ingress

Extends `generic.md` §11 (Network / Ingress). That section says "per-site nginx reverse
proxy terminates all WAN inbound 443"; this doc names the proxies, maps what sits behind
each, and pins down the two access paths and the per-vhost cert choice, plus the one
gotcha that bites every time (a public name doesn't work from *inside* the mesh).

---
## 1. The proxies (one per site)

Each WireGuard site has a single nginx edge proxy. All WAN-inbound 443 for that site is
port-forwarded by the site's OPNsense router to its proxy, which terminates TLS and
fans out to internal upstreams.

| Site | Edge proxy (nginx host) | Notable hosts behind it |
| --- | --- | --- |
| **kosherinata** (DC) | `oolon.kosherinata.internal` | `magrathea` (Postgres primary), `nikola`, `gramathea`, … |
| **hanzalova** (office) | `hanzalova.internal` | GPU/inference: `beast`, `benjy`, `quadbrat`; `bob` (helexa-bench API + Agent Zero); `frankie` (Postgres streaming standby); `trillian`; the workstation |

The site octet encodes the mesh subnet (`10.<site>.0.0/16`); see `generic.md` §11. New
office services front on `hanzalova.internal`; new DC services on `oolon`.
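The site-to-proxy mapping above can be written down as a small lookup. This is a minimal sketch, not part of any existing tooling: `edge_proxy_for` and `mesh_subnet` are hypothetical helper names, and the site octet passed to `mesh_subnet` is whatever `generic.md` §11 assigns (the value is not stated here).

```shell
# Hypothetical helpers; the names are illustrative, not an existing tool.

# edge_proxy_for SITE: print the nginx edge proxy that fronts a site.
edge_proxy_for() {
    case "$1" in
        kosherinata) printf '%s\n' 'oolon.kosherinata.internal' ;;
        hanzalova)   printf '%s\n' 'hanzalova.internal' ;;
        *)           printf 'unknown site: %s\n' "$1" >&2; return 1 ;;
    esac
}

# mesh_subnet OCTET: print the 10.<site>.0.0/16 subnet for a site octet.
mesh_subnet() {
    case "$1" in
        ''|*[!0-9]*) printf 'site octet must be numeric: %s\n' "$1" >&2; return 1 ;;
    esac
    if [ "$1" -gt 255 ]; then
        printf 'site octet out of range: %s\n' "$1" >&2
        return 1
    fi
    printf '10.%s.0.0/16\n' "$1"
}
```

A deploy script could call `edge_proxy_for hanzalova` to pick its rsync target instead of hard-coding the host name.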
## 2. Two access paths and the mesh hairpin gotcha

A service can be reached two ways, and they are **not** interchangeable:

- **Public (from the WAN):** public DNS (Cloudflare, unproxied by default) → site WAN IP
  → OPNsense forwards `:443` → site nginx → upstream. Cert: **Let's Encrypt**
  (`external-tls.md`).
- **Internal (from the mesh):** split-horizon `.internal` DNS → the host/proxy directly
  over WireGuard → nginx. Cert: **internal CA** (`internal-tls.md`).

> **Gotcha: public names don't hairpin.** From *inside* the mesh, a public name still
> resolves (via public DNS) to the site's **WAN** IP, so the packet reaches OPNsense on its
> **LAN** interface, and OPNsense only forwards `:443` that arrives inbound on the **WAN**,
> not from the LAN. The connection dead-ends (or worse, gets OPNsense's own default cert).
> So a service that mesh clients also need must be published under a **`*.internal` name
> with its own internal-CA vhost**, in addition to its public vhost.

This is why dual-audience services get **two vhosts** on the same proxy, one public
(LE) and one internal (`lair` CA), usually sharing one webroot and one upstream.
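When a name fails from a mesh host, a quick check of what it resolves to usually tells the two paths apart. The sketch below assumes every mesh address lives in `10.0.0.0/8` (from the `10.<site>.0.0/16` convention), that `getent` is available on the client, and that only IPv4 matters. `classify_address` and `reachable_from_mesh` are hypothetical names.

```shell
# classify_address ADDR: "mesh" for a 10.x.y.z address, "public" otherwise.
# Assumption: all mesh subnets are 10.<site>.0.0/16, so 10.* means mesh.
classify_address() {
    case "$1" in
        10.*.*.*) echo mesh ;;
        *)        echo public ;;
    esac
}

# reachable_from_mesh NAME: resolve NAME on this host and report which
# access path it would take. Returns 1 when the hairpin failure is expected.
reachable_from_mesh() {
    addr=$(getent hosts "$1" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "$1: does not resolve" >&2
        return 2
    fi
    if [ "$(classify_address "$addr")" = mesh ]; then
        echo "$1 -> $addr (mesh address, reached over WireGuard)"
    else
        echo "$1 -> $addr (public address, expect the hairpin failure from inside the mesh)"
        return 1
    fi
}
```

Run from a mesh workstation, `reachable_from_mesh bench.helexa.ai` would be expected to report a public address, and the `.internal` name a mesh one.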
## 3. Per-vhost cert choice

| vhost audience | name | cert | doc |
| --- | --- | --- | --- |
| Public / WAN | `<svc>.<public-zone>` (e.g. `bench.helexa.ai`) | Let's Encrypt (certbot, Cloudflare DNS-01, ECDSA) | `external-tls.md` |
| Mesh-only | `<svc>.internal` | internal CA (`step ca`, `lair` provisioner, `step@` renewal) | `internal-tls.md` |

Provisioner credentials for the internal CA (`~/.step/secrets/provisioner`, shipped to
the host transiently and removed) are covered in `internal-tls.md` §4.
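The table reduces to a one-line decision rule on the vhost name: `*.internal` gets the internal CA, any other qualified name gets Let's Encrypt. A minimal sketch, with `cert_source_for` as a hypothetical helper name and the output labels chosen only for illustration:

```shell
# cert_source_for NAME: print "<cert source> <doc>" for a vhost name.
# The labels "internal-ca" and "letsencrypt" are illustrative, not a standard.
cert_source_for() {
    case "$1" in
        *.internal) echo 'internal-ca internal-tls.md' ;;
        *.*)        echo 'letsencrypt external-tls.md' ;;
        *)          printf 'not a qualified name: %s\n' "$1" >&2; return 1 ;;
    esac
}
```

An `infra-setup.sh` could branch on the first word to decide whether to call certbot or the `step` tooling for a given vhost.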
## 4. nginx conventions on the proxies

- **`sites-available/` + `sites-enabled/` symlink**, included via
  `/etc/nginx/conf.d/sites-enabled.conf` (`include /etc/nginx/sites-enabled/*.conf;`).
  One file per `server_name`; enable with a relative symlink
  (`ln -sf ../sites-available/<name>.conf /etc/nginx/sites-enabled/`).
- **Static SPA** served from `/var/www/<name>` with SPA fallback
  (`try_files $uri $uri/ /index.html;`); **API** reverse-proxied to the internal
  `host:port`. Internal vhosts add `ssl_trusted_certificate <internal root>` and pin
  `ssl_protocols TLSv1.3`.
- **SELinux (enforcing):** webroots must be labelled `httpd_sys_content_t` or nginx
  returns **403**. After creating/populating `/var/www/<name>`, run
  `restorecon -R /var/www/<name>`; rsynced files inherit the dir's type.
- **Never reference a cert path before the cert exists:** `nginx -t` fails on a missing
  `ssl_certificate` and blocks the whole server from (re)starting. Issue the cert, then
  install the TLS vhost (gate on the cert's presence; serve an http-only bootstrap until
  then if needed).
- Config + cert/renewal wiring is installed idempotently from each project's
  `infra-setup.sh` (`deployment-gitea-actions.md` §2); the recurring artifact rsync
  (e.g. built SPA `dist/`) rides in the deploy workflow.
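The cert gate and the relative-symlink convention from the list above can be combined into one idempotent step. This is a sketch of how an `infra-setup.sh` might do it, not the project's actual script: `enable_vhost` is a hypothetical name, and the optional third argument (the nginx root) exists only so the function can be exercised outside `/etc/nginx`.

```shell
# enable_vhost NAME CERT [NGINX_ROOT]
# Enable sites-available/NAME.conf only if CERT already exists and is
# non-empty, so nginx never sees a TLS vhost with a missing ssl_certificate.
enable_vhost() {
    name=$1
    cert=$2
    root=${3:-/etc/nginx}
    avail="$root/sites-available/$name.conf"

    if [ ! -s "$cert" ]; then
        echo "skip $name: $cert missing, keep the http-only bootstrap" >&2
        return 1
    fi
    if [ ! -f "$avail" ]; then
        echo "skip $name: $avail not installed" >&2
        return 1
    fi

    mkdir -p "$root/sites-enabled"
    # Relative target, as in the convention above; -f makes re-runs harmless.
    ln -sf "../sites-available/$name.conf" "$root/sites-enabled/"
}
```

A caller would typically follow a successful `enable_vhost` with `nginx -t && systemctl reload nginx`, so a bad config is caught before the reload.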
## 5. Worked example: helexa-bench UI

The bench visualisation is reached both ways, fronted by `hanzalova.internal`:

| vhost | cert | DNS |
| --- | --- | --- |
| `bench.helexa.ai` (public) | Let's Encrypt | Cloudflare A → office WAN IP; OPNsense forwards WAN `:443` → `hanzalova` |
| `bench.internal` (mesh) | internal `lair` CA, renewed by `step@bench.timer` | split-horizon `bench.internal → hanzalova` mesh IP |

Both vhosts share one webroot (`/var/www/bench.helexa.ai`, the built SPA) and proxy
`/api` to the helexa-bench read API on `bob.hanzalova.internal:13132`. The internal vhost
exists precisely because of §2: from a workstation on the mesh, `bench.helexa.ai`
resolves to the office WAN IP, arrives on the OPNsense LAN interface, and fails, so mesh
users use `bench.internal` instead.
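The two bench vhosts differ only in name, cert, and the internal-CA directives, so one generator can emit both. The sketch below is an assumption-laden illustration rather than the deployed config: `render_bench_vhost` is a hypothetical name, the cert and key paths are left to the caller because they are not given here, and the plain-`http` upstream scheme and unmodified `/api` prefix are guesses.

```shell
# render_bench_vhost SERVER_NAME CERT KEY [INTERNAL_ROOT_CA]
# Print an nginx server block for one bench vhost. Passing a fourth argument
# adds the internal-vhost directives (ssl_trusted_certificate, TLSv1.3 only).
render_bench_vhost() {
    server_name=$1
    cert=$2
    key=$3
    internal_root=${4:-}

    cat <<EOF
server {
    listen 443 ssl;
    server_name $server_name;

    ssl_certificate     $cert;
    ssl_certificate_key $key;
EOF
    if [ -n "$internal_root" ]; then
        cat <<EOF
    ssl_trusted_certificate $internal_root;
    ssl_protocols TLSv1.3;
EOF
    fi
    # Quoted heredoc: nginx variables such as $uri must reach the file as-is.
    cat <<'EOF'

    # Shared webroot: the built SPA, with SPA fallback.
    root /var/www/bench.helexa.ai;

    location / {
        try_files $uri $uri/ /index.html;
    }

    # Assumption: the read API speaks plain http and expects the /api prefix.
    location /api {
        proxy_pass http://bob.hanzalova.internal:13132;
    }
}
EOF
}
```

Calling it once per name, with the Let's Encrypt paths for `bench.helexa.ai` and the internal-CA paths plus root for `bench.internal`, yields the two files for `sites-available/`.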