architecture/external-tls.md
2026-06-14 15:52:09 +03:00

External TLS: public certs for WAN-facing vhosts

Extends generic.md §11 (TLS / PKI). That section and internal-tls.md cover the internal PKI (Smallstep step-ca, *.internal names, mesh-only). This doc covers the other half: publicly-trusted certs for names served to the public internet at a site's WAN edge — e.g. bench.helexa.ai, qapish.ai, *.zap.pics.

Decision rule (the whole strategy in one line):

Public, internet-resolvable name → Let's Encrypt. Mesh-only *.internal name → internal CA (internal-tls.md). A service reached both ways gets one vhost of each (see reverse-proxies.md).

Public certs must chain to a publicly-trusted root (browsers off the mesh don't trust the lair internal root), so these come from Let's Encrypt — never step-ca.


1. Issuance: certbot + Cloudflare DNS-01, ECDSA

Our public DNS zones are on Cloudflare, so we use the DNS-01 challenge via the certbot-dns-cloudflare plugin. DNS-01 is deliberate:

  • No inbound :80 needed. The challenge is a TXT record, not an HTTP hit — so a cert can be issued (or renewed) even while nginx is stopped or the host isn't yet reachable from the WAN. (This is why a dormant edge proxy doesn't block issuance.)
  • Wildcard-capable, if a zone ever wants *.example.com.

Keys are ECDSA (--key-type ecdsa), matching the rest of the fleet.

sudo certbot certonly \
    -m ops@<domain> --agree-tos --no-eff-email --noninteractive \
    --cert-name <domain> \
    --key-type ecdsa \
    --dns-cloudflare \
    --dns-cloudflare-credentials /root/.certbot-internal \
    --dns-cloudflare-propagation-seconds 60 \
    --keep-until-expiring \
    -d <domain>

  • /root/.certbot-internal holds the Cloudflare API token. One token covers all the zones we manage (helexa.ai, zap.pics, …), so new sub-domains under an existing zone need no new credential; just run the command.
  • --keep-until-expiring makes scripted/repeated runs idempotent (no-op if the cert is still valid), so this is safe to call unconditionally from infra-setup.sh.
  • --cert-name <domain> pins the lineage name so the cert lands at a predictable path regardless of -d ordering.
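
The issuance command can be wrapped for unconditional use from infra-setup.sh. The following is a minimal sketch, not the script's actual contents: the function name issue_public_cert, the CERTBOT override variable, and the optional second argument for the contact address are all assumptions made for illustration. The certbot flags are exactly the ones shown above.

```shell
#!/bin/sh
# Sketch of an idempotent issuance helper for infra-setup.sh.
# Assumptions (not from this doc): the function name, the CERTBOT override,
# and the optional [email] argument.

# Command used to invoke certbot; set CERTBOT=echo to preview without running.
CERTBOT="${CERTBOT:-sudo certbot}"

issue_public_cert() {
    domain="$1"
    email="${2:-ops@${domain}}"
    if [ -z "$domain" ]; then
        echo "usage: issue_public_cert <domain> [email]" >&2
        return 2
    fi
    # Same flags as the command above. --keep-until-expiring turns a rerun
    # into a no-op while the existing cert is still valid.
    $CERTBOT certonly \
        -m "$email" --agree-tos --no-eff-email --noninteractive \
        --cert-name "$domain" \
        --key-type ecdsa \
        --dns-cloudflare \
        --dns-cloudflare-credentials /root/.certbot-internal \
        --dns-cloudflare-propagation-seconds 60 \
        --keep-until-expiring \
        -d "$domain"
}
```

Calling issue_public_cert bench.helexa.ai on every run is then safe; with CERTBOT=echo the helper only prints the command it would have executed, which is handy when reviewing a change to the script.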

2. Paths

certbot's standard layout (do not relocate — the renew timer expects it):

Path                                            Contents
/etc/letsencrypt/live/<domain>/fullchain.pem    cert + intermediate chain
/etc/letsencrypt/live/<domain>/privkey.pem      private key

These live under root-only /etc/letsencrypt/live (0700). Scripts that check for an existing cert must sudo test -d /etc/letsencrypt/live/<domain> — an unprivileged test silently returns false and will wrongly conclude the cert is missing.
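
The privileged check can be kept in one place so no script falls back to a bare test -d by accident. A minimal sketch, assuming a helper named cert_exists; the LE_LIVE and SUDO variables are hypothetical overrides added only so the function can be exercised outside a real host.

```shell
#!/bin/sh
# Sketch: privileged existence check for a certbot lineage.
# Assumptions (not from this doc): the function name and the LE_LIVE / SUDO
# override variables.

LE_LIVE="${LE_LIVE:-/etc/letsencrypt/live}"
# ${SUDO-sudo} (no colon) keeps an explicitly empty SUDO, e.g. when already root.
SUDO="${SUDO-sudo}"

cert_exists() {
    # Refuse an empty name: "$LE_LIVE/" itself always exists.
    [ -n "$1" ] || return 2
    # Run the test as root: live/ is 0700, so an unprivileged test -d
    # reports false even when the lineage directory is present.
    $SUDO test -d "${LE_LIVE}/$1"
}

# Example use:
#   if cert_exists bench.helexa.ai; then echo "cert present"; fi
```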

3. Renewal

Automatic via certbot's own certbot-renew.timer (systemd) — no per-cert unit, unlike the internal step@<name> template. certbot renews any lineage within 30 days of expiry and runs the configured deploy hook. Ensure nginx reloads after renewal with a deploy hook (once per host):

#!/bin/sh
# /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh   (chmod +x)
# Reload nginx so it serves the renewed cert; never fail the renewal run.
systemctl reload nginx 2>/dev/null || true
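
To audit whether a given cert has entered the 30-day renewal window described above, openssl x509 -checkend can be used. This is a sketch under stated assumptions: the function name cert_due_for_renewal and the SUDO override are hypothetical, and the check only mirrors the 30-day figure from this section rather than querying certbot itself.

```shell
#!/bin/sh
# Sketch: is a cert inside the 30-day renewal window?
# Assumptions (not from this doc): the function name and the SUDO override.

SUDO="${SUDO-sudo}"
RENEW_WINDOW_SECONDS=$((30 * 24 * 60 * 60))   # 30 days

# Exit status: 0 = due for renewal, 1 = not yet due, 2 = cert not readable.
cert_due_for_renewal() {
    cert="$1"
    $SUDO test -r "$cert" || return 2
    # -checkend N exits 0 if the cert is still valid N seconds from now,
    # and non-zero if it will have expired by then.
    if $SUDO openssl x509 -checkend "$RENEW_WINDOW_SECONDS" -noout \
            -in "$cert" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Example use:
#   cert_due_for_renewal /etc/letsencrypt/live/bench.helexa.ai/fullchain.pem \
#       && echo "inside the renewal window"
```

A cert that stays inside the window for more than a day or two suggests the timer is not firing or the renewal is failing, which is worth checking before the cert actually expires.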

4. nginx wiring

server {
    listen 443 ssl;
    http2 on;
    server_name <domain>;

    ssl_certificate     /etc/letsencrypt/live/<domain>/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/<domain>/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
}

Keep an :80 server for the same name only if you want an HTTP→HTTPS redirect; the cert itself needs no :80 (DNS-01).

Never reference a cert path before the cert exists: nginx -t fails on a missing ssl_certificate file, which blocks all of nginx from (re)starting. Issue first, then install the TLS vhost (gate the vhost install on sudo test -d /etc/letsencrypt/live/<domain>).
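
The "issue first, then install" ordering can be enforced in the install step itself. A minimal sketch, assuming a hypothetical install_tls_vhost helper that takes the domain, a prepared vhost file, and its destination path; none of these names or paths come from this doc, and the rollback assumes the destination is a new file.

```shell
#!/bin/sh
# Sketch: install a TLS vhost only once its cert exists.
# Assumptions (not from this doc): the function name, the src/dest
# arguments, and the LE_LIVE / SUDO override variables.

LE_LIVE="${LE_LIVE:-/etc/letsencrypt/live}"
SUDO="${SUDO-sudo}"

install_tls_vhost() {
    domain="$1"; src="$2"; dest="$3"
    # Gate on the lineage directory, checked with privileges (live/ is 0700).
    if [ -z "$domain" ] || ! $SUDO test -d "${LE_LIVE}/${domain}"; then
        echo "no cert for '${domain}'; issue it before installing the vhost" >&2
        return 1
    fi
    $SUDO install -m 0644 "$src" "$dest" || return 1
    # Validate the whole config. On failure, remove the new vhost so it
    # cannot block nginx from (re)starting. Assumes dest did not exist before.
    if ! $SUDO nginx -t; then
        $SUDO rm -f "$dest"
        return 1
    fi
    $SUDO systemctl reload nginx
}
```

With this gate, a run on a host where issuance failed leaves nginx untouched instead of installing a vhost that points at a missing file.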

5. Gotchas

  • SAN, not CN. Modern clients ignore CN; the served name must be in the SAN. certbot sets SAN from -d, so this is automatic — but if curl reports "no alternative certificate subject name matches target hostname", the listener answering isn't the one holding this cert (see next point).
  • Wrong cert on the public endpoint = a routing problem, not a cert problem. If a public name returns something like CN=opnsense.<site>.internal, the WAN :443 forward (or HAProxy SNI route) on OPNsense isn't landing on the site's nginx. Fix the edge route (reverse-proxies.md §2), not the cert.
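
Both gotchas start with the same question: which certificate does the public endpoint actually present? One way to answer it is with openssl s_client. The helper names below are hypothetical, and the -ext option assumes OpenSSL 1.1.1 or newer.

```shell
#!/bin/sh
# Sketch: inspect the cert a public endpoint serves for a given SNI name.
# Assumptions (not from this doc): both function names; OpenSSL >= 1.1.1
# for the x509 -ext option.

# Connect to <host>:443 with SNI set to <host> and dump the handshake,
# which includes the served leaf certificate in PEM form.
fetch_served_cert() {
    host="$1"
    openssl s_client -connect "${host}:443" -servername "$host" \
        </dev/null 2>/dev/null
}

# Read a PEM certificate on stdin and print its subject and SAN list.
show_cert_names() {
    openssl x509 -noout -subject -ext subjectAltName
}

# Example use:
#   fetch_served_cert bench.helexa.ai | show_cert_names
```

If the subject or SAN list shows an internal name rather than the public one, the request is reaching a different listener, which points at the edge route rather than the cert.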

6. Checklist for a new public vhost

  1. Add the public DNS record on Cloudflare (unproxied by default — generic.md §11).
  2. Issue the cert (§1), from infra-setup.sh, idempotently.
  3. Point the nginx vhost at the /etc/letsencrypt/live/<domain> paths (§4), then run nginx -t and reload nginx only if it passes.
  4. Confirm the site's OPNsense forwards WAN :443 to this nginx (reverse-proxies.md).