Internal TLS: per-service certs for mesh services
Extends generic.md §11 (TLS / PKI). That section covers the host identity cert
every host carries (/etc/pki/tls/{misc,private}/$(hostname -f).pem, kept fresh by
step.service). This doc covers the other common case: a per-service vanity cert
for an internal service reached by its own name on the WireGuard mesh — typically an
nginx vhost like gongfoo.internal or vlc-admin.internal.
Use this whenever a service is fronted by a *.internal name that differs from the
host's FQDN. Serving the host cert for a vlc-admin.internal request fails client
verification (the host cert's SAN is the host's FQDN, not the service name), so the
service needs its own cert.
All of this rides on the existing internal PKI: Smallstep step-ca at
https://ca.internal, internal root already trusted fleet-wide at
/etc/pki/ca-trust/source/anchors/root-internal.pem.
1. Naming and DNS
- The service name is <service>.internal, resolved by split-horizon DNS on the mesh. Never give it a public / Cloudflare record; these names are mesh-only.
- The renewal unit is a systemd template instance, and %i can't contain dots cleanly, so the instance label is the dot-free short name and the unit appends .internal: instance vlc-admin → serves/renews vlc-admin.internal. Choose service short names without dots.
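The naming rule above can be sketched as a small shell helper. This is an illustration only: svc_names is a hypothetical function, not part of the existing tooling, and the extra rejection of slashes is an added assumption.

```shell
# Hypothetical helper (not part of the original tooling): turn a dot-free
# short name into the service FQDN and the renewal timer instance name.
svc_names() {
  case "$1" in
    '' | *.* | */*)
      # Dots are rejected per the naming rule; empty names and slashes
      # are rejected as an extra safety assumption.
      echo "invalid short name: '$1' (must be non-empty, no dots or slashes)" >&2
      return 1
      ;;
  esac
  printf 'fqdn=%s.internal\n' "$1"
  printf 'timer=step@%s.timer\n' "$1"
}
```

Calling svc_names vlc-admin prints fqdn=vlc-admin.internal and timer=step@vlc-admin.timer, while svc_names vlc-admin.internal fails, which catches the common mistake of passing the full name as the instance label.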
2. Paths
Follow the established convention (shared with nginx):
| Path | Contents | Mode |
|---|---|---|
| /etc/nginx/tls/cert/<name>.internal.pem | cert (chain) | 0644 root:root |
| /etc/nginx/tls/key/<name>.internal.pem | private key | 0640 root:root, setfacl u:nginx:r |
setfacl -m u:nginx:r on the key is required when nginx workers must read it
(e.g. a proxy_ssl_certificate_key for mTLS to an internal backend). For a plain
server cert the master (root) reads it at load time and the ACL is belt-and-suspenders —
set it anyway for consistency.
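A quick way to confirm the modes in the table after provisioning might look like the sketch below. check_mode is a hypothetical helper, and it assumes GNU coreutils stat (the -c option), which prints the mode without the leading zero.

```shell
# Hypothetical helper: compare a file's octal mode with the expected one.
# Assumes GNU stat; `stat -c '%a'` prints e.g. 640, not 0640.
check_mode() {
  actual=$(stat -c '%a' "$1") || return 2   # 2 = file missing or unreadable
  if [ "$actual" != "$2" ]; then
    echo "$1: mode is $actual, expected $2" >&2
    return 1
  fi
}
```

On a host this would be run as check_mode /etc/nginx/tls/cert/<name>.internal.pem 644 and check_mode /etc/nginx/tls/key/<name>.internal.pem 640. The mode bits do not show the nginx ACL entry; getfacl on the key file is the way to inspect that.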
3. Renewal: the step@ template
Renewal is autonomous via a templated unit pair, instantiated per service
(step@<name>.timer). The unit renews the cert over mTLS, authenticating with the
existing cert (so no provisioner is needed once a cert exists), and reloads nginx on success:
# /etc/systemd/system/step@.service
[Service]
Type=oneshot
ExecCondition=/usr/bin/step certificate needs-renewal /etc/nginx/tls/cert/%i.internal.pem
ExecStart=/usr/bin/step ca renew --force \
    --ca-url https://ca.internal \
    --root /etc/pki/ca-trust/source/anchors/root-internal.pem \
    /etc/nginx/tls/cert/%i.internal.pem \
    /etc/nginx/tls/key/%i.internal.pem
ExecStartPost=/usr/bin/systemctl reload nginx.service
# /etc/systemd/system/step@.timer
[Timer]
Persistent=true
# every 15 min; certs are short-lived (24h).
# systemd has no trailing comments, so this stays on its own line.
OnCalendar=*:1/15
RandomizedDelaySec=5m
[Install]
WantedBy=timers.target
Enable per service: systemctl enable --now step@<name>.timer. The ExecCondition
makes it a clean no-op until a cert exists, so enabling it before the first mint is
harmless.
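To see why a 15-minute timer is comfortable for 24h certs, the renewal decision can be modelled as plain arithmetic. This is a sketch under a stated assumption: the two-thirds-of-lifetime threshold used here is chosen for illustration and is not confirmed by this document, so check the step documentation for the real default. needs_renewal is a hypothetical function that only mirrors the idea behind the ExecCondition; it does not call step.

```shell
# Hypothetical model of the renewal check (does not invoke step).
# Usage: needs_renewal NOT_BEFORE NOT_AFTER NOW   (all in epoch seconds)
# Assumption: renew once two thirds of the lifetime has elapsed.
needs_renewal() {
  lifetime=$(( $2 - $1 ))
  threshold=$(( $1 + lifetime * 2 / 3 ))
  [ "$3" -ge "$threshold" ]
}
```

For a 24h cert issued at second 0, the threshold is 57600 seconds (16h). Under that assumption the timer has roughly 8 hours, or about 32 runs at 15-minute intervals, to succeed before the cert expires.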
4. Initial minting
The timer only renews an existing cert; the first cert is minted explicitly via
the JWK provisioner (lair). Mint it from the provisioning script (infra-setup.sh,
see deployment-gitea-actions.md §2) by shipping the provisioner password to the host
just long enough to issue the cert:
name=<service>
cert=/etc/nginx/tls/cert/${name}.internal.pem
key=/etc/nginx/tls/key/${name}.internal.pem
# Skip if already valid (verify checks chain/expiry, not the name).
state=$(ssh "$host" "[ -f $cert ] && step certificate verify $cert \
    --roots /etc/pki/ca-trust/source/anchors/root-internal.pem >/dev/null 2>&1 \
    && echo valid || echo missing")
if [ "$state" != valid ]; then
  # provisioner password lives at ~/.step/secrets/provisioner on the operator box
  rsync -az --rsync-path='sudo rsync' --chmod=0600 \
    ~/.step/secrets/provisioner "$host:/tmp/${name}-provisioner"
  # \$ is expanded on the host; unescaped variables are expanded locally
  ssh "$host" "
    sudo mkdir -p /etc/nginx/tls/cert /etc/nginx/tls/key
    rc=0
    sudo step ca certificate --force \
      --provisioner lair \
      --provisioner-password-file /tmp/${name}-provisioner \
      --ca-url https://ca.internal \
      --root /etc/pki/ca-trust/source/anchors/root-internal.pem \
      --san ${name}.internal \
      ${name}.internal $cert $key || rc=\$?
    sudo rm -f /tmp/${name}-provisioner   # always remove the credential
    [ \$rc -eq 0 ] || { echo 'mint failed' >&2; exit \$rc; }
    sudo chown root:root $cert $key
    sudo chmod 644 $cert; sudo chmod 640 $key
    sudo setfacl -m u:nginx:r $key"
fi
# Enable the renewal timer on the host, not on the operator box.
ssh "$host" "sudo systemctl enable --now step@${name}.timer"
Rules that matter:
- Always pass --san <name>.internal. Modern TLS clients ignore CN and require a matching SAN; a CN-only cert fails with "no alternative certificate subject name matches target hostname".
- Remove the provisioner password even on failure (capture the exit code, rm, then propagate). Never leave the credential on the host.
- The password file convention is ~/.step/secrets/provisioner on the operator workstation, the same one deploy.sh-style scripts use.
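The "capture the exit code, rm, then propagate" rule can be isolated into a small wrapper, sketched below. run_then_scrub is a hypothetical name, not something the existing scripts define; it runs locally on whichever machine holds the credential file.

```shell
# Hypothetical wrapper: run a command, always delete the credential file
# afterwards, and return the command's own exit status.
# Usage: run_then_scrub SECRET_FILE COMMAND [ARGS...]
run_then_scrub() {
  scrub_file=$1
  shift
  scrub_rc=0
  "$@" || scrub_rc=$?      # capture the failure instead of aborting here
  rm -f -- "$scrub_file"   # runs whether the command succeeded or not
  return "$scrub_rc"       # propagate the original status
}
```

Capturing the status before the rm matters: if the rm ran last without it, the script would report the rm's success and mask a failed mint. A trap on EXIT is an alternative design, but the explicit form is easier to embed in a quoted ssh command.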
5. nginx wiring
server {
    listen 443 ssl;
    server_name <name>.internal;
    ssl_certificate     /etc/nginx/tls/cert/<name>.internal.pem;
    ssl_certificate_key /etc/nginx/tls/key/<name>.internal.pem;
    # ... + proxy_ssl_certificate{,_key} with the same paths if the upstream wants mTLS
}
Clients verify against the internal root (--cacert /etc/pki/ca-trust/source/anchors/root-internal.pem), which is already in the fleet
trust store, so browsers and curl on any mesh host trust it without extra flags.
6. Checklist for a new mesh service cert
- Pick a dot-free short name; add split-horizon DNS <name>.internal → <host> (no public record).
- Mint the first cert with --san <name>.internal (§4), from infra-setup.sh.
- systemctl enable --now step@<name>.timer for renewal.
- Point the nginx vhost at the cert/key paths (§5); nginx -t && reload.