docs: add CI deployment and internal-TLS guidance, cross-reference from generic

Add two new guidance documents alongside generic.md:

- deployment-gitea-actions.md: CI-driven deployment via a Gitea Actions
  workflow as an alternative to deploy.sh + manifest.yml (§7), with the
  workflow as the source of infra truth and a scoped gitea_ci runner user.
- internal-tls.md: provisioning and renewing per-service internal TLS
  certs (<service>.internal) for mesh-only nginx vhosts, extending the
  PKI conventions in §11.

Cross-reference both from generic.md and list them in readme.md. Also
add a "never suppress errors" rule to the deploy-script conventions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:43:18 +03:00
parent 83652460ed
commit 200c41b4f1
4 changed files with 365 additions and 0 deletions

internal-tls.md Normal file

@@ -0,0 +1,149 @@
# Internal TLS: per-service certs for mesh services
Extends `generic.md` §11 (TLS / PKI). That section covers the **host identity cert**
every host carries (`/etc/pki/tls/{misc,private}/$(hostname -f).pem`, kept fresh by
`step.service`). This doc covers the other common case: a **per-service vanity cert**
for an internal service reached by its own name on the WireGuard mesh, typically an
nginx vhost like `gongfoo.internal` or `vlc-admin.internal`.

Use this whenever a service is fronted by a `*.internal` name that differs from the
host's FQDN. Serving the host cert for a `vlc-admin.internal` request fails client
verification (the host cert's SAN is the host's FQDN, not the service name), so the
service needs its own cert.

All of this rides on the existing internal PKI: Smallstep `step-ca` at
`https://ca.internal`, internal root already trusted fleet-wide at
`/etc/pki/ca-trust/source/anchors/root-internal.pem`.

---
## 1. Naming and DNS
- The service name is `<service>.internal`, resolved by split-horizon DNS on the mesh.
**Never give it a public / Cloudflare record** — these names are mesh-only.
- The renewal unit is a systemd template instance, and `%i` can't contain dots cleanly,
so the **instance label is the dot-free short name** and the unit appends `.internal`:
instance `vlc-admin` → serves/renews `vlc-admin.internal`. Choose service short names
without dots.
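
The naming rule can be sketched as a small shell helper. This is an illustration only: `service_paths` is a hypothetical name, and the paths it prints are the ones this document uses in §2 and §3.

```sh
#!/bin/sh
# Hypothetical helper: derive the mesh FQDN, timer unit, and file paths
# from a service short name. Rejects empty names and names containing
# dots, because the step@ instance label must be dot-free.
service_paths() {
    short=$1
    case $short in
        ''|*.*)
            echo "invalid short name: '$short' (must be non-empty, no dots)" >&2
            return 1
            ;;
    esac
    echo "fqdn=${short}.internal"
    echo "unit=step@${short}.timer"
    echo "cert=/etc/nginx/tls/cert/${short}.internal.pem"
    echo "key=/etc/nginx/tls/key/${short}.internal.pem"
}
```

For example, `service_paths vlc-admin` prints the four values for `vlc-admin.internal`, while `service_paths vlc.admin` fails with a message on stderr.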
## 2. Paths
Follow the established convention (shared with nginx):

| Path | Contents | Mode |
| --- | --- | --- |
| `/etc/nginx/tls/cert/<name>.internal.pem` | cert (chain) | `0644 root:root` |
| `/etc/nginx/tls/key/<name>.internal.pem` | private key | `0640 root:root`, `setfacl -m u:nginx:r` |

`setfacl -m u:nginx:r` on the key is required when nginx **workers** must read it
(e.g. a `proxy_ssl_certificate_key` for mTLS to an internal backend). For a plain
server cert the master process (root) reads the key at load time, so the ACL is
redundant there; set it anyway for consistency.
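
A quick way to confirm the modes in the table is a check like the one below. It is a sketch, not part of the provisioning scripts: `check_mode` is a hypothetical helper, and it assumes GNU coreutils `stat` (the `-c` flag; BSD `stat` takes different options).

```sh
#!/bin/sh
# check_mode PATH EXPECTED_OCTAL
# Succeeds only if PATH has exactly the expected permission bits.
# Returns 2 if stat itself fails (e.g. the file does not exist).
check_mode() {
    actual=$(stat -c '%a' "$1") || return 2
    if [ "$actual" != "$2" ]; then
        echo "$1: mode $actual, expected $2" >&2
        return 1
    fi
}
```

On a host this would be called as `check_mode /etc/nginx/tls/key/<name>.internal.pem 640`; the ACL entry itself can be inspected separately with `getfacl`.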
## 3. Renewal: the `step@` template
Renewal is autonomous via a templated unit pair, instantiated per service
(`step@<name>.timer`). The service renews the cert over mTLS using the existing cert
(no provisioner needed once a cert exists) and reloads nginx on success:
```ini
# /etc/systemd/system/step@.service
[Service]
Type=oneshot
ExecCondition=/usr/bin/step certificate needs-renewal /etc/nginx/tls/cert/%i.internal.pem
ExecStart=/usr/bin/step ca renew --force \
--ca-url https://ca.internal \
--root /etc/pki/ca-trust/source/anchors/root-internal.pem \
/etc/nginx/tls/cert/%i.internal.pem \
/etc/nginx/tls/key/%i.internal.pem
ExecStartPost=/usr/bin/systemctl reload nginx.service
```
```ini
# /etc/systemd/system/step@.timer
[Timer]
Persistent=true
# Every 15 min; certs are short-lived (24h). systemd does not support
# trailing comments, so this stays on its own line.
OnCalendar=*:1/15
RandomizedDelaySec=5m
[Install]
WantedBy=timers.target
```
Enable per service: `systemctl enable --now step@<name>.timer`. The `ExecCondition`
makes it a clean no-op until a cert exists, so enabling it before the first mint is
harmless.
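
The no-op behaviour follows from how the two tools' exit codes line up. The sketch below encodes that mapping as a shell function. It assumes the exit statuses documented for `step certificate needs-renewal` (0 renewal due, 1 not due, 2 file missing, 255 other error) and systemd's `ExecCondition=` rule (0 proceeds, 1 through 254 skips without failing the unit, 255 fails it). `execcondition_outcome` is a hypothetical name, not part of either tool.

```sh
#!/bin/sh
# execcondition_outcome STATUS
# Given the exit status of `step certificate needs-renewal`, print what
# systemd would do when that command is used as ExecCondition=.
execcondition_outcome() {
    case $1 in
        0)   echo "run: renewal due, ExecStart proceeds" ;;
        1)   echo "skip: cert not yet due, unit not failed" ;;
        2)   echo "skip: cert file missing, unit not failed" ;;
        255) echo "fail: unexpected error, unit marked failed" ;;
        *)   echo "skip: unit not failed" ;;
    esac
}
```

Status 2 is the case that makes enabling the timer before the first mint harmless: the unit is skipped rather than failed.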
## 4. Initial minting
The timer only **renews** an existing cert; the **first** cert is minted explicitly via
the JWK provisioner (`lair`). Mint it from the provisioning script (`infra-setup.sh`,
see `deployment-gitea-actions.md` §2) by shipping the provisioner password to the host
just long enough to issue the cert:
```sh
name=<service>
cert=/etc/nginx/tls/cert/${name}.internal.pem
key=/etc/nginx/tls/key/${name}.internal.pem
# Skip if already valid (verify checks chain/expiry, not the name).
state=$(ssh "$host" "[ -f $cert ] && step certificate verify $cert \
--roots /etc/pki/ca-trust/source/anchors/root-internal.pem >/dev/null 2>&1 \
&& echo valid || echo missing")
if [ "$state" != valid ]; then
# provisioner password lives at ~/.step/secrets/provisioner on the operator box
rsync -az --rsync-path='sudo rsync' --chmod=0600 \
~/.step/secrets/provisioner "$host:/tmp/${name}-provisioner"
ssh "$host" "
sudo mkdir -p /etc/nginx/tls/cert /etc/nginx/tls/key
rc=0
sudo step ca certificate --force \
--provisioner lair \
--provisioner-password-file /tmp/${name}-provisioner \
--ca-url https://ca.internal \
--root /etc/pki/ca-trust/source/anchors/root-internal.pem \
--san ${name}.internal \
${name}.internal $cert $key || rc=\$?
sudo rm -f /tmp/${name}-provisioner # always remove the credential
[ \$rc -eq 0 ] || { echo 'mint failed' >&2; exit \$rc; }
sudo chown root:root $cert $key
sudo chmod 644 $cert; sudo chmod 640 $key
sudo setfacl -m u:nginx:r $key"
fi
# Enable renewal on the host (not on the operator box).
ssh "$host" "sudo systemctl enable --now step@${name}.timer"
```
Rules that matter:
- **Always pass `--san <name>.internal`.** Modern TLS clients ignore CN and require a
matching SAN; a CN-only cert fails with *"no alternative certificate subject name
matches target hostname"*.
- **Remove the provisioner password even on failure** (capture the exit code, `rm`,
then propagate). Never leave the credential on the host.
- The password file convention is `~/.step/secrets/provisioner` on the operator
workstation — the same one `deploy.sh`-style scripts use.
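
The capture, remove, propagate rule can be factored into a reusable wrapper. The following is a minimal sketch of that pattern, not the script's actual implementation; `with_secret` is a hypothetical name.

```sh
#!/bin/sh
# with_secret SECRET_FILE CMD [ARGS...]
# Runs CMD, then deletes SECRET_FILE whether or not CMD succeeded,
# and finally returns CMD's own exit status to the caller.
with_secret() {
    secret=$1
    shift
    rc=0
    "$@" || rc=$?          # capture the status instead of aborting
    rm -f -- "$secret"     # always remove the credential
    return "$rc"           # propagate the original result
}
```

The design point is ordering: the status is saved before `rm` runs, so the cleanup can neither mask a failed mint nor be skipped by one.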
## 5. nginx wiring
```nginx
server {
listen 443 ssl;
server_name <name>.internal;
ssl_certificate /etc/nginx/tls/cert/<name>.internal.pem;
ssl_certificate_key /etc/nginx/tls/key/<name>.internal.pem;
# ... + proxy_ssl_certificate{,_key} with the same paths if the upstream wants mTLS
}
```
Clients verify against the internal root, which is already in the fleet trust store,
so `curl` and browsers that use the system store trust the cert on any mesh host
without extra flags. Pass `--cacert /etc/pki/ca-trust/source/anchors/root-internal.pem`
only where a client does not use the system store.
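
Because a missing SAN is the most common reason verification fails (§4), it can help to check the minted cert before pointing clients at it. The sketch below is one way to do that. `has_san` is a hypothetical helper, and it assumes the SAN listing format printed by `openssl x509 -noout -ext subjectAltName`, which needs OpenSSL 1.1.1 or newer.

```sh
#!/bin/sh
# has_san NAME
# Reads an OpenSSL subjectAltName listing on stdin and succeeds only if
# NAME appears as an exact DNS entry (no substring matches).
has_san() {
    tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -Fxq "DNS:$1"
}
```

Assumed usage on the host: `openssl x509 -in /etc/nginx/tls/cert/<name>.internal.pem -noout -ext subjectAltName | has_san <name>.internal`.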
## 6. Checklist for a new mesh service cert
1. Pick a dot-free short name; add split-horizon DNS `<name>.internal → <host>`
(no public record).
2. Mint the first cert with `--san <name>.internal` (§4), from `infra-setup.sh`.
3. `systemctl enable --now step@<name>.timer` for renewal.
4. Point the nginx vhost at the cert/key paths (§5); run `nginx -t`, then reload nginx.