Add two new guidance documents alongside generic.md:

- deployment-gitea-actions.md: CI-driven deployment via a Gitea Actions workflow as an alternative to deploy.sh + manifest.yml (§7), with the workflow as the source of infra truth and a scoped gitea_ci runner user.
- internal-tls.md: provisioning and renewing per-service internal TLS certs (<service>.internal) for mesh-only nginx vhosts, extending the PKI conventions in §11.

Cross-reference both from generic.md and list them in readme.md. Also add a "never suppress errors" rule to the deploy-script conventions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
150 lines
6.0 KiB
Markdown
# Internal TLS: per-service certs for mesh services

Extends `generic.md` §11 (TLS / PKI). That section covers the **host identity cert**
every host carries (`/etc/pki/tls/{misc,private}/$(hostname -f).pem`, kept fresh by
`step.service`). This doc covers the other common case: a **per-service vanity cert**
for an internal service reached by its own name on the WireGuard mesh, typically an
nginx vhost like `gongfoo.internal` or `vlc-admin.internal`.

Use this whenever a service is fronted by a `*.internal` name that differs from the
host's FQDN. Serving the host cert for a `vlc-admin.internal` request fails client
verification (the host cert's SAN is the host's FQDN, not the service name), so the
service needs its own cert.

All of this rides on the existing internal PKI: Smallstep `step-ca` at
`https://ca.internal`, with the internal root already trusted fleet-wide at
`/etc/pki/ca-trust/source/anchors/root-internal.pem`.

---

## 1. Naming and DNS

- The service name is `<service>.internal`, resolved by split-horizon DNS on the mesh.
  **Never give it a public / Cloudflare record**: these names are mesh-only.
- The renewal unit is a systemd template instance, and `%i` can't contain dots cleanly,
  so the **instance label is the dot-free short name** and the unit appends `.internal`:
  instance `vlc-admin` → serves/renews `vlc-admin.internal`. Choose service short names
  without dots.
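
The naming rule is easy to check mechanically before anything is minted. A minimal sketch in POSIX sh; the helper names `service_fqdn` and `renewal_unit` are hypothetical and not part of the existing tooling:

```sh
# Hypothetical helpers (not part of the existing tooling): derive the mesh
# name and the renewal timer instance from a dot-free short name.
service_fqdn() {
  case "$1" in
    ''|*.*)
      echo "invalid short name: '$1' (must be non-empty and dot-free)" >&2
      return 1
      ;;
  esac
  printf '%s.internal\n' "$1"
}

renewal_unit() {
  # Validate via service_fqdn, then build the template instance name.
  service_fqdn "$1" >/dev/null || return 1
  printf 'step@%s.timer\n' "$1"
}

service_fqdn vlc-admin    # prints vlc-admin.internal
renewal_unit vlc-admin    # prints step@vlc-admin.timer
```

A name such as `vlc.admin` is rejected with a non-zero status, which keeps a dotted label from ever reaching `%i`.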

## 2. Paths

Follow the established convention (shared with nginx):

| Path | Contents | Mode |
| --- | --- | --- |
| `/etc/nginx/tls/cert/<name>.internal.pem` | cert (chain) | `0644 root:root` |
| `/etc/nginx/tls/key/<name>.internal.pem` | private key | `0640 root:root`, plus `setfacl -m u:nginx:r` |

`setfacl -m u:nginx:r` on the key is required when nginx **workers** must read it
(e.g. a `proxy_ssl_certificate_key` for mTLS to an internal backend). For a plain
server cert the master process (root) reads the key at load time and the ACL is
belt-and-suspenders; set it anyway for consistency.
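
The modes in the table can be audited with a few lines of shell. This is only a sketch: `check_mode` is a hypothetical helper, and it assumes GNU `stat` (coreutils), as found on the Linux hosts this doc targets.

```sh
# Hypothetical audit helper; assumes GNU stat (coreutils).
# Usage: check_mode <path> <expected-octal-mode>
check_mode() {
  actual=$(stat -c '%a' "$1") || return 2   # 2: file missing or unreadable
  if [ "$actual" = "$2" ]; then
    echo "ok: $1 is $actual"
  else
    echo "MISMATCH: $1 is $actual, expected $2" >&2
    return 1
  fi
}

# Example, run on the host for a service named vlc-admin:
#   check_mode /etc/nginx/tls/cert/vlc-admin.internal.pem 644
#   check_mode /etc/nginx/tls/key/vlc-admin.internal.pem 640
```

The ACL entry does not show up as a separate mode bit; `getfacl` on the key file is the way to confirm the `user:nginx:r--` entry.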

## 3. Renewal: the `step@` template

Renewal is autonomous via a templated unit pair, instantiated per service
(`step@<name>.timer`). Once a cert exists, `step ca renew` authenticates over mTLS with
the current cert and key (no provisioner needed), and the unit reloads nginx on success:

```ini
# /etc/systemd/system/step@.service
[Service]
Type=oneshot
ExecCondition=/usr/bin/step certificate needs-renewal /etc/nginx/tls/cert/%i.internal.pem
ExecStart=/usr/bin/step ca renew --force \
    --ca-url https://ca.internal \
    --root /etc/pki/ca-trust/source/anchors/root-internal.pem \
    /etc/nginx/tls/cert/%i.internal.pem \
    /etc/nginx/tls/key/%i.internal.pem
ExecStartPost=/usr/bin/systemctl reload nginx.service
```

```ini
# /etc/systemd/system/step@.timer
[Timer]
Persistent=true
# Fires at minutes 1, 16, 31 and 46 of every hour; certs are short-lived (24h).
# Unit files have no trailing comments, so the note sits on its own line.
OnCalendar=*:1/15
RandomizedDelaySec=5m

[Install]
WantedBy=timers.target
```
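
In systemd calendar syntax, `*:1/15` reads as "minute 1 of every hour, then every 15 minutes after it". The arithmetic, as a small illustration (the helper name `fire_minutes` is hypothetical):

```sh
# Expand a systemd "start/step" minute spec into the minutes it matches
# within one hour. Hypothetical helper, for illustration only.
fire_minutes() {
  start=$1
  step=$2
  out=
  m=$start
  while [ "$m" -le 59 ]; do
    out="${out:+$out }$m"   # append, space-separated
    m=$((m + step))
  done
  echo "$out"
}

fire_minutes 1 15   # prints: 1 16 31 46
```

On a host with systemd, `systemd-analyze calendar '*:1/15'` prints the normalized expression and the next elapse time, which is a quick way to confirm a spec before shipping it. Each firing is additionally delayed by up to `RandomizedDelaySec`.
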

Enable per service: `systemctl enable --now step@<name>.timer`. While no cert exists,
`needs-renewal` exits non-zero, so the `ExecCondition` skips the run without marking the
unit failed. Enabling the timer before the first mint is therefore harmless.
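
How the `ExecCondition` exit status decides what happens can be summarized in a few lines. The mapping below follows `systemd.service(5)`; the exit statuses attributed to `step certificate needs-renewal` are taken from its documentation and should be treated as an assumption to verify against the installed version. The helper name is hypothetical.

```sh
# Map an ExecCondition= exit status to what systemd does with the unit.
# Hypothetical helper; the mapping follows systemd.service(5).
exec_condition_outcome() {
  case "$1" in
    0)   echo run  ;;   # condition met: ExecStart= runs
    255) echo fail ;;   # unit is marked failed
    *)   echo skip ;;   # 1-254: remaining commands skipped, unit not failed
  esac
}

# Exit statuses of `step certificate needs-renewal` (assumed from its docs):
exec_condition_outcome 0     # cert needs renewal  -> run
exec_condition_outcome 1     # cert is still fresh -> skip
exec_condition_outcome 2     # cert file missing   -> skip
exec_condition_outcome 255   # any other error     -> fail
```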

## 4. Initial minting

The timer only **renews** an existing cert; the **first** cert is minted explicitly via
the JWK provisioner (`lair`). Mint it from the provisioning script (`infra-setup.sh`,
see `deployment-gitea-actions.md` §2) by shipping the provisioner password to the host
just long enough to issue the cert:

```sh
# Assumes $host is already set to the target host.
name=vlc-admin   # example value; substitute the service's dot-free short name
cert=/etc/nginx/tls/cert/${name}.internal.pem
key=/etc/nginx/tls/key/${name}.internal.pem

# Skip if already valid (verify checks chain/expiry, not the name).
state=$(ssh "$host" "[ -f $cert ] && step certificate verify $cert \
  --roots /etc/pki/ca-trust/source/anchors/root-internal.pem >/dev/null 2>&1 \
  && echo valid || echo missing")

if [ "$state" != valid ]; then
  # provisioner password lives at ~/.step/secrets/provisioner on the operator box
  rsync -az --rsync-path='sudo rsync' --chmod=0600 \
    ~/.step/secrets/provisioner "$host:/tmp/${name}-provisioner"
  ssh "$host" "
    sudo mkdir -p /etc/nginx/tls/cert /etc/nginx/tls/key
    rc=0
    sudo step ca certificate --force \
      --provisioner lair \
      --provisioner-password-file /tmp/${name}-provisioner \
      --ca-url https://ca.internal \
      --root /etc/pki/ca-trust/source/anchors/root-internal.pem \
      --san ${name}.internal \
      ${name}.internal $cert $key || rc=\$?
    sudo rm -f /tmp/${name}-provisioner   # always remove the credential
    [ \$rc -eq 0 ] || { echo 'mint failed' >&2; exit \$rc; }
    sudo chown root:root $cert $key
    sudo chmod 644 $cert; sudo chmod 640 $key
    sudo setfacl -m u:nginx:r $key"
fi

# Enable renewal on the host, not on the operator box.
ssh "$host" "sudo systemctl enable --now step@${name}.timer"
```

Rules that matter:

- **Always pass `--san <name>.internal`.** Modern TLS clients ignore CN and require a
  matching SAN; a CN-only cert fails with *"no alternative certificate subject name
  matches target hostname"*.
- **Remove the provisioner password even on failure** (capture the exit code, `rm`,
  then propagate). Never leave the credential on the host.
- The password file convention is `~/.step/secrets/provisioner` on the operator
  workstation, the same one `deploy.sh`-style scripts use.
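
The "capture the exit code, `rm`, then propagate" rule is a small pattern worth seeing in isolation. A minimal local sketch, with a hypothetical helper name `with_secret`; the minting script does the same thing inline over ssh:

```sh
# Run a command that needs a short-lived secret file, then remove the file
# whether or not the command succeeded, and propagate the command's status.
# Hypothetical helper; the minting script does this inline over ssh.
with_secret() {
  secret=$1
  shift
  rc=0
  "$@" || rc=$?          # capture the status instead of aborting
  rm -f "$secret"        # always remove the credential
  return "$rc"           # propagate the original status
}

# Example: the command fails, the secret is still removed, the status survives.
tmp=$(mktemp)
with_secret "$tmp" sh -c 'exit 3' || echo "failed with status $?"
[ -e "$tmp" ] || echo "secret removed"
```

Capturing the status with `|| rc=$?` matters under `set -e`, where a bare failing command would end the script before the `rm` runs.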

## 5. nginx wiring

```nginx
server {
    listen 443 ssl;
    server_name <name>.internal;

    ssl_certificate     /etc/nginx/tls/cert/<name>.internal.pem;
    ssl_certificate_key /etc/nginx/tls/key/<name>.internal.pem;
    # ... + proxy_ssl_certificate{,_key} with the same paths if the upstream wants mTLS
}
```

Clients verify against the internal root
(`/etc/pki/ca-trust/source/anchors/root-internal.pem`). It is already in the fleet
trust store, so browsers and `curl` on any mesh host trust the cert without extra flags
such as `--cacert`.
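
Because the vhost differs only in the service name, the skeleton can be generated from the short name. A sketch under that assumption; `render_vhost` is a hypothetical helper, and a real vhost will carry more directives (`location` blocks, proxy settings):

```sh
# Print a minimal vhost for a dot-free short name, using the path convention
# from §2. Hypothetical helper; real vhosts will carry more directives.
render_vhost() {
  name=$1
  cat <<EOF
server {
    listen 443 ssl;
    server_name ${name}.internal;

    ssl_certificate     /etc/nginx/tls/cert/${name}.internal.pem;
    ssl_certificate_key /etc/nginx/tls/key/${name}.internal.pem;
}
EOF
}

render_vhost vlc-admin
```

Writing the output to a file under the nginx config directory and running `nginx -t` before a reload keeps a typo from taking the server down.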

## 6. Checklist for a new mesh service cert

1. Pick a dot-free short name; add split-horizon DNS `<name>.internal → <host>`
   (no public record).
2. Mint the first cert with `--san <name>.internal` (§4), from `infra-setup.sh`.
3. `systemctl enable --now step@<name>.timer` for renewal.
4. Point the nginx vhost at the cert/key paths (§5); `nginx -t && systemctl reload nginx`.