docs: add CI deployment and internal-TLS guidance, cross-reference from generic
Add two new guidance documents alongside generic.md:

- deployment-gitea-actions.md: CI-driven deployment via a Gitea Actions
  workflow as an alternative to deploy.sh + manifest.yml (§7), with the
  workflow as the source of infra truth and a scoped gitea_ci runner user.
- internal-tls.md: provisioning and renewing per-service internal TLS certs
  (<service>.internal) for mesh-only nginx vhosts, extending the PKI
  conventions in §11.

Cross-reference both from generic.md and list them in readme.md. Also add a
"never suppress errors" rule to the deploy-script conventions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
199
deployment-gitea-actions.md
Normal file
@@ -0,0 +1,199 @@
# Deployment via Gitea Actions

An alternative to the local `deploy.sh` + `manifest.yml` flow in `generic.md` §6–§7.
Use this when deployment should be **CI-driven** — triggered by a push or a manual
dispatch, run on a Gitea Actions runner, and auditable in the Actions log — rather
than run by an operator from a workstation.

Both models coexist; pick per project:

- **`deploy.sh` (generic.md §7)** — operator-driven, runs from a workstation with the
  operator's own ssh + `pass` access. Good for tightly-held apps, one-off targets, or
  when no runner can reach the target hosts.
- **Gitea Actions (this doc)** — runner-driven, secrets in Gitea, no operator in the
  loop. Good for anything that should redeploy on merge to `main` and for fleets a
  runner can already reach over the WireGuard mesh.

The defining principle: **the workflow is the source of infra truth.** Hosts, ports,
paths, and component→host mapping live in the workflow YAML, not a `manifest.yml`.
There is no separate manifest in this model.

---

## 1. The deploy user: `gitea_ci`

The runner SSHes into each target as a dedicated **`gitea_ci`** system user — never
root, never the operator's account. On each target host `gitea_ci` has:

- a home dir (`/var/lib/gitea_ci`) and `~/.ssh/authorized_keys` containing the
  runner's public key,
- membership in `systemd-journal` (so the workflow can capture
  `journalctl -u <unit>` after a service start without a sudoers entry),
- a **scoped** `/etc/sudoers.d/<app>_gitea_ci` drop-in granting `NOPASSWD` for
  exactly the commands the deploy runs — nothing broader.

Name the sudoers file `<app>_gitea_ci`, not bare `gitea_ci`, so multiple apps can
drop their own files on a shared host without clobbering each other.

### Scoped sudoers

Whitelist exact commands. For file pushes the workflow uses
`rsync --rsync-path='sudo rsync'`, so the remote rsync runs as root; pin each line to
one destination with a trailing literal path:

```
gitea_ci ALL=(root) NOPASSWD: /usr/bin/rsync * /usr/local/bin/<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/rsync * /etc/<app>/config.toml
gitea_ci ALL=(root) NOPASSWD: /usr/bin/systemctl restart <app>.service
gitea_ci ALL=(root) NOPASSWD: /usr/bin/systemctl daemon-reload
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/restorecon -R /usr/local/bin/<app> /etc/<app> /var/lib/<app>
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/semanage port -l
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/semanage port -a -t http_port_t -p tcp 8081
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --add-service\=<app> --permanent
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --add-service\=<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --query-service\=<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --reload
```

- The `*` in an rsync line matches rsync's `--server …` argument vector across spaces;
  the trailing literal destination is what actually bounds the rule.
- sudoers treats `:` and `=` (along with `,` and `\`) as reserved; escape them
  (`\:`, `\=`) inside command arguments or `visudo` rejects the file — common when
  whitelisting `dnf config-manager addrepo --from-repofile=https://…`. The
  `firewall-cmd` lines above escape their `=` for that reason.
- Verify every drop-in with `visudo -cf` at install time so a typo can't lock the host.

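The install-time check in the last bullet can be sketched as follows. This is a minimal sketch, assuming a bash provisioning script running as root: `install_dropin`, the `VISUDO` override, and the optional destination-directory argument are hypothetical names introduced here for illustration; only `visudo -cf` and the `<app>_gitea_ci` file name come from the text above.

```bash
set -euo pipefail

# install_dropin SRC APP [DEST_DIR]
# Validate a sudoers drop-in with `visudo -cf` and install it only if it parses.
# VISUDO and DEST_DIR are illustrative overrides, not part of any real tool.
install_dropin() {
  local src=$1 app=$2 dest_dir=${3:-/etc/sudoers.d}
  local visudo=${VISUDO:-visudo}
  local dest="${dest_dir}/${app}_gitea_ci"

  if ! "$visudo" -cf "$src"; then
    echo "refusing to install ${src}: visudo check failed" >&2
    return 1
  fi
  # sudoers drop-ins are conventionally mode 0440.
  install -m 0440 "$src" "$dest"
  echo "installed ${dest}"
}
```
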
---

## 2. One-time host provisioning: `script/infra-setup.sh`

Everything `gitea_ci` needs is provisioned **once per host** by
`script/infra-setup.sh`, run by the operator from a workstation (full sudo, not the
scoped account). It is idempotent and skips past unreachable hosts so one offline node
doesn't block the rest. It:

1. creates the `gitea_ci` user (if missing) and its `~/.ssh`,
2. installs the runner's pubkey into `authorized_keys`
   (`rsync --chown gitea_ci:gitea_ci --chmod 0600 --rsync-path 'sudo rsync'`),
3. adds `gitea_ci` to `systemd-journal`,
4. installs the host-appropriate `/etc/sudoers.d/<app>_gitea_ci` drop-in and
   `visudo`-verifies it,
5. provisions any per-service internal TLS cert the app's nginx vhost needs
   (see `internal-tls.md`).

The runner's keypair is generated once (`ssh-keygen -t ed25519 -f ~/.ssh/id_gitea_ci`);
the **private** key becomes the `RSYNC_SSH_KEY` Gitea secret, the public key is what
`infra-setup.sh` distributes.

Application config is **not** shipped by `infra-setup.sh` in this model — the workflow
renders it from Gitea secrets on every deploy (see §4). (`deploy.sh`-style apps that
keep config in `pass` follow the `generic.md` §7 model instead.)

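The "skip unreachable hosts" behaviour could look roughly like the loop below. This is a sketch under stated assumptions, not the actual `infra-setup.sh`: `probe_host`, `provision_host`, and `provision_all` are hypothetical function names, and the body of `provision_host` stands in for steps 1 to 5 above.

```bash
set -euo pipefail

# Cheap reachability/auth probe; BatchMode stops ssh from prompting.
probe_host() { ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true; }

# Placeholder for the per-host steps (user, key, group, sudoers, cert).
provision_host() {
  local host=$1
  echo "provisioning ${host}"
}

# provision_all HOST...
# Provision every reachable host; report (never hide) the ones skipped.
provision_all() {
  local host skipped=0
  for host in "$@"; do
    if ! probe_host "$host"; then
      echo "skip ${host}: unreachable" >&2
      skipped=$((skipped + 1))
      continue
    fi
    provision_host "$host"
  done
  echo "done; ${skipped} host(s) skipped"
}
```
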
---

## 3. Artifact delivery: two variants

**(a) RPM channel.** A separate `build-prerelease` workflow builds, packages, signs, and
publishes RPMs to an internal repo (e.g. `rpm.lair.cafe/unstable`); the deploy workflow
just `dnf install/upgrade`s them. Best when the app already ships as an RPM and the
build is heavy (CUDA, vendored deps). The deploy workflow triggers on
`workflow_run: [build-prerelease] completed`.

**(b) Build-and-rsync.** The deploy workflow's own `build` job produces the artifact
(e.g. a static binary + a static frontend bundle) and `rsync`s it straight to the
targets. Best when there's no packaging step. Build for a **statically-linked target**
(e.g. musl) so a runner newer than the target host doesn't produce a binary the target's
older glibc can't load — see §6.

---

## 4. The workflow

```yaml
on:
  push: { branches: [main] }   # redeploy on merge
  workflow_dispatch:           # manual re-run from the UI

concurrency:                   # serialize deploys; never half-apply two at once
  group: deploy
  cancel-in-progress: false

env:
  # --- infra truth: hosts, ports, paths live here, not in a manifest ---
  API_HOST: <host>.<site>.internal
  API_PORT: "8081"
  DEPLOY_KEY: |
    ${{ secrets.RSYNC_SSH_KEY }}
```

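A deploy job has to turn `DEPLOY_KEY` from the `env:` block into a key file before its first `ssh`. A minimal sketch of that step, assuming a bash `run:` step on a Linux runner; `write_deploy_key` and the key path are hypothetical names chosen for illustration.

```bash
set -euo pipefail

# write_deploy_key KEY_MATERIAL DEST
# Write the private key with owner-only permissions; ssh refuses keys that
# are readable by group or others.
write_deploy_key() {
  local key_material=$1 dest=$2
  (
    umask 077                        # new dir and file start owner-only
    mkdir -p "$(dirname "$dest")"
    # OpenSSH expects the key file to end with a newline.
    printf '%s\n' "$key_material" > "$dest"
  )
  chmod 600 "$dest"
}
```

A job would then call something like `write_deploy_key "$DEPLOY_KEY" ~/.ssh/id_deploy` followed by `ssh -i ~/.ssh/id_deploy -o StrictHostKeyChecking=accept-new "gitea_ci@$API_HOST" hostname -f` as its reachability check.
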
Jobs follow a **build → deploy-per-component** shape:

- **build** — runs the lint/test gate (`fmt`, `clippy -D warnings`, `test`) so a broken
  commit never deploys, then produces and uploads the artifact(s).
- **deploy-\<component\>** (`needs: build`) — one job per component/host. Each:
  1. writes the SSH key from `DEPLOY_KEY`, then `ssh gitea_ci@$HOST hostname -f` as a
     reachability/auth check (`StrictHostKeyChecking=accept-new`);
  2. renders config from secrets (use a literal substitution — `python3` `.replace()` or
     `envsubst` — so secrets with shell-special characters survive);
  3. `rsync`s the artifact, config, systemd unit, and any firewalld/SELinux assets
     (`--rsync-path='sudo rsync'`, `--chown`/`--chmod` to set ownership in transit,
     `--mkpath` — see §6);
  4. applies system state over `ssh` with the scoped `sudo` commands (sysusers,
     `restorecon`, `semanage`, firewalld, `daemon-reload`, `restart`);
  5. health-probes (HTTP for an API, `systemctl is-active` otherwise);
  6. captures the unit's startup journal with `if: always()` so a failed start still
     leaves a usable record.

Secrets to expect in the repo settings: `RSYNC_SSH_KEY` plus the app's config secrets
(API keys, tokens, etc.). Non-secret values stay inline in `env:`.

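Step 2 of the deploy job (rendering config with a literal substitution) can be illustrated in plain shell. This sketch swaps in `awk` with `index()` for the `python3`/`envsubst` options the step names, and it assumes templates use the `{{VAR_NAME}}` placeholder syntax from the `deploy.sh` model; `render_placeholder` and the `PLACEHOLDER_*` variable names are hypothetical.

```bash
set -euo pipefail

# render_placeholder NAME VALUE < template > rendered
# Replace every literal "{{NAME}}" with VALUE. The value reaches awk through
# the environment, so backslashes, '&' and '$' in a secret are never
# interpreted by the shell or by a regex engine.
render_placeholder() {
  PLACEHOLDER_NAME=$1 PLACEHOLDER_VALUE=$2 awk '
    BEGIN {
      needle = "{{" ENVIRON["PLACEHOLDER_NAME"] "}}"
      value  = ENVIRON["PLACEHOLDER_VALUE"]
    }
    {
      line = $0
      out = ""
      # index() is a literal substring search, not a pattern match.
      while ((i = index(line, needle)) > 0) {
        out  = out substr(line, 1, i - 1) value
        line = substr(line, i + length(needle))
      }
      print out line
    }'
}
```
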
---

## 5. Runner images

Deploy jobs run on a generic runner (e.g. `runs-on: fedora-43`); build jobs run on a
toolchain runner. **Bake build dependencies into the runner image, not into the
workflow** — a workflow should never `dnf install` build dependencies on the runner at
run time (runners may run unprivileged, and per-run installs are slow and flaky). If a
build needs a tool the image lacks, add it to the image (the `gongfoo/images/*`
Containerfiles) and rebuild.

For static cross-compilation specifically, the runner's toolchain must include the
cross target's std. Where a distro's packaged compiler can't load a foreign std (e.g.
Fedora's distro `rustc` rejects any std it didn't build, and Fedora ships no musl std),
provision the toolchain via its own version manager (`rustup` + `rustup target add …`)
so compiler and std always match.

---

## 6. Gotchas worth pre-empting

These cost a deploy round-trip each; encode them up front.

- **`rsync` won't create a missing destination directory** for a single-file copy. On
  Fedora, `/etc/sysusers.d` and `/etc/firewalld/services` don't exist by default (only
  the `/usr/lib` variants ship). Add `--mkpath` (rsync 3.2.3 or newer) to file pushes
  so the root-side rsync creates the parent.
- **firewalld only learns a freshly-shipped custom service after `--reload`.** Querying
  or adding it to the runtime before reloading fails with `INVALID_SERVICE`. Order:
  rsync the XML → `firewall-cmd --reload` → `--query-service` → (if absent)
  `--add-service --permanent` → `--reload`.
- **nginx `sites-enabled` holds only symlinks.** rsync vhost configs to
  `sites-available/` and `ln -sfn` them into `sites-enabled/`. (The include is often
  one level removed: `nginx.conf` → `conf.d/*.conf` → a `sites-enabled.conf` that does
  `include …/sites-enabled/*.conf;`.) Run `nginx -t` before reload; a bad config left on
  disk also breaks unrelated `systemctl reload nginx` calls (e.g. cert-renewal hooks).
- **glibc skew:** a binary built on a newer host won't run on an older target. Build
  static (musl) or build on a runner no newer than the oldest target.
- **`semanage port -a` on an already-labelled port** prints "already defined, modifying
  instead" and reassigns; guard with `semanage port -l | grep` so re-runs are no-ops.
- **`Type=notify` units** must actually send `READY=1` (`sd_notify`) or `systemctl
  restart` blocks until `TimeoutStartSec`. Have the daemon send it once it has finished
  binding.

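The firewalld ordering from the second bullet can be written down as one idempotent function. A minimal sketch, assuming bash on the target and the scoped sudoers entries from §1; `fw` and `ensure_firewalld_service` are hypothetical names, and the argument order matches the whitelisted command lines exactly because sudoers compares them literally.

```bash
set -euo pipefail

# Thin wrapper so every call goes through the scoped sudo rule.
fw() { sudo /usr/bin/firewall-cmd "$@"; }

# ensure_firewalld_service SERVICE
# Run after the service XML has been rsynced into /etc/firewalld/services.
ensure_firewalld_service() {
  local svc=$1
  fw --reload                              # make firewalld read the new XML
  if fw --query-service="$svc"; then
    echo "${svc}: already enabled"
    return 0
  fi
  fw --add-service="$svc" --permanent      # same order as the sudoers line
  fw --reload                              # apply the permanent change
  echo "${svc}: enabled"
}
```
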
---

## 7. Relationship to generic.md

This model reuses §8 (service accounts, hardened units), §9 (named firewalld services),
§10 (SELinux), and §11 (PKI/cert paths) unchanged — it only swaps *how* the assets get
onto the host (CI + `gitea_ci` + scoped sudo, instead of `deploy.sh` + operator sudo).
The `asset/` layout from §6 is the same, minus `manifest.yml`. Per-service TLS for
mesh-only nginx vhosts is covered in `internal-tls.md`.

15
generic.md
@@ -279,6 +279,14 @@ Config file templates use a simple `{{VAR_NAME}}` syntax. `deploy.sh` substitute

## 7. Deployment Script (`script/deploy.sh`)

> **Alternative: CI-driven deployment.** When a project should redeploy from a Gitea
> Actions workflow instead of an operator running `deploy.sh`, see
> **`deployment-gitea-actions.md`**. In that model the workflow itself is the source of
> infra truth (no `manifest.yml`), the runner SSHes in as a dedicated `gitea_ci` user
> with a scoped sudoers drop-in, and one-time host prep lives in
> `script/infra-setup.sh`. The `asset/` layout (§6) and the on-host conventions
> (§8–§11) are otherwise identical. The two models coexist; pick per project.

A bash script with a stable CLI:

```
@@ -311,6 +319,7 @@ A bash script with a stable CLI:
- Quiet on success, loud on failure.
- Supports `--dry-run` to print what would happen.
- Never writes secrets to disk on the build host outside of the rendered template being rsynced.
- **Never suppress errors.** Do not use `2>/dev/null`, `|| true`, or any pattern that hides error output or swallows exit codes. If a command might fail legitimately (e.g. stopping a service that isn't installed yet on first deploy), handle the failure explicitly with a visible message (e.g. `cmd || info "service was not running"`). If a command shouldn't fail, let it fail loudly — `set -euo pipefail` will catch it. This rule applies to all shell scripts in the project, not just `deploy.sh`.

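The "handle the failure explicitly" case from the last rule might look like this. A minimal sketch: `info` is assumed to be a logging helper like the one the rule's own example calls, and `stop_unit` is a hypothetical wrapper introduced for illustration.

```bash
set -euo pipefail

info() { printf 'info: %s\n' "$*" >&2; }

# stop_unit UNIT
# Stop a unit; a failure is reported with its exit code rather than hidden,
# and systemctl's own error output is left untouched.
stop_unit() {
  local unit=$1 rc=0
  systemctl stop "$unit" || rc=$?
  if [ "$rc" -ne 0 ]; then
    info "systemctl stop ${unit} exited ${rc}; continuing (unit may not be installed yet)"
  fi
}
```
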
---

@@ -520,6 +529,12 @@ The service unit itself needs an `ExecReload=` that causes the daemon to re-read

Ship these `.path` and cert-reload `.service` units from `asset/systemd/` the same way as the main unit.

**Per-service internal certs.** The paths above hold the host *identity* cert. A service
fronted by its own mesh name (e.g. an nginx vhost at `<service>.internal`, distinct from
the host FQDN) needs its own cert — minted via the `lair` provisioner and renewed by a
templated `step@<name>` unit. That pattern is documented separately in
**`internal-tls.md`**.

### Ingress
- Per-site nginx reverse proxy terminates all WAN inbound 443.
- Public DNS via Cloudflare, **unproxied by default** (CF's mTLS origin-pull has been unreliable). Revisit if/when that changes.

149
internal-tls.md
Normal file
@@ -0,0 +1,149 @@
# Internal TLS: per-service certs for mesh services

Extends `generic.md` §11 (TLS / PKI). That section covers the **host identity cert**
every host carries (`/etc/pki/tls/{misc,private}/$(hostname -f).pem`, kept fresh by
`step.service`). This doc covers the other common case: a **per-service vanity cert**
for an internal service reached by its own name on the WireGuard mesh — typically an
nginx vhost like `gongfoo.internal` or `vlc-admin.internal`.

Use this whenever a service is fronted by a `*.internal` name that differs from the
host's FQDN. Serving the host cert for a `vlc-admin.internal` request fails client
verification (the host cert's SAN is the host's FQDN, not the service name), so the
service needs its own cert.

All of this rides on the existing internal PKI: Smallstep `step-ca` at
`https://ca.internal`, internal root already trusted fleet-wide at
`/etc/pki/ca-trust/source/anchors/root-internal.pem`.

---

## 1. Naming and DNS

- The service name is `<service>.internal`, resolved by split-horizon DNS on the mesh.
  **Never give it a public / Cloudflare record** — these names are mesh-only.
- The renewal unit is a systemd template instance, and `%i` can't contain dots cleanly,
  so the **instance label is the dot-free short name** and the unit appends `.internal`:
  instance `vlc-admin` → serves/renews `vlc-admin.internal`. Choose service short names
  without dots.

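The label-to-name rule is small enough to enforce in a provisioning script. A minimal sketch, assuming bash; `service_fqdn` is a hypothetical helper, not part of `step` or systemd.

```bash
set -euo pipefail

# service_fqdn LABEL
# Map a dot-free instance label to the mesh name the step@ unit serves.
service_fqdn() {
  local label=$1
  case $label in
    '' | *.*)
      echo "invalid instance label '${label}': must be non-empty and dot-free" >&2
      return 1
      ;;
  esac
  printf '%s.internal\n' "$label"
}
```

With this helper, `service_fqdn vlc-admin` prints `vlc-admin.internal`, while a label such as `vlc.admin` is rejected before any unit is enabled.
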
## 2. Paths

Follow the established convention (shared with nginx):

| Path | Contents | Mode, owner, ACL |
| --- | --- | --- |
| `/etc/nginx/tls/cert/<name>.internal.pem` | cert (chain) | `0644 root:root` |
| `/etc/nginx/tls/key/<name>.internal.pem` | private key | `0640 root:root`, `setfacl -m u:nginx:r` |

`setfacl -m u:nginx:r` on the key is required when nginx **workers** must read it
(e.g. a `proxy_ssl_certificate_key` for mTLS to an internal backend). For a plain
server cert the master (root) reads it at load time and the ACL is belt-and-suspenders —
set it anyway for consistency.

## 3. Renewal: the `step@` template

Renewal is autonomous via a templated unit pair, instantiated per service
(`step@<name>.timer`). The cert renews itself over mTLS (no provisioner needed once a
cert exists), and reloads nginx on success:

```ini
# /etc/systemd/system/step@.service
[Service]
Type=oneshot
ExecCondition=/usr/bin/step certificate needs-renewal /etc/nginx/tls/cert/%i.internal.pem
ExecStart=/usr/bin/step ca renew --force \
    --ca-url https://ca.internal \
    --root /etc/pki/ca-trust/source/anchors/root-internal.pem \
    /etc/nginx/tls/cert/%i.internal.pem \
    /etc/nginx/tls/key/%i.internal.pem
ExecStartPost=/usr/bin/systemctl reload nginx.service
```

```ini
# /etc/systemd/system/step@.timer
[Timer]
Persistent=true
# every 15 min; certs are short-lived (24h)
OnCalendar=*:1/15
RandomizedDelaySec=5m

[Install]
WantedBy=timers.target
```

Enable per service: `systemctl enable --now step@<name>.timer`. The `ExecCondition`
makes it a clean no-op until a cert exists, so enabling it before the first mint is
harmless.

## 4. Initial minting

The timer only **renews** an existing cert; the **first** cert is minted explicitly via
the JWK provisioner (`lair`). Mint it from the provisioning script (`infra-setup.sh`,
see `deployment-gitea-actions.md` §2) by shipping the provisioner password to the host
just long enough to issue the cert:

```sh
name=<service>
cert=/etc/nginx/tls/cert/${name}.internal.pem
key=/etc/nginx/tls/key/${name}.internal.pem

# Skip if already valid (verify checks chain/expiry, not the name).
state=$(ssh "$host" "[ -f $cert ] && step certificate verify $cert \
    --roots /etc/pki/ca-trust/source/anchors/root-internal.pem >/dev/null 2>&1 \
    && echo valid || echo missing")

if [ "$state" != valid ]; then
  # provisioner password lives at ~/.step/secrets/provisioner on the operator box
  rsync -az --rsync-path='sudo rsync' --chmod=0600 \
    ~/.step/secrets/provisioner "$host:/tmp/${name}-provisioner"
  ssh "$host" "
    sudo mkdir -p /etc/nginx/tls/cert /etc/nginx/tls/key
    rc=0
    sudo step ca certificate --force \
      --provisioner lair \
      --provisioner-password-file /tmp/${name}-provisioner \
      --ca-url https://ca.internal \
      --root /etc/pki/ca-trust/source/anchors/root-internal.pem \
      --san ${name}.internal \
      ${name}.internal $cert $key || rc=\$?
    sudo rm -f /tmp/${name}-provisioner   # always remove the credential
    [ \$rc -eq 0 ] || { echo 'mint failed' >&2; exit \$rc; }
    sudo chown root:root $cert $key
    sudo chmod 644 $cert; sudo chmod 640 $key
    sudo setfacl -m u:nginx:r $key"
fi
# enable the renewal timer on the host, not on the operator box
ssh "$host" "sudo systemctl enable --now step@${name}.timer"
```

Rules that matter:

- **Always pass `--san <name>.internal`.** Modern TLS clients ignore CN and require a
  matching SAN; a CN-only cert fails with *"no alternative certificate subject name
  matches target hostname"*.
- **Remove the provisioner password even on failure** (capture the exit code, `rm`,
  then propagate). Never leave the credential on the host.
- The password file convention is `~/.step/secrets/provisioner` on the operator
  workstation — the same one `deploy.sh`-style scripts use.

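The capture, remove, propagate rule generalizes to a small wrapper. A minimal sketch, assuming bash; `with_credential_cleanup` is a hypothetical name, and the wrapped command stands in for the `step ca certificate` call.

```bash
set -euo pipefail

# with_credential_cleanup SECRET_FILE COMMAND [ARGS...]
# Run COMMAND, always delete SECRET_FILE afterwards, then propagate
# COMMAND's exit code so the caller still sees the failure.
with_credential_cleanup() {
  local secret=$1
  shift
  local rc=0
  "$@" || rc=$?
  rm -f -- "$secret"
  if [ "$rc" -ne 0 ]; then
    echo "command failed (exit ${rc}); credential ${secret} removed" >&2
  fi
  return "$rc"
}
```

The point of capturing `rc` first is that under `set -e` a bare failing command would abort the script before the `rm` ever ran.
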
## 5. nginx wiring

```nginx
server {
    listen 443 ssl;
    server_name <name>.internal;

    ssl_certificate     /etc/nginx/tls/cert/<name>.internal.pem;
    ssl_certificate_key /etc/nginx/tls/key/<name>.internal.pem;
    # ... + proxy_ssl_certificate{,_key} with the same paths if the upstream wants mTLS
}
```

Clients verify against the internal root
(`/etc/pki/ca-trust/source/anchors/root-internal.pem`). Because that root is already in
the fleet trust store, browsers and `curl` on any mesh host trust the cert without extra
flags; passing `--cacert` with that path is only needed on a client where the anchor
isn't installed.

## 6. Checklist for a new mesh service cert

1. Pick a dot-free short name; add split-horizon DNS `<name>.internal → <host>`
   (no public record).
2. Mint the first cert with `--san <name>.internal` (§4), from `infra-setup.sh`.
3. `systemctl enable --now step@<name>.timer` for renewal.
4. Point the nginx vhost at the cert/key paths (§5); `nginx -t` && reload.

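Step 4's "test, then reload" can be wrapped so a bad vhost never reaches the reload. A minimal sketch, assuming bash on the host with sudo available; `nginx_test`, `nginx_reload`, and `reload_nginx_checked` are hypothetical names.

```bash
set -euo pipefail

nginx_test()   { sudo nginx -t; }
nginx_reload() { sudo systemctl reload nginx.service; }

# Reload nginx only if the on-disk configuration parses.
reload_nginx_checked() {
  if ! nginx_test; then
    echo "nginx -t failed; not reloading" >&2
    return 1
  fi
  nginx_reload
}
```
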
@@ -11,6 +11,8 @@ The goal is boring consistency: the same crate layout, the same deploy flow, the
## What's here

- **`generic.md`** — the baseline. Applies to every project unless that project explicitly overrides a section. Covers workspace layout, separation of concerns, configuration, secrets, deployment, service accounts, firewalld, SELinux, and code quality.
- **`deployment-gitea-actions.md`** — CI-driven deployment via a Gitea Actions workflow, as an alternative to the `deploy.sh` + `manifest.yml` flow in `generic.md` §7. The workflow is the source of infra truth; the runner deploys as a scoped `gitea_ci` user.
- **`internal-tls.md`** — provisioning and renewing per-service internal TLS certs (`<service>.internal`) for mesh-only nginx vhosts, extending the PKI conventions in `generic.md` §11.

More files will appear here over time as guidance that's more specific than `generic.md` gets extracted — per-stack, per-deployment-target, or per-problem-domain documents. When a project needs guidance that isn't generic, it belongs in a new file here, not buried in one project's repo.
