architecture/deployment-gitea-actions.md
rob thijssen 200c41b4f1 docs: add CI deployment and internal-TLS guidance, cross-reference from generic
Add two new guidance documents alongside generic.md:

- deployment-gitea-actions.md: CI-driven deployment via a Gitea Actions
  workflow as an alternative to deploy.sh + manifest.yml (§7), with the
  workflow as the source of infra truth and a scoped gitea_ci runner user.
- internal-tls.md: provisioning and renewing per-service internal TLS
  certs (<service>.internal) for mesh-only nginx vhosts, extending the
  PKI conventions in §11.

Cross-reference both from generic.md and list them in readme.md. Also
add a "never suppress errors" rule to the deploy-script conventions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:43:18 +03:00

# Deployment via Gitea Actions
An alternative to the local `deploy.sh` + `manifest.yml` flow in `generic.md` §6–§7.
Use this when deployment should be **CI-driven** — triggered by a push or a manual
dispatch, run on a Gitea Actions runner, and auditable in the Actions log — rather
than run by an operator from a workstation.
Both models coexist; pick per project:
- **`deploy.sh` (generic.md §7)** — operator-driven, runs from a workstation with the
operator's own ssh + `pass` access. Good for tightly-held apps, one-off targets, or
when no runner can reach the target hosts.
- **Gitea Actions (this doc)** — runner-driven, secrets in Gitea, no operator in the
loop. Good for anything that should redeploy on merge to `main` and for fleets a
runner can already reach over the WireGuard mesh.
The defining principle: **the workflow is the source of infra truth.** Hosts, ports,
paths, and component→host mapping live in the workflow YAML, not a `manifest.yml`.
There is no separate manifest in this model.
---
## 1. The deploy user: `gitea_ci`
The runner SSHes into each target as a dedicated **`gitea_ci`** system user — never
root, never the operator's account. On each target host `gitea_ci` has:
- a home dir (`/var/lib/gitea_ci`) and `~/.ssh/authorized_keys` containing the
runner's public key,
- membership in `systemd-journal` (so the workflow can capture
`journalctl -u <unit>` after a service start without a sudoers entry),
- a **scoped** `/etc/sudoers.d/<app>_gitea_ci` drop-in granting `NOPASSWD` for
exactly the commands the deploy runs — nothing broader.
Name the sudoers file `<app>_gitea_ci`, not bare `gitea_ci`, so multiple apps can
drop their own files on a shared host without clobbering each other.
### Scoped sudoers
Whitelist exact commands. For file pushes the workflow uses
`rsync --rsync-path='sudo rsync'`, so the remote rsync runs as root; pin each line to
one destination with a trailing literal path:
```
gitea_ci ALL=(root) NOPASSWD: /usr/bin/rsync * /usr/local/bin/<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/rsync * /etc/<app>/config.toml
gitea_ci ALL=(root) NOPASSWD: /usr/bin/systemctl restart <app>.service
gitea_ci ALL=(root) NOPASSWD: /usr/bin/systemctl daemon-reload
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/restorecon -R /usr/local/bin/<app> /etc/<app> /var/lib/<app>
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/semanage port -l
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/semanage port -a -t http_port_t -p tcp 8081
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --add-service=<app> --permanent
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --add-service=<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --query-service=<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --reload
```
- The `*` in an rsync line matches rsync's `--server …` argument vector across spaces;
the trailing literal destination is what actually bounds the rule.
- sudoers treats `:` and `=` as reserved; escape them (`\:`, `\=`) inside command
arguments or `visudo` rejects the file — common when whitelisting
`dnf config-manager addrepo --from-repofile=https://…`.
- Verify every drop-in with `visudo -cf` at install time so a typo can't lock the host.
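
The verify-before-install rule above can be sketched as a small helper. This is a minimal sketch, not the project's actual script: `install_sudoers` and the `VISUDO` override are hypothetical names, and it assumes it runs as root on the target host.

```bash
#!/usr/bin/env bash
# Sketch: validate a sudoers drop-in before it reaches /etc/sudoers.d.
# install_sudoers and the VISUDO override are hypothetical names.

install_sudoers() {
    local src="$1" dest="$2"
    # visudo -c only checks syntax; -f points it at a file other than /etc/sudoers.
    if ! "${VISUDO:-visudo}" -cf "$src"; then
        echo "refusing to install invalid sudoers file: $src" >&2
        return 1
    fi
    # Drop-ins are conventionally mode 0440; the copy happens only after the check passes.
    install -m 0440 "$src" "$dest"
}
```

A call would look like `install_sudoers ./<app>_gitea_ci /etc/sudoers.d/<app>_gitea_ci`. Because the check runs against the staged file, a rejected drop-in never lands in the live directory.
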
---
## 2. One-time host provisioning: `script/infra-setup.sh`
Everything `gitea_ci` needs is provisioned **once per host** by a
`script/infra-setup.sh`, run by the operator from a workstation (full sudo, not the
scoped account). It is idempotent and skips past unreachable hosts so one offline node
doesn't block the rest. It:
1. creates the `gitea_ci` user (if missing) and its `~/.ssh`,
2. installs the runner's pubkey into `authorized_keys`
(`rsync --chown gitea_ci:gitea_ci --chmod 0600 --rsync-path 'sudo rsync'`),
3. adds `gitea_ci` to `systemd-journal`,
4. installs the host-appropriate `/etc/sudoers.d/<app>_gitea_ci` drop-in and
`visudo`-verifies it,
5. provisions any per-service internal TLS cert the app's nginx vhost needs
(see `internal-tls.md`).
The runner's keypair is generated once (`ssh-keygen -t ed25519 -f ~/.ssh/id_gitea_ci`);
the **private** key becomes the `RSYNC_SSH_KEY` Gitea secret, the public key is what
`infra-setup.sh` distributes.
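
The "idempotent, skip unreachable hosts" shape described above might look roughly like the following. It is a sketch under assumptions: the function names are hypothetical, only steps 1 and 3 are shown, and the operator's ssh account is assumed to have full sudo on each host.

```bash
#!/usr/bin/env bash
# Sketch of the per-host loop. All function names here are hypothetical.

# Cheap reachability check; BatchMode stops ssh from prompting for input.
probe_host() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Steps 1 and 3 from the list above, written so that re-running changes nothing.
provision_host() {
    ssh "$1" '
        set -eu
        getent passwd gitea_ci >/dev/null ||
            sudo useradd --system --user-group --create-home \
                --home-dir /var/lib/gitea_ci gitea_ci
        sudo install -d -o gitea_ci -g gitea_ci -m 0700 /var/lib/gitea_ci/.ssh
        sudo usermod -aG systemd-journal gitea_ci
    '
}

provision_hosts() {
    local host skipped=()
    for host in "$@"; do
        if ! probe_host "$host"; then
            # An offline node is reported and skipped, not treated as fatal.
            echo "skip: $host is unreachable" >&2
            skipped+=("$host")
            continue
        fi
        # A reachable host that fails to provision is a real error: stop.
        provision_host "$host" || return 1
    done
    if (( ${#skipped[@]} > 0 )); then
        echo "skipped hosts: ${skipped[*]}" >&2
    fi
}
```

Skipped hosts are listed on stderr at the end so the operator knows which ones to re-run later.
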
Application config is **not** shipped by `infra-setup.sh` in this model: the workflow
renders it from Gitea secrets on every deploy (see §4). (`deploy.sh`-style apps that
keep config in `pass` follow the generic.md §7 model instead.)
---
## 3. Artifact delivery: two variants
**(a) RPM channel.** A separate `build-prerelease` workflow builds, packages, signs, and
publishes RPMs to an internal repo (e.g. `rpm.lair.cafe/unstable`); the deploy workflow
just `dnf install/upgrade`s them. Best when the app already ships as an RPM and the
build is heavy (CUDA, vendored deps). The deploy workflow triggers on
`workflow_run: [build-prerelease] completed`.
**(b) Build-and-rsync.** The deploy workflow's own `build` job produces the artifact
(e.g. a static binary + a static frontend bundle) and `rsync`s it straight to the
targets. Best when there's no packaging step. Build for a **statically-linked target**
(e.g. musl) so a runner newer than the target host doesn't produce a binary the target's
older glibc can't load — see §6.
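
For variant (b), a single-file push could be wrapped as below. This is a sketch, not a definitive implementation: `push_file` and `DEPLOY_KEY_FILE` are hypothetical names, and `--mkpath` assumes rsync 3.2.3 or newer on the runner.

```bash
#!/usr/bin/env bash
# Sketch: push one file as gitea_ci, landing it root-side via 'sudo rsync'.
# push_file and DEPLOY_KEY_FILE are hypothetical names.

push_file() {
    local src="$1" host="$2" dest="$3" owner="$4" mode="$5"
    # --rsync-path makes the remote end run under sudo, which is what the
    # scoped sudoers rsync lines permit; --chown/--chmod set ownership in transit.
    rsync \
        --mkpath \
        --chown="$owner" --chmod="$mode" \
        --rsync-path='sudo rsync' \
        -e "ssh -i ${DEPLOY_KEY_FILE:?set DEPLOY_KEY_FILE} -o StrictHostKeyChecking=accept-new" \
        "$src" "gitea_ci@${host}:${dest}"
}
```

For example, `push_file ./<app> "$API_HOST" /usr/local/bin/<app> root:root 0755` sends the binary to the one destination its sudoers line allows. Keeping one call per destination mirrors the one-line-per-path sudoers layout in §1.
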
---
## 4. The workflow
```yaml
on:
  push: { branches: [main] }   # redeploy on merge
  workflow_dispatch:           # manual re-run from the UI
concurrency:                   # serialize deploys; never half-apply two at once
  group: deploy
  cancel-in-progress: false
env:
  # --- infra truth: hosts, ports, paths live here, not in a manifest ---
  API_HOST: <host>.<site>.internal
  API_PORT: "8081"
  DEPLOY_KEY: |
    ${{ secrets.RSYNC_SSH_KEY }}
```
Jobs follow a **build → deploy-per-component** shape:
- **build** — runs the lint/test gate (`fmt`, `clippy -D warnings`, `test`) so a broken
commit never deploys, then produces and uploads the artifact(s).
- **deploy-\<component\>** (`needs: build`) — one job per component/host. Each:
  1. writes the SSH key from `DEPLOY_KEY`, then runs `ssh gitea_ci@$HOST hostname -f` as a
     reachability/auth check (`StrictHostKeyChecking=accept-new`);
  2. renders config from secrets (use a literal substitution, such as `python3` `.replace()`
     or `envsubst`, so secrets with shell-special characters survive);
  3. `rsync`s the artifact, config, systemd unit, and any firewalld/SELinux assets
     (`--rsync-path='sudo rsync'`, `--chown`/`--chmod` to set ownership in transit,
     `--mkpath`; see §6);
  4. applies system state over `ssh` with the scoped `sudo` commands (sysusers,
     `restorecon`, `semanage`, firewalld, `daemon-reload`, `restart`);
  5. health-probes (HTTP for an API, `systemctl is-active` otherwise);
  6. captures the unit's startup journal with `if: always()` so a failed start still
     leaves a usable record.
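
Steps 1 and 5 of the deploy job are easy to get subtly wrong, so here is one possible shape for them. It is a sketch only: `write_deploy_key` and `retry` are hypothetical helpers, and the retry counts are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch of steps 1 and 5. write_deploy_key and retry are hypothetical helpers.

# Write the key from $DEPLOY_KEY with owner-only permissions; ssh refuses
# private keys that are readable by group or others.
write_deploy_key() {
    local key_file="$1"
    if [ -z "${DEPLOY_KEY:-}" ]; then
        echo "DEPLOY_KEY is empty or unset" >&2
        return 1
    fi
    (
        umask 077
        # Strip one trailing newline (the YAML block scalar adds it), write one back.
        printf '%s\n' "${DEPLOY_KEY%$'\n'}" > "$key_file"
    ) && chmod 600 "$key_file"
}

# Run a probe command up to $1 times, sleeping $2 seconds between attempts.
retry() {
    local attempts="$1" delay="$2" n
    shift 2
    for (( n = 1; n <= attempts; n++ )); do
        if "$@"; then
            return 0
        fi
        if (( n < attempts )); then
            sleep "$delay"
        fi
    done
    echo "probe failed after $attempts attempts: $*" >&2
    return 1
}
```

A health probe would then read something like `retry 10 2 curl -fsS "http://${API_HOST}:${API_PORT}/health"`, where the `/health` path is an assumption about the app. For a non-HTTP component, the same wrapper can run `systemctl is-active <app>.service` over `ssh`.
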
Secrets to expect in the repo settings: `RSYNC_SSH_KEY` plus the app's config secrets
(API keys, tokens, etc.). Non-secret values stay inline in `env:`.
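
Rendering config from these secrets needs a substitution that treats the value as plain text. The following is a minimal sketch using `awk` instead of the `python3` or `envsubst` options named in step 2. `render_literal` and the `@NAME@` placeholder style are hypothetical and not taken from the project.

```bash
#!/usr/bin/env bash
# Sketch: literal placeholder substitution. render_literal and the @NAME@
# placeholder style are hypothetical. Neither the template nor the value is
# interpreted as shell code or as a regular expression.

render_literal() {
    # usage: render_literal PLACEHOLDER VALUE < template > rendered
    # The value travels through the environment, so awk applies no escape
    # processing to it (unlike awk -v).
    RL_PLACEHOLDER="$1" RL_VALUE="$2" awk '
        BEGIN { p = ENVIRON["RL_PLACEHOLDER"]; v = ENVIRON["RL_VALUE"]; n = length(p) }
        {
            line = $0; out = ""
            # index() is a plain substring search, so "&", "\" and "$" in the value are safe.
            while (n > 0 && (i = index(line, p)) > 0) {
                out = out substr(line, 1, i - 1) v
                line = substr(line, i + n)
            }
            print out line
        }'
}
```

Usage would be along the lines of `render_literal '@API_KEY@' "$API_KEY" < config.toml.in > config.toml`, with the file names chosen here purely for illustration. The `sed 's/…/…/'` alternative is avoided because `&`, `\`, and the delimiter character in a secret would change the result.
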
---
## 5. Runner images
Deploy jobs run on a generic runner (e.g. `runs-on: fedora-43`); build jobs run on a
toolchain runner. **Bake build dependencies into the runner image, not into the
workflow** — a deploy workflow should never `dnf install` at run time (runners may run
unprivileged, and per-run installs are slow and flaky). If a build needs a tool the
image lacks, add it to the image (the `gongfoo/images/*` Containerfiles) and rebuild.
For static cross-compilation specifically, the runner's toolchain must include the
cross target's std. Where a distro's packaged compiler can't load a foreign std (e.g.
Fedora's distro `rustc` rejects any std it didn't build, and Fedora ships no musl std),
provision the toolchain via its own version manager (`rustup` + `rustup target add …`)
so compiler and std always match.
---
## 6. Gotchas worth pre-empting
These cost a deploy round-trip each; encode them up front.
- **`rsync` won't create a missing destination directory** for a single-file copy. On
Fedora, `/etc/sysusers.d` and `/etc/firewalld/services` don't exist by default (only
the `/usr/lib` variants ship). Add `--mkpath` to file pushes so the root-side rsync
creates the parent.
- **firewalld only learns a freshly-shipped custom service after `--reload`.** Querying
or adding it to the runtime before reloading fails with `INVALID_SERVICE`. Order:
rsync the XML → `firewall-cmd --reload` → `--query-service` → (if absent)
`--add-service --permanent` → `--reload`.
- **nginx `sites-enabled` holds only symlinks.** rsync vhost configs to
`sites-available/` and `ln -sfn` them into `sites-enabled/`. (The include is often
one level removed: `nginx.conf` → `conf.d/*.conf` → a `sites-enabled.conf` that does
`include …/sites-enabled/*.conf;`.) Run `nginx -t` before reloading; a bad config left on
disk also breaks unrelated `systemctl reload nginx` calls (e.g. cert-renewal hooks).
- **glibc skew:** a binary built on a newer host won't run on an older target. Build
static (musl) or build on a runner no newer than the oldest target.
- **`semanage port -a` on an already-labelled port** prints "already defined, modifying
instead" and reassigns; guard with `semanage port -l | grep` so re-runs are no-ops.
- **`Type=notify` units** must actually send `READY=1` (`sd_notify`) or `systemctl
restart` blocks until `TimeoutStartSec`. The daemon sends it after it finishes binding.
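
The firewalld ordering and the `semanage` guard from the list above can be written as two re-runnable helpers. This is a sketch under assumptions: the function names are hypothetical, it is meant to run on the target as `gitea_ci`, and the port match handles single ports only, not ranges.

```bash
#!/usr/bin/env bash
# Sketch, meant to run on the target as gitea_ci. Function names are
# hypothetical; the sudo commands mirror the scoped sudoers lines in §1.

# firewalld: reload first so the freshly shipped service XML is known,
# then add the service only if it is not already enabled.
ensure_firewalld_service() {
    local svc="$1"
    sudo firewall-cmd --reload || return 1
    if ! sudo firewall-cmd --query-service="$svc"; then
        sudo firewall-cmd --add-service="$svc" --permanent || return 1
        sudo firewall-cmd --reload || return 1
    fi
}

# SELinux: label the port only when 'semanage port -l' does not already list
# it for this type. Matches single ports, not ranges such as 8080-8090.
ensure_port_label() {
    local type="$1" proto="$2" port="$3"
    if sudo semanage port -l | awk -v t="$type" -v p="$proto" -v n="$port" '
            $1 == t && $2 == p {
                for (i = 3; i <= NF; i++) { sub(/,$/, "", $i); if ($i == n) found = 1 }
            }
            END { exit !found }'; then
        return 0    # already labelled: the re-run is a no-op
    fi
    sudo semanage port -a -t "$type" -p "$proto" "$port"
}
```

With the sudoers lines from §1, the calls would be `ensure_firewalld_service <app>` and `ensure_port_label http_port_t tcp 8081`. Both are safe to run on every deploy.
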
---
## 7. Relationship to generic.md
This model reuses generic.md §8 (service accounts, hardened units), §9 (named firewalld
services), §10 (SELinux), and §11 (PKI/cert paths) unchanged. It only swaps *how* the
assets get onto the host (CI + `gitea_ci` + scoped sudo, instead of `deploy.sh` +
operator sudo). The `asset/` layout from generic.md §6 is the same, minus `manifest.yml`.
Per-service TLS for mesh-only nginx vhosts is covered in `internal-tls.md`.