
docs: add CI deployment and internal-TLS guidance, cross-reference from generic

Add two new guidance documents alongside generic.md:

- deployment-gitea-actions.md: CI-driven deployment via a Gitea Actions
  workflow as an alternative to deploy.sh + manifest.yml (§7), with the
  workflow as the source of infra truth and a scoped gitea_ci runner user.
- internal-tls.md: provisioning and renewing per-service internal TLS
  certs (<service>.internal) for mesh-only nginx vhosts, extending the
  PKI conventions in §11.

Cross-reference both from generic.md and list them in readme.md. Also
add a "never suppress errors" rule to the deploy-script conventions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-14 15:43:18 +03:00

9.7 KiB


Deployment via Gitea Actions

An alternative to the local deploy.sh + manifest.yml flow in generic.md §6–§7. Use this when deployment should be CI-driven — triggered by a push or a manual dispatch, run on a Gitea Actions runner, and auditable in the Actions log — rather than run by an operator from a workstation.

Both models coexist; pick per project:

- deploy.sh (generic.md §7): operator-driven, runs from a workstation with the operator's own ssh + pass access. Good for tightly-held apps, one-off targets, or when no runner can reach the target hosts.
- Gitea Actions (this doc): runner-driven, secrets in Gitea, no operator in the loop. Good for anything that should redeploy on merge to main and for fleets a runner can already reach over the WireGuard mesh.

The defining principle: the workflow is the source of infra truth. Hosts, ports, paths, and component→host mapping live in the workflow YAML, not a manifest.yml. There is no separate manifest in this model.

1. The deploy user: `gitea_ci`

The runner SSHes into each target as a dedicated gitea_ci system user — never root, never the operator's account. On each target host gitea_ci has:

- a home dir (/var/lib/gitea_ci) and ~/.ssh/authorized_keys containing the runner's public key,
- membership in systemd-journal (so the workflow can capture journalctl -u <unit> after a service start without a sudoers entry),
- a scoped /etc/sudoers.d/<app>_gitea_ci drop-in granting NOPASSWD for exactly the commands the deploy runs, and nothing broader.

Name the sudoers file <app>_gitea_ci, not bare gitea_ci, so multiple apps can drop their own files on a shared host without clobbering each other.

Scoped sudoers

Whitelist exact commands. For file pushes the workflow uses rsync --rsync-path='sudo rsync', so the remote rsync runs as root; pin each line to one destination with a trailing literal path:

gitea_ci ALL=(root) NOPASSWD: /usr/bin/rsync * /usr/local/bin/<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/rsync * /etc/<app>/config.toml
gitea_ci ALL=(root) NOPASSWD: /usr/bin/systemctl restart <app>.service
gitea_ci ALL=(root) NOPASSWD: /usr/bin/systemctl daemon-reload
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/restorecon -R /usr/local/bin/<app> /etc/<app> /var/lib/<app>
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/semanage port -l
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/semanage port -a -t http_port_t -p tcp 8081
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --add-service\=<app> --permanent
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --add-service\=<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --query-service\=<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --reload

- The * in an rsync line matches rsync's --server … argument vector across spaces; the trailing literal destination is what actually bounds the rule.
- sudoers treats , : = and \ as special inside command arguments; escape them (\, \: \= \\) or visudo rejects the file. This comes up when whitelisting dnf config-manager addrepo --from-repofile=https://….
- Verify every drop-in with visudo -cf at install time so a typo can't lock the host.
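The visudo check above can be wired into the install step so an invalid file never lands in /etc/sudoers.d. What follows is a minimal sketch, not the project's actual script: the function name install_sudoers_dropin and the VISUDO and SUDOERS_DIR override variables are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: validate a sudoers drop-in before it goes live.
# VISUDO and SUDOERS_DIR are hypothetical overrides, useful for dry runs.
VISUDO=${VISUDO:-visudo}
SUDOERS_DIR=${SUDOERS_DIR:-/etc/sudoers.d}

install_sudoers_dropin() {
  local src=$1 app=$2
  # Per-app file name, so several apps can share a host without clobbering.
  local dest="$SUDOERS_DIR/${app}_gitea_ci"
  # Syntax-check the candidate first; on failure nothing is installed.
  if ! $VISUDO -cf "$src"; then
    echo "refusing to install $src: visudo check failed" >&2
    return 1
  fi
  # sudoers files should be mode 0440. Run as root so the result is root-owned.
  install -m 0440 "$src" "$dest"
}
```

Run as root on the target, a call such as install_sudoers_dropin ./myapp.sudoers myapp (both arguments hypothetical) would either install /etc/sudoers.d/myapp_gitea_ci or leave the directory untouched.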

2. One-time host provisioning: `script/infra-setup.sh`

Everything gitea_ci needs is provisioned once per host by script/infra-setup.sh, which the operator runs from a workstation (with full sudo, not the scoped account). It is idempotent and skips unreachable hosts so one offline node doesn't block the rest. It:

- creates the gitea_ci user (if missing) and its ~/.ssh,
- installs the runner's pubkey into authorized_keys (rsync --chown gitea_ci:gitea_ci --chmod 0600 --rsync-path 'sudo rsync'),
- adds gitea_ci to systemd-journal,
- installs the host-appropriate /etc/sudoers.d/<app>_gitea_ci drop-in and visudo-verifies it,
- provisions any per-service internal TLS cert the app's nginx vhost needs (see internal-tls.md).

The runner's keypair is generated once (ssh-keygen -t ed25519 -f ~/.ssh/id_gitea_ci); the private key becomes the RSYNC_SSH_KEY Gitea secret, the public key is what infra-setup.sh distributes.
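As a rough illustration of the user, key, and group steps, here is a minimal sketch of a per-host function. It is not the actual infra-setup.sh: provision_host and the SSH, RSYNC, and PUBKEY variables are hypothetical names, and the explicit --shell /bin/bash is an assumption (the account needs a login shell for the runner's ssh sessions).

```shell
#!/usr/bin/env bash
# Sketch of the per-host step in an infra-setup.sh-style script.
# SSH, RSYNC, PUBKEY and provision_host are hypothetical names.
SSH=${SSH:-ssh}
RSYNC=${RSYNC:-rsync}
PUBKEY=${PUBKEY:-$HOME/.ssh/id_gitea_ci.pub}

provision_host() {
  local host=$1
  # An unreachable host is skipped, not fatal, so one offline node
  # doesn't block the rest.
  if ! $SSH -o ConnectTimeout=5 "$host" true; then
    echo "skip: $host unreachable" >&2
    return 0
  fi
  # Create the user only if it is missing, so re-runs are no-ops.
  $SSH "$host" 'id -u gitea_ci >/dev/null 2>&1 || sudo useradd --system --user-group --create-home --home-dir /var/lib/gitea_ci --shell /bin/bash gitea_ci' || return 1
  $SSH "$host" 'sudo install -d -m 0700 -o gitea_ci -g gitea_ci /var/lib/gitea_ci/.ssh' || return 1
  # Journal read access without a sudoers entry.
  $SSH "$host" 'sudo usermod -aG systemd-journal gitea_ci' || return 1
  # Install the runner's public key, setting owner and mode in transit.
  $RSYNC --chown gitea_ci:gitea_ci --chmod 0600 --rsync-path 'sudo rsync' \
    "$PUBKEY" "$host:/var/lib/gitea_ci/.ssh/authorized_keys"
}
```

A driver loop would call provision_host once per target; because an unreachable host returns 0, the loop continues to the next one.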

Application config is not shipped by infra-setup.sh in this model; the workflow renders it from Gitea secrets on every deploy (see §4 below). Apps that keep config in pass follow the deploy.sh model in generic.md §7 instead.

3. Artifact delivery: two variants

(a) RPM channel. A separate build-prerelease workflow builds, packages, signs, and publishes RPMs to an internal repo (e.g. rpm.lair.cafe/unstable); the deploy workflow just installs or upgrades them with dnf. Best when the app already ships as an RPM and the build is heavy (CUDA, vendored deps). The deploy workflow triggers on workflow_run for build-prerelease (types: [completed]).

(b) Build-and-rsync. The deploy workflow's own build job produces the artifact (e.g. a static binary + a static frontend bundle) and rsyncs it straight to the targets. Best when there's no packaging step. Build for a statically-linked target (e.g. musl) so a runner newer than the target host doesn't produce a binary the target's older glibc can't load (see the glibc-skew entry in §6 below).
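For variant (b), a single-file push might look like the sketch below. It only combines the rsync flags this document already names; push_file, RSYNC, and KEY_FILE are hypothetical names, and the destination must match one of the scoped sudoers lines for the remote sudo rsync to be permitted.

```shell
#!/usr/bin/env bash
# Sketch of a single-file push for the build-and-rsync variant.
# RSYNC, KEY_FILE and push_file are hypothetical names.
RSYNC=${RSYNC:-rsync}
KEY_FILE=${KEY_FILE:-$HOME/.ssh/id_deploy}

push_file() {
  local src=$1 host=$2 dest=$3 owner=$4 mode=$5
  # --rsync-path='sudo rsync' runs the receiving side as root.
  # --mkpath creates missing parent directories (needs rsync 3.2.3 or newer).
  # --chown/--chmod set ownership and mode in transit.
  $RSYNC --rsync-path='sudo rsync' --mkpath \
    --chown="$owner" --chmod="$mode" \
    -e "ssh -i $KEY_FILE -o StrictHostKeyChecking=accept-new" \
    "$src" "gitea_ci@$host:$dest"
}
```

A call such as push_file ./myapp "$API_HOST" /usr/local/bin/myapp root:root F755 (the binary name is made up) would copy one artifact to one whitelisted destination.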

4. The workflow

on:
  push: { branches: [main] }      # redeploy on merge
  workflow_dispatch:              # manual re-run from the UI

concurrency:                      # serialize deploys; never half-apply two at once
  group: deploy
  cancel-in-progress: false

env:
  # --- infra truth: hosts, ports, paths live here, not in a manifest ---
  API_HOST: <host>.<site>.internal
  API_PORT: "8081"
  DEPLOY_KEY: |
    ${{ secrets.RSYNC_SSH_KEY }}

Jobs follow a build → deploy-per-component shape:

- build: runs the lint/test gate (fmt, clippy -D warnings, test) so a broken commit never deploys, then produces and uploads the artifact(s).
- deploy-<component> (needs: build): one job per component/host. Each:
  1. writes the SSH key from DEPLOY_KEY, then runs ssh gitea_ci@$HOST hostname -f as a reachability/auth check (StrictHostKeyChecking=accept-new);
  2. renders config from secrets (use a literal substitution such as python3 .replace() or envsubst, so secrets with shell-special characters survive);
  3. rsyncs the artifact, config, systemd unit, and any firewalld/SELinux assets (--rsync-path='sudo rsync', --chown/--chmod to set ownership in transit, --mkpath; see §6 below);
  4. applies system state over ssh with the scoped sudo commands (sysusers, restorecon, semanage, firewalld, daemon-reload, restart);
  5. health-probes (HTTP for an API, systemctl is-active otherwise);
  6. captures the unit's startup journal with if: always() so a failed start still leaves a usable record.
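Steps 1, 5, and 6 of a deploy job can be sketched as small shell functions that a workflow step would call. This is an illustration under assumptions rather than the project's workflow: KEY_FILE, SSH, CURL, the function names, and the retry count are all hypothetical, and DEPLOY_KEY is taken from the env: block shown earlier.

```shell
#!/usr/bin/env bash
# Sketch of the key, health-probe and journal steps of a deploy job.
# KEY_FILE, SSH and CURL are hypothetical overrides.
KEY_FILE=${KEY_FILE:-$HOME/.ssh/id_deploy}
SSH=${SSH:-ssh}
CURL=${CURL:-curl}

write_deploy_key() {
  mkdir -p "$(dirname "$KEY_FILE")" || return 1
  # Create the file owner-only; ssh refuses group/world-readable private keys.
  ( umask 077 && printf '%s\n' "$DEPLOY_KEY" > "$KEY_FILE" ) || return 1
  chmod 600 "$KEY_FILE"
}

check_reachable() {
  # accept-new trusts a host key on first contact but rejects a changed one.
  $SSH -i "$KEY_FILE" -o StrictHostKeyChecking=accept-new "gitea_ci@$1" hostname -f
}

wait_healthy() {
  local url=$1 attempts=${2:-10} i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail turns HTTP errors (status 400 and above) into a non-zero exit.
    if $CURL --fail --silent --show-error --max-time 5 "$url" >/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "health probe failed: $url" >&2
  return 1
}

capture_journal() {
  # Needs no sudo because gitea_ci is in the systemd-journal group.
  $SSH -i "$KEY_FILE" "gitea_ci@$1" journalctl -u "$2" -n 100 --no-pager
}
```

In a workflow, capture_journal would sit in the step marked if: always(), so the record is kept even when wait_healthy has already failed the job.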

Secrets to expect in the repo settings: RSYNC_SSH_KEY plus the app's config secrets (API keys, tokens, etc.). Non-secret values stay inline in env:.
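To show what literal substitution means for the render step, here is a minimal sketch using envsubst (from GNU gettext). The variable API_TOKEN, the template layout, and the function name render_config are assumptions for illustration; API_PORT comes from the env: block shown earlier. Both variables must be exported, which is already the case for values set through env: in a workflow.

```shell
#!/usr/bin/env bash
# Sketch: render a config template from secrets with envsubst.
# API_TOKEN and render_config are hypothetical names.
render_config() {
  local template=$1 out=$2
  # The explicit variable list restricts substitution to these names, so any
  # other "$" in the template is left alone. Values are inserted verbatim
  # and never re-parsed by the shell.
  ( umask 077 && envsubst '${API_TOKEN} ${API_PORT}' < "$template" > "$out" )
}
```

With a template line such as token = "${API_TOKEN}", a secret containing $, &, or backticks ends up in the rendered file unchanged, which is the property the step relies on.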

5. Runner images

Deploy jobs run on a generic runner (e.g. runs-on: fedora-43); build jobs run on a toolchain runner. Bake build dependencies into the runner image, not into the workflow — a deploy workflow should never dnf install at run time (runners may run unprivileged, and per-run installs are slow and flaky). If a build needs a tool the image lacks, add it to the image (the gongfoo/images/* Containerfiles) and rebuild.

For static cross-compilation specifically, the runner's toolchain must include the cross target's std. Where a distro's packaged compiler can't load a foreign std (e.g. Fedora's distro rustc rejects any std it didn't build, and Fedora ships no musl std), provision the toolchain via its own version manager (rustup + rustup target add …) so compiler and std always match.

6. Gotchas worth pre-empting

These cost a deploy round-trip each; encode them up front.

- rsync won't create a missing destination directory for a single-file copy. On Fedora, /etc/sysusers.d and /etc/firewalld/services don't exist by default (only the /usr/lib variants ship). Add --mkpath to file pushes so the root-side rsync creates the parent.
- firewalld only learns a freshly-shipped custom service after --reload. Querying it, or adding it to the runtime, before reloading fails with INVALID_SERVICE. Order: rsync the XML → firewall-cmd --reload → --query-service → (if absent) --add-service --permanent → --reload.
- nginx sites-enabled holds only symlinks. rsync vhost configs to sites-available/ and ln -sfn them into sites-enabled/. (The include is often one level removed: nginx.conf → conf.d/*.conf → a sites-enabled.conf that does include …/sites-enabled/*.conf;.) Run nginx -t before reload; a bad config left on disk also breaks unrelated systemctl reload nginx calls (e.g. cert-renewal hooks).
- glibc skew: a binary built on a newer host won't run on an older target. Build static (musl) or build on a runner no newer than the oldest target.
- semanage port -a on an already-labelled port prints "already defined, modifying instead" and reassigns; guard with semanage port -l | grep so re-runs are no-ops.
- Type=notify units must actually send READY=1 (sd_notify) or systemctl restart blocks until TimeoutStartSec expires. The daemon should send it only once it has finished binding.
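The firewalld ordering and the semanage guard from the list above can be expressed as two idempotent helpers. This is a sketch under assumptions: the function names and the FIREWALL_CMD and SEMANAGE override variables are hypothetical, the defaults mirror the scoped sudoers lines, and awk is used in place of grep to get an exact port match.

```shell
#!/usr/bin/env bash
# Sketch of two idempotent guards for the firewalld and semanage gotchas.
# FIREWALL_CMD and SEMANAGE are hypothetical overrides.
FIREWALL_CMD=${FIREWALL_CMD:-sudo /usr/bin/firewall-cmd}
SEMANAGE=${SEMANAGE:-sudo /usr/sbin/semanage}

ensure_firewalld_service() {
  local svc=$1
  # Reload first so firewalld learns the freshly rsynced service XML.
  $FIREWALL_CMD --reload || return 1
  # --query-service exits 0 when the service is already enabled.
  if ! $FIREWALL_CMD --query-service="$svc"; then
    $FIREWALL_CMD --add-service="$svc" --permanent || return 1
    $FIREWALL_CMD --reload || return 1
  fi
}

port_is_labelled() {
  local setype=$1 port=$2
  # "semanage port -l" prints: <type> <proto> <port>, <port>, ...
  # Exact matches only; a port covered by a range (8000-8100) is not detected.
  $SEMANAGE port -l | awk -v t="$setype" -v p="$port" '
    $1 == t && $2 == "tcp" {
      for (i = 3; i <= NF; i++) { gsub(",", "", $i); if ($i == p) found = 1 }
    }
    END { exit !found }'
}

ensure_port_label() {
  local setype=$1 port=$2
  # Add the label only when it is missing, so re-runs are no-ops.
  port_is_labelled "$setype" "$port" || $SEMANAGE port -a -t "$setype" -p tcp "$port"
}
```

Over ssh, a deploy step could call ensure_firewalld_service with the app's service name followed by ensure_port_label http_port_t 8081; both leave an already-configured host untouched.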

7. Relationship to generic.md

This model reuses §8 (service accounts, hardened units), §9 (named firewalld services), §10 (SELinux), and §11 (PKI/cert paths) unchanged — it only swaps how the assets get onto the host (CI + gitea_ci + scoped sudo, instead of deploy.sh + operator sudo). The asset/ layout from §6 is the same, minus manifest.yml. Per-service TLS for mesh-only nginx vhosts is covered in internal-tls.md.
