Add two new guidance documents alongside generic.md:

- deployment-gitea-actions.md: CI-driven deployment via a Gitea Actions workflow as an
  alternative to deploy.sh + manifest.yml (§7), with the workflow as the source of infra
  truth and a scoped gitea_ci runner user.
- internal-tls.md: provisioning and renewing per-service internal TLS certs
  (<service>.internal) for mesh-only nginx vhosts, extending the PKI conventions in §11.

Cross-reference both from generic.md and list them in readme.md. Also add a "never
suppress errors" rule to the deploy-script conventions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployment via Gitea Actions
An alternative to the local deploy.sh + manifest.yml flow in generic.md §6–§7.
Use this when deployment should be CI-driven — triggered by a push or a manual
dispatch, run on a Gitea Actions runner, and auditable in the Actions log — rather
than run by an operator from a workstation.
Both models coexist; pick per project:
- deploy.sh (generic.md §7): operator-driven, runs from a workstation with the
  operator's own ssh + pass access. Good for tightly-held apps, one-off targets, or
  when no runner can reach the target hosts.
- Gitea Actions (this doc): runner-driven, secrets in Gitea, no operator in the loop.
  Good for anything that should redeploy on merge to main and for fleets a runner can
  already reach over the WireGuard mesh.
The defining principle: the workflow is the source of infra truth. Hosts, ports,
paths, and component→host mapping live in the workflow YAML, not a manifest.yml.
There is no separate manifest in this model.
1. The deploy user: gitea_ci
The runner SSHes into each target as a dedicated gitea_ci system user — never
root, never the operator's account. On each target host gitea_ci has:
- a home dir (/var/lib/gitea_ci) and ~/.ssh/authorized_keys containing the runner's
  public key,
- membership in systemd-journal (so the workflow can capture journalctl -u <unit>
  after a service start without a sudoers entry),
- a scoped /etc/sudoers.d/<app>_gitea_ci drop-in granting NOPASSWD for exactly the
  commands the deploy runs, nothing broader.
Name the sudoers file <app>_gitea_ci, not bare gitea_ci, so multiple apps can
drop their own files on a shared host without clobbering each other.
Scoped sudoers
Whitelist exact commands. For file pushes the workflow uses
rsync --rsync-path='sudo rsync', so the remote rsync runs as root; pin each line to
one destination with a trailing literal path:
gitea_ci ALL=(root) NOPASSWD: /usr/bin/rsync * /usr/local/bin/<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/rsync * /etc/<app>/config.toml
gitea_ci ALL=(root) NOPASSWD: /usr/bin/systemctl restart <app>.service
gitea_ci ALL=(root) NOPASSWD: /usr/bin/systemctl daemon-reload
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/restorecon -R /usr/local/bin/<app> /etc/<app> /var/lib/<app>
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/semanage port -l
gitea_ci ALL=(root) NOPASSWD: /usr/sbin/semanage port -a -t http_port_t -p tcp 8081
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --add-service=<app> --permanent
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --add-service=<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --query-service=<app>
gitea_ci ALL=(root) NOPASSWD: /usr/bin/firewall-cmd --reload
- The * in an rsync line matches rsync's --server … argument vector across spaces; the
  trailing literal destination is what actually bounds the rule.
- sudoers treats : and = as reserved; escape them (\:, \=) inside command arguments or
  visudo rejects the file. This is common when whitelisting
  dnf config-manager addrepo --from-repofile=https://….
- Verify every drop-in with visudo -cf at install time so a typo can't lock the host.
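The escaping rule is easy to get wrong by hand. Below is a minimal sketch of a helper that escapes an argument string before it is pasted into a drop-in. The function name sudoers_escape is an illustrative assumption, not part of the deploy tooling. Besides : and =, the sudoers manual also lists , and \ as characters that must be escaped in command arguments, so the helper covers those too.

```shell
# sudoers_escape ARG
# Prints ARG with the characters sudoers reserves inside command
# arguments (\ , : =) each prefixed by a backslash.
# Hypothetical helper for illustration only.
sudoers_escape() {
  printf '%s\n' "$1" | sed 's/[\\,:=]/\\&/g'
}
```

A drop-in assembled this way would still go through visudo -cf on a temporary copy before being installed under /etc/sudoers.d/, so that a rejected file never becomes live.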
2. One-time host provisioning: script/infra-setup.sh
Everything gitea_ci needs is provisioned once per host by a
script/infra-setup.sh, run by the operator from a workstation (full sudo, not the
scoped account). It is idempotent and skips past unreachable hosts so one offline node
doesn't block the rest. It:
- creates the gitea_ci user (if missing) and its ~/.ssh,
- installs the runner's pubkey into authorized_keys
  (rsync --chown gitea_ci:gitea_ci --chmod 0600 --rsync-path 'sudo rsync'),
- adds gitea_ci to systemd-journal,
- installs the host-appropriate /etc/sudoers.d/<app>_gitea_ci drop-in and
  visudo-verifies it,
- provisions any per-service internal TLS cert the app's nginx vhost needs (see
  internal-tls.md).
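The idempotent, skip-unreachable shape of the steps above can be sketched as follows. This is a sketch under stated assumptions, not the actual script/infra-setup.sh: the function names provision_one and provision_hosts are hypothetical, the operator's sudo on the target is assumed not to prompt, and /bin/bash as the login shell is an assumption (some shell is needed because ssh and rsync commands run through it). Only the user, ~/.ssh, and group steps are shown; pubkey, sudoers, and TLS provisioning are omitted.

```shell
# provision_one HOST
# Runs the user setup on one host with the operator's full sudo.
# Every step is safe to repeat.
provision_one() {
  ssh "$1" 'sudo sh -s' <<'EOF'
set -eu
# Create the system user only if it is missing.
getent passwd gitea_ci >/dev/null ||
  useradd --system --create-home --home-dir /var/lib/gitea_ci \
          --shell /bin/bash gitea_ci
install -d -o gitea_ci -g gitea_ci -m 0700 /var/lib/gitea_ci/.ssh
# -a appends, so re-runs leave other group memberships alone.
usermod -aG systemd-journal gitea_ci
EOF
}

# provision_hosts HOST...
# Provisions every reachable host; unreachable ones are reported on
# stderr and skipped so one offline node does not block the rest.
provision_hosts() {
  skipped=""
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      provision_one "$host"
    else
      echo "skipping unreachable host: $host" >&2
      skipped="$skipped $host"
    fi
  done
  [ -z "$skipped" ] || echo "not provisioned:$skipped" >&2
}
```

The skipped hosts are listed at the end rather than silently dropped, so the operator knows which nodes still need a run.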
The runner's keypair is generated once (ssh-keygen -t ed25519 -f ~/.ssh/id_gitea_ci);
the private key becomes the RSYNC_SSH_KEY Gitea secret, the public key is what
infra-setup.sh distributes.
Application config is not shipped by infra-setup.sh in this model — the workflow
renders it from Gitea secrets on every deploy (see §4). (deploy.sh-style apps that
keep config in pass are the §7 model instead.)
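Rendering config from secrets is safest as a literal substitution, so that characters such as &, $, or \ in a secret pass through untouched. The sketch below uses awk's index and substr (no regular expressions) rather than the python3 or envsubst routes, purely to stay dependency-free. The @@NAME@@ placeholder syntax and the function name render_config are assumptions made for illustration.

```shell
# render_config TEMPLATE_FILE NAME...
# Prints TEMPLATE_FILE with every literal @@NAME@@ replaced by the value
# of the exported environment variable NAME. Values are inserted
# verbatim: no regex, no shell expansion, no & back-references.
# Hypothetical helper; the @@NAME@@ syntax is an assumption.
render_config() {
  tmpl=$1
  shift
  awk -v names="$*" '
    BEGIN { n = split(names, keys, " ") }
    {
      line = $0
      for (i = 1; i <= n; i++) {
        token = "@@" keys[i] "@@"
        value = ENVIRON[keys[i]]
        out = ""
        # Walk the line, copying text before each token and the value.
        while ((pos = index(line, token)) > 0) {
          out = out substr(line, 1, pos - 1) value
          line = substr(line, pos + length(token))
        }
        line = out line
      }
      print line
    }' "$tmpl"
}
```

In a workflow step the secrets would arrive as environment variables, and the rendered output would be written to a file that is then rsynced to the target.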
3. Artifact delivery: two variants
(a) RPM channel. A separate build-prerelease workflow builds, packages, signs, and
publishes RPMs to an internal repo (e.g. rpm.lair.cafe/unstable); the deploy workflow
just dnf install/upgrades them. Best when the app already ships as an RPM and the
build is heavy (CUDA, vendored deps). The deploy workflow triggers on
workflow_run: [build-prerelease] completed.
(b) Build-and-rsync. The deploy workflow's own build job produces the artifact
(e.g. a static binary + a static frontend bundle) and rsyncs it straight to the
targets. Best when there's no packaging step. Build for a statically-linked target
(e.g. musl) so a runner newer than the target host doesn't produce a binary the target's
older glibc can't load — see §6.
4. The workflow
on:
  push: { branches: [main] }    # redeploy on merge
  workflow_dispatch:            # manual re-run from the UI
concurrency:                    # serialize deploys; never half-apply two at once
  group: deploy
  cancel-in-progress: false
env:
  # --- infra truth: hosts, ports, paths live here, not in a manifest ---
  API_HOST: <host>.<site>.internal
  API_PORT: "8081"
  DEPLOY_KEY: |
    ${{ secrets.RSYNC_SSH_KEY }}
Jobs follow a build → deploy-per-component shape:
- build: runs the lint/test gate (fmt, clippy -D warnings, test) so a broken commit
  never deploys, then produces and uploads the artifact(s).
- deploy-<component> (needs: build): one job per component/host. Each:
  - writes the SSH key from DEPLOY_KEY, then ssh gitea_ci@$HOST hostname -f as a
    reachability/auth check (StrictHostKeyChecking=accept-new);
  - renders config from secrets (use a literal substitution, python3 .replace() or
    envsubst, so secrets with shell-special characters survive);
  - rsyncs the artifact, config, systemd unit, and any firewalld/SELinux assets
    (--rsync-path='sudo rsync', --chown/--chmod to set ownership in transit,
    --mkpath; see §6);
  - applies system state over ssh with the scoped sudo commands (sysusers,
    restorecon, semanage, firewalld, daemon-reload, restart);
  - health-probes (HTTP for an API, systemctl is-active otherwise);
  - captures the unit's startup journal with if: always() so a failed start still
    leaves a usable record.
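A service rarely answers the instant systemctl restart returns, so the health-probe step usually needs a bounded retry. The following is one possible sketch. The function name retry, the attempt count, the delay, and the /healthz path in the usage comments are all assumptions, not something this document prescribes.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY_SECONDS between tries. Returns 0 on the first success and 1
# once the attempts are exhausted. Hypothetical helper.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage in a deploy job (commented out; hosts and paths are placeholders):
#   retry 10 2 curl -fsS "http://$API_HOST:$API_PORT/healthz"
#   retry 10 2 ssh "gitea_ci@$API_HOST" systemctl is-active --quiet <app>.service
```

The probe's own output is left visible on purpose, so that a deploy which never becomes healthy leaves the reason in the Actions log.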
Secrets to expect in the repo settings: RSYNC_SSH_KEY plus the app's config secrets
(API keys, tokens, etc.). Non-secret values stay inline in env:.
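The RSYNC_SSH_KEY secret reaches each job as the DEPLOY_KEY environment variable, and ssh refuses a private key file that other users can read. A minimal sketch of materializing it safely follows; the function name write_deploy_key and the key path in the usage comments are assumptions.

```shell
# write_deploy_key KEY_FILE
# Writes $DEPLOY_KEY to KEY_FILE. The umask is set in a subshell so a
# newly created file gets mode 0600 without changing the umask of the
# rest of the job. Hypothetical helper.
write_deploy_key() {
  (
    umask 077
    printf '%s\n' "$DEPLOY_KEY" > "$1"
  )
}

# Usage in a deploy job (commented out; the key path is a placeholder):
#   write_deploy_key "$HOME/.ssh/id_deploy"
#   ssh -i "$HOME/.ssh/id_deploy" -o StrictHostKeyChecking=accept-new \
#       "gitea_ci@$API_HOST" hostname -f
```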
5. Runner images
Deploy jobs run on a generic runner (e.g. runs-on: fedora-43); build jobs run on a
toolchain runner. Bake build dependencies into the runner image, not into the
workflow: a job should never dnf install onto the runner at run time (runners may run
unprivileged, and per-run installs are slow and flaky; installing RPMs on the target
host, as in variant 3(a), is a separate matter). If a build needs a tool the
image lacks, add it to the image (the gongfoo/images/* Containerfiles) and rebuild.
For static cross-compilation specifically, the runner's toolchain must include the
cross target's std. Where a distro's packaged compiler can't load a foreign std (e.g.
Fedora's distro rustc rejects any std it didn't build, and Fedora ships no musl std),
provision the toolchain via its own version manager (rustup + rustup target add …)
so compiler and std always match.
6. Gotchas worth pre-empting
These cost a deploy round-trip each; encode them up front.
- rsync won't create a missing destination directory for a single-file copy. On
  Fedora, /etc/sysusers.d and /etc/firewalld/services don't exist by default (only the
  /usr/lib variants ship). Add --mkpath to file pushes so the root-side rsync creates
  the parent.
- firewalld only learns a freshly-shipped custom service after --reload. Querying or
  adding it to the runtime before reloading fails with INVALID_SERVICE. Order: rsync
  the XML → firewall-cmd --reload → --query-service → (if absent)
  --add-service --permanent → --reload.
- nginx sites-enabled holds only symlinks. rsync vhost configs to sites-available/ and
  ln -sfn them into sites-enabled/. (The include is often one level removed:
  nginx.conf → conf.d/*.conf → a sites-enabled.conf that does
  include …/sites-enabled/*.conf;.) Run nginx -t before reload; a bad config left on
  disk also breaks unrelated systemctl reload nginx calls (e.g. cert-renewal hooks).
- glibc skew: a binary built on a newer host won't run on an older target. Build
  static (musl) or build on a runner no newer than the oldest target.
- semanage port -a on an already-labelled port prints "already defined, modifying
  instead" and reassigns; guard with semanage port -l | grep so re-runs are no-ops.
- Type=notify units must actually send READY=1 (sd_notify) or systemctl restart blocks
  until TimeoutStartSec. The daemon sends it after it finishes binding.
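Two of the gotchas above, the firewalld ordering and the semanage guard, can be sketched as small functions meant to run on the target (for example piped over ssh). The function names are hypothetical. The argument order mirrors the scoped sudoers lines from §1 so that the whitelist still matches. The label check uses awk instead of a bare grep so that 8081 cannot match 18081; it does not understand port ranges such as 5900-5999.

```shell
# ensure_firewalld_service SERVICE
# Reload first so firewalld learns the freshly rsynced service XML,
# then add the service permanently only if it is not already enabled.
ensure_firewalld_service() {
  sudo firewall-cmd --reload
  if ! sudo firewall-cmd --query-service="$1"; then
    sudo firewall-cmd --add-service="$1" --permanent
    sudo firewall-cmd --reload
  fi
}

# port_has_label TYPE PROTO PORT
# Reads `semanage port -l` output on stdin; succeeds if PORT is listed
# for TYPE/PROTO as an individual port (ranges are not expanded).
port_has_label() {
  awk -v t="$1" -v p="$2" -v n="$3" '
    $1 == t && $2 == p {
      for (i = 3; i <= NF; i++) {
        f = $i
        sub(/,$/, "", f)      # strip the list separator
        if (f == n) found = 1
      }
    }
    END { exit (found ? 0 : 1) }'
}

# ensure_port_label TYPE PROTO PORT
# Adds the label only when it is missing, so re-runs are no-ops.
ensure_port_label() {
  if sudo semanage port -l | port_has_label "$1" "$2" "$3"; then
    return 0
  fi
  sudo semanage port -a -t "$1" -p "$2" "$3"
}
```

A deploy step would then call, for example, ensure_firewalld_service <app> followed by ensure_port_label http_port_t tcp 8081.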
7. Relationship to generic.md
This model reuses §8 (service accounts, hardened units), §9 (named firewalld services),
§10 (SELinux), and §11 (PKI/cert paths) unchanged — it only swaps how the assets get
onto the host (CI + gitea_ci + scoped sudo, instead of deploy.sh + operator sudo).
The asset/ layout from §6 is the same, minus manifest.yml. Per-service TLS for
mesh-only nginx vhosts is covered in internal-tls.md.