feat(deploy): manifest-driven config, teardown + db-perms, hardening

deploy.sh:
- never rsync into /; stage to /tmp on the remote and install at final
  paths via sudo bash heredoc, closing the parent-dir attribute leak
  that broke three hosts in the earlier rsync incident
- shell-quote heredoc args via ${var@Q}
- drop -A -X on the remaining (web) rsyncs
- generic worker.secrets loop reads (env-var → pass path) from manifest;
  GITEA_TOKEN now flows through automatically
- in-memory bash substitution for templates (secrets never on argv)
- simplify semanage port labelling: --add 2>/dev/null || --modify (the
  old grep pre-check matched only the first listed port)
- restorecon back to short flags (Fedora policycoreutils has no long
  forms; --recursive errored at deploy time)
- quieter health probe loop: curl diagnostics only on final failure

manifest as source of truth:
- api.config.bind drives BIND_ADDR, firewalld port, semanage label,
  health-probe URL
- web.config.{server_name,root,api_upstream} drives nginx render,
  rsync targets, restorecon scope
- nginx config renamed to site.conf.tmpl; firewalld svc to
  moments-api.xml.tmpl; both rendered at deploy time
- topology flip: api → nikola, worker → frootmig (anjie freed)

new scripts:
- script/teardown.sh: idempotent component teardown, never rsyncs,
  shared-state cleanup gated on absence of remaining env files,
  --remove-docroot guard against shallow / system paths
- script/db-perms.sh: rewritten — fixes grep/append role mismatch that
  appended duplicates on re-run, adds postgres reload, hits primary +
  standby in a single invocation

readme: genericized; deployment topology no longer carries real host
or site names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-04 16:39:10 +03:00
parent f30f949895
commit 8867ff5df3
9 changed files with 643 additions and 181 deletions

View File

@@ -1,6 +1,6 @@
# moments
Personal activity timeline for [rob.tn](https://rob.tn). Polls public sources (GitHub, Gitea, hg-edge.mozilla.org, bugzilla.mozilla.org), stores raw payloads in Postgres, and serves a reshaped timeline to a static React frontend.
Personal activity timeline. Polls public sources (GitHub, Gitea, Mercurial, Bugzilla), stores raw payloads in Postgres, and serves a reshaped timeline to a static React frontend.
Successor to the now-defunct [grenade-events-react](https://github.com/grenade/grenade-events-react), which depended on MongoDB Stitch (retired by MongoDB in September 2022).
@@ -28,7 +28,7 @@ cargo run -p moments-api # serves on 127.0.0.1:8080
cargo run -p moments-worker # one-shot ingest tick (until --interval is wired up)
```
The API expects a Postgres reachable at `DATABASE_URL`. For magrathea, that's an mTLS connection using the host cert. For local dev against a throwaway database:
The API expects a Postgres reachable at `DATABASE_URL`. In production this is an mTLS connection using the host cert. For local dev against a throwaway database:
```sh
DATABASE_URL=postgres://localhost/moments cargo run -p moments-api
@@ -39,37 +39,31 @@ Migrations live in `crates/moments-data/migrations/` and run automatically on wo
## Deployment
```sh
./script/deploy.sh prod all # api + worker + web
./script/deploy.sh prod api worker # subset
./script/deploy.sh prod default # api + web only (worker untouched)
./script/deploy.sh prod all --dry-run
./script/deploy.sh <env> all # api + worker + web
./script/deploy.sh <env> api worker # subset
./script/deploy.sh <env> default # api + web only (worker untouched)
./script/deploy.sh <env> all --dry-run
```
Topology:
Concrete hosts, ports, and the site's `server_name` live in `asset/manifest.yml`. The shape of the deployment:
| Component | Host | Notes |
| --------- | --------------------------------- | ------------------------------------------------------------------ |
| api | `anjie.kosherinata.internal` | binds `0.0.0.0:42424`; firewalld service `moments-api` |
| worker | `anjie.kosherinata.internal` | no listening port; pollers only |
| web | `oolon.kosherinata.internal` | per-site nginx ingress for rob.tn; `/api/*` → anjie across the WG |
| db | `magrathea.kosherinata.internal` | postgres mTLS, passwordless |
| Component | Notes |
| --------- | ---------------------------------------------------------------------- |
| api | binds the port from `api.config.bind`; firewalld service `moments-api` |
| worker | no listening port; pollers only |
| web | per-site nginx ingress; `/api/*` reverse-proxies to the api host |
| db | postgres mTLS, passwordless |
api and worker are co-located on `anjie` while `nikola` and `frootmig` are out for drive replacement.
Postgres roles `moments_rw` and `moments_ro` must exist on the primary, with `pg_ident.conf.d/<host>.conf` mapping the api host's FQDN → `moments_ro` and the worker host's FQDN → `moments_rw`. See `asset/sql/bootstrap-moments.sql`, `asset/postgres/ident.conf.tmpl`, and `script/db-perms.sh` (idempotently adds the cert_cn lines on the postgres primary + standby and reloads postgres).
Postgres roles `moments_rw` and `moments_ro` must exist on the primary, with `pg_ident.conf` mapping `anjie.kosherinata.internal` to **both** roles (one cert_cn line per mapping). See `asset/sql/bootstrap-moments.sql` and `asset/postgres/ident.conf.tmpl`.
Inter-host traffic over the WG mesh: web's nginx connects to the api host in plaintext. The mesh provides the encryption layer; per-hop TLS for an internal HTTP read-only API on already-public data is deferred. If that changes, swap the api binary to rustls + the host cert pair, and update the nginx upstream to `https://`.
Inter-host traffic over the WG mesh: oolon's nginx connects to `http://anjie.kosherinata.internal:42424` in plaintext. The mesh provides the encryption layer; per-hop TLS for an internal HTTP read-only API on already-public data is deferred. If that changes, swap the api binary to rustls + the host cert pair, and update the nginx upstream to `https://`.
Secrets are resolved at deploy time via `pass`. The mapping of env-var name → pass-store path lives under `worker.secrets` in `manifest.yml`; `deploy.sh` iterates the map, fetches each secret, and substitutes the matching `{{NAME}}` placeholder in `worker.env.tmpl`. To add a secret: add a `worker.secrets` entry, add `NAME={{NAME}}` to `worker.env.tmpl`, and ensure `pass show <path>` returns the value on the deploying machine.
Secrets resolved by `deploy.sh` via `pass`:
### First-time setup
- `github.com/grenade/admin-token` — GitHub PAT for events + search APIs (worker only).
After the first successful prod deploy:
Optional, set if needed in `worker.env`: `GITEA_TOKEN`, `BUGZILLA_API_KEY`.
### DNS cutover
`rob.tn` currently resolves to GitHub Pages. After the first successful prod deploy:
1. Update Cloudflare DNS for `rob.tn` to the WAN IP that fronts `oolon` (unproxied — see architecture doc §11).
2. Confirm `curl -fsS https://rob.tn/api/v1/healthz` returns `ok`.
3. Add an archival notice to the top of [grenade-events-react/readme.md](https://github.com/grenade/grenade-events-react) pointing at this repo, and archive the GitHub repo.
1. Point public DNS for the site at the web host's public IP (unproxied).
2. Confirm `curl --fail --silent --show-error https://<site>/api/v1/healthz` returns `ok`.
3. If migrating from a predecessor, archive the old repo with a pointer to this one.