Compare commits
12 Commits
ecfefa6433...main
| Author | SHA1 | Date | |
|---|---|---|---|
| | 83652460ed | | |
| | c5ea03b026 | | |
| | 2bc1a08055 | | |
| | a0de8ba18c | | |
| | c644e7ba46 | | |
| | eaf2398c7a | | |
| | e9447f54f4 | | |
| | 4f66508d86 | | |
| | 4881720304 | | |
| | e67f9d7d4f | | |
| | 3261b3274c | | |
| | 9db5743531 | | |
135
generic.md
@@ -22,10 +22,10 @@ Projects are Rust cargo workspaces. The repository root contains:
 │ ├── <app>-api/ # binary: REST / JSON / WebSocket daemon
 │ ├── <app>-worker/ # binary: long-running processor / queue consumer
 │ └── <app>-cli/ # binary: operator / admin CLI
-├── web/ # Vite + React + SWC + TS frontend (when applicable)
+├── <frontend-dir>/ # Vite + React + SWC + TS frontend(s) — see §4
 ├── asset/ # deployment artifacts (see §6)
 ├── script/ # deploy.sh and related operational scripts
-└── README.md
+└── readme.md
 ```
 
 ### Crate naming
@@ -130,10 +130,19 @@ API and worker binaries are managed by systemd unit files shipped from `asset/sy
 ## 4. Frontends
 
 ### Web (default)
-Vite + React + SWC + TypeScript, in `<repo-root>/web/`:
+Vite + React + SWC + TypeScript. The frontend lives in a top-level directory named for what it *is* to the user, not a generic `web/`. Common names:
+
+- `web/` — the primary public-facing app, when there's only one frontend
+- `ui/` — equivalent, when the project prefers this naming
+- `dashboard/` — a user-facing dashboard UI
+- `admin/` — an operator/admin console distinct from the public UI
+
+A project may have more than one of these (e.g., both `dashboard/` and `admin/`). Each is an independent Vite app with its own `package.json`, built and deployed separately, typically served from its own nginx `server_name` or path prefix. Pick names that describe the audience or purpose; don't invent a generic wrapper directory to hold them.
+
+Whatever the directory is called, the internal structure is the same:
 
 ```
-web/
+<frontend-dir>/
 ├── package.json
 ├── vite.config.ts
 ├── tsconfig.json
@@ -150,6 +159,7 @@ web/
 - Build output is static. Deployed to an nginx CDN endpoint — no Node.js in production.
 - API base URL is configured at build time (Vite `import.meta.env.VITE_API_BASE_URL`) and stamped per environment during deploy.
 - Prefer React Query or equivalent for server state. Keep business logic server-side; the frontend is a rendering and interaction layer.
+- When a project has multiple frontends, they may share types via a local package (e.g., `packages/shared-types/`) or via generated TypeScript bindings from the Rust `entities` crate. Don't duplicate API clients across frontends — factor the shared bits out.
 
 ### Web (Rust framework exception)
 Use a Rust web framework (Axum + templating, or a fullstack framework) **only when** the deployment model requires a single self-contained binary with no external web server — e.g., distributed orchestration nodes that each serve their own UI over TLS. The Cichlid pattern. Default is still Vite + nginx.
@@ -167,11 +177,32 @@ Tauri. Consumes the same `<app>-api` as the web client. Shares types via the `<a
 ### Central database: Postgres
 Default for any app with a central data store.
 
+- **Default server: `magrathea.kosherinata.internal:5432`**, with `frankie.hanzalova.internal` as a streaming standby. Unless a project explicitly specifies otherwise, assume a new app uses this cluster. Postgres 18 (path: `/var/lib/pgsql/18/data/`).
 - Connection is **mTLS with passwordless auth**. Host-level client certificates issued by the internal step-ca, with cert CN → pg role mapping via `pg_ident.conf`.
-- No passwords in config files, ever. Connection strings reference cert paths.
+- No passwords in config files, ever. Connection strings reference cert paths (§11 TLS / PKI).
+
+**Granting an app access to the database:**
+
+1. Create the Postgres role(s) the app needs (e.g., `<app>_rw`, `<app>_ro`) on the **primary only** — replication carries them to the standby.
+2. Map the app host's cert CN to the Postgres role by dropping a file at `/var/lib/pgsql/18/data/pg_ident.conf.d/<app-host-fqdn>.conf` with one line per mapping:
+```
+cert_cn <app-host-fqdn> <db-username>
+```
+Multiple lines if the host connects as more than one role.
+3. Deploy the **same** ident drop-in to **both** `magrathea` and `frankie` — standbys don't replicate `pg_ident.conf` contents, and a failover to a server missing the mapping will lock the app out.
+4. On each server, reload Postgres to pick up the change (no restart needed):
+```
+sudo systemctl reload postgresql-18
+```
+5. Verify from the app host by connecting with its host cert and confirming the role resolves as expected.
+
+`deploy.sh` should handle steps 2–4 idempotently when an app is being deployed to a new host (or when a host's cert CN changes).
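Editor's sketch for steps 2 to 4 above: one way to render the ident drop-in deterministically, so that a re-run of a deploy can tell whether anything changed before rewriting the file and reloading Postgres. The map name `cert_cn`, the line format, and the drop-in directory come from the text above; the function names (`ident_dropin_path`, `render_ident_dropin`, `needs_update`) and the example host and role names are hypothetical, and the source only says that `deploy.sh` should be idempotent, not how.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Drop-in directory as given in step 2 above.
const IDENT_DROPIN_DIR: &str = "/var/lib/pgsql/18/data/pg_ident.conf.d";

/// Path of the ident drop-in for one app host (hypothetical helper).
pub fn ident_dropin_path(host_fqdn: &str) -> PathBuf {
    PathBuf::from(IDENT_DROPIN_DIR).join(format!("{host_fqdn}.conf"))
}

/// Render one `cert_cn <fqdn> <role>` line per role.
/// Roles are de-duplicated and sorted so the same input always
/// produces byte-identical output.
pub fn render_ident_dropin(host_fqdn: &str, roles: &[&str]) -> String {
    let unique: BTreeSet<&str> = roles.iter().copied().collect();
    unique
        .into_iter()
        .map(|role| format!("cert_cn {host_fqdn} {role}\n"))
        .collect()
}

/// True when the file on a server differs from the desired content,
/// i.e. when it must be written and Postgres reloaded.
pub fn needs_update(existing: Option<&str>, desired: &str) -> bool {
    existing != Some(desired)
}
```

Because the rendered text is stable, the same body can be compared against what is already on `magrathea` and on `frankie`, and the reload in step 4 can be skipped on a server where `needs_update` returns `false`.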
 - Migrations via `sqlx-cli` or `refinery`; migration files live in `crates/<app>-data/migrations/`.
+- **Migrations are sequentially versioned and immutable once committed.** File naming follows the tool's convention (`V0001__init.sql`, `V0002__add_users.sql`, … for refinery; `0001_init.sql`, `0002_add_users.sql`, … for sqlx). Each new schema change lands as a **new** file with the next sequence number — **never** edit a migration that has already been committed, even if it hasn't been deployed yet, because checksums diverge and the migration runner will refuse to start (or worse, leave production out of sync with dev).
 - Schema changes are forward-only in production. Destructive migrations require a dedicated maintenance window and an explicit plan.
+- If you catch a bug in a recently-added migration *before* it's been merged or deployed anywhere, amending is fine — but the moment it's landed on `main` or run against any database, treat it as frozen and write a follow-up migration to correct the mistake.
 - Use `sqlx` with compile-time query checking (`sqlx prepare`) and commit the generated `.sqlx/` offline query cache so CI builds don't need a live database.
+- **Agentic contributors working in a project with a Postgres dependency will usually have MCP access to a Postgres MCP server scoped to that project's database(s).** Prefer using the MCP server to inspect schema, verify query shapes against real tables, and sanity-check migrations before applying them — don't guess at column names or types when you can look them up. The scope is limited to the project's own databases; don't assume access to unrelated ones.
 
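Editor's sketch for the sequential-numbering rule above: a small check, suitable for a test or a pre-commit hook, that the migration file names form an unbroken sequence and that reports the next free number. It assumes numbering starts at 1 and accepts only the two name shapes quoted above; `migration_version` and `next_version` are hypothetical names. It checks numbering only and does not detect edits to an already-committed file, which is what the runner's checksums are for.

```rust
/// Extract the leading sequence number from a migration file name.
/// Accepts the two shapes named above: `V0001__init.sql` (refinery)
/// and `0001_init.sql` (sqlx).
pub fn migration_version(file_name: &str) -> Option<u32> {
    let rest = file_name.strip_prefix('V').unwrap_or(file_name);
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    // The digits must be present and followed by an underscore.
    if digits.is_empty() || !rest[digits.len()..].starts_with('_') {
        return None;
    }
    digits.parse().ok()
}

/// Check that versions run 1, 2, 3, ... with no gaps or duplicates.
/// Returns the next free version, or a description of the first problem.
pub fn next_version(file_names: &[&str]) -> Result<u32, String> {
    let mut versions = Vec::with_capacity(file_names.len());
    for name in file_names {
        let version = migration_version(name)
            .ok_or_else(|| format!("unrecognised migration name: {name}"))?;
        versions.push(version);
    }
    versions.sort_unstable();
    for (expected, actual) in (1u32..).zip(versions.iter().copied()) {
        if actual != expected {
            return Err(format!("expected version {expected}, found {actual}"));
        }
    }
    Ok(versions.len() as u32 + 1)
}
```

A duplicate number shows up as a mismatch at the following position, so both gaps and collisions between two branches are reported by the same comparison.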
 ### Distributed database: Turso
 When the app's data model is distributed (edge replicas, per-site local copies with sync), use Turso. Auth via Turso-issued tokens stored in the per-host secret store, not in `manifest.yml`.
@@ -372,21 +403,24 @@ For each component with a firewalld service definition:
 
 1. `rsync` the XML to `/etc/firewalld/services/<app>-<component>.xml` on the target.
 2. `firewall-cmd --reload` to pick up the new definition.
-3. Check if the service is already enabled in the target zone (default zone unless the manifest specifies otherwise):
+3. Resolve the host's default zone (`firewall-cmd --get-default-zone`) and check if the service is already enabled there:
 ```
-firewall-cmd --zone=<zone> --query-service=<app>-<component>
+zone=$(firewall-cmd --get-default-zone)
+firewall-cmd --zone=$zone --query-service=<app>-<component>
 ```
 4. If not, enable it persistently **and** in the runtime config:
 ```
-firewall-cmd --permanent --zone=<zone> --add-service=<app>-<component>
-firewall-cmd --zone=<zone> --add-service=<app>-<component>
+firewall-cmd --permanent --zone=$zone --add-service=<app>-<component>
+firewall-cmd --zone=$zone --add-service=<app>-<component>
 ```
 5. On component removal (future concern), the reverse: `--remove-service` then delete the XML.
 
 Steps must be idempotent — re-running a deploy is a no-op on the firewall layer if the service is already installed and enabled.
 
 ### Zone selection
-Most services bind to internal WireGuard interfaces. Put the WireGuard interface in a dedicated `internal` or `wg` zone and open services there. Public-facing services (rare — nginx is usually the only one) go in the default `public`/`FedoraServer` zone. The manifest may optionally specify a `zone:` per component; default to `internal` if unset.
+The infrastructure uses **only the default zone** created at OS install time — `FedoraServer` on servers, `FedoraWorkstation` on workstations. There are no custom zones (no `internal`, no `wg`), and `deploy.sh` should not create any. Always add services to whatever `firewall-cmd --get-default-zone` reports on the target host.
+
+If a future need arises to segment traffic by interface (e.g., restricting a component to the WireGuard interface only), revisit this section before introducing custom zoning — don't add it silently.
 
 ### Port ranges, ICMP, sources
 If a service needs port ranges, ICMP types, or source-IP restrictions, put them in the same XML using firewalld's standard elements (`<port port="x-y" />`, `<source address="..."/>`). Don't split these across multiple named services.
@@ -446,11 +480,46 @@ This is the environment these apps deploy into. Claude Code should assume it.
 - Internal DNS split-horizon via `.internal` domains (`hanzalova.internal`, `kosherinata.internal`, etc.).
 
 ### TLS / PKI
-- Internal PKI via Smallstep `step-ca` at `ca.internal`.
-- Host certs renewed via systemd timers.
-- mTLS everywhere internal services talk to each other.
+- Internal PKI via Smallstep `step-ca` at `https://ca.internal`.
+- Every host runs `step.service` (the Smallstep renewer) which keeps the host's cert fresh. **Certs are issued with a 24-hour expiry** and renewed continuously — services must tolerate cert rotation, not assume certs are stable for the life of the process.
+- **mTLS everywhere** internal services talk to each other.
 - **Quantum-safe** SSH (sntrup761x25519 KEX) and TLS (X25519MLKEM768 where peers support it) are the default. External peers that don't support PQ fall back to classical curves — document the fallback explicitly in nginx config.
 
+**Standard cert paths on every host:**
+
+| Path | Contents | Mode |
+| --- | --- | --- |
+| `/etc/pki/ca-trust/source/anchors/root-internal.pem` | Internal root CA bundle | world-readable |
+| `/etc/pki/tls/misc/$(hostname -f).pem` | Host cert (public) | world-readable |
+| `/etc/pki/tls/private/$(hostname -f).pem` | Host private key | ACL grants read to service-account users |
+
+Application code and systemd units should reference these paths directly — they're the same on every host, so config templates don't need to bake in a hostname. The key file is not world-readable; each app's service account is granted read access via `setfacl` (e.g., `setfacl -m u:<app>:r /etc/pki/tls/private/$(hostname -f).pem`) as part of deploy. This happens in `deploy.sh` alongside the `systemd-sysusers` step (§8).
+
+**Reacting to cert rotation:**
+
+Services that hold cert state in memory (most Rust daemons using `rustls` or `openssl`) must reload when the host cert changes. Ship a pair of systemd units alongside the service unit:
+
+```ini
+# /etc/systemd/system/<app>-api-cert.path
+[Path]
+PathChanged=/etc/pki/tls/misc/<hostname>.pem
+Unit=<app>-api-cert-reload.service
+
+[Install]
+WantedBy=multi-user.target
+```
+
+```ini
+# /etc/systemd/system/<app>-api-cert-reload.service
+[Service]
+Type=oneshot
+ExecStart=/bin/systemctl reload <app>-api.service
+```
+
+The service unit itself needs an `ExecReload=` that causes the daemon to re-read its certs without dropping in-flight requests (typically `SIGHUP` handling in the Rust binary). If the daemon can't reload gracefully, `ExecStart=/bin/systemctl restart <app>-api.service` is the fallback — but prefer graceful reload.
+
+Ship these `.path` and cert-reload `.service` units from `asset/systemd/` the same way as the main unit.
+
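Editor's sketch of the daemon side of the reload described above: a holder that re-reads the cert and key from disk and swaps them in as one unit, so connections that already took a snapshot keep using the old pair while new ones see the fresh pair. It uses only the standard library. `CertPair` and `CertStore` are hypothetical names, and wiring `reload()` to `SIGHUP` and handing the PEM bytes to `rustls` or `openssl` are left out because they depend on crates the source does not name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

/// PEM bytes for the host cert and key, always replaced together.
pub struct CertPair {
    pub cert_pem: Vec<u8>,
    pub key_pem: Vec<u8>,
}

/// Holds the current pair behind a lock so a reload can swap it in one step.
pub struct CertStore {
    cert_path: PathBuf,
    key_path: PathBuf,
    current: RwLock<Arc<CertPair>>,
}

impl CertStore {
    /// Read both files once at startup.
    pub fn load(cert_path: PathBuf, key_path: PathBuf) -> io::Result<Self> {
        let pair = Self::read_pair(&cert_path, &key_path)?;
        Ok(Self { cert_path, key_path, current: RwLock::new(Arc::new(pair)) })
    }

    fn read_pair(cert: &Path, key: &Path) -> io::Result<CertPair> {
        Ok(CertPair { cert_pem: fs::read(cert)?, key_pem: fs::read(key)? })
    }

    /// Snapshot for a new connection. Holders of an older snapshot are
    /// unaffected by later reloads. Panics if the lock was poisoned.
    pub fn current(&self) -> Arc<CertPair> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Intended to be called from the reload signal handler.
    /// If either file cannot be read, the previous pair stays in place.
    pub fn reload(&self) -> io::Result<()> {
        let fresh = Self::read_pair(&self.cert_path, &self.key_path)?;
        *self.current.write().unwrap() = Arc::new(fresh);
        Ok(())
    }
}
```

Reading both files before taking the write lock keeps the lock hold time short and means a half-written renewal leaves the running pair untouched.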
 ### Ingress
 - Per-site nginx reverse proxy terminates all WAN inbound 443.
 - Public DNS via Cloudflare, **unproxied by default** (CF's mTLS origin-pull has been unreliable). Revisit if/when that changes.
@@ -463,6 +532,26 @@ This is the environment these apps deploy into. Claude Code should assume it.
 - SELinux enforcing per §10.
 - Podman quadlets for containerised workloads; bare-metal systemd units for native Rust binaries (preferred where feasible).
 
+### GPU / inference
+Three bare-metal GPU hosts run [`mistral.rs`](https://github.com/EricLBuehler/mistral.rs) serving an OpenAI-compatible API on port `1234`:
+
+| Host | GPU(s) |
+| --- | --- |
+| `beast.hanzalova.internal:1234` | 2× RTX 5090 |
+| `benjy.hanzalova.internal:1234` | 1× RTX 4090 |
+| `quadbrat.hanzalova.internal:1234` | 1× RTX 3060 |
+
+- **No TLS, no auth.** The endpoints accept any bearer token (including a dummy one — most clients still require a non-empty token field). They are reachable only via the WireGuard mesh and protected at the network layer.
+- Model availability and capacity differ per host. Each host loads a different set depending on VRAM, and the set changes over time. Consumers must discover what's loaded by querying `/v1/models` on each endpoint rather than hard-coding model names to hosts.
+- **Planned: unified proxy at `https://cortex.internal:443`.** [`cortex`](https://git.lair.cafe/helexa/cortex) is an in-progress project that will load, evict, and route models across the three backends and expose a single TLS-terminated endpoint. Until it ships as functional, inference consumers must talk to the three backends directly and handle discovery/routing themselves.
+- When `cortex` lands, consumers should point at `https://cortex.internal:443` and drop the direct-backend logic. Until then, a simple strategy is: query `/v1/models` on all three hosts, pick the host that has the requested model loaded (prefer larger GPUs first for throughput), and fall back through the list on errors.
+
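Editor's sketch of the interim routing strategy in the last bullet above, reduced to its pure logic: given what each backend reported from `/v1/models`, keep the backends that have the requested model, in the caller's preference order, then try them one by one. The HTTP calls are deliberately left to the caller, since the source names no client library. `Backend`, `candidates`, and `first_ok` are hypothetical names, and plain `http://` base URLs are an assumption derived from the "No TLS" note.

```rust
/// One backend plus the model ids its `/v1/models` endpoint reported.
pub struct Backend<'a> {
    pub base_url: &'a str,
    pub loaded_models: Vec<&'a str>,
}

/// Base URLs of the backends that have `model` loaded.
/// Order is preserved, so list larger GPUs first to prefer them.
pub fn candidates<'a>(backends: &[Backend<'a>], model: &str) -> Vec<&'a str> {
    backends
        .iter()
        .filter(|backend| backend.loaded_models.iter().any(|m| *m == model))
        .map(|backend| backend.base_url)
        .collect()
}

/// Try each URL in order and return the first success.
/// Returns the last error if every attempt failed, or `None` if `urls` is empty.
pub fn first_ok<T, E>(
    urls: &[&str],
    mut call: impl FnMut(&str) -> Result<T, E>,
) -> Option<Result<T, E>> {
    let mut last = None;
    for &url in urls {
        match call(url) {
            Ok(value) => return Some(Ok(value)),
            Err(error) => last = Some(Err(error)),
        }
    }
    last
}
```

Keeping discovery results as plain data makes the fallback order easy to test without a network, and the whole module can be deleted once consumers point at `cortex`.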
+### Source hosting
+- **New projects are hosted on the self-hosted Gitea instance** at `git.lair.cafe` (or `git.internal` on the WireGuard mesh — both resolve to the same instance). Agentic contributors will usually have MCP access to this Gitea and should prefer it over any public forge when creating repos, issues, or PRs.
+- **Legacy projects** live under various GitHub / GitLab orgs tied to my public username (`grenade`). These will continue to exist but are being migrated to Gitea over time, especially when they come up for a refactor.
+- **When a project has been relocated**, the original public repo should carry a prominent notice at the top of its `readme.md` (or a GitHub archival notice) pointing to the new Gitea URL. If you're working in a repo that looks stale or superseded, check for such a notice before assuming it's still the canonical location.
+- Default to `git.lair.cafe` / `git.internal` for new scaffolds. Only push a new project to GitHub/GitLab if there's a specific reason (OSS visibility, CI integration that only the public forge offers, etc.) — and note the reason in the project `readme.md`.
+
 ---
 
 ## 12. Code Quality and Tooling
@@ -490,8 +579,18 @@ This is the environment these apps deploy into. Claude Code should assume it.
 
 ### Documentation
 - Every public item in library crates has a doc comment.
-- Each crate has a `README.md` or top-level module doc explaining its role in the workspace.
-- The repo `README.md` covers: what the project does, how to build, how to run locally, how to deploy. Point readers to this document for architectural conventions.
+- Each crate has a `readme.md` or top-level module doc explaining its role in the workspace.
+- The repo `readme.md` covers: what the project does, how to build, how to run locally, how to deploy. Point readers to this document for architectural conventions.
+- **Name readme files `readme.md` (lowercase), not `README.md`.** The shouty all-caps spelling is a convention I don't share; filenames aren't where emphasis belongs. Every forge in use (Gitea, GitHub, GitLab) renders `readme.md` as the repo landing page just as readily as `README.md`. Other conventional top-level docs — `license`, `changelog`, `contributing` — follow the same rule: lowercase, no shouting.
+- **Exception: `CLAUDE.md` and `AGENTS.md` stay in uppercase.** These are agent-facing instruction files and are easy to miss in a file listing when lowercased. The all-caps spelling is the established convention and the one that tooling (Claude Code and other agent harnesses) looks for, so leave them as-is.
+- **Agents may modify `CLAUDE.md` and `AGENTS.md` at their own discretion** — no approval needed to add, update, or remove guidance when it's warranted. Diffs get reviewed, so unintentional drift will surface in the normal flow. Treat these as living instructions that should be kept accurate and current.
+
+### Commits
+- **Use [Conventional Commits](https://www.conventionalcommits.org/) syntax for every commit.** `type(scope): subject`, with types drawn from the standard set (`feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `build`, `ci`, `perf`, `style`). Scope is the crate, component, or area touched. Subject is imperative and under ~70 characters. A body may follow if the *why* isn't self-evident.
+- **Agentic contributors may commit without asking**, provided the change is a coherent, complete unit of work — the feature works, the bug is fixed, the refactor is finished. No approval prompt is needed for good commits that end a thread of work.
+- **Don't declare victory prematurely.** If there's a realistic chance that follow-up commits on the same topic will be needed to finish the job (because the implementation is speculative, the tests haven't been run, or edge cases haven't been considered), stop and think before committing. A stream of sequential commits all fixing up the same incomplete attempt pollutes history and is more annoying than an approval prompt.
+- **When in doubt, consolidate before committing** rather than landing half-done work and patching it afterwards. One commit that resolves the task cleanly beats five commits that thrash around getting there.
+- Never `--amend` a pushed commit, never `--no-verify`, and never bypass pre-commit hooks to get a commit in. If a hook fails, fix the underlying issue.
 
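Editor's sketch of a header check for the commit syntax above, of the kind a commit-msg hook might run. The type list is the one quoted in the first bullet. Three things are assumptions rather than stated rules: the scope is treated as optional, a trailing `!` before the colon is tolerated, and "under ~70 characters" is enforced as a hard limit of 70 on the subject. `check_header` is a hypothetical name, and the imperative mood of the subject is not checked.

```rust
/// Commit types listed in the Commits section above.
const TYPES: [&str; 10] = [
    "feat", "fix", "docs", "refactor", "test", "chore", "build", "ci", "perf", "style",
];

/// Assumed hard limit for the "under ~70 characters" guidance.
const MAX_SUBJECT_CHARS: usize = 70;

/// Validate the first line of a commit message: `type(scope): subject`.
pub fn check_header(header: &str) -> Result<(), String> {
    let (prefix, subject) = header
        .split_once(": ")
        .ok_or("missing `: ` between prefix and subject")?;
    // Tolerate a breaking-change marker such as `feat(api)!: ...`.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => {
            let scope = rest.strip_suffix(')').ok_or("unclosed scope parenthesis")?;
            (kind, Some(scope))
        }
        None => (prefix, None),
    };
    if !TYPES.contains(&kind) {
        return Err(format!("unknown type `{kind}`"));
    }
    if scope == Some("") {
        return Err("empty scope".to_string());
    }
    if subject.trim().is_empty() {
        return Err("empty subject".to_string());
    }
    if subject.chars().count() > MAX_SUBJECT_CHARS {
        return Err(format!("subject longer than {MAX_SUBJECT_CHARS} characters"));
    }
    Ok(())
}
```

For instance, `check_header("feat(api): add health endpoint")` passes, while `check_header("feature(api): add health endpoint")` is rejected for its type.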
 ---
 
@@ -506,9 +605,11 @@ When scaffolding or extending a project:
 5. Any new deployable component gets an entry in `asset/manifest.yml`, a systemd unit in `asset/systemd/`, a sysusers drop-in, a firewalld service XML, and any required SELinux assets — in the same change.
 6. Config templates go in `asset/config/` with `{{PLACEHOLDER}}` secrets. Never commit a rendered config.
 7. Postgres connections are mTLS, passwordless. If writing connection code that accepts a password, stop and ask.
-8. Frontend is Vite + React + SWC + TS, served as static assets from nginx. Rust web frameworks require a stated reason.
+8. Frontends are Vite + React + SWC + TS, served as static assets from nginx. Name the directory after its audience (`web/`, `ui/`, `dashboard/`, `admin/`) — `web/` is not a mandated convention. Rust web frameworks require a stated reason.
 9. Services run as dedicated non-root users with hardened systemd units per §8. Root requires explicit justification.
 10. Every listening port gets a named firewalld service per §9. No bare `--add-port` calls.
 11. SELinux stays enforcing. Work with the default policy first; ship a custom module only when necessary (§10). Never suggest `setenforce 0`.
 12. Prefer fewer dependencies. Prefer bare-metal systemd over containers unless there's a reason.
-13. When unsure, ask — these preferences are defaults, not mandates, but deviations should be deliberate.
+13. Commit in Conventional Commits syntax. Commit autonomously when the work is done; hold off when follow-ups on the same topic are likely (§12 Commits).
+14. Default new repos to `git.lair.cafe` / `git.internal` (self-hosted Gitea). Public forges only with a stated reason (§11 Source hosting).
+15. When unsure, ask — these preferences are defaults, not mandates, but deviations should be deliberate.
27
readme.md
Normal file
@@ -0,0 +1,27 @@
+# architecture
+
+Living documentation for the conventions and scaffolding defaults I use across every project I maintain. If you're contributing to one of those projects — as a human or as an AI coding agent — this repo is required reading.
+
+## What this is
+
+A single place where decisions about workspace layout, deployment, infrastructure, service hardening, firewall rules, SELinux posture, and similar cross-cutting concerns are written down once and reused everywhere. Rather than re-deriving (or forgetting) the same defaults in every repo, each project points here and inherits them.
+
+The goal is boring consistency: the same crate layout, the same deploy flow, the same systemd hardening, the same firewalld approach across every app I own, so that context switching between projects doesn't mean re-learning the shape of things.
+
+## What's here
+
+- **`generic.md`** — the baseline. Applies to every project unless that project explicitly overrides a section. Covers workspace layout, separation of concerns, configuration, secrets, deployment, service accounts, firewalld, SELinux, and code quality.
+
+More files will appear here over time as guidance that's more specific than `generic.md` gets extracted — per-stack, per-deployment-target, or per-problem-domain documents. When a project needs guidance that isn't generic, it belongs in a new file here, not buried in one project's repo.
+
+## How to use it
+
+- **If you're scaffolding a new project:** start from `generic.md` and follow it. Deviations should be deliberate and noted in that project's own README.
+- **If you're contributing to an existing project of mine:** read `generic.md` first. The project's local `CLAUDE.md` or `README.md` will note any intentional deviations; everything else defaults to what's here.
+- **If you're an AI agent:** treat this repo's contents as authoritative defaults for any project under my control. When the surrounding project doesn't specify, fall back to the guidance here. When it does specify, the project wins — but flag the deviation so it's visible.
+
+## How this evolves
+
+This is living documentation, not a spec frozen at a point in time. When a convention changes — because something broke, because a better pattern emerged, or because the infrastructure itself changed — the update lands here first, and projects catch up on their next touch.
+
+If you find guidance here that contradicts what's actually running in production, the guidance is wrong. Open an issue or a PR.