Compare commits


9 Commits

Author SHA1 Message Date
83652460ed docs(generic): document GPU inference hosts and planned cortex proxy
Add the three mistral.rs backends (beast, benjy, quadbrat) with their GPU
capacity and the port 1234 / no-auth / no-TLS contract. Note that consumers
must currently discover model availability per-host via /v1/models, and
that cortex (git.lair.cafe/helexa/cortex) will eventually unify them
behind https://cortex.internal:443.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:25:59 +03:00
c5ea03b026 docs(generic): document default Postgres cluster and cert-CN mapping flow
Call out magrathea (primary) / frankie (standby) as the default Postgres
cluster and document the concrete steps to grant an app access: create
roles on the primary, drop a pg_ident.conf.d file on both servers, and
reload postgresql-18. The both-servers detail is easy to miss, and missing it
locks the app out after a failover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:13:17 +03:00
2bc1a08055 docs(generic): document TLS cert paths, rotation cadence, and reload pattern
Expand §11 TLS/PKI with the concrete host cert paths, file modes, and the
ACL-for-service-accounts pattern. Document the 24h cert expiry and the
continuous step.service renewal so implementations don't assume certs are
stable. Add the standard systemd .path/.service reload pair for services
that need to re-read certs without restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 13:38:42 +03:00
a0de8ba18c docs(generic): keep CLAUDE.md/AGENTS.md uppercase, allow autonomous edits
Carve out the agent-instruction files as exceptions to the lowercase-readme
convention — their all-caps naming is what tooling expects and what makes
them visible in a file listing. Also document that agents can modify these
files on their own judgement; diffs get reviewed so drift is caught
downstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:54:32 +03:00
c644e7ba46 docs: adopt lowercase readme.md convention
Add guidance in generic.md §12 that readme files (and other conventional
top-level docs: license, changelog, contributing) should be named in
lowercase, not shouty all-caps. Update all README.md references in
generic.md and rename this repo's own README.md to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:40:30 +03:00
eaf2398c7a docs(generic): document migration immutability and sequential versioning
Migrations are sequentially numbered and frozen once committed. Editing an
already-landed migration causes checksum divergence and migration-runner
failures at deploy time — new changes must go in new files. Call this out
explicitly so contributors don't quietly break a service by "fixing" a
prior migration in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:36:52 +03:00
e9447f54f4 docs(generic): note Postgres MCP server availability for agentic contributors
Projects with a Postgres dependency typically expose an MCP server scoped
to their database(s). Call this out so agents know to verify schema and
query shapes against the real database rather than guessing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:34:37 +03:00
4f66508d86 docs(generic): document Gitea (git.lair.cafe) as default source host
Note that new projects default to the self-hosted Gitea instance at
git.lair.cafe (git.internal on the WireGuard mesh), that legacy projects
on GitHub/GitLab are being migrated as they come up for refactor, and
that relocated repos should carry a prominent pointer to the new URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:32:10 +03:00
4881720304 docs(generic): clarify frontend directory naming is not fixed to "web/"
The "web/" folder name in §4 was being read as a required convention, but
projects routinely use ui/, dashboard/, or admin/ instead — and may have
more than one frontend in the same repo. Document the common names, note
that each frontend is an independent Vite app, and add guidance on sharing
types across multiple frontends.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:23:47 +03:00
2 changed files with 102 additions and 12 deletions


@@ -22,10 +22,10 @@ Projects are Rust cargo workspaces. The repository root contains:
 │ ├── <app>-api/ # binary: REST / JSON / WebSocket daemon
 │ ├── <app>-worker/ # binary: long-running processor / queue consumer
 │ └── <app>-cli/ # binary: operator / admin CLI
-├── web/ # Vite + React + SWC + TS frontend (when applicable)
+├── <frontend-dir>/ # Vite + React + SWC + TS frontend(s) — see §4
 ├── asset/ # deployment artifacts (see §6)
 ├── script/ # deploy.sh and related operational scripts
-└── README.md
+└── readme.md
 ```
 ### Crate naming
@@ -130,10 +130,19 @@ API and worker binaries are managed by systemd unit files shipped from `asset/sy
 ## 4. Frontends
 ### Web (default)
-Vite + React + SWC + TypeScript, in `<repo-root>/web/`:
+Vite + React + SWC + TypeScript. The frontend lives in a top-level directory named for what it *is* to the user, not a generic `web/`. Common names:
+- `web/` — the primary public-facing app, when there's only one frontend
+- `ui/` — equivalent, when the project prefers this naming
+- `dashboard/` — a user-facing dashboard UI
+- `admin/` — an operator/admin console distinct from the public UI
+A project may have more than one of these (e.g., both `dashboard/` and `admin/`). Each is an independent Vite app with its own `package.json`, built and deployed separately, typically served from its own nginx `server_name` or path prefix. Pick names that describe the audience or purpose; don't invent a generic wrapper directory to hold them.
+Whatever the directory is called, the internal structure is the same:
 ```
-web/
+<frontend-dir>/
 ├── package.json
 ├── vite.config.ts
 ├── tsconfig.json
@@ -150,6 +159,7 @@ web/
 - Build output is static. Deployed to an nginx CDN endpoint — no Node.js in production.
 - API base URL is configured at build time (Vite `import.meta.env.VITE_API_BASE_URL`) and stamped per environment during deploy.
 - Prefer React Query or equivalent for server state. Keep business logic server-side; the frontend is a rendering and interaction layer.
+- When a project has multiple frontends, they may share types via a local package (e.g., `packages/shared-types/`) or via generated TypeScript bindings from the Rust `entities` crate. Don't duplicate API clients across frontends — factor the shared bits out.
 ### Web (Rust framework exception)
 Use a Rust web framework (Axum + templating, or a fullstack framework) **only when** the deployment model requires a single self-contained binary with no external web server — e.g., distributed orchestration nodes that each serve their own UI over TLS. The Cichlid pattern. Default is still Vite + nginx.
@@ -167,11 +177,32 @@ Tauri. Consumes the same `<app>-api` as the web client. Shares types via the `<a
 ### Central database: Postgres
 Default for any app with a central data store.
+- **Default server: `magrathea.kosherinata.internal:5432`**, with `frankie.hanzalova.internal` as a streaming standby. Unless a project explicitly specifies otherwise, assume a new app uses this cluster. Postgres 18 (path: `/var/lib/pgsql/18/data/`).
 - Connection is **mTLS with passwordless auth**. Host-level client certificates issued by the internal step-ca, with cert CN → pg role mapping via `pg_ident.conf`.
-- No passwords in config files, ever. Connection strings reference cert paths.
+- No passwords in config files, ever. Connection strings reference cert paths (§11 TLS / PKI).
+**Granting an app access to the database:**
+1. Create the Postgres role(s) the app needs (e.g., `<app>_rw`, `<app>_ro`) on the **primary only** — replication carries them to the standby.
+2. Map the app host's cert CN to the Postgres role by dropping a file at `/var/lib/pgsql/18/data/pg_ident.conf.d/<app-host-fqdn>.conf` with one line per mapping:
+```
+cert_cn <app-host-fqdn> <db-username>
+```
+Multiple lines if the host connects as more than one role.
+3. Deploy the **same** ident drop-in to **both** `magrathea` and `frankie` — standbys don't replicate `pg_ident.conf` contents, and a failover to a server missing the mapping will lock the app out.
+4. On each server, reload Postgres to pick up the change (no restart needed):
+```
+sudo systemctl reload postgresql-18
+```
+5. Verify from the app host by connecting with its host cert and confirming the role resolves as expected.
+`deploy.sh` should handle steps 2–4 idempotently when an app is being deployed to a new host (or when a host's cert CN changes).
 - Migrations via `sqlx-cli` or `refinery`; migration files live in `crates/<app>-data/migrations/`.
+- **Migrations are sequentially versioned and immutable once committed.** File naming follows the tool's convention (`V0001__init.sql`, `V0002__add_users.sql`, … for refinery; `0001_init.sql`, `0002_add_users.sql`, … for sqlx). Each new schema change lands as a **new** file with the next sequence number — **never** edit a migration that has already landed on `main`, even if it hasn't been deployed yet, because checksums diverge and the migration runner will refuse to start (or worse, leave production out of sync with dev).
 - Schema changes are forward-only in production. Destructive migrations require a dedicated maintenance window and an explicit plan.
+- If you catch a bug in a recently-added migration *before* it's been merged or deployed anywhere, amending is fine — but the moment it's landed on `main` or run against any database, treat it as frozen and write a follow-up migration to correct the mistake.
 - Use `sqlx` with compile-time query checking (`cargo sqlx prepare`) and commit the generated `.sqlx/` offline query cache so CI builds don't need a live database.
+- **Agentic contributors working in a project with a Postgres dependency will usually have access to a Postgres MCP server scoped to that project's database(s).** Prefer using the MCP server to inspect schema, verify query shapes against real tables, and sanity-check migrations before applying them — don't guess at column names or types when you can look them up. The scope is limited to the project's own databases; don't assume access to unrelated ones.
 ### Distributed database: Turso
 When the app's data model is distributed (edge replicas, per-site local copies with sync), use Turso. Auth via Turso-issued tokens stored in the per-host secret store, not in `manifest.yml`.
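The database-access steps and migration rules in the hunk above lend themselves to small deploy-time checks. The following is a hypothetical, std-only sketch; none of these function names come from the source. `render_ident_dropin` builds the drop-in in the `cert_cn <app-host-fqdn> <db-username>` format, `write_if_changed` writes it only when the contents differ (so the step can run on every deploy and a `postgresql-18` reload is only needed when it returns `true`), and `check_sequential` verifies that refinery-style `V0001__name.sql` files form a gap-free, duplicate-free sequence. It assumes refinery naming only; sqlx's `0001_name.sql` form would need a different parser. It also does not detect in-place edits, which the migration runners already catch through checksum divergence.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: render a `pg_ident.conf.d/<app-host-fqdn>.conf` drop-in
/// with one `cert_cn <app-host-fqdn> <db-username>` line per role.
fn render_ident_dropin(host_fqdn: &str, roles: &[&str]) -> String {
    roles
        .iter()
        .map(|role| format!("cert_cn {host_fqdn} {role}\n"))
        .collect()
}

/// Hypothetical helper: write `contents` to `path` only if it differs from what
/// is already there. Returns `Ok(true)` when the file was created or changed,
/// which is the only case where a `systemctl reload postgresql-18` is needed.
fn write_if_changed(path: &Path, contents: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        // Already up to date: nothing to write, nothing to reload.
        Ok(existing) if existing == contents => Ok(false),
        // Stale contents: overwrite.
        Ok(_) => fs::write(path, contents).map(|()| true),
        // First deploy to this server: create it.
        Err(e) if e.kind() == io::ErrorKind::NotFound => fs::write(path, contents).map(|()| true),
        // Permission problems and the like are real errors.
        Err(e) => Err(e),
    }
}

/// Hypothetical helper: extract the version from a refinery-style file name
/// such as `V0002__add_users.sql`. Returns `None` for anything else.
fn refinery_version(file_name: &str) -> Option<u32> {
    let rest = file_name.strip_prefix('V')?;
    let (digits, _description) = rest.split_once("__")?;
    digits.parse().ok()
}

/// Hypothetical helper: check that versions run 1, 2, 3, ... with no gaps and
/// no duplicates, regardless of the order the names are listed in.
fn check_sequential(file_names: &[&str]) -> Result<(), String> {
    let mut versions = Vec::with_capacity(file_names.len());
    for name in file_names {
        let v = refinery_version(name)
            .ok_or_else(|| format!("unrecognised migration file name: {name}"))?;
        versions.push(v);
    }
    versions.sort_unstable();
    for (i, v) in versions.iter().enumerate() {
        let expected = i as u32 + 1;
        if *v != expected {
            return Err(format!("expected version {expected}, found {v}"));
        }
    }
    Ok(())
}
```

On a real host the drop-in would be written to `/var/lib/pgsql/18/data/pg_ident.conf.d/<app-host-fqdn>.conf` on both `magrathea` and `frankie`, with the reload issued per server only when `write_if_changed` reports a change.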
@@ -449,11 +480,46 @@ This is the environment these apps deploy into. Claude Code should assume it.
 - Internal DNS split-horizon via `.internal` domains (`hanzalova.internal`, `kosherinata.internal`, etc.).
 ### TLS / PKI
-- Internal PKI via Smallstep `step-ca` at `ca.internal`.
-- Host certs renewed via systemd timers.
-- mTLS everywhere internal services talk to each other.
+- Internal PKI via Smallstep `step-ca` at `https://ca.internal`.
+- Every host runs `step.service` (the Smallstep renewer) which keeps the host's cert fresh. **Certs are issued with a 24-hour expiry** and renewed continuously — services must tolerate cert rotation, not assume certs are stable for the life of the process.
+- **mTLS everywhere** internal services talk to each other.
 - **Quantum-safe** SSH (sntrup761x25519 KEX) and TLS (X25519MLKEM768 where peers support it) are the default. External peers that don't support PQ fall back to classical curves — document the fallback explicitly in nginx config.
+**Standard cert paths on every host:**
+| Path | Contents | Mode |
+| --- | --- | --- |
+| `/etc/pki/ca-trust/source/anchors/root-internal.pem` | Internal root CA bundle | world-readable |
+| `/etc/pki/tls/misc/$(hostname -f).pem` | Host cert (public) | world-readable |
+| `/etc/pki/tls/private/$(hostname -f).pem` | Host private key | ACL grants read to service-account users |
+Application code and systemd units should reference these paths directly — they're the same on every host, so config templates don't need to bake in a hostname. The key file is not world-readable; each app's service account is granted read access via `setfacl` (e.g., `setfacl -m u:<app>:r /etc/pki/tls/private/$(hostname -f).pem`) as part of deploy. This happens in `deploy.sh` alongside the `systemd-sysusers` step (§8).
+**Reacting to cert rotation:**
+Services that hold cert state in memory (most Rust daemons using `rustls` or `openssl`) must reload when the host cert changes. Ship a pair of systemd units alongside the service unit:
+```ini
+# /etc/systemd/system/<app>-api-cert.path
+[Path]
+PathChanged=/etc/pki/tls/misc/<host-fqdn>.pem
+Unit=<app>-api-cert-reload.service
+[Install]
+WantedBy=multi-user.target
+```
+```ini
+# /etc/systemd/system/<app>-api-cert-reload.service
+[Service]
+Type=oneshot
+ExecStart=/bin/systemctl reload <app>-api.service
+```
+The service unit itself needs an `ExecReload=` that causes the daemon to re-read its certs without dropping in-flight requests (typically `SIGHUP` handling in the Rust binary). If the daemon can't reload gracefully, `ExecStart=/bin/systemctl restart <app>-api.service` in the cert-reload unit is the fallback — but prefer graceful reload.
+Ship these `.path` and cert-reload `.service` units from `asset/systemd/` the same way as the main unit.
 ### Ingress
 - Per-site nginx reverse proxy terminates all WAN inbound 443.
 - Public DNS via Cloudflare, **unproxied by default** (CF's mTLS origin-pull has been unreliable). Revisit if/when that changes.
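A Rust daemon handling the reload described in the hunk above needs somewhere to keep the current cert and key, and a cheap way to tell whether a rotation actually changed them. The following is a minimal std-only sketch under stated assumptions: `CertPaths` and `LoadedCert` are hypothetical names, the signal handling that would call `reload()` on `SIGHUP` is left out (it needs a crate such as `signal-hook` or an async runtime's signal support), and the caller still has to turn the PEM bytes into a `rustls` or `openssl` config. Because `step.service` may replace the cert and the key at slightly different moments, a real implementation should also confirm the new key matches the new cert before swapping.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical: the standard per-host cert locations from the table above.
struct CertPaths {
    cert: PathBuf,
    key: PathBuf,
}

impl CertPaths {
    /// Build the paths from the host's FQDN (what `hostname -f` prints).
    fn for_host(fqdn: &str) -> Self {
        CertPaths {
            cert: PathBuf::from(format!("/etc/pki/tls/misc/{fqdn}.pem")),
            key: PathBuf::from(format!("/etc/pki/tls/private/{fqdn}.pem")),
        }
    }
}

/// Hypothetical: the PEM bytes the daemon is currently serving with.
struct LoadedCert {
    paths: CertPaths,
    cert_pem: Vec<u8>,
    key_pem: Vec<u8>,
}

impl LoadedCert {
    /// Initial load at startup; fails if either file is unreadable.
    fn load(paths: CertPaths) -> io::Result<Self> {
        let cert_pem = fs::read(&paths.cert)?;
        let key_pem = fs::read(&paths.key)?;
        Ok(LoadedCert { paths, cert_pem, key_pem })
    }

    /// Intended to be called from the reload path (e.g. a SIGHUP handler).
    /// Re-reads both files and returns `true` if either changed, so the caller
    /// knows to rebuild its TLS config. On a read error the old pair is kept.
    fn reload(&mut self) -> io::Result<bool> {
        let cert_pem = fs::read(&self.paths.cert)?;
        let key_pem = fs::read(&self.paths.key)?;
        let changed = cert_pem != self.cert_pem || key_pem != self.key_pem;
        if changed {
            self.cert_pem = cert_pem;
            self.key_pem = key_pem;
        }
        Ok(changed)
    }
}
```

Returning `false` for an unchanged pair lets the daemon skip rebuilding its TLS acceptor when the `.path` unit fires for a write that did not alter the contents.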
@@ -466,6 +532,26 @@ This is the environment these apps deploy into. Claude Code should assume it.
 - SELinux enforcing per §10.
 - Podman quadlets for containerised workloads; bare-metal systemd units for native Rust binaries (preferred where feasible).
+### GPU / inference
+Three bare-metal GPU hosts run [`mistral.rs`](https://github.com/EricLBuehler/mistral.rs) serving an OpenAI-compatible API on port `1234`:
+| Host | GPU(s) |
+| --- | --- |
+| `beast.hanzalova.internal:1234` | 2× RTX 5090 |
+| `benjy.hanzalova.internal:1234` | 1× RTX 4090 |
+| `quadbrat.hanzalova.internal:1234` | 1× RTX 3060 |
+- **No TLS, no auth.** The endpoints accept any bearer token (including a dummy one — most clients still require a non-empty token field). They are reachable only via the WireGuard mesh and protected at the network layer.
+- Model availability and capacity differ per host. Each host loads a different set depending on VRAM, and the set changes over time. Consumers must discover what's loaded by querying `/v1/models` on each endpoint rather than hard-coding model names to hosts.
+- **Planned: unified proxy at `https://cortex.internal:443`.** [`cortex`](https://git.lair.cafe/helexa/cortex) is an in-progress project that will load, evict, and route models across the three backends and expose a single TLS-terminated endpoint. Until it is functional, inference consumers must talk to the three backends directly and handle discovery/routing themselves.
+- When `cortex` lands, consumers should point at `https://cortex.internal:443` and drop the direct-backend logic. Until then, a simple strategy is: query `/v1/models` on all three hosts, pick the host that has the requested model loaded (prefer larger GPUs first for throughput), and fall back through the list on errors.
+### Source hosting
+- **New projects are hosted on the self-hosted Gitea instance** at `git.lair.cafe` (or `git.internal` on the WireGuard mesh — both resolve to the same instance). Agentic contributors will usually have MCP access to this Gitea and should prefer it over any public forge when creating repos, issues, or PRs.
+- **Legacy projects** live under various GitHub / GitLab orgs tied to my public username (`grenade`). These will continue to exist but are being migrated to Gitea over time, especially when they come up for a refactor.
+- **When a project has been relocated**, the original public repo should carry a prominent notice at the top of its `readme.md` (or a GitHub archival notice) pointing to the new Gitea URL. If you're working in a repo that looks stale or superseded, check for such a notice before assuming it's still the canonical location.
+- Default to `git.lair.cafe` / `git.internal` for new scaffolds. Only push a new project to GitHub/GitLab if there's a specific reason (OSS visibility, CI integration that only the public forge offers, etc.) — and note the reason in the project `readme.md`.
 ---
 ## 12. Code Quality and Tooling
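Until `cortex` ships, the interim strategy in the GPU / inference hunk above (query `/v1/models` everywhere, prefer larger GPUs, fall through on errors) reduces to an ordering problem. This is a sketch only: `models_url` and `candidates` are hypothetical helpers, the HTTP requests and the JSON parsing of each `/v1/models` response are assumed to happen elsewhere (for example with an HTTP client and `serde_json`), and the per-host model IDs are passed in already parsed. The host order mirrors the GPU table, largest GPU first.

```rust
use std::collections::HashMap;

/// The three backends from the table, ordered largest GPU first.
const BACKENDS: [&str; 3] = [
    "beast.hanzalova.internal:1234",
    "benjy.hanzalova.internal:1234",
    "quadbrat.hanzalova.internal:1234",
];

/// Discovery endpoint for one backend. Plain HTTP: the backends have no TLS.
fn models_url(backend: &str) -> String {
    format!("http://{backend}/v1/models")
}

/// Given the model IDs each backend reported from `/v1/models`, return the
/// backends currently serving `model`, in preference order. Backends that
/// could not be queried are simply absent from `loaded`.
fn candidates(loaded: &HashMap<&str, Vec<String>>, model: &str) -> Vec<&'static str> {
    BACKENDS
        .iter()
        .copied()
        .filter(|b| {
            loaded
                .get(b)
                .map_or(false, |models| models.iter().any(|m| m == model))
        })
        .collect()
}
```

A caller would try each returned host in turn (sending any non-empty dummy bearer token) and re-run discovery when every candidate fails, since the loaded set changes over time.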
@@ -493,8 +579,11 @@ This is the environment these apps deploy into. Claude Code should assume it.
 ### Documentation
 - Every public item in library crates has a doc comment.
-- Each crate has a `README.md` or top-level module doc explaining its role in the workspace.
-- The repo `README.md` covers: what the project does, how to build, how to run locally, how to deploy. Point readers to this document for architectural conventions.
+- Each crate has a `readme.md` or top-level module doc explaining its role in the workspace.
+- The repo `readme.md` covers: what the project does, how to build, how to run locally, how to deploy. Point readers to this document for architectural conventions.
+- **Name readme files `readme.md` (lowercase), not `README.md`.** The shouty all-caps spelling is a convention I don't share; filenames aren't where emphasis belongs. Every forge in use (Gitea, GitHub, GitLab) renders `readme.md` as the repo landing page just as readily as `README.md`. Other conventional top-level docs — `license`, `changelog`, `contributing` — follow the same rule: lowercase, no shouting.
+- **Exception: `CLAUDE.md` and `AGENTS.md` stay in uppercase.** These are agent-facing instruction files and are easy to miss in a file listing when lowercased. The all-caps spelling is the established convention and the one that tooling (Claude Code and other agent harnesses) looks for, so leave them as-is.
+- **Agents may modify `CLAUDE.md` and `AGENTS.md` at their own discretion** — no approval needed to add, update, or remove guidance when it's warranted. Diffs get reviewed, so unintentional drift will surface in the normal flow. Treat these as living instructions that should be kept accurate and current.
 ### Commits
 - **Use [Conventional Commits](https://www.conventionalcommits.org/) syntax for every commit.** `type(scope): subject`, with types drawn from the standard set (`feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `build`, `ci`, `perf`, `style`). Scope is the crate, component, or area touched. Subject is imperative and under ~70 characters. A body may follow if the *why* isn't self-evident.
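The lowercase-readme rule and its `CLAUDE.md` / `AGENTS.md` exception in the Documentation hunk above are mechanical enough to check in CI. A hypothetical sketch: `suggest_rename` returns the expected name for a top-level file that breaks the convention, or `None` when the name is fine. Treating only `readme`, `license`, `changelog`, and `contributing` as conventional docs, and comparing on the part of the name before the first `.`, are assumptions drawn from the list in the hunk.

```rust
/// Conventional top-level docs that should be lowercase (assumed list).
const LOWERCASE_DOCS: [&str; 4] = ["readme", "license", "changelog", "contributing"];

/// Agent instruction files that keep their uppercase spelling.
const UPPERCASE_EXCEPTIONS: [&str; 2] = ["CLAUDE.md", "AGENTS.md"];

/// Hypothetical lint: return the name a file *should* have, or `None` if it
/// already follows the convention (or isn't a conventional doc at all).
fn suggest_rename(file_name: &str) -> Option<String> {
    // Agent files: any casing other than the exact uppercase form is flagged.
    if let Some(expected) = UPPERCASE_EXCEPTIONS
        .iter()
        .find(|e| e.eq_ignore_ascii_case(file_name))
    {
        return if *expected == file_name { None } else { Some(expected.to_string()) };
    }
    // Conventional docs: compare the stem case-insensitively, then require
    // the whole name (extension included) to be lowercase.
    let stem = file_name.split('.').next().unwrap_or(file_name);
    let is_conventional_doc = LOWERCASE_DOCS.iter().any(|d| d.eq_ignore_ascii_case(stem));
    let lowered = file_name.to_lowercase();
    if is_conventional_doc && lowered != file_name {
        Some(lowered)
    } else {
        None
    }
}
```

Running this over the repo root's file names and failing on any `Some(_)` would catch a stray `README.md` without touching `CLAUDE.md`.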
@@ -516,10 +605,11 @@ When scaffolding or extending a project:
 5. Any new deployable component gets an entry in `asset/manifest.yml`, a systemd unit in `asset/systemd/`, a sysusers drop-in, a firewalld service XML, and any required SELinux assets — in the same change.
 6. Config templates go in `asset/config/` with `{{PLACEHOLDER}}` secrets. Never commit a rendered config.
 7. Postgres connections are mTLS, passwordless. If writing connection code that accepts a password, stop and ask.
-8. Frontend is Vite + React + SWC + TS, served as static assets from nginx. Rust web frameworks require a stated reason.
+8. Frontends are Vite + React + SWC + TS, served as static assets from nginx. Name the directory after its audience (`web/`, `ui/`, `dashboard/`, `admin/`) — `web/` is not a mandated convention. Rust web frameworks require a stated reason.
 9. Services run as dedicated non-root users with hardened systemd units per §8. Root requires explicit justification.
 10. Every listening port gets a named firewalld service per §9. No bare `--add-port` calls.
 11. SELinux stays enforcing. Work with the default policy first; ship a custom module only when necessary (§10). Never suggest `setenforce 0`.
 12. Prefer fewer dependencies. Prefer bare-metal systemd over containers unless there's a reason.
 13. Commit in Conventional Commits syntax. Commit autonomously when the work is done; hold off when follow-ups on the same topic are likely (§12 Commits).
-14. When unsure, ask — these preferences are defaults, not mandates, but deviations should be deliberate.
+14. Default new repos to `git.lair.cafe` / `git.internal` (self-hosted Gitea). Public forges only with a stated reason (§11 Source hosting).
+15. When unsure, ask — these preferences are defaults, not mandates, but deviations should be deliberate.
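Checklist item 13 and §12 Commits ask for Conventional Commits headers, and every commit on this page follows that shape. A minimal, hypothetical validator for the `type(scope): subject` line might look like the sketch below. It accepts the ten types listed in §12, treats the scope as optional, allows the specification's `!` breaking-change marker, and enforces a hard 70-character subject limit as a stand-in for the looser "under ~70 characters" guidance. It does not check imperative mood or the body.

```rust
/// Commit types listed in §12 Commits.
const TYPES: [&str; 10] = [
    "feat", "fix", "docs", "refactor", "test", "chore", "build", "ci", "perf", "style",
];

/// Assumed hard cutoff approximating "under ~70 characters".
const MAX_SUBJECT_CHARS: usize = 70;

/// A parsed `type(scope)!: subject` header.
#[derive(Debug, PartialEq)]
struct Header<'a> {
    kind: &'a str,
    scope: Option<&'a str>,
    breaking: bool,
    subject: &'a str,
}

/// Hypothetical validator for the first line of a commit message.
fn parse_header(line: &str) -> Result<Header<'_>, String> {
    let (prefix, subject) = line
        .split_once(": ")
        .ok_or("missing `: ` between type and subject")?;
    // A trailing `!` before the colon marks a breaking change.
    let (prefix, breaking) = match prefix.strip_suffix('!') {
        Some(p) => (p, true),
        None => (prefix, false),
    };
    // Optional `(scope)` directly after the type.
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => {
            let scope = rest.strip_suffix(')').ok_or("unclosed scope")?;
            if scope.is_empty() {
                return Err("empty scope".to_string());
            }
            (kind, Some(scope))
        }
        None => (prefix, None),
    };
    if !TYPES.contains(&kind) {
        return Err(format!("unknown type `{kind}`"));
    }
    if subject.trim().is_empty() {
        return Err("empty subject".to_string());
    }
    if subject.chars().count() > MAX_SUBJECT_CHARS {
        return Err(format!("subject longer than {MAX_SUBJECT_CHARS} characters"));
    }
    Ok(Header { kind, scope, breaking, subject })
}
```

Wired into a `commit-msg` hook, a check like this would reject a header such as `update stuff` while accepting `docs(generic): document GPU inference hosts and planned cortex proxy`.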