# Project Architecture Preferences

Baseline architectural conventions for new projects. Claude Code should follow these defaults when scaffolding, implementing, or refactoring unless the project explicitly overrides them. When in doubt, ask before deviating.

These preferences are opinionated and have evolved from running real multi-site infrastructure. They optimise for: local-first operation, reproducible deployments, minimal release-time churn, and keeping secrets out of source control.

---

## 1. Workspace Layout

Projects are Rust Cargo workspaces. The repository root contains:

```
<repo-root>/
├── Cargo.toml              # workspace manifest (workspace-level deps + version)
├── crates/                 # all Rust crates live here
│   ├── <app>-entities/     # domain types, DTOs, error enums — no I/O
│   ├── <app>-core/         # business logic, pure where practical
│   ├── <app>-data/         # data access: postgres, turso, filesystem, etc.
│   ├── <app>-crypto/       # shared cryptography (only if needed across bins)
│   ├── <app>-os-utils/     # shared OS/process/path helpers (only if needed)
│   ├── <app>-api/          # binary: REST / JSON / WebSocket daemon
│   ├── <app>-worker/       # binary: long-running processor / queue consumer
│   └── <app>-cli/          # binary: operator / admin CLI
├── web/                    # Vite + React + SWC + TS frontend (when applicable)
├── asset/                  # deployment artifacts (see §6)
├── script/                 # deploy.sh and related operational scripts
└── README.md
```

### Crate naming
Use `<app>-<role>` throughout. The `<app>` prefix makes grep, cargo output, and systemd unit naming unambiguous across a multi-project host.

### Separation of concerns (strict)
- **entities** — types only. No I/O, no async runtime deps. Serde, thiserror, chrono/time are fine. Everything downstream depends on this.
- **core** — business logic. Consumes entities. May define traits for data access (ports) that the `data` crate implements (adapters). No direct DB or network calls.
- **data** — implements the traits defined in `core`. Owns all sqlx/turso/reqwest usage relevant to persistence and external services.
- **binaries** — thin. Wire up config, logging, signal handling, and the appropriate core/data stack. Binaries should contain no business logic that could live in a library crate.
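
As a rough illustration of the split above, the sketch below compresses the three library layers into one file. All names (`Site`, `SiteRepo`, `display_name`, `InMemorySiteRepo`) are hypothetical, and the in-memory adapter only stands in for a real `data` implementation backed by a database.

```rust
use std::collections::HashMap;

// --- would live in <app>-entities: plain types, no I/O ---
#[derive(Debug, Clone, PartialEq)]
pub struct Site {
    pub id: u32,
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub enum RepoError {
    NotFound(u32),
}

// --- would live in <app>-core: logic plus the port (trait) it needs ---
pub trait SiteRepo {
    fn find(&self, id: u32) -> Result<Site, RepoError>;
}

/// Hypothetical business rule: display names are upper-cased.
/// Note that core only sees the trait, never a concrete database type.
pub fn display_name(repo: &dyn SiteRepo, id: u32) -> Result<String, RepoError> {
    Ok(repo.find(id)?.name.to_uppercase())
}

// --- would live in <app>-data: an adapter implementing the port ---
// A real adapter would wrap a connection pool; a map stands in for it here.
pub struct InMemorySiteRepo {
    pub rows: HashMap<u32, Site>,
}

impl SiteRepo for InMemorySiteRepo {
    fn find(&self, id: u32) -> Result<Site, RepoError> {
        self.rows.get(&id).cloned().ok_or(RepoError::NotFound(id))
    }
}
```

A binary would then construct the adapter and hand it to `core`; swapping the adapter for a test double requires no change to the business logic.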

### Shared utility crates
Only create `<app>-crypto`, `<app>-os-utils`, etc. when genuinely shared by two or more binaries. Premature extraction is worse than inlining — extract when the second consumer appears.

---

## 2. Cargo Workspace Conventions

### Workspace-level versioning
Every crate in the workspace shares a single version, defined once in the root `Cargo.toml`:

```toml
[workspace.package]
version = "0.1.0"
edition = "2024"
rust-version = "1.85"
license = "GPL-3.0-or-later"   # adjust per project
authors = ["Rob Thijssen <rob@example>"]

[workspace]
resolver = "3"
members = ["crates/*"]
```

Each crate's `Cargo.toml` inherits:

```toml
[package]
name = "<app>-entities"
version.workspace = true
edition.workspace = true
rust-version.workspace = true
license.workspace = true
authors.workspace = true
```

**Rationale:** release tagging scripts only need to rewrite one version string. CI stamps and changelogs stay consistent across artifacts from the same tag.

### Workspace-level dependencies
Declare every external dependency once under `[workspace.dependencies]` in the root manifest. Crates reference them via `dep.workspace = true`. This prevents version drift between crates in the same workspace.

```toml
[workspace.dependencies]
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
sqlx = { version = "0.8", default-features = false, features = ["postgres", "runtime-tokio-rustls", "macros", "migrate"] }
thiserror = "2"
tracing = "0.1"
# ...
```

### Internal crate dependencies
Internal crates depend on each other via path + version:

```toml
<app>-entities = { path = "../<app>-entities", version = "=0.1.0" }
```

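Use `=` pinning on the internal version to guarantee in-workspace coherence after publishing or vendoring. Because the pin repeats the version outside `[workspace.package]`, either have the release script rewrite these pins together with the workspace version, or declare each internal crate once under `[workspace.dependencies]` so the pin lives only in the root manifest.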

### Edition and toolchain
- Rust edition **2024**.
- Commit a `rust-toolchain.toml` at the repo root pinning the stable channel to match CI.

---

## 3. Binaries and Runtime

### Daemons run under systemd
API and worker binaries are managed by systemd unit files shipped from `asset/systemd/`. Binaries should:

- Log to stdout/stderr using `tracing` with structured JSON output when `JOURNAL_STREAM` is set (journald will ingest cleanly).
- Read config from `/etc/<app>/config.toml` by default, overridable via `--config` and env vars (figment-style layering: file → env → CLI).
- Handle `SIGTERM` gracefully: stop accepting new work, drain in-flight tasks with a bounded timeout, exit 0.
- Never daemonise themselves. systemd owns the lifecycle.
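- Expose a health endpoint (for api) or emit periodic heartbeat logs (for workers) so monitoring and the deploy-time health check (§7) can assess liveness. systemd itself reads neither signal: units that use `Type=notify` (§8) must send `READY=1` via `sd_notify` once ready, and systemd-level liveness checking requires `WatchdogSec=` keepalives.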
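
The file → env → CLI layering mentioned above can be sketched without any config crate. This is a minimal model of the precedence rule only, not of figment's API; the `Layer` struct and its two fields are hypothetical.

```rust
/// One configuration layer. `None` means "this layer did not set the value".
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Layer {
    pub bind: Option<String>,
    pub log_level: Option<String>,
}

impl Layer {
    /// Overlay `higher` on top of `self`: any value set in `higher` wins,
    /// otherwise the value from `self` is kept.
    pub fn overlay(self, higher: Layer) -> Layer {
        Layer {
            bind: higher.bind.or(self.bind),
            log_level: higher.log_level.or(self.log_level),
        }
    }
}

/// Resolve the three layers, lowest precedence first: file, then env, then CLI.
pub fn resolve(file: Layer, env: Layer, cli: Layer) -> Layer {
    file.overlay(env).overlay(cli)
}
```

With a file that sets both values, an env layer that sets only `log_level`, and a CLI layer that sets only `bind`, the result takes `bind` from the CLI and `log_level` from the environment.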

### API crate
- Axum is the default web framework unless there's a reason otherwise.
- Serves REST + JSON over TCP, with WebSocket upgrades where streaming is needed.
- TLS terminates at the site nginx reverse proxy (see §11) unless the binary itself is the ingress (e.g., Cichlid-style self-serving nodes), in which case use rustls with post-quantum-capable key exchange groups (X25519MLKEM768 where peers support it).
- API surface versioned under `/v1/` from day one.
- Request/response types live in `<app>-entities` so clients (including the desktop app) can depend on them.

### Worker binaries
- Long-running processors (ingestion, indexing, queue consumers) use Postgres `FOR UPDATE SKIP LOCKED` for work-claiming where a central DB is already in play.
- Idempotent by design — crashes and restarts should never double-process.
- Back off with jitter on transient failures; after repeated failure, move the job to a dead-letter queue (DLQ) instead of retrying indefinitely.
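
One way to express the retry rule above is capped exponential backoff with "full jitter". The sketch below is an assumption about shape, not a prescribed policy: the `Backoff` and `Next` types are hypothetical, and the random sample is passed in by the caller so the function stays deterministic and testable.

```rust
use std::time::Duration;

/// Retry policy. Concrete values are up to the project.
pub struct Backoff {
    pub base: Duration,
    pub cap: Duration,
    /// Failure count at which the job is dead-lettered.
    pub max_attempts: u32,
}

#[derive(Debug, PartialEq)]
pub enum Next {
    RetryAfter(Duration),
    DeadLetter,
}

impl Backoff {
    /// `attempt` is 1 for the first failure. `jitter` is a uniform random
    /// sample in [0, 1] supplied by the caller.
    pub fn next(&self, attempt: u32, jitter: f64) -> Next {
        if attempt >= self.max_attempts {
            return Next::DeadLetter;
        }
        // Exponential ceiling: base * 2^(attempt - 1), capped at `cap`.
        let exp = attempt.saturating_sub(1).min(31);
        let ceiling = self.base.saturating_mul(1u32 << exp).min(self.cap);
        // Full jitter: sleep a random fraction of the ceiling.
        let j = if jitter.is_finite() { jitter.clamp(0.0, 1.0) } else { 0.0 };
        let nanos = (ceiling.as_nanos() as f64 * j) as u64;
        Next::RetryAfter(Duration::from_nanos(nanos))
    }
}
```

Jitter spreads retries from many workers so they do not hit the database in lockstep after a shared outage.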

---

## 4. Frontends

### Web (default)
Vite + React + SWC + TypeScript, in `<repo-root>/web/`:

```
web/
├── package.json
├── vite.config.ts
├── tsconfig.json
├── index.html
└── src/
    ├── main.tsx
    ├── App.tsx
    ├── api/          # generated or hand-written client for the <app>-api
    ├── components/
    ├── routes/
    └── lib/
```

- Build output is static. Deployed to an nginx CDN endpoint — no Node.js in production.
- API base URL is configured at build time (Vite `import.meta.env.VITE_API_BASE_URL`) and stamped per environment during deploy.
- Prefer React Query or equivalent for server state. Keep business logic server-side; the frontend is a rendering and interaction layer.

### Web (Rust framework exception)
Use a Rust web framework (Axum + templating, or a fullstack framework) **only when** the deployment model requires a single self-contained binary with no external web server, e.g., distributed orchestration nodes that each serve their own UI over TLS. This is the Cichlid pattern. The default is still Vite + nginx.

### Desktop
Tauri. Consumes the same `<app>-api` as the web client. Shares types via the `<app>-entities` crate (exposed to the Tauri frontend via generated TypeScript bindings — `ts-rs` or `specta`).

### Mobile
**Preferences TBD.** When a project targets mobile, the goal is a framework that consumes the existing backend API (keeping business logic server-side) and produces responsive native-quality UIs for both Android and iOS from a shared codebase. Revisit this section once there's real-world experience to draw on.

---

## 5. Data

### Central database: Postgres
Default for any app with a central data store.

- Connection is **mTLS with passwordless auth**. Host-level client certificates are issued by the internal step-ca; `pg_hba.conf` uses the `cert` auth method, and a `pg_ident.conf` map translates the certificate CN to a pg role.
- No passwords in config files, ever. Connection strings reference cert paths.
- Migrations via `sqlx-cli` or `refinery`; migration files live in `crates/<app>-data/migrations/`.
- Schema changes are forward-only in production. Destructive migrations require a dedicated maintenance window and an explicit plan.
- Use `sqlx` with compile-time query checking (`cargo sqlx prepare`, provided by `sqlx-cli`) and commit the generated `.sqlx/` offline query cache so CI builds don't need a live database.

### Distributed database: Turso
When the app's data model is distributed (edge replicas, per-site local copies with sync), use Turso. Auth via Turso-issued tokens stored in the per-host secret store, not in `manifest.yml`.

### Caching / ephemeral state
Prefer in-process (moka, quick-cache) over introducing Redis. Only add Redis when multiple processes genuinely need to share ephemeral state and Postgres LISTEN/NOTIFY won't do.

---

## 6. Deployment Assets (`asset/`)

`asset/` is the single source of truth for what gets deployed where. **No secrets in this directory, ever** — it's in source control.

```
asset/
├── manifest.yml            # environments → components → hosts
├── systemd/
│   ├── <app>-api.service
│   ├── <app>-api.socket        # if socket-activated
│   ├── <app>-worker.service
│   ├── <app>-indexer.timer
│   ├── <app>-indexer.service
│   └── <app>.sysusers.conf     # systemd-sysusers drop-in
├── firewalld/
│   ├── <app>-api.xml           # named firewalld service per §9
│   └── <app>-worker.xml
├── selinux/                    # only if custom policy is required
│   ├── <app>.te
│   └── <app>.fc
├── nginx/
│   └── <app>.<site>.conf   # per server_name configs
├── config/
│   ├── config.toml.tmpl    # templated; {{SECRET_NAME}} placeholders
│   └── ...
└── sql/
    └── bootstrap.sql       # idempotent role/db creation
```

### `manifest.yml` structure

```yaml
app: <app>
environments:
  prod:
    components:
      api:
        hosts: [oolon.hanzalova.internal]
        config:
          bind: 127.0.0.1:8080
          log_level: info
      worker:
        hosts: [gramathea.kosherinata.internal, oolon.hanzalova.internal]
        config:
          concurrency: 4
      web:
        hosts: [cdn.hanzalova.internal]
        root: /var/www/<app>
  dev:
    components:
      api:
        hosts: [quadbrat.hanzalova.internal]
        config:
          bind: 127.0.0.1:8080
          log_level: debug
      # ...
```

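Top-level keys: `app`, `environments`. Each environment defines `components`; each component defines `hosts` (one or many) plus its non-secret settings, usually under `config` (a static component such as `web` may carry keys like `root` instead). A component may also set an optional `zone` (§9). Secret references are placeholders resolved by `deploy.sh` at deploy time.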

### Templated config
Config file templates use a simple `{{VAR_NAME}}` syntax. `deploy.sh` substitutes values from the host's `pass` store (or equivalent) into the template before shipping to the target. The unrendered template is committed; the rendered file never is.
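
The substitution rule is simple enough to pin down in a few lines. `deploy.sh` itself is bash; the sketch below models the same `{{VAR_NAME}}` semantics in Rust purely for illustration. The `render` function, the `RenderError` type, and the choice to fail on a missing or unclosed placeholder are assumptions, not documented behaviour of the script.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum RenderError {
    /// A `{{NAME}}` placeholder had no value in the secret map.
    Missing(String),
    /// A `{{` was opened but never closed.
    Unclosed,
}

/// Replace every `{{NAME}}` in `template` with the matching value from `secrets`.
/// Fails instead of shipping a half-rendered config.
pub fn render(template: &str, secrets: &HashMap<&str, &str>) -> Result<String, RenderError> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find("}}").ok_or(RenderError::Unclosed)?;
        let name = after[..end].trim();
        let value = secrets
            .get(name)
            .ok_or_else(|| RenderError::Missing(name.to_string()))?;
        out.push_str(value);
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Treating an unresolved placeholder as a hard error fits the "quiet on success, loud on failure" requirement in §7: a config with a literal `{{...}}` left in it should never reach a host.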

---

## 7. Deployment Script (`script/deploy.sh`)

A bash script with a stable CLI:

```
./script/deploy.sh <environment> [component...]

./script/deploy.sh prod api worker
./script/deploy.sh dev all
./script/deploy.sh prod default
```

### Contract
- First positional arg is the environment name, matched against `manifest.yml` `environments.*`.
- Subsequent args name components, or `all` (every component in the environment), or `default` (a project-defined sensible subset — often everything except disruptive migrations).
- Parses `manifest.yml` with `yq`.
- For each `(component, host)` pair:
  1. Build the artifact locally (`cargo build --release`, `vite build`, etc.) if not already built for the current commit.
  2. Resolve secrets from `pass` (or the project's configured secret backend) and render config templates.
  3. `rsync` binary + rendered config + systemd units + sysusers drop-in + firewalld XML + any SELinux assets to the target host over ssh (quantum-safe key exchange).
  4. On the target, in this order:
     a. `systemd-sysusers` to create the service account if missing (§8).
     b. Create/chown `/etc/<app>`, `/var/lib/<app>` with correct modes.
     c. `restorecon -R` on installed paths; apply `semanage` changes and load any policy module (§10).
     d. Install firewalld service XML, `firewall-cmd --reload`, ensure the service is enabled in the appropriate zone, persistently and at runtime (§9).
     e. `systemctl daemon-reload` and restart the unit(s).
  5. Verify health (HTTP probe for api, `systemctl is-active` for all).
- Exit non-zero on any failure. Report per-host status at the end.
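
The component-selection part of the contract can be stated precisely as a small function. The script stays bash; this Rust model only fixes the semantics. Two points are assumptions rather than part of the contract above: an empty component list is treated as `default`, and duplicates are dropped while preserving first-seen order. `select_components` and `SelectError` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum SelectError {
    /// The argument is neither `all`, `default`, nor a declared component.
    Unknown(String),
}

/// `declared`: components listed under the chosen environment in manifest.yml.
/// `default_set`: the project-defined subset used for `default`.
pub fn select_components(
    args: &[&str],
    declared: &[&str],
    default_set: &[&str],
) -> Result<Vec<String>, SelectError> {
    // Assumption: no component arguments means `default`.
    let fallback = ["default"];
    let args: &[&str] = if args.is_empty() { &fallback } else { args };

    let mut out: Vec<String> = Vec::new();
    for arg in args {
        let expanded: Vec<&str> = match *arg {
            "all" => declared.to_vec(),
            "default" => default_set.to_vec(),
            name if declared.contains(&name) => vec![name],
            other => return Err(SelectError::Unknown(other.to_string())),
        };
        for component in expanded {
            // Keep first-seen order, skip duplicates.
            if !out.iter().any(|seen| seen.as_str() == component) {
                out.push(component.to_string());
            }
        }
    }
    Ok(out)
}
```

Rejecting unknown names before any build or rsync step keeps a typo from turning into a partial deploy.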

### Requirements
- Idempotent. Running twice with no changes is a no-op beyond file copies.
- Quiet on success, loud on failure.
- Supports `--dry-run` to print what would happen.
- Never writes secrets to disk on the build host outside of the rendered template being rsynced.

---

## 8. Deployment: Service Accounts

All targets run **Fedora Server (current stable, 43 at time of writing)**. Claude Code should assume this and target the idiomatic Fedora way of doing things.

Services run as **dedicated non-root system users** by default. Root is an exception requiring explicit justification (e.g., a service that genuinely needs to bind < 1024 without `CAP_NET_BIND_SERVICE`, manage kernel modules, or similar).

### Creating the service account
Ship a `sysusers.d` drop-in with the app and let systemd-sysusers create the user at deploy time. Place it at `asset/systemd/<app>.sysusers.conf`:

```
#Type Name          ID    GECOS                         Home directory    Shell
u     <app>         -     "<App> service account"       /var/lib/<app>    /usr/sbin/nologin
```

`deploy.sh` installs this to `/etc/sysusers.d/<app>.conf` and runs `systemd-sysusers` on the target. This is idempotent and survives package reinstalls.

### Directory ownership
Services typically need:
- `/etc/<app>/` — config (root:<app>, 0750; files 0640) so the daemon can read but not write.
- `/var/lib/<app>/` — mutable state (<app>:<app>, 0750).
- `/var/log/<app>/` — only if the service logs somewhere other than journald (rare; prefer journald).

`deploy.sh` must create these with correct ownership and modes, not rely on the service creating them at runtime.

### systemd unit hardening
Unit files in `asset/systemd/` should run as the dedicated service user, never as root, and include the standard hardening knobs unless a specific feature prevents it:

```ini
[Service]
Type=notify
User=<app>
Group=<app>
ExecStart=/usr/local/bin/<app>-api --config /etc/<app>/config.toml

# Hardening — enable unless the service genuinely needs otherwise
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictSUIDSGID=true
LockPersonality=true
MemoryDenyWriteExecute=true
SystemCallArchitectures=native

# Writable paths (minimum necessary)
ReadWritePaths=/var/lib/<app>

# Network
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
```

If a setting breaks the service, relax only that one — don't disable hardening wholesale.

### Privilege exceptions
If the service must run as root or with extra capabilities, document the reason in a comment at the top of the unit file. Prefer narrow `AmbientCapabilities=` (e.g., `CAP_NET_BIND_SERVICE`) over full root.

---

## 9. Deployment: Firewall (firewalld)

All hosts run `firewalld`. Every service that listens on a port must ship a **named firewalld service definition** rather than opening bare ports in a zone. The service name matches the systemd unit name (minus `.service`).

### Why named services
Named services are self-documenting (`firewall-cmd --list-services` tells you what's actually running), removable atomically on app decommission, and survive zone reassignment without reconfiguration.

### Shipping the definition
Place the XML in `asset/firewalld/<app>-<component>.xml`:

```xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short><app>-api</short>
  <description>REST/WebSocket API for <app></description>
  <port protocol="tcp" port="8443"/>
  <!-- multiple ports fine if the app needs them -->
  <port protocol="tcp" port="8444"/>
</service>
```

### What `deploy.sh` must do
For each component with a firewalld service definition:

1. `rsync` the XML to `/etc/firewalld/services/<app>-<component>.xml` on the target.
2. `firewall-cmd --reload` to pick up the new definition.
3. Check if the service is already enabled in the target zone (the component's `zone:` from the manifest, or `internal` if unset; see Zone selection below):
   ```
   firewall-cmd --zone=<zone> --query-service=<app>-<component>
   ```
4. If not, enable it persistently **and** in the runtime config:
   ```
   firewall-cmd --permanent --zone=<zone> --add-service=<app>-<component>
   firewall-cmd --zone=<zone> --add-service=<app>-<component>
   ```
5. On component removal (future concern), the reverse: `--remove-service` then delete the XML.

Steps must be idempotent — re-running a deploy is a no-op on the firewall layer if the service is already installed and enabled.

### Zone selection
Most services bind to internal WireGuard interfaces. Put the WireGuard interface in a dedicated `internal` or `wg` zone and open services there. Public-facing services (rare — nginx is usually the only one) go in the default `public`/`FedoraServer` zone. The manifest may optionally specify a `zone:` per component; default to `internal` if unset.

### Port ranges, ICMP, sources
If a service needs port ranges, put them in the same XML using firewalld's standard elements (`<port protocol="tcp" port="x-y"/>`); don't split them across multiple named services. ICMP filtering and source-address restrictions are not part of the service XML schema: configure them on the zone (zone sources, ICMP blocks) or with rich rules.

---

## 10. Deployment: SELinux

All hosts run **SELinux in enforcing mode**. Deployments must either operate cleanly within the default targeted policy or ship the labels and policy modules they need. Running `setenforce 0` to "just get it working" is never acceptable, and Claude Code should flag any suggestion to do so.

### Order of preference
Try these in order. Go no further down the list than necessary:

1. **Fit the default policy.** Install binaries to `/usr/local/bin/` (or `/usr/bin/` for packaged apps), state under `/var/lib/<app>/`, config under `/etc/<app>/`, logs to journald. These paths already have sensible default labels and most Rust daemons will run unmodified under `unconfined_service_t` or `init_t`.
2. **Apply existing contexts with `semanage fcontext`.** When files land in non-standard paths, map them to an appropriate existing type:
   ```
   semanage fcontext -a -t bin_t '/opt/<app>/bin(/.*)?'
   semanage fcontext -a -t etc_t '/opt/<app>/etc(/.*)?'
   semanage fcontext -a -t var_lib_t '/opt/<app>/var(/.*)?'
   restorecon -Rv /opt/<app>
   ```
3. **Use booleans** for common permissions the service needs (e.g., `setsebool -P httpd_can_network_connect on` if nginx needs to reach the API). Document every boolean flipped in the deployment.
4. **Register non-standard ports.** If the API binds to a port not already known to SELinux, label it:
   ```
   semanage port -a -t http_port_t -p tcp 8443   # if not already labelled
   ```
   Check first with `semanage port -l | grep <port>` and skip if the label is correct.
5. **Ship a custom policy module** only when the above don't cover it. Place sources in `asset/selinux/<app>.te` (and `.fc`, `.if` as needed). Build and install at deploy time:
   ```
   checkmodule -M -m -o <app>.mod <app>.te
   semodule_package -o <app>.pp -m <app>.mod -f <app>.fc
   semodule -i <app>.pp
   ```
   Custom modules should be as narrow as possible. If the policy ends up allowing everything, it's wrong — generate rules from `audit2allow` only after confirming the denial is actually legitimate, never as a blanket suppression.

### What `deploy.sh` must do
- After installing files, always run `restorecon -R` on their installation paths so filesystem labels match the policy.
- Apply `semanage fcontext` / `semanage port` / `setsebool` changes **permanently** (no runtime-only hacks).
- Load or reload any shipped policy module with `semodule -i`.
- Keep these operations idempotent. `semanage fcontext -a` on an already-registered path errors; deploy scripts should check with `semanage fcontext -l` first, or use `-m` (modify) with a guard.

### Dev loop
During development on a test host, `ausearch -m AVC -ts recent` and `audit2why` are the primary tools for diagnosing denials. Capture the clean set of rules once stable, fold into `asset/selinux/<app>.te`, and commit. Never leave a host with `permissive` mode set — if you set it during debugging, put it back before ending the session.

### Podman quadlets
For containerised workloads: quadlets run confined under `container_t` by default. Bind mounts of host paths need `:Z` (private relabel) or `:z` (shared relabel) depending on whether the volume is shared across containers. Default to `:Z` unless sharing is required.

---

## 11. Infrastructure Context

This is the environment these apps deploy into. Claude Code should assume it.

### Network
- Multi-site WireGuard mesh. Sites are numbered; host IPs follow `10.<site>.0.0/16` (currently `10.3.0.0/16` and `10.6.0.0/16`, but the second octet encodes the site and is the stable part).
- Per-site OPNsense router handles WAN/LAN and the WireGuard endpoints.
- Internal DNS split-horizon via `.internal` domains (`hanzalova.internal`, `kosherinata.internal`, etc.).
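
Because the second octet is the stable site identifier, code that needs to reason about locality can derive the site directly from an address. A minimal sketch, with hypothetical helper names `site_of` and `same_site`:

```rust
use std::net::Ipv4Addr;

/// Site number encoded in a mesh address (`10.<site>.x.y`),
/// or `None` if the address is outside 10.0.0.0/8.
pub fn site_of(ip: Ipv4Addr) -> Option<u8> {
    match ip.octets() {
        [10, site, _, _] => Some(site),
        _ => None,
    }
}

/// True when both addresses are mesh addresses in the same site.
pub fn same_site(a: Ipv4Addr, b: Ipv4Addr) -> bool {
    matches!((site_of(a), site_of(b)), (Some(x), Some(y)) if x == y)
}
```

For example, `10.3.0.17` and `10.3.4.1` share site 3, while `10.6.1.2` belongs to site 6.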

### TLS / PKI
- Internal PKI via Smallstep `step-ca` at `ca.internal`.
- Host certs renewed via systemd timers.
- mTLS everywhere internal services talk to each other.
- **Quantum-safe** SSH (sntrup761x25519 KEX) and TLS (X25519MLKEM768 where peers support it) are the default. External peers that don't support PQ fall back to classical curves — document the fallback explicitly in nginx config.

### Ingress
- Per-site nginx reverse proxy terminates all WAN inbound 443.
- Public DNS via Cloudflare, **unproxied by default** (CF's mTLS origin-pull has been unreliable). Revisit if/when that changes.
- nginx serves static frontends directly from `/var/www/<app>` and reverse-proxies API traffic to the internal host:port from `manifest.yml`.

### Hosts
- Fedora Server, current stable (43). Workstations run the same release.
- Services run as dedicated non-root users per §8.
- firewalld with named services per §9.
- SELinux enforcing per §10.
- Podman quadlets for containerised workloads; bare-metal systemd units for native Rust binaries (preferred where feasible).

---

## 12. Code Quality and Tooling

### Formatting and linting
- `cargo fmt` on commit (pre-commit hook or CI gate).
- `cargo clippy --all-targets --all-features -- -D warnings` must pass.
- Frontend: `eslint` + `prettier`, configured to match the team style. Type errors fail the build.

### Testing
- Unit tests live alongside the code they test (`#[cfg(test)] mod tests`).
- Integration tests under `crates/<crate>/tests/`.
- End-to-end tests that require a database use a dedicated test DB per run, created and torn down by the test harness.
- Target: core business logic has meaningful test coverage. Binaries have smoke tests.
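
For reference, the colocated unit-test layout looks like the sketch below. The function under test, `clamp_concurrency`, and its bounds are hypothetical; only the `#[cfg(test)] mod tests` shape is the point.

```rust
/// Hypothetical core helper: keep worker concurrency within a sane range.
pub fn clamp_concurrency(requested: u32) -> u32 {
    requested.clamp(1, 64)
}

#[cfg(test)]
mod tests {
    // Pull the parent module's items into scope for the tests.
    use super::*;

    #[test]
    fn zero_is_raised_to_one() {
        assert_eq!(clamp_concurrency(0), 1);
    }

    #[test]
    fn in_range_values_pass_through() {
        assert_eq!(clamp_concurrency(4), 4);
    }

    #[test]
    fn oversized_values_are_capped() {
        assert_eq!(clamp_concurrency(10_000), 64);
    }
}
```

The test module is compiled only under `cargo test`, so it adds nothing to release binaries.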

### Observability
- `tracing` with `tracing-subscriber`. JSON output in production, pretty output when stdout is a TTY.
- Structured log fields for request IDs, user IDs (where applicable), and operation names.
- No `println!` or `eprintln!` in committed code outside of CLI binaries' user-facing output.
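
The output-format choice can be isolated in a pure function so it is easy to test. The sketch below combines this rule with the `JOURNAL_STREAM` check from §3; falling back to JSON when neither condition holds is an assumption, and `LogFormat`, `choose_format`, and `detect_format` are hypothetical names. Wiring the result into `tracing-subscriber` is left out.

```rust
use std::io::IsTerminal;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LogFormat {
    Json,
    Pretty,
}

/// Pure decision function: journald wins, then a TTY gets pretty output,
/// and anything else (pipes, files) is assumed to want JSON.
pub fn choose_format(journal_stream: Option<&str>, stdout_is_tty: bool) -> LogFormat {
    match (journal_stream, stdout_is_tty) {
        (Some(value), _) if !value.is_empty() => LogFormat::Json,
        (_, true) => LogFormat::Pretty,
        _ => LogFormat::Json,
    }
}

/// Read the real environment and delegate to `choose_format`.
pub fn detect_format() -> LogFormat {
    let journal_stream = std::env::var("JOURNAL_STREAM").ok();
    choose_format(journal_stream.as_deref(), std::io::stdout().is_terminal())
}
```

A binary would call `detect_format()` once at startup, before installing the subscriber.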

### Error handling
- Library crates use `thiserror` for typed errors.
- Binaries use `anyhow` at the outermost layer, with `.context(...)` at call boundaries.
- Never `unwrap()` in production code paths. `expect("...")` with a clear message is acceptable for invariants that are genuinely impossible to violate.
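
To show what a typed library error amounts to, here is a hand-written, std-only stand-in for what a `thiserror` derive would generate. The `ConfigError` enum and `port_of` helper are hypothetical; in a real crate the `Display`, `Error`, and `From` impls would come from the derive rather than being written out.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Typed error for a library crate.
#[derive(Debug)]
pub enum ConfigError {
    MissingPort,
    InvalidPort(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingPort => write!(f, "bind address has no port"),
            ConfigError::InvalidPort(_) => write!(f, "bind address has an invalid port"),
        }
    }
}

impl std::error::Error for ConfigError {
    // Expose the underlying cause so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ConfigError::InvalidPort(e) => Some(e),
            ConfigError::MissingPort => None,
        }
    }
}

impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::InvalidPort(e)
    }
}

/// Extract the port from a `host:port` string, propagating with `?`
/// instead of calling `unwrap()`.
pub fn port_of(bind: &str) -> Result<u16, ConfigError> {
    let (_, port) = bind.rsplit_once(':').ok_or(ConfigError::MissingPort)?;
    Ok(port.parse::<u16>()?)
}
```

Callers in a binary can match on the variants when they care, or convert the whole thing into an opaque error with added context at the outermost layer.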

### Documentation
- Every public item in library crates has a doc comment.
- Each crate has a `README.md` or top-level module doc explaining its role in the workspace.
- The repo `README.md` covers: what the project does, how to build, how to run locally, how to deploy. Point readers to this document for architectural conventions.

---

## 13. Conventions Summary for Claude Code

When scaffolding or extending a project:

1. Default to the workspace layout in §1. Ask before deviating.
2. Put new types in `entities`, new logic in `core`, new I/O in `data`. Binaries stay thin.
3. Add dependencies to the workspace root first, then reference with `dep.workspace = true`.
4. Version strings live in exactly one place — the workspace root.
5. Any new deployable component gets an entry in `asset/manifest.yml`, a systemd unit in `asset/systemd/`, a sysusers drop-in, a firewalld service XML, and any required SELinux assets — in the same change.
6. Config templates go in `asset/config/` with `{{PLACEHOLDER}}` secrets. Never commit a rendered config.
7. Postgres connections are mTLS, passwordless. If writing connection code that accepts a password, stop and ask.
8. Frontend is Vite + React + SWC + TS, served as static assets from nginx. Rust web frameworks require a stated reason.
9. Services run as dedicated non-root users with hardened systemd units per §8. Root requires explicit justification.
10. Every listening port gets a named firewalld service per §9. No bare `--add-port` calls.
11. SELinux stays enforcing. Work with the default policy first; ship a custom module only when necessary (§10). Never suggest `setenforce 0`.
12. Prefer fewer dependencies. Prefer bare-metal systemd over containers unless there's a reason.
13. When unsure, ask — these preferences are defaults, not mandates, but deviations should be deliberate.