cortex

Author	SHA1	Message	Date
rob thijssen	ea1fdf8aa6	chore(deploy): drop deploy.sh and manifest.yml now that workflow runs First end-to-end run of the deploy workflow succeeded (gitea run #289), so the operator-run rolling-deploy script and its YAML manifest are no longer the source of truth — fleet topology lives in .gitea/workflows/deploy.yml and per-host config in script/infra-setup.sh. Per-host neuron config comments updated to point at the new sync path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-02 16:41:04 +03:00
rob thijssen	5c520c7e90	feat(deploy): gitea workflow for rolling RPM deploys + host bootstrap Replace operator-run script/deploy.sh with a CI-driven rolling deploy: - .gitea/workflows/deploy.yml fires on build-prerelease success (and is re-runnable via workflow_dispatch). Cortex upgrades first on hanzalova.internal; the three neuron hosts upgrade in parallel under fail-fast: false so one failing host doesn't sink the rest. Concurrency-grouped to serialize overlapping deploys, never cancelling in-flight runs (a half-applied dnf transaction is worse than a stale deploy). - asset/sudoers.d/{cortex,neuron}-host.conf are the canonical source for the scoped privileges gitea_ci needs on each host kind, installed as /etc/sudoers.d/helexa_gitea_ci. URLs and = signs are backslash-escaped per sudoers reserved-character rules. - script/infra-setup.sh idempotently provisions the gitea_ci user, installs the runner pubkey, drops in the appropriate sudoers fragment with visudo verification, and syncs cortex.toml / models.toml / per-host asset/neuron/<short>.toml — config still ships from operator workstations rather than CI because the first two are gitignored. The CI-only secret is RSYNC_SSH_KEY (already configured for the repo); the matching pubkey is ~/.ssh/id_gitea_ci.pub on the operator's box. script/deploy.sh and asset/manifest.yml are left in place until the first end-to-end deploy workflow run succeeds, then removed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 14:58:23 +03:00
rob thijssen	740299bd9d	chore(neuron/beast): switch default-model quant from q5k to q6k q5k produced NaN logits on Qwen/Qwen3.6-27B under candle TP=2 (sampler fell over with "logits unhealthy nan: 248320/248320"). q6k is the quant that worked well in production under mistral.rs on the same hardware, so it's the right baseline for verifying the mempool-trim fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 12:36:18 +03:00
rob thijssen	d3f2d50749	feat(deploy): per-host neuron config + pre-warm headline models Adds asset/neuron/{beast,benjy,quadbrat}.toml: per-host neuron.toml files keyed by the first dot-component of the host. deploy.sh now rsyncs the matching file to /etc/neuron/neuron.toml on each neuron and stops+starts the service so default_models is re-read. Headline model per host (drives /v1/models output immediately after a clean deploy): beast: Qwen/Qwen3.6-27B (q5k, tp=2, devices=[0,1]); benjy: Qwen/Qwen3-8B (bf16, devices=[0]); quadbrat: Qwen/Qwen3-1.7B (bf16, devices=[0]). Removes the need to follow deploy.sh with `validate-neuron.sh beast Qwen/Qwen3.6-27B q5k 2` to surface the 27B in the catalogue; the neuron loads it itself on activation. The neuron loop now mirrors the cortex flow (stop → install/upgrade → sync config → start) so config-only changes pick up on subsequent deploys; previously a no-package-change deploy would silently leave the host on the old default_models. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 14:05:54 +03:00
rob thijssen	03bed93fee	add asset/manifest.yml describing fleet hosts and neuron flavours Adds a single source of truth for which hosts run cortex vs neuron and which CUDA compute-capability flavour each neuron host needs: cortex : hanzalova.internal neurons : beast → helexa-neuron-blackwell (2x RTX 5090, sm_120) benjy → helexa-neuron-ada (RTX 4090, sm_89) quadbrat → helexa-neuron-ampere (RTX 3060, sm_86) script/deploy.sh (gitignored, local-only) is updated locally to read hosts and flavours from this manifest and dnf install the correct helexa-neuron-<flavour> package per host. Using 'dnf install --refresh --allowerasing' lets it swap out the previous bare helexa-neuron RPM or a different flavour without manual intervention; the spec Conflicts: clauses keep at most one flavour resident. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:37:14 +03:00

5 Commits