First end-to-end run of the deploy workflow succeeded (Gitea run #289),
so the operator-run rolling-deploy script and its YAML manifest are no
longer the source of truth — fleet topology lives in
.gitea/workflows/deploy.yml and per-host config in script/infra-setup.sh.
Per-host neuron config comments updated to point at the new sync path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace operator-run script/deploy.sh with a CI-driven rolling deploy:
- .gitea/workflows/deploy.yml fires on build-prerelease success (and is
  re-runnable via workflow_dispatch). Cortex upgrades first on
  hanzalova.internal; the three neuron hosts upgrade in parallel under
  fail-fast: false so one failing host doesn't sink the rest.
  Concurrency-grouped to serialize overlapping deploys, never cancelling
  in-flight runs (a half-applied dnf transaction is worse than a stale
  deploy).
- asset/sudoers.d/{cortex,neuron}-host.conf are the canonical source for
  the scoped privileges gitea_ci needs on each host kind, installed as
  /etc/sudoers.d/helexa_gitea_ci. URLs and = signs are backslash-escaped
  per sudoers reserved-character rules.
- script/infra-setup.sh idempotently provisions the gitea_ci user,
  installs the runner pubkey, drops in the appropriate sudoers fragment
  with visudo verification, and syncs cortex.toml / models.toml /
  per-host asset/neuron/<short>.toml. Config still ships from operator
  workstations rather than CI because the first two are gitignored.
The CI-only secret is RSYNC_SSH_KEY (already configured for the repo);
the matching pubkey is ~/.ssh/id_gitea_ci.pub on the operator's box.
script/deploy.sh and asset/manifest.yml are left in place until the
first end-to-end deploy workflow run succeeds, then removed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
q5k produced NaN logits on Qwen/Qwen3.6-27B under candle TP=2 (sampler
fell over with "logits unhealthy nan: 248320/248320"). q6k is the
quant that worked well in production under mistral.rs on the same
hardware, so it's the right baseline for verifying the mempool-trim
fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds asset/neuron/{beast,benjy,quadbrat}.toml — per-host neuron.toml
files keyed by the first dot-component of the host. deploy.sh now
rsyncs the matching file to /etc/neuron/neuron.toml on each neuron and
stops+starts the service so default_models is re-read.
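
As a rough sketch of the "first dot-component" keying (helper names are
hypothetical, and the neuron hosts' domain suffix is not stated here, so
any fully qualified name used with it is an assumption):

```shell
#!/bin/sh
# Hypothetical helpers; only the asset/neuron/ layout comes from the text.

# Strip everything from the first dot onward: "beast.example" -> "beast".
short_host() {
    printf '%s\n' "${1%%.*}"
}

# Map a host name to its per-host config file in the repo.
neuron_config_for() {
    printf 'asset/neuron/%s.toml\n' "$(short_host "$1")"
}
```

The result would then feed an rsync to /etc/neuron/neuron.toml on the
matching host.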
Headline model per host (drives /v1/models output immediately after a
clean deploy):
  beast     Qwen/Qwen3.6-27B  (q5k, tp=2, devices=[0,1])
  benjy     Qwen/Qwen3-8B     (bf16, devices=[0])
  quadbrat  Qwen/Qwen3-1.7B   (bf16, devices=[0])
Removes the need to follow deploy.sh with `validate-neuron.sh beast
Qwen/Qwen3.6-27B q5k 2` to surface the 27B in the catalogue — the
neuron loads it itself on activation.
The neuron loop now mirrors the cortex flow (stop → install/upgrade →
sync config → start) so config-only changes are picked up on subsequent
deploys; previously a deploy with no package change would silently
leave the host on the old default_models.
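
The ordering can be sketched as a dry-run plan that prints each step
instead of executing it. The function name, the systemd unit name
"neuron", and the -y flag are assumptions made for illustration; the
config path and dnf flags are the ones mentioned in these commits.

```shell
#!/bin/sh
# Sketch only: prints the per-host steps in order rather than running them.
# The unit name "neuron" and the -y flag are assumptions.
neuron_deploy_plan() {
    host=$1
    pkg=$2
    short=${host%%.*}
    printf '%s\n' \
        "ssh $host sudo systemctl stop neuron" \
        "ssh $host sudo dnf install --refresh --allowerasing -y $pkg" \
        "rsync asset/neuron/$short.toml $host:/etc/neuron/neuron.toml" \
        "ssh $host sudo systemctl start neuron"
}
```

Because the config sync sits between stop and start regardless of whether
dnf changed anything, a config-only change still takes effect on the next
start.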
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a single source of truth for which hosts run cortex vs neuron
and which CUDA compute-capability flavour each neuron host needs:
  cortex  : hanzalova.internal
  neurons :
    beast    → helexa-neuron-blackwell  (2x RTX 5090, sm_120)
    benjy    → helexa-neuron-ada        (RTX 4090, sm_89)
    quadbrat → helexa-neuron-ampere     (RTX 3060, sm_86)
script/deploy.sh (gitignored, local-only) now reads hosts and flavours
from this manifest and runs dnf install for the correct
helexa-neuron-<flavour> package on each host. Using
'dnf install --refresh --allowerasing' lets it swap out the previous
bare helexa-neuron RPM or a different flavour without manual
intervention; the spec Conflicts: clauses keep at most one flavour
resident.
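
A minimal sketch of the host-to-flavour lookup and the resulting dnf
command follows. The helper names are hypothetical, and the table is
hardcoded here for brevity, whereas deploy.sh is described as reading it
from the manifest.

```shell
#!/bin/sh
# Sketch only: the mapping mirrors the manifest table above, but is
# hardcoded instead of being parsed from the manifest file.
flavour_for() {
    case "${1%%.*}" in
        beast)    echo blackwell ;;
        benjy)    echo ada ;;
        quadbrat) echo ampere ;;
        *)        echo "unknown neuron host: $1" >&2; return 1 ;;
    esac
}

# Print the install command for a host; fails for hosts not in the table.
neuron_install_cmd() {
    flavour=$(flavour_for "$1") || return 1
    echo "dnf install --refresh --allowerasing helexa-neuron-$flavour"
}
```

Failing on an unknown host, rather than falling back to a default
flavour, avoids installing a build for the wrong compute capability.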
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>