Adds asset/neuron/{beast,benjy,quadbrat}.toml — per-host neuron.toml
files keyed by the first dot-component of the host. deploy.sh now
rsyncs the matching file to /etc/neuron/neuron.toml on each neuron and
stops+starts the service so default_models is re-read.
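
The host-to-file keying described above can be sketched as follows. This is a minimal illustration, not the contents of deploy.sh: the function names are hypothetical, and the domain suffixes in the usage note are made up.

```shell
# Hypothetical helpers: map a host name to its per-host neuron.toml.
# Only the asset/neuron/<short>.toml layout comes from this commit.

# First dot-component of a host name (text before the first dot).
short_host() {
  printf '%s\n' "${1%%.*}"
}

# Path of the config file that would be synced to the given host.
neuron_config_for() {
  printf 'asset/neuron/%s.toml\n' "$(short_host "$1")"
}
```

For example, `neuron_config_for benjy.example.lan` prints `asset/neuron/benjy.toml`, and a bare host name such as `quadbrat` maps to `asset/neuron/quadbrat.toml`.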

Headline model per host (drives /v1/models output immediately after a
clean deploy):

  beast     Qwen/Qwen3.6-27B  (q5k, tp=2, devices=[0,1])
  benjy     Qwen/Qwen3-8B     (bf16, devices=[0])
  quadbrat  Qwen/Qwen3-1.7B   (bf16, devices=[0])

Removes the need to follow deploy.sh with `validate-neuron.sh beast
Qwen/Qwen3.6-27B q5k 2` to surface the 27B in the catalogue: the
neuron now loads it itself on activation.

The neuron loop now mirrors the cortex flow (stop → install/upgrade →
sync config → start) so config-only changes are picked up on the next
deploy; previously, a deploy with no package change would silently
leave the host on the old default_models.
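
The step order above can be sketched as a dry-run planner that prints the commands instead of running them. Everything beyond the order itself is an assumption: the systemd unit name "neuron", the use of ssh and sudo, and the function name are hypothetical, and privilege handling for the rsync into /etc is glossed over.

```shell
# Print (do not execute) the per-neuron deploy steps in order:
# stop -> install/upgrade -> sync config -> start.
# The body runs in a subshell so its variables do not leak.
neuron_plan() (
  host=$1
  package=$2
  service=${3:-neuron}   # hypothetical unit name, not from the commit
  short=${host%%.*}      # first dot-component selects the config file
  printf '%s\n' \
    "ssh $host sudo systemctl stop $service" \
    "ssh $host sudo dnf install -y --refresh --allowerasing $package" \
    "rsync -a asset/neuron/$short.toml $host:/etc/neuron/neuron.toml" \
    "ssh $host sudo systemctl start $service"
)
```

Because the config sync sits between stop and start regardless of whether dnf changed anything, a deploy that only touches the .toml file still results in the service re-reading default_models.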
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a single source of truth for which hosts run cortex vs neuron
and which CUDA compute-capability flavour each neuron host needs:

  cortex  : hanzalova.internal
  neurons :
    beast    → helexa-neuron-blackwell  (2x RTX 5090, sm_120)
    benjy    → helexa-neuron-ada        (RTX 4090, sm_89)
    quadbrat → helexa-neuron-ampere     (RTX 3060, sm_86)

script/deploy.sh (gitignored, local-only) now reads hosts and flavours
from this manifest and installs the matching helexa-neuron-<flavour>
package on each host with dnf. Using
'dnf install --refresh --allowerasing' lets it swap out the previous
bare helexa-neuron RPM or a different flavour without manual
intervention; the Conflicts: clauses in the spec ensure at most one
flavour is installed at a time.
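
The host-to-package lookup can be sketched as below. The manifest's file format is not shown in this commit, so a hard-coded case statement stands in for reading it; the function name is hypothetical, and the mapping is copied from the table above.

```shell
# Stand-in for the manifest lookup: resolve a neuron host to the
# flavoured package it should receive. Unknown hosts are an error.
neuron_package_for() {
  case "${1%%.*}" in
    beast)    echo helexa-neuron-blackwell ;;  # 2x RTX 5090, sm_120
    benjy)    echo helexa-neuron-ada ;;        # RTX 4090, sm_89
    quadbrat) echo helexa-neuron-ampere ;;     # RTX 3060, sm_86
    *)        echo "unknown neuron host: $1" >&2; return 1 ;;
  esac
}
```

A caller would pass the result to 'dnf install --refresh --allowerasing'; failing loudly on an unlisted host avoids installing a package built for the wrong compute capability.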
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>