Reverts the previous commit's naming of specific helexa neuron hosts
in the shipped example catalogue (`models.example.toml`). The example
is meant to be a generic starting point that any operator copies and
adapts, not a record of one particular fleet's layout.
- `pinned_on` in the TP example now uses the placeholder
`"your-multi-gpu-neuron"`. Other entries keep their model ids,
since those are HuggingFace-canonical rather than fleet-specific.
- New `models.toml` at the repo root holds the helexa-fleet catalogue
(beast / benjy / quadbrat). It is added to `.gitignore` alongside
`cortex.toml`: both are operator-owned, marked `%config(noreplace)`
in the RPM, and synced by `deploy.sh`.
- `deploy.sh` now rsyncs `models.toml` to `/etc/cortex/models.toml`
on the gateway host on the same lifecycle as `cortex.toml`. It skips
cleanly when no local `models.toml` exists, so operators without a
catalogue aren't surprised by a silent overwrite of the gateway's copy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-up CUDA-only fixes surfaced by running
`cargo build --features cuda` inside the cuda-13.0 runner container:
1. `half::{bf16, f16}` was used without being declared as a dependency.
Added `half = "2.5"` (matching candle-core's pinned major version),
gated on the `cuda` feature.
2. `dev.alloc::<T>(n)` already returns `candle_core::Result` (it calls
`.w()` internally on the cudarc error). Calling `.w()?` on top of
that requires `From<candle_core::Error> for CudaError`, which doesn't
exist, so collapse it to plain `?`. Removed the now-unused
`cuda_backend::WrapErr` import.
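The error-conversion chain behind fix 2 can be reproduced in isolation. This sketch uses simplified, hypothetical stand-ins (`DriverError`, `CudaError`, `Error`, `raw_alloc`, `alloc`, `make_buffer`) modeled on the description above, not candle_core's real definitions. A `WrapErr`-style `.w()` lifts any error convertible into `CudaError`, so calling it on a value that is already the crate-level `Result` would need an `Error: Into<CudaError>` impl that nothing provides.

```rust
/// Stand-in for the low-level driver error (cudarc's role).
#[derive(Debug)]
struct DriverError(String);

/// Stand-in for the backend error type.
#[derive(Debug)]
enum CudaError {
    Driver(DriverError),
}

impl From<DriverError> for CudaError {
    fn from(e: DriverError) -> Self {
        CudaError::Driver(e)
    }
}

/// Stand-in for the crate-level error.
#[derive(Debug)]
enum Error {
    Cuda(CudaError),
}

type Result<T> = std::result::Result<T, Error>;

/// Extension trait in the shape the commit describes: lift any error
/// that converts into `CudaError` into the crate-level `Error`.
trait WrapErr<O> {
    fn w(self) -> Result<O>;
}

impl<O, E: Into<CudaError>> WrapErr<O> for std::result::Result<O, E> {
    fn w(self) -> Result<O> {
        self.map_err(|e| Error::Cuda(e.into()))
    }
}

/// Driver-level allocation returning the low-level error type.
fn raw_alloc(n: usize) -> std::result::Result<Vec<u8>, DriverError> {
    if n == 0 {
        Err(DriverError("zero-sized allocation".to_string()))
    } else {
        Ok(vec![0; n])
    }
}

/// Device-level wrapper: already applies `.w()` internally, so callers
/// receive the crate-level `Result`.
fn alloc(n: usize) -> Result<Vec<u8>> {
    raw_alloc(n).w()
}

/// Caller: `alloc(n).w()?` would not compile here, because `Error` does
/// not implement `Into<CudaError>`. Plain `?` is all that is needed.
fn make_buffer(n: usize) -> Result<Vec<u8>> {
    let buf = alloc(n)?;
    Ok(buf)
}
```

Changing `make_buffer` to `alloc(n).w()?` fails to compile because the blanket impl's `E: Into<CudaError>` bound is not met for `E = Error`, which is the same shape of error the fix removes.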
Verified by `cargo build -p neuron --features cuda` and
`cargo clippy -p neuron --all-targets --features cuda -- -D warnings`
inside `git.lair.cafe/gongfoo/runner-cuda-13.0` with the local
glibc/CUDA-13.0 `math_functions.h` noexcept patch applied. CPU clippy
and tests stay green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>