chore: keep models.example.toml generic; deploy.sh syncs local models.toml
Some checks failed
build-prerelease / Resolve version stamps (push) Successful in 34s
CI / Format (push) Successful in 40s
CI / Clippy (push) Successful in 2m22s
CI / Test (push) Successful in 4m31s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build cortex binary (push) Successful in 4m28s
build-prerelease / Build neuron-ampere (push) Has been cancelled
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Package cortex RPM (push) Has started running
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-blackwell (push) Has been cancelled
Reverts the previous commit's naming of specific helexa neuron hosts in the shipped example catalogue (`models.example.toml`). The example is supposed to be a generic starting point that any operator copies and adapts, not a record of one particular fleet's layout.

- `pinned_on` in the TP example uses the placeholder `"your-multi-gpu-neuron"`. Other entries keep the model ids (since those are HuggingFace-canonical, not fleet-specific).
- New `models.toml` at repo root holds the helexa-fleet catalogue (beast / benjy / quadbrat). Added to `.gitignore` alongside `cortex.toml`: both are operator-owned, gitignored, RPM-marked `%config(noreplace)`, and synced by `deploy.sh`.
- `deploy.sh` now rsyncs `models.toml` to `/etc/cortex/models.toml` on the gateway host on the same lifecycle as `cortex.toml`. Skips cleanly when no local file exists, so users without a catalogue aren't surprised by silent overwrites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
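The skip-when-missing sync described in the last bullet could look roughly like the sketch below. This is not the actual `deploy.sh`: the helper name `sync_optional_config`, the `GATEWAY_HOST` and `SYNC_CMD` variables, and the default host are all assumptions made for illustration. Only the `/etc/cortex/models.toml` destination and the use of rsync come from the commit message.

```shell
# Hypothetical sketch of the optional-file sync; not the real deploy.sh.

# Assumed variable: the gateway host that receives the config files.
GATEWAY_HOST="${GATEWAY_HOST:-gateway.example.com}"

# sync_optional_config LOCAL_FILE REMOTE_PATH
# Copies LOCAL_FILE to REMOTE_PATH on the gateway, but only if the
# local file exists. A missing file is reported and treated as success,
# so the remote copy is never touched in that case.
sync_optional_config() {
    if [ ! -f "$1" ]; then
        echo "skip: no local $1, leaving $2 untouched"
        return 0
    fi
    # SYNC_CMD is an assumed override hook so a dry run can replace rsync.
    "${SYNC_CMD:-rsync}" -a "$1" "${GATEWAY_HOST}:$2"
}
```

Under these assumptions, a deploy script would call `sync_optional_config cortex.toml /etc/cortex/cortex.toml` followed by `sync_optional_config models.toml /etc/cortex/models.toml`, which puts both operator-owned files on the same lifecycle.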
@@ -20,20 +20,19 @@
 # pinned_on - optional whitelist of neuron names. Non-empty
 # narrows feasibility to just those neurons and
 # protects the model from LRU eviction there.
-#
-# The examples below match the canonical helexa fleet
-# (beast = 2x RTX 5090, benjy = RTX 4090, quadbrat = RTX 3060).
 
-# Tensor-parallel target — only beast has two big GPUs.
+# Tensor-parallel target — needs a neuron with at least 2 large GPUs.
+# The example pins to a specific neuron name; adjust or remove the
+# pinned_on entry for your own fleet.
 [[models]]
 id = "Qwen/Qwen3.6-27B"
 harness = "candle"
 vram_mb = 54000
 min_devices = 2
 min_device_vram_mb = 24000
-pinned_on = ["beast"]
+pinned_on = ["your-multi-gpu-neuron"]
 
-# Mid-size dense model — fits on benjy or beast.
+# Mid-size dense model — fits on any single GPU with ≥16 GB VRAM.
 [[models]]
 id = "Qwen/Qwen3-8B"
 harness = "candle"
@@ -41,7 +40,7 @@ vram_mb = 18000
 min_devices = 1
 min_device_vram_mb = 16000
 
-# Small GGUF quantised — runs on the smallest neuron (quadbrat).
+# Small GGUF quantised — runs on any small GPU.
 [[models]]
 id = "unsloth/Qwen3-0.6B-GGUF"
 harness = "candle"