feat(cortex): catalogue source field + scheme-qualified /models/load
Some checks failed
CI / CUDA type-check (push) Successful in 32s
build-prerelease / Resolve version stamps (push) Successful in 40s
CI / Format (push) Successful in 40s
CI / Test (push) Failing after 1m3s
CI / Clippy (push) Successful in 2m43s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 6m13s
build-prerelease / Build neuron-ampere (push) Successful in 7m31s
build-prerelease / Build neuron-ada (push) Successful in 8m16s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m56s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m21s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s
build-prerelease / Build cortex binary (push) Successful in 4m5s
build-prerelease / Package cortex RPM (push) Successful in 1m30s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s

Phase 3 of plan-source-aware-loader-preflight. Adds an optional
`source` field to `ModelProfile` and threads it through the
router's cold-load path so a profile pointing at the helexa
registry forwards `helexa:<id>` to neuron's `/models/load`
instead of leaving neuron to substitute its `default_source`
(typically `huggingface`).

Without this, an operator who declares
`source = "helexa"` in models.toml would still see neuron fetch
from HuggingFace — the catalogue → ModelSpec translation in
`profile_to_spec` was dropping the scheme on the floor.

What lands:

- `cortex-core::catalogue::ModelProfile.source: Option<String>`.
  None is the default and preserves pre-Phase-3 behaviour.
- `cortex-gateway::router::qualified_model_id(profile)` —
  small pure helper, extracted from `profile_to_spec` so it can
  be unit-tested. Empty-string `source` is treated as None so
  operators who blank out a previously-set value don't trip a
  scheme-with-no-scheme failure mode in neuron.
- `models.example.toml` documents the new field with a
  commented-out helexa-scheme example pointing back at
  neuron.example.toml's matching sources block.

Tests:

- 2 new unit tests in `cortex-core::catalogue`: source-absent
  round-trip and source-present round-trip through TOML.
- 3 new unit tests in `cortex-gateway::router`: pass-through
  when None, prefix when Some, pass-through on empty-string
  source.
- ModelProfile literal in catalogue's existing test updated to
  carry `source: None`.

CI gate: cargo fmt --check, cargo clippy --workspace
--all-targets -- -D warnings, cargo test --workspace
(24 test groups ok, zero failures).

Completes Phase 3. With Phases 1+2+3 landed:
- neuron parses `scheme:org/name`, routes per-source hf-hub
  Api with disambiguated cache.
- preflight returns structured errors before any device
  allocation.
- cortex catalogue declares per-model source jurisdiction
  and forwards it to neuron.

The registry itself (registry.helexa.ai service, MinIO,
nginx, mirror fabric) is the next moving piece — landing
under a separate project per the design discussion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-06-01 14:53:58 +03:00
parent d4e1b05956
commit d0292ed377
3 changed files with 110 additions and 3 deletions

View File

@@ -7,7 +7,8 @@
# returns and what the router can cold-load on demand.
#
# Field reference:
# id - HuggingFace model id, exact match.
# id - Repo id in the source registry (e.g. "Qwen/Qwen3.6-27B").
# Exact match.
# harness - which engine handles inference (currently "candle").
# quant - GGUF quantisation tag for the file in the HF repo
# (e.g. "Q4_K_M"). Omit/empty for the dense
@@ -20,6 +21,11 @@
# pinned_on - optional whitelist of neuron names. Non-empty
# narrows feasibility to just those neurons and
# protects the model from LRU eviction there.
# source - optional source scheme ("huggingface", "helexa",
# operator mirror tag). When set, cortex forwards
# the load to neuron as `scheme:id` so the daemon
# fetches from the right registry. Omit to let
# neuron substitute its own `default_source`.
# Tensor-parallel target — needs a neuron with at least 2 large GPUs.
# The example pins to a specific neuron name; adjust or remove the
@@ -49,6 +55,20 @@ vram_mb = 500
min_devices = 1
min_device_vram_mb = 4000
# Helexa registry model — `source` pins this entry to the helexa
# scheme so cortex forwards `helexa:Helexa/Qwen3.6-27B-Uncensored` to
# neuron's /models/load. Requires the neuron config to declare a
# matching [harness.candle.sources.helexa] entry pointing at the
# helexa registry endpoint (see neuron.example.toml).
#
# [[models]]
# id = "Helexa/Qwen3.6-27B-Uncensored"
# harness = "candle"
# source = "helexa"
# vram_mb = 54000
# min_devices = 2
# min_device_vram_mb = 24000
# -- Tier aliases ------------------------------------------------------------
# Optional. Clients can request inference against an alias (e.g.
# `model: "helexa/small"` in /v1/chat/completions) and cortex