feat(cortex): catalogue source field + scheme-qualified /models/load

Phase 3 of plan-source-aware-loader-preflight. Adds an optional `source` field to `ModelProfile` and threads it through the router's cold-load path so a profile pointing at the helexa registry forwards `helexa:<id>` to neuron's `/models/load` instead of leaving neuron to substitute its `default_source` (typically `huggingface`).

Without this, an operator who declares `source = "helexa"` in models.toml would still see neuron fetch from HuggingFace: the catalogue → ModelSpec translation in `profile_to_spec` was dropping the scheme on the floor.

What lands:
- `cortex-core::catalogue::ModelProfile.source: Option<String>`. None is the default and preserves pre-Phase-3 behaviour.
- `cortex-gateway::router::qualified_model_id(profile)`: small pure helper, extracted from `profile_to_spec` so it can be unit-tested. Empty-string `source` is treated as None so operators who blank out a previously-set value don't trip a scheme-with-no-scheme failure mode in neuron.
- `models.example.toml` documents the new field with a commented-out helexa-scheme example pointing back at neuron.example.toml's matching sources block.

Tests:
- 2 new unit tests in `cortex-core::catalogue`: source-absent round-trip and source-present round-trip through TOML.
- 3 new unit tests in `cortex-gateway::router`: pass-through when None, prefix when Some, pass-through on empty-string source.
- ModelProfile literal in catalogue's existing test updated to carry `source: None`.

CI gate: cargo fmt --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace (24 test groups ok, zero failures).

Completes Phase 3. With Phases 1+2+3 landed:
- neuron parses `scheme:org/name`, routes per-source hf-hub Api with disambiguated cache.
- preflight returns structured errors before any device allocation.
- cortex catalogue declares per-model source jurisdiction and forwards it to neuron.

The registry itself (registry.helexa.ai service, MinIO, nginx, mirror fabric) is the next moving piece, landing under a separate project per the design discussion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 14:53:58 +03:00
parent d4e1b05956
commit d0292ed377
3 changed files with 110 additions and 3 deletions
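
For orientation, the `qualified_model_id` helper described in the commit message could look roughly like the sketch below. This is not the code from the commit (the router change is not shown in this excerpt): the `ModelProfile` here is a hypothetical two-field stand-in for the real struct, and the return type and exact matching logic are assumptions. The three tests mirror the cases the message lists (pass-through when None, prefix when Some, pass-through on empty-string source).

```rust
/// Hypothetical, trimmed-down stand-in for `cortex-core::catalogue::ModelProfile`.
/// The real struct carries more fields (harness, quant, vram_mb, ...).
#[derive(Debug, Clone)]
pub struct ModelProfile {
    pub id: String,
    pub source: Option<String>,
}

/// Builds the model id to forward to neuron's `/models/load`.
/// Assumed behaviour, per the commit message:
///   - `source` set and non-empty -> "scheme:id"
///   - `source` absent or empty   -> bare id, so neuron applies its `default_source`
pub fn qualified_model_id(profile: &ModelProfile) -> String {
    match profile.source.as_deref() {
        Some(scheme) if !scheme.is_empty() => format!("{}:{}", scheme, profile.id),
        _ => profile.id.clone(),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Small constructor so each test only states the `source` under test.
    fn profile(source: Option<&str>) -> ModelProfile {
        ModelProfile {
            id: "Helexa/Qwen3.6-27B-Uncensored".to_string(),
            source: source.map(str::to_string),
        }
    }

    #[test]
    fn passes_through_when_source_is_none() {
        assert_eq!(
            qualified_model_id(&profile(None)),
            "Helexa/Qwen3.6-27B-Uncensored"
        );
    }

    #[test]
    fn prefixes_scheme_when_source_is_set() {
        assert_eq!(
            qualified_model_id(&profile(Some("helexa"))),
            "helexa:Helexa/Qwen3.6-27B-Uncensored"
        );
    }

    #[test]
    fn passes_through_on_empty_source() {
        assert_eq!(
            qualified_model_id(&profile(Some(""))),
            "Helexa/Qwen3.6-27B-Uncensored"
        );
    }
}
```

Treating an empty string like None keeps a blanked-out `source = ""` from producing an id such as `:Helexa/...`, which is the scheme-with-no-scheme case the message calls out.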
--- a/models.example.toml
+++ b/models.example.toml
@@ -7,7 +7,8 @@
 # returns and what the router can cold-load on demand.
 #
 # Field reference:
-#   id                 - HuggingFace model id, exact match.
+#   id                 - Repo id in the source registry (e.g. "Qwen/Qwen3.6-27B").
+#                        Exact match.
 #   harness            - which engine handles inference (currently "candle").
 #   quant              - GGUF quantisation tag for the file in the HF repo
 #                        (e.g. "Q4_K_M"). Omit/empty for the dense
@@ -20,6 +21,11 @@
 #   pinned_on          - optional whitelist of neuron names. Non-empty
 #                        narrows feasibility to just those neurons and
 #                        protects the model from LRU eviction there.
+#   source             - optional source scheme ("huggingface", "helexa",
+#                        operator mirror tag). When set, cortex forwards
+#                        the load to neuron as `scheme:id` so the daemon
+#                        fetches from the right registry. Omit to let
+#                        neuron substitute its own `default_source`.

 # Tensor-parallel target — needs a neuron with at least 2 large GPUs.
 # The example pins to a specific neuron name; adjust or remove the
@@ -49,6 +55,20 @@ vram_mb = 500
 min_devices = 1
 min_device_vram_mb = 4000

+# Helexa registry model — `source` pins this entry to the helexa
+# scheme so cortex forwards `helexa:Helexa/Qwen3.6-27B-Uncensored` to
+# neuron's /models/load. Requires the neuron config to declare a
+# matching [harness.candle.sources.helexa] entry pointing at the
+# helexa registry endpoint (see neuron.example.toml).
+#
+# [[models]]
+# id = "Helexa/Qwen3.6-27B-Uncensored"
+# harness = "candle"
+# source = "helexa"
+# vram_mb = 54000
+# min_devices = 2
+# min_device_vram_mb = 24000
+
 # -- Tier aliases ------------------------------------------------------------
 # Optional. Clients can request inference against an alias (e.g.
 # `model: "helexa/small"` in /v1/chat/completions) and cortex