feat(cortex): catalogue source field + scheme-qualified /models/load
Some checks failed
CI / CUDA type-check (push) Successful in 32s
build-prerelease / Resolve version stamps (push) Successful in 40s
CI / Format (push) Successful in 40s
CI / Test (push) Failing after 1m3s
CI / Clippy (push) Successful in 2m43s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 6m13s
build-prerelease / Build neuron-ampere (push) Successful in 7m31s
build-prerelease / Build neuron-ada (push) Successful in 8m16s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m56s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m21s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s
build-prerelease / Build cortex binary (push) Successful in 4m5s
build-prerelease / Package cortex RPM (push) Successful in 1m30s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s
Phase 3 of plan-source-aware-loader-preflight. Adds an optional `source` field to `ModelProfile` and threads it through the router's cold-load path so a profile pointing at the helexa registry forwards `helexa:<id>` to neuron's `/models/load` instead of leaving neuron to substitute its `default_source` (typically `huggingface`).

Without this, an operator who declares `source = "helexa"` in models.toml would still see neuron fetch from HuggingFace: the catalogue → ModelSpec translation in `profile_to_spec` was dropping the scheme on the floor.

What lands:
- `cortex-core::catalogue::ModelProfile.source: Option<String>`. None is the default and preserves pre-Phase-3 behaviour.
- `cortex-gateway::router::qualified_model_id(profile)`: small pure helper, extracted from `profile_to_spec` so it can be unit-tested. Empty-string `source` is treated as None so operators who blank out a previously-set value don't trip a scheme-with-no-scheme failure mode in neuron.
- `models.example.toml` documents the new field with a commented-out helexa-scheme example pointing back at neuron.example.toml's matching sources block.

Tests:
- 2 new unit tests in `cortex-core::catalogue`: source-absent round-trip and source-present round-trip through TOML.
- 3 new unit tests in `cortex-gateway::router`: pass-through when None, prefix when Some, pass-through on empty-string source.
- ModelProfile literal in catalogue's existing test updated to carry `source: None`.

CI gate: cargo fmt --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace (24 test groups ok, zero failures).

Completes Phase 3. With Phases 1+2+3 landed:
- neuron parses `scheme:org/name`, routes per-source hf-hub Api with disambiguated cache.
- preflight returns structured errors before any device allocation.
- cortex catalogue declares per-model source jurisdiction and forwards it to neuron.

The registry itself (registry.helexa.ai service, MinIO, nginx, mirror fabric) is the next moving piece, landing under a separate project per the design discussion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
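
The commit message says neuron parses `scheme:org/name` and falls back to its `default_source` when no scheme is given, but neuron's code is not part of this diff. As a rough illustration of that contract only, here is a minimal sketch; `ModelRef` and `parse_model_ref` are hypothetical names, and the real parser may validate or reject inputs differently.

```rust
/// Hypothetical stand-in for whatever neuron parses a model reference into.
#[derive(Debug, PartialEq)]
struct ModelRef {
    /// Registry to fetch from, e.g. "helexa" or "huggingface".
    source: String,
    /// Repository id within that registry, e.g. "Qwen/Qwen3-30B".
    repo: String,
}

/// Split `scheme:org/name` at the first ':'; when there is no usable
/// scheme, keep the raw id and substitute `default_source`.
/// Assumption: malformed inputs such as ":org/name" simply fall through
/// to the default branch here; a real parser would likely reject them.
fn parse_model_ref(raw: &str, default_source: &str) -> ModelRef {
    match raw.split_once(':') {
        Some((scheme, rest)) if !scheme.is_empty() && !rest.is_empty() => ModelRef {
            source: scheme.to_string(),
            repo: rest.to_string(),
        },
        _ => ModelRef {
            source: default_source.to_string(),
            repo: raw.to_string(),
        },
    }
}
```

Under these assumptions, `parse_model_ref("helexa:Helexa/Qwen3.6-27B-Uncensored", "huggingface")` resolves to the `helexa` source, while the bare id `Qwen/Qwen3-30B` resolves to `huggingface`. That fallback is why cortex has to send the scheme explicitly.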
@@ -24,6 +24,17 @@ pub struct ModelProfile {
     /// Neurons where this model should never be evicted.
     #[serde(default)]
     pub pinned_on: Vec<String>,
+    /// Source scheme this profile's weights come from. When set, the
+    /// router prefixes `id` with `scheme:` before forwarding the load
+    /// request to neuron, ensuring the daemon fetches from the right
+    /// registry regardless of which entry happens to match `id`.
+    ///
+    /// `None` lets neuron substitute its own `default_source` (typically
+    /// `huggingface`). Set to `"helexa"` when the model is hosted in
+    /// the helexa registry: operator-procurement-grade audit relies
+    /// on this being explicit per model rather than implicit.
+    #[serde(default)]
+    pub source: Option<String>,
 }

 fn default_min_devices() -> u32 {
@@ -140,6 +151,7 @@ mod tests {
             min_devices: 2,
             min_device_vram_mb: Some(24_000),
             pinned_on: vec![],
+            source: None,
         }
     }

@@ -197,6 +209,29 @@ mod tests {
         assert_eq!(cat.resolve_alias("Qwen/Qwen3-8B"), "Qwen/Qwen3-8B");
     }

+    #[test]
+    fn source_defaults_to_none_when_absent_from_toml() {
+        let src = r#"
+            [[models]]
+            id = "Qwen/Qwen3-30B"
+            harness = "candle"
+        "#;
+        let cat: ModelCatalogue = toml::from_str(src).expect("parse models table");
+        assert!(cat.models[0].source.is_none());
+    }
+
+    #[test]
+    fn source_round_trips_through_toml() {
+        let src = r#"
+            [[models]]
+            id = "Helexa/Qwen3.6-27B-Uncensored"
+            harness = "candle"
+            source = "helexa"
+        "#;
+        let cat: ModelCatalogue = toml::from_str(src).expect("parse models table");
+        assert_eq!(cat.models[0].source.as_deref(), Some("helexa"));
+    }
+
     #[test]
     fn aliases_table_round_trips_through_toml() {
         let src = r#"
@@ -292,7 +292,7 @@ async fn profile_to_spec(
     };

     ModelSpec {
-        model_id: profile.id.clone(),
+        model_id: qualified_model_id(profile),
         harness: profile.harness.clone(),
         quant: profile.quant.clone(),
         tensor_parallel,
@@ -300,6 +300,22 @@ async fn profile_to_spec(
     }
 }

+/// Prefix the catalogue id with the scheme when one is declared, so
+/// neuron resolves the load against the right registry. Without this,
+/// a profile pointing at the helexa registry would resolve via
+/// neuron's `default_source` (typically `huggingface`) and fetch
+/// bytes from the wrong place. Profiles that omit `source` continue
+/// to pass the bare id through, preserving the pre-Phase-3 contract.
+///
+/// Stays at module scope (not nested in `profile_to_spec`) so the unit
+/// tests can exercise it without spinning up CortexState topology.
+fn qualified_model_id(profile: &ModelProfile) -> String {
+    match profile.source.as_deref() {
+        Some(scheme) if !scheme.is_empty() => format!("{scheme}:{}", profile.id),
+        _ => profile.id.clone(),
+    }
+}
+
 /// Resolve neuron's `/models/{id}/endpoint` to its inference URL and
 /// build the final `RouteDecision`. Shared by all three priority
 /// branches above.
@@ -375,7 +391,43 @@ fn rewrite_loopback_host(inference_url: &str, neuron_endpoint: &str) -> Option<S

 #[cfg(test)]
 mod tests {
-    use super::rewrite_loopback_host;
+    use super::{ModelProfile, qualified_model_id, rewrite_loopback_host};
+
+    fn bare_profile(id: &str, source: Option<&str>) -> ModelProfile {
+        ModelProfile {
+            id: id.into(),
+            harness: "candle".into(),
+            quant: None,
+            vram_mb: None,
+            min_devices: 1,
+            min_device_vram_mb: None,
+            pinned_on: vec![],
+            source: source.map(String::from),
+        }
+    }
+
+    #[test]
+    fn qualified_id_passes_through_when_source_absent() {
+        let p = bare_profile("Qwen/Qwen3-30B", None);
+        assert_eq!(qualified_model_id(&p), "Qwen/Qwen3-30B");
+    }
+
+    #[test]
+    fn qualified_id_prefixes_when_source_set() {
+        let p = bare_profile("Helexa/Qwen3.6-27B-Uncensored", Some("helexa"));
+        assert_eq!(
+            qualified_model_id(&p),
+            "helexa:Helexa/Qwen3.6-27B-Uncensored"
+        );
+    }
+
+    #[test]
+    fn qualified_id_passes_through_when_source_is_empty_string() {
+        // An empty scheme is treated as absent: neuron's default_source
+        // substitution kicks in.
+        let p = bare_profile("Qwen/Qwen3-30B", Some(""));
+        assert_eq!(qualified_model_id(&p), "Qwen/Qwen3-30B");
+    }

     #[test]
     fn rewrites_localhost_keeps_port_and_path() {
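
To try the helper outside the workspace, the sketch below copies the `match` from `qualified_model_id` onto a cut-down stand-in struct. `Profile` and the `profile` constructor are hypothetical and exist only for this illustration; the real `ModelProfile` carries more fields.

```rust
/// Cut-down stand-in for `ModelProfile`: only the two fields the helper reads.
struct Profile {
    id: String,
    source: Option<String>,
}

/// Same match arms as `qualified_model_id` in the diff above.
fn qualified_model_id(profile: &Profile) -> String {
    match profile.source.as_deref() {
        // A non-empty scheme is prepended as `scheme:id`.
        Some(scheme) if !scheme.is_empty() => format!("{scheme}:{}", profile.id),
        // `None` and `Some("")` both pass the bare id through.
        _ => profile.id.clone(),
    }
}

/// Hypothetical convenience constructor for the examples.
fn profile(id: &str, source: Option<&str>) -> Profile {
    Profile {
        id: id.to_string(),
        source: source.map(String::from),
    }
}
```

Two inputs the commit's tests do not cover behave as follows with this match. A whitespace-only source such as `" "` is not empty, so it is prepended verbatim. An `id` that already carries a scheme, with `source` also set, comes out double-prefixed (`helexa:helexa:...`). Whether either case needs handling depends on validation elsewhere in cortex or neuron, which this page does not show.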