feat(neuron,cortex-core): source-aware loader (scheme:org/name)
All checks were successful
CI / CUDA type-check (push) Successful in 46s
CI / Format (push) Successful in 32s
build-prerelease / Resolve version stamps (push) Successful in 42s
CI / Clippy (push) Successful in 2m40s
build-prerelease / Build cortex binary (push) Successful in 4m23s
CI / Test (push) Successful in 5m28s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 5m39s
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Build neuron-ampere (push) Successful in 7m53s
build-prerelease / Build neuron-ada (push) Successful in 5m18s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m59s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m6s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s
Phase 1 of plan-source-aware-loader-preflight. Makes neuron's
loader treat `huggingface:org/name` and `helexa:org/name` as
first-class distinct sources with per-source endpoint + cache,
while staying backwards-compatible with bare `org/name` ids.
Zero behavior change for existing operator configs.

Motivation: helexa is adding an EU-hosted registry
(`registry.helexa.ai`) alongside HF. Both speak HF-compatible
wire format, but the bytes, jurisdiction, trust root, and cache
namespace are distinct. The loader needs to disambiguate which
registry serves a given model id, and to keep their caches from
colliding on disk when both happen to host the same `org/name`.

What lands:
- `cortex-core::source` — new module. `ModelSourceId { scheme,
org, name }` with `FromStr` accepting both `scheme:org/name`
and bare `org/name`. `Display` round-trips. `repo_path()`
emits the `org/name` half for the hf-hub `Api::model(...)`
call regardless of which scheme/endpoint we're hitting.
Rejects malformed input with typed `ParseError` variants
(empty scheme, missing slash, scheme with `/`, name with
`:`, etc.).
- `neuron::config::CandleHarnessConfig` gains
`default_source: Option<String>` and
`sources: HashMap<String, SourceConfig>`. `SourceConfig`
mirrors what `hf_hub::ApiBuilder` consumes: endpoint URL,
optional `auth_env` (env var name read at startup so secrets
stay out of TOML), and optional `cache_dir`. The defaults
synthesise a `huggingface` entry pointing at
`https://huggingface.co` with the legacy `hf_cache` field as
its `cache_dir`, so existing configs that only set `hf_cache`
keep working unchanged.
- `CandleHarness::new(bind_url, &CandleHarnessConfig)` replaces
`CandleHarness::new(bind_url, hf_cache)`. Resolves every
configured source's auth env var and cache dir up front so
`hf_api_for(scheme)` is a pure HashMap lookup on the hot
load path. Only the `huggingface` scheme gets the legacy
`HF_HUB_CACHE`/`HF_HOME` env-var fallback chain; other
schemes resolve to whatever the operator typed.
- `hf_api()` -> `hf_api_for(scheme)`. Builds an
`hf_hub::Api` with the source's endpoint, cache_dir, and
auth token. Errors with a useful message naming the
configured schemes when an unknown scheme is requested.
- `CandleHarness::load_model` parses `spec.model_id` into a
`ModelSourceId`, substitutes `default_source` for bare ids,
and threads the parsed source through `preflight`,
`resolve_files`, `resolve_dense_files`, `load_arch_gguf`,
`load_arch_dense`, and `load_tp`. The hf-hub `Api::model()`
call now uses `source_id.repo_path()` so registry calls hit
the right URL shape regardless of scheme.
- `preflight()` signature gains a `&ModelSourceId` parameter
(it's the canonical id for log lines and error display);
`RepoFetchFailed.model_id` etc. now carry the
scheme-qualified form so operator-visible errors echo
exactly what was configured.
- `neuron.example.toml` documents the new
`[harness.candle.sources.*]` table with commented-out
examples for `huggingface` (explicit override) and `helexa`.
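
To make the id grammar above concrete, here is a minimal sketch of a `scheme:org/name` parser. It is not the code in `cortex-core::source`: the `Option<String>` scheme field, the exact `ParseError` variant names, and the choice to split at the first colon are all assumptions made for illustration. Only the type name, `repo_path()`, `with_default_scheme()`, and the accepted input shapes come from this commit.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for `cortex_core::source::ModelSourceId`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSourceId {
    /// `None` for a bare `org/name` id (assumed representation).
    pub scheme: Option<String>,
    pub org: String,
    pub name: String,
}

/// Illustrative error variants; the real `ParseError` may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseError {
    EmptyScheme,
    SchemeContainsSlash,
    MissingSlash,
    EmptyOrg,
    EmptyName,
    NameContainsColon,
}

impl FromStr for ModelSourceId {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split off an optional `scheme:` prefix at the first colon.
        let (scheme, rest) = match s.split_once(':') {
            Some(("", _)) => return Err(ParseError::EmptyScheme),
            Some((scheme, _)) if scheme.contains('/') => {
                return Err(ParseError::SchemeContainsSlash);
            }
            Some((scheme, rest)) => (Some(scheme.to_string()), rest),
            None => (None, s),
        };
        // The remainder must have the `org/name` shape.
        let (org, name) = rest.split_once('/').ok_or(ParseError::MissingSlash)?;
        if org.is_empty() {
            return Err(ParseError::EmptyOrg);
        }
        if name.is_empty() {
            return Err(ParseError::EmptyName);
        }
        if name.contains(':') {
            return Err(ParseError::NameContainsColon);
        }
        Ok(Self {
            scheme,
            org: org.to_string(),
            name: name.to_string(),
        })
    }
}

impl fmt::Display for ModelSourceId {
    /// Emits `scheme:org/name`, or bare `org/name` when no scheme is set.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if let Some(scheme) = &self.scheme {
            write!(f, "{scheme}:")?;
        }
        write!(f, "{}/{}", self.org, self.name)
    }
}

impl ModelSourceId {
    /// The `org/name` half, independent of scheme.
    pub fn repo_path(&self) -> String {
        format!("{}/{}", self.org, self.name)
    }

    /// Fills in `default` only when the id was written without a scheme.
    pub fn with_default_scheme(mut self, default: &str) -> Self {
        if self.scheme.is_none() {
            self.scheme = Some(default.to_string());
        }
        self
    }
}
```

Under these assumptions, parsing `HauhauCS/Qwen3.6` and calling `with_default_scheme("huggingface")` displays as `huggingface:HauhauCS/Qwen3.6`, while `repo_path()` still returns `HauhauCS/Qwen3.6`.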

Tests:
- 13 new unit tests in `cortex-core::source` covering parse /
display round-trip, default-scheme substitution semantics,
and every `ParseError` variant.
- 6 new unit tests in `neuron::config` covering the
`effective_sources` synthesis (legacy `hf_cache` carry-through,
explicit override preservation, helexa-alongside-huggingface)
and `effective_default_source` fallback.
- 2 new unit tests in `harness::candle::tests` covering
multi-scheme `hf_api_for` routing, including the
"unknown scheme" error path naming configured schemes.
- Preflight integration tests updated to construct
`ModelSourceId` and assert against the scheme-qualified
error form.
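
The multi-scheme routing those harness tests exercise can be pictured as a plain map lookup whose error lists the configured schemes. The sketch below uses only the standard library. `ResolvedSource`, `SourceTable`, and the error wording are hypothetical names invented here, and the real `hf_api_for` returns an `hf_hub::Api` rather than a config record.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical resolved form of one `[harness.candle.sources.*]` entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResolvedSource {
    pub endpoint: String,
    pub cache_dir: Option<PathBuf>,
    /// Token already read from the env var named by `auth_env` (assumed).
    pub token: Option<String>,
}

/// Hypothetical table built once at startup, keyed by scheme.
pub struct SourceTable {
    sources: HashMap<String, ResolvedSource>,
}

impl SourceTable {
    pub fn new(sources: HashMap<String, ResolvedSource>) -> Self {
        Self { sources }
    }

    /// Pure map lookup; an unknown scheme yields an error that names
    /// every configured scheme, sorted for stable output.
    pub fn lookup(&self, scheme: &str) -> Result<&ResolvedSource, String> {
        self.sources.get(scheme).ok_or_else(|| {
            let mut known: Vec<&str> = self.sources.keys().map(String::as_str).collect();
            known.sort_unstable();
            format!(
                "unknown model source scheme `{scheme}`; configured schemes: {}",
                known.join(", ")
            )
        })
    }
}
```

Resolving auth and cache settings once, before any load request, keeps the per-load path free of environment reads. Giving each scheme its own `cache_dir` is what stops two registries hosting the same `org/name` from sharing files on disk.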

CI gate: cargo fmt --check, cargo clippy --workspace
--all-targets -- -D warnings, cargo test --workspace (all 24
test groups ok, zero failures).

Out of scope (Phase 3):
- Cortex catalogue `source` field — independent of Phase 1+2,
ships when the registry comes online.
- `helexa` source endpoint itself — separate project; this
PR adds the client-side rails only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -12,6 +12,7 @@ use axum::http::StatusCode;
 use axum::response::{IntoResponse, Json};
 use axum::routing::get;
 use cortex_core::harness::ModelSpec;
+use cortex_core::source::ModelSourceId;
 use neuron::harness::preflight::{PreflightError, SourceFormat, preflight};
 use serde_json::{Value, json};
 use std::sync::Arc;
@@ -89,6 +90,15 @@ fn spec(model_id: &str, tp: Option<u32>, quant: Option<&str>) -> ModelSpec {
     }
 }
 
+/// Build a `ModelSourceId` from a bare `org/name` test input,
+/// substituting the default scheme so the mock route key matches.
+fn sid(model_id: &str) -> ModelSourceId {
+    model_id
+        .parse::<ModelSourceId>()
+        .expect("test model_id parses")
+        .with_default_scheme("huggingface")
+}
+
 #[tokio::test]
 async fn preflight_gguf_tp_rejected_over_http() {
     let cache = tempfile::tempdir().expect("tempdir");
@@ -107,7 +117,7 @@ async fn preflight_gguf_tp_rejected_over_http() {
 
     let api = build_api(&endpoint, cache.path());
     let s = spec("HauhauCS/Qwen3.6", Some(2), Some("q6k"));
-    let err = preflight(&api, &s).await.unwrap_err();
+    let err = preflight(&api, &sid(&s.model_id), &s).await.unwrap_err();
     match err {
         PreflightError::TpRequiresSafetensors {
             model_id,
@@ -115,7 +125,9 @@ async fn preflight_gguf_tp_rejected_over_http() {
             gguf_quants,
             ..
         } => {
-            assert_eq!(model_id, "HauhauCS/Qwen3.6");
+            // Scheme prefix surfaces in error display now that
+            // preflight is source-aware.
+            assert_eq!(model_id, "huggingface:HauhauCS/Qwen3.6");
             assert_eq!(tp_size, 2);
             assert_eq!(gguf_quants.len(), 3);
         }
@@ -140,7 +152,7 @@ async fn preflight_gguf_quant_suggestion_over_http() {
 
     let api = build_api(&endpoint, cache.path());
     let s = spec("HauhauCS/Qwen3.6", Some(1), Some("q6k"));
-    let err = preflight(&api, &s).await.unwrap_err();
+    let err = preflight(&api, &sid(&s.model_id), &s).await.unwrap_err();
     match err {
         PreflightError::QuantNotFound {
             requested,
@@ -176,7 +188,9 @@ async fn preflight_dense_safetensors_tp_ok() {
 
     let api = build_api(&endpoint, cache.path());
     let s = spec("Qwen/Q3-30B", Some(2), Some("q5k"));
-    let plan = preflight(&api, &s).await.expect("dense+tp should succeed");
+    let plan = preflight(&api, &sid(&s.model_id), &s)
+        .await
+        .expect("dense+tp should succeed");
     assert_eq!(plan.tp_size, 2);
     assert!(plan.picked_quant_file.is_none());
     assert!(matches!(
@@ -197,7 +211,7 @@ async fn preflight_gguf_single_gpu_good_quant() {
 
     let api = build_api(&endpoint, cache.path());
     let s = spec("HauhauCS/Qwen3.6", Some(1), Some("q6_k_p"));
-    let plan = preflight(&api, &s)
+    let plan = preflight(&api, &sid(&s.model_id), &s)
         .await
         .expect("good quant should succeed");
     assert_eq!(plan.tp_size, 1);
@@ -219,7 +233,7 @@ async fn preflight_repo_fetch_failed_on_404() {
 
     let api = build_api(&endpoint, cache.path());
     let s = spec("DoesNot/Exist", Some(1), None);
-    let err = preflight(&api, &s).await.unwrap_err();
+    let err = preflight(&api, &sid(&s.model_id), &s).await.unwrap_err();
     assert!(
         matches!(err, PreflightError::RepoFetchFailed { .. }),
         "expected RepoFetchFailed, got {err:?}"
@@ -238,7 +252,7 @@ async fn preflight_empty_repo_rejected() {
 
     let api = build_api(&endpoint, cache.path());
     let s = spec("Empty/Repo", Some(1), None);
-    let err = preflight(&api, &s).await.unwrap_err();
+    let err = preflight(&api, &sid(&s.model_id), &s).await.unwrap_err();
     assert!(
         matches!(err, PreflightError::EmptyRepo { .. }),
         "expected EmptyRepo, got {err:?}"
@@ -264,6 +278,8 @@ async fn preflight_mixed_repo_prefers_safetensors() {
     // TP=2 + quant should succeed via the dense path even though a
     // GGUF is present — the dense path handles ISQ.
     let s = spec("Mixed/Repo", Some(2), Some("q5k"));
-    let plan = preflight(&api, &s).await.expect("mixed should succeed");
+    let plan = preflight(&api, &sid(&s.model_id), &s)
+        .await
+        .expect("mixed should succeed");
     assert!(matches!(plan.format, SourceFormat::Mixed { .. }));
 }