cortex/crates/neuron/tests/preflight.rs
rob thijssen d4e1b05956
feat(neuron,cortex-core): source-aware loader (scheme:org/name)
Phase 1 of plan-source-aware-loader-preflight. Makes neuron's
loader treat `huggingface:org/name` and `helexa:org/name` as
first-class distinct sources with per-source endpoint + cache,
while staying backwards-compatible with bare `org/name` ids.
Zero behavior change for existing operator configs.

Motivation: helexa is adding an EU-hosted registry
(`registry.helexa.ai`) alongside HF. Both speak HF-compatible
wire format, but the bytes, jurisdiction, trust root, and cache
namespace are distinct. The loader needs to disambiguate which
registry serves a given model id, and to keep their caches from
colliding on disk when both happen to host the same `org/name`.

What lands:

- `cortex-core::source` — new module. `ModelSourceId { scheme,
  org, name }` with `FromStr` accepting both `scheme:org/name`
  and bare `org/name`. `Display` round-trips. `repo_path()`
  emits the `org/name` half for the hf-hub `Api::model(...)`
  call regardless of which scheme/endpoint we're hitting.
  Rejects malformed input with typed `ParseError` variants
  (empty scheme, missing slash, scheme with `/`, name with
  `:`, etc.).

- `neuron::config::CandleHarnessConfig` gains
  `default_source: Option<String>` and
  `sources: HashMap<String, SourceConfig>`. `SourceConfig`
  mirrors what `hf_hub::ApiBuilder` consumes: endpoint URL,
  optional `auth_env` (env var name read at startup so secrets
  stay out of TOML), and optional cache_dir. Defaults
  synthesise a `huggingface` entry pointing at
  `https://huggingface.co` with the legacy `hf_cache` field as
  its cache_dir — so existing configs that only set `hf_cache`
  keep working unchanged.

- `CandleHarness::new(bind_url, &CandleHarnessConfig)` replaces
  `CandleHarness::new(bind_url, hf_cache)`. Resolves every
  configured source's auth env var and cache dir up front so
  `hf_api_for(scheme)` is a pure HashMap lookup on the hot
  load path. Only the `huggingface` scheme gets the legacy
  `HF_HUB_CACHE`/`HF_HOME` env-var fallback chain; other
  schemes resolve to whatever the operator typed.

- `hf_api()` -> `hf_api_for(scheme)`. Builds an
  `hf_hub::Api` with the source's endpoint, cache_dir, and
  auth token. Errors with a useful message naming the
  configured schemes when an unknown scheme is requested.

- `CandleHarness::load_model` parses `spec.model_id` into a
  `ModelSourceId`, substitutes `default_source` for bare ids,
  and threads the parsed source through `preflight`,
  `resolve_files`, `resolve_dense_files`, `load_arch_gguf`,
  `load_arch_dense`, and `load_tp`. The hf-hub `Api::model()`
  call now uses `source_id.repo_path()` so registry calls hit
  the right URL shape regardless of scheme.

- `preflight()` signature gains a `&ModelSourceId` parameter
  (it's the canonical id for log lines and error display);
  `RepoFetchFailed.model_id` etc. now carry the
  scheme-qualified form so operator-visible errors echo
  exactly what was configured.

- `neuron.example.toml` documents the new
  `[harness.candle.sources.*]` table with commented-out
  examples for `huggingface` (explicit override) and `helexa`.
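
A minimal sketch of the `scheme:org/name` parsing described for `cortex-core::source` above. It assumes a three-field struct with an optional scheme; the `ParseError` variant names are hypothetical stand-ins, the body of `with_default_scheme` is a guess based on how the test helper calls it, and the real module may validate more cases.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for `cortex_core::source::ModelSourceId`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSourceId {
    /// `None` for a bare `org/name` id.
    pub scheme: Option<String>,
    pub org: String,
    pub name: String,
}

/// Hypothetical error variants; the real `ParseError` may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseError {
    EmptyScheme,
    SchemeContainsSlash,
    MissingSlash,
    EmptyOrg,
    EmptyName,
    NameContainsColon,
}

impl FromStr for ModelSourceId {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split off an optional `scheme:` prefix at the first colon.
        let (scheme, rest) = match s.split_once(':') {
            Some(("", _)) => return Err(ParseError::EmptyScheme),
            Some((scheme, _)) if scheme.contains('/') => {
                return Err(ParseError::SchemeContainsSlash);
            }
            Some((scheme, rest)) => (Some(scheme.to_string()), rest),
            None => (None, s),
        };
        // The remainder must be `org/name`.
        let (org, name) = rest.split_once('/').ok_or(ParseError::MissingSlash)?;
        if org.is_empty() {
            return Err(ParseError::EmptyOrg);
        }
        if name.is_empty() {
            return Err(ParseError::EmptyName);
        }
        if name.contains(':') {
            return Err(ParseError::NameContainsColon);
        }
        Ok(Self {
            scheme,
            org: org.to_string(),
            name: name.to_string(),
        })
    }
}

impl ModelSourceId {
    /// Fill in `default` only when the id was written without a scheme.
    pub fn with_default_scheme(mut self, default: &str) -> Self {
        if self.scheme.is_none() {
            self.scheme = Some(default.to_string());
        }
        self
    }

    /// The `org/name` half, independent of scheme.
    pub fn repo_path(&self) -> String {
        format!("{}/{}", self.org, self.name)
    }
}

impl fmt::Display for ModelSourceId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if let Some(scheme) = &self.scheme {
            write!(f, "{scheme}:")?;
        }
        write!(f, "{}/{}", self.org, self.name)
    }
}
```

Under these assumptions, `"HauhauCS/Qwen3.6".parse::<ModelSourceId>()` yields a bare id, and calling `.with_default_scheme("huggingface")` on it produces an id that displays as `huggingface:HauhauCS/Qwen3.6` while `repo_path()` still returns `HauhauCS/Qwen3.6`.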

Tests:

- 13 new unit tests in `cortex-core::source` covering parse /
  display round-trip, default-scheme substitution semantics,
  and every `ParseError` variant.
- 6 new unit tests in `neuron::config` covering the
  `effective_sources` synth (legacy `hf_cache` carry-through,
  explicit override preservation, helexa-alongside-huggingface)
  and `effective_default_source` fallback.
- 2 new unit tests in `harness::candle::tests` covering
  multi-scheme `hf_api_for` routing, including the
  "unknown scheme" error path naming configured schemes.
- Preflight integration tests updated to construct
  `ModelSourceId` and assert against the scheme-qualified
  error form.
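
The `effective_sources` synthesis and the "unknown scheme" error path covered by these tests could look roughly like the sketch below. The struct layout, the function signatures, and the helper name `source_for` are assumptions made for illustration; only the field names (`endpoint`, `auth_env`, `cache_dir`), the `huggingface` default, and the `https://huggingface.co` endpoint come from the description above. Whether an explicit `huggingface` entry inherits `hf_cache` is not stated, so this sketch leaves explicit entries untouched.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical mirror of the `SourceConfig` described in the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SourceConfig {
    pub endpoint: String,
    /// Name of an environment variable holding the auth token.
    pub auth_env: Option<String>,
    pub cache_dir: Option<PathBuf>,
}

/// Return the configured sources, adding a default `huggingface` entry
/// (with the legacy `hf_cache` as its cache dir) when none was configured.
pub fn effective_sources(
    hf_cache: Option<PathBuf>,
    configured: &HashMap<String, SourceConfig>,
) -> HashMap<String, SourceConfig> {
    let mut out = configured.clone();
    // An explicit `huggingface` entry wins; otherwise synthesise one.
    out.entry("huggingface".to_string())
        .or_insert_with(|| SourceConfig {
            endpoint: "https://huggingface.co".to_string(),
            auth_env: None,
            cache_dir: hf_cache,
        });
    out
}

/// Look up a scheme, naming the configured schemes when it is unknown.
pub fn source_for<'a>(
    sources: &'a HashMap<String, SourceConfig>,
    scheme: &str,
) -> Result<&'a SourceConfig, String> {
    sources.get(scheme).ok_or_else(|| {
        // Sort so the error message is stable across runs.
        let mut known: Vec<&str> = sources.keys().map(String::as_str).collect();
        known.sort_unstable();
        format!(
            "unknown source scheme `{scheme}`; configured schemes: {}",
            known.join(", ")
        )
    })
}
```

With an empty `configured` map and `hf_cache` set, the result holds a single `huggingface` entry carrying that cache dir, which corresponds to the legacy carry-through case. Adding a `helexa` entry leaves both schemes resolvable side by side.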

CI gate: cargo fmt --check, cargo clippy --workspace
--all-targets -- -D warnings, cargo test --workspace (all 24
test groups ok, zero failures).

Out of scope (Phase 3):
- Cortex catalogue `source` field — independent of Phase 1+2,
  ships when the registry comes online.
- `helexa` source endpoint itself — separate project; this
  PR adds the client-side rails only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 13:42:11 +03:00


//! End-to-end preflight tests against a mock HF-compatible server.
//!
//! Unit tests in `harness/preflight.rs` exercise the classifier and
//! feasibility table against synthetic file lists. These tests close
//! the loop: spawn an axum server that returns a `RepoInfo`-shaped
//! JSON payload at `/api/models/{org}/{name}`, point `hf_hub::Api` at
//! it, and assert `preflight()` returns the expected outcome.
use axum::Router;
use axum::extract::Path;
use axum::http::StatusCode;
use axum::response::{IntoResponse, Json};
use axum::routing::get;
use cortex_core::harness::ModelSpec;
use cortex_core::source::ModelSourceId;
use neuron::harness::preflight::{PreflightError, SourceFormat, preflight};
use serde_json::{Value, json};
use std::sync::Arc;
use std::sync::Mutex;
/// Per-test mock state: a map from `{org}/{name}` to the JSON body the
/// mock server returns at the corresponding `/api/models/{org}/{name}`
/// endpoint. `None` means "respond 404".
type MockBodies = Arc<Mutex<std::collections::HashMap<String, Option<Value>>>>;
async fn spawn_mock(bodies: MockBodies) -> String {
    // hf-hub 0.4 calls /api/models/{org}/{name}/revision/main for
    // `repo.info()`. We route both shapes so the test stays robust
    // to a future hf-hub upgrade that drops the `/revision/main`
    // suffix.
    let app = Router::new()
        .route("/api/models/{org}/{name}", get(model_info))
        .route(
            "/api/models/{org}/{name}/revision/{rev}",
            get(model_info_rev),
        )
        .with_state(bodies);
    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
    let addr = listener.local_addr().unwrap();
    tokio::spawn(async move {
        axum::serve(listener, app).await.unwrap();
    });
    format!("http://{addr}")
}
async fn model_info(
    Path((org, name)): Path<(String, String)>,
    axum::extract::State(bodies): axum::extract::State<MockBodies>,
) -> impl IntoResponse {
    respond(&format!("{org}/{name}"), &bodies)
}
async fn model_info_rev(
    Path((org, name, _rev)): Path<(String, String, String)>,
    axum::extract::State(bodies): axum::extract::State<MockBodies>,
) -> impl IntoResponse {
    respond(&format!("{org}/{name}"), &bodies)
}
fn respond(key: &str, bodies: &MockBodies) -> axum::response::Response {
    let entry = bodies.lock().unwrap().get(key).cloned();
    match entry {
        Some(Some(body)) => Json(body).into_response(),
        Some(None) | None => (StatusCode::NOT_FOUND, "not found").into_response(),
    }
}
fn build_api(endpoint: &str, cache_dir: &std::path::Path) -> hf_hub::api::tokio::Api {
    hf_hub::api::tokio::ApiBuilder::new()
        .with_endpoint(endpoint.to_string())
        .with_cache_dir(cache_dir.to_path_buf())
        .build()
        .expect("build hf-hub Api")
}
fn siblings(filenames: &[&str]) -> Value {
    json!({
        "sha": "0000000000000000000000000000000000000000",
        "siblings": filenames.iter().map(|f| json!({ "rfilename": f })).collect::<Vec<_>>(),
    })
}
fn spec(model_id: &str, tp: Option<u32>, quant: Option<&str>) -> ModelSpec {
    ModelSpec {
        model_id: model_id.into(),
        harness: "candle".into(),
        quant: quant.map(String::from),
        tensor_parallel: tp,
        devices: None,
    }
}
/// Build a `ModelSourceId` from a bare `org/name` test input,
/// substituting the default scheme so the mock route key matches.
fn sid(model_id: &str) -> ModelSourceId {
    model_id
        .parse::<ModelSourceId>()
        .expect("test model_id parses")
        .with_default_scheme("huggingface")
}
#[tokio::test]
async fn preflight_gguf_tp_rejected_over_http() {
    let cache = tempfile::tempdir().expect("tempdir");
    let bodies: MockBodies = Arc::new(Mutex::new(Default::default()));
    bodies.lock().unwrap().insert(
        "HauhauCS/Qwen3.6".to_string(),
        Some(siblings(&[
            "README.md",
            ".gitattributes",
            "Qwen3.6-Q4_K_P.gguf",
            "Qwen3.6-Q6_K_P.gguf",
            "Qwen3.6-Q8_K_P.gguf",
        ])),
    );
    let endpoint = spawn_mock(bodies).await;
    let api = build_api(&endpoint, cache.path());
    let s = spec("HauhauCS/Qwen3.6", Some(2), Some("q6k"));
    let err = preflight(&api, &sid(&s.model_id), &s).await.unwrap_err();
    match err {
        PreflightError::TpRequiresSafetensors {
            model_id,
            tp_size,
            gguf_quants,
            ..
        } => {
            // Scheme prefix surfaces in error display now that
            // preflight is source-aware.
            assert_eq!(model_id, "huggingface:HauhauCS/Qwen3.6");
            assert_eq!(tp_size, 2);
            assert_eq!(gguf_quants.len(), 3);
        }
        other => panic!("expected TpRequiresSafetensors, got {other:?}"),
    }
}
#[tokio::test]
async fn preflight_gguf_quant_suggestion_over_http() {
    let cache = tempfile::tempdir().expect("tempdir");
    let bodies: MockBodies = Arc::new(Mutex::new(Default::default()));
    bodies.lock().unwrap().insert(
        "HauhauCS/Qwen3.6".to_string(),
        Some(siblings(&[
            "Qwen3.6-Q4_K_P.gguf",
            "Qwen3.6-Q5_K_P.gguf",
            "Qwen3.6-Q6_K_P.gguf",
            "Qwen3.6-Q8_K_P.gguf",
        ])),
    );
    let endpoint = spawn_mock(bodies).await;
    let api = build_api(&endpoint, cache.path());
    let s = spec("HauhauCS/Qwen3.6", Some(1), Some("q6k"));
    let err = preflight(&api, &sid(&s.model_id), &s).await.unwrap_err();
    match err {
        PreflightError::QuantNotFound {
            requested,
            nearest,
            available,
            ..
        } => {
            assert_eq!(requested, "q6k");
            assert_eq!(nearest.as_deref(), Some("q6_k_p"));
            assert_eq!(available.len(), 4);
        }
        other => panic!("expected QuantNotFound, got {other:?}"),
    }
}
#[tokio::test]
async fn preflight_dense_safetensors_tp_ok() {
    let cache = tempfile::tempdir().expect("tempdir");
    let bodies: MockBodies = Arc::new(Mutex::new(Default::default()));
    bodies.lock().unwrap().insert(
        "Qwen/Q3-30B".to_string(),
        Some(siblings(&[
            "config.json",
            "tokenizer.json",
            "tokenizer_config.json",
            "model.safetensors.index.json",
            "model-00001-of-00006.safetensors",
            "model-00002-of-00006.safetensors",
            "model-00003-of-00006.safetensors",
        ])),
    );
    let endpoint = spawn_mock(bodies).await;
    let api = build_api(&endpoint, cache.path());
    let s = spec("Qwen/Q3-30B", Some(2), Some("q5k"));
    let plan = preflight(&api, &sid(&s.model_id), &s)
        .await
        .expect("dense+tp should succeed");
    assert_eq!(plan.tp_size, 2);
    assert!(plan.picked_quant_file.is_none());
    assert!(matches!(
        plan.format,
        SourceFormat::DenseSafetensors { sharded: true }
    ));
}
#[tokio::test]
async fn preflight_gguf_single_gpu_good_quant() {
    let cache = tempfile::tempdir().expect("tempdir");
    let bodies: MockBodies = Arc::new(Mutex::new(Default::default()));
    bodies.lock().unwrap().insert(
        "HauhauCS/Qwen3.6".to_string(),
        Some(siblings(&["Qwen3.6-Q4_K_P.gguf", "Qwen3.6-Q6_K_P.gguf"])),
    );
    let endpoint = spawn_mock(bodies).await;
    let api = build_api(&endpoint, cache.path());
    let s = spec("HauhauCS/Qwen3.6", Some(1), Some("q6_k_p"));
    let plan = preflight(&api, &sid(&s.model_id), &s)
        .await
        .expect("good quant should succeed");
    assert_eq!(plan.tp_size, 1);
    assert_eq!(
        plan.picked_quant_file.as_deref(),
        Some("Qwen3.6-Q6_K_P.gguf")
    );
}
#[tokio::test]
async fn preflight_repo_fetch_failed_on_404() {
    // The mock server has no entry for this id, so it responds 404
    // and exercises the RepoFetchFailed path (the error the HauhauCS
    // scenario would have produced had preflight run before the
    // cache download was attempted).
    let cache = tempfile::tempdir().expect("tempdir");
    let bodies: MockBodies = Arc::new(Mutex::new(Default::default()));
    let endpoint = spawn_mock(bodies).await;
    let api = build_api(&endpoint, cache.path());
    let s = spec("DoesNot/Exist", Some(1), None);
    let err = preflight(&api, &sid(&s.model_id), &s).await.unwrap_err();
    assert!(
        matches!(err, PreflightError::RepoFetchFailed { .. }),
        "expected RepoFetchFailed, got {err:?}"
    );
}
#[tokio::test]
async fn preflight_empty_repo_rejected() {
    let cache = tempfile::tempdir().expect("tempdir");
    let bodies: MockBodies = Arc::new(Mutex::new(Default::default()));
    bodies.lock().unwrap().insert(
        "Empty/Repo".to_string(),
        Some(siblings(&["README.md", "tokenizer.json"])),
    );
    let endpoint = spawn_mock(bodies).await;
    let api = build_api(&endpoint, cache.path());
    let s = spec("Empty/Repo", Some(1), None);
    let err = preflight(&api, &sid(&s.model_id), &s).await.unwrap_err();
    assert!(
        matches!(err, PreflightError::EmptyRepo { .. }),
        "expected EmptyRepo, got {err:?}"
    );
}
#[tokio::test]
async fn preflight_mixed_repo_prefers_safetensors() {
    let cache = tempfile::tempdir().expect("tempdir");
    let bodies: MockBodies = Arc::new(Mutex::new(Default::default()));
    bodies.lock().unwrap().insert(
        "Mixed/Repo".to_string(),
        Some(siblings(&[
            "config.json",
            "tokenizer.json",
            "model.safetensors",
            "model-Q4_K_M.gguf",
        ])),
    );
    let endpoint = spawn_mock(bodies).await;
    let api = build_api(&endpoint, cache.path());
    // TP=2 + quant should succeed via the dense path even though a
    // GGUF is present: the dense path handles ISQ.
    let s = spec("Mixed/Repo", Some(2), Some("q5k"));
    let plan = preflight(&api, &sid(&s.model_id), &s)
        .await
        .expect("mixed should succeed");
    assert!(matches!(plan.format, SourceFormat::Mixed { .. }));
}