feat(cortex): unified /v1/models — catalogue × topology feasibility + cold-load
Some checks failed
build-prerelease / Resolve version stamps (push) Successful in 45s
CI / Format (push) Successful in 48s
CI / Clippy (push) Successful in 2m12s
CI / Test (push) Successful in 4m42s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build cortex binary (push) Successful in 5m10s
build-prerelease / Build neuron-blackwell (push) Successful in 3m35s
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled
Realises [project-unified-models-endpoint]: cortex now surfaces every
model the operator has provisioned in the catalogue, transparently
cold-loads on the first request, and routes the request once the load
is done — without per-node configuration or client awareness of which
neuron hosts what.
cortex-core changes:
- NodeState gains `discovery: Option<DiscoveryResponse>` — populated
once per neuron on first successful poll, cached forever after
(topology is invariant for a neuron process).
- ModelProfile gains `is_feasible_on(neuron, devices)` with the
pinned_on / min_devices / min_device_vram_mb logic + 5 unit tests.
- CortexModelEntry expanded with OpenAI-compatible (`id`, `object`,
`created`, `owned_by`) plus helexa-specific extension fields
(`loaded`, `feasible_on`, `locations`).
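A minimal, self-contained sketch of the first two cortex-core changes: cache a neuron's discovery result once, then check a profile against the cached devices. The struct shapes, the `Device::vram_mb` field and the closure-based `fetch` (standing in for the async `GET /discovery` call) are hypothetical. The commit does not spell out how `pinned_on` combines with the topology checks, so this sketch assumes a non-empty `pinned_on` is a hard restriction and that `min_devices` counts GPUs that each clear `min_device_vram_mb`.

```rust
/// Hypothetical cut-down view of one GPU from a neuron's discovery reply.
#[derive(Debug, Clone)]
pub struct Device {
    pub vram_mb: u64,
}

/// Hypothetical cut-down discovery payload.
#[derive(Debug, Clone)]
pub struct DiscoveryResponse {
    pub devices: Vec<Device>,
}

/// Hypothetical cut-down per-neuron state held by cortex.
pub struct NodeState {
    pub name: String,
    pub discovery: Option<DiscoveryResponse>,
}

impl NodeState {
    /// Fetch topology only while nothing is cached. A neuron's devices do not
    /// change for the life of its process, so the first success is kept; a
    /// failed fetch leaves the cache empty and the next poll retries.
    /// `fetch` stands in for the real async HTTP call (assumption).
    pub fn maybe_poll_discovery<F>(&mut self, fetch: F)
    where
        F: FnOnce(&str) -> Result<DiscoveryResponse, String>,
    {
        if self.discovery.is_some() {
            return; // cached forever after the first success
        }
        if let Ok(d) = fetch(&self.name) {
            self.discovery = Some(d);
        }
    }
}

/// Hypothetical subset of a catalogue profile.
pub struct ModelProfile {
    pub pinned_on: Vec<String>,
    pub min_devices: usize,
    pub min_device_vram_mb: u64,
}

impl ModelProfile {
    pub fn is_feasible_on(&self, neuron: &str, devices: &[Device]) -> bool {
        // Assumption: a non-empty pinned_on list is a hard restriction.
        if !self.pinned_on.is_empty() && !self.pinned_on.iter().any(|n| n == neuron) {
            return false;
        }
        // Count GPUs that individually clear the per-device VRAM floor.
        let qualifying = devices
            .iter()
            .filter(|d| d.vram_mb >= self.min_device_vram_mb)
            .count();
        // Assumption: min_devices = 0 is treated as "needs one device".
        qualifying >= self.min_devices.max(1)
    }
}
```

Because `pick_feasible_neuron` is described as preferring pinned neurons and falling back to any feasible one, `pinned_on` may instead be a soft preference in the real code; under that reading the first check in `is_feasible_on` would go away.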
cortex-gateway changes:
- poller.rs: `maybe_poll_discovery` fetches `GET /discovery` once per
neuron and caches on NodeState.
- handlers.rs::list_models rewritten as union of (catalogue × topology
feasibility) + (currently loaded somewhere). Catalogue-defined models
surface even when not yet loaded.
- router.rs::resolve gains priority 3 (catalogue cold-load):
  1. loaded somewhere → route there
  2. unloaded somewhere → route + lazy load via neuron
  3. in catalogue → pick feasible neuron, POST /models/load, wait,
     route. Cache the new entry locally so subsequent requests skip
     the poll wait.
  4. else 404
- pick_feasible_neuron prefers pinned_on neurons, falls back to any
feasible one (stable by name).
- profile_to_spec translates ModelProfile → ModelSpec, picking devices
by VRAM floor and setting tensor_parallel = min_devices for multi-
device profiles.
- "already loaded" responses from neuron are tolerated (two concurrent
requests racing the same cold-load is a benign outcome).
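The routing cascade in `router.rs::resolve`, together with `pick_feasible_neuron` and the "already loaded" tolerance, can be sketched as a pure decision function. Everything here is hypothetical: the real code is async, talks HTTP to neurons and works on `NodeState`/`ModelProfile`, whereas this sketch reduces that to plain maps, returns a `Route` instead of performing the load, and models the neuron's load reply as a made-up `LoadReply` enum. It also assumes a catalogued model with no feasible neuron falls through to 404.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-neuron model status, as seen by cortex's poller.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Loaded,
    Unloaded,
}

/// Hypothetical catalogue view: neurons that passed the feasibility
/// check, plus the profile's pinned_on list.
#[derive(Debug, Default)]
pub struct CatalogueEntry {
    pub pinned_on: Vec<String>,
    pub feasible_on: Vec<String>,
}

/// Routing decision, one variant per priority level.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Loaded(String),   // 1. resident on this neuron: route there
    LazyLoad(String), // 2. known but unloaded here: route, neuron lazy-loads
    ColdLoad(String), // 3. catalogue only: POST /models/load here first
    NotFound,         // 4. 404
}

/// Hypothetical shape of a neuron's reply to POST /models/load.
pub enum LoadReply {
    Loaded,
    AlreadyLoaded,
    Failed(String),
}

/// Prefer a pinned neuron, else any feasible one; sorting first makes
/// the choice stable by name.
pub fn pick_feasible_neuron(entry: &CatalogueEntry) -> Option<String> {
    let mut candidates = entry.feasible_on.clone();
    candidates.sort();
    let pinned = candidates.iter().find(|n| entry.pinned_on.contains(*n)).cloned();
    pinned.or_else(|| candidates.first().cloned())
}

pub fn resolve(
    model: &str,
    placements: &BTreeMap<String, BTreeMap<String, Status>>, // neuron -> model -> status
    catalogue: &BTreeMap<String, CatalogueEntry>,
) -> Route {
    // 1. Loaded somewhere.
    for (neuron, models) in placements {
        if models.get(model) == Some(&Status::Loaded) {
            return Route::Loaded(neuron.clone());
        }
    }
    // 2. Present but unloaded somewhere.
    for (neuron, models) in placements {
        if models.contains_key(model) {
            return Route::LazyLoad(neuron.clone());
        }
    }
    // 3. Catalogue cold-load; 4. otherwise (assumed) 404.
    match catalogue.get(model).and_then(pick_feasible_neuron) {
        Some(neuron) => Route::ColdLoad(neuron),
        None => Route::NotFound,
    }
}

/// A concurrent request may win the race to load the same model; its
/// "already loaded" reply still means the model is there.
pub fn load_succeeded(reply: &LoadReply) -> bool {
    matches!(reply, LoadReply::Loaded | LoadReply::AlreadyLoaded)
}
```

Returning a decision instead of acting keeps the priority order testable without a fleet. A caller would then issue `POST /models/load` for `Route::ColdLoad`, treat `LoadReply::AlreadyLoaded` as success, and record the new placement locally so the next request hits priority 1 without waiting for a poll.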
models.example.toml rewritten to reflect the canonical helexa fleet
(beast = 2x RTX 5090, benjy = RTX 4090, quadbrat = RTX 3060) with a
working TP example (Qwen3.6-27B pinned on beast) plus single-GPU
profiles for the smaller models.
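A sketch of how `profile_to_spec` might turn a profile into a load spec for a fleet like the one above: filter devices by the VRAM floor, take the first `min_devices` of them, and set `tensor_parallel` only when more than one device is needed. `Device::index`, `ModelSpec` and its `Option<usize>` `tensor_parallel` field are hypothetical shapes, not cortex's actual types; the example figures assume nominal capacities of 32 GB per RTX 5090 and 24 GB for the RTX 4090.

```rust
/// Hypothetical GPU descriptor: device index plus total VRAM.
#[derive(Debug, Clone)]
pub struct Device {
    pub index: u32,
    pub vram_mb: u64,
}

/// Hypothetical subset of a catalogue profile.
pub struct ModelProfile {
    pub id: String,
    pub min_devices: usize,
    pub min_device_vram_mb: u64,
}

/// Hypothetical load request sent to a neuron.
#[derive(Debug, PartialEq, Eq)]
pub struct ModelSpec {
    pub model_id: String,
    pub devices: Vec<u32>,
    pub tensor_parallel: Option<usize>,
}

/// Pick the lowest-indexed devices that clear the VRAM floor; shard
/// across exactly min_devices of them when more than one is required.
pub fn profile_to_spec(profile: &ModelProfile, devices: &[Device]) -> Option<ModelSpec> {
    let want = profile.min_devices.max(1);
    let mut chosen: Vec<u32> = devices
        .iter()
        .filter(|d| d.vram_mb >= profile.min_device_vram_mb)
        .map(|d| d.index)
        .collect();
    chosen.sort_unstable();
    if chosen.len() < want {
        return None; // this neuron's topology cannot satisfy the profile
    }
    chosen.truncate(want);
    Some(ModelSpec {
        model_id: profile.id.clone(),
        devices: chosen,
        // Single-device profiles leave tensor parallelism unset.
        tensor_parallel: (want > 1).then_some(want),
    })
}
```

Under those assumptions, a profile with `min_devices = 2` and a 24 GB per-device floor maps onto both of beast's GPUs with a tensor-parallel degree of 2 and is rejected on benjy's single card, which is consistent with pinning the Qwen3.6-27B TP example on beast.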
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -185,12 +185,62 @@ async fn anthropic_messages(
     }
 }
 
-/// `GET /v1/models` — aggregate models from all nodes.
+/// `GET /v1/models` — union of (catalogue × topology feasibility) and
+/// (currently loaded somewhere). The result is what the fleet *could*
+/// serve, not just what's already loaded — so OpenAI-compatible tools
+/// see every model the operator has provisioned, and cortex
+/// transparently cold-loads the first time one is requested.
 async fn list_models(State(fleet): State<Arc<CortexState>>) -> Json<Value> {
+    use std::collections::HashMap;
+    let now = Utc::now().timestamp() as u64;
     let nodes = fleet.nodes.read().await;
-    let mut model_map: std::collections::HashMap<String, CortexModelEntry> =
-        std::collections::HashMap::new();
+    let catalogue = &fleet.catalogue;
+
+    let mut entries: HashMap<String, CortexModelEntry> = HashMap::new();
+
+    // Pass 1: catalogue × topology. For every catalogue profile, find
+    // healthy neurons whose discovered devices satisfy the profile.
+    // Catalogue-defined models surface here even if nothing has loaded
+    // them yet — that's the point of the unified endpoint.
+    for profile in &catalogue.models {
+        let mut feasible_on = Vec::new();
+        for node in nodes.values() {
+            if !node.healthy {
+                continue;
+            }
+            let Some(disc) = node.discovery.as_ref() else {
+                continue;
+            };
+            if profile.is_feasible_on(&node.name, &disc.devices) {
+                feasible_on.push(node.name.clone());
+            }
+        }
+        if feasible_on.is_empty() {
+            // The catalogue lists this model but no neuron's topology
+            // matches — surface it as not-loaded with no feasible
+            // location. Hides nothing; lets operators see why a
+            // configured model isn't reachable.
+            feasible_on.clear();
+        }
+        entries.insert(
+            profile.id.clone(),
+            CortexModelEntry {
+                id: profile.id.clone(),
+                object: "model".into(),
+                created: now,
+                owned_by: "helexa".into(),
+                loaded: false,
+                feasible_on,
+                locations: Vec::new(),
+            },
+        );
+    }
 
+    // Pass 2: layer the actually-loaded state on top. For each
+    // (node, model) entry, attach a ModelLocation. If the model isn't
+    // in the catalogue, create a new CortexModelEntry from scratch —
+    // cortex doesn't refuse to surface a manually-loaded model just
+    // because the operator didn't enumerate it in models.toml.
     for node in nodes.values() {
         for (model_id, entry) in &node.models {
             let location = ModelLocation {
@@ -198,19 +248,30 @@ async fn list_models(State(fleet): State<Arc<CortexState>>) -> Json<Value> {
                 status: entry.status,
                 vram_estimate_mb: entry.vram_estimate_mb,
             };
-            model_map
+            let was_loaded = matches!(entry.status, cortex_core::node::ModelStatus::Loaded);
+            entries
                 .entry(model_id.clone())
-                .and_modify(|e| e.locations.push(location.clone()))
+                .and_modify(|e| {
+                    e.locations.push(location.clone());
+                    if was_loaded {
+                        e.loaded = true;
+                    }
+                })
                 .or_insert_with(|| CortexModelEntry {
                     id: model_id.clone(),
                     object: "model".into(),
+                    created: now,
                     owned_by: "helexa".into(),
+                    loaded: was_loaded,
+                    // Not in catalogue — cortex has no opinion on
+                    // feasibility; leave empty.
+                    feasible_on: Vec::new(),
                     locations: vec![location],
                 });
         }
     }
 
-    let data: Vec<Value> = model_map.values().map(|e| json!(e)).collect();
+    let data: Vec<Value> = entries.values().map(|e| json!(e)).collect();
     Json(json!({
         "object": "list",
         "data": data,
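The two-pass merge in `list_models` reduces to a small function over plain maps. This is a sketch, not the handler: `Entry` is a hypothetical cut-down `CortexModelEntry` (locations are just neuron names), pass 1's feasibility result is taken as input, and the `and_modify`/`or_insert_with` chain is rewritten as `or_insert_with` followed by an in-place update, which has the same effect. The `if feasible_on.is_empty() { feasible_on.clear(); }` branch in the handler is a no-op on an already-empty vector, so the sketch simply carries the list through.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-neuron model status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Loaded,
    Unloaded,
}

/// Hypothetical cut-down CortexModelEntry; locations are neuron names only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub id: String,
    pub loaded: bool,
    pub feasible_on: Vec<String>,
    pub locations: Vec<String>,
}

/// Union of catalogue feasibility (pass 1) and live placements (pass 2).
pub fn merge_models(
    catalogue: &[(String, Vec<String>)], // (model id, feasible neurons)
    live: &BTreeMap<String, BTreeMap<String, Status>>, // neuron -> model -> status
) -> BTreeMap<String, Entry> {
    let mut entries = BTreeMap::new();
    // Pass 1: every catalogue model is listed, even with no feasible neuron.
    for (id, feasible_on) in catalogue {
        entries.insert(
            id.clone(),
            Entry {
                id: id.clone(),
                loaded: false,
                feasible_on: feasible_on.clone(),
                locations: Vec::new(),
            },
        );
    }
    // Pass 2: attach live placements; models absent from the catalogue
    // still surface, with an empty feasible_on.
    for (neuron, models) in live {
        for (id, status) in models {
            let e = entries.entry(id.clone()).or_insert_with(|| Entry {
                id: id.clone(),
                loaded: false,
                feasible_on: Vec::new(),
                locations: Vec::new(),
            });
            e.locations.push(neuron.clone());
            // One resident copy anywhere makes the model "loaded".
            e.loaded |= *status == Status::Loaded;
        }
    }
    entries
}
```

The sketch also swaps `HashMap` for `BTreeMap`. The handler collects `entries.values()` from a `HashMap`, so the order of `data` in the response is unspecified and may vary between requests; a `BTreeMap`, or sorting before serialising, would give clients a stable listing.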