feat(gateway): surface mid-prewarm models as Loading on /v1/models

The poller now fetches /health alongside /models on each neuron and stashes the activation snapshot on NodeState. The /v1/models handler gains a Pass 3 that synthesises Loading locations from each neuron's activation.in_progress and activation.pending lists, so a catalogued model that's mid-prewarm surfaces as `status: "loading"` rather than appearing absent (loaded=false, locations=[]).

Without this, a client polling /v1/models during a beast restart sees Qwen3.6-27B disappear for the ~5 minutes the q5k load takes, then reappear. Now it stays visible the whole time with a clear status.

Adds ModelStatus::Loading to cortex-core. The router's per-node priority loop gets an explicit (no-op) arm: Loading models aren't routable yet, so requests still fall through to the catalogue cold-load path. That is the existing race, no worse than before, but it is tagged as a known follow-up needing neuron-side in-flight tracking on /models/load.

New test_poller_captures_activation_from_health exercises the full round-trip: mock neuron with empty /models but a pre_warming /health → poller writes node.activation. Common test helpers gain spawn_mock_neuron_with_models_and_health and default_health_response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
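
The Pass 3 behaviour described in the message can be pictured roughly as follows. This is a minimal sketch, not the handler from this commit: the `Activation` and `NodeState` shapes, the `loaded` field, and the `model_locations` function are hypothetical stand-ins, and only `ModelStatus::Loading`, `activation.in_progress`, and `activation.pending` are names taken from the commit message.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the cortex-core types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelStatus {
    Loaded,
    Loading,
}

#[derive(Debug, Default)]
struct Activation {
    in_progress: Vec<String>,
    pending: Vec<String>,
}

#[derive(Debug, Default)]
struct NodeState {
    // Models the node already reports via /models (assumed field).
    loaded: Vec<String>,
    // Last activation snapshot read from /health, if any.
    activation: Option<Activation>,
}

/// Build model id -> [(node name, status)].
fn model_locations(
    nodes: &BTreeMap<String, NodeState>,
) -> BTreeMap<String, Vec<(String, ModelStatus)>> {
    let mut out: BTreeMap<String, Vec<(String, ModelStatus)>> = BTreeMap::new();

    // Earlier passes, collapsed here: models a node already serves.
    for (name, node) in nodes {
        for model in &node.loaded {
            out.entry(model.clone())
                .or_default()
                .push((name.clone(), ModelStatus::Loaded));
        }
    }

    // Pass 3: synthesise Loading locations from the activation snapshot.
    for (name, node) in nodes {
        let Some(act) = &node.activation else { continue };
        for model in act.in_progress.iter().chain(act.pending.iter()) {
            let locs = out.entry(model.clone()).or_default();
            // Do not add a second location if this node already reports the model.
            if !locs.iter().any(|(n, _)| n == name) {
                locs.push((name.clone(), ModelStatus::Loading));
            }
        }
    }
    out
}
```

Under these assumptions, a node with an empty model list and `Qwen3.6-27B` in `in_progress` yields a single Loading location for that model instead of an empty list.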
@@ -3,7 +3,7 @@

 use crate::state::CortexState;
 use chrono::Utc;
-use cortex_core::discovery::DiscoveryResponse;
+use cortex_core::discovery::{DiscoveryResponse, HealthResponse};
 use cortex_core::harness::ModelInfo;
 use cortex_core::node::{ModelEntry, ModelStatus};
 use std::sync::Arc;
@@ -142,6 +142,51 @@ async fn poll_neuron(fleet: &CortexState, name: &str, endpoint: &str) {
             node.healthy = false;
         }
     }
+
+    // Release the write lock before the next HTTP call.
+    drop(nodes);
+
+    // Poll /health for the activation snapshot. We don't want this to
+    // flip the node to unhealthy on its own — a neuron that's serving
+    // /models fine is still operational even if /health is briefly
+    // unavailable — so failures are debug-level and leave the existing
+    // activation reading in place.
+    poll_health(fleet, name, endpoint).await;
+}
+
+/// Fetch `/health` and stash the activation snapshot on NodeState.
+/// Decoupled from the /models poll so a /health glitch doesn't mark
+/// the neuron unhealthy or evict the model list.
+async fn poll_health(fleet: &CortexState, name: &str, endpoint: &str) {
+    let url = format!("{endpoint}/health");
+    let resp = match fleet
+        .http_client
+        .get(&url)
+        .timeout(Duration::from_secs(5))
+        .send()
+        .await
+    {
+        Ok(r) if r.status().is_success() => r,
+        Ok(r) => {
+            tracing::debug!(node = name, status = %r.status(), "/health probe non-success");
+            return;
+        }
+        Err(e) => {
+            tracing::debug!(node = name, error = %e, "/health probe failed");
+            return;
+        }
+    };
+    match resp.json::<HealthResponse>().await {
+        Ok(h) => {
+            let mut nodes = fleet.nodes.write().await;
+            if let Some(node) = nodes.get_mut(name) {
+                node.activation = Some(h.activation);
+            }
+        }
+        Err(e) => {
+            tracing::debug!(node = name, error = %e, "failed to parse /health response");
+        }
+    }
 }

 fn parse_status(s: &str) -> ModelStatus {
@@ -149,6 +194,7 @@ fn parse_status(s: &str) -> ModelStatus {
         "loaded" => ModelStatus::Loaded,
         "unloaded" => ModelStatus::Unloaded,
         "reloading" => ModelStatus::Reloading,
+        "loading" => ModelStatus::Loading,
         _ => ModelStatus::Loaded,
     }
 }
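
One consequence of the last hunk is worth spelling out: because unknown strings fall back to `ModelStatus::Loaded`, a neuron reporting `"loading"` would previously have been parsed as Loaded. The sketch below reproduces the visible arms of `parse_status` against a stand-in enum to illustrate that. The enum here is an assumption (the real `cortex_core::node::ModelStatus` may have more variants), and the `match s` scrutinee is inferred, since the diff does not show that line.

```rust
// Stand-in for cortex_core::node::ModelStatus; only the four variants
// visible in the diff are modelled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelStatus {
    Loaded,
    Unloaded,
    Reloading,
    Loading,
}

// Mirrors the arms shown in the diff, including the catch-all fallback.
fn parse_status(s: &str) -> ModelStatus {
    match s {
        "loaded" => ModelStatus::Loaded,
        "unloaded" => ModelStatus::Unloaded,
        "reloading" => ModelStatus::Reloading,
        "loading" => ModelStatus::Loading,
        // Unrecognised strings are still treated as Loaded.
        _ => ModelStatus::Loaded,
    }
}
```

With the new arm, `parse_status("loading")` returns `ModelStatus::Loading`, while any unrecognised string such as `"warming"` still maps to `ModelStatus::Loaded`.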