feat(gateway): surface mid-prewarm models as Loading on /v1/models
The poller now fetches /health alongside /models on each neuron and stashes the activation snapshot on NodeState. The /v1/models handler gains a Pass 3 that synthesises Loading locations from each neuron's activation.in_progress and activation.pending lists, so a catalogued model that's mid-prewarm surfaces as `status: "loading"` rather than appearing absent (loaded=false, locations=[]).

Without this, a client polling /v1/models during a beast restart sees Qwen3.6-27B disappear for the ~5 minutes the q5k load takes, then reappear. Now it stays visible the whole time with a clear status.

Adds ModelStatus::Loading to cortex-core. The router's per-node priority loop gets an explicit (no-op) arm: Loading models aren't routable yet, and falling through to the catalogue cold-load path is the existing race. That is no worse than before, but it is tagged as a known follow-up needing neuron-side in-flight tracking on /models/load.

New test_poller_captures_activation_from_health exercises the full round-trip: mock neuron with empty /models but a pre_warming /health → poller writes node.activation. Common test helpers gain spawn_mock_neuron_with_models_and_health and default_health_response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
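A minimal sketch of what the Pass 3 synthesis could look like, not the handler's actual code. The `Activation`, `Location`, and `add_loading_locations` names are hypothetical, as is the rule that a node which already has a location for a model is skipped. Only `ModelStatus::Loading` and the `in_progress` / `pending` fields come from this commit.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the `activation` object a neuron reports on /health.
#[derive(Debug, Clone, Default)]
pub struct Activation {
    pub in_progress: Option<String>,
    pub pending: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelStatus {
    Loaded,
    Loading,
}

/// Hypothetical per-node entry in a model's `locations` list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Location {
    pub node: String,
    pub status: ModelStatus,
}

/// Pass 3 (sketch): for each node, add a Loading location for every model
/// named in `in_progress` or `pending`.
///
/// Assumptions: `listing` already holds one entry per catalogued model
/// (filled by earlier passes), and a node that already has a location for
/// a model is left untouched.
pub fn add_loading_locations(
    listing: &mut BTreeMap<String, Vec<Location>>,
    activations: &BTreeMap<String, Activation>,
) {
    for (node, activation) in activations {
        let mid_prewarm = activation.in_progress.iter().chain(activation.pending.iter());
        for model in mid_prewarm {
            // Models that are not in the catalogue are not surfaced.
            let Some(locations) = listing.get_mut(model) else {
                continue;
            };
            if locations.iter().any(|l| &l.node == node) {
                continue;
            }
            locations.push(Location {
                node: node.clone(),
                status: ModelStatus::Loading,
            });
        }
    }
}
```

With this shape, a model that is in the catalogue but missing from every neuron's /models response still ends up with a non-empty `locations` list while it is queued or loading.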
@@ -237,3 +237,58 @@ async fn test_poller_removes_stale_models() {
    assert!(node.models.contains_key("keep-me"));
    assert!(!node.models.contains_key("drop-me"));
}

#[tokio::test]
async fn test_poller_captures_activation_from_health() {
    // Mock neuron is mid-prewarm: /models reports nothing (the loading
    // model hasn't been inserted into the harness map yet), but
    // /health's activation says model-x is in_progress and model-y is
    // queued behind it.
    let mock_url = common::spawn_mock_neuron_with_models_and_health(
        json!([]),
        json!({
            "uptime_secs": 30,
            "devices": [],
            "activation": {
                "state": "pre_warming",
                "pending": ["Qwen/model-y"],
                "in_progress": "Qwen/model-x",
                "completed": [],
                "failed": []
            }
        }),
    )
    .await;

    let config = GatewayConfig {
        gateway: GatewaySettings {
            listen: "127.0.0.1:0".into(),
            metrics_listen: "127.0.0.1:0".into(),
        },
        eviction: EvictionSettings {
            strategy: EvictionStrategy::Lru,
            defrag_after_cycles: 0,
        },
        neurons: vec![NeuronEndpoint {
            name: "prewarm-node".into(),
            endpoint: mock_url,
        }],
        models_config: "/dev/null".into(),
    };

    let fleet = Arc::new(CortexState::from_config(&config));
    cortex_gateway::poller::poll_once(&fleet).await;

    let nodes = fleet.nodes.read().await;
    let node = nodes.get("prewarm-node").unwrap();
    assert!(node.healthy);
    // /models was empty; no entries in the per-node model map.
    assert!(node.models.is_empty());
    // But /health's activation should be captured.
    let activation = node
        .activation
        .as_ref()
        .expect("activation should be populated after /health poll");
    assert_eq!(activation.in_progress.as_deref(), Some("Qwen/model-x"));
    assert_eq!(activation.pending, vec!["Qwen/model-y".to_string()]);
}