feat(gateway): surface mid-prewarm models as Loading on /v1/models

The poller now fetches /health alongside /models on each neuron and stashes the activation snapshot on NodeState. The /v1/models handler gains a Pass 3 that synthesises Loading locations from each neuron's activation.in_progress and activation.pending lists. A catalogued model that's mid-prewarm therefore surfaces as `status: "loading"` instead of appearing absent (loaded=false, locations=[]).

Without this, a client polling /v1/models during a beast restart sees Qwen3.6-27B disappear for the ~5 minutes the q5k load takes, then reappear. Now it stays visible the whole time with a clear status.

Adds ModelStatus::Loading to cortex-core. The router's per-node priority loop gets an explicit (no-op) arm: Loading models aren't routable yet, so the request falls through to the catalogue cold-load path, which is the existing race. That is no worse than before, but it is tagged as a known follow-up: closing the race needs neuron-side in-flight tracking on /models/load.

A new test, test_poller_captures_activation_from_health, exercises the full round-trip: a mock neuron with an empty /models but a pre_warming /health → the poller writes node.activation. The common test helpers gain spawn_mock_neuron_with_models_and_health and default_health_response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
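A minimal, std-only sketch of the Pass 3 synthesis and the router arm described above, to make the data flow concrete. Everything here is an assumption rather than the actual cortex-core or cortex-gateway code: `Activation`, `NodeState`, `Location`, the `Loaded` variant, and the functions `synthesise_loading`, `model_status` and `routable` are hypothetical stand-ins. Modelling `in_progress` / `pending` entries as bare model ids is also a guess based on the mock `/health` body.

```rust
use std::collections::BTreeMap;

/// Assumed shape of the `activation` object in a neuron's `/health` body.
/// Field names follow the mock; modelling `in_progress` / `pending`
/// entries as bare model ids is an assumption.
#[derive(Debug, Clone, Default)]
pub struct Activation {
    pub state: String,
    pub pending: Vec<String>,
    pub in_progress: Option<String>,
}

/// Hypothetical subset of `ModelStatus`: only `Loading` is named by the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelStatus {
    Loaded,
    Loading,
}

/// Hypothetical poller state: the last activation snapshot read from `/health`.
#[derive(Debug, Clone)]
pub struct NodeState {
    pub url: String,
    pub activation: Option<Activation>,
}

/// Hypothetical entry in a model's `locations` list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Location {
    pub node: String,
    pub status: ModelStatus,
}

/// "Pass 3": for each catalogued model a neuron reports as in progress or
/// pending, add a Loading location unless an earlier pass already recorded
/// a location for that (model, node) pair.
pub fn synthesise_loading(
    catalogue: &[&str],
    nodes: &[NodeState],
    locations: &mut BTreeMap<String, Vec<Location>>,
) {
    for node in nodes {
        // No snapshot (e.g. the /health fetch failed): this node contributes nothing.
        let Some(act) = &node.activation else { continue };
        for model in act.in_progress.iter().chain(act.pending.iter()) {
            // Only catalogued models are surfaced.
            if !catalogue.contains(&model.as_str()) {
                continue;
            }
            let locs = locations.entry(model.clone()).or_default();
            // Don't shadow a location that Passes 1-2 already reported.
            if !locs.iter().any(|l| l.node == node.url) {
                locs.push(Location {
                    node: node.url.clone(),
                    status: ModelStatus::Loading,
                });
            }
        }
    }
}

/// Assumed aggregation: Loaded wins over Loading; no locations means absent.
pub fn model_status(locs: &[Location]) -> Option<ModelStatus> {
    if locs.iter().any(|l| l.status == ModelStatus::Loaded) {
        Some(ModelStatus::Loaded)
    } else if locs.iter().any(|l| l.status == ModelStatus::Loading) {
        Some(ModelStatus::Loading)
    } else {
        None
    }
}

/// Hypothetical router arm: Loading is not routable yet, so the caller
/// falls through to the catalogue cold-load path as before.
pub fn routable(status: ModelStatus) -> bool {
    match status {
        ModelStatus::Loaded => true,
        ModelStatus::Loading => false,
    }
}
```

Deduplicating on (model, node) keeps the synthesised entry from duplicating a location that /models already reported, and uncatalogued ids in the activation lists are simply skipped.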
2026-05-26 15:26:12 +03:00
parent 800498f530
commit b9e7a76a7a
7 changed files with 211 additions and 2 deletions
--- a/crates/cortex-gateway/tests/common/mod.rs
+++ b/crates/cortex-gateway/tests/common/mod.rs
@@ -164,6 +164,33 @@ pub async fn spawn_streaming_mock_neuron(chunk_count: usize, chunk_delay: Durati

 /// Spawns a mock neuron with a custom models list.
 pub async fn spawn_mock_neuron_with_models(models_response: Value) -> String {
+    spawn_mock_neuron_with_models_and_health(models_response, default_health_response()).await
+}
+
+/// Default `/health` response used by mocks that don't care about the
+/// activation field — empty devices, no in-flight pre-warm, state=ready.
+pub fn default_health_response() -> Value {
+    json!({
+        "uptime_secs": 0,
+        "devices": [],
+        "activation": {
+            "state": "ready",
+            "pending": [],
+            "in_progress": null,
+            "completed": [],
+            "failed": []
+        }
+    })
+}
+
+/// Variant of `spawn_mock_neuron_with_models` that also serves a
+/// `/health` body. Used by tests that drive the gateway's activation
+/// surface (poller reading /health, /v1/models synthesising Loading
+/// locations from in_progress / pending).
+pub async fn spawn_mock_neuron_with_models_and_health(
+    models_response: Value,
+    health_response: Value,
+) -> String {
    let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
    let addr = listener.local_addr().unwrap();
    let base_url = format!("http://{addr}");
@@ -177,6 +204,13 @@ pub async fn spawn_mock_neuron_with_models(models_response: Value) -> String {
                async move { Json(resp) }
            }),
        )
+        .route(
+            "/health",
+            get(move || {
+                let resp = health_response.clone();
+                async move { Json(resp) }
+            }),
+        )
        .route(
            "/models/{model_id}/endpoint",
            get(move |Path(_model_id): Path<String>| {