feat(gateway): surface mid-prewarm models as Loading on /v1/models

The poller now fetches /health alongside /models on each neuron and
stashes the activation snapshot on NodeState. The /v1/models handler
gains a Pass 3 that synthesises Loading locations from each neuron's
activation.in_progress and activation.pending lists, so a catalogued
model that's mid-prewarm surfaces as `status: "loading"` rather than
appearing absent (loaded=false, locations=[]).
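
A minimal sketch of what Pass 3 might look like, using only std types.
The `Activation`, `Location` and `synthesise_loading` names, and the shape
of the locations map, are hypothetical stand-ins rather than the gateway's
real types; only `ModelStatus::Loading` and the `in_progress` / `pending`
fields come from this commit. It assumes catalogued models are already
keyed in the map (possibly with no locations), so uncatalogued names are
ignored.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the `activation` object in a neuron's /health body.
#[derive(Debug, Clone, Default)]
pub struct Activation {
    pub in_progress: Option<String>,
    pub pending: Vec<String>,
}

/// Assumed subset of statuses; the real enum lives in cortex-core.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ModelStatus {
    Loaded,
    Loading,
}

/// Hypothetical per-node location entry for a model.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Location {
    pub node: String,
    pub status: ModelStatus,
}

/// For each node, add a Loading location for every catalogued model listed
/// in `in_progress` or `pending`, unless that node already has a location
/// for the model (for example because it is already Loaded there).
pub fn synthesise_loading(
    locations: &mut BTreeMap<String, Vec<Location>>,
    nodes: &[(String, Activation)],
) {
    for (node, activation) in nodes {
        let mid_prewarm = activation.in_progress.iter().chain(activation.pending.iter());
        for model in mid_prewarm {
            // Only catalogued models have an entry in the map.
            if let Some(entry) = locations.get_mut(model) {
                if !entry.iter().any(|loc| &loc.node == node) {
                    entry.push(Location {
                        node: node.clone(),
                        status: ModelStatus::Loading,
                    });
                }
            }
        }
    }
}
```

Skipping nodes that already hold a location keeps the pass idempotent, so
running it on every request would not duplicate entries.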

Without this, a client polling /v1/models during a beast restart sees
Qwen3.6-27B disappear for the ~5 minutes the q5k load takes, then
reappear. Now it stays visible the whole time with a clear status.

Adds ModelStatus::Loading to cortex-core. The router's per-node priority
loop gets an explicit (no-op) arm for it: Loading models aren't routable
yet, so the loop falls through to the catalogue cold-load path. That
fall-through is the race that already existed, so behaviour is no worse
than before; it is tagged as a known follow-up that needs neuron-side
in-flight tracking on /models/load.
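
The explicit arm could be sketched roughly as follows. This is an
illustration, not the router's actual code: `pick_routable`, the
`Unloaded` variant and the `(name, status)` slice are assumptions made so
the example stands alone.

```rust
/// Assumed subset of statuses; only `Loading` is confirmed by this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelStatus {
    Loaded,
    Loading,
    Unloaded,
}

/// Walks nodes in priority order and returns the first one that can serve
/// the model right now. `None` means the caller falls through to the
/// catalogue cold-load path.
pub fn pick_routable<'a>(nodes: &[(&'a str, ModelStatus)]) -> Option<&'a str> {
    for (name, status) in nodes {
        match status {
            ModelStatus::Loaded => return Some(*name),
            // Explicit no-op: a Loading model is visible but not routable yet.
            ModelStatus::Loading => {}
            ModelStatus::Unloaded => {}
        }
    }
    None
}
```

Spelling the arm out instead of using a wildcard means the compiler flags
this match whenever a new status variant is added.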

New test_poller_captures_activation_from_health exercises the full
round-trip: mock neuron with empty /models but a pre_warming /health
→ poller writes node.activation. Common test helpers gain
spawn_mock_neuron_with_models_and_health and default_health_response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 15:26:12 +03:00
parent 800498f530
commit b9e7a76a7a
7 changed files with 211 additions and 2 deletions


@@ -164,6 +164,33 @@ pub async fn spawn_streaming_mock_neuron(chunk_count: usize, chunk_delay: Durati
/// Spawns a mock neuron with a custom models list.
pub async fn spawn_mock_neuron_with_models(models_response: Value) -> String {
    spawn_mock_neuron_with_models_and_health(models_response, default_health_response()).await
}

/// Default `/health` response used by mocks that don't care about the
/// activation field — empty devices, no in-flight pre-warm, state=ready.
pub fn default_health_response() -> Value {
    json!({
        "uptime_secs": 0,
        "devices": [],
        "activation": {
            "state": "ready",
            "pending": [],
            "in_progress": null,
            "completed": [],
            "failed": []
        }
    })
}

/// Variant of `spawn_mock_neuron_with_models` that also serves a
/// `/health` body. Used by tests that drive the gateway's activation
/// surface (poller reading /health, /v1/models synthesising Loading
/// locations from in_progress / pending).
pub async fn spawn_mock_neuron_with_models_and_health(
    models_response: Value,
    health_response: Value,
) -> String {
    let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
    let addr = listener.local_addr().unwrap();
    let base_url = format!("http://{addr}");
@@ -177,6 +204,13 @@ pub async fn spawn_mock_neuron_with_models(models_response: Value) -> String {
                async move { Json(resp) }
            }),
        )
        .route(
            "/health",
            get(move || {
                let resp = health_response.clone();
                async move { Json(resp) }
            }),
        )
        .route(
            "/models/{model_id}/endpoint",
            get(move |Path(_model_id): Path<String>| {