feat(gateway): surface mid-prewarm models as Loading on /v1/models

The poller now fetches /health alongside /models on each neuron and
stashes the activation snapshot on NodeState. The /v1/models handler
gains a Pass 3 that synthesises Loading locations from each neuron's
activation.in_progress and activation.pending lists, so a catalogued
model that's mid-prewarm surfaces as `status: "loading"` rather than
appearing absent (loaded=false, locations=[]).
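
As a rough illustration of what Pass 3 does, here is a minimal sketch. The
types and the function name (Activation, Location, add_loading_locations)
and the exact field shapes are assumptions made for this example, not the
actual cortex-gateway code; only the in_progress and pending lists and the
Loading status come from the description above.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the cortex-core status enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ModelStatus {
    Loaded,
    Loading,
}

// Hypothetical location entry as it might appear under a model on /v1/models.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Location {
    pub node: String,
    pub status: ModelStatus,
}

// Assumed shape of the activation snapshot taken from a neuron's /health.
#[derive(Debug, Clone, Default)]
pub struct Activation {
    pub in_progress: Vec<String>,
    pub pending: Vec<String>,
}

/// Pass 3 sketch: for every node, add a Loading location for each
/// catalogued model listed in in_progress or pending, unless that node
/// already has a location for the model from an earlier pass.
pub fn add_loading_locations(
    catalogue: &[String],
    activations: &BTreeMap<String, Activation>,
    locations: &mut BTreeMap<String, Vec<Location>>,
) {
    for (node, activation) in activations {
        for model in activation.in_progress.iter().chain(activation.pending.iter()) {
            // Only catalogued models are surfaced.
            if !catalogue.contains(model) {
                continue;
            }
            let entry = locations.entry(model.clone()).or_default();
            // Do not duplicate a location recorded by an earlier pass.
            if entry.iter().any(|l| &l.node == node) {
                continue;
            }
            entry.push(Location {
                node: node.clone(),
                status: ModelStatus::Loading,
            });
        }
    }
}
```

With this shape, a model that is only in a node's in_progress list ends up
with one location whose status is Loading instead of an empty list.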

Without this, a client polling /v1/models during a beast restart sees
Qwen3.6-27B disappear for the ~5 minutes the q5k load takes, then
reappear. Now it stays visible the whole time with a clear status.

Adds ModelStatus::Loading to cortex-core. The router's per-node priority
loop gets an explicit (no-op) arm: Loading models aren't routable yet, so
requests fall through to the catalogue cold-load path. That fall-through
is the existing race, no worse than before; it is tagged as a known
follow-up that needs neuron-side in-flight tracking on /models/load.
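
The explicit no-op arm could look roughly like the sketch below. Route,
pick_route, and the two-variant ModelStatus are hypothetical names chosen
for illustration; the real router's types and priority rules are not shown
in this commit message.

```rust
// Hypothetical stand-in for the cortex-core status enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelStatus {
    Loaded,
    Loading,
}

// Hypothetical routing outcome: a concrete node, or the cold-load path.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Node(String),
    ColdLoad,
}

/// Walk the nodes in priority order and route to the first one that has
/// the model loaded. Loading is matched explicitly but does nothing, so
/// the loop moves on and may fall through to the cold-load path.
pub fn pick_route(nodes: &[(String, ModelStatus)]) -> Route {
    for (name, status) in nodes {
        match status {
            ModelStatus::Loaded => return Route::Node(name.clone()),
            // Not routable yet: deliberately a no-op.
            ModelStatus::Loading => {}
        }
    }
    Route::ColdLoad
}
```

Spelling the arm out rather than using a wildcard keeps the match
exhaustive, so any status variant added later has to be handled on purpose.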

New test_poller_captures_activation_from_health exercises the full
round-trip: mock neuron with empty /models but a pre_warming /health
→ poller writes node.activation. Common test helpers gain
spawn_mock_neuron_with_models_and_health and default_health_response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rob thijssen

2026-05-26 15:26:12 +03:00

parent 800498f530

commit b9e7a76a7a

7 changed files with 211 additions and 2 deletions

									
crates/cortex-gateway/src/state.rs
									
@@ -27,6 +27,7 @@ impl CortexState {
                     lifecycle_cycles: 0,
                     last_poll: None,
                     discovery: None,
+                    activation: None,
                 },
             );
         }
