feat(neuron): bind listener before pre-warm, surface activation in /health
Some checks failed
build-prerelease / Resolve version stamps (push) Successful in 33s
CI / Format (push) Successful in 41s
CI / Clippy (push) Successful in 2m26s
build-prerelease / Build neuron-blackwell (push) Successful in 3m34s
CI / Test (push) Successful in 4m44s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build cortex binary (push) Successful in 4m29s
build-prerelease / Package cortex RPM (push) Successful in 1m23s
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled
Two coupled changes addressing the 2026-05-26 validate-neuron failure
where a fresh deploy of beast had /health unreachable for ~5 minutes
while Qwen3.6-27B q5k materialised, even though systemd reported the
unit as active.

1. main.rs no longer awaits load_default_models before binding axum.
   The listener binds first; pre-warm runs in a spawned background
   task that holds a read lock on the harness registry for the
   duration of its sequential load loop. Concurrent on-demand
   /models/load and /v1/chat/completions traffic still flows.

2. /health gains an `activation` field carrying:
     state        pre_warming | ready
     pending      model ids queued but not started
     in_progress  model id currently loading (Option)
     completed    model ids loaded successfully this activation
     failed       [{model_id, error}] for failed entries
   The field is `#[serde(default)]` so a pre-change cortex polling a
   new neuron, or vice versa, keeps working.

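The ordering in change 1 can be illustrated with a minimal, dependency-free sketch. It deliberately swaps tokio and axum for `std::net::TcpListener` and `std::thread`, so it shows only the bind-then-pre-warm ordering, not the real main.rs or the registry read lock. `load_model`, `start`, and `Started` are hypothetical names, and the 50 ms sleep stands in for a load that takes minutes in production.

```rust
use std::io::Write;
use std::net::{SocketAddr, TcpListener};
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a slow model load.
fn load_model(id: &str) -> String {
    thread::sleep(Duration::from_millis(50));
    id.to_string()
}

/// Handles returned by `start` (hypothetical helper type).
struct Started {
    addr: SocketAddr,
    loaded: Arc<RwLock<Vec<String>>>,
    prewarm: thread::JoinHandle<()>,
}

/// Binds the listener first, then pre-warms models in the background.
fn start(models: Vec<String>) -> std::io::Result<Started> {
    // 1. Bind before any slow work so probes can connect immediately.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let loaded = Arc::new(RwLock::new(Vec::new()));

    // 2. Sequential pre-warm loop on a background thread.
    let prewarm_loaded = Arc::clone(&loaded);
    let prewarm = thread::spawn(move || {
        for id in models {
            let model = load_model(&id);
            prewarm_loaded.write().unwrap().push(model);
        }
    });

    // 3. Serve a one-line status reply while pre-warm is still running.
    let serve_loaded = Arc::clone(&loaded);
    thread::spawn(move || {
        for stream in listener.incoming() {
            if let Ok(mut s) = stream {
                let n = serve_loaded.read().unwrap().len();
                let _ = writeln!(s, "loaded={}", n);
            }
        }
    });

    Ok(Started { addr, loaded, prewarm })
}
```

A caller would invoke `start(...)`, connect to `addr` straight away to get a reply while loading continues, and join `prewarm` if it needs to wait for the loop to finish.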
`ActivationTracker` (new module `neuron::activation`) owns the
RwLock-wrapped state; load_default_models takes a tracker reference
and updates it per-model. NeuronState holds an Arc clone for the
/health handler.
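
As a rough illustration of the state the tracker carries, here is a minimal sketch under stated assumptions. It uses `std::sync::RwLock` and synchronous methods, whereas the tests in this commit await `snapshot()`, so the real module is presumably async. The method names `begin`, `finish`, and `mark_ready`, the helper `load_all`, and the type `FailedModel` are hypothetical; only the field names and the two states come from the commit message.

```rust
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActivationState {
    PreWarming,
    Ready,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct FailedModel {
    model_id: String,
    error: String,
}

#[derive(Debug, Clone)]
struct ActivationSnapshot {
    state: ActivationState,
    pending: Vec<String>,
    in_progress: Option<String>,
    completed: Vec<String>,
    failed: Vec<FailedModel>,
}

struct ActivationTracker {
    inner: RwLock<ActivationSnapshot>,
}

impl ActivationTracker {
    /// Every requested model starts out in `pending`.
    fn new(model_ids: &[&str]) -> Self {
        Self {
            inner: RwLock::new(ActivationSnapshot {
                state: ActivationState::PreWarming,
                pending: model_ids.iter().map(|s| s.to_string()).collect(),
                in_progress: None,
                completed: Vec::new(),
                failed: Vec::new(),
            }),
        }
    }

    /// Moves a model from `pending` to `in_progress`.
    fn begin(&self, model_id: &str) {
        let mut s = self.inner.write().unwrap();
        s.pending.retain(|id| id.as_str() != model_id);
        s.in_progress = Some(model_id.to_string());
    }

    /// Records the outcome of the model that was in progress.
    fn finish(&self, model_id: &str, result: Result<(), String>) {
        let mut s = self.inner.write().unwrap();
        s.in_progress = None;
        match result {
            Ok(()) => s.completed.push(model_id.to_string()),
            Err(error) => s.failed.push(FailedModel {
                model_id: model_id.to_string(),
                error,
            }),
        }
    }

    fn mark_ready(&self) {
        self.inner.write().unwrap().state = ActivationState::Ready;
    }

    /// Cloned view, the kind of value a /health handler could serialise.
    fn snapshot(&self) -> ActivationSnapshot {
        self.inner.read().unwrap().clone()
    }
}

/// Sequential load loop: a failure is recorded and the loop continues.
fn load_all(tracker: &ActivationTracker, ids: &[&str], load: impl Fn(&str) -> Result<(), String>) {
    for &id in ids {
        tracker.begin(id);
        tracker.finish(id, load(id));
    }
    tracker.mark_ready();
}
```

In this sketch the tracker flips to `Ready` once the loop ends, whether or not any load succeeded, which matches the "two failures → ready" behaviour the tests assert.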

Tests updated to construct trackers and assert state transitions
(empty noop, two failures → ready with both in `failed`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -2,7 +2,9 @@
 //! individual failures so a single broken catalogue entry doesn't
 //! prevent the rest of the fleet from starting.
 
+use cortex_core::discovery::ActivationState;
 use cortex_core::harness::{HarnessConfig, ModelSpec};
+use neuron::activation::ActivationTracker;
 use neuron::config::HarnessSettings;
 use neuron::harness::HarnessRegistry;
 use neuron::startup;
@@ -37,7 +39,8 @@ async fn test_load_default_models_skips_unknown_harness() {
         },
     ];
 
-    startup::load_default_models(&registry, &specs).await;
+    let activation = ActivationTracker::new(&specs);
+    startup::load_default_models(&registry, &specs, &activation).await;
 
     let listed = registry
         .list_all_models()
@@ -47,10 +50,28 @@ async fn test_load_default_models_skips_unknown_harness() {
         listed.is_empty(),
         "no models should be loaded after failed entries"
     );
+
+    // Both specs should land in `failed`; tracker should flip to ready.
+    let snapshot = activation.snapshot().await;
+    assert_eq!(snapshot.state, ActivationState::Ready);
+    assert!(snapshot.pending.is_empty());
+    assert!(snapshot.in_progress.is_none());
+    assert!(snapshot.completed.is_empty());
+    assert_eq!(snapshot.failed.len(), 2);
+    let failed_ids: Vec<&str> = snapshot
+        .failed
+        .iter()
+        .map(|f| f.model_id.as_str())
+        .collect();
+    assert!(failed_ids.contains(&"model-a"));
+    assert!(failed_ids.contains(&"model-b"));
 }
 
 #[tokio::test]
 async fn test_load_default_models_empty_is_noop() {
     let registry = HarnessRegistry::new();
-    startup::load_default_models(&registry, &[]).await;
+    let activation = ActivationTracker::new(&[]);
+    startup::load_default_models(&registry, &[], &activation).await;
+    let snapshot = activation.snapshot().await;
+    assert_eq!(snapshot.state, ActivationState::Ready);
 }