feat(neuron): load default_models on service activation
All checks were successful
CI / Format (push) Successful in 34s
CI / Clippy (push) Successful in 2m13s
CI / Test (push) Successful in 4m6s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped

Stage 5 of the candle-native pivot. Adds first-class support for
auto-loading a configured set of models when the neuron service
activates.

Config:
- NeuronConfig.default_models: Vec<ModelSpec> (defaults to []).
- neuron.example.toml ships a commented [[default_models]] example.

Activation flow (crates/neuron/src/startup.rs::load_default_models):
- Sequential — VRAM contention makes parallel loads risky.
- Per-entry timing logged at info level on success.
- Failures logged as warnings; the next entry is still attempted.
- An empty list short-circuits without log noise.

Called from main.rs after the registry is built and before the axum
listener binds, so /models reflects the loaded state from the very
first request.

data/neuron.service gains TimeoutStartSec=1800s. With activation
blocked on potentially slow first-time HF downloads + GGUF
materialisation, systemd's default 90s would kill larger model loads
mid-flight.

Two non-gated tests in tests/activation.rs cover the
continues-past-failure and empty-list paths using a synthetically
unknown harness name to fail loads fast without touching the network.
The cuda-integration test from earlier stages still exercises the
real load/unload lifecycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-18 17:56:08 +03:00
parent 84f5662df1
commit 6779b7526a
7 changed files with 131 additions and 2 deletions

View File

@@ -0,0 +1,38 @@
//! Activation-time orchestration.
//!
//! Wired from `main.rs` after the harness registry is built and before
//! the HTTP listener binds. Kept in its own module so the logic is
//! unit-testable without spinning up a full neuron process.
use crate::harness::HarnessRegistry;
use cortex_core::harness::ModelSpec;
use std::time::Instant;
/// Load each spec sequentially against the registry, treating
/// individual failures as warnings rather than fatal errors.
///
/// VRAM contention makes parallel loads risky; the sequential path is
/// boring but correct. The function logs elapsed time per load so an
/// operator can see which model is hogging activation.
pub async fn load_default_models(registry: &HarnessRegistry, specs: &[ModelSpec]) {
if specs.is_empty() {
return;
}
tracing::info!(count = specs.len(), "loading default models");
for spec in specs {
let start = Instant::now();
match registry.load_model(spec).await {
Ok(()) => tracing::info!(
model = %spec.model_id,
elapsed_ms = start.elapsed().as_millis() as u64,
"loaded default model"
),
Err(e) => tracing::warn!(
model = %spec.model_id,
error = %e,
elapsed_ms = start.elapsed().as_millis() as u64,
"failed to load default model, continuing"
),
}
}
}