feat(neuron): load default_models on service activation

Stage 5 of the candle-native pivot. Adds first-class support for auto-loading a configured set of models when the neuron service activates.

Config:
- NeuronConfig.default_models: Vec<ModelSpec> (defaults to []).
- neuron.example.toml ships a commented [[default_models]] example.

Activation flow (crates/neuron/src/startup.rs::load_default_models):
- Sequential: VRAM contention makes parallel loads risky.
- Per-entry timing logged at info level on success.
- Failures logged as warnings; the next entry is still attempted.
- An empty list short-circuits without log noise.

Called from main.rs after the registry is built and before the axum listener binds, so /models reflects the loaded state from the very first request.

data/neuron.service gains TimeoutStartSec=1800s. With activation blocked on potentially slow first-time HF downloads + GGUF materialisation, systemd's default 90s would kill larger model loads mid-flight.

Two non-gated tests in tests/activation.rs cover the continues-past-failure and empty-list paths using a synthetically unknown harness name to fail loads fast without touching the network. The cuda-integration test from earlier stages still exercises the real load/unload lifecycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
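For readers without startup.rs open, the activation flow described in this message can be sketched roughly as below. This is a synchronous illustration only, not the code in this commit: `Spec`, `load_defaults`, and the `load` callback are hypothetical names, the real function is async and takes a `HarnessRegistry`, and returning the list of loaded ids is an assumption made here so the behaviour is easy to check.

```rust
use std::time::Instant;

/// Hypothetical stand-in for the real `ModelSpec`; only the fields
/// this sketch needs.
#[derive(Debug, Clone)]
pub struct Spec {
    pub model_id: String,
    pub harness: String,
}

/// Load every entry in order, one at a time, and return the ids that
/// loaded. `load` is an assumed callback standing in for the
/// registry's load call.
pub fn load_defaults<F>(specs: &[Spec], mut load: F) -> Vec<String>
where
    F: FnMut(&Spec) -> Result<(), String>,
{
    let mut loaded = Vec::new();

    // Empty list: return before emitting any log line.
    if specs.is_empty() {
        return loaded;
    }

    // Strictly sequential: the next load starts only after the
    // previous one has finished or failed.
    for spec in specs {
        let started = Instant::now();
        match load(spec) {
            Ok(()) => {
                eprintln!("info: loaded {} in {:?}", spec.model_id, started.elapsed());
                loaded.push(spec.model_id.clone());
            }
            // A failure is only a warning; continue with the next entry.
            Err(err) => eprintln!("warn: failed to load {}: {}", spec.model_id, err),
        }
    }

    loaded
}
```

Passing a callback that rejects unknown harness names mirrors the approach in tests/activation.rs: every entry is still attempted, and only the successful ones end up in the result.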
2026-05-18 17:56:08 +03:00
parent 84f5662df1
commit 6779b7526a
7 changed files with 131 additions and 2 deletions
--- a/crates/neuron/tests/activation.rs
+++ b/crates/neuron/tests/activation.rs
@@ -0,0 +1,56 @@
+//! Activation-time behaviour: load_default_models continues past
+//! individual failures so a single broken catalogue entry doesn't
+//! prevent the rest of the fleet from starting.
+
+use cortex_core::harness::{HarnessConfig, ModelSpec};
+use neuron::config::HarnessSettings;
+use neuron::harness::HarnessRegistry;
+use neuron::startup;
+
+#[tokio::test]
+async fn test_load_default_models_skips_unknown_harness() {
+    let registry = HarnessRegistry::from_configs(
+        &[HarnessConfig {
+            name: "candle".into(),
+        }],
+        "http://localhost:0",
+        &HarnessSettings::default(),
+    );
+
+    // Both entries fail synchronously inside the registry — no network
+    // call escapes (the harness lookup mismatches before hf-hub is
+    // touched). The function should still return cleanly.
+    let specs = vec![
+        ModelSpec {
+            model_id: "model-a".into(),
+            harness: "no-such-harness".into(),
+            quant: None,
+            tensor_parallel: None,
+            devices: None,
+        },
+        ModelSpec {
+            model_id: "model-b".into(),
+            harness: "no-such-harness".into(),
+            quant: None,
+            tensor_parallel: None,
+            devices: None,
+        },
+    ];
+
+    startup::load_default_models(&registry, &specs).await;
+
+    let listed = registry
+        .list_all_models()
+        .await
+        .expect("list_all_models should succeed");
+    assert!(
+        listed.is_empty(),
+        "no models should be loaded after failed entries"
+    );
+}
+
+#[tokio::test]
+async fn test_load_default_models_empty_is_noop() {
+    let registry = HarnessRegistry::new();
+    startup::load_default_models(&registry, &[]).await;
+}