feat(neuron): graceful unload-on-shutdown via SIGTERM/SIGINT

Stage 6 of the candle-native pivot. Adds first-class deactivation: neuron now drains in-flight requests on SIGTERM (systemd stop) or SIGINT (Ctrl-C), then unloads every loaded model before the process exits, releasing CUDA contexts and VRAM cleanly rather than leaving the OS to reclaim them.

Mechanism:
- startup::shutdown_signal() resolves on either ctrl_c() or a SIGTERM listener.
- axum::serve(...).with_graceful_shutdown(shutdown_signal()) stops accepting new connections, lets active requests finish, then returns control to main.
- startup::unload_all_models(&registry) iterates list_all_models() and calls unload per entry. Per-model failures are logged as warnings; cleanup continues. An empty registry is a fast no-op.
- main holds an Arc<NeuronState> reference past axum's lifetime so the registry is still reachable for the unload sweep.

data/neuron.service:
- TimeoutStopSec=120s: generous bound for big-model unloads before systemd escalates to SIGKILL.
- KillSignal=SIGTERM: explicit, matches the handler.

Two non-gated tests cover the empty-registry no-op and the no-models-loaded path. Real load-then-unload-on-shutdown is exercised by the cuda-integration test from Stage 2 (which calls unload_model directly) and is observable on a real GPU host by stopping the service and watching nvidia-smi.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
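The unload sweep described under "Mechanism" can be sketched roughly as follows. This is a minimal, synchronous, std-only illustration, not the code in this commit: SketchRegistry, its load() helper, and the "unload fails" flag are hypothetical stand-ins for HarnessRegistry, and the real unload_all_models is async and returns nothing. The sketch only shows the intended behaviour: list the models, unload each one, log a warning on failure, and keep going.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real registry (not neuron's HarnessRegistry).
struct SketchRegistry {
    /// Model name -> whether unloading it should fail (simulates a fault).
    loaded: BTreeMap<String, bool>,
}

impl SketchRegistry {
    fn new() -> Self {
        Self { loaded: BTreeMap::new() }
    }

    /// Hypothetical helper: register a model as loaded.
    fn load(&mut self, name: &str, unload_fails: bool) {
        self.loaded.insert(name.to_string(), unload_fails);
    }

    /// Names of every model currently loaded.
    fn list_all_models(&self) -> Vec<String> {
        self.loaded.keys().cloned().collect()
    }

    /// Unload one model, or report why it could not be unloaded.
    fn unload(&mut self, name: &str) -> Result<(), String> {
        match self.loaded.get(name).copied() {
            Some(true) => Err(format!("simulated unload failure for {name}")),
            Some(false) => {
                self.loaded.remove(name);
                Ok(())
            }
            None => Err(format!("{name} is not loaded")),
        }
    }
}

/// Unload every listed model. A failure is logged as a warning and the
/// sweep continues; the names that failed are returned for inspection.
/// An empty registry yields an empty list without doing any work.
fn unload_all_models(registry: &mut SketchRegistry) -> Vec<String> {
    let mut failed = Vec::new();
    for name in registry.list_all_models() {
        if let Err(err) = registry.unload(&name) {
            eprintln!("warning: failed to unload {name}: {err}");
            failed.push(name);
        }
    }
    failed
}

fn main() {
    let mut registry = SketchRegistry::new();
    registry.load("model-a", false);
    registry.load("model-b", true); // this one refuses to unload
    registry.load("model-c", false);

    let failed = unload_all_models(&mut registry);
    println!("failed to unload: {failed:?}");
}
```

Returning the failed names is a choice made for this sketch so the behaviour is easy to assert on; logging and continuing, rather than aborting on the first error, is what lets one stuck model not block the release of the others during shutdown.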
2026-05-18 17:58:07 +03:00
parent 6779b7526a
commit aad314cdfa
4 changed files with 111 additions and 5 deletions
--- a/crates/neuron/tests/shutdown.rs
+++ b/crates/neuron/tests/shutdown.rs
@@ -0,0 +1,32 @@
+//! Deactivation behaviour: unload_all_models is a no-op on an empty
+//! registry and on a registry whose harnesses have no models loaded.
+
+use cortex_core::harness::HarnessConfig;
+use neuron::config::HarnessSettings;
+use neuron::harness::HarnessRegistry;
+use neuron::startup;
+
+#[tokio::test]
+async fn test_unload_all_models_empty_registry_is_noop() {
+    let registry = HarnessRegistry::new();
+    startup::unload_all_models(&registry).await;
+}
+
+#[tokio::test]
+async fn test_unload_all_models_with_no_loaded_models() {
+    let registry = HarnessRegistry::from_configs(
+        &[HarnessConfig {
+            name: "candle".into(),
+        }],
+        "http://localhost:0",
+        &HarnessSettings::default(),
+    );
+
+    startup::unload_all_models(&registry).await;
+
+    let listed = registry
+        .list_all_models()
+        .await
+        .expect("list_all_models should still succeed after shutdown cleanup");
+    assert!(listed.is_empty());
+}