cortex

Files

feat(neuron): graceful unload-on-shutdown via SIGTERM/SIGINT

Stage 6 of the candle-native pivot. Adds first-class deactivation:
neuron now drains in-flight requests on SIGTERM (systemd stop) or
SIGINT (Ctrl-C), then unloads every loaded model before the process
exits — releasing CUDA contexts and VRAM cleanly rather than leaving
the OS to reclaim them.

Mechanism:
- startup::shutdown_signal() resolves on either ctrl_c() or a
  SIGTERM listener.
- axum::serve(...).with_graceful_shutdown(shutdown_signal()) stops
  accepting new connections, lets active requests finish, then
  returns control to main.
- startup::unload_all_models(&registry) iterates list_all_models()
  and calls unload per entry. Per-model failures are logged warnings;
  cleanup continues. Empty registry is a fast no-op.
- main holds an Arc<NeuronState> reference past axum's lifetime so
  the registry is still reachable for the unload sweep.

data/neuron.service:
- TimeoutStopSec=120s — generous bound for big-model unloads before
  systemd escalates to SIGKILL.
- KillSignal=SIGTERM — explicit, matches the handler.

Two non-gated tests cover the empty-registry no-op and the no-models-
loaded path. Real load-then-unload-on-shutdown is exercised by the
cuda-integration test from Stage 2 (which calls unload_model directly)
and observable on a real GPU host by stopping the service and
watching nvidia-smi.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 17:58:07 +03:00

activation.rs

feat(neuron): load default_models on service activation

2026-05-18 17:56:08 +03:00

api.rs

feat(neuron): OpenAI-compatible SSE streaming chat completions

2026-05-18 17:53:14 +03:00

candle_lifecycle.rs

feat(neuron): OpenAI-compatible non-streaming chat completion

2026-05-18 16:47:58 +03:00

shutdown.rs

feat(neuron): graceful unload-on-shutdown via SIGTERM/SIGINT

2026-05-18 17:58:07 +03:00