Stage 6 of the candle-native pivot. Adds first-class deactivation: neuron now drains in-flight requests on SIGTERM (systemd stop) or SIGINT (Ctrl-C), then unloads every loaded model before the process exits, releasing CUDA contexts and VRAM cleanly rather than leaving the OS to reclaim them.

Mechanism:
- startup::shutdown_signal() resolves on either ctrl_c() or a SIGTERM listener.
- axum::serve(...).with_graceful_shutdown(shutdown_signal()) stops accepting new connections, lets active requests finish, then returns control to main.
- startup::unload_all_models(&registry) iterates list_all_models() and calls unload per entry. Per-model failures are logged warnings; cleanup continues. Empty registry is a fast no-op.
- main holds an Arc<NeuronState> reference past axum's lifetime so the registry is still reachable for the unload sweep.

data/neuron.service:
- TimeoutStopSec=120s: generous bound for big-model unloads before systemd escalates to SIGKILL.
- KillSignal=SIGTERM: explicit, matches the handler.

Two non-gated tests cover the empty-registry no-op and the no-models-loaded path. Real load-then-unload-on-shutdown is exercised by the cuda-integration test from Stage 2 (which calls unload_model directly) and observable on a real GPU host by stopping the service and watching nvidia-smi.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
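The unload sweep described in the message (iterate the registry, unload each entry, warn on failure, keep going) can be sketched with the standard library alone. This is a minimal illustration, not the code in this commit: the `ModelRegistry` trait, `UnloadError`, `UnloadReport`, and `FakeRegistry` are hypothetical stand-ins, the sweep is synchronous here where the real one may well be async, and warnings go to stderr instead of a logging framework.

```rust
use std::collections::BTreeMap;
use std::fmt;
use std::sync::Mutex;

/// Hypothetical error returned when a single model fails to unload.
#[derive(Debug)]
struct UnloadError(String);

impl fmt::Display for UnloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical stand-in for the registry; only the two operations the
/// commit message names are modelled.
trait ModelRegistry {
    fn list_all_models(&self) -> Vec<String>;
    fn unload(&self, model_id: &str) -> Result<(), UnloadError>;
}

/// Summary of one sweep (an assumption; the real function may return nothing).
#[derive(Debug, Default, PartialEq)]
struct UnloadReport {
    unloaded: usize,
    failed: usize,
}

/// Unload every listed model. A failure is reported as a warning and the
/// sweep continues; an empty registry skips the loop entirely.
fn unload_all_models<R: ModelRegistry>(registry: &R) -> UnloadReport {
    let mut report = UnloadReport::default();
    for id in registry.list_all_models() {
        match registry.unload(&id) {
            Ok(()) => report.unloaded += 1,
            Err(err) => {
                eprintln!("warning: failed to unload {id}: {err}");
                report.failed += 1;
            }
        }
    }
    report
}

/// In-memory registry for the demo. The bool marks entries whose unload
/// should fail, so the continue-on-error path can be exercised.
struct FakeRegistry {
    loaded: Mutex<BTreeMap<String, bool>>,
}

impl FakeRegistry {
    fn new(entries: &[(&str, bool)]) -> Self {
        let loaded: BTreeMap<String, bool> = entries
            .iter()
            .map(|(id, fails)| (id.to_string(), *fails))
            .collect();
        Self { loaded: Mutex::new(loaded) }
    }

    fn loaded_count(&self) -> usize {
        self.loaded.lock().unwrap().len()
    }
}

impl ModelRegistry for FakeRegistry {
    fn list_all_models(&self) -> Vec<String> {
        self.loaded.lock().unwrap().keys().cloned().collect()
    }

    fn unload(&self, model_id: &str) -> Result<(), UnloadError> {
        let mut loaded = self.loaded.lock().unwrap();
        match loaded.get(model_id).copied() {
            Some(true) => Err(UnloadError(format!("simulated failure for {model_id}"))),
            Some(false) => {
                loaded.remove(model_id);
                Ok(())
            }
            None => Err(UnloadError(format!("{model_id} is not loaded"))),
        }
    }
}

fn main() {
    let registry = FakeRegistry::new(&[("model-a", false), ("model-b", true)]);
    let report = unload_all_models(&registry);
    println!("{report:?}, still loaded: {}", registry.loaded_count());
}
```

Returning a small report instead of the first error lets the caller log a summary while still exiting normally, which fits the "cleanup continues" behaviour described above.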
26 lines
814 B
Desktop File
[Unit]
Description=Neuron — per-node GPU discovery and harness daemon for cortex
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/neuron --config /etc/neuron/neuron.toml
Restart=on-failure
RestartSec=5
User=neuron
Group=neuron
# Loading default_models from neuron.toml happens before the HTTP
# listener binds; large models can take many minutes to download and
# materialise on first activation. systemd's default TimeoutStartSec
# (90s) is far too short; allow 30 minutes.
TimeoutStartSec=1800s
# On stop, neuron drains in-flight requests then unloads every model
# to release CUDA contexts cleanly. Allow generous time for big-model
# unloads; systemd will SIGKILL after this bound.
TimeoutStopSec=120s
KillSignal=SIGTERM

[Install]
WantedBy=multi-user.target