fix(ci): drop sudo from dnf install (runner runs as root, no sudo)

The act runner container has no sudo binary; the runner user already runs as root inside the container. Existing steps (rpmbuild, gpg, etc.) already invoke privileged commands directly without sudo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(ci): ensure rust toolchain present on cuda-13.0 runner
2026-05-19 07:06:52 +03:00 · 2026-05-19 07:04:57 +03:00 · 2026-05-18 18:55:02 +03:00 · 2026-05-18 17:58:07 +03:00
6 changed files with 249 additions and 68 deletions
--- a/.gitea/workflows/build-prerelease.yml
+++ b/.gitea/workflows/build-prerelease.yml
@@ -108,14 +108,25 @@ jobs:
            build_jobs: 8
            nvcc_threads: 4
            cargo_features: "cuda cudnn flash-attn"
-    # runner-cuda-13.0 extends runner-rust, so rust/cargo are already
-    # present via dnf — no rustup install step needed.
+    # runner-cuda-13.0 inherits from runner-rust in gongfoo, so rust
+    # *should* be available via dnf. The currently-published image is
+    # missing it though (likely a stale build), so we run a defensive
+    # `dnf install` at the top of the step. When the runner image is
+    # rebuilt with the proper layers this becomes a fast no-op.
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}

+      - name: Ensure rust toolchain present
+        run: |
+          set -eux
+          if ! command -v cargo >/dev/null 2>&1; then
+            dnf install -y --setopt=install_weak_deps=False rust cargo clippy
+          fi
+          cargo --version
+
      - name: Build neuron with CUDA (${{ matrix.flavour }})
        run: |
          set -eux
--- a/crates/neuron/src/main.rs
+++ b/crates/neuron/src/main.rs
@@ -78,11 +78,21 @@ async fn main() -> Result<()> {
        candle,
    });

-    let app = api::neuron_routes().with_state(state);
+    let app = api::neuron_routes().with_state(Arc::clone(&state));
    let addr: std::net::SocketAddr = format!("0.0.0.0:{port}").parse()?;
    tracing::info!("neuron listening on {addr}");
    let listener = tokio::net::TcpListener::bind(addr).await?;
-    axum::serve(listener, app).await?;
+    axum::serve(listener, app)
+        .with_graceful_shutdown(startup::shutdown_signal())
+        .await?;
+
+    // Deactivation: serve has returned (graceful shutdown signal
+    // received and connections drained). Release CUDA contexts / VRAM
+    // by unloading every model before exiting; systemd's TimeoutStopSec
+    // bounds how long this phase may take.
+    let registry = state.registry.read().await;
+    startup::unload_all_models(&registry).await;
+    tracing::info!("shutdown complete");

    Ok(())
 }
--- a/crates/neuron/src/startup.rs
+++ b/crates/neuron/src/startup.rs
@@ -1,12 +1,14 @@
-//! Activation-time orchestration.
+//! Activation- and deactivation-time orchestration.
 //!
-//! Wired from `main.rs` after the harness registry is built and before
-//! the HTTP listener binds. Kept in its own module so the logic is
+//! Wired from `main.rs` around the HTTP listener — activation runs
+//! before bind, deactivation runs after axum returns from its
+//! graceful-shutdown future. Kept in its own module so the logic is
 //! unit-testable without spinning up a full neuron process.

 use crate::harness::HarnessRegistry;
 use cortex_core::harness::ModelSpec;
 use std::time::Instant;
+use tokio::signal;

 /// Load each spec sequentially against the registry, treating
 /// individual failures as warnings rather than fatal errors.
@@ -36,3 +38,60 @@ pub async fn load_default_models(registry: &HarnessRegistry, specs: &[ModelSpec]
        }
    }
 }
+
+/// Future that resolves on SIGINT (Ctrl-C) or SIGTERM (systemd stop).
+///
+/// Wired into `axum::serve(...).with_graceful_shutdown(shutdown_signal())`
+/// so the HTTP listener stops accepting new connections, lets in-flight
+/// requests drain, and then yields control back to main for cleanup.
+pub async fn shutdown_signal() {
+    let ctrl_c = async {
+        signal::ctrl_c().await.ok();
+    };
+    let terminate = async {
+        signal::unix::signal(signal::unix::SignalKind::terminate())
+            .expect("install SIGTERM handler")
+            .recv()
+            .await;
+    };
+    tokio::select! {
+        _ = ctrl_c => tracing::info!("received SIGINT, shutting down"),
+        _ = terminate => tracing::info!("received SIGTERM, shutting down"),
+    }
+}
+
+/// Unload every model currently registered. Called from `main.rs` after
+/// axum's graceful shutdown future resolves, so CUDA contexts and VRAM
+/// are released before the process exits rather than left to the OS to
+/// reclaim. Per-model failures are logged and skipped — keep cleanup
+/// going even when one harness is unhealthy.
+pub async fn unload_all_models(registry: &HarnessRegistry) {
+    let listed = match registry.list_all_models().await {
+        Ok(m) => m,
+        Err(e) => {
+            tracing::warn!(error = %e, "failed to list models during shutdown");
+            return;
+        }
+    };
+
+    if listed.is_empty() {
+        return;
+    }
+
+    tracing::info!(count = listed.len(), "unloading models for shutdown");
+    for model in listed {
+        let start = Instant::now();
+        match registry.unload_model(&model.id).await {
+            Ok(()) => tracing::info!(
+                model = %model.id,
+                elapsed_ms = start.elapsed().as_millis() as u64,
+                "unloaded"
+            ),
+            Err(e) => tracing::warn!(
+                model = %model.id,
+                error = %e,
+                "unload failed during shutdown"
+            ),
+        }
+    }
+}
--- a/crates/neuron/tests/shutdown.rs
+++ b/crates/neuron/tests/shutdown.rs
@@ -0,0 +1,32 @@
+//! Deactivation behaviour: unload_all_models is a no-op on an empty
+//! registry and on a registry with a harness but no loaded models.
+
+use cortex_core::harness::HarnessConfig;
+use neuron::config::HarnessSettings;
+use neuron::harness::HarnessRegistry;
+use neuron::startup;
+
+#[tokio::test]
+async fn test_unload_all_models_empty_registry_is_noop() {
+    let registry = HarnessRegistry::new();
+    startup::unload_all_models(&registry).await;
+}
+
+#[tokio::test]
+async fn test_unload_all_models_with_no_loaded_models() {
+    let registry = HarnessRegistry::from_configs(
+        &[HarnessConfig {
+            name: "candle".into(),
+        }],
+        "http://localhost:0",
+        &HarnessSettings::default(),
+    );
+
+    startup::unload_all_models(&registry).await;
+
+    let listed = registry
+        .list_all_models()
+        .await
+        .expect("list_all_models should still succeed after shutdown cleanup");
+    assert!(listed.is_empty());
+}
--- a/data/neuron.service
+++ b/data/neuron.service
@@ -15,6 +15,11 @@ Group=neuron
 # materialise on first activation. systemd's default TimeoutStartSec
 # (90s) is far too short; allow 30 minutes.
 TimeoutStartSec=1800s
+# On stop, neuron drains in-flight requests then unloads every model
+# to release CUDA contexts cleanly. Allow generous time for big-model
+# unloads; systemd will SIGKILL after this bound.
+TimeoutStopSec=120s
+KillSignal=SIGTERM

 [Install]
 WantedBy=multi-user.target
--- a/script/deploy.sh
+++ b/script/deploy.sh
@@ -27,63 +27,124 @@ mapfile -t neuron_entries < <(
    yq -r '.neurons[] | .host + "\t" + .flavour' "${MANIFEST}"
 )

-latest_helexa_version=$(git -C "${REPO_DIR}" describe --tags --abbrev=0 | sed 's/^v//')
+# Return the installed package's "version-release" string, or
+# "(not installed)" when rpm reports the package as absent. Capture
+# rpm's output into a variable so its "package X is not installed"
+# stdout message (rpm writes that to stdout, not stderr, when -q fails)
+# doesn't leak into the result.
+installed_nvr() {
+    local host="$1" pkg="$2"
+    local nvr
+    if nvr=$(ssh "${host}" "rpm -q --qf '%{version}-%{release}' ${pkg} 2>/dev/null"); then
+        echo "${nvr}"
+    else
+        echo "(not installed)"
+    fi
+}
+
+# Ensure the rpm.lair.cafe unstable repo is configured AND enabled on
+# the remote host.
+#
+# The upstream .repo file at https://rpm.lair.cafe/lair-cafe-unstable.repo
+# ships with `enabled=0` so a host that just fetched it won't start
+# pulling unstable packages by accident. We have to explicitly flip
+# enabled=1 via `dnf config-manager setopt`. Both addrepo and setopt
+# are idempotent.
+#
+# Non-fatal — if either step fails the subsequent `dnf install` will
+# surface a clearer diagnostic on its own.
+ensure_lair_repo() {
+    local host="$1"
+    if ! ssh "${host}" "test -f /etc/yum.repos.d/lair-cafe-unstable.repo" 2>/dev/null; then
+        echo "[${host}] adding rpm.lair.cafe unstable repo"
+        if ! ssh "${host}" sudo dnf config-manager addrepo \
+            --from-repofile=https://rpm.lair.cafe/lair-cafe-unstable.repo \
+            >/dev/null 2>&1; then
+            echo "[${host}] WARNING: failed to add lair.cafe repo file (proceeding anyway)"
+            return 0
+        fi
+    fi
+    # The .repo file ships enabled=0; flip it on. Cheap, idempotent.
+    if ! ssh "${host}" sudo dnf config-manager setopt \
+        lair-cafe-unstable.enabled=1 >/dev/null 2>&1; then
+        echo "[${host}] WARNING: failed to enable lair-cafe-unstable (proceeding anyway)"
+    fi
+}
+
+# True when the named package needs to be installed or upgraded on the
+# remote host — either it's not present, or a newer version exists in
+# the repo. False only when the installed version is current.
+#
+# `dnf check-update <pkg>` returns 0 when the package isn't installed
+# at all (there's nothing to update), so we have to probe with rpm -q
+# first to distinguish "absent" from "current". Other dnf failures
+# collapse into "needs update" so the subsequent install step surfaces
+# the real diagnostic rather than this check swallowing it.
+needs_update() {
+    local host="$1" pkg="$2"
+    # Not installed → needs work.
+    if ! ssh "${host}" "rpm -q ${pkg}" >/dev/null 2>&1; then
+        return 0
+    fi
+    # Installed; ask dnf whether the repo has something newer.
+    if ssh "${host}" sudo dnf check-update --refresh -q "${pkg}" >/dev/null 2>&1; then
+        return 1
+    else
+        return 0
+    fi
+}

 # ---------------------------------------------------------------------------
 # cortex (gateway)
 # ---------------------------------------------------------------------------

-observed_cortex_version=$(ssh "${cortex_host}" cortex --version | sed 's/^cortex //')
-if [[ "${latest_helexa_version}" = "${observed_cortex_version}" ]]; then
-    echo "[${cortex_host}] cortex is up to date (${observed_cortex_version})"
-    if ssh "${cortex_host}" sudo systemctl stop cortex.service && rsync \
-        --archive \
-        --compress \
-        --rsync-path 'sudo rsync' \
-        --chown root:root \
-        --chmod 644 \
-        "${REPO_DIR}/cortex.toml" \
-        "${cortex_host}:/etc/cortex/cortex.toml"; then
-        echo "[${cortex_host}] sync'd cortex.toml"
-        ssh "${cortex_host}" sudo systemctl daemon-reload
-        ssh "${cortex_host}" sudo systemctl start cortex.service
-    else
-        echo "[${cortex_host}] failed to sync cortex.toml"
-    fi
-    if ssh "${cortex_host}" systemctl is-active --quiet cortex.service; then
-        echo "[${cortex_host}] cortex service is active"
-    elif ssh "${cortex_host}" sudo systemctl start cortex.service; then
-        echo "[${cortex_host}] started cortex service"
-    else
-        echo "[${cortex_host}] failed to start cortex service"
-    fi
-else
-    echo "[${cortex_host}] cortex is out of date (${observed_cortex_version} != ${latest_helexa_version})"
-    if ssh "${cortex_host}" sudo systemctl stop cortex.service; then
+ensure_lair_repo "${cortex_host}"
+cortex_nvr=$(installed_nvr "${cortex_host}" cortex)
+if needs_update "${cortex_host}" cortex; then
+    echo "[${cortex_host}] cortex update available (current: ${cortex_nvr})"
+    # Stop the service only if the unit file exists — fresh installs
+    # don't have it, and `systemctl stop` on a missing unit returns
+    # non-zero, which would otherwise short-circuit the install branch
+    # under set -e.
+    if ssh "${cortex_host}" "[ ! -f /usr/lib/systemd/system/cortex.service ] || sudo systemctl stop cortex.service"; then
        echo "[${cortex_host}] stopped cortex service"
-        if ssh "${cortex_host}" sudo dnf upgrade --refresh -y cortex; then
-            echo "[${cortex_host}] upgraded cortex"
-            if rsync \
-                --archive \
-                --compress \
-                --verbose \
-                --rsync-path 'sudo rsync' \
-                --chown root:root \
-                --chmod 644 \
-                "${REPO_DIR}/cortex.toml" \
-                "${cortex_host}:/etc/cortex/cortex.toml"; then
-                echo "[${cortex_host}] sync'd cortex.toml"
-                ssh "${cortex_host}" sudo systemctl daemon-reload
-                ssh "${cortex_host}" sudo systemctl start cortex.service
+        if dnf_output=$(ssh "${cortex_host}" sudo dnf install --refresh --allowerasing -y cortex 2>&1); then
+            cortex_nvr=$(installed_nvr "${cortex_host}" cortex)
+            echo "[${cortex_host}] installed/upgraded cortex to ${cortex_nvr}"
        else
-                echo "[${cortex_host}] failed to sync cortex.toml"
-            fi
-        else
-            echo "[${cortex_host}] failed to upgrade cortex"
+            echo "[${cortex_host}] failed to install/upgrade cortex:"
+            echo "${dnf_output}" | sed "s/^/[${cortex_host}]   /"
        fi
    else
        echo "[${cortex_host}] failed to stop cortex service"
    fi
+else
+    echo "[${cortex_host}] cortex is up to date (${cortex_nvr})"
+    ssh "${cortex_host}" sudo systemctl stop cortex.service || true
+fi
+
+# Sync cortex.toml whether the package was upgraded or not — the config
+# can change without a package bump.
+if rsync \
+    --archive \
+    --compress \
+    --rsync-path 'sudo rsync' \
+    --chown root:root \
+    --chmod 644 \
+    "${REPO_DIR}/cortex.toml" \
+    "${cortex_host}:/etc/cortex/cortex.toml"; then
+    echo "[${cortex_host}] sync'd cortex.toml"
+else
+    echo "[${cortex_host}] failed to sync cortex.toml"
+fi
+
+ssh "${cortex_host}" sudo systemctl daemon-reload
+if ssh "${cortex_host}" systemctl is-active --quiet cortex.service; then
+    echo "[${cortex_host}] cortex service is active"
+elif ssh "${cortex_host}" sudo systemctl start cortex.service; then
+    echo "[${cortex_host}] started cortex service"
+else
+    echo "[${cortex_host}] failed to start cortex service"
 fi

 # ---------------------------------------------------------------------------
@@ -94,26 +155,19 @@ for entry in "${neuron_entries[@]}"; do
    IFS=$'\t' read -r neuron_host neuron_flavour <<< "${entry}"
    package="helexa-neuron-${neuron_flavour}"

-    observed_neuron_version=$(ssh "${neuron_host}" neuron --version 2> /dev/null | sed 's/^neuron //' || true)
-    if [[ "${latest_helexa_version}" = "${observed_neuron_version}" ]]; then
-        echo "[${neuron_host}] neuron is up to date (${observed_neuron_version}, ${package})"
-        if ssh "${neuron_host}" systemctl is-active --quiet neuron.service; then
-            echo "[${neuron_host}] neuron service is active"
-        elif ssh "${neuron_host}" sudo systemctl start neuron.service; then
-            echo "[${neuron_host}] started neuron service"
-        else
-            echo "[${neuron_host}] failed to start neuron service"
-        fi
-    else
-        echo "[${neuron_host}] upgrading neuron from ${observed_neuron_version:-(absent)} to ${latest_helexa_version} (${package})"
+    ensure_lair_repo "${neuron_host}"
+    neuron_nvr=$(installed_nvr "${neuron_host}" "${package}")
+    if needs_update "${neuron_host}" "${package}"; then
+        echo "[${neuron_host}] ${package} update available (current: ${neuron_nvr})"
        if ssh "${neuron_host}" "[ ! -f /usr/lib/systemd/system/neuron.service ] || sudo systemctl stop neuron.service"; then
            echo "[${neuron_host}] stopped neuron service"
            # --allowerasing lets dnf swap out a previously-installed
            # bare helexa-neuron or a different flavour without manual
            # intervention. The Conflicts: clauses in the spec ensure
            # only one flavour is ever resident.
-            if ssh "${neuron_host}" sudo dnf install --refresh --allowerasing -y "${package}" &> /dev/null; then
-                echo "[${neuron_host}] installed/upgraded ${package}"
+            if dnf_output=$(ssh "${neuron_host}" sudo dnf install --refresh --allowerasing -y "${package}" 2>&1); then
+                neuron_nvr=$(installed_nvr "${neuron_host}" "${package}")
+                echo "[${neuron_host}] installed/upgraded ${package} to ${neuron_nvr}"
                # Ensure firewalld allows neuron port
                ssh "${neuron_host}" "sudo firewall-cmd --query-service=helexa-neuron --quiet 2>/dev/null || sudo firewall-cmd --add-service=helexa-neuron --permanent && sudo firewall-cmd --reload" 2>/dev/null || true
                if ssh "${neuron_host}" "sudo systemctl daemon-reload && sudo systemctl start neuron.service"; then
@@ -122,10 +176,20 @@ for entry in "${neuron_entries[@]}"; do
                    echo "[${neuron_host}] failed to start neuron service"
                fi
            else
-                echo "[${neuron_host}] failed to install ${package}"
+                echo "[${neuron_host}] failed to install ${package}:"
+                echo "${dnf_output}" | sed "s/^/[${neuron_host}]   /"
            fi
        else
            echo "[${neuron_host}] failed to stop neuron service"
        fi
+    else
+        echo "[${neuron_host}] ${package} is up to date (${neuron_nvr})"
+        if ssh "${neuron_host}" systemctl is-active --quiet neuron.service; then
+            echo "[${neuron_host}] neuron service is active"
+        elif ssh "${neuron_host}" sudo systemctl start neuron.service; then
+            echo "[${neuron_host}] started neuron service"
+        else
+            echo "[${neuron_host}] failed to start neuron service"
+        fi
    fi
 done
Author	SHA1	Message	Date
rob thijssen	0e9671dd7d	fix(ci): drop sudo from dnf install (runner runs as root, no sudo) The act runner container has no sudo binary; the runner user already runs as root inside the container. Existing steps (rpmbuild, gpg, etc) already invoke privileged commands directly without sudo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:06:52 +03:00
rob thijssen	e29c9e35f0	fix(ci): ensure rust toolchain present on cuda-13.0 runner The currently-published runner-cuda-13.0 image (gongfoo) is missing rust/cargo despite inheriting from runner-rust. Build-neuron fails immediately with 'cargo: command not found' even though build-cortex on the bare 'rust' runner builds fine. Add a defensive `dnf install rust cargo clippy` step at the top of build-neuron. Idempotent — on a properly-built runner image this is a fast no-op; on the current broken image it installs the toolchain in a few seconds. The runner image itself should be rebuilt in gongfoo so this step becomes redundant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:04:57 +03:00
rob thijssen	8a2334eacb	deploy: dnf-native version check + lair.cafe repo bootstrap Replaces the string compare of 'git describe --tags' vs the binary's self-reported --version (which lies about prereleases: every 0.1.16-* RPM reports just "0.1.16") with the dnf-native question of "is the installed package current against what the repo offers". Mechanism: - installed_nvr(): rpm -q --qf '%{version}-%{release}' for the resident package, falling back to "(not installed)". Capturing rpm's output through a variable keeps its "package X is not installed" stdout message out of the result on failure. - needs_update(): probes rpm -q first (treats absent as "needs work"), then asks dnf check-update --refresh -q. Other dnf failures collapse into "needs update" so the subsequent install surfaces a real error rather than this check swallowing one silently. - ensure_lair_repo(): probes for /etc/yum.repos.d/lair-cafe-unstable.repo and adds it with `dnf config-manager addrepo` when missing. The upstream .repo file ships enabled=0 (unstable channel doesn't auto-engage on fetch), so we then run `dnf config-manager setopt lair-cafe-unstable.enabled=1` every run (cheap, idempotent). - Cortex and neuron install branches now guard `systemctl stop` with `[ ! -f /usr/lib/systemd/system/...service ] || sudo systemctl stop` so fresh installs (no unit file yet) don't short-circuit the install step under set -e. - dnf output is captured into a variable and only printed (with a [host] prefix per line) on failure, so success stays quiet and failures show the actual diagnostic instead of being eaten by &> /dev/null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:55:02 +03:00
rob thijssen	aad314cdfa	feat(neuron): graceful unload-on-shutdown via SIGTERM/SIGINT Stage 6 of the candle-native pivot. Adds first-class deactivation: neuron now drains in-flight requests on SIGTERM (systemd stop) or SIGINT (Ctrl-C), then unloads every loaded model before the process exits — releasing CUDA contexts and VRAM cleanly rather than leaving the OS to reclaim them. Mechanism: - startup::shutdown_signal() resolves on either ctrl_c() or a SIGTERM listener. - axum::serve(...).with_graceful_shutdown(shutdown_signal()) stops accepting new connections, lets active requests finish, then returns control to main. - startup::unload_all_models(&registry) iterates list_all_models() and calls unload per entry. Per-model failures are logged warnings; cleanup continues. Empty registry is a fast no-op. - main holds an Arc<NeuronState> reference past axum's lifetime so the registry is still reachable for the unload sweep. data/neuron.service: - TimeoutStopSec=120s — generous bound for big-model unloads before systemd escalates to SIGKILL. - KillSignal=SIGTERM — explicit, matches the handler. Two non-gated tests cover the empty-registry no-op and the no-models- loaded path. Real load-then-unload-on-shutdown is exercised by the cuda-integration test from Stage 2 (which calls unload_model directly) and observable on a real GPU host by stopping the service and watching nvidia-smi. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:58:07 +03:00
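
The two tests added in aad314cdfa cover the empty and no-models-loaded paths, but not the "per-model failures are logged warnings; cleanup continues" behaviour the message describes. A minimal sketch of that log-and-continue sweep follows. It is synchronous and std-only so it can run without tokio or a GPU; `FakeRegistry` and `unload_all` are hypothetical stand-ins, not the real `HarnessRegistry` or `startup::unload_all_models`, and the boolean stored per model is an assumption used only to simulate a healthy or unhealthy harness.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real `HarnessRegistry`.
/// Maps a model id to whether its unload should succeed.
struct FakeRegistry {
    models: BTreeMap<String, bool>,
}

impl FakeRegistry {
    /// Snapshot of the ids currently registered, in sorted order.
    fn list_all_models(&self) -> Vec<String> {
        self.models.keys().cloned().collect()
    }

    /// Remove the model if its simulated harness is healthy,
    /// otherwise leave it in place and report an error.
    fn unload_model(&mut self, id: &str) -> Result<(), String> {
        match self.models.get(id).copied() {
            Some(true) => {
                self.models.remove(id);
                Ok(())
            }
            Some(false) => Err(format!("harness for {id} is unhealthy")),
            None => Err(format!("unknown model {id}")),
        }
    }
}

/// Try to unload every listed model. A failure is logged and skipped
/// so one unhealthy harness cannot block cleanup of the rest.
/// Returns (unloaded, failed) so callers and tests can inspect the outcome.
fn unload_all(registry: &mut FakeRegistry) -> (usize, usize) {
    let mut unloaded = 0;
    let mut failed = 0;
    for id in registry.list_all_models() {
        match registry.unload_model(&id) {
            Ok(()) => {
                println!("unloaded {id}");
                unloaded += 1;
            }
            Err(e) => {
                eprintln!("unload failed during shutdown: {e}");
                failed += 1;
            }
        }
    }
    (unloaded, failed)
}

fn main() {
    let mut registry = FakeRegistry {
        models: BTreeMap::from([
            ("model-a".to_string(), true),
            ("model-b".to_string(), false), // simulated unhealthy harness
            ("model-c".to_string(), true),
        ]),
    };
    let (unloaded, failed) = unload_all(&mut registry);
    println!("unloaded={unloaded} failed={failed}");
}
```

With the three models above, the sweep unloads model-a and model-c and reports one failure for model-b, which stays registered. A test of the real code would need a harness that can be made to fail on unload; whether the repository has such a test double is not shown in this diff.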