Files
helexa/.gitea/workflows/deploy.yml
rob thijssen 8f6f1d3205
Some checks failed
build-prerelease / Build neuron-ampere (push) Blocked by required conditions
build-prerelease / Build neuron-ada (push) Blocked by required conditions
build-prerelease / Package cortex RPM (push) Blocked by required conditions
build-prerelease / Resolve version stamps + change detection (push) Successful in 29s
build-prerelease / Lint (fmt + clippy) (push) Successful in 2m14s
build-prerelease / Build neuron-blackwell (push) Successful in 10m36s
build-prerelease / Build cortex binary (push) Successful in 2m35s
build-prerelease / Test (push) Successful in 6m35s
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
feat(deploy): validate neuron capability after every deploy
A deploy previously went green the moment systemd reported the
service started — a merge that broke model loading or inference
itself would deploy "successfully" and only surface when a human
noticed. Each neuron deploy now earns its green:

1. Wait for default models: poll /health until activation.state is
   ready, with per-host timeouts in the matrix (beast 900s for the
   27B Q6K TP=2 cold-load, benjy/quadbrat 300s). Any entry in
   activation.failed fails the deploy with the per-model error —
   the structured equivalent of watching the journal for
   "loaded default model", plus failure detail the journal line
   can't carry.
2. LLM smoke probe: ask the first loaded model to reply with one
   specific word (max_tokens 512 so thinking models have room,
   temperature 0) and grep the response for it. Not a quality bar —
   just proof the deploy didn't lobotomize inference.

Hosts whose package is already current still skip everything — the
validation cost is only paid when a restart actually happened. The
probe was dry-run against benjy's production neuron before landing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 15:28:20 +03:00

253 lines
11 KiB
YAML

name: deploy
# Roll the freshly-published unstable RPMs onto the helexa fleet:
# cortex on the gateway, helexa-neuron-<flavour> on each neuron host.
#
# Triggered automatically after `build-prerelease` succeeds (by which
# point the new RPMs are live on rpm.lair.cafe/unstable), and also
# re-runnable manually from the Gitea UI.
#
# Each host self-gates: if dnf sees no newer package than what is
# installed, the service is left alone — no stop, no restart, no model
# cold-load. Combined with build-prerelease's change detection this
# means a docs- or gateway-only push never restarts the neurons (a
# neuron restart costs ~5 min of 27B cold-load, see issue #1).
#
# Per-host one-time setup (gitea_ci user, authorized_keys, scoped
# sudoers drop-in) lives in script/infra-setup.sh — run that once per
# host before this workflow can succeed.
on:
workflow_run:
workflows: [build-prerelease]
types: [completed]
workflow_dispatch:
# Serialize deploys. Overlapping runs would race on dnf metadata
# refresh and service-restart timing; queueing keeps the fleet
# predictable. Don't cancel an in-flight deploy — a half-applied dnf
# transaction is worse than a slightly stale deploy.
concurrency:
group: deploy
cancel-in-progress: false
env:
DEPLOY_KEY: |
${{ secrets.RSYNC_SSH_KEY }}
jobs:
deploy-cortex:
runs-on: fedora-43
# Two trigger paths: manual dispatch always runs; workflow_run
# only runs if the upstream `build-prerelease` actually succeeded.
if: >-
${{
github.event_name == 'workflow_dispatch'
|| github.event.workflow_run.conclusion == 'success'
}}
steps:
- name: SSH init
run: |
mkdir -p ~/.ssh
echo "${DEPLOY_KEY}" > ~/.ssh/id_ed25519
chmod 600 ~/.ssh/id_ed25519
ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new \
gitea_ci@hanzalova.internal 'hostname -f'
# Gating compares `rpm -q` against the packages.json manifest the
# publish job maintains — NOT unprivileged `dnf check-update`,
# which proved unreliable as the gitea_ci user (hung on metadata
# locks on one host, silently reported "no updates" on others).
# An unreadable/unparsable manifest fails open: deploy proceeds.
- name: Deploy cortex (skips when already current)
run: |
ssh gitea_ci@hanzalova.internal 'bash -s' <<'DEPLOY'
set -eu
pkg=cortex
installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' "${pkg}" 2>/dev/null || echo "not-installed")
latest=$(curl -fsS --max-time 15 "https://rpm.lair.cafe/fedora/43/x86_64/unstable/packages.json" 2>/dev/null \
| python3 -c '
import json, sys
name = sys.argv[1]
cands = [p for p in json.load(sys.stdin)["packages"] if p.get("name") == name]
if cands:
p = max(cands, key=lambda p: p.get("buildTime", 0))
print(p["version"] + "-" + p["release"])
' "${pkg}" 2>/dev/null || true)
if [ -n "${latest}" ] && [ "${latest}" = "${installed}" ]; then
echo "${pkg}-${installed} already current — leaving service untouched"
exit 0
fi
echo "installed=${installed} published=${latest:-unknown} — deploying"
if systemctl is-active --quiet cortex.service; then
sudo /usr/bin/systemctl stop cortex.service
fi
if rpm -q "${pkg}" >/dev/null 2>&1; then
sudo /usr/bin/dnf upgrade --refresh --allowerasing -y cortex
else
sudo /usr/bin/dnf install --refresh --allowerasing -y cortex
fi
sudo /usr/bin/systemctl daemon-reload
sudo /usr/bin/systemctl start cortex.service
DEPLOY
# Wait for the service to either come up or wedge, then capture
# the latest-invocation journal. Runs even on prior failure so a
# failed start step still leaves a usable record in the deploy log.
- name: Capture cortex.service startup journal
if: always()
run: |
sleep 10
ssh gitea_ci@hanzalova.internal \
'journalctl --unit cortex.service -I --no-pager'
deploy-neurons:
needs: [deploy-cortex]
runs-on: fedora-43
strategy:
# One neuron failing must not cancel the others. Cortex is up
# already; a partial neuron deploy is strictly better than
# rolling back to zero.
fail-fast: false
matrix:
include:
# load_timeout: how long to wait for default_models to finish
# loading after a restart. beast cold-loads Qwen3.6-27B Q6K
# TP=2 (~5-6 min typical, see #1); benjy/quadbrat load small
# single-GPU models in well under a minute.
- host: beast.hanzalova.internal
flavour: blackwell
load_timeout: 900
- host: benjy.hanzalova.internal
flavour: ada
load_timeout: 300
- host: quadbrat.hanzalova.internal
flavour: ampere
load_timeout: 300
steps:
- name: SSH init
run: |
mkdir -p ~/.ssh
echo "${DEPLOY_KEY}" > ~/.ssh/id_ed25519
chmod 600 ~/.ssh/id_ed25519
ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new \
gitea_ci@${{ matrix.host }} 'hostname -f'
# See deploy-cortex for why gating uses the publish manifest and
# not unprivileged `dnf check-update`.
- name: Deploy helexa-neuron-${{ matrix.flavour }} (skips when already current)
run: |
ssh gitea_ci@${{ matrix.host }} 'bash -s' <<'DEPLOY'
set -eu
pkg=helexa-neuron-${{ matrix.flavour }}
installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' "${pkg}" 2>/dev/null || echo "not-installed")
latest=$(curl -fsS --max-time 15 "https://rpm.lair.cafe/fedora/43/x86_64/unstable/packages.json" 2>/dev/null \
| python3 -c '
import json, sys
name = sys.argv[1]
cands = [p for p in json.load(sys.stdin)["packages"] if p.get("name") == name]
if cands:
p = max(cands, key=lambda p: p.get("buildTime", 0))
print(p["version"] + "-" + p["release"])
' "${pkg}" 2>/dev/null || true)
if [ -n "${latest}" ] && [ "${latest}" = "${installed}" ]; then
echo "${pkg}-${installed} already current — leaving service untouched"
exit 0
fi
echo "installed=${installed} published=${latest:-unknown} — deploying"
if systemctl is-active --quiet neuron.service; then
sudo /usr/bin/systemctl stop neuron.service
fi
if rpm -q "${pkg}" >/dev/null 2>&1; then
sudo /usr/bin/dnf upgrade --refresh --allowerasing -y "${pkg}"
else
sudo /usr/bin/dnf install --refresh --allowerasing -y "${pkg}"
fi
sudo /usr/bin/systemctl daemon-reload
sudo /usr/bin/systemctl start neuron.service
# ── Post-deploy validation ────────────────────────────────
# A deploy only goes green if the neuron (a) finishes loading
# its default models and (b) answers a trivial prompt like an
# LLM should. Catches the class of bug where the binary
# starts fine but model load or inference is broken — which
# previously surfaced only when a human noticed. The wait
# polls /health activation (the structured source of the
# "loaded default model" journal line, plus per-model failure
# detail); the journal-capture step below still runs for
# forensics either way.
load_timeout=${{ matrix.load_timeout }}
echo "waiting for default models (timeout ${load_timeout}s)"
deadline=$(( $(date +%s) + load_timeout ))
health=""
while :; do
health=$(curl -fsS --max-time 5 http://localhost:13131/health 2>/dev/null || true)
state=$(printf %s "${health}" | python3 -c '
import json, sys
try:
print(json.load(sys.stdin).get("activation", {}).get("state", ""))
except Exception:
print("")
')
if [ "${state}" = "ready" ]; then
break
fi
if [ "$(date +%s)" -ge "${deadline}" ]; then
echo "FAIL: activation not ready within ${load_timeout}s (last state: ${state:-unreachable})"
exit 1
fi
sleep 10
done
model=$(printf %s "${health}" | python3 -c '
import json, sys
a = json.load(sys.stdin).get("activation", {})
failed = a.get("failed", [])
if failed:
for f in failed:
msg = "FAILED " + str(f.get("model_id")) + ": " + str(f.get("error", ""))[:400]
sys.stderr.write(msg + chr(10))
sys.exit(1)
completed = a.get("completed", [])
print(completed[0] if completed else "")
')
if [ -z "${model}" ]; then
echo "no default models configured — skipping LLM probe"
exit 0
fi
echo "LLM probe against ${model}"
probe_body=$(printf '{"model":"%s","messages":[{"role":"user","content":"Reply with exactly one word: pineapple"}],"max_tokens":512,"temperature":0}' "${model}")
resp=$(curl -fsS --max-time 180 -H "content-type: application/json" \
-d "${probe_body}" http://localhost:13131/v1/chat/completions) || {
echo "FAIL: probe request errored"
exit 1
}
if printf %s "${resp}" | grep -qi pineapple; then
echo "LLM probe passed"
else
echo "FAIL: probe response missing expected token"
printf %s "${resp}" | head -c 2000
echo
exit 1
fi
DEPLOY
- name: Ensure firewalld allows helexa-neuron
run: |
ssh gitea_ci@${{ matrix.host }} '
if ! sudo /usr/bin/firewall-cmd --query-service=helexa-neuron --quiet 2>/dev/null; then
sudo /usr/bin/firewall-cmd --add-service=helexa-neuron --permanent
sudo /usr/bin/firewall-cmd --reload
fi'
# Wait for the service to either come up or wedge, then capture
# the latest-invocation journal. Runs even on prior failure so a
# failed start step still leaves a usable record in the deploy log.
- name: Capture neuron.service startup journal
if: always()
run: |
sleep 10
ssh gitea_ci@${{ matrix.host }} \
'journalctl --unit neuron.service -I --no-pager'