feat(script): validate-neuron.sh waits for /health activation=ready
All checks were successful
build-prerelease / Resolve version stamps (push) Successful in 30s
CI / Format (push) Successful in 30s
CI / Clippy (push) Successful in 2m12s
build-prerelease / Build neuron-blackwell (push) Successful in 3m48s
CI / Test (push) Successful in 5m2s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build cortex binary (push) Successful in 5m11s
build-prerelease / Package cortex RPM (push) Successful in 1m21s
build-prerelease / Build neuron-ampere (push) Successful in 5m25s
build-prerelease / Build neuron-ada (push) Successful in 4m58s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m0s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 6m50s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s

Adds wait_for_ready() that polls /health until activation.state flips
to "ready" (or the NEURON_LOAD_TIMEOUT deadline). Inserted between
probe_health and the is_loaded/trigger_load step.

Before this, running validate-neuron.sh right after deploy.sh raced
the background pre-warm and failed in ~9 ms with "neuron not reachable"
(the pre-2026-05-26 build) or with a partial-load error (the new
build, where the listener binds before default_models finishes).

The poll prints the in_progress model on each tick so an operator
watching the log can see which model is delaying readiness. Backs off
from 2s to 10s after the first few iterations so a long TP load
doesn't spam.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-26 15:26:21 +03:00
parent b9e7a76a7a
commit becf61b9c1

View File

@@ -66,6 +66,58 @@ probe_health() {
|| die "neuron not reachable at ${BASE}/health" || die "neuron not reachable at ${BASE}/health"
} }
# Block until the neuron reports `activation.state == "ready"` on
# `/health`. Without this, validate-neuron.sh used to race the
# background pre-warm (the listener binds immediately but big TP
# loads run for minutes after) and either fail with ECONNREFUSED
# (pre-2026-05-26 build, where load was synchronous before bind) or
# get a 404 from /models/load against a partially-loaded model.
#
# The poll cap is `NEURON_LOAD_TIMEOUT` since pre-warm and an
# on-demand load are the same operation under different triggers.
# Short interval at the start (catches a quick-loading host without
# extra latency) backs off after the first few iterations to keep
# log spam down on a slow load.
wait_for_ready() {
local deadline=$(( $(date +%s) + LOAD_TIMEOUT ))
local state= attempt=0
while (( $(date +%s) < deadline )); do
attempt=$(( attempt + 1 ))
state=$(
curl --silent --max-time 5 "${BASE}/health" \
| jq -r '.activation.state // "unknown"'
) || state=unreachable
case "${state}" in
ready)
say "/health activation.state=ready (after ${attempt} probe(s))"
return 0
;;
pre_warming)
local in_progress
in_progress=$(
curl --silent --max-time 5 "${BASE}/health" \
| jq -r '.activation.in_progress // "<none>"'
) || in_progress='<unreadable>'
say "/health pre_warming (in_progress=${in_progress}); waiting"
;;
unreachable)
say "/health unreachable; waiting"
;;
*)
say "/health unexpected activation.state=${state}; waiting"
;;
esac
# 2s for the first few iterations to catch quick loads, then
# 10s to avoid log spam on a multi-minute TP load.
if (( attempt < 5 )); then
sleep 2
else
sleep 10
fi
done
die "neuron not ready within ${LOAD_TIMEOUT}s (last state: ${state})"
}
list_loaded_ids() { list_loaded_ids() {
# The manifest is YAML and uses yq; HTTP responses are JSON and use # The manifest is YAML and uses yq; HTTP responses are JSON and use
# jq directly. pip-yq parses input as YAML by default, which trips # jq directly. pip-yq parses input as YAML by default, which trips
@@ -157,6 +209,11 @@ run_probe() {
say "validating neuron at ${BASE}" say "validating neuron at ${BASE}"
probe_health probe_health
say "/health OK" say "/health OK"
# Background pre-warm from default_models means /health is reachable
# but `activation.state` can still be `pre_warming` for minutes after
# service start. Block here so the subsequent is_loaded / trigger_load
# steps don't race a partially-materialised model.
wait_for_ready
if is_loaded; then if is_loaded; then
say "${MODEL_ID} already loaded" say "${MODEL_ID} already loaded"