Two reasons the previous run silently bailed after POST /models/load:
1. Default model was Qwen/Qwen3-0.6B-GGUF (official). That repo ships
ONLY Q8_0 — no Q4_K_M, no Q4_0, nothing else. The GGUF filename
matcher in CandleHarness::resolve_files returned "no GGUF file
matching quant Q4_K_M" and the load endpoint returned an error,
but the script used `curl --silent --fail` and swallowed it.
2. /models/load is synchronous (it awaits the full HF download + GGUF
parse). curl --max-time 30 was way too short for a 400 MB fresh
download.
Fixes:
- Default model is now unsloth/Qwen3-0.6B-GGUF, which mirrors the
full Q-spectrum (Q2_K through Q8_0 plus BF16) so Q4_K_M actually
exists.
- trigger_load / run_probe now use --write-out to capture HTTP code
and emit the response body on non-2xx, so failures surface a real
diagnostic instead of an opaque set -e abort.
- LOAD_TIMEOUT bumped to 600s; INFER_TIMEOUT to 120s.
- Probe payload built via `yq -n` so JSON quoting is reliable
regardless of the prompt text.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>