Once a CUDA driver error has hit a forward or kv-cache call, the
device's context is unrecoverable in-process — subsequent kernels can
hang (the failure mode seen on beast on 2026-05-26), return garbage,
or trip another illegal-address. The harness now marks the model
poisoned on any forward / spawn_blocking / TP-task failure, refuses
further inference against it with a clear "unload and reload" error,
and surfaces `status: "poisoned"` on `/models` so an operator running
`curl beast:13131/models` (or cortex polling) can see the bad state.
Without this, a single OOM on a too-large prefill quietly turned every
subsequent request into a stuck wait on the pool lock; with it, the
first request fails fast with the driver error in the journal and the
client gets a usable 5xx instead of a hung connection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>