cortex

Files

feat(neuron,candle): detect CUDA context poisoning and refuse follow-ups

Once a CUDA driver error has hit a forward or kv-cache call, the
device's context is unrecoverable in-process: subsequent kernels can
hang (the failure mode seen on beast on 2026-05-26), return garbage,
or trip another illegal-address error. The harness now marks the model
poisoned on any forward / spawn_blocking / TP-task failure, refuses
further inference against it with a clear "unload and reload" error,
and surfaces `status: "poisoned"` on `/models` so an operator running
`curl beast:13131/models` (or cortex polling) can see the bad state.

Without this, a single OOM on a too-large prefill quietly turned every
subsequent request into a stuck wait on the pool lock; with it, the
first request fails fast with the driver error in the journal and the
client gets a usable 5xx instead of a hung connection.
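A minimal sketch of the poison-flag pattern described above, under stated assumptions. It is not the harness's actual code. `LoadedModel`, `InferError`, `run`, `status`, and the `"ready"` status string are hypothetical names chosen for illustration. A driver failure is modeled as a `String` error returned by a closure that stands in for a forward, spawn_blocking, or TP-task step. The first failure flips an `AtomicBool`, and every later call is refused without touching the device:

```rust
use std::fmt;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical error type; `Driver` stands in for a CUDA driver error.
#[derive(Debug)]
pub enum InferError {
    /// A forward / kv-cache step failed; carries the driver message.
    Driver(String),
    /// An earlier failure poisoned the context; the model must be reloaded.
    Poisoned,
}

impl fmt::Display for InferError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InferError::Driver(msg) => write!(f, "driver error: {msg}"),
            InferError::Poisoned => {
                write!(f, "model context is poisoned; unload and reload it")
            }
        }
    }
}

/// Hypothetical per-model handle holding only the poison flag.
pub struct LoadedModel {
    poisoned: AtomicBool,
}

impl LoadedModel {
    pub fn new() -> Self {
        Self { poisoned: AtomicBool::new(false) }
    }

    /// Status string as a `/models` handler might report it
    /// ("ready" is an assumed name for the healthy state).
    pub fn status(&self) -> &'static str {
        if self.poisoned.load(Ordering::Acquire) {
            "poisoned"
        } else {
            "ready"
        }
    }

    /// Runs `step` unless the model is already poisoned. Any error from
    /// `step` marks the model poisoned before being returned, so the
    /// caller can map it to a 5xx instead of retrying on a bad context.
    pub fn run<T>(
        &self,
        step: impl FnOnce() -> Result<T, String>,
    ) -> Result<T, InferError> {
        if self.poisoned.load(Ordering::Acquire) {
            return Err(InferError::Poisoned);
        }
        step().map_err(|msg| {
            self.poisoned.store(true, Ordering::Release);
            InferError::Driver(msg)
        })
    }
}
```

The flag is checked before any GPU work is attempted. A request that arrives after the failure therefore returns `InferError::Poisoned` immediately and never waits on a context that may hang.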

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-26 12:28:42 +03:00

src

feat(neuron,candle): detect CUDA context poisoning and refuse follow-ups

2026-05-26 12:28:42 +03:00

tests

Stage 7a-ii: real NCCL handshake behind the worker pool

2026-05-19 16:40:01 +03:00

build.rs

feat(stage-8d-1): import mistralrs GDN CUDA kernels — build infra only

2026-05-21 11:34:11 +03:00

Cargo.toml

feat(stage-8d-7): direct safetensors fused-region loader

2026-05-21 17:49:35 +03:00