cortex

helexa/cortex

rob thijssen 249b2e5c98

build-prerelease / Resolve version stamps (push) Successful in 38s
CI / Clippy (push) Successful in 2m22s
CI / Test (push) Successful in 4m55s
build-prerelease / Build cortex binary (push) Successful in 4m24s
build-prerelease / Build neuron-blackwell (push) Successful in 5m49s
build-prerelease / Package cortex RPM (push) Successful in 1m23s
build-prerelease / Build neuron-ampere (push) Successful in 8m7s
build-prerelease / Build neuron-ada (push) Successful in 5m0s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m6s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m6s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m48s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m5s
CI / Format (push) Failing after 33s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped

fix(neuron): only poison the model on actual device faults

Previously every inference Err (shape mismatch, NaN logits, tokenizer
error, missing handle) marked the model poisoned, and every subsequent
request was rejected until an operator unloaded and reloaded the model.
The benjy incident on 2026-05-27 showed how this misfires: a concurrency
bug produced a `broadcast_add: shape mismatch` error that had nothing to
do with CUDA, but the model was taken down anyway.

Add `is_device_fault(err_chain: &str)` — a conservative classifier
that returns false only for errors we know are pre-kernel / CPU-side
(shape mismatches, NaN logits, tokenize/detokenize, missing handle,
DecodeStream, empty prompt). Everything else defaults to true so a
genuine driver fault still poisons.
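
A minimal sketch of how such a deny-list classifier could look. This is
not the code from the commit: the marker strings in `NON_DEVICE_MARKERS`
and the case-insensitive substring matching are assumptions made for
illustration. Only the function name, its signature, and the
default-to-true behaviour come from the description above.

```rust
/// Hypothetical substrings that identify pre-kernel / CPU-side errors.
/// The real list used by neuron is not shown in this commit message.
const NON_DEVICE_MARKERS: &[&str] = &[
    "shape mismatch",
    "nan logits",
    "tokenize", // also matches "detokenize" and "tokenizer"
    "missing handle",
    "decodestream",
    "empty prompt",
];

/// Returns false only when the error chain matches a known non-device
/// pattern. Anything unrecognised is treated as a device fault, so an
/// unknown driver error still poisons the model.
pub fn is_device_fault(err_chain: &str) -> bool {
    let lowered = err_chain.to_lowercase();
    !NON_DEVICE_MARKERS.iter().any(|marker| lowered.contains(marker))
}
```

Defaulting to true keeps the classifier conservative: a new, unlisted
CPU-side error costs an unnecessary reload, whereas a misclassified
driver fault would leave a broken model serving requests.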

Applied at all six poisoning sites:
  - chat_completion CUDA worker path
  - chat_completion CPU spawn_blocking path
  - chat_completion_stream CUDA worker path
  - chat_completion_stream CPU spawn_blocking path
  - chat_completion_tp non-streaming wrapper
  - chat_completion_tp_stream spawned task

Each site now logs either "model marked poisoned" (device fault) or
"model NOT marked poisoned" (non-device) so the journal makes the
classification visible. Tests cover the known non-device patterns and
a couple of real CUDA driver messages.
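
The per-site handling described above could be sketched roughly as
follows. The helper name `note_inference_error`, the `AtomicBool` flag,
and the use of `eprintln!` in place of the project's logger are all
assumptions; the `classify` parameter stands in for `is_device_fault`.
Only the two log phrases are taken from the description.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical helper for a single poisoning site. It sets the poisoned
/// flag only when `classify` reports a device fault, logs which way the
/// decision went, and returns that decision.
pub fn note_inference_error(
    poisoned: &AtomicBool,
    err_chain: &str,
    classify: impl Fn(&str) -> bool,
) -> bool {
    let device_fault = classify(err_chain);
    if device_fault {
        poisoned.store(true, Ordering::SeqCst);
        eprintln!("model marked poisoned: {err_chain}");
    } else {
        // The request still fails, but later requests are not rejected.
        eprintln!("model NOT marked poisoned: {err_chain}");
    }
    device_fault
}
```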

Pairs with the inference_lock commit (c59da83): together they
eliminate both the cause of the spurious poisoning we just observed
(the shape mismatch) and the over-reaction to it (the unconditional
poison). Each fix is useful on its own, but the combination is what
makes the system robust to concurrent agent workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-27 18:57:48 +03:00

cortex-cli

feat(neuron): OpenAI-compatible non-streaming chat completion

2026-05-18 16:47:58 +03:00

cortex-core

feat(catalogue,gateway): model aliases (helexa/small, helexa/balanced, helexa/large)

2026-05-26 16:10:41 +03:00

cortex-gateway

feat(catalogue,gateway): model aliases (helexa/small, helexa/balanced, helexa/large)

2026-05-26 16:10:41 +03:00

neuron

fix(neuron): only poison the model on actual device faults

2026-05-27 18:57:48 +03:00