All checks were successful
build-prerelease / Resolve version stamps + change detection (push) Successful in 33s
build-prerelease / Lint (fmt + clippy) (push) Successful in 2m20s
build-prerelease / Build helexa-bench binary (push) Has been skipped
build-prerelease / Package helexa-bench RPM (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 1m46s
build-prerelease / Build neuron-ada (push) Successful in 2m9s
build-prerelease / Build cortex binary (push) Successful in 2m24s
build-prerelease / Build neuron-ampere (push) Successful in 2m52s
build-prerelease / Test (push) Successful in 4m16s
build-prerelease / Package cortex RPM (push) Successful in 1m25s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 1m43s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 1m43s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 1m44s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 55s
The deeper reason opencode showed "Context: 0 tokens / 0% used" and flew into a 400: streaming responses carried NO `usage`. Clients track context (and trigger compaction) from the `usage` field; the legacy candle streaming path set `usage: None` on every chunk, so a streaming client had no token count at all — `max_model_len` alone is a denominator with no numerator. InferenceEvent::Finish now carries prompt_tokens + completion_tokens (the streaming loops already have both: prompt_tokens.len() and the generated all_tokens.len()). The openai_chat projector emits an OpenAI-style trailing usage chunk (empty `choices`, populated `usage`) after the finish chunk. cortex's Anthropic stream translator already reads chunk.usage, so this fixes context tracking on BOTH the OpenAI (opencode) and Anthropic (Claude Code) paths. Also harden the max_model_len plumbing's sibling: cortex re-polls /discovery while a neuron's max_prompt_tokens is still 0 (unknown), so a rolling-deploy race where cortex caches discovery before the neuron has the field self-heals instead of pinning max_model_len to None until a manual cortex restart. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>