helexa

helexa/helexa

Fork 0

Files

History

rob thijssen cb758d4706

build-prerelease / Resolve version stamps + change detection (push) Successful in 33s

Details

build-prerelease / Lint (fmt + clippy) (push) Successful in 2m20s

Details

build-prerelease / Build helexa-bench binary (push) Has been skipped

Details

build-prerelease / Package helexa-bench RPM (push) Has been skipped

Details

build-prerelease / Build neuron-blackwell (push) Successful in 1m46s

Details

build-prerelease / Build neuron-ada (push) Successful in 2m9s

Details

build-prerelease / Build cortex binary (push) Successful in 2m24s

Details

build-prerelease / Build neuron-ampere (push) Successful in 2m52s

Details

build-prerelease / Test (push) Successful in 4m16s

Details

build-prerelease / Package cortex RPM (push) Successful in 1m25s

Details

build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 1m43s

Details

build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 1m43s

Details

build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 1m44s

Details

build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 55s

Details

feat(neuron): emit usage on the streaming path so clients can track context

The deeper reason opencode showed "Context: 0 tokens / 0% used" and flew
into a 400: streaming responses carried NO `usage`. Clients track context
(and trigger compaction) from the `usage` field; the legacy candle
streaming path set `usage: None` on every chunk, so a streaming client
had no token count at all — `max_model_len` alone is a denominator with
no numerator.

InferenceEvent::Finish now carries prompt_tokens + completion_tokens
(the streaming loops already have both: prompt_tokens.len() and the
generated all_tokens.len()). The openai_chat projector emits an
OpenAI-style trailing usage chunk (empty `choices`, populated `usage`)
after the finish chunk. cortex's Anthropic stream translator already
reads chunk.usage, so this fixes context tracking on BOTH the OpenAI
(opencode) and Anthropic (Claude Code) paths.

Also harden the max_model_len plumbing's sibling: cortex re-polls
/discovery while a neuron's max_prompt_tokens is still 0 (unknown), so a
rolling-deploy race where cortex caches discovery before the neuron has
the field self-heals instead of pinning max_model_len to None until a
manual cortex restart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-15 19:43:59 +03:00

src

feat(neuron): emit usage on the streaming path so clients can track context

2026-06-15 19:43:59 +03:00

tests

fix(cortex): emit tool_call arguments as an object so Qwen3.6 can chain tools

2026-06-15 16:43:17 +03:00

Cargo.toml

fix(router): rewrite loopback inference URLs to use neuron's host

2026-05-22 06:23:47 +03:00