Files
cortex/crates/neuron
rob thijssen abbedf8d8a
All checks were successful
build-prerelease / Resolve version stamps (push) Successful in 44s
CI / Format (push) Successful in 45s
CI / Clippy (push) Successful in 2m41s
build-prerelease / Build neuron-blackwell (push) Successful in 5m35s
build-prerelease / Build cortex binary (push) Successful in 4m32s
CI / Test (push) Successful in 5m29s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Package cortex RPM (push) Successful in 1m20s
build-prerelease / Build neuron-ampere (push) Successful in 8m6s
build-prerelease / Build neuron-ada (push) Successful in 5m19s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m55s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m57s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s
chore(neuron): bump default max_tokens from 512 to 8192
512 is too low for any modern coding model — clients that don't
explicitly set max_tokens get clipped responses with no diagnostic.
Bump the fallback at all four inference call sites (single-GPU
streaming + non-streaming, TP leader + non-leader) to 8192, which
fits comfortably within Qwen3-class context windows after a
typical agent prompt and lines up with what helexa-acp / a0 / curl
clients reasonably expect.

Clients that explicitly set max_tokens (now including helexa-acp
via HELEXA_ACP_MAX_TOKENS / per-endpoint TOML) override this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 12:38:28 +03:00
..