cortex

helexa/cortex

rob thijssen abbedf8d8a

chore(neuron): bump default max_tokens from 512 to 8192

512 is too low for any modern coding model — clients that don't
explicitly set max_tokens get clipped responses with no diagnostic.
Bump the fallback at all four inference call sites (single-GPU
streaming + non-streaming, TP leader + non-leader) to 8192, which
fits comfortably within Qwen3-class context windows after a
typical agent prompt and lines up with what helexa-acp / a0 / curl
clients reasonably expect.

Clients that explicitly set max_tokens (now including helexa-acp
via HELEXA_ACP_MAX_TOKENS / per-endpoint TOML) override this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-28 12:38:28 +03:00
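
The fallback behaviour described in the commit message above can be sketched as follows. This is a minimal illustration, not the code from the repository: the constant name `DEFAULT_MAX_TOKENS`, the `InferenceRequest` struct, and the `effective_max_tokens` helper are all hypothetical, and only the value 8192 and the "explicit value overrides the default" rule come from the commit message.

```rust
/// Fallback used when a request does not set `max_tokens`.
/// The value 8192 is taken from the commit message; the name is an assumption.
const DEFAULT_MAX_TOKENS: u32 = 8192;

/// Hypothetical request shape, reduced to the one field relevant here.
/// The real request type in the repository may look quite different.
#[derive(Debug, Default)]
struct InferenceRequest {
    /// `None` means the client did not specify a limit.
    max_tokens: Option<u32>,
}

/// Resolve the token limit for a request: an explicit client value wins,
/// otherwise the default applies.
fn effective_max_tokens(req: &InferenceRequest) -> u32 {
    req.max_tokens.unwrap_or(DEFAULT_MAX_TOKENS)
}
```

Under these assumptions, a request with `max_tokens: None` resolves to 8192, while one with `max_tokens: Some(256)` keeps 256. Routing every call site through one helper like this would keep the four inference paths named in the commit message from drifting apart, though the source does not say whether the repository does so.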

src

chore(neuron): bump default max_tokens from 512 to 8192

2026-05-28 12:38:28 +03:00

tests

refactor(neuron): phase 3 — TP forward + NCCL state move onto device worker

2026-05-27 10:16:02 +03:00

build.rs

feat(stage-8d-1): import mistralrs GDN CUDA kernels — build infra only

2026-05-21 11:34:11 +03:00

Cargo.toml

feat(stage-8d-7): direct safetensors fused-region loader

2026-05-21 17:49:35 +03:00