All checks were successful
build-prerelease / Resolve version stamps + change detection (push) Successful in 30s
build-prerelease / Lint (fmt + clippy) (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Has been skipped
build-prerelease / Build neuron-ampere (push) Has been skipped
build-prerelease / Build neuron-ada (push) Has been skipped
build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped
build-prerelease / Test (push) Has been skipped
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped
build-prerelease / Build cortex binary (push) Has been skipped
build-prerelease / Package cortex RPM (push) Has been skipped
build-prerelease / Build helexa-bench binary (push) Has been skipped
build-prerelease / Package helexa-bench RPM (push) Has been skipped
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped
Roll the per-model context cap into deploy.yml so it is deterministic per host and rolled out (with a restart) alongside the rest of the service config, rather than hand-edited in local.conf.

The deploy now writes /etc/systemd/system/neuron.service.d/model.conf from a new per-host `max_prompt_tokens` matrix field, and restarts a neuron when the package OR the drop-in changes, so a cap change applies even with no new RPM.

- beast (Qwen3.6-27B, hybrid linear, 2x 32GB) -> 131072 (~128k)
- benjy and quadbrat (dense, VRAM-bound) stay at 16384 but become deploy-managed

Adds the scoped sudoers grant for the root-owned drop-in install, and doc/context-limits.md documenting the knob relationships and KV/VRAM math (refs #62 for the eventual /models-advertised source of truth, #65 for the length-aware text VRAM guard that gates pushing beyond 128k).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
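
The restart rule described in this message (restart when the package OR the drop-in changes) can be sketched roughly as follows. This is an illustration only, not the logic in deploy.yml: the environment variable name `NEURON_MAX_PROMPT_TOKENS`, the `[Service]`/`Environment=` layout of model.conf, and the helper names are all assumptions. Only the host names and cap values come from the message.

```python
from typing import Optional

# Per-host caps as stated in the commit message.
HOST_CAPS = {"beast": 131072, "benjy": 16384, "quadbrat": 16384}

# ASSUMPTION: the real variable name read by neuron is not given in the message.
ENV_VAR = "NEURON_MAX_PROMPT_TOKENS"


def render_dropin(host: str) -> str:
    """Render a hypothetical model.conf drop-in body for one host."""
    cap = HOST_CAPS[host]
    return f"[Service]\nEnvironment={ENV_VAR}={cap}\n"


def needs_restart(package_changed: bool, installed: Optional[str], desired: str) -> bool:
    """Restart when the package changed OR the installed drop-in differs.

    `installed` is None when no drop-in exists on the host yet.
    """
    return package_changed or installed != desired


if __name__ == "__main__":
    desired = render_dropin("beast")
    # No new RPM, but the cap on disk is still the old 16384 drop-in.
    stale = render_dropin("benjy")
    print(needs_restart(False, stale, desired))
```

Comparing the rendered content rather than a timestamp keeps the decision deterministic per host: re-running the deploy with an unchanged matrix and no new package leaves the service alone.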