helexa/asset/sudoers.d/neuron-host.conf at c83f1eb98cdae53c92bc79e3c4be355d6068aa06

helexa/helexa

rob thijssen 6088830e7d

feat(deploy): manage NEURON_MAX_PROMPT_TOKENS per host via model.conf drop-in

Roll the per-model context cap into deploy.yml so it is deterministic per
host and rolled out (with a restart) alongside the rest of the service
config, rather than hand-edited in local.conf. The deploy now writes
/etc/systemd/system/neuron.service.d/model.conf from a new per-host
`max_prompt_tokens` matrix field, and restarts a neuron when the package
OR the drop-in changes, so a cap change applies even with no new RPM.
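
A minimal Python sketch of the write-then-restart rule described above follows. It assumes the drop-in is a plain `[Service]` section that sets `NEURON_MAX_PROMPT_TOKENS` through `Environment=`. The commit does not show model.conf's actual contents, and the function names are hypothetical. The sketch writes to a temporary directory rather than /etc/systemd/system/neuron.service.d/, which needs root (the reason for the sudoers grant).

```python
"""Hypothetical sketch: render the model.conf drop-in from a per-host
max_prompt_tokens value and decide whether neuron needs a restart."""

import tempfile
from pathlib import Path


def render_model_conf(max_prompt_tokens: int) -> str:
    # Assumed layout: a [Service] section exporting the cap as an
    # environment variable that neuron reads at startup.
    return (
        "[Service]\n"
        f"Environment=NEURON_MAX_PROMPT_TOKENS={max_prompt_tokens}\n"
    )


def write_if_changed(path: Path, content: str) -> bool:
    # Write only when the on-disk drop-in differs from the desired content;
    # return True if a write happened.
    if path.exists() and path.read_text() == content:
        return False
    path.write_text(content)
    return True


def needs_restart(package_changed: bool, dropin_changed: bool) -> bool:
    # Restart when either the RPM or the drop-in changed, so a cap change
    # applies even when no new package was installed.
    return package_changed or dropin_changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conf = Path(d) / "model.conf"
        changed = write_if_changed(conf, render_model_conf(131072))
        print(needs_restart(package_changed=False, dropin_changed=changed))  # True
        changed = write_if_changed(conf, render_model_conf(131072))
        print(needs_restart(package_changed=False, dropin_changed=changed))  # False
```

Comparing content before writing keeps the step idempotent: an unchanged cap produces no write and, absent a new RPM, no restart. On a real host, systemd only picks up an edited drop-in after `systemctl daemon-reload`, so that reload would sit between the write and the restart.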

beast (Qwen3.6-27B, hybrid linear, 2x 32GB) -> 131072 (~128k); benjy and
quadbrat (dense, VRAM-bound) stay at 16384 but become deploy-managed.
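
The per-host values above can be mirrored as a small lookup table. This is a hypothetical Python stand-in for the `max_prompt_tokens` matrix field in deploy.yml (whose real syntax is not shown here); only the host names and numbers come from this commit. Raising on an unknown host, rather than falling back to a default, is an illustrative choice that keeps the cap explicit per host.

```python
# Hypothetical mirror of the per-host max_prompt_tokens matrix field.
# Host names and values are taken from the commit message.
MAX_PROMPT_TOKENS = {
    "beast": 131072,    # Qwen3.6-27B, hybrid linear, 2x 32GB
    "benjy": 16384,     # dense, VRAM-bound
    "quadbrat": 16384,  # dense, VRAM-bound
}


def cap_for(host: str) -> int:
    # Fail loudly for a host with no configured cap instead of guessing.
    try:
        return MAX_PROMPT_TOKENS[host]
    except KeyError:
        raise KeyError(f"no max_prompt_tokens configured for host {host!r}") from None


# Sanity checks on the numbers quoted above.
assert 131072 == 128 * 1024  # "~128k" is exactly 128 Ki tokens
assert 131072 // 16384 == 8  # beast's cap is 8x the dense hosts' cap
```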

Adds the scoped sudoers grant for the root-owned drop-in install, and
doc/context-limits.md documenting the knob relationships and KV/VRAM math
(refs #62 for the eventual /models-advertised source of truth, #65 for
the length-aware text VRAM guard that gates pushing beyond 128k).
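
The KV/VRAM math in doc/context-limits.md is not reproduced here. As a rough illustration of why context length is VRAM-bound, the standard full-attention KV-cache estimate is sketched below. The function name and every model dimension are hypothetical placeholders, not the real configuration of Qwen3.6-27B or of the dense hosts' models. For a hybrid linear-attention model, the sketch assumes only the full-attention layers grow with context.

```python
GIB = 1024 ** 3


def kv_cache_bytes(tokens: int, attn_layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size for `tokens` of context.

    Each full-attention layer caches one K and one V vector of
    kv_heads * head_dim elements per token (bytes_per_elem=2 for fp16/bf16).
    """
    return 2 * attn_layers * kv_heads * head_dim * tokens * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only.
    dense = dict(attn_layers=64, kv_heads=8, head_dim=128)
    # Hypothetical hybrid layout: assume only 16 of the 64 layers are full
    # attention; linear-attention layers keep a fixed-size state instead.
    hybrid = dict(attn_layers=16, kv_heads=8, head_dim=128)
    for tokens in (16384, 131072):
        print(tokens,
              kv_cache_bytes(tokens, **dense) / GIB,
              kv_cache_bytes(tokens, **hybrid) / GIB)
    # 16384 4.0 1.0
    # 131072 32.0 8.0
```

Under these made-up dimensions, the dense layout's KV cache alone at 131072 tokens would take roughly a whole 32GB card, while the hybrid layout needs a quarter of that. Real figures depend on the model's actual layer mix, KV-head count, and cache dtype.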

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-16 18:48:19 +03:00

3.1 KiB
