All checks were successful
build-prerelease / Resolve version stamps + change detection (push) Successful in 37s
build-prerelease / Build neuron-blackwell (push) Successful in 1m36s
build-prerelease / Lint (fmt + clippy) (push) Successful in 2m33s
build-prerelease / Build neuron-ada (push) Successful in 2m2s
build-prerelease / Build neuron-ampere (push) Successful in 2m47s
build-prerelease / Build helexa-bench binary (push) Successful in 2m8s
build-prerelease / Build cortex binary (push) Successful in 2m35s
build-prerelease / Test (push) Successful in 5m13s
build-prerelease / Package helexa-bench RPM (push) Successful in 1m17s
build-prerelease / Package cortex RPM (push) Successful in 1m18s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 1m43s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 1m42s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 1m43s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 54s
Resolves #62. opencode's helexa provider discovers a model's serving budget from /v1/models and uses it to size context, trigger compaction, and show spend with no hand-configuration. Each model entry now carries: - limit { context, input?, output } — operator-declared in models.toml - cost { input, output, cache_read?, cache_write? } — USD per 1M tokens - tool_call / reasoning — runtime-detected by the candle harness and OR-ed in from each serving neuron Composition: the catalogue profile supplies limit/cost (Pass 1); the poller carries the neuron's detected tool_call/reasoning into ModelEntry, which the gateway unions onto the entry (Pass 2); aliases propagate every field (Pass 4). Wire types extend ModelInfo / ModelProfile / CortexModelEntry additively (serde default + skip_serializing_if), so older neurons and clients are unaffected. helexa-bench's ModelInfo constructor and the gateway test fixtures are updated for the new fields. Adds tests/model_limits.rs asserting /v1/models surfaces limit + cost (catalogue) and tool_call + reasoning (runtime), and that max_model_len is gone. Removes max_model_len. It was write-only with no consumer — opencode's source references it nowhere and it is not an OpenAI /v1/models field — and doubly misleading: vLLM's max_model_len means total sequence length, but cortex populated it from NEURON_MAX_PROMPT_TOKENS, a prompt-only cap. The limit{} contract replaces it. The neuron's max_prompt_tokens remains the enforced prompt cap (neuron-side); cortex just stops re-advertising a derived, mis-named copy. Closes #66 — its stale-max_model_len premise is moot once the field is gone. limit/cost are operator-declared (catalogue) per #62's design; auto- deriving the advertised budget from each neuron's reported cap is a tracked follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>