Files
helexa/crates/helexa-bench
rob thijssen 8a636c687f
All checks were successful
build-prerelease / Resolve version stamps + change detection (push) Successful in 37s
build-prerelease / Build neuron-blackwell (push) Successful in 1m36s
build-prerelease / Lint (fmt + clippy) (push) Successful in 2m33s
build-prerelease / Build neuron-ada (push) Successful in 2m2s
build-prerelease / Build neuron-ampere (push) Successful in 2m47s
build-prerelease / Build helexa-bench binary (push) Successful in 2m8s
build-prerelease / Build cortex binary (push) Successful in 2m35s
build-prerelease / Test (push) Successful in 5m13s
build-prerelease / Package helexa-bench RPM (push) Successful in 1m17s
build-prerelease / Package cortex RPM (push) Successful in 1m18s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 1m43s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 1m42s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 1m43s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 54s
feat(cortex): per-model limit + cost on /v1/models; remove max_model_len
Resolves #62. opencode's helexa provider discovers a model's serving
budget from /v1/models and uses it to size context, trigger compaction,
and show spend with no hand-configuration. Each model entry now carries:

  - limit { context, input?, output }  — operator-declared in models.toml
  - cost  { input, output, cache_read?, cache_write? }  — USD per 1M tokens
  - tool_call / reasoning  — runtime-detected by the candle harness and
    OR-ed in from each serving neuron

Composition: the catalogue profile supplies limit/cost (Pass 1); the
poller carries the neuron's detected tool_call/reasoning into ModelEntry,
which the gateway unions onto the entry (Pass 2); aliases propagate every
field (Pass 4). Wire types extend ModelInfo / ModelProfile /
CortexModelEntry additively (serde default + skip_serializing_if), so
older neurons and clients are unaffected. helexa-bench's ModelInfo
constructor and the gateway test fixtures are updated for the new fields.
Adds tests/model_limits.rs asserting /v1/models surfaces limit + cost
(catalogue) and tool_call + reasoning (runtime), and that max_model_len
is gone.

Removes max_model_len. It was write-only with no consumer — opencode's
source references it nowhere and it is not an OpenAI /v1/models field —
and doubly misleading: vLLM's max_model_len means total sequence length,
but cortex populated it from NEURON_MAX_PROMPT_TOKENS, a prompt-only cap.
The limit{} contract replaces it. The neuron's max_prompt_tokens remains
the enforced prompt cap (neuron-side); cortex just stops re-advertising a
derived, mis-named copy. Closes #66 — its stale-max_model_len premise is
moot once the field is gone.

limit/cost are operator-declared (catalogue) per #62's design; auto-
deriving the advertised budget from each neuron's reported cap is a
tracked follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 09:26:55 +03:00
..