helexa

Author	SHA1	Message	Date
rob thijssen	cdf87284af	feat(#47 phase 1d): budget enforcement (hard caps, reserve→settle, 429)	2026-06-17 19:35:04 +03:00

CI: all checks were successful.

Stage 1 complete: the A0 seatbelt (#52). Flips the metering-only reserve(0) from #51 to the request's real upper-bound cost, and refuses over-cap requests before they reach neuron.

- metering::reservation_estimate: a prompt estimate (~4 chars/token over the body; cortex has no tokenizer, so this is a conservative over-estimate, and neuron remains the exact context wall) plus the max output. Max output comes from max_completion_tokens (or legacy max_tokens), else the model's advertised limit.output (#62), else FALLBACK_MAX_OUTPUT. Over-reserving is safe because settle reconciles to actual usage.
- metering::reserve_or_reject: reserves the estimate; on a BudgetError it maps to the #63 envelope and the caller refuses before dispatch. Rolling window → 429 rate_limit_exceeded with Retry-After (until the window resets); hard balance → 429 insufficient_quota with no Retry-After. Never 402.
- Wired into both the OpenAI proxy path (proxy_with_metrics) and the Anthropic path (estimated from the translated body). advertised_output_limit reads the loaded model's limit.output from fleet state.
- Reservation prevents overshoot under concurrency: a reserve succeeds only if spent + reserved + estimate ≤ cap, and settle records actual ≤ reserved, so spend can never exceed the hard cap.

4 integration tests with a hit-counting mock neuron:
- balance over-cap → 429 insufficient_quota (no Retry-After, not dispatched);
- rolling over-cap → 429 rate_limit_exceeded + Retry-After (not dispatched);
- within-cap → served;
- A0 repro: a capped key's 20-request fan-out drains the cap and is then refused; neuron saw only the served requests, and spend never exceeded the cap.

Plus 5 metering unit tests. Local fmt/clippy/test all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
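A minimal Rust sketch of the reservation estimate described in the commit, assuming the request's output-limit fields have already been parsed into a struct. RequestLimits, CHARS_PER_TOKEN and the 4096 value given to FALLBACK_MAX_OUTPUT are hypothetical: the commit names FALLBACK_MAX_OUTPUT but not its value, and the real reservation_estimate signature is not shown. The sketch only illustrates the precedence chain (max_completion_tokens, then legacy max_tokens, then the model's advertised limit.output, then the fallback) added to a rounded-up chars/4 prompt estimate.

```rust
/// Hypothetical placeholder; the commit does not state the real value.
const FALLBACK_MAX_OUTPUT: u64 = 4096;
/// Rough heuristic from the commit: about 4 characters per token.
const CHARS_PER_TOKEN: u64 = 4;

/// Hypothetical view of the request fields the estimate reads.
struct RequestLimits {
    max_completion_tokens: Option<u64>,
    /// Legacy field, used only when max_completion_tokens is absent.
    max_tokens: Option<u64>,
}

/// Upper-bound token cost to reserve before dispatching the request.
fn reservation_estimate(
    body: &str,
    req: &RequestLimits,
    advertised_output_limit: Option<u64>,
) -> u64 {
    // No tokenizer available: count characters over the whole body and
    // round up, so a partial 4-char chunk still counts as one token.
    let chars = body.chars().count() as u64;
    let prompt = chars.div_ceil(CHARS_PER_TOKEN);

    // Max output precedence: explicit field, legacy field, model limit, fallback.
    let output = req
        .max_completion_tokens
        .or(req.max_tokens)
        .or(advertised_output_limit)
        .unwrap_or(FALLBACK_MAX_OUTPUT);

    prompt + output
}
```

Over-estimating here only ties up budget briefly; the settle step later replaces the reservation with actual usage.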
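The reserve→settle invariant from the commit can be illustrated with a small in-memory ledger behind a Mutex. This is a sketch under stated assumptions, not the cortex implementation: Ledger, Reservation, fan_out and the clamping of actual usage in settle are hypothetical. It shows why admitting only when spent + reserved + estimate ≤ cap, and recording actual ≤ reserved at settle time, keeps spend at or below the cap even when 20 requests race.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug, PartialEq)]
enum BudgetError {
    OverCap { requested: u64, available: u64 },
}

/// Hypothetical per-key budget state.
#[derive(Debug)]
struct Ledger {
    cap: u64,
    spent: u64,
    reserved: u64,
}

/// Outstanding hold; consumed by `settle`, so it can be settled only once.
#[derive(Debug)]
struct Reservation {
    amount: u64,
}

impl Ledger {
    fn new(cap: u64) -> Self {
        Ledger { cap, spent: 0, reserved: 0 }
    }

    /// Admit only if spent + reserved + estimate <= cap.
    fn reserve(&mut self, estimate: u64) -> Result<Reservation, BudgetError> {
        let committed = self.spent + self.reserved;
        if committed.saturating_add(estimate) > self.cap {
            return Err(BudgetError::OverCap {
                requested: estimate,
                available: self.cap.saturating_sub(committed),
            });
        }
        self.reserved += estimate;
        Ok(Reservation { amount: estimate })
    }

    /// Release the hold and record actual usage, clamped to the reservation
    /// (an assumption here, to make actual <= reserved explicit).
    fn settle(&mut self, r: Reservation, actual: u64) {
        self.reserved -= r.amount;
        self.spent += actual.min(r.amount);
    }
}

/// Fire `n` concurrent requests; returns (served count, final spend).
fn fan_out(cap: u64, n: usize, estimate: u64, actual: u64) -> (usize, u64) {
    let ledger = Arc::new(Mutex::new(Ledger::new(cap)));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let ledger = Arc::clone(&ledger);
            thread::spawn(move || {
                // The lock is held only for the reserve; "serving" happens outside it.
                let reservation = ledger.lock().unwrap().reserve(estimate);
                match reservation {
                    Ok(r) => {
                        ledger.lock().unwrap().settle(r, actual);
                        true
                    }
                    Err(_) => false, // refused before dispatch
                }
            })
        })
        .collect();
    let served = handles
        .into_iter()
        .filter_map(|h| h.join().ok())
        .filter(|&ok| ok)
        .count();
    let spent = ledger.lock().unwrap().spent;
    (served, spent)
}
```

With a cap of 1000, a 100-token estimate and 60 tokens of actual usage per request, `fan_out(1000, 20, 100, 60)` serves between 10 and 16 of the 20 requests depending on interleaving, and the final spend never exceeds 1000.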
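A sketch of the BudgetError → 429 mapping performed by reserve_or_reject. The enum variants, the Rejection struct and its field names are hypothetical, since the shape of the #63 envelope is not described in the commit; only the 429 status, the rate_limit_exceeded and insufficient_quota types, and the rule that only the rolling-window case carries Retry-After come from the commit message. Rounding Retry-After up to whole seconds is an assumption, chosen so a client never retries before the window resets.

```rust
use std::time::Duration;

/// Hypothetical budget failure kinds.
#[derive(Debug, Clone, PartialEq)]
enum BudgetError {
    /// Rolling-window cap hit; capacity frees up after `resets_in`.
    RollingWindow { resets_in: Duration },
    /// Hard balance exhausted; waiting will not help.
    Balance,
}

/// Hypothetical stand-in for the error envelope sent to the client.
#[derive(Debug, PartialEq)]
struct Rejection {
    status: u16,
    error_type: &'static str,
    retry_after_secs: Option<u64>,
}

fn to_rejection(err: &BudgetError) -> Rejection {
    match err {
        BudgetError::RollingWindow { resets_in } => Rejection {
            status: 429,
            error_type: "rate_limit_exceeded",
            // Round up to whole seconds so a client never retries early.
            retry_after_secs: Some(resets_in.as_secs() + u64::from(resets_in.subsec_nanos() > 0)),
        },
        // Quota exhaustion is still 429 (never 402) and has no Retry-After.
        BudgetError::Balance => Rejection {
            status: 429,
            error_type: "insufficient_quota",
            retry_after_secs: None,
        },
    }
}

/// Run the reservation; on failure, return the rejection the caller sends
/// instead of dispatching the request to neuron.
fn reserve_or_reject<T>(reserve: impl FnOnce() -> Result<T, BudgetError>) -> Result<T, Rejection> {
    reserve().map_err(|e| to_rejection(&e))
}
```

Keeping quota exhaustion on 429 without Retry-After signals "do not retry automatically", while the rolling-window case tells the client exactly when capacity returns.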

1 commit