Stage 1's build seam (#50): the interface that auth, metering, and
budget enforcement all hang off, plus a local/static provider so the A0
amplification fix can land before any upstream clearing house exists.
The future helexa-upstream client (#57) is just another impl.
- cortex-core::entitlements: Principal {account_id, key_id}, CapWindow
(Balance | Rolling{seconds}), Reservation handle, BudgetSnapshot,
AuthError/BudgetError, and the async EntitlementProvider trait
(resolve / reserve / settle / release / snapshot). BudgetError carries
the window semantics so callers pick the #63 code (rate_limit_exceeded
+ Retry-After vs insufficient_quota) without the provider touching HTTP.
- cortex-core::config: [entitlements] section on GatewayConfig
(require_auth + [[entitlements.keys]] with account_id, optional key_id,
hard_cap, window). Additive + serde(default) — anonymous/uncapped when
omitted, so existing setups are unaffected.
- cortex-gateway::entitlements_local: LocalEntitlementProvider. Budget
math serialized under one Mutex so spent+reserved can never exceed a
hard cap under concurrency (the #52 guarantee); rolling windows reset
lazily; uncapped keys (no hard_cap) always reserve but still meter.
- CortexState gains Arc<dyn EntitlementProvider> + require_auth, built in
from_config. Not yet consumed by the request path — auth middleware is
1b (#49), enforcement is 1d (#52).
- cortex.example.toml documents the section; test GatewayConfig literals
updated for the new field.
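A minimal sketch of the reserve/settle/release budget math described above, assuming a single key and synchronous std-only code. The real provider is async and holds all keys under one Mutex. `KeyBudget`, `State`, and `OverCap` are hypothetical names used only for illustration, and the choice to leave in-flight reservations untouched on a lazy window reset is an assumption.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq)]
enum CapWindow {
    Balance,
    Rolling { seconds: u64 },
}

/// Hypothetical error: carries the window so the caller can pick the code.
#[derive(Debug, PartialEq)]
struct OverCap {
    window: CapWindow,
}

struct State {
    spent: u64,
    reserved: u64,
    window_start: Instant,
}

/// Hypothetical single-key budget; all math runs under one lock.
struct KeyBudget {
    hard_cap: Option<u64>, // None = uncapped, but usage is still metered
    window: CapWindow,
    state: Mutex<State>,
}

impl KeyBudget {
    fn new(hard_cap: Option<u64>, window: CapWindow, now: Instant) -> Self {
        let state = State { spent: 0, reserved: 0, window_start: now };
        Self { hard_cap, window, state: Mutex::new(state) }
    }

    fn reserve(&self, amount: u64, now: Instant) -> Result<(), OverCap> {
        let mut s = self.state.lock().unwrap();
        // Lazy reset: a rolling window is only rolled over when next touched.
        if let CapWindow::Rolling { seconds } = self.window {
            let elapsed = now.saturating_duration_since(s.window_start);
            if elapsed >= Duration::from_secs(seconds) {
                s.spent = 0;
                s.window_start = now;
            }
        }
        // Check and update happen under the same lock, so concurrent
        // callers cannot push spent + reserved past the cap.
        if let Some(cap) = self.hard_cap {
            let committed = s.spent.saturating_add(s.reserved);
            if committed.saturating_add(amount) > cap {
                return Err(OverCap { window: self.window });
            }
        }
        s.reserved = s.reserved.saturating_add(amount);
        Ok(())
    }

    /// Convert a reservation into actual spend.
    fn settle(&self, reserved: u64, actual: u64) {
        let mut s = self.state.lock().unwrap();
        s.reserved = s.reserved.saturating_sub(reserved);
        s.spent = s.spent.saturating_add(actual);
    }

    /// Drop a reservation without charging for it.
    fn release(&self, reserved: u64) {
        let mut s = self.state.lock().unwrap();
        s.reserved = s.reserved.saturating_sub(reserved);
    }

    fn spent(&self) -> u64 {
        self.state.lock().unwrap().spent
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside `reserve`, keeps the rolling-window reset easy to exercise in unit tests.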
6 provider unit tests (resolve, unknown-key, round-trip, balance/rolling
over-cap codes, uncapped infra key). Local fmt/clippy/test all green.
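For the over-cap codes, a caller-side mapping might look roughly like the sketch below. It assumes the error exposes the `CapWindow` and that the caller knows how many seconds of the window have elapsed. `ErrorBody`, `over_cap_response`, and the way Retry-After is computed are hypothetical, not the gateway's actual types or logic.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum CapWindow {
    Balance,
    Rolling { seconds: u64 },
}

/// Hypothetical response shape; the provider itself never touches HTTP.
#[derive(Debug, PartialEq)]
struct ErrorBody {
    code: &'static str,
    retry_after_secs: Option<u64>,
}

/// Pick the error code from the window semantics alone.
fn over_cap_response(window: CapWindow, elapsed_secs: u64) -> ErrorBody {
    match window {
        // A rolling window recovers by itself, so tell the client when.
        CapWindow::Rolling { seconds } => ErrorBody {
            code: "rate_limit_exceeded",
            retry_after_secs: Some(seconds.saturating_sub(elapsed_secs)),
        },
        // A balance never refills by waiting, so no Retry-After.
        CapWindow::Balance => ErrorBody {
            code: "insufficient_quota",
            retry_after_secs: None,
        },
    }
}
```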

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Neuron hardcodes its bind_url as `http://localhost:13131` (it can't
reliably know its own externally-resolvable name). When cortex runs
on a different host than the neuron it's routing to, blindly
proxying to that URL hits localhost on the cortex box instead of the
neuron.
Cortex already knows each neuron's reachable host from cortex.toml.
After fetching the inference URL from `/models/{id}/endpoint`, if
the host is a loopback or wildcard address (localhost / 127.0.0.1 /
0.0.0.0 / ::1), swap it for the configured neuron host. Preserve the
port and path from neuron's URL so a future harness serving inference
on a different port than the management API still works.
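The rewrite rule can be sketched as follows. This is a simplified, std-only illustration that hand-splits the URL so it stays self-contained. The commit itself parses with the `url` crate, and `rewrite_loopback` and `is_local_only_host` are hypothetical names. Userinfo (`user@host`) is not handled here.

```rust
/// Hosts that only make sense on the machine that reported them.
fn is_local_only_host(host: &str) -> bool {
    matches!(host, "localhost" | "127.0.0.1" | "0.0.0.0" | "::1")
}

/// Swap a local-only host for `neuron_host`, keeping scheme, port, and path.
/// Returns None for input that does not look like `scheme://host...`.
fn rewrite_loopback(endpoint: &str, neuron_host: &str) -> Option<String> {
    let (scheme, rest) = endpoint.split_once("://")?;

    // The authority ends at the first '/', '?', or '#'.
    let auth_end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let (authority, tail) = rest.split_at(auth_end);

    // Split host from ":port"; IPv6 literals are bracketed, e.g. "[::1]:9000".
    let (host, port) = if let Some(stripped) = authority.strip_prefix('[') {
        stripped.split_once(']')?
    } else {
        match authority.rfind(':') {
            Some(i) => (&authority[..i], &authority[i..]),
            None => (authority, ""),
        }
    };
    if host.is_empty() {
        return None;
    }

    if is_local_only_host(host) {
        // `port` still includes its leading ':' (or is empty).
        Some(format!("{}://{}{}{}", scheme, neuron_host, port, tail))
    } else {
        // Non-loopback hosts pass through untouched.
        Some(endpoint.to_string())
    }
}
```

With a configured host of `gpu-box.lan` (a made-up name), `http://localhost:13131/v1/chat` would become `http://gpu-box.lan:13131/v1/chat`, while `http://10.0.0.5:13131/v1` is returned unchanged.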
Adds `url` (already a transitive dep via reqwest) as a direct
dep for the URL parsing.
Tests cover: localhost rewrite, distinct inference port preservation,
non-loopback passthrough, malformed input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint.
Hardware discovery and model pinning now come from the neuron API and
the models.toml catalogue, respectively.
- config.rs: nodes -> neurons, add models_config path
- catalogue.rs: ModelProfile with pinned_on, ModelCatalogue
- poller.rs: poll neuron GET /models (ModelInfo format)
- router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint
- evictor.rs: call neuron POST /models/unload
- node.rs: remove vram_mb, pinned fields (come from discovery/catalogue)
- All 22 gateway tests updated to mock neuron API
- Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

6 tests proving the scaffold works end-to-end:
- chat completion proxied through gateway to mock backend
- /health endpoint with healthy node
- /v1/models returns seeded model list
- 404 for unknown model
- 404 when no healthy nodes available
- 400 when request body missing model field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>