Commit Graph

14 Commits

Author SHA1 Message Date
bc74e0e95f feat(#47 phase 1a): EntitlementProvider trait + local/static provider
Some checks failed
Stage 1's build seam (#50): the interface auth, metering, and budget
enforcement all hang off, with a local/static provider so the A0
amplification fix can land before any upstream clearing house exists.
The future helexa-upstream client (#57) is just another impl.

- cortex-core::entitlements: Principal {account_id, key_id}, CapWindow
  (Balance | Rolling{seconds}), Reservation handle, BudgetSnapshot,
  AuthError/BudgetError, and the async EntitlementProvider trait
  (resolve / reserve / settle / release / snapshot). BudgetError carries
  the window semantics so callers pick the #63 code (rate_limit_exceeded
  + Retry-After vs insufficient_quota) without the provider touching HTTP.
- cortex-core::config: [entitlements] section on GatewayConfig
  (require_auth + [[entitlements.keys]] with account_id, optional key_id,
  hard_cap, window). Additive + serde(default) — anonymous/uncapped when
  omitted, so existing setups are unaffected.
- cortex-gateway::entitlements_local: LocalEntitlementProvider. Budget
  math serialized under one Mutex so spent+reserved can never exceed a
  hard cap under concurrency (the #52 guarantee); rolling windows reset
  lazily; uncapped keys (no hard_cap) always reserve but still meter.
- CortexState gains Arc<dyn EntitlementProvider> + require_auth, built in
  from_config. Not yet consumed by the request path — auth middleware is
  1b (#49), enforcement is 1d (#52).
- cortex.example.toml documents the section; test GatewayConfig literals
  updated for the new field.
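
A minimal sketch of the reserve/settle math described above, assuming a single per-key budget. The names `LocalBudget` and `Budget`, and the fields on this `BudgetError`, are hypothetical and are not the actual cortex types; the point is only that the check and the increment happen under one lock:

```rust
use std::sync::Mutex;

/// Hypothetical per-key budget state; field names are illustrative only.
#[derive(Debug)]
struct Budget {
    hard_cap: Option<u64>, // None = uncapped key
    spent: u64,
    reserved: u64,
}

#[derive(Debug, PartialEq)]
enum BudgetError {
    OverCap { remaining: u64 },
}

/// All budget math goes through one Mutex, so the cap check and the
/// reservation increment can never interleave with another request.
struct LocalBudget {
    inner: Mutex<Budget>,
}

impl LocalBudget {
    fn new(hard_cap: Option<u64>) -> Self {
        Self { inner: Mutex::new(Budget { hard_cap, spent: 0, reserved: 0 }) }
    }

    /// Reserve `amount`; fails if spent + reserved + amount would exceed the cap.
    /// Uncapped keys always succeed but are still counted.
    fn reserve(&self, amount: u64) -> Result<u64, BudgetError> {
        let mut b = self.inner.lock().unwrap();
        if let Some(cap) = b.hard_cap {
            let remaining = cap.saturating_sub(b.spent + b.reserved);
            if amount > remaining {
                return Err(BudgetError::OverCap { remaining });
            }
        }
        b.reserved += amount;
        Ok(amount)
    }

    /// Settle: drop the hold and record what was actually used.
    fn settle(&self, reserved: u64, actual: u64) {
        let mut b = self.inner.lock().unwrap();
        b.reserved = b.reserved.saturating_sub(reserved);
        b.spent += actual;
    }

    /// Returns (spent, reserved).
    fn snapshot(&self) -> (u64, u64) {
        let b = self.inner.lock().unwrap();
        (b.spent, b.reserved)
    }
}
```

With a cap of 100, `reserve(60)` succeeds and a concurrent `reserve(50)` is rejected with 40 remaining, whichever thread takes the lock second.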

6 provider unit tests (resolve, unknown-key, round-trip, balance/rolling
over-cap codes, uncapped infra key). Local fmt/clippy/test all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 19:00:05 +03:00
8b2e01a072 feat(#67 phase 4): advertise neuron-computed limit on /models; drop catalogue override
Some checks failed
The neuron now self-derives and advertises limit{context,input,output}
per loaded model; cortex forwards it and stops consulting the
operator-declared catalogue limit (which can't track hot-swapped models
or live capacity). Operator-set `cost` still flows from the catalogue.

neuron:
- CandleHarness gains context_limit_cfg (from [harness.candle.context_limit]).
- LoadedHandle::derived_limit(): profile + live tightest-card free VRAM
  (single: query_vram; TP: query_vram_tightest_free_mb) + prefill-rate
  EMA (bootstrap until first sample) → derive_limit. None for arches
  without a context profile. No operator clamp here (advertise the honest
  derived value; the clamp is an enforcement-side backstop).
- list_models() fills ModelInfo.limit from derived_limit (was None).
- derive_limit treats free_tightest_mb == 0 (unknown/CPU sentinel) as
  "no VRAM ceiling" instead of collapsing to zero.
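
The zero-sentinel handling can be sketched as below. This is an illustration under stated assumptions, not the real derive_limit: the `kv_bytes_per_token` parameter and both function names are hypothetical, and the prefill-rate EMA input is left out entirely:

```rust
/// Convert a free-VRAM reading into an optional token ceiling.
/// A reading of 0 is the "unknown / CPU" sentinel and means "no ceiling",
/// not "zero tokens". `kv_bytes_per_token` is an assumed profile input.
fn vram_token_ceiling(free_tightest_mb: u64, kv_bytes_per_token: u64) -> Option<u64> {
    if free_tightest_mb == 0 || kv_bytes_per_token == 0 {
        return None;
    }
    Some(free_tightest_mb * 1024 * 1024 / kv_bytes_per_token)
}

/// Combine the profile's maximum context with the optional VRAM ceiling.
fn derive_context(profile_max: u64, free_tightest_mb: u64, kv_bytes_per_token: u64) -> u64 {
    match vram_token_ceiling(free_tightest_mb, kv_bytes_per_token) {
        Some(ceiling) => profile_max.min(ceiling),
        None => profile_max, // sentinel: fall back to the profile value
    }
}
```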

cortex:
- ModelEntry gains `limit`, copied from ModelInfo.limit by the poller.
- /v1/models: catalogue `limit` no longer flows (Pass 1 sets None);
  Pass 2 adopts the neuron's limit, taking the tightest across neurons
  via tightest_limit(). cost unchanged.
- model_limits.rs rewritten: catalogue limit (999999) is ignored; the
  neuron's ModelEntry.limit is advertised; cost still from catalogue.
- All ModelEntry literals updated with the new field.
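
A sketch of what "tightest across neurons" could look like, assuming a field-wise minimum. The `Limit` struct and this `tightest_limit` signature are illustrative; the real function may combine the fields differently:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Limit {
    context: u64,
    input: Option<u64>,
    output: u64,
}

/// Minimum of two optional values; a missing value imposes no constraint.
fn min_opt(a: Option<u64>, b: Option<u64>) -> Option<u64> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.min(y)),
        (x, None) => x,
        (None, y) => y,
    }
}

/// Field-wise minimum over every neuron that reported a limit.
/// Neurons reporting `None` are skipped; all-`None` yields `None`.
fn tightest_limit<I>(limits: I) -> Option<Limit>
where
    I: IntoIterator<Item = Option<Limit>>,
{
    limits.into_iter().flatten().reduce(|a, b| Limit {
        context: a.context.min(b.context),
        input: min_opt(a.input, b.input),
        output: a.output.min(b.output),
    })
}
```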

fmt/clippy/test green; CUDA paths type-checked in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 14:10:20 +03:00
8a636c687f feat(cortex): per-model limit + cost on /v1/models; remove max_model_len
All checks were successful
Resolves #62. opencode's helexa provider discovers a model's serving
budget from /v1/models and uses it to size context, trigger compaction,
and show spend with no hand-configuration. Each model entry now carries:

  - limit { context, input?, output }  — operator-declared in models.toml
  - cost  { input, output, cache_read?, cache_write? }  — USD per 1M tokens
  - tool_call / reasoning  — runtime-detected by the candle harness and
    OR-ed in from each serving neuron
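
As an illustration of how a client might turn the cost fields into a spend figure, here is a small sketch. The `Cost` struct and `spend_usd` are hypothetical names, and the optional cache rates are left out:

```rust
/// Per-model rates in USD per 1M tokens, mirroring the `cost` fields above.
struct Cost {
    input: f64,
    output: f64,
}

/// Spend for one request: tokens times rate, scaled from "per 1M tokens".
fn spend_usd(cost: &Cost, prompt_tokens: u64, completion_tokens: u64) -> f64 {
    const TOKENS_PER_UNIT: f64 = 1_000_000.0;
    (prompt_tokens as f64 * cost.input + completion_tokens as f64 * cost.output)
        / TOKENS_PER_UNIT
}
```

For example, at 0.5 USD input and 1.5 USD output per 1M tokens, 2M prompt tokens plus 1M completion tokens come to 2.5 USD.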

Composition: the catalogue profile supplies limit/cost (Pass 1); the
poller carries the neuron's detected tool_call/reasoning into ModelEntry,
which the gateway unions onto the entry (Pass 2); aliases propagate every
field (Pass 4). Wire types extend ModelInfo / ModelProfile /
CortexModelEntry additively (serde default + skip_serializing_if), so
older neurons and clients are unaffected. helexa-bench's ModelInfo
constructor and the gateway test fixtures are updated for the new fields.
Adds tests/model_limits.rs asserting /v1/models surfaces limit + cost
(catalogue) and tool_call + reasoning (runtime), and that max_model_len
is gone.

Removes max_model_len. It was write-only with no consumer — opencode's
source references it nowhere and it is not an OpenAI /v1/models field —
and doubly misleading: vLLM's max_model_len means total sequence length,
but cortex populated it from NEURON_MAX_PROMPT_TOKENS, a prompt-only cap.
The limit{} contract replaces it. The neuron's max_prompt_tokens remains
the enforced prompt cap (neuron-side); cortex just stops re-advertising a
derived, mis-named copy. Closes #66 — its stale-max_model_len premise is
moot once the field is gone.

limit/cost are operator-declared (catalogue) per #62's design; auto-
deriving the advertised budget from each neuron's reported cap is a
tracked follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 09:26:55 +03:00
d662fa20ef fix(cortex): translate Anthropic tools to OpenAI shape + wire-debug logging
All checks were successful
Claude Code (ANTHROPIC_BASE_URL -> cortex) hits POST /v1/messages, but
anthropic_to_openai forwarded the request's `tools` array verbatim via
the flattened `extra`. neuron feeds that straight into the HF chat
template, which iterates the OpenAI shape (tool.function.name/.parameters).
Anthropic-shaped tools ({name, description, input_schema}) rendered as
broken/empty definitions, the model improvised an unparseable
<tool_use_name>...</tool_use_name> tool-call format, neuron's
<tool_call>{json}</tool_call> detector missed it, and the markup fell
through as plain assistant text, so Claude Code never received a
structured tool_use and the agent loop died.

Request-side translation now reshapes:
- tool definitions: {name, description, input_schema}
  -> {type:"function", function:{name, description, parameters}}
- tool_choice: auto->"auto", any->"required", none->"none",
  tool->{type:"function",function:{name}}
- assistant tool_use blocks -> OpenAI assistant.tool_calls
  (arguments JSON-stringified) — fixes multi-turn
- user tool_result blocks -> standalone role:"tool" messages keyed by
  tool_call_id
- system content blocks flatten to text instead of being JSON-serialised
  into the prompt; best-effort image-block -> image_url part
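
The first two reshaping rules can be sketched with plain structs. This is not the anthropic_to_openai code: the type names are hypothetical and the JSON schema is carried as an opaque string to keep the example dependency-free:

```rust
#[derive(Debug, Clone, PartialEq)]
struct AnthropicTool {
    name: String,
    description: String,
    input_schema: String, // JSON schema, kept opaque here
}

#[derive(Debug, Clone, PartialEq)]
struct OpenAiFunction {
    name: String,
    description: String,
    parameters: String, // same schema, under the OpenAI field name
}

#[derive(Debug, Clone, PartialEq)]
struct OpenAiTool {
    kind: &'static str, // serialised as "type" on the wire
    function: OpenAiFunction,
}

/// {name, description, input_schema}
///   -> {type:"function", function:{name, description, parameters}}
fn translate_tool(t: AnthropicTool) -> OpenAiTool {
    OpenAiTool {
        kind: "function",
        function: OpenAiFunction {
            name: t.name,
            description: t.description,
            parameters: t.input_schema,
        },
    }
}

#[derive(Debug, Clone, PartialEq)]
enum AnthropicToolChoice {
    Auto,
    Any,
    NoTools, // Anthropic's "none"
    Tool { name: String },
}

#[derive(Debug, Clone, PartialEq)]
enum OpenAiToolChoice {
    Mode(&'static str),
    Function { name: String },
}

/// auto->"auto", any->"required", none->"none", tool->named function.
fn translate_tool_choice(c: AnthropicToolChoice) -> OpenAiToolChoice {
    match c {
        AnthropicToolChoice::Auto => OpenAiToolChoice::Mode("auto"),
        AnthropicToolChoice::Any => OpenAiToolChoice::Mode("required"),
        AnthropicToolChoice::NoTools => OpenAiToolChoice::Mode("none"),
        AnthropicToolChoice::Tool { name } => OpenAiToolChoice::Function { name },
    }
}
```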

Wire-debug instrumentation (tracing levels only; cortex/neuron ship at
info, operator infra runs at debug):
- every handler emits a debug! "inbound request" line tagging the wire
  surface (anthropic | openai-chat | openai-responses | openai-completions)
  plus model/stream/tools and, for Anthropic, tool_history/system
- response side reports upstream_tool_calls + finish_reason, streaming
  and non-streaming
- full inbound + translated-upstream bodies at trace! (UTF-8-safe, capped)

Tests: 8 request-side unit tests + an end-to-end gateway test asserting
the upstream neuron receives OpenAI-shaped tools and a
user->assistant(+tool_calls)->tool->user history.

Also tighten script/infra-log-verbosity.sh: independent cortex/neuron
RUST_LOG args, cortex-only by default (neuron restart behind
--with-neuron so we don't needlessly cold-reload models), mkdir -p the
drop-in dir, symmetric RUST_LOG cleanup, and set -euo pipefail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 09:58:25 +03:00
569c528c4b feat(gateway): Anthropic streaming SSE translation (#24)
All checks were successful
The /v1/messages handler translated request envelopes but proxied raw
OpenAI SSE frames back to streaming Anthropic clients — the gap
between the README's "point your tooling at it once" contract and
what Claude Code actually received.

cortex-core gains AnthropicStreamTranslator, a pure per-stream state
machine: OpenAI chunks in, ordered (event, payload) pairs out —
message_start → content_block_start/delta/stop (text and tool_use
blocks, indexed; tool_calls map to input_json_delta) → message_delta
(stop_reason mapped via the now-shared map_stop_reason, which also
teaches the non-streaming path tool_calls→tool_use) → message_stop.
Without an upstream usage frame the output count falls back to the
delta count (engine-exact for neuron's one-chunk-per-token streams,
#31); with one, input/output tokens ride message_delta.
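
A text-only sketch of the event ordering, assuming one content block at index 0. `Translator` and `Event` are hypothetical stand-ins for AnthropicStreamTranslator; only the tool_calls→tool_use mapping is taken from this commit, the other stop-reason mappings are assumed:

```rust
#[derive(Debug, PartialEq)]
enum Event {
    MessageStart,
    ContentBlockStart { index: usize },
    TextDelta { index: usize, text: String },
    ContentBlockStop { index: usize },
    MessageDelta { stop_reason: &'static str, output_tokens: u64 },
    MessageStop,
}

/// OpenAI finish_reason -> Anthropic stop_reason (default: "end_turn").
fn map_stop_reason(finish_reason: Option<&str>) -> &'static str {
    match finish_reason {
        Some("length") => "max_tokens",
        Some("tool_calls") => "tool_use",
        _ => "end_turn",
    }
}

#[derive(Default)]
struct Translator {
    started: bool,
    block_open: bool,
    finished: bool,
    deltas: u64,
    finish_reason: Option<String>,
}

impl Translator {
    /// Feed one upstream chunk (its text delta and finish_reason, if any).
    fn on_chunk(&mut self, text: Option<&str>, finish_reason: Option<&str>) -> Vec<Event> {
        let mut out = Vec::new();
        if self.finished {
            return out;
        }
        if !self.started {
            self.started = true;
            out.push(Event::MessageStart);
        }
        if let Some(t) = text {
            if !self.block_open {
                self.block_open = true;
                out.push(Event::ContentBlockStart { index: 0 });
            }
            self.deltas += 1;
            out.push(Event::TextDelta { index: 0, text: t.to_string() });
        }
        if let Some(r) = finish_reason {
            self.finish_reason = Some(r.to_string());
        }
        out
    }

    /// Close the sequence; a second call emits nothing (idempotent).
    /// Without an upstream usage count, fall back to the delta count.
    fn finish(&mut self, usage_output_tokens: Option<u64>) -> Vec<Event> {
        let mut out = Vec::new();
        if self.finished {
            return out;
        }
        self.finished = true;
        if !self.started {
            self.started = true;
            out.push(Event::MessageStart);
        }
        if self.block_open {
            self.block_open = false;
            out.push(Event::ContentBlockStop { index: 0 });
        }
        out.push(Event::MessageDelta {
            stop_reason: map_stop_reason(self.finish_reason.as_deref()),
            output_tokens: usage_output_tokens.unwrap_or(self.deltas),
        });
        out.push(Event::MessageStop);
        out
    }
}
```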

cortex-gateway gains anthropic_sse: the wire pump that splits the
upstream byte stream into SSE events, parses data: payloads
(leniently — engines omit fields on special frames), feeds the
translator, and frames results as `event:`/`data:` pairs through a
bounded channel (slow client back-pressures the upstream read).
Upstream truncation without [DONE] still closes the Anthropic event
sequence. Nothing is buffered beyond the current event's bytes.
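
The event-splitting step can be sketched as follows. `SseSplitter` is a hypothetical name, and the sketch assumes events are delimited by a bare "\n\n" (it does not handle "\r\n" line endings); it holds only the bytes of the event still in flight:

```rust
/// Accumulates upstream bytes and yields the `data:` payload of every
/// event completed so far. Chunk boundaries may fall anywhere.
#[derive(Default)]
struct SseSplitter {
    buf: Vec<u8>,
}

/// Position of the first blank line ("\n\n") in the buffer, if any.
fn find_blank_line(buf: &[u8]) -> Option<usize> {
    buf.windows(2).position(|w| w == b"\n\n")
}

impl SseSplitter {
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut payloads = Vec::new();
        while let Some(end) = find_blank_line(&self.buf) {
            // Remove the complete event (plus its delimiter) from the buffer.
            let event: Vec<u8> = self.buf.drain(..end + 2).collect();
            // Decoding only whole events keeps split UTF-8 sequences intact.
            let text = String::from_utf8_lossy(&event[..end]);
            let data: Vec<&str> = text
                .lines()
                .filter_map(|l| l.strip_prefix("data:"))
                .map(|d| d.strip_prefix(' ').unwrap_or(d))
                .collect();
            if !data.is_empty() {
                payloads.push(data.join("\n"));
            }
        }
        payloads
    }
}
```

A payload split across two chunks is returned only once its terminating blank line arrives.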

Tests: 5 state-machine unit tests (text flow, stop-reason mapping +
defaults, tool_use blocks, usage propagation, idempotent finish) and
2 gateway integration tests (full event sequence + text reassembly,
usage propagation into message_delta). Validated end-to-end by
running this branch's gateway against a production neuron and
streaming a live Anthropic request.

Closes #24

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 15:47:30 +03:00
6a36d15ef1 feat(gateway): per-request token metrics — TTFT and tok/s (#21)
All checks were successful
This is the deferred Phase 6b, and it unblocks the 7→8 milestone's
benchmark work (#22): until cortex measures itself per request,
nothing downstream can be benchmarked or graphed.

The proxy wraps the upstream byte stream in a pass-through inspector
(TokenMetricsStream): chunks are forwarded verbatim — never buffered
or re-serialised — while the inspector records arrival times and
keeps a bounded (64 KiB) tail of the body text. At stream end (or
client disconnect, via Drop) it extracts the final OpenAI usage
object — present on the last SSE chunk and non-streaming JSON bodies
alike — for engine-truth token counts.

Per request, labelled {model, node}:
- cortex_time_to_first_token_seconds (histogram) — first body chunk
- cortex_tokens_per_second (histogram) — completion tokens over the
  decode window (first→last chunk); falls back to total request
  duration for single-chunk non-streaming bodies
- cortex_prompt_tokens_total / cortex_completion_tokens_total
  (counters)
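
The tok/s rule, including the single-chunk fallback, can be written down as a small function. This is a sketch with a hypothetical signature (arrival times as offsets from request start), not the TokenMetricsStream code:

```rust
use std::time::Duration;

/// Completion tokens over the decode window (first -> last chunk).
/// When the window is empty (single-chunk body), fall back to the total
/// request duration. Returns None when no rate can be computed.
fn tokens_per_second(
    completion_tokens: u64,
    first_chunk: Duration,
    last_chunk: Duration,
    total: Duration,
) -> Option<f64> {
    let decode = last_chunk.saturating_sub(first_chunk);
    let window = if decode.is_zero() { total } else { decode };
    if window.is_zero() || completion_tokens == 0 {
        return None;
    }
    Some(completion_tokens as f64 / window.as_secs_f64())
}
```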

The extractor is pure and chunk-boundary-safe; quoted-needle matching
keeps completion_tokens_details from shadowing completion_tokens,
and the last usage object wins. Covers chat completions, completions,
the Responses API, and the Anthropic streaming path (which currently
proxies OpenAI SSE).
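
A sketch of the quoted-needle idea, assuming the tail is available as text. `last_usage_field` and `push_tail` are hypothetical names; searching for the field name with its closing quote is what stops completion_tokens_details from matching, and searching from the end makes the last usage object win:

```rust
const TAIL_CAP: usize = 64 * 1024;

/// Keep only the last TAIL_CAP bytes of the body seen so far.
fn push_tail(tail: &mut Vec<u8>, chunk: &[u8]) {
    tail.extend_from_slice(chunk);
    if tail.len() > TAIL_CAP {
        let excess = tail.len() - TAIL_CAP;
        tail.drain(..excess);
    }
}

/// Find the last `"field": <integer>` in the tail.
fn last_usage_field(tail: &str, field: &str) -> Option<u64> {
    // Quoted needle: `"completion_tokens"` never matches inside
    // `"completion_tokens_details"` because of the closing quote.
    let needle = format!("\"{}\"", field);
    let start = tail.rfind(&needle)? + needle.len();
    let rest = tail[start..].trim_start().strip_prefix(':')?.trim_start();
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}
```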

Tests: 4 extractor unit tests; integration test with a streaming
mock emitting a stream_options-style final usage chunk, asserting
both histograms and exact-or-greater counter values (the test
recorder is process-global and shared across the binary's tests).

Closes #21

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 15:11:52 +03:00
4972c7d1e7 feat(cortex-gateway): C3 — propagate vision capabilities through /v1/models
ModelEntry and CortexModelEntry gain a `capabilities: Vec<String>`
field (serde-default for back-compat). The poller copies it verbatim
from each neuron's ModelInfo.capabilities; list_models computes the
union across every node where a model is loaded so a checkpoint loaded
text-only on one neuron and text+vision on another reports both to the
fleet. Catalogue-only and mid-prewarm entries default to empty until
the catalogue gains a capabilities declaration.
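
The union step might look like the sketch below. `union_capabilities` is a hypothetical helper, and returning the result sorted is an assumption made here for deterministic output:

```rust
use std::collections::BTreeSet;

/// Union of capability strings across every node where a model is loaded.
/// Duplicates collapse; an empty input yields an empty list.
fn union_capabilities(per_node: &[Vec<String>]) -> Vec<String> {
    let set: BTreeSet<&str> = per_node.iter().flatten().map(String::as_str).collect();
    set.into_iter().map(str::to_owned).collect()
}
```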

Aliases inherit their target's capability union. New gateway test mocks
two nodes with differing capability arrays and asserts the unioned
/v1/models response.

Closes part of #16 (Stage C3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 13:49:54 +03:00
5ed1140c97 feat(cortex-gateway): proxy /v1/responses to neuron
Some checks failed
Step 3 of the Responses rollout: plain proxy route on the gateway,
no translation. Neuron speaks the Responses API natively after Step
2 (commit 957f704), so the gateway just needs the same routing
shape it uses for /v1/chat/completions — extract `model`, resolve
via router::resolve, forward verbatim.

- New `POST /v1/responses` handler in handlers.rs::responses.
- Mock neuron under tests/common/mod.rs gains a `/v1/responses`
  endpoint that mirrors the ResponsesResponse shape neuron emits.
- New integration test file `tests/responses.rs` exercises:
  - Happy path (200, body round-trips, ResponsesUsage shape).
  - Unknown model → 404 (matches chat-completions error shape).
  - Missing `model` field → 400 (same extract_model helper).
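
The two error cases above reduce to a small decision, sketched here without any HTTP framework. `route_model`, `RouteError`, and the lookup table are hypothetical; the real path goes through extract_model and router::resolve:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum RouteError {
    MissingModel,
    UnknownModel(String),
}

impl RouteError {
    /// HTTP status matching the chat-completions error shape.
    fn status(&self) -> u16 {
        match self {
            RouteError::MissingModel => 400,
            RouteError::UnknownModel(_) => 404,
        }
    }
}

/// Resolve the request's `model` to an upstream endpoint.
/// `table` maps model id -> endpoint and stands in for the router state.
fn route_model<'a>(
    model: Option<&str>,
    table: &'a HashMap<String, String>,
) -> Result<&'a str, RouteError> {
    let model = model.ok_or(RouteError::MissingModel)?;
    table
        .get(model)
        .map(String::as_str)
        .ok_or_else(|| RouteError::UnknownModel(model.to_string()))
}
```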

Streaming proxy works through the same path as chat completions —
the upstream Content-Type (`text/event-stream` for stream:true,
`application/json` otherwise) propagates through proxy_with_metrics
unchanged. Live-stream integration tests against a streaming mock
deferred until we exercise the path against a real neuron, since
the chat-completions streaming test already covers the proxy's
SSE forwarding mechanics.

Three new tests; clippy + fmt clean across the workspace.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-31 11:21:43 +03:00
b9e7a76a7a feat(gateway): surface mid-prewarm models as Loading on /v1/models
The poller now fetches /health alongside /models on each neuron and
stashes the activation snapshot on NodeState. The /v1/models handler
gains a Pass 3 that synthesises Loading locations from each neuron's
activation.in_progress and activation.pending lists, so a catalogued
model that's mid-prewarm surfaces as `status: "loading"` rather than
appearing absent (loaded=false, locations=[]).

Without this, a client polling /v1/models during a beast restart sees
Qwen3.6-27B disappear for the ~5 minutes the q5k load takes, then
reappear. Now it stays visible the whole time with a clear status.

Adds ModelStatus::Loading to cortex-core. The router's per-node priority
loop gets an explicit (no-op) arm: Loading models aren't routable yet,
and falling through to the catalogue cold-load path is the existing
race — no worse than before, but tagged as a known follow-up needing
neuron-side in-flight tracking on /models/load.
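
The explicit no-op arm can be illustrated like this. Only `ModelStatus::Loading` comes from this commit; the other variants and the `pick_node` helper are assumptions for the sketch:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelStatus {
    Loaded,
    Loading,
    Unloaded, // hypothetical variant for illustration
}

/// Return the first node that can serve the model right now.
fn pick_node<'a>(locations: &[(&'a str, ModelStatus)]) -> Option<&'a str> {
    for (node, status) in locations {
        match status {
            ModelStatus::Loaded => return Some(*node),
            // Visible on /v1/models but not routable yet: explicit no-op,
            // so the caller falls through to its cold-load path.
            ModelStatus::Loading => {}
            ModelStatus::Unloaded => {}
        }
    }
    None
}
```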

New test_poller_captures_activation_from_health exercises the full
round-trip: mock neuron with empty /models but a pre_warming /health
→ poller writes node.activation. Common test helpers gain
spawn_mock_neuron_with_models_and_health and default_health_response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 15:26:12 +03:00
3cccc2c56b refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness
Stage 1 of the candle-native pivot. Replaces the external-process
harness model (mistralrs over HTTP, llamacpp placeholder) with an
in-process Harness trait whose sole implementation is candle. The
trait keeps its shape so future engines slot in additively, but
start/stop default to no-ops and HarnessConfig drops endpoint and
systemd_unit since no harness needs external supervision.

Behaviour is unchanged on the wire: load_model returns a "not
implemented yet (Stage 2)" error and list_models is empty. The
gateway-side proxy, poller, and router are untouched.

CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are
marked superseded; the staged plan lives in
~/.claude/plans/create-a-more-aggressive-calm-naur.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:53:04 +03:00
e42e8ee81f refactor: cortex talks to neurons instead of mistral.rs directly
All checks were successful
Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint.
Hardware discovery and model pinning now come from neuron API and
models.toml catalogue respectively.

- config.rs: nodes -> neurons, add models_config path
- catalogue.rs: ModelProfile with pinned_on, ModelCatalogue
- poller.rs: poll neuron GET /models (ModelInfo format)
- router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint
- evictor.rs: call neuron POST /models/unload
- node.rs: remove vram_mb, pinned fields (come from discovery/catalogue)
- All 22 gateway tests updated to mock neuron API
- Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:42:52 +03:00
d5f19b9ff2 test: add Phase 3 poller integration tests
All checks were successful
Extract public poll_once() from poll_loop() for testability.
4 tests proving the poller correctly discovers models, updates
gateway state, marks unreachable nodes unhealthy, and prunes
stale models.
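
The shape of that extraction can be sketched synchronously. Everything here is hypothetical (the `NodeState` fields, the injected `fetch` closure); it only illustrates why a single-pass function is easy to test without running the loop. Leaving an unreachable node's model set untouched is an assumption of the sketch:

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Default)]
struct NodeState {
    healthy: bool,
    models: HashSet<String>,
}

/// One polling pass over every node. `fetch` returns the node's current
/// model list, or Err when the node is unreachable.
fn poll_once<F>(nodes: &mut HashMap<String, NodeState>, mut fetch: F)
where
    F: FnMut(&str) -> Result<Vec<String>, String>,
{
    for (name, state) in nodes.iter_mut() {
        match fetch(name) {
            Ok(models) => {
                state.healthy = true;
                // Replacing the set prunes models the node no longer reports.
                state.models = models.into_iter().collect();
            }
            Err(_) => state.healthy = false,
        }
    }
}
```

A long-running loop then only needs to call `poll_once` on a timer, while tests call it directly with a stub `fetch`.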

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 19:31:17 +03:00
c2118aa81c test: add Phase 2 streaming SSE passthrough tests
All checks were successful
Confirms the existing proxy streams SSE chunks incrementally:
- 5-chunk test with 50ms delays verifies time spread between first
  and last chunk arrival (not buffered)
- Verifies data: [DONE] terminator is forwarded

No src/ changes needed — Body::from_stream(bytes_stream()) already
handles SSE correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 19:28:33 +03:00
1b339b1426 test: add Phase 1 integration tests for basic proxy
Some checks failed
6 tests proving the scaffold works end-to-end:
- chat completion proxied through gateway to mock backend
- /health endpoint with healthy node
- /v1/models returns seeded model list
- 404 for unknown model
- 404 when no healthy nodes available
- 400 when request body missing model field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 19:26:12 +03:00