Comprehensive sweep across cortex-gateway's request handling. Every
failure path now emits exactly one structured warn (or error) event
on the cortex side with the wire-level detail an operator needs;
the API response carries only a generic message plus, where useful,
the upstream status code.
proxy.rs::forward_request:
- warn on network failure (network error, target URL).
- warn on upstream non-2xx (status, target URL). Streaming body still
passes through to the client; we just can't snippet without
breaking the stream.
- warn on response-build failure.
- ProxyError::into_response no longer interpolates the inner error
into the API body — generic "upstream request failed" / "failed to
build response" instead.
handlers.rs::chat_completions, handlers.rs::completions:
- warn on missing model field, with handler= label.
- warn on route resolve failure with model + error chain. The
user-facing 404 keeps the RouteError Display string (which is
short, informative, and contains no internal detail beyond the
model id and config'd node names).
handlers.rs::anthropic_messages:
- warn on invalid Anthropic body, on translated-OpenAI serialise
failure (which is internal), on route resolve, on upstream network
error, on upstream non-2xx (with 512-char body snippet for parse
errors), on upstream body read, on response parse.
- All warns share consistent field shape: handler, model, node, url,
status / error / body as applicable.
- API response messages are now uniformly generic.
- Adds an info-level "proxying request" log on the non-streaming
path so successful proxies are also visible.
handlers.rs::proxy_with_metrics:
- still calls e.into_response() but proxy::forward_request already
warn'd at the wire layer, so no double-log here.
Tests:
- All 32 existing unit tests + 22 gateway integration tests + 4
new router tests pass.
- Tests that asserted on the "no healthy nodes" / "not found"
strings still match because RouteError messages are preserved
in the 404 user-facing path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 3 of the candle-native pivot. neuron now serves
POST /v1/chat/completions backed by candle's quantized_qwen3 forward
pass on a per-model serialised generation loop, returning the standard
OpenAI ChatCompletionResponse envelope.
Pipeline per request:
- Look up the LoadedModel by request.model (404 if absent).
- Apply the Qwen3 chat template across all messages.
- Tokenize, then spawn_blocking onto tokio's blocking pool to acquire
the per-model arch lock and run prefill + greedy/temperature/top-p
sampling via LogitsProcessor.
- Stop on <|im_end|>/<|endoftext|> EOS or max_tokens (finish_reason
"stop" vs "length").
- Decode with skip_special_tokens=true, build OpenAI response with
prompt/completion/total usage counts.
Supporting changes:
- HarnessRegistry now stores Arc<dyn Harness> and caches a typed
Arc<CandleHarness> so inference routes bypass dyn-Trait dispatch.
- LoadedModel.arch becomes Arc<Mutex<ModelArch>> so the lock guard
can be moved into spawn_blocking.
- NeuronState gains an Option<Arc<CandleHarness>> field for the new
inference route.
- Typed InferenceError lets the handler map ModelNotLoaded → 404 and
other failures → 500 without string-matching anyhow messages.
- stream=true returns 501 until Stage 4 wires up SSE.
- Two leftover mistral.rs string references in proxy.rs and cortex-cli
(missed during the Stage 1 sweep) are corrected here.
Three new default-feature tests cover the no-candle 503, model-not-
loaded 404, and stream=true 501 paths. The cuda-integration test from
Stage 2 still covers real load/unload; a streaming-feature gated test
exercising actual generation will arrive with Stage 4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add .gitea/workflows/ci.yml with fmt/clippy/test on all branches
and SRPM build + COPR publish on version tags
- Add cortex.spec for Fedora RPM packaging
- Add GPL-3.0-or-later LICENSE file
- Add cortex.example.toml with generic hostnames; gitignore cortex.toml
- Scrub infrastructure-specific hostnames from README.md, CLAUDE.md,
and doc comments
- Fix unused imports and clippy warnings to pass -D warnings
- Fix missing deps (bytes, reqwest, serde_json) exposed during build
- Run cargo fmt across workspace
- Update SPDX license identifier to GPL-3.0-or-later
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>