cortex

Author	SHA1	Message	Date
rob thijssen	5ed1140c97	feat(cortex-gateway): proxy /v1/responses to neuron Some checks failed CI / CUDA type-check (push) Failing after 12s Details build-prerelease / Resolve version stamps (push) Successful in 33s Details CI / Format (push) Successful in 37s Details CI / Clippy (push) Failing after 1m5s Details build-prerelease / Build cortex binary (push) Successful in 4m26s Details CI / Test (push) Successful in 5m17s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m39s Details build-prerelease / Package cortex RPM (push) Successful in 1m24s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details Step 3 of the Responses rollout: plain proxy route on the gateway, no translation. Neuron speaks the Responses API natively after Step 2 (commit `957f704`), so the gateway just needs the same routing shape it uses for /v1/chat/completions — extract `model`, resolve via router::resolve, forward verbatim. - New `POST /v1/responses` handler in handlers.rs::responses. - Mock neuron under tests/common/mod.rs gains a `/v1/responses` endpoint that mirrors the ResponsesResponse shape neuron emits. - New integration test file `tests/responses.rs` exercises: - Happy path (200, body round-trips, ResponsesUsage shape). - Unknown model → 404 (matches chat-completions error shape). - Missing `model` field → 400 (same extract_model helper). Streaming proxy works through the same path as chat completions — the upstream Content-Type (`text/event-stream` for stream:true, `application/json` otherwise) propagates through proxy_with_metrics unchanged. Live-stream integration tests against a streaming mock deferred until we exercise the path against a real neuron, since the chat-completions streaming test already covers the proxy's SSE forwarding mechanics. Three new tests; clippy + fmt clean across the workspace. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 11:21:43 +03:00
rob thijssen	24e20dcb5c	feat(catalogue,gateway): model aliases (helexa/small, helexa/balanced, helexa/large) All checks were successful build-prerelease / Resolve version stamps (push) Successful in 39s Details CI / Format (push) Successful in 40s Details CI / Clippy (push) Successful in 2m21s Details CI / Test (push) Successful in 4m40s Details build-prerelease / Build neuron-blackwell (push) Successful in 3m38s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m19s Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ampere (push) Successful in 5m20s Details build-prerelease / Build neuron-ada (push) Successful in 4m45s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m59s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m10s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 9m40s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s Details Operators can now define tier aliases in models.toml: [aliases] "helexa/small" = "Qwen/Qwen3-1.7B" "helexa/balanced" = "Qwen/Qwen3-8B" "helexa/large" = "Qwen/Qwen3.6-27B" A client request for `model: "helexa/small"` is resolved to the concrete model id at routing time. The gateway also rewrites the proxied body's `model` field to the concrete id so neuron sees a name that matches its loaded handle (otherwise the harness rejects the request). Motivated by the finger-in-the-wind benchmark: same "what's the capital of Georgia" probe runs in 2.5s on the 1.7B vs 6.7s on the 27B with identical correctness. Aliases let clients pick a latency tier without hardcoding model ids, and let operators swap targets without changing client code. Changes: * cortex-core: `ModelCatalogue` gains `aliases: HashMap<String, String>` + `resolve_alias(&str) -> &str`. Unit tests cover the basic resolution + TOML round-trip. * cortex-gateway: * `RouteDecision` gains `resolved_model_id: String`. `router::resolve` consumes aliases at entry and threads the concrete id through. * Handlers (chat_completions, completions, anthropic_messages streaming + non-streaming) rewrite the body's `model` field with `rewrite_model_in_body` before proxying, using the resolved id for metrics labels, LRU touch, and the body itself. * `/v1/models` (Pass 4) emits each alias as its own entry mirroring the target's `loaded` flag, feasible_on, and locations — clients browsing the endpoint see both names and can pick either. * `models.toml` declares the three tier aliases; `models.example.toml` documents the section as opt-in. * Integration tests verify: end-to-end alias→concrete request flow, alias surfacing in /v1/models, and no-op fall-through for non-alias model ids. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 16:10:41 +03:00
rob thijssen	b9e7a76a7a	feat(gateway): surface mid-prewarm models as Loading on /v1/models The poller now fetches /health alongside /models on each neuron and stashes the activation snapshot on NodeState. The /v1/models handler gains a Pass 3 that synthesises Loading locations from each neuron's activation.in_progress and activation.pending lists, so a catalogued model that's mid-prewarm surfaces as `status: "loading"` rather than appearing absent (loaded=false, locations=[]). Without this, a client polling /v1/models during a beast restart sees Qwen3.6-27B disappear for the ~5 minutes the q5k load takes, then reappear. Now it stays visible the whole time with a clear status. Adds ModelStatus::Loading to cortex-core. The router's per-node priority loop gets an explicit (no-op) arm: Loading models aren't routable yet, and falling through to the catalogue cold-load path is the existing race — no worse than before, but tagged as a known follow-up needing neuron-side in-flight tracking on /models/load. New test_poller_captures_activation_from_health exercises the full round-trip: mock neuron with empty /models but a pre_warming /health → poller writes node.activation. Common test helpers gain spawn_mock_neuron_with_models_and_health and default_health_response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 15:26:12 +03:00
rob thijssen	3cccc2c56b	refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness Stage 1 of the candle-native pivot. Replaces the external-process harness model (mistralrs over HTTP, llamacpp placeholder) with an in-process Harness trait whose sole implementation is candle. The trait keeps its shape so future engines slot in additively, but start/stop default to no-ops and HarnessConfig drops endpoint and systemd_unit since no harness needs external supervision. Behaviour is unchanged on the wire: load_model returns a "not implemented yet (Stage 2)" error and list_models is empty. The gateway-side proxy, poller, and router are untouched. CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are marked superseded; the staged plan lives in ~/.claude/plans/create-a-more-aggressive-calm-naur.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:53:04 +03:00
rob thijssen	e42e8ee81f	refactor: cortex talks to neurons instead of mistral.rs directly All checks were successful CI / Format, lint, build, test (push) Successful in 2m46s Details CI / Build SRPM (push) Has been skipped Details CI / Publish to COPR (push) Has been skipped Details Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint. Hardware discovery and model pinning now come from neuron API and models.toml catalogue respectively. - config.rs: nodes -> neurons, add models_config path - catalogue.rs: ModelProfile with pinned_on, ModelCatalogue - poller.rs: poll neuron GET /models (ModelInfo format) - router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint - evictor.rs: call neuron POST /models/unload - node.rs: remove vram_mb, pinned fields (come from discovery/catalogue) - All 22 gateway tests updated to mock neuron API - Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:42:52 +03:00
rob thijssen	67b9b044d3	feat: add per-request Prometheus metrics instrumentation All checks were successful CI / Format, lint, build, test (push) Successful in 2m26s Details CI / Build SRPM (push) Has been skipped Details CI / Publish to COPR (push) Has been skipped Details Emit cortex_requests_total, cortex_request_duration_seconds, cortex_request_errors_total, and cortex_cold_starts_total with model and node labels on every proxied request. Add install_test_recorder() for testing metrics without HTTP listener. Integration test verifies counters and histograms appear after proxy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 19:42:09 +03:00
rob thijssen	29c8f10761	feat: implement non-streaming Anthropic response translation Wire up openai_to_anthropic in the /v1/messages handler: buffer upstream OpenAI response, parse, translate to Anthropic format (stop_reason mapping, usage field names, content blocks). 5 integration tests covering round-trip translation, system prompt, content blocks, and error cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 19:36:16 +03:00
rob thijssen	24c5e1e361	feat: add LRU eviction tests and last_accessed tracking All checks were successful CI / Format, lint, build, test (push) Successful in 2m37s Details CI / Build SRPM (push) Has been skipped Details CI / Publish to COPR (push) Has been skipped Details - Add touch_model() in handlers to update last_accessed timestamp on every proxied request, driving LRU eviction ordering - 5 integration tests: LRU eviction, pinned model protection, nothing-to-evict case, lifecycle_cycles increment, and last_accessed update verification Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 19:34:08 +03:00
rob thijssen	d5f19b9ff2	test: add Phase 3 poller integration tests All checks were successful CI / Format, lint, build, test (push) Successful in 2m31s Details CI / Build SRPM (push) Has been skipped Details CI / Publish to COPR (push) Has been skipped Details Extract public poll_once() from poll_loop() for testability. 4 tests proving the poller correctly discovers models, updates gateway state, marks unreachable nodes unhealthy, and prunes stale models. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 19:31:17 +03:00
rob thijssen	c2118aa81c	test: add Phase 2 streaming SSE passthrough tests All checks were successful CI / Format, lint, build, test (push) Successful in 2m36s Details CI / Build SRPM (push) Has been skipped Details CI / Publish to COPR (push) Has been skipped Details Confirms the existing proxy streams SSE chunks incrementally: - 5-chunk test with 50ms delays verifies time spread between first and last chunk arrival (not buffered) - Verifies data: [DONE] terminator is forwarded No src/ changes needed — Body::from_stream(bytes_stream()) already handles SSE correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 19:28:33 +03:00
rob thijssen	1b339b1426	test: add Phase 1 integration tests for basic proxy Some checks failed CI / Build SRPM (push) Has been cancelled Details CI / Publish to COPR (push) Has been cancelled Details CI / Format, lint, build, test (push) Has been cancelled Details 6 tests proving the scaffold works end-to-end: - chat completion proxied through gateway to mock backend - /health endpoint with healthy node - /v1/models returns seeded model list - 404 for unknown model - 404 when no healthy nodes available - 400 when request body missing model field Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 19:26:12 +03:00

11 Commits