Stage 2 of the candle-native pivot. Fleshes out CandleHarness with a
LoadedModel registry keyed by model_id, hf-hub-backed GGUF download,
and Qwen3 quantized weight construction via candle-transformers'
quantized_qwen3 module. unload_model drops the entry; Drop on the
candle ModelWeights frees device memory.
Device selection prefers CUDA (gated behind the new `cuda` feature),
falling back to CPU when CUDA is unavailable so default builds work
on non-GPU hosts. The candle CUDA toolchain isn't pulled in unless
`--features cuda` is passed, keeping CI green on CPU runners.
Config gains a [harness.candle] block with an optional hf_cache path.
HarnessRegistry::from_configs now takes HarnessSettings so per-harness
config flows through.
A gated tests/candle_lifecycle.rs exercises real load → list → unload
→ list-empty when run with `--features cuda-integration` against a
host with HF network access. The default-feature test in tests/api.rs
covers the wrong-harness rejection path without needing the network.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 1 of the candle-native pivot. Replaces the external-process
harness model (mistralrs over HTTP, llamacpp placeholder) with an
in-process Harness trait whose sole implementation is candle. The
trait keeps its shape so future engines slot in additively, but
start/stop default to no-ops and HarnessConfig drops endpoint and
systemd_unit since no harness needs external supervision.
Behaviour is unchanged on the wire: load_model returns a "not
implemented yet (Stage 2)" error and list_models is empty. The
gateway-side proxy, poller, and router are untouched.
CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are
marked superseded; the staged plan lives in
~/.claude/plans/create-a-more-aggressive-calm-naur.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous defaults collided with well-trodden infra services and with
the Linux ephemeral port range:
- cortex API 8000 — common dev-server default (Django, minio UI)
- cortex metrics 9100 — Prometheus node_exporter default
- neuron API 9090 — Cockpit default on Fedora, Prometheus self
Move to helexa-themed palindromic ports, all below Linux's
32768-60999 ephemeral range and not registered to any well-known
service:
- cortex API 31313
- cortex metrics 31314
- neuron API 13131
Updated places:
- cortex.example.toml, neuron.example.toml defaults
- default impls in cortex-core and neuron config
- cortex-cli --endpoint default for the status subcommand
- doc comments citing example URLs
- README.md and CLAUDE.md snippets
Consumers already on the old ports need a one-line edit in their
/etc/cortex/cortex.toml or /etc/neuron/neuron.toml to match;
firewall rules and prometheus scrape configs will also need
updating.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Package name, lib name, and binary all now just "neuron" without
the cortex- prefix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint.
Hardware discovery and model pinning now come from neuron API and
models.toml catalogue respectively.
- config.rs: nodes -> neurons, add models_config path
- catalogue.rs: ModelProfile with pinned_on, ModelCatalogue
- poller.rs: poll neuron GET /models (ModelInfo format)
- router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint
- evictor.rs: call neuron POST /models/unload
- node.rs: remove vram_mb, pinned fields (come from discovery/catalogue)
- All 22 gateway tests updated to mock neuron API
- Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MistralRsHarness: Harness trait impl wrapping mistral.rs HTTP API
(list/load/unload models, health check, start/stop via systemd)
- HarnessRegistry: maps harness name -> Box<dyn Harness>, built from
neuron.toml config
- Neuron API endpoints: GET /models, POST /models/load,
POST /models/unload, GET /models/:id/endpoint
- NeuronConfig: figment-based config loading from neuron.toml
- Integration test: full model lifecycle through mock mistral.rs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Emit cortex_requests_total, cortex_request_duration_seconds,
cortex_request_errors_total, and cortex_cold_starts_total with
model and node labels on every proxied request.
Add install_test_recorder() for testing metrics without HTTP listener.
Integration test verifies counters and histograms appear after proxy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire up openai_to_anthropic in the /v1/messages handler: buffer
upstream OpenAI response, parse, translate to Anthropic format
(stop_reason mapping, usage field names, content blocks).
5 integration tests covering round-trip translation, system prompt,
content blocks, and error cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract public poll_once() from poll_loop() for testability.
4 tests proving the poller correctly discovers models, updates
gateway state, marks unreachable nodes unhealthy, and prunes
stale models.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Confirms the existing proxy streams SSE chunks incrementally:
- 5-chunk test with 50ms delays verifies time spread between first
and last chunk arrival (not buffered)
- Verifies data: [DONE] terminator is forwarded
No src/ changes needed — Body::from_stream(bytes_stream()) already
handles SSE correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6 tests proving the scaffold works end-to-end:
- chat completion proxied through gateway to mock backend
- /health endpoint with healthy node
- /v1/models returns seeded model list
- 404 for unknown model
- 404 when no healthy nodes available
- 400 when request body missing model field
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add .gitea/workflows/ci.yml with fmt/clippy/test on all branches
and SRPM build + COPR publish on version tags
- Add cortex.spec for Fedora RPM packaging
- Add GPL-3.0-or-later LICENSE file
- Add cortex.example.toml with generic hostnames; gitignore cortex.toml
- Scrub infrastructure-specific hostnames from README.md, CLAUDE.md,
and doc comments
- Fix unused imports and clippy warnings to pass -D warnings
- Fix missing deps (bytes, reqwest, serde_json) exposed during build
- Run cargo fmt across workspace
- Update SPDX license identifier to GPL-3.0-or-later
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>