cortex

Author	SHA1	Message	Date
rob thijssen	735945ee81	feat(cortex): unified /v1/models — catalogue × topology feasibility + cold-load Some checks failed build-prerelease / Resolve version stamps (push) Successful in 45s Details CI / Format (push) Successful in 48s Details CI / Clippy (push) Successful in 2m12s Details CI / Test (push) Successful in 4m42s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 5m10s Details build-prerelease / Build neuron-blackwell (push) Successful in 3m35s Details build-prerelease / Package cortex RPM (push) Successful in 1m19s Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details Realises [project-unified-models-endpoint]: cortex now surfaces every model the operator has provisioned in the catalogue, transparently cold-loads on the first request, and routes the request once the load is done — without per-node configuration or client awareness of which neuron hosts what. cortex-core changes: - NodeState gains `discovery: Option<DiscoveryResponse>` — populated once per neuron on first successful poll, cached forever after (topology is invariant for a neuron process). - ModelProfile gains `is_feasible_on(neuron, devices)` with the pinned_on / min_devices / min_device_vram_mb logic + 5 unit tests. - CortexModelEntry expanded with OpenAI-compatible (`id`, `object`, `created`, `owned_by`) plus helexa-specific extension fields (`loaded`, `feasible_on`, `locations`). cortex-gateway changes: - poller.rs: `maybe_poll_discovery` fetches `GET /discovery` once per neuron and caches on NodeState. - handlers.rs::list_models rewritten as union of (catalogue × topology feasibility) + (currently loaded somewhere). Catalogue-defined models surface even when not yet loaded. - router.rs::resolve gains priority 3 (catalogue cold-load): 1. loaded somewhere → route there 2. unloaded somewhere → route + lazy load via neuron 3. in catalogue → pick feasible neuron, POST /models/load, wait, route. Cache the new entry locally so subsequent requests skip the poll wait. 4. else 404 - pick_feasible_neuron prefers pinned_on neurons, falls back to any feasible one (stable by name). - profile_to_spec translates ModelProfile → ModelSpec, picking devices by VRAM floor and setting tensor_parallel = min_devices for multi- device profiles. - "already loaded" responses from neuron are tolerated (two concurrent requests racing the same cold-load is a benign outcome). models.example.toml rewritten to reflect the canonical helexa fleet (beast = 2x RTX 5090, benjy = RTX 4090, quadbrat = RTX 3060) with a working TP example (Qwen3.6-27B pinned on beast) plus single-GPU profiles for the smaller models. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:39:04 +03:00
rob thijssen	3cccc2c56b	refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness Stage 1 of the candle-native pivot. Replaces the external-process harness model (mistralrs over HTTP, llamacpp placeholder) with an in-process Harness trait whose sole implementation is candle. The trait keeps its shape so future engines slot in additively, but start/stop default to no-ops and HarnessConfig drops endpoint and systemd_unit since no harness needs external supervision. Behaviour is unchanged on the wire: load_model returns a "not implemented yet (Stage 2)" error and list_models is empty. The gateway-side proxy, poller, and router are untouched. CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are marked superseded; the staged plan lives in ~/.claude/plans/create-a-more-aggressive-calm-naur.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:53:04 +03:00
rob thijssen	3f94c50817	chore: move default ports out of common-collision ranges Previous defaults collided with well-trodden infra services and with the Linux ephemeral port range: - cortex API 8000 — common dev-server default (Django, minio UI) - cortex metrics 9100 — Prometheus node_exporter default - neuron API 9090 — Cockpit default on Fedora, Prometheus self Move to helexa-themed palindromic ports, all below Linux's 32768-60999 ephemeral range and not registered to any well-known service: - cortex API 31313 - cortex metrics 31314 - neuron API 13131 Updated places: - cortex.example.toml, neuron.example.toml defaults - default impls in cortex-core and neuron config - cortex-cli --endpoint default for the status subcommand - doc comments citing example URLs - README.md and CLAUDE.md snippets Consumers already on the old ports need a one-line edit in their /etc/cortex/cortex.toml or /etc/neuron/neuron.toml to match; firewall rules and prometheus scrape configs will also need updating. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 17:45:25 +03:00
rob thijssen	e42e8ee81f	refactor: cortex talks to neurons instead of mistral.rs directly All checks were successful CI / Format, lint, build, test (push) Successful in 2m46s Details CI / Build SRPM (push) Has been skipped Details CI / Publish to COPR (push) Has been skipped Details Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint. Hardware discovery and model pinning now come from neuron API and models.toml catalogue respectively. - config.rs: nodes -> neurons, add models_config path - catalogue.rs: ModelProfile with pinned_on, ModelCatalogue - poller.rs: poll neuron GET /models (ModelInfo format) - router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint - evictor.rs: call neuron POST /models/unload - node.rs: remove vram_mb, pinned fields (come from discovery/catalogue) - All 22 gateway tests updated to mock neuron API - Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:42:52 +03:00
rob thijssen	6dc717ebcd	feat: add neuron daemon with GPU discovery and health endpoints All checks were successful CI / Format, lint, build, test (push) Successful in 2m29s Details CI / Build SRPM (push) Has been skipped Details CI / Publish to COPR (push) Has been skipped Details Replace cortex-agent stub with neuron (cortex-neuron binary). cortex-core additions: - discovery.rs: DeviceInfo, DiscoveryResponse, DeviceHealth, HealthResponse - harness.rs: Harness async trait, HarnessConfig, ModelSpec, ModelInfo neuron crate (crates/neuron/): - discovery.rs: nvidia-smi CSV parsing (pure functions) + system discovery via uname/nvidia-smi/nvcc - health.rs: cached GPU health polling every 5s - api.rs: GET /discovery and GET /health axum handlers - main.rs: CLI entrypoint with --port flag (default 9090) - harness stubs for mistralrs (Phase 8) and llamacpp (Phase 11) 12 new tests (9 unit + 3 integration), 35 total. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:23:42 +03:00
rob thijssen	6bb3004cfc	ci: add Gitea CI, RPM spec, license, and repo hygiene All checks were successful CI / Format, lint, build, test (push) Successful in 2m15s Details CI / Build SRPM (push) Has been skipped Details CI / Publish to COPR (push) Has been skipped Details - Add .gitea/workflows/ci.yml with fmt/clippy/test on all branches and SRPM build + COPR publish on version tags - Add cortex.spec for Fedora RPM packaging - Add GPL-3.0-or-later LICENSE file - Add cortex.example.toml with generic hostnames; gitignore cortex.toml - Scrub infrastructure-specific hostnames from README.md, CLAUDE.md, and doc comments - Fix unused imports and clippy warnings to pass -D warnings - Fix missing deps (bytes, reqwest, serde_json) exposed during build - Run cargo fmt across workspace - Update SPDX license identifier to GPL-3.0-or-later Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 18:24:04 +03:00
rob thijssen	0da68833af	feat: scaffold cortex workspace Rust reverse-proxy for multi-node mistral.rs inference clusters. Includes crate structure (cortex-core, cortex-gateway, cortex-agent, cortex-cli), config loading, OpenAI/Anthropic translation stubs, model routing, eviction, polling, and streaming proxy scaffolding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 18:13:30 +03:00

7 Commits