cortex

Author	SHA1	Message	Date
rob thijssen	b9e7a76a7a	feat(gateway): surface mid-prewarm models as Loading on /v1/models The poller now fetches /health alongside /models on each neuron and stashes the activation snapshot on NodeState. The /v1/models handler gains a Pass 3 that synthesises Loading locations from each neuron's activation.in_progress and activation.pending lists, so a catalogued model that's mid-prewarm surfaces as `status: "loading"` rather than appearing absent (loaded=false, locations=[]). Without this, a client polling /v1/models during a beast restart sees Qwen3.6-27B disappear for the ~5 minutes the q5k load takes, then reappear. Now it stays visible the whole time with a clear status. Adds ModelStatus::Loading to cortex-core. The router's per-node priority loop gets an explicit (no-op) arm: Loading models aren't routable yet, and falling through to the catalogue cold-load path is the existing race — no worse than before, but tagged as a known follow-up needing neuron-side in-flight tracking on /models/load. New test_poller_captures_activation_from_health exercises the full round-trip: mock neuron with empty /models but a pre_warming /health → poller writes node.activation. Common test helpers gain spawn_mock_neuron_with_models_and_health and default_health_response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 15:26:12 +03:00
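The Pass 3 step described in the commit above can be pictured roughly as follows. This is a minimal sketch, not the code in handlers.rs: the `Activation`, `Location` and `synthesise_loading` names and their field layouts are assumptions made for illustration. Only `ModelStatus::Loading` and the `in_progress` / `pending` lists are named in the commit message, and the sketch leaves out the catalogue check.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the activation snapshot the poller reads from /health.
#[derive(Debug, Clone, Default)]
struct Activation {
    in_progress: Vec<String>,
    pending: Vec<String>,
}

/// Only the variants needed for the sketch; the real enum may have more.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelStatus {
    Loaded,
    Loading,
}

/// Hypothetical per-neuron location entry for one model.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Location {
    neuron: String,
    status: ModelStatus,
}

/// Adds a Loading location for every model the neuron reports as in progress
/// or pending, unless that neuron already has a location for the model.
fn synthesise_loading(
    locations: &mut BTreeMap<String, Vec<Location>>,
    neuron: &str,
    activation: &Activation,
) {
    for model in activation.in_progress.iter().chain(activation.pending.iter()) {
        let entry = locations.entry(model.clone()).or_default();
        if entry.iter().all(|l| l.neuron != neuron) {
            entry.push(Location {
                neuron: neuron.to_string(),
                status: ModelStatus::Loading,
            });
        }
    }
}
```

With a map keyed by model id, a model that is mid-prewarm gains a `Loading` entry instead of an empty location list, while locations already recorded for that neuron are left alone.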
rob thijssen	735945ee81	feat(cortex): unified /v1/models — catalogue × topology feasibility + cold-load Realises [project-unified-models-endpoint]: cortex now surfaces every model the operator has provisioned in the catalogue, transparently cold-loads on the first request, and routes the request once the load is done — without per-node configuration or client awareness of which neuron hosts what. cortex-core changes: - NodeState gains `discovery: Option<DiscoveryResponse>` — populated once per neuron on first successful poll, cached forever after (topology is invariant for a neuron process). - ModelProfile gains `is_feasible_on(neuron, devices)` with the pinned_on / min_devices / min_device_vram_mb logic + 5 unit tests. - CortexModelEntry expanded with OpenAI-compatible (`id`, `object`, `created`, `owned_by`) plus helexa-specific extension fields (`loaded`, `feasible_on`, `locations`). cortex-gateway changes: - poller.rs: `maybe_poll_discovery` fetches `GET /discovery` once per neuron and caches on NodeState. - handlers.rs::list_models rewritten as union of (catalogue × topology feasibility) + (currently loaded somewhere). Catalogue-defined models surface even when not yet loaded. - router.rs::resolve gains priority 3 (catalogue cold-load): 1. loaded somewhere → route there 2. unloaded somewhere → route + lazy load via neuron 3. in catalogue → pick feasible neuron, POST /models/load, wait, route. Cache the new entry locally so subsequent requests skip the poll wait. 4. else 404 - pick_feasible_neuron prefers pinned_on neurons, falls back to any feasible one (stable by name). - profile_to_spec translates ModelProfile → ModelSpec, picking devices by VRAM floor and setting tensor_parallel = min_devices for multi-device profiles. - "already loaded" responses from neuron are tolerated (two concurrent requests racing the same cold-load is a benign outcome). models.example.toml rewritten to reflect the canonical helexa fleet (beast = 2x RTX 5090, benjy = RTX 4090, quadbrat = RTX 3060) with a working TP example (Qwen3.6-27B pinned on beast) plus single-GPU profiles for the smaller models. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:39:04 +03:00
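The resolution order and neuron selection listed in the commit above might look something like the sketch below. It is illustrative only: the `Neuron` and `Resolution` types, the `fits_devices` helper and all function signatures are assumptions, and the sketch treats `pinned_on` purely as a preference, which may not match how the real `is_feasible_on` handles pins. What happens when a catalogued model has no feasible neuron is not stated in the commit; the sketch falls through to the not-found case.

```rust
/// Hypothetical subset of a catalogue profile.
#[derive(Debug, Clone, Default)]
struct ModelProfile {
    pinned_on: Vec<String>,
    min_devices: usize,
    min_device_vram_mb: u64,
}

/// Hypothetical view of a neuron's discovered topology.
#[derive(Debug, Clone)]
struct Neuron {
    name: String,
    device_vram_mb: Vec<u64>,
}

/// Outcome of the four-step priority order from the commit message.
#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    RouteLoaded(String), // 1. loaded somewhere
    LazyLoad(String),    // 2. unloaded somewhere
    ColdLoad(String),    // 3. in catalogue, load on a feasible neuron
    NotFound,            // 4. otherwise 404
}

/// True when enough devices meet the profile's per-device VRAM floor.
fn fits_devices(profile: &ModelProfile, device_vram_mb: &[u64]) -> bool {
    let usable = device_vram_mb
        .iter()
        .filter(|&&vram| vram >= profile.min_device_vram_mb)
        .count();
    usable >= profile.min_devices.max(1)
}

/// Prefers pinned neurons, then falls back to any feasible one, ordered by name.
fn pick_feasible_neuron<'a>(profile: &ModelProfile, neurons: &'a [Neuron]) -> Option<&'a Neuron> {
    let mut feasible: Vec<&Neuron> = neurons
        .iter()
        .filter(|n| fits_devices(profile, &n.device_vram_mb))
        .collect();
    // `false` sorts before `true`, so pinned neurons come first; ties break by name.
    feasible.sort_by_key(|n| (!profile.pinned_on.contains(&n.name), n.name.clone()));
    feasible.first().copied()
}

/// Applies the priority order; the caller supplies what the poller already knows.
fn resolve(
    loaded_on: Option<&str>,
    unloaded_on: Option<&str>,
    profile: Option<&ModelProfile>,
    neurons: &[Neuron],
) -> Resolution {
    if let Some(name) = loaded_on {
        return Resolution::RouteLoaded(name.to_string());
    }
    if let Some(name) = unloaded_on {
        return Resolution::LazyLoad(name.to_string());
    }
    match profile.and_then(|p| pick_feasible_neuron(p, neurons)) {
        Some(neuron) => Resolution::ColdLoad(neuron.name.clone()),
        None => Resolution::NotFound,
    }
}
```

Sorting on a `(not pinned, name)` key gives the "stable by name" behaviour in one pass: two requests that race the same cold-load will pick the same neuron, which fits the commit's note that an "already loaded" response is tolerated.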
rob thijssen	e42e8ee81f	refactor: cortex talks to neurons instead of mistral.rs directly Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint. Hardware discovery and model pinning now come from neuron API and models.toml catalogue respectively. - config.rs: nodes -> neurons, add models_config path - catalogue.rs: ModelProfile with pinned_on, ModelCatalogue - poller.rs: poll neuron GET /models (ModelInfo format) - router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint - evictor.rs: call neuron POST /models/unload - node.rs: remove vram_mb, pinned fields (come from discovery/catalogue) - All 22 gateway tests updated to mock neuron API - Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:42:52 +03:00
rob thijssen	0da68833af	feat: scaffold cortex workspace Rust reverse-proxy for multi-node mistral.rs inference clusters. Includes crate structure (cortex-core, cortex-gateway, cortex-agent, cortex-cli), config loading, OpenAI/Anthropic translation stubs, model routing, eviction, polling, and streaming proxy scaffolding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 18:13:30 +03:00

4 Commits