refactor: cortex talks to neurons instead of mistral.rs directly
All checks were successful
CI / Format, lint, build, test (push) Successful in 2m46s
CI / Build SRPM (push) Has been skipped
CI / Publish to COPR (push) Has been skipped

Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint.
Hardware discovery and model pinning now come from the neuron API and
the models.toml catalogue, respectively.

- config.rs: nodes -> neurons, add models_config path
- catalogue.rs: ModelProfile with pinned_on, ModelCatalogue
- poller.rs: poll neuron GET /models (ModelInfo format)
- router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint
- evictor.rs: call neuron POST /models/unload
- node.rs: remove vram_mb, pinned fields (come from discovery/catalogue)
- All 22 gateway tests updated to mock neuron API
- Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:42:52 +03:00
parent 26e5e7ead8
commit e42e8ee81f
19 changed files with 385 additions and 437 deletions


@@ -568,56 +568,27 @@ inference_endpoint, health, start/stop (systemd). `HarnessRegistry` in
Config via `neuron.toml` (figment + env override). Integration test
covers full model lifecycle through neuron → mock mistral.rs backend.
### Phase 9: cortex talks to neurons
**Goal:** cortex-gateway's poller, router, and evictor talk to neuron
instead of directly to mistral.rs. Discovery replaces static config.
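With discovery replacing static config, every call cortex makes is addressed to a neuron's base URL. Below is a minimal Rust sketch of that surface. It assumes `NeuronEndpoint { name, endpoint }` stores a plain base-URL string. The helper methods (`discovery_url`, `endpoint_url`, and the rest) are hypothetical: they only assemble the routes named in this phase and are not an actual cortex API.

```rust
/// Replaces the old `NodeConfig { endpoint, vram_mb, pinned }`: cortex keeps
/// only a name and a base URL; hardware facts now come from discovery.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NeuronEndpoint {
    pub name: String,
    pub endpoint: String,
}

impl NeuronEndpoint {
    /// Joins the base URL and a route without doubling or dropping slashes.
    fn route(&self, path: &str) -> String {
        format!(
            "{}/{}",
            self.endpoint.trim_end_matches('/'),
            path.trim_start_matches('/')
        )
    }

    /// Hardware topology (devices, VRAM, harnesses).
    pub fn discovery_url(&self) -> String {
        self.route("discovery")
    }

    /// Live VRAM usage and utilisation.
    pub fn health_url(&self) -> String {
        self.route("health")
    }

    /// Model status in the neuron's ModelInfo format.
    pub fn models_url(&self) -> String {
        self.route("models")
    }

    /// Target of the evictor's POST.
    pub fn unload_url(&self) -> String {
        self.route("models/unload")
    }

    /// Asks the neuron where a model's inference server is listening.
    /// NOTE: `model_id` is not percent-encoded in this sketch.
    pub fn endpoint_url(&self, model_id: &str) -> String {
        self.route(&format!("models/{}/endpoint", model_id))
    }
}
```

Because model ids are interpolated as-is, an id containing `/` would need escaping in a real client.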
Completed. Full refactor of cortex-gateway to talk to neurons:
**Steps:**
1. Update `cortex-core/src/config.rs`:
   - Replace `NodeConfig { endpoint, vram_mb, pinned }` with
     `NeuronEndpoint { name, endpoint }`.
   - Add `ModelCatalogue` loaded from `models.toml`.
   - Remove per-node `vram_mb` and `pinned` fields (these come from
     discovery and the catalogue, respectively).
2. Add `cortex-core/src/catalogue.rs`:
   - `ModelProfile { id, harness, quant, vram_mb, min_devices,
     min_device_vram_mb, pinned_on }`.
   - `fn find_valid_placements(profile, discovered_nodes) -> Vec<PlacementOption>`
     that matches a model profile against discovered topologies.
3. Update `cortex-gateway/src/state.rs`:
   - `CortexState` holds discovered topology per neuron (devices, VRAM,
     harnesses) alongside the existing model status map.
4. Update `cortex-gateway/src/poller.rs`:
   - Poll `GET {neuron}/discovery` on startup and every 60s (topology
     changes rarely).
   - Poll `GET {neuron}/health` every 10s (VRAM usage, utilisation).
   - Poll `GET {neuron}/models` every 10s (model status).
   - Merge all three into `CortexState`.
5. Update `cortex-gateway/src/router.rs`:
   - `resolve()` now consults the model catalogue to determine valid
     placements, then picks the best node: one where the model is already
     loaded, otherwise a capable node where it is not yet loaded.
   - For models needing TP=2, only nodes with ≥2 devices are candidates.
6. Update `cortex-gateway/src/evictor.rs`:
   - `evict_lru_on_node()` calls `POST {neuron}/models/unload` instead
     of calling mistral.rs directly.
   - Eviction respects `pinned_on` from the catalogue.
7. Update `cortex-gateway/src/proxy.rs`:
   - Before proxying, ask the neuron for the inference endpoint:
     `GET {neuron}/models/{model_id}/endpoint`. This decouples cortex
     from knowing which port or harness is serving the model.
8. Tests:
   - Update existing integration tests to use a mock neuron (mock
     `/discovery`, `/health`, `/models`, `/models/load`, etc.) instead
     of a mock mistralrs.
   - New test: model catalogue placement. Given a profile that requires
     TP=2, assert it only routes to a node with ≥2 discovered devices.
   - New test: eviction calls the neuron's unload endpoint, not mistralrs.
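The polling cadence in step 4 can be sketched as a single 10 s tick that decides which neuron routes are due on each pass. This assumes one shared timer, although the real poller might run independent timers per route. `PollTarget` and `due_at` are hypothetical names.

```rust
use std::time::Duration;

/// The three neuron routes the poller reads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PollTarget {
    Discovery,
    Health,
    Models,
}

impl PollTarget {
    pub const ALL: [PollTarget; 3] = [PollTarget::Discovery, PollTarget::Health, PollTarget::Models];

    pub fn path(self) -> &'static str {
        match self {
            PollTarget::Discovery => "/discovery",
            PollTarget::Health => "/health",
            PollTarget::Models => "/models",
        }
    }

    /// Topology changes rarely, so discovery runs on a slower cadence.
    pub fn interval(self) -> Duration {
        match self {
            PollTarget::Discovery => Duration::from_secs(60),
            PollTarget::Health | PollTarget::Models => Duration::from_secs(10),
        }
    }
}

/// Targets due at `elapsed` since startup, assuming ticks land on multiples
/// of 10 s. Tick 0 fires every target, which covers the startup discovery.
pub fn due_at(elapsed: Duration) -> Vec<PollTarget> {
    let secs = elapsed.as_secs();
    PollTarget::ALL
        .iter()
        .copied()
        .filter(|t| secs % t.interval().as_secs() == 0)
        .collect()
}
```

Merging the three responses into `CortexState` is left out. The sketch only shows which route each tick reads.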
- **Config**: `NodeConfig { endpoint, vram_mb, pinned }` replaced with
`NeuronEndpoint { name, endpoint }`. Hardware info comes from neuron
discovery, pinning from `models.toml` catalogue.
- **catalogue.rs**: `ModelProfile` with `pinned_on`, `ModelCatalogue`
with `is_pinned()` for eviction decisions.
- **Poller**: polls neuron's `GET /models` (ModelInfo format) instead
of mistralrs `/v1/models`.
- **Router**: asks neuron `GET /models/{id}/endpoint` for the inference
URL before proxying. Decouples cortex from knowing harness ports.
- **Evictor**: calls `POST {neuron}/models/unload` instead of
mistralrs directly. Uses catalogue for pinning.
- **Tests**: all 22 gateway tests updated to mock neuron API instead
of raw mistralrs. 36 total tests passing.
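The evictor bullet combines two rules: skip models that the catalogue pins to this neuron, then evict the least recently used of the remaining models through the neuron. This sketch rests on several assumptions. `is_pinned(model_id, neuron)` is an assumed signature, and `pinned_on` is modelled as a set of neuron names. `last_used` is a hypothetical timestamp kept by cortex. The body of the `/models/unload` request is omitted because it is not specified here.

```rust
use std::collections::{HashMap, HashSet};

/// Pinning data from models.toml: model id -> neuron names it is pinned on.
pub struct ModelCatalogue {
    pub pinned_on: HashMap<String, HashSet<String>>,
}

impl ModelCatalogue {
    /// Assumed signature: is `model_id` pinned on the neuron called `neuron`?
    pub fn is_pinned(&self, model_id: &str, neuron: &str) -> bool {
        self.pinned_on
            .get(model_id)
            .map_or(false, |neurons| neurons.contains(neuron))
    }
}

/// A model the neuron reports as loaded; `last_used` is a hypothetical
/// monotonic timestamp maintained by cortex.
pub struct LoadedModel {
    pub id: String,
    pub last_used: u64,
}

/// Least recently used model on `neuron` that the catalogue does not pin.
pub fn pick_lru_victim<'a>(
    catalogue: &ModelCatalogue,
    neuron: &str,
    loaded: &'a [LoadedModel],
) -> Option<&'a LoadedModel> {
    loaded
        .iter()
        .filter(|m| !catalogue.is_pinned(&m.id, neuron))
        .min_by_key(|m| m.last_used)
}

/// Method and URL for the unload call (body format not shown).
pub fn unload_request(neuron_endpoint: &str) -> (&'static str, String) {
    (
        "POST",
        format!("{}/models/unload", neuron_endpoint.trim_end_matches('/')),
    )
}
```

Pinning is evaluated per neuron, so a model pinned on one node can still be evicted from another.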
**Done when:** cortex has zero direct references to mistral.rs endpoints.
All existing tests are updated and pass. New placement tests pass.
`cortex.toml` only contains neuron endpoints. `models.toml` drives
placement and pinning.
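The placement test named in these criteria can be pictured with a small selection rule that mirrors step 5. The rule filters nodes by device count, prefers a node where the model is already loaded, and otherwise falls back to the first capable node. `NodeView`, `pick_node`, and the use of `min_devices` as the TP degree are assumptions made for illustration.

```rust
/// What the poller knows about one neuron, reduced to what this rule needs.
#[derive(Debug, Clone)]
pub struct NodeView {
    pub name: String,
    pub device_count: usize,
    pub loaded_models: Vec<String>,
}

/// Hypothetical subset of the catalogue profile.
pub struct ModelProfile {
    pub id: String,
    pub min_devices: usize,
}

/// Prefer a capable node with the model loaded; else the first capable node.
pub fn pick_node<'a>(profile: &ModelProfile, nodes: &'a [NodeView]) -> Option<&'a NodeView> {
    let mut fallback: Option<&'a NodeView> = None;
    for node in nodes.iter().filter(|n| n.device_count >= profile.min_devices) {
        if node.loaded_models.iter().any(|m| m == &profile.id) {
            return Some(node);
        }
        if fallback.is_none() {
            fallback = Some(node);
        }
    }
    fallback
}
```

With `min_devices: 2`, a single-GPU node is never returned, even when it already has the model loaded. The TP=2 placement test asserts exactly that.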
Topology-aware placement (`min_devices`, `min_device_vram_mb`) is
deferred: the router currently routes based on polled model status.
Catalogue placement matching can be added incrementally.
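The sketch below shows one way the deferred catalogue matching might look, as an implementation of `find_valid_placements`. It makes three assumptions, all of them guesses about field meanings rather than the documented design. First, `vram_mb` is the model's total requirement across the chosen devices. Second, discovery reports per-device capacity in MB. Third, `PlacementOption` simply names a node and its device indices.

```rust
/// One GPU as reported by neuron discovery (assumed shape).
#[derive(Debug, Clone)]
pub struct Device {
    pub index: u32,
    pub vram_mb: u64,
}

#[derive(Debug, Clone)]
pub struct DiscoveredNode {
    pub name: String,
    pub devices: Vec<Device>,
}

/// Subset of the catalogue profile relevant to placement.
pub struct ModelProfile {
    pub id: String,
    pub vram_mb: u64,
    pub min_devices: usize,
    pub min_device_vram_mb: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PlacementOption {
    pub node: String,
    pub devices: Vec<u32>,
}

pub fn find_valid_placements(profile: &ModelProfile, nodes: &[DiscoveredNode]) -> Vec<PlacementOption> {
    nodes
        .iter()
        .filter_map(|node| {
            // Keep only devices that individually meet the per-device floor.
            let mut eligible: Vec<&Device> = node
                .devices
                .iter()
                .filter(|d| d.vram_mb >= profile.min_device_vram_mb)
                .collect();
            if eligible.len() < profile.min_devices {
                return None;
            }
            // Greedy: take the largest eligible devices first (stable sort).
            eligible.sort_by(|a, b| b.vram_mb.cmp(&a.vram_mb));
            let chosen = &eligible[..profile.min_devices];
            let total: u64 = chosen.iter().map(|d| d.vram_mb).sum();
            if total < profile.vram_mb {
                return None;
            }
            Some(PlacementOption {
                node: node.name.clone(),
                devices: chosen.iter().map(|d| d.index).collect(),
            })
        })
        .collect()
}
```

Taking the largest eligible devices first is a simple greedy choice. A fuller version might also weigh current VRAM usage from `/health`.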
### Phase 10: neuron packaging (RPM)