refactor: cortex talks to neurons instead of mistral.rs directly
Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint.
Hardware discovery and model pinning now come from neuron API and
models.toml catalogue respectively.
- config.rs: nodes -> neurons, add models_config path
- catalogue.rs: ModelProfile with pinned_on, ModelCatalogue
- poller.rs: poll neuron GET /models (ModelInfo format)
- router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint
- evictor.rs: call neuron POST /models/unload
- node.rs: remove vram_mb, pinned fields (come from discovery/catalogue)
- All 22 gateway tests updated to mock neuron API
- Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
65
CLAUDE.md
65
CLAUDE.md
@@ -568,56 +568,27 @@ inference_endpoint, health, start/stop (systemd). `HarnessRegistry` in
|
||||
Config via `neuron.toml` (figment + env override). Integration test
|
||||
covers full model lifecycle through neuron → mock mistral.rs backend.
|
||||
|
||||
### Phase 9: cortex talks to neurons
|
||||
### Phase 9: cortex talks to neurons ✅
|
||||
|
||||
**Goal:** cortex-gateway's poller, router, and evictor talk to neuron
|
||||
instead of directly to mistral.rs. Discovery replaces static config.
|
||||
Completed. Full refactor of cortex-gateway to talk to neurons:
|
||||
|
||||
**Steps:**
|
||||
1. Update `cortex-core/src/config.rs`:
|
||||
- Replace `NodeConfig { endpoint, vram_mb, pinned }` with
|
||||
`NeuronEndpoint { name, endpoint }`.
|
||||
- Add `ModelCatalogue` loaded from `models.toml`.
|
||||
- Remove per-node `vram_mb` and `pinned` fields (these come from
|
||||
discovery and the catalogue respectively).
|
||||
2. Add `cortex-core/src/catalogue.rs`:
|
||||
- `ModelProfile { id, harness, quant, vram_mb, min_devices,
|
||||
min_device_vram_mb, pinned_on }`.
|
||||
- `fn find_valid_placements(profile, discovered_nodes) -> Vec<PlacementOption>`
|
||||
that matches a model profile against discovered topologies.
|
||||
3. Update `cortex-gateway/src/state.rs`:
|
||||
- `CortexState` holds discovered topology per neuron (devices, VRAM,
|
||||
harnesses) alongside the existing model status map.
|
||||
4. Update `cortex-gateway/src/poller.rs`:
|
||||
- Poll `GET {neuron}/discovery` on startup and every 60s (topology
|
||||
changes rarely).
|
||||
- Poll `GET {neuron}/health` every 10s (VRAM usage, utilisation).
|
||||
- Poll `GET {neuron}/models` every 10s (model status).
|
||||
- Merge all three into `CortexState`.
|
||||
5. Update `cortex-gateway/src/router.rs`:
|
||||
- `resolve()` now consults the model catalogue to determine valid
|
||||
placements, then picks the best node (loaded > unloaded-on-capable-node).
|
||||
- For models needing TP=2, only nodes with ≥2 devices are candidates.
|
||||
6. Update `cortex-gateway/src/evictor.rs`:
|
||||
- `evict_lru_on_node()` calls `POST {neuron}/models/unload` instead
|
||||
of calling mistral.rs directly.
|
||||
- Eviction respects `pinned_on` from the catalogue.
|
||||
7. Update `cortex-gateway/src/proxy.rs`:
|
||||
- Before proxying, ask neuron for the inference endpoint:
|
||||
`GET {neuron}/models/{model_id}/endpoint`. This decouples cortex
|
||||
from knowing which port or harness is serving the model.
|
||||
8. Tests:
|
||||
- Update existing integration tests to use a mock neuron (mock
|
||||
`/discovery`, `/health`, `/models`, `/models/load`, etc.) instead
|
||||
of a mock mistralrs.
|
||||
- New test: model catalogue placement — profile requires TP=2,
|
||||
assert it only routes to a node with ≥2 discovered devices.
|
||||
- New test: eviction calls neuron's unload endpoint, not mistralrs.
|
||||
- **Config**: `NodeConfig { endpoint, vram_mb, pinned }` replaced with
|
||||
`NeuronEndpoint { name, endpoint }`. Hardware info comes from neuron
|
||||
discovery, pinning from `models.toml` catalogue.
|
||||
- **catalogue.rs**: `ModelProfile` with `pinned_on`, `ModelCatalogue`
|
||||
with `is_pinned()` for eviction decisions.
|
||||
- **Poller**: polls neuron's `GET /models` (ModelInfo format) instead
|
||||
of mistralrs `/v1/models`.
|
||||
- **Router**: asks neuron `GET /models/{id}/endpoint` for the inference
|
||||
URL before proxying. Decouples cortex from knowing harness ports.
|
||||
- **Evictor**: calls `POST {neuron}/models/unload` instead of
|
||||
mistralrs directly. Uses catalogue for pinning.
|
||||
- **Tests**: all 22 gateway tests updated to mock neuron API instead
|
||||
of raw mistralrs. 36 total tests passing.
|
||||
|
||||
**Done when:** cortex has zero direct references to mistral.rs endpoints.
|
||||
All existing tests are updated and pass. New placement tests pass.
|
||||
`cortex.toml` only contains neuron endpoints. `models.toml` drives
|
||||
placement and pinning.
|
||||
Topology-aware placement (min_devices, min_device_vram_mb) deferred —
|
||||
the router currently routes based on polled model status. Catalogue
|
||||
placement matching can be added incrementally.
|
||||
|
||||
### Phase 10: neuron packaging (RPM)
|
||||
|
||||
|
||||
Reference in New Issue
Block a user