refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness

Stage 1 of the candle-native pivot. Replaces the external-process
harness model (mistralrs over HTTP, llamacpp placeholder) with an
in-process Harness trait whose sole implementation is candle. The
trait keeps its shape so future engines slot in additively, but
start/stop default to no-ops and HarnessConfig drops endpoint and
systemd_unit since no harness needs external supervision.

Behaviour is unchanged on the wire: load_model returns a "not
implemented yet (Stage 2)" error and list_models is empty. The
gateway-side proxy, poller, and router are untouched.

CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are
marked superseded; the staged plan lives in
~/.claude/plans/create-a-more-aggressive-calm-naur.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-18 15:53:04 +03:00
parent 7f797b0265
commit 3cccc2c56b
19 changed files with 203 additions and 401 deletions

View File

@@ -0,0 +1,54 @@
//! Candle harness — in-process inference using huggingface/candle.
//!
//! This is the sole `Harness` implementation. Unlike the previous
//! mistralrs/llamacpp harnesses, candle inference runs inside the neuron
//! process itself — no external subprocess, no systemd indirection.
//!
//! Stage 1 ships this as an inert skeleton; Stage 2 wires up actual
//! model load/unload via `candle-transformers`.
use anyhow::Result;
use async_trait::async_trait;
use cortex_core::harness::{Harness, HarnessHealth, ModelInfo, ModelSpec};
pub struct CandleHarness {
/// URL where this neuron serves inference (its own bind address).
bind_url: String,
}
impl CandleHarness {
pub fn new(bind_url: String) -> Self {
Self { bind_url }
}
}
#[async_trait]
impl Harness for CandleHarness {
fn name(&self) -> &str {
"candle"
}
async fn health(&self) -> HarnessHealth {
HarnessHealth {
name: "candle".into(),
running: true,
uptime_secs: None,
}
}
async fn list_models(&self) -> Result<Vec<ModelInfo>> {
Ok(Vec::new())
}
async fn load_model(&self, _spec: &ModelSpec) -> Result<()> {
anyhow::bail!("candle harness load_model not implemented yet (Stage 2)")
}
async fn unload_model(&self, _model_id: &str) -> Result<()> {
anyhow::bail!("candle harness unload_model not implemented yet (Stage 2)")
}
async fn inference_endpoint(&self, _model_id: &str) -> Option<String> {
Some(self.bind_url.clone())
}
}