refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness
Stage 1 of the candle-native pivot. Replaces the external-process harness model (mistralrs over HTTP, llamacpp placeholder) with an in-process Harness trait whose sole implementation is candle.

The trait keeps its shape so future engines slot in additively, but start/stop default to no-ops and HarnessConfig drops endpoint and systemd_unit since no harness needs external supervision.

Behaviour is unchanged on the wire: load_model returns a "not implemented yet (Stage 2)" error and list_models is empty. The gateway-side proxy, poller, and router are untouched.

CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are marked superseded; the staged plan lives in ~/.claude/plans/create-a-more-aggressive-calm-naur.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
54
crates/neuron/src/harness/candle.rs
Normal file
@@ -0,0 +1,54 @@
//! Candle harness: in-process inference using huggingface/candle.
//!
//! This is the sole `Harness` implementation. Unlike the previous
//! mistralrs/llamacpp harnesses, candle inference runs inside the neuron
//! process itself, with no external subprocess and no systemd indirection.
//!
//! Stage 1 ships this as an inert skeleton; Stage 2 wires up actual
//! model load/unload via `candle-transformers`.

use anyhow::Result;
use async_trait::async_trait;
use cortex_core::harness::{Harness, HarnessHealth, ModelInfo, ModelSpec};

pub struct CandleHarness {
    /// URL where this neuron serves inference (its own bind address).
    bind_url: String,
}

impl CandleHarness {
    pub fn new(bind_url: String) -> Self {
        Self { bind_url }
    }
}

#[async_trait]
impl Harness for CandleHarness {
    fn name(&self) -> &str {
        "candle"
    }

    async fn health(&self) -> HarnessHealth {
        HarnessHealth {
            name: "candle".into(),
            running: true,
            uptime_secs: None,
        }
    }

    async fn list_models(&self) -> Result<Vec<ModelInfo>> {
        Ok(Vec::new())
    }

    async fn load_model(&self, _spec: &ModelSpec) -> Result<()> {
        anyhow::bail!("candle harness load_model not implemented yet (Stage 2)")
    }

    async fn unload_model(&self, _model_id: &str) -> Result<()> {
        anyhow::bail!("candle harness unload_model not implemented yet (Stage 2)")
    }

    async fn inference_endpoint(&self, _model_id: &str) -> Option<String> {
        Some(self.bind_url.clone())
    }
}
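The commit message says `start`/`stop` default to no-ops while the trait keeps its shape for future engines. The real trait lives in `cortex_core::harness` and is not part of this diff, so the following is only a minimal sketch of that pattern. It assumes a simplified, synchronous trait with no `async_trait` or `anyhow`. `HarnessError`, `CandleSketch`, and `stage1_smoke_check` are hypothetical names introduced for illustration, and model IDs are reduced to plain strings.

```rust
use std::fmt;

/// Hypothetical error type standing in for `anyhow::Error`.
#[derive(Debug)]
struct HarnessError(String);

impl fmt::Display for HarnessError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for HarnessError {}

/// Simplified stand-in for the `Harness` trait (assumption: the real
/// trait is async and uses richer types such as `ModelSpec`).
trait Harness {
    fn name(&self) -> &str;

    /// Default no-op: an in-process engine has nothing to launch.
    fn start(&self) -> Result<(), HarnessError> {
        Ok(())
    }

    /// Default no-op: an in-process engine has nothing to tear down.
    fn stop(&self) -> Result<(), HarnessError> {
        Ok(())
    }

    fn list_models(&self) -> Result<Vec<String>, HarnessError>;
    fn load_model(&self, model_id: &str) -> Result<(), HarnessError>;
}

/// Inert skeleton mirroring the Stage 1 behaviour described above.
struct CandleSketch;

impl Harness for CandleSketch {
    fn name(&self) -> &str {
        "candle"
    }

    // `start` and `stop` are intentionally not overridden.

    fn list_models(&self) -> Result<Vec<String>, HarnessError> {
        Ok(Vec::new())
    }

    fn load_model(&self, _model_id: &str) -> Result<(), HarnessError> {
        Err(HarnessError(
            "candle harness load_model not implemented yet (Stage 2)".to_string(),
        ))
    }
}

/// Hypothetical check that a harness behaves like the Stage 1 skeleton:
/// lifecycle calls succeed, no models are listed, and loading fails.
fn stage1_smoke_check(h: &dyn Harness) -> Result<(), HarnessError> {
    h.start()?;
    if !h.list_models()?.is_empty() {
        return Err(HarnessError("expected no models in Stage 1".to_string()));
    }
    if h.load_model("any-model").is_ok() {
        return Err(HarnessError("expected load_model to fail in Stage 1".to_string()));
    }
    h.stop()
}
```

Under these assumptions, a future engine that does need supervision would override `start` and `stop` in its own `impl`, leaving `CandleSketch` and its callers unchanged.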