feat(neuron): wire candle harness load/unload via GGUF

Stage 2 of the candle-native pivot. Fleshes out CandleHarness with a
LoadedModel registry keyed by model_id, hf-hub-backed GGUF download,
and Qwen3 quantized weight construction via candle-transformers'
quantized_qwen3 module. unload_model drops the entry; Drop on the
candle ModelWeights frees device memory.

Device selection prefers CUDA (gated behind the new `cuda` feature),
falling back to CPU when CUDA is unavailable so default builds work
on non-GPU hosts. The candle CUDA toolchain isn't pulled in unless
`--features cuda` is passed, keeping CI green on CPU runners.

Config gains a [harness.candle] block with an optional hf_cache path.
HarnessRegistry::from_configs now takes HarnessSettings so per-harness
config flows through.

A gated tests/candle_lifecycle.rs exercises real load → list → unload
→ list-empty when run with `--features cuda-integration` against a
host with HF network access. The default-feature test in tests/api.rs
covers the wrong-harness rejection path without needing the network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-18 16:02:49 +03:00
parent 3cccc2c56b
commit 5c2bd1a1da
9 changed files with 1934 additions and 47 deletions

View File

@@ -6,7 +6,7 @@ use figment::{
providers::{Env, Format, Toml},
};
use serde::{Deserialize, Serialize};
use std::path::Path;
use std::path::{Path, PathBuf};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NeuronConfig {
@@ -14,6 +14,25 @@ pub struct NeuronConfig {
pub port: u16,
#[serde(default)]
pub harnesses: Vec<HarnessConfig>,
/// Per-harness configuration. Currently only `candle` is recognised.
#[serde(default)]
pub harness: HarnessSettings,
}
/// Settings for individual harness implementations. Each harness owns
/// its own sub-table so users only configure the harnesses they enable.
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct HarnessSettings {
#[serde(default)]
pub candle: CandleHarnessConfig,
}
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct CandleHarnessConfig {
/// HuggingFace cache directory for model weights.
/// When unset, defers to hf-hub's default (~/.cache/huggingface).
#[serde(default)]
pub hf_cache: Option<PathBuf>,
}
fn default_port() -> u16 {
@@ -35,6 +54,7 @@ impl Default for NeuronConfig {
Self {
port: 13131,
harnesses: vec![],
harness: HarnessSettings::default(),
}
}
}