feat(neuron): wire candle harness load/unload via GGUF

Stage 2 of the candle-native pivot. Fleshes out CandleHarness with a
LoadedModel registry keyed by model_id, hf-hub-backed GGUF download,
and Qwen3 quantized weight construction via candle-transformers'
quantized_qwen3 module. unload_model drops the entry; Drop on the
candle ModelWeights frees device memory.

Device selection prefers CUDA (gated behind the new `cuda` feature),
falling back to CPU when CUDA is unavailable so default builds work
on non-GPU hosts. The candle CUDA toolchain isn't pulled in unless
`--features cuda` is passed, keeping CI green on CPU runners.

Config gains a [harness.candle] block with an optional hf_cache path.
HarnessRegistry::from_configs now takes HarnessSettings so per-harness
config flows through.

A gated tests/candle_lifecycle.rs exercises real load → list → unload
→ list-empty when run with `--features cuda-integration` against a
host with HF network access. The default-feature test in tests/api.rs
covers the wrong-harness rejection path without needing the network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-18 16:02:49 +03:00
parent 3cccc2c56b
commit 5c2bd1a1da
9 changed files with 1934 additions and 47 deletions

View File

@@ -12,6 +12,18 @@ path = "src/lib.rs"
name = "neuron"
path = "src/main.rs"
[features]
default = []
# Enables CUDA acceleration in candle. Without this feature, candle
# compiles for CPU only and Device::new_cuda calls fall back to CPU.
cuda = [
"candle-core/cuda",
"candle-nn/cuda",
"candle-transformers/cuda",
]
# Reserved for GPU-only integration tests in later stages.
cuda-integration = ["cuda"]
[dependencies]
cortex-core.workspace = true
tokio.workspace = true
@@ -27,6 +39,15 @@ clap.workspace = true
figment.workspace = true
toml.workspace = true
# candle for in-process inference. CUDA support is gated behind the
# crate's `cuda` feature (default off) so the workspace builds on
# non-CUDA hosts and CI runners.
candle-core = "0.10.2"
candle-nn = "0.10.2"
candle-transformers = "0.10.2"
tokenizers = { version = "0.22", default-features = false, features = ["onig"] }
hf-hub = { version = "0.4", features = ["tokio"] }
[dev-dependencies]
tokio = { workspace = true, features = ["test-util"] }
reqwest.workspace = true