Commit Graph

4 Commits

Author SHA1 Message Date
6779b7526a feat(neuron): load default_models on service activation
All checks were successful
CI / Format (push) Successful in 34s
CI / Clippy (push) Successful in 2m13s
CI / Test (push) Successful in 4m6s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
Stage 5 of the candle-native pivot. Adds first-class support for
auto-loading a configured set of models when the neuron service
activates.

Config:
- NeuronConfig.default_models: Vec<ModelSpec> (defaults to []).
- neuron.example.toml ships a commented [[default_models]] example.

Activation flow (crates/neuron/src/startup.rs::load_default_models):
- Sequential — VRAM contention makes parallel loads risky.
- Per-entry timing logged at info level on success.
- Failures logged as warnings; the next entry is still attempted.
- An empty list short-circuits without log noise.

Called from main.rs after the registry is built and before the axum
listener binds, so /models reflects the loaded state from the very
first request.

data/neuron.service gains TimeoutStartSec=1800s. With activation
blocked on potentially slow first-time HF downloads + GGUF
materialisation, systemd's default 90s would kill larger model loads
mid-flight.

Two non-gated tests in tests/activation.rs cover the
continues-past-failure and empty-list paths using a synthetically
unknown harness name to fail loads fast without touching the network.
The cuda-integration test from earlier stages still exercises the
real load/unload lifecycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:56:08 +03:00
5c2bd1a1da feat(neuron): wire candle harness load/unload via GGUF
Stage 2 of the candle-native pivot. Fleshes out CandleHarness with a
LoadedModel registry keyed by model_id, hf-hub-backed GGUF download,
and Qwen3 quantized weight construction via candle-transformers'
quantized_qwen3 module. unload_model drops the entry; Drop on the
candle ModelWeights frees device memory.

Device selection prefers CUDA (gated behind the new `cuda` feature),
falling back to CPU when CUDA is unavailable so default builds work
on non-GPU hosts. The candle CUDA toolchain isn't pulled in unless
`--features cuda` is passed, keeping CI green on CPU runners.

Config gains a [harness.candle] block with an optional hf_cache path.
HarnessRegistry::from_configs now takes HarnessSettings so per-harness
config flows through.

A gated tests/candle_lifecycle.rs exercises real load → list → unload
→ list-empty when run with `--features cuda-integration` against a
host with HF network access. The default-feature test in tests/api.rs
covers the wrong-harness rejection path without needing the network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:02:49 +03:00
3f94c50817 chore: move default ports out of common-collision ranges
Previous defaults collided with well-trodden infra services and with
the Linux ephemeral port range:

- cortex API     8000 — common dev-server default (Django, minio UI)
- cortex metrics 9100 — Prometheus node_exporter default
- neuron API     9090 — Cockpit default on Fedora, Prometheus self

Move to helexa-themed palindromic ports, all below Linux's
32768-60999 ephemeral range and not registered to any well-known
service:

- cortex API     31313
- cortex metrics 31314
- neuron API     13131

Updated places:
- cortex.example.toml, neuron.example.toml defaults
- default impls in cortex-core and neuron config
- cortex-cli --endpoint default for the status subcommand
- doc comments citing example URLs
- README.md and CLAUDE.md snippets

Consumers already on the old ports need a one-line edit in their
/etc/cortex/cortex.toml or /etc/neuron/neuron.toml to match;
firewall rules and prometheus scrape configs will also need
updating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:45:25 +03:00
26e5e7ead8 feat: implement mistral.rs harness and neuron model API
All checks were successful
CI / Format, lint, build, test (push) Successful in 2m30s
CI / Build SRPM (push) Has been skipped
CI / Publish to COPR (push) Has been skipped
- MistralRsHarness: Harness trait impl wrapping mistral.rs HTTP API
  (list/load/unload models, health check, start/stop via systemd)
- HarnessRegistry: maps harness name -> Box<dyn Harness>, built from
  neuron.toml config
- Neuron API endpoints: GET /models, POST /models/load,
  POST /models/unload, GET /models/:id/endpoint
- NeuronConfig: figment-based config loading from neuron.toml
- Integration test: full model lifecycle through mock mistral.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:29:42 +03:00