Phase 1 of plan-source-aware-loader-preflight. Makes neuron's
loader treat `huggingface:org/name` and `helexa:org/name` as
first-class distinct sources with per-source endpoint + cache,
while staying backwards-compatible with bare `org/name` ids.
Zero behavior change for existing operator configs.

Motivation: helexa is adding an EU-hosted registry
(`registry.helexa.ai`) alongside HF. Both speak HF-compatible
wire format, but the bytes, jurisdiction, trust root, and cache
namespace are distinct. The loader needs to disambiguate which
registry serves a given model id, and to keep their caches from
colliding on disk when both happen to host the same `org/name`.

What lands:
- `cortex-core::source`: new module. `ModelSourceId { scheme,
  org, name }` with `FromStr` accepting both `scheme:org/name`
  and bare `org/name`. `Display` round-trips. `repo_path()`
  emits the `org/name` half for the hf-hub `Api::model(...)`
  call regardless of which scheme/endpoint we're hitting.
  Rejects malformed input with typed `ParseError` variants
  (empty scheme, missing slash, scheme with `/`, name with
  `:`, etc.).
- `neuron::config::CandleHarnessConfig` gains
  `default_source: Option<String>` and
  `sources: HashMap<String, SourceConfig>`. `SourceConfig`
  mirrors what `hf_hub::ApiBuilder` consumes: endpoint URL,
  optional `auth_env` (env var name read at startup so secrets
  stay out of TOML), and optional `cache_dir`. Defaults
  synthesise a `huggingface` entry pointing at
  `https://huggingface.co` with the legacy `hf_cache` field as
  its `cache_dir`, so existing configs that only set `hf_cache`
  keep working unchanged.
- `CandleHarness::new(bind_url, &CandleHarnessConfig)` replaces
  `CandleHarness::new(bind_url, hf_cache)`. Resolves every
  configured source's auth env var and cache dir up front so
  `hf_api_for(scheme)` is a pure HashMap lookup on the hot
  load path. Only the `huggingface` scheme gets the legacy
  `HF_HUB_CACHE`/`HF_HOME` env-var fallback chain; other
  schemes resolve to whatever the operator typed.
- `hf_api()` -> `hf_api_for(scheme)`. Builds an
  `hf_hub::Api` with the source's endpoint, `cache_dir`, and
  auth token. Errors with a useful message naming the
  configured schemes when an unknown scheme is requested.
- `CandleHarness::load_model` parses `spec.model_id` into a
  `ModelSourceId`, substitutes `default_source` for bare ids,
  and threads the parsed source through `preflight`,
  `resolve_files`, `resolve_dense_files`, `load_arch_gguf`,
  `load_arch_dense`, and `load_tp`. The hf-hub `Api::model()`
  call now uses `source_id.repo_path()` so registry calls hit
  the right URL shape regardless of scheme.
- `preflight()` signature gains a `&ModelSourceId` parameter
  (it's the canonical id for log lines and error display);
  `RepoFetchFailed.model_id` etc. now carry the
  scheme-qualified form so operator-visible errors echo
  exactly what was configured.
- `neuron.example.toml` documents the new
  `[harness.candle.sources.*]` table with commented-out
  examples for `huggingface` (explicit override) and `helexa`.
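
For readers without the diff at hand, the parsing rules described for `ModelSourceId` can be sketched roughly as follows. This is an illustration, not the code that landed: representing `scheme` as `Option<String>`, the `ParseError` variant names, and the `with_default_scheme` helper are all assumptions made for the example.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical variant names; the real `ParseError` may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Empty,
    EmptyScheme,
    SchemeContainsSlash,
    MissingSlash,
    EmptyOrg,
    EmptyName,
    NameContainsColon,
}

/// Assumed shape: `scheme` is `None` for a bare `org/name` id.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSourceId {
    pub scheme: Option<String>,
    pub org: String,
    pub name: String,
}

impl ModelSourceId {
    /// The `org/name` half, independent of scheme.
    pub fn repo_path(&self) -> String {
        format!("{}/{}", self.org, self.name)
    }

    /// Hypothetical helper: fill in a scheme only when the id was bare.
    pub fn with_default_scheme(mut self, default: &str) -> Self {
        if self.scheme.is_none() {
            self.scheme = Some(default.to_owned());
        }
        self
    }
}

impl FromStr for ModelSourceId {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.is_empty() {
            return Err(ParseError::Empty);
        }
        // Everything before the first `:` is treated as the scheme.
        let (scheme, rest) = match s.split_once(':') {
            Some((scheme, rest)) => {
                if scheme.is_empty() {
                    return Err(ParseError::EmptyScheme);
                }
                if scheme.contains('/') {
                    return Err(ParseError::SchemeContainsSlash);
                }
                (Some(scheme.to_owned()), rest)
            }
            None => (None, s),
        };
        let (org, name) = rest.split_once('/').ok_or(ParseError::MissingSlash)?;
        if org.is_empty() {
            return Err(ParseError::EmptyOrg);
        }
        if name.is_empty() {
            return Err(ParseError::EmptyName);
        }
        if name.contains(':') {
            return Err(ParseError::NameContainsColon);
        }
        Ok(Self { scheme, org: org.to_owned(), name: name.to_owned() })
    }
}

impl fmt::Display for ModelSourceId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.scheme {
            Some(scheme) => write!(f, "{}:{}/{}", scheme, self.org, self.name),
            None => write!(f, "{}/{}", self.org, self.name),
        }
    }
}
```

Under these assumptions, `"helexa:Helexa/Foo".parse::<ModelSourceId>()` yields an id whose `repo_path()` is `Helexa/Foo`, and a bare `Qwen/Qwen3-0.6B-GGUF` keeps `scheme == None` until a default is substituted.
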

Tests:
- 13 new unit tests in `cortex-core::source` covering parse /
  display round-trip, default-scheme substitution semantics,
  and every `ParseError` variant.
- 6 new unit tests in `neuron::config` covering the
  `effective_sources` synth (legacy `hf_cache` carry-through,
  explicit override preservation, helexa-alongside-huggingface)
  and `effective_default_source` fallback.
- 2 new unit tests in `harness::candle::tests` covering
  multi-scheme `hf_api_for` routing, including the
  "unknown scheme" error path naming configured schemes.
- Preflight integration tests updated to construct
  `ModelSourceId` and assert against the scheme-qualified
  error form.
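
The config behaviour these tests exercise can be pictured with a small sketch. It assumes string-typed fields, treats an explicitly declared `huggingface` entry as authoritative (no `hf_cache` merge), and uses a hypothetical `lookup_source` function to stand in for the routing step of `hf_api_for`; the real types, merge rules, and error type may differ.

```rust
use std::collections::HashMap;

/// Assumed field types; the real `SourceConfig` may use paths or URLs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SourceConfig {
    pub endpoint: String,
    pub auth_env: Option<String>,
    pub cache_dir: Option<String>,
}

#[derive(Debug, Default)]
pub struct CandleHarnessConfig {
    pub hf_cache: Option<String>,
    pub default_source: Option<String>,
    pub sources: HashMap<String, SourceConfig>,
}

impl CandleHarnessConfig {
    /// Configured sources plus a synthesised `huggingface` entry when
    /// the operator did not declare one (assumption: an explicit entry
    /// is kept exactly as written).
    pub fn effective_sources(&self) -> HashMap<String, SourceConfig> {
        let mut out = self.sources.clone();
        out.entry("huggingface".to_owned()).or_insert_with(|| SourceConfig {
            endpoint: "https://huggingface.co".to_owned(),
            auth_env: None,
            // Legacy `hf_cache` carries through as the cache dir.
            cache_dir: self.hf_cache.clone(),
        });
        out
    }

    /// Scheme applied to bare `org/name` ids.
    pub fn effective_default_source(&self) -> &str {
        self.default_source.as_deref().unwrap_or("huggingface")
    }
}

/// Hypothetical stand-in for scheme routing: a map lookup whose error
/// names the configured schemes.
pub fn lookup_source<'a>(
    sources: &'a HashMap<String, SourceConfig>,
    scheme: &str,
) -> Result<&'a SourceConfig, String> {
    sources.get(scheme).ok_or_else(|| {
        let mut known: Vec<&str> = sources.keys().map(String::as_str).collect();
        known.sort_unstable();
        format!(
            "unknown source scheme `{}`; configured schemes: {}",
            scheme,
            known.join(", ")
        )
    })
}
```

With a config that only sets `hf_cache`, this sketch yields a single `huggingface` source whose cache dir is the legacy value, which is the backwards-compatibility property the commit describes.
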

CI gate: `cargo fmt --check`, `cargo clippy --workspace
--all-targets -- -D warnings`, `cargo test --workspace` (all 24
test groups ok, zero failures).

Out of scope (Phase 3):
- Cortex catalogue `source` field: independent of Phase 1+2,
  ships when the registry comes online.
- `helexa` source endpoint itself: separate project; this
  PR adds the client-side rails only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
82 lines
3.3 KiB
TOML
# neuron.example.toml: example configuration
#
# Copy to /etc/neuron/neuron.toml and adjust for your environment.
#
# Environment variable overrides use NEURON_ prefix with __ separators:
#   NEURON_PORT=13131

port = 13131

# -- Harnesses ---------------------------------------------------------------
# Each [[harnesses]] entry enables an inference engine. Currently only
# "candle" is supported; it runs in-process and uses huggingface/candle
# for inference on local CUDA devices (or CPU when CUDA is unavailable).

[[harnesses]]
name = "candle"

# -- Candle harness settings -------------------------------------------------
# Optional tuning for the candle harness.

[harness.candle]
# HuggingFace cache directory for model weights.
#
# Resolution order (first hit wins):
#   1. `hf_cache` here in this file (applies to the synth `huggingface`
#      source only; see [harness.candle.sources.*] below for explicit
#      per-source paths).
#   2. `HF_HUB_CACHE` env var, the same convention as the Python
#      `huggingface_hub` library, so an existing cache directory shared
#      with other tooling can be reused without per-tool config.
#   3. `HF_HOME` env var (cache appended as `$HF_HOME/hub`).
#   4. hf-hub's default (`~/.cache/huggingface/hub`).
#
# For per-host overrides (e.g. one neuron has an SSD with prefetched
# weights), prefer a systemd drop-in over editing this file:
#   /etc/systemd/system/neuron.service.d/local.conf:
#     [Service]
#     Environment=HF_HUB_CACHE=/archive/hf-cache
# hf_cache = "/var/lib/neuron/hf-cache"

# Default scheme applied to bare `org/name` model ids (those without a
# `scheme:` prefix). Defaults to "huggingface" when unset. Set to
# "helexa" to make `default_models = [{ model_id = "Helexa/Foo" }]`
# resolve via the helexa registry without prefixing every entry.
# default_source = "huggingface"

# Per-scheme source endpoints. Each scheme maps to an HF-compatible
# registry. The `huggingface` source is auto-synthesised pointing at
# `https://huggingface.co` when omitted; declare it explicitly here to
# override the endpoint, auth env, or cache dir.
#
# [harness.candle.sources.huggingface]
# endpoint = "https://huggingface.co"
# auth_env = "HF_TOKEN" # optional bearer token via env var
# cache_dir = "/archive3/llm-cache/huggingface"
#
# Add helexa (or any operator-run mirror speaking the HF-compatible
# wire format) by adding another sources entry. Caches are
# disambiguated per scheme so a mirror serving the same `org/name` as
# HF cannot collide on disk.
#
# [harness.candle.sources.helexa]
# endpoint = "https://registry.helexa.ai"
# auth_env = "HELEXA_TOKEN"
# cache_dir = "/archive3/llm-cache/helexa"

# -- Default models ----------------------------------------------------------
# Models listed here are loaded automatically when the neuron service
# activates. Loading is sequential: a slow or failing entry doesn't
# block the rest of the fleet, but it does push out the time before
# neuron starts serving HTTP, so keep the list short. Operators can
# load additional models on demand via POST /models/load.
#
# Make sure data/neuron.service's TimeoutStartSec is generous enough to
# cover the slowest entry's first-time download + materialisation.

# [[default_models]]
# model_id = "Qwen/Qwen3-0.6B-GGUF"
# harness = "candle"
# quant = "Q4_K_M"
# devices = [0]
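
The cache resolution order documented in the `[harness.candle]` comments (first hit wins) can be sketched as a small pure function. This is a sketch under assumptions, not neuron's actual code: the name `resolve_hf_cache` is hypothetical, and the environment is passed in as a closure purely so the order can be exercised without touching process state. Returning `None` means "defer to hf-hub's default".

```rust
use std::path::PathBuf;

/// Hypothetical helper mirroring the documented order for the
/// `huggingface` source:
///   1. `hf_cache` from the config file
///   2. `HF_HUB_CACHE`
///   3. `$HF_HOME/hub`
///   4. `None`, i.e. let hf-hub pick its default
pub fn resolve_hf_cache(
    configured: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Option<PathBuf> {
    if let Some(dir) = configured {
        return Some(PathBuf::from(dir));
    }
    if let Some(dir) = env("HF_HUB_CACHE") {
        return Some(PathBuf::from(dir));
    }
    if let Some(home) = env("HF_HOME") {
        // The cache lives in the `hub` subdirectory of HF_HOME.
        return Some(PathBuf::from(home).join("hub"));
    }
    None
}
```

In a real process the closure would typically be `|key| std::env::var(key).ok()`. Per the commit message, only the `huggingface` scheme gets this fallback chain; other schemes use whatever `cache_dir` the operator configured.
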