helexa/cortex

Commit Graph

Author	SHA1	Message	Date

rob thijssen

d4e1b05956

feat(neuron,cortex-core): source-aware loader (scheme:org/name)

All checks were successful:
CI / CUDA type-check (push) Successful in 46s
CI / Format (push) Successful in 32s
build-prerelease / Resolve version stamps (push) Successful in 42s
CI / Clippy (push) Successful in 2m40s
build-prerelease / Build cortex binary (push) Successful in 4m23s
CI / Test (push) Successful in 5m28s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 5m39s
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Build neuron-ampere (push) Successful in 7m53s
build-prerelease / Build neuron-ada (push) Successful in 5m18s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m59s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m6s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s

Phase 1 of plan-source-aware-loader-preflight. Makes neuron's
loader treat `huggingface:org/name` and `helexa:org/name` as
first-class distinct sources with per-source endpoint + cache,
while staying backwards-compatible with bare `org/name` ids.
Zero behavior change for existing operator configs.

Motivation: helexa is adding an EU-hosted registry
(`registry.helexa.ai`) alongside HF. Both speak an HF-compatible
wire format, but the bytes, jurisdiction, trust root, and cache
namespace are distinct. The loader needs to disambiguate which
registry serves a given model id, and to keep their caches from
colliding on disk when both happen to host the same `org/name`.

What lands:

- `cortex-core::source` — new module. `ModelSourceId { scheme,
  org, name }` with `FromStr` accepting both `scheme:org/name`
  and bare `org/name`. `Display` round-trips. `repo_path()`
  emits the `org/name` half for the hf-hub `Api::model(...)`
  call regardless of which scheme/endpoint we're hitting.
  Rejects malformed input with typed `ParseError` variants
  (empty scheme, missing slash, scheme with `/`, name with
  `:`, etc.).

- `neuron::config::CandleHarnessConfig` gains
  `default_source: Option<String>` and
  `sources: HashMap<String, SourceConfig>`. `SourceConfig`
  mirrors what `hf_hub::ApiBuilder` consumes: endpoint URL,
  optional `auth_env` (env var name read at startup so secrets
  stay out of TOML), and optional `cache_dir`. Defaults
  synthesise a `huggingface` entry pointing at
  `https://huggingface.co` with the legacy `hf_cache` field as
  its cache_dir — so existing configs that only set `hf_cache`
  keep working unchanged.

- `CandleHarness::new(bind_url, &CandleHarnessConfig)` replaces
  `CandleHarness::new(bind_url, hf_cache)`. Resolves every
  configured source's auth env var and cache dir up front so
  `hf_api_for(scheme)` is a pure HashMap lookup on the hot
  load path. Only the `huggingface` scheme gets the legacy
  `HF_HUB_CACHE`/`HF_HOME` env-var fallback chain; other
  schemes resolve to whatever the operator typed.

- `hf_api()` is replaced by `hf_api_for(scheme)`, which builds
  an `hf_hub::Api` with the source's endpoint, `cache_dir`, and
  auth token. It errors with a useful message naming the
  configured schemes when an unknown scheme is requested.

- `CandleHarness::load_model` parses `spec.model_id` into a
  `ModelSourceId`, substitutes `default_source` for bare ids,
  and threads the parsed source through `preflight`,
  `resolve_files`, `resolve_dense_files`, `load_arch_gguf`,
  `load_arch_dense`, and `load_tp`. The hf-hub `Api::model()`
  call now uses `source_id.repo_path()` so registry calls hit
  the right URL shape regardless of scheme.

- `preflight()` signature gains a `&ModelSourceId` parameter
  (it's the canonical id for log lines and error display);
  `RepoFetchFailed.model_id` etc. now carry the
  scheme-qualified form so operator-visible errors echo
  exactly what was configured.

- `neuron.example.toml` documents the new
  `[harness.candle.sources.*]` table with commented-out
  examples for `huggingface` (explicit override) and `helexa`.
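The `scheme:org/name` parsing rules listed above can be sketched as a standalone type. This is not the real `cortex-core::source` module. The variant names (`ColonAfterScheme`, `ExtraSlash`, and so on), the use of `scheme: None` to represent a bare id, and the `with_default_scheme` helper are assumptions for illustration. Only the accepted and rejected shapes come from the message.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical sketch of `ModelSourceId`; `scheme` is `None` for a bare
/// `org/name` id (an assumed representation).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSourceId {
    pub scheme: Option<String>,
    pub org: String,
    pub name: String,
}

/// Hypothetical error variants covering the cases the message lists.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseError {
    EmptyScheme,
    SchemeContainsSlash,
    ColonAfterScheme,
    MissingSlash,
    EmptyOrg,
    EmptyName,
    ExtraSlash,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            ParseError::EmptyScheme => "empty scheme before `:`",
            ParseError::SchemeContainsSlash => "scheme must not contain `/`",
            ParseError::ColonAfterScheme => "org/name must not contain `:`",
            ParseError::MissingSlash => "expected `org/name`",
            ParseError::EmptyOrg => "empty org",
            ParseError::EmptyName => "empty name",
            ParseError::ExtraSlash => "name must not contain `/`",
        })
    }
}

impl std::error::Error for ParseError {}

impl FromStr for ModelSourceId {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // An optional `scheme:` prefix ends at the first colon.
        let (scheme, rest) = match s.split_once(':') {
            Some(("", _)) => return Err(ParseError::EmptyScheme),
            Some((scheme, _)) if scheme.contains('/') => {
                return Err(ParseError::SchemeContainsSlash)
            }
            Some((scheme, rest)) => (Some(scheme.to_string()), rest),
            None => (None, s),
        };
        if rest.contains(':') {
            return Err(ParseError::ColonAfterScheme);
        }
        let (org, name) = rest.split_once('/').ok_or(ParseError::MissingSlash)?;
        if org.is_empty() {
            return Err(ParseError::EmptyOrg);
        }
        if name.is_empty() {
            return Err(ParseError::EmptyName);
        }
        if name.contains('/') {
            return Err(ParseError::ExtraSlash);
        }
        Ok(ModelSourceId { scheme, org: org.to_string(), name: name.to_string() })
    }
}

impl fmt::Display for ModelSourceId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.scheme {
            Some(scheme) => write!(f, "{scheme}:{}/{}", self.org, self.name),
            None => write!(f, "{}/{}", self.org, self.name),
        }
    }
}

impl ModelSourceId {
    /// The `org/name` half handed to the hub client, whatever the scheme.
    pub fn repo_path(&self) -> String {
        format!("{}/{}", self.org, self.name)
    }

    /// Fill in the default source for bare ids; an explicit scheme wins.
    pub fn with_default_scheme(mut self, default: &str) -> Self {
        if self.scheme.is_none() {
            self.scheme = Some(default.to_string());
        }
        self
    }
}
```

Representing a bare id as `scheme: None` keeps `Display` a true round-trip. `acme/llama` prints back as `acme/llama` until a default source is substituted.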

Tests:

- 13 new unit tests in `cortex-core::source` covering parse /
  display round-trip, default-scheme substitution semantics,
  and every `ParseError` variant.
- 6 new unit tests in `neuron::config` covering the
  `effective_sources` synth (legacy `hf_cache` carry-through,
  explicit override preservation, helexa-alongside-huggingface)
  and `effective_default_source` fallback.
- 2 new unit tests in `harness::candle::tests` covering
  multi-scheme `hf_api_for` routing, including the
  "unknown scheme" error path naming configured schemes.
- Preflight integration tests updated to construct
  `ModelSourceId` and assert against the scheme-qualified
  error form.
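The tests above cover three source-table behaviours: the synthesised `huggingface` entry, up-front token and cache resolution, and the unknown-scheme error. A dependency-free sketch of these follows. `ResolvedSource`, `SourceTable`, and the injected `env` lookup are hypothetical stand-ins. The real harness hands the resolved values to `hf_hub::ApiBuilder`. The `HF_HUB_CACHE` then `HF_HOME/hub` fallback order is an assumption, because the message does not spell out the chain. The same applies to the `https://registry.helexa.ai` endpoint and the `HELEXA_TOKEN` variable in the test.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq)]
pub struct SourceConfig {
    pub endpoint: String,
    pub auth_env: Option<String>, // env var *name*; the secret stays out of TOML
    pub cache_dir: Option<PathBuf>,
}

pub struct CandleHarnessConfig {
    pub hf_cache: Option<PathBuf>, // legacy field
    pub default_source: Option<String>,
    pub sources: HashMap<String, SourceConfig>,
}

impl CandleHarnessConfig {
    /// Configured sources, plus a synthesised `huggingface` entry if absent.
    /// An explicit `huggingface` entry is left untouched.
    pub fn effective_sources(&self) -> HashMap<String, SourceConfig> {
        let mut out = self.sources.clone();
        out.entry("huggingface".to_string()).or_insert_with(|| SourceConfig {
            endpoint: "https://huggingface.co".to_string(),
            auth_env: None, // assumption: no token env var unless configured
            cache_dir: self.hf_cache.clone(),
        });
        out
    }

    pub fn effective_default_source(&self) -> String {
        self.default_source.clone().unwrap_or_else(|| "huggingface".to_string())
    }
}

/// Hypothetical per-scheme values resolved once at startup.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedSource {
    pub endpoint: String,
    pub token: Option<String>,
    pub cache_dir: Option<PathBuf>,
}

pub struct SourceTable {
    sources: HashMap<String, ResolvedSource>,
}

impl SourceTable {
    /// `env` stands in for `std::env::var(..).ok()` so the logic is testable.
    pub fn resolve(cfg: &CandleHarnessConfig, env: &dyn Fn(&str) -> Option<String>) -> Self {
        let sources = cfg
            .effective_sources()
            .into_iter()
            .map(|(scheme, src)| {
                let cache_dir = src.cache_dir.or_else(|| {
                    // Only `huggingface` gets the HF env-var fallback chain.
                    if scheme != "huggingface" {
                        return None;
                    }
                    env("HF_HUB_CACHE")
                        .map(PathBuf::from)
                        .or_else(|| env("HF_HOME").map(|home| PathBuf::from(home).join("hub")))
                });
                let token = src.auth_env.as_deref().and_then(|name| env(name));
                (scheme, ResolvedSource { endpoint: src.endpoint, token, cache_dir })
            })
            .collect();
        SourceTable { sources }
    }

    /// Pure map lookup on the load path; the error names configured schemes.
    pub fn lookup(&self, scheme: &str) -> Result<&ResolvedSource, String> {
        self.sources.get(scheme).ok_or_else(|| {
            let mut known: Vec<&str> = self.sources.keys().map(String::as_str).collect();
            known.sort_unstable();
            format!("unknown model source `{scheme}`; configured sources: {}", known.join(", "))
        })
    }
}
```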

CI gate: `cargo fmt --check`,
`cargo clippy --workspace --all-targets -- -D warnings`,
`cargo test --workspace` (all 24 test groups ok, zero failures).

Out of scope:
- Phase 3: Cortex catalogue `source` field, independent of
  Phase 1+2; ships when the registry comes online.
- `helexa` source endpoint itself — separate project; this
  PR adds the client-side rails only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-01 13:42:11 +03:00

rob thijssen

61adff347a

feat(neuron): preflight placement check with structured errors

Some checks failed:
CI / CUDA type-check (push) Successful in 31s
CI / Format (push) Successful in 30s
build-prerelease / Resolve version stamps (push) Successful in 48s
CI / Test (push) Failing after 1m10s
CI / Clippy (push) Successful in 2m49s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build cortex binary (push) Successful in 4m25s
build-prerelease / Build neuron-blackwell (push) Successful in 5m53s
build-prerelease / Package cortex RPM (push) Successful in 1m20s
build-prerelease / Build neuron-ampere (push) Successful in 8m0s
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ada (push) Has been cancelled

Phase 2 of plan-source-aware-loader-preflight. Adds a one-RTT
placement feasibility check that runs before any device allocation,
NCCL handshake, or weight fetch. Replaces today's opaque
"fetch config.json … 404" failure mode (when an operator points
`tensor_parallel = 2` at a GGUF-only repo) with a structured
error that names the failure class and points at the fix.

What lands:

- `crates/neuron/src/harness/preflight.rs` — new module. Classifies
  a repo's siblings listing into `SourceFormat` (Gguf | DenseSafetensors
  | Mixed | Empty), applies the tp/quant feasibility table, returns a
  `PlacementPlan` on success or a typed `PreflightError` on rejection.
  `PreflightError` is `serde::Serialize` so the HTTP layer can emit
  the structured shape verbatim; it's `thiserror::Error` so log lines
  get a single-line Display when downcasting from anyhow. Includes a
  best-effort Levenshtein-nearest suggestion for malformed quant names
  (the second sharp edge the HauhauCS scenario surfaced: an operator
  writes `q6k` against filenames containing `Q6_K_P`, and today's
  matcher just says "no GGUF file matching quant").
- `CandleHarness::load_model` — calls `preflight(...)` first thing
  after the "already loaded" guard, before any `ensure_device_worker`
  or `resolve_*`. Failure wraps the typed error in `anyhow::Error` so
  the existing trait surface is unchanged; the HTTP handler and the
  startup logger downcast to recover the structured form.
- `crates/neuron/src/api.rs::load_model` handler — maps `PreflightError`
  to 422 Unprocessable Entity with `{"error": {"kind": "...",
  "model_id": "...", "suggestion": "..." }}`. Other failures keep
  the existing 400 + free-form `format!("{e:#}")` shape.
- `crates/neuron/src/startup.rs::load_default_models` — when the
  failure is a preflight rejection, log as `reason=<kind> detail=<msg>`
  instead of the opaque `error=<chain>`, so journalctl on beast will
  now show `reason=tp_requires_safetensors detail="repo is GGUF-only
  (8 .gguf files); TP requires dense safetensors..."` instead of
  `error=fetch config.json from HauhauCS/...: 404 Not Found`.
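The classifier and feasibility table described above can be sketched in simplified form over a plain list of sibling filenames. `preflight_check`, the `PlacementPlan` shape, and the `quant_not_found`/`empty_repo` kinds are hypothetical. `tp_requires_safetensors` and the Display wording mirror the log line quoted in the message. The real `PreflightError` also derives `serde::Serialize` and `thiserror::Error`. Those derives are omitted here to keep the example dependency-free, as is the nearest-quant suggestion.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceFormat {
    Gguf,
    DenseSafetensors,
    Mixed,
    Empty,
}

/// Hypothetical plan: which loader path the model would take.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PlacementPlan {
    Gguf { file: String },
    Dense { tensor_parallel: usize },
}

/// Hypothetical subset of `PreflightError`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PreflightError {
    TpRequiresSafetensors { model_id: String, gguf_files: usize },
    QuantNotFound { model_id: String, quant: String, suggestion: Option<String> },
    EmptyRepo { model_id: String },
}

impl PreflightError {
    /// Stable snake_case tag, as in `reason=<kind>` log lines.
    pub fn kind(&self) -> &'static str {
        match self {
            Self::TpRequiresSafetensors { .. } => "tp_requires_safetensors",
            Self::QuantNotFound { .. } => "quant_not_found",
            Self::EmptyRepo { .. } => "empty_repo",
        }
    }
}

impl fmt::Display for PreflightError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::TpRequiresSafetensors { gguf_files, .. } => write!(
                f,
                "repo is GGUF-only ({gguf_files} .gguf files); TP requires dense safetensors"
            ),
            Self::QuantNotFound { model_id, quant, .. } => {
                write!(f, "{model_id}: no GGUF file matching quant `{quant}`")
            }
            Self::EmptyRepo { model_id } => write!(f, "{model_id}: no .gguf or .safetensors files"),
        }
    }
}

fn is_gguf(f: &str) -> bool {
    f.ends_with(".gguf")
}

fn is_safetensors(f: &str) -> bool {
    f.ends_with(".safetensors")
}

pub fn classify(siblings: &[&str]) -> SourceFormat {
    let has_gguf = siblings.iter().any(|f| is_gguf(f));
    let has_dense = siblings.iter().any(|f| is_safetensors(f));
    match (has_gguf, has_dense) {
        (true, true) => SourceFormat::Mixed,
        (true, false) => SourceFormat::Gguf,
        (false, true) => SourceFormat::DenseSafetensors,
        (false, false) => SourceFormat::Empty,
    }
}

pub fn preflight_check(
    model_id: &str,
    siblings: &[&str],
    tensor_parallel: usize,
    quant: Option<&str>,
) -> Result<PlacementPlan, PreflightError> {
    match classify(siblings) {
        SourceFormat::Empty => Err(PreflightError::EmptyRepo { model_id: model_id.to_string() }),
        // Mixed repos prefer safetensors, so both take the dense path.
        SourceFormat::DenseSafetensors | SourceFormat::Mixed => {
            Ok(PlacementPlan::Dense { tensor_parallel })
        }
        SourceFormat::Gguf if tensor_parallel > 1 => Err(PreflightError::TpRequiresSafetensors {
            model_id: model_id.to_string(),
            gguf_files: siblings.iter().filter(|f| is_gguf(f)).count(),
        }),
        SourceFormat::Gguf => {
            // Assumption: case-insensitive substring match; no quant => first GGUF.
            let wanted = quant.map(str::to_ascii_lowercase);
            siblings
                .iter()
                .copied()
                .filter(|f| is_gguf(f))
                .find(|f| match &wanted {
                    Some(q) => f.to_ascii_lowercase().contains(q.as_str()),
                    None => true,
                })
                .map(|f| PlacementPlan::Gguf { file: f.to_string() })
                .ok_or_else(|| PreflightError::QuantNotFound {
                    model_id: model_id.to_string(),
                    quant: quant.unwrap_or_default().to_string(),
                    suggestion: None,
                })
        }
    }
}
```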

Tests:

- 18 unit tests in `harness/preflight.rs` covering classifier,
  quant matching, Levenshtein, error serialization, and the full
  feasibility table (gguf+tp rejected, gguf+bad-quant suggests
  nearest, gguf+good-quant ok, dense+tp ok, empty rejected, mixed
  prefers safetensors).
- 7 integration tests in `tests/preflight.rs` exercising the
  network path through an axum mock that serves hf-hub-compatible
  `/api/models/{org}/{name}/revision/main` payloads. Adds `tempfile`
  as a dev-dependency for per-test cache dirs.
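The best-effort quant suggestion these tests exercise could look roughly like the sketch below. Two details are assumptions, not taken from `preflight.rs`. The first is the filename convention in the hypothetical `quant_token` helper, which treats the quant tag as the last `-` or `.` separated piece before `.gguf`. The second is the distance cutoff of 3.

```rust
/// Hypothetical cutoff: candidates further away than this are not suggested.
const MAX_SUGGEST_DISTANCE: usize = 3;

/// Classic two-row Levenshtein edit distance over `char`s.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        cur[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            // deletion, insertion, substitution
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

/// Assumed convention: `model-Q6_K_P.gguf` -> `Q6_K_P`; non-GGUF files -> None.
fn quant_token(file: &str) -> Option<&str> {
    let stem = file.strip_suffix(".gguf")?;
    stem.rsplit(|c: char| c == '-' || c == '.').next()
}

/// Nearest quant tag (case-insensitive) to what the operator typed, if close enough.
pub fn suggest_quant(wanted: &str, files: &[&str]) -> Option<String> {
    let wanted = wanted.to_ascii_lowercase();
    files
        .iter()
        .copied()
        .filter_map(quant_token)
        .map(|tok| (levenshtein(&wanted, &tok.to_ascii_lowercase()), tok))
        .filter(|&(dist, _)| dist <= MAX_SUGGEST_DISTANCE)
        .min_by_key(|&(dist, _)| dist)
        .map(|(_, tok)| tok.to_string())
}
```

With this cutoff, `q6k` resolves to `Q6_K_P` (three insertions), which is the HauhauCS case the message describes.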

Out of scope (deferred to subsequent phases):

- Phase 1 (source-aware loader plumbing — `scheme:org/name` parsing,
  per-scheme `SourceConfig`, cache disambiguation). Preflight runs
  against the single configured HuggingFace source today; the scheme
  threading lands cleanly when Phase 1 ships.
- Phase 3 (cortex catalogue source field).
- GGUF tensor-parallel loading. Preflight rejects this combination
  with `TpRequiresSafetensors`; the underlying loader gap is the
  separate `Helexa` curated-registry / heretic-rs conversation.

Refs #4-#9 architectural follow-up; no specific issue closed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-01 13:24:30 +03:00

2 Commits