5731f4c318dd8d5b5f70cd8199cd6bb790831964
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
d4e1b05956
|
feat(neuron,cortex-core): source-aware loader (scheme:org/name)
All checks were successful
CI / CUDA type-check (push) Successful in 46s
CI / Format (push) Successful in 32s
build-prerelease / Resolve version stamps (push) Successful in 42s
CI / Clippy (push) Successful in 2m40s
build-prerelease / Build cortex binary (push) Successful in 4m23s
CI / Test (push) Successful in 5m28s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 5m39s
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Build neuron-ampere (push) Successful in 7m53s
build-prerelease / Build neuron-ada (push) Successful in 5m18s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m59s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m6s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s
Phase 1 of plan-source-aware-loader-preflight. Makes neuron's
loader treat `huggingface:org/name` and `helexa:org/name` as
first-class distinct sources with per-source endpoint + cache,
while staying backwards-compatible with bare `org/name` ids.
Zero behavior change for existing operator configs.
Motivation: helexa is adding an EU-hosted registry
(`registry.helexa.ai`) alongside HF. Both speak HF-compatible
wire format, but the bytes, jurisdiction, trust root, and cache
namespace are distinct. The loader needs to disambiguate which
registry serves a given model id, and to keep their caches from
colliding on disk when both happen to host the same `org/name`.
What lands:
- `cortex-core::source` — new module. `ModelSourceId { scheme,
org, name }` with `FromStr` accepting both `scheme:org/name`
and bare `org/name`. `Display` round-trips. `repo_path()`
emits the `org/name` half for the hf-hub `Api::model(...)`
call regardless of which scheme/endpoint we're hitting.
Rejects malformed input with typed `ParseError` variants
(empty scheme, missing slash, scheme with `/`, name with
`:`, etc.).
- `neuron::config::CandleHarnessConfig` gains
`default_source: Option<String>` and
`sources: HashMap<String, SourceConfig>`. `SourceConfig`
mirrors what `hf_hub::ApiBuilder` consumes: endpoint URL,
optional `auth_env` (env var name read at startup so secrets
stay out of TOML), and optional cache_dir. Defaults
synthesise a `huggingface` entry pointing at
`https://huggingface.co` with the legacy `hf_cache` field as
its cache_dir — so existing configs that only set `hf_cache`
keep working unchanged.
- `CandleHarness::new(bind_url, &CandleHarnessConfig)` replaces
`CandleHarness::new(bind_url, hf_cache)`. Resolves every
configured source's auth env var and cache dir up front so
`hf_api_for(scheme)` is a pure HashMap lookup on the hot
load path. Only the `huggingface` scheme gets the legacy
`HF_HUB_CACHE`/`HF_HOME` env-var fallback chain; other
schemes resolve to whatever the operator typed.
- `hf_api()` -> `hf_api_for(scheme)`. Builds an
`hf_hub::Api` with the source's endpoint, cache_dir, and
auth token. Errors with a useful message naming the
configured schemes when an unknown scheme is requested.
- `CandleHarness::load_model` parses `spec.model_id` into a
`ModelSourceId`, substitutes `default_source` for bare ids,
and threads the parsed source through `preflight`,
`resolve_files`, `resolve_dense_files`, `load_arch_gguf`,
`load_arch_dense`, and `load_tp`. The hf-hub `Api::model()`
call now uses `source_id.repo_path()` so registry calls hit
the right URL shape regardless of scheme.
- `preflight()` signature gains a `&ModelSourceId` parameter
(it's the canonical id for log lines and error display);
`RepoFetchFailed.model_id` etc. now carry the
scheme-qualified form so operator-visible errors echo
exactly what was configured.
- `neuron.example.toml` documents the new
`[harness.candle.sources.*]` table with commented-out
examples for `huggingface` (explicit override) and `helexa`.
Tests:
- 13 new unit tests in `cortex-core::source` covering parse /
display round-trip, default-scheme substitution semantics,
and every `ParseError` variant.
- 6 new unit tests in `neuron::config` covering the
`effective_sources` synth (legacy `hf_cache` carry-through,
explicit override preservation, helexa-alongside-huggingface)
and `effective_default_source` fallback.
- 2 new unit tests in `harness::candle::tests` covering
multi-scheme `hf_api_for` routing, including the
"unknown scheme" error path naming configured schemes.
- Preflight integration tests updated to construct
`ModelSourceId` and assert against the scheme-qualified
error form.
CI gate: cargo fmt --check, cargo clippy --workspace
--all-targets -- -D warnings, cargo test --workspace (all 24
test groups ok, zero failures).
Out of scope (Phase 3):
- Cortex catalogue `source` field — independent of Phase 1+2,
ships when the registry comes online.
- `helexa` source endpoint itself — separate project; this
PR adds the client-side rails only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
|
957f704efa
|
feat(neuron): OpenAI Responses API + ci cuda-check runner label
Some checks failed
build-prerelease / Package cortex RPM (push) Blocked by required conditions
CI / CUDA type-check (push) Failing after 11s
build-prerelease / Resolve version stamps (push) Successful in 30s
CI / Format (push) Successful in 32s
CI / Clippy (push) Successful in 2m31s
build-prerelease / Build cortex binary (push) Successful in 4m32s
CI / Test (push) Successful in 5m42s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 6m8s
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled
build-prerelease / Build neuron-ada (push) Has been cancelled
Step 2 of the Responses rollout: native `/v1/responses` endpoint on
neuron that consumes the same InferenceEvent stream as
`/v1/chat/completions` but emits it as the Responses API's named
SSE event family. No gateway-side translation.
## Surface
- `cortex-core::responses` envelope types: `ResponsesRequest`,
`ResponsesInput` (text | items), `ResponsesInputItem` (message |
function_call | function_call_output | reasoning),
`ResponsesContentPart` (input_text | input_image | output_text),
`ResponsesResponse`, `ResponsesOutputItem`, `ResponsesUsage`. Plus
a `events::*` constant module so the projector and the wire shape
stay in sync without string-typos.
- `neuron::wire::openai_responses`:
- `request_to_chat(req)` flattens Responses input + instructions
into a `ChatCompletionRequest` the candle harness already
understands. Text-only Parts collapse to a string; mixed
text+image Parts go to chat's content-array shape; reasoning
items drop; function_call / function_call_output round-trip
via tool_calls / tool_call_id metadata so the surface is
consistent for the day the harness emits tool calls.
- `project_responses_stream(rx, meta)` reads InferenceEvents
and emits the eight named events that compose a Responses
stream: response.created → output_item.added → content_part.added
→ output_text.delta×N → output_text.done → content_part.done
→ output_item.done → response.completed. Synthesises start
frames if the producer skips Start (poisoned model, early
disconnect) so the stream stays coherent.
- `build_response(meta, text, reason, usage)` for the
non-streaming path.
- `CandleHarness::inference_stream(req)` extracted from
`chat_completion_stream`, returning a typed `InferenceStream`
(event receiver + id/created/model_id metadata). Both
`chat_completion_stream` and the new `responses_stream` are now
thin wrappers that pick their wire projection. TP path got the
same treatment (`chat_completion_tp_stream` → `inference_tp_stream`).
- `POST /v1/responses` route on neuron. Non-streaming returns one
buffered `ResponsesResponse`; streaming returns axum SSE with
both event names and JSON data per frame (Responses, unlike
chat completions, uses named `event:` lines). Reused
`inference_error_response` helper hoisted out so the chat and
responses handlers share the InferenceError → HTTP mapping.
## CI
Also bundles the `cuda-check` runner-label fix from feedback on
commit
|
|||
|
e42e8ee81f
|
refactor: cortex talks to neurons instead of mistral.rs directly
Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint.
Hardware discovery and model pinning now come from neuron API and
models.toml catalogue respectively.
- config.rs: nodes -> neurons, add models_config path
- catalogue.rs: ModelProfile with pinned_on, ModelCatalogue
- poller.rs: poll neuron GET /models (ModelInfo format)
- router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint
- evictor.rs: call neuron POST /models/unload
- node.rs: remove vram_mb, pinned fields (come from discovery/catalogue)
- All 22 gateway tests updated to mock neuron API
- Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|||
|
6dc717ebcd
|
feat: add neuron daemon with GPU discovery and health endpoints
Replace cortex-agent stub with neuron (cortex-neuron binary). cortex-core additions: - discovery.rs: DeviceInfo, DiscoveryResponse, DeviceHealth, HealthResponse - harness.rs: Harness async trait, HarnessConfig, ModelSpec, ModelInfo neuron crate (crates/neuron/): - discovery.rs: nvidia-smi CSV parsing (pure functions) + system discovery via uname/nvidia-smi/nvcc - health.rs: cached GPU health polling every 5s - api.rs: GET /discovery and GET /health axum handlers - main.rs: CLI entrypoint with --port flag (default 9090) - harness stubs for mistralrs (Phase 8) and llamacpp (Phase 11) 12 new tests (9 unit + 3 integration), 35 total. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
|||
|
0da68833af
|
feat: scaffold cortex workspace
Rust reverse-proxy for multi-node mistral.rs inference clusters. Includes crate structure (cortex-core, cortex-gateway, cortex-agent, cortex-cli), config loading, OpenAI/Anthropic translation stubs, model routing, eviction, polling, and streaming proxy scaffolding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |