helexa

Author	SHA1	Message	Date
rob thijssen	d4e1b05956	feat(neuron,cortex-core): source-aware loader (scheme:org/name)

CI: all checks were successful.

Phase 1 of plan-source-aware-loader-preflight. Makes neuron's loader treat `huggingface:org/name` and `helexa:org/name` as first-class, distinct sources with per-source endpoint and cache, while staying backwards-compatible with bare `org/name` ids. Zero behavior change for existing operator configs.

Motivation: helexa is adding an EU-hosted registry (`registry.helexa.ai`) alongside HF. Both speak the HF-compatible wire format, but the bytes, jurisdiction, trust root, and cache namespace are distinct. The loader needs to disambiguate which registry serves a given model id, and to keep their caches from colliding on disk when both happen to host the same `org/name`.

What lands:
- `cortex-core::source`: new module. `ModelSourceId { scheme, org, name }` with `FromStr` accepting both `scheme:org/name` and bare `org/name`. `Display` round-trips. `repo_path()` emits the `org/name` half for the hf-hub `Api::model(...)` call regardless of which scheme/endpoint we're hitting. Rejects malformed input with typed `ParseError` variants (empty scheme, missing slash, scheme with `/`, name with `:`, etc.).
- `neuron::config::CandleHarnessConfig` gains `default_source: Option<String>` and `sources: HashMap<String, SourceConfig>`. `SourceConfig` mirrors what `hf_hub::ApiBuilder` consumes: endpoint URL, optional `auth_env` (env var name read at startup so secrets stay out of TOML), and optional `cache_dir`. Defaults synthesise a `huggingface` entry pointing at `https://huggingface.co` with the legacy `hf_cache` field as its `cache_dir`, so existing configs that only set `hf_cache` keep working unchanged.
- `CandleHarness::new(bind_url, &CandleHarnessConfig)` replaces `CandleHarness::new(bind_url, hf_cache)`. It resolves every configured source's auth env var and cache dir up front, so `hf_api_for(scheme)` is a pure HashMap lookup on the hot load path. Only the `huggingface` scheme gets the legacy `HF_HUB_CACHE`/`HF_HOME` env-var fallback chain; other schemes resolve to whatever the operator typed.
- `hf_api()` -> `hf_api_for(scheme)`. Builds an `hf_hub::Api` with the source's endpoint, cache_dir, and auth token. When an unknown scheme is requested, it errors with a useful message naming the configured schemes.
- `CandleHarness::load_model` parses `spec.model_id` into a `ModelSourceId`, substitutes `default_source` for bare ids, and threads the parsed source through `preflight`, `resolve_files`, `resolve_dense_files`, `load_arch_gguf`, `load_arch_dense`, and `load_tp`. The hf-hub `Api::model()` call now uses `source_id.repo_path()` so registry calls hit the right URL shape regardless of scheme.
- `preflight()` signature gains a `&ModelSourceId` parameter (it's the canonical id for log lines and error display); `RepoFetchFailed.model_id` and similar fields now carry the scheme-qualified form, so operator-visible errors echo exactly what was configured.
- `neuron.example.toml` documents the new `[harness.candle.sources.*]` table with commented-out examples for `huggingface` (explicit override) and `helexa`.

Tests:
- 13 new unit tests in `cortex-core::source` covering parse/display round-trip, default-scheme substitution semantics, and every `ParseError` variant.
- 6 new unit tests in `neuron::config` covering the `effective_sources` synth (legacy `hf_cache` carry-through, explicit override preservation, helexa-alongside-huggingface) and the `effective_default_source` fallback.
- 2 new unit tests in `harness::candle::tests` covering multi-scheme `hf_api_for` routing, including the "unknown scheme" error path naming configured schemes.
- Preflight integration tests updated to construct `ModelSourceId` and assert against the scheme-qualified error form.

CI gate: `cargo fmt --check`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace` (all 24 test groups ok, zero failures).

Out of scope (Phase 3):
- Cortex catalogue `source` field: independent of Phase 1+2; ships when the registry comes online.
- `helexa` source endpoint itself: separate project; this PR adds only the client-side rails.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 13:42:11 +03:00
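A minimal Rust sketch of the `scheme:org/name` parsing rules described in d4e1b05956. It is not the `cortex-core::source` code: representing a bare id as `scheme: None`, the `with_default_scheme` helper, and the `ParseError` variant names are illustrative assumptions, since the commit lists the rejected cases but not the actual definitions.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical mirror of `ModelSourceId`. `scheme` is `None` for a bare
/// `org/name` id (assumption: the commit only says bare ids get
/// `default_source` substituted at load time).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSourceId {
    pub scheme: Option<String>,
    pub org: String,
    pub name: String,
}

/// Illustrative error variants covering the cases the commit names.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseError {
    EmptyScheme,
    SchemeContainsSlash,
    MissingSlash,
    TooManySlashes,
    EmptyOrg,
    EmptyName,
    ColonInRepo,
}

impl ModelSourceId {
    /// The `org/name` half, i.e. what hf-hub's `Api::model(...)` receives.
    pub fn repo_path(&self) -> String {
        format!("{}/{}", self.org, self.name)
    }

    /// Fill in a scheme for bare ids; an explicit scheme is left untouched.
    pub fn with_default_scheme(mut self, default: &str) -> Self {
        if self.scheme.is_none() {
            self.scheme = Some(default.to_string());
        }
        self
    }
}

/// Split `org/name`, rejecting colons, extra slashes and empty halves.
fn parse_repo(repo: &str) -> Result<(String, String), ParseError> {
    if repo.contains(':') {
        return Err(ParseError::ColonInRepo);
    }
    let (org, name) = repo.split_once('/').ok_or(ParseError::MissingSlash)?;
    if name.contains('/') {
        return Err(ParseError::TooManySlashes);
    }
    if org.is_empty() {
        return Err(ParseError::EmptyOrg);
    }
    if name.is_empty() {
        return Err(ParseError::EmptyName);
    }
    Ok((org.to_string(), name.to_string()))
}

impl FromStr for ModelSourceId {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Everything before the first ':' is treated as the scheme.
        let (scheme, repo) = match s.split_once(':') {
            Some((scheme, rest)) => {
                if scheme.is_empty() {
                    return Err(ParseError::EmptyScheme);
                }
                if scheme.contains('/') {
                    return Err(ParseError::SchemeContainsSlash);
                }
                (Some(scheme.to_string()), rest)
            }
            None => (None, s),
        };
        let (org, name) = parse_repo(repo)?;
        Ok(ModelSourceId { scheme, org, name })
    }
}

impl fmt::Display for ModelSourceId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.scheme {
            Some(scheme) => write!(f, "{}:{}/{}", scheme, self.org, self.name),
            None => write!(f, "{}/{}", self.org, self.name),
        }
    }
}
```

Splitting on the first `:` before looking for `/` is one way (assumed here) to tell a scheme prefix apart from a bare id; under that rule `"Qwen/Qwen3-8B".parse::<ModelSourceId>()` followed by `with_default_scheme("huggingface")` displays as `huggingface:Qwen/Qwen3-8B`, while an input such as `a/b:c/d` is reported as a scheme containing `/`.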
rob thijssen	957f704efa	feat(neuron): OpenAI Responses API + ci cuda-check runner label

CI: some checks failed (CUDA type-check failed after 11s; the neuron-ampere and neuron-ada builds and the downstream RPM packaging and publish jobs were cancelled or blocked).

Step 2 of the Responses rollout: a native `/v1/responses` endpoint on neuron that consumes the same InferenceEvent stream as `/v1/chat/completions` but emits it as the Responses API's named SSE event family. No gateway-side translation.

## Surface

- `cortex-core::responses` envelope types: `ResponsesRequest`, `ResponsesInput` (text | items), `ResponsesInputItem` (message | function_call | function_call_output | reasoning), `ResponsesContentPart` (input_text | input_image | output_text), `ResponsesResponse`, `ResponsesOutputItem`, `ResponsesUsage`. Plus an `events::*` constant module so the projector and the wire shape stay in sync without string typos.
- `neuron::wire::openai_responses`:
  - `request_to_chat(req)` flattens Responses input + instructions into a `ChatCompletionRequest` the candle harness already understands. Text-only Parts collapse to a string; mixed text+image Parts go to chat's content-array shape; reasoning items drop; function_call / function_call_output round-trip via tool_calls / tool_call_id metadata so the surface is consistent for the day the harness emits tool calls.
  - `project_responses_stream(rx, meta)` reads InferenceEvents and emits the eight named events that compose a Responses stream: response.created → output_item.added → content_part.added → output_text.delta×N → output_text.done → content_part.done → output_item.done → response.completed. It synthesises start frames if the producer skips Start (poisoned model, early disconnect) so the stream stays coherent.
  - `build_response(meta, text, reason, usage)` for the non-streaming path.
- `CandleHarness::inference_stream(req)` extracted from `chat_completion_stream`, returning a typed `InferenceStream` (event receiver + id/created/model_id metadata). Both `chat_completion_stream` and the new `responses_stream` are now thin wrappers that pick their wire projection. The TP path got the same treatment (`chat_completion_tp_stream` → `inference_tp_stream`).
- `POST /v1/responses` route on neuron. Non-streaming returns one buffered `ResponsesResponse`; streaming returns axum SSE with both an event name and JSON data per frame (Responses, unlike chat completions, uses named `event:` lines). The `inference_error_response` helper was hoisted out so the chat and responses handlers share the InferenceError → HTTP mapping.

## CI

Also bundles the `cuda-check` runner-label fix from feedback on commit `1859777`: `runs-on: rpm` doesn't ship the CUDA toolkit, so cudarc's nvcc-version build script blew up. Switched to `runs-on: cuda-13.0` per the existing labels.

## Scope cuts (documented in the modules)

- `previous_response_id` rejected at translate time with 400 (`code: chained_conversation_not_supported`): stateful chained conversations need a persistence layer we haven't built.
- Reasoning items dropped (no Qwen3 `<think>` routing yet).
- Single output item per response (one `"message"` carrying text); `function_call` items reserved but not synthesised.
- Streaming events cover the core set; `response.in_progress` and the web_search / image_generation event families are out of scope.

22 new tests: 5 in cortex-core (envelope round-trips), 13 in neuron::wire (request translator + projector + non-streaming builder), 4 in neuron's tests/api.rs (route surface: 503 when no candle, 400 on previous_response_id, 404 on missing model for both stream and non-stream).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 11:13:44 +03:00
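The stream projection described in 957f704efa can be sketched as a small state machine. This is a simplified, synchronous illustration, not neuron's `project_responses_stream`: the `InferenceEvent` variants, the `Frame` type and the plain-string payloads are assumptions (the real frames carry JSON and the real input is an async receiver). Event names use OpenAI's full `response.`-prefixed form.

```rust
/// Hypothetical stand-in for neuron's InferenceEvent.
#[derive(Debug, Clone, PartialEq)]
pub enum InferenceEvent {
    Start,
    Token(String),
    Done,
}

/// Named SSE event types, kept as constants to avoid string typos.
pub mod events {
    pub const CREATED: &str = "response.created";
    pub const OUTPUT_ITEM_ADDED: &str = "response.output_item.added";
    pub const CONTENT_PART_ADDED: &str = "response.content_part.added";
    pub const OUTPUT_TEXT_DELTA: &str = "response.output_text.delta";
    pub const OUTPUT_TEXT_DONE: &str = "response.output_text.done";
    pub const CONTENT_PART_DONE: &str = "response.content_part.done";
    pub const OUTPUT_ITEM_DONE: &str = "response.output_item.done";
    pub const COMPLETED: &str = "response.completed";
}

/// One SSE frame: the `event:` name plus a simplified data payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub event: &'static str,
    pub data: String,
}

fn frame(event: &'static str, data: impl Into<String>) -> Frame {
    Frame { event, data: data.into() }
}

/// The three frames that open a single-message response.
fn open_frames(out: &mut Vec<Frame>) {
    out.push(frame(events::CREATED, ""));
    out.push(frame(events::OUTPUT_ITEM_ADDED, ""));
    out.push(frame(events::CONTENT_PART_ADDED, ""));
}

/// The four closing frames; output_text.done carries the full text.
fn close_frames(out: &mut Vec<Frame>, text: &str) {
    out.push(frame(events::OUTPUT_TEXT_DONE, text));
    out.push(frame(events::CONTENT_PART_DONE, ""));
    out.push(frame(events::OUTPUT_ITEM_DONE, ""));
    out.push(frame(events::COMPLETED, ""));
}

/// Project inference events onto the Responses event sequence.
/// Opening frames are synthesised if the producer never sends Start.
pub fn project(input: impl IntoIterator<Item = InferenceEvent>) -> Vec<Frame> {
    let mut out = Vec::new();
    let mut started = false;
    let mut text = String::new();

    for ev in input {
        match ev {
            InferenceEvent::Start => {
                if !started {
                    open_frames(&mut out);
                    started = true;
                }
            }
            InferenceEvent::Token(tok) => {
                if !started {
                    open_frames(&mut out);
                    started = true;
                }
                text.push_str(&tok);
                out.push(frame(events::OUTPUT_TEXT_DELTA, tok));
            }
            InferenceEvent::Done => break,
        }
    }
    if !started {
        open_frames(&mut out);
    }
    close_frames(&mut out, &text);
    out
}
```

A missing `Start` does not break the sequence: the three opening frames are emitted on the first token, or just before closing if no token arrives at all. The sketch always ends with `response.completed`; how neuron reports an early disconnect is not stated in the commit.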
rob thijssen	e42e8ee81f	refactor: cortex talks to neurons instead of mistral.rs directly

CI: all checks were successful.

Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint. Hardware discovery and model pinning now come from the neuron API and the models.toml catalogue, respectively.

- config.rs: nodes -> neurons, add models_config path
- catalogue.rs: ModelProfile with pinned_on, ModelCatalogue
- poller.rs: poll neuron GET /models (ModelInfo format)
- router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint
- evictor.rs: call neuron POST /models/unload
- node.rs: remove vram_mb, pinned fields (come from discovery/catalogue)
- All 22 gateway tests updated to mock the neuron API
- Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:42:52 +03:00
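With discovery moved to neurons, the router first has to pick which neuron to ask for a model's endpoint. A hypothetical sketch of that selection step: the preference order (neurons whose polled `GET /models` snapshot lists the model, then the catalogue's `pinned_on` neurons) and the reduced `ModelProfile` shape are assumptions, not behaviour router.rs is confirmed to have.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced ModelProfile: only the catalogue's pinning list.
#[derive(Debug, Clone, Default)]
pub struct ModelProfile {
    pub pinned_on: Vec<String>,
}

/// Neurons worth asking for `model_id`'s endpoint, best first.
/// Assumed policy: neurons whose polled model list already contains the
/// model come first (in name order), then catalogue `pinned_on` neurons
/// that are not already in the list.
pub fn candidate_neurons(
    model_id: &str,
    loaded: &BTreeMap<String, Vec<String>>,
    catalogue: &BTreeMap<String, ModelProfile>,
) -> Vec<String> {
    let mut out: Vec<String> = loaded
        .iter()
        .filter(|(_, models)| models.iter().any(|m| m.as_str() == model_id))
        .map(|(neuron, _)| neuron.clone())
        .collect();

    if let Some(profile) = catalogue.get(model_id) {
        for neuron in &profile.pinned_on {
            if !out.contains(neuron) {
                out.push(neuron.clone());
            }
        }
    }
    out
}
```

Using `BTreeMap` keeps the ordering deterministic across polls; a caller could then try `/models/{id}/endpoint` on each candidate in turn.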
rob thijssen	6dc717ebcd	feat: add neuron daemon with GPU discovery and health endpoints

CI: all checks were successful.

Replace cortex-agent stub with neuron (cortex-neuron binary).

cortex-core additions:
- discovery.rs: DeviceInfo, DiscoveryResponse, DeviceHealth, HealthResponse
- harness.rs: Harness async trait, HarnessConfig, ModelSpec, ModelInfo

neuron crate (crates/neuron/):
- discovery.rs: nvidia-smi CSV parsing (pure functions) + system discovery via uname/nvidia-smi/nvcc
- health.rs: cached GPU health polling every 5s
- api.rs: GET /discovery and GET /health axum handlers
- main.rs: CLI entrypoint with --port flag (default 9090)
- harness stubs for mistralrs (Phase 8) and llamacpp (Phase 11)

12 new tests (9 unit + 3 integration), 35 total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:23:42 +03:00
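Keeping the nvidia-smi CSV parsing as pure functions makes it easy to unit-test against canned output. A minimal sketch, assuming the query `nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader,nounits` (the columns neuron actually queries are not listed in the commit) and a hypothetical four-field `DeviceInfo`:

```rust
/// Hypothetical subset of cortex-core's DeviceInfo; field names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DeviceInfo {
    pub index: u32,
    pub name: String,
    pub memory_total_mib: u64,
    pub memory_used_mib: u64,
}

/// Parse one row such as `0, NVIDIA GeForce RTX 4090, 24564, 1024`.
/// Returns None for rows with the wrong column count or non-numeric values.
pub fn parse_gpu_line(line: &str) -> Option<DeviceInfo> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 4 {
        return None;
    }
    Some(DeviceInfo {
        index: fields[0].parse().ok()?,
        name: fields[1].to_string(),
        memory_total_mib: fields[2].parse().ok()?,
        memory_used_mib: fields[3].parse().ok()?,
    })
}

/// Parse the whole stdout, skipping blank or malformed rows.
pub fn parse_gpu_csv(stdout: &str) -> Vec<DeviceInfo> {
    stdout.lines().filter_map(parse_gpu_line).collect()
}
```

Rows with non-numeric readings such as `[N/A]` are skipped rather than failing the whole discovery; note that this naive comma split would also drop a GPU whose name contains a comma.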
rob thijssen	0da68833af	feat: scaffold cortex workspace

Rust reverse proxy for multi-node mistral.rs inference clusters. Includes crate structure (cortex-core, cortex-gateway, cortex-agent, cortex-cli), config loading, OpenAI/Anthropic translation stubs, model routing, eviction, polling, and streaming proxy scaffolding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 18:13:30 +03:00

5 Commits