cortex

Author	SHA1	Message	Date
rob thijssen	60f5598542	build(neuron): bump cudarc fork to 63327a2 (idempotent abort + Comm Send+Sync) The fork's new commit makes `Comm: Send + Sync` (asserting NCCL's thread-safety invariant upstream) and makes `Comm::abort` idempotent via an `aborted` flag (so abort-then-Drop can't double-free), which is strictly better than the previous Drop-no-panic workaround, and the `abort()` signature is unchanged so the watchdog call site is unaffected. Because `Comm` is now `Send + Sync`, `Arc<Comm>` and the `SendComm` / `NcclState` wrappers auto-derive `Send`/`Sync`, which conflicts (E0119) with neuron's manual `unsafe impl`s. 
Remove the four now-redundant impls — the safety assertion lives upstream in cudarc where it belongs. The conflict is in cuda-gated code, so only the CUDA type-check catches it (non-cuda build + clippy + tests stay green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 16:33:14 +03:00
rob thijssen	99920dd322	feat(neuron): TP step watchdog aborts wedged collectives (#17 Stage 2) Make a hung NCCL collective recoverable instead of a permanent brick. Today a wedged collective hangs the in-process leader thread forever, and even Stage 1's recovery can't help: its unload's DropTp queues behind the stuck thread and hangs too. - Cache the leader's NCCL Comm handle async-side at init (new cuda-gated Job::GetLeaderComm → DeviceWorkerHandle::get_leader_comm → stored on WorkerPool.leader_comm). Fetched while the thread is responsive; a wedged thread can't service the fetch, which is why it's cached up front. - Wrap the leader forward in both generate_step and generate_step_with_images in tokio::time::timeout (default 120s, NEURON_TP_STEP_TIMEOUT_S). On expiry the watchdog calls Comm::abort() (ncclCommAbort) on the cached handle from the async thread (the one NCCL op sanctioned concurrently with an in-flight collective), which unblocks the leader thread, then fails the step WITHOUT draining (workers are wedged too; recovery's unload kills them). The error is a device fault → poison → Stage 1 auto-recovery, which now completes because the leader thread is responsive again. - Bumps the cudarc patch to dbc425a (adds the Drop-must-not-panic fix so the post-abort comm teardown during recovery doesn't double-abort-panic). 
Logs the whole sequence at ERROR with greppable `tp watchdog:` / `ncclCommAbort` markers so a real-world hang leaves a forensic trail — verification is by inspecting journals after real hangs, not a synthetic harness. cuda-gated → validated by the blackwell build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 14:15:29 +03:00
rob thijssen	c4f239ceb9	build(neuron): patch cudarc to expose Comm::abort/get_async_error (#17 Stage 2) #17 Stage 2 (TP hang-recovery) needs to call ncclCommAbort on a LIVE communicator from another thread, to unblock a collective wedged on a dead/hung peer so the ranks can resync. No cudarc release (incl. main) exposes this: the safe Comm only aborts in Drop, which can't fire while a stuck thread holds an Arc<Comm> clone. Pin neuron's cudarc 0.19.7 to a fork (grenade/cudarc @ nccl-comm-abort, rev 4dff0be) adding three thin methods (Comm::abort, get_async_error, and a raw comm() accessor), to be submitted upstream. The patch targets 0.19.x only; candle's transitive cudarc 0.17.8 stays on crates.io. Foundation only; the watchdog + abort + comm-rebuild that consume these land in follow-up commits (cuda-gated → validated by the blackwell build). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 13:49:59 +03:00
rob thijssen	e23d5011d0	feat(helexa-acp): scaffold ACP bridge with provider trait + OpenAI chat Adds a new workspace crate `helexa-acp` (binary, Apache-2.0) — the start of "the missing ACP binary" for multi-endpoint LLM setups mixing public APIs, private LAN deployments, and various wire formats. Today it speaks OpenAI /v1/chat/completions; the Provider trait is the seam that lets OpenAI Responses, Anthropic /v1/messages, and other wire formats slot in later without touching the agent loop. The crate is intentionally self-contained — no dependencies on the other workspace crates (cortex-core, cortex-gateway, neuron) — so a future migration to a dedicated GitHub repo is a Cargo.toml-only change. All deps come from crates.io. This commit lands: * `config.rs` — TOML config at $XDG_CONFIG_HOME/helexa-acp/config.toml with multi-endpoint support (each `[[endpoints]]` declares its name, base_url, wire_api, default_model, optional API key / api_key_env). Falls back to env-only single-endpoint config when no TOML exists (HELEXA_ACP_BASE_URL, HELEXA_ACP_MODEL, etc.). The `endpoint:model` selector syntax is validated and tested. * `provider/mod.rs` — `Provider` trait + provider-agnostic types (`CompletionRequest`, `CompletionEvent`, `Message`, `ToolCall`, `ToolSpec`, `Role`, `UsageStats`). Agent loop consumes these without knowing the wire format on the other side. * `provider/openai_chat.rs` — `OpenAIChatProvider` impl. Compatible with cortex, LM Studio, Ollama (compat mode), OpenRouter, OpenAI itself. Streams via reqwest + eventsource-stream + async-stream. Surfaces text deltas, reasoning deltas (for models that emit `reasoning_content`), tool-call lifecycle (start, args-delta, completion), usage, finish reason. Cancellation-token aware. * `main.rs` — tokio + stderr-only tracing-subscriber + Stdio transport. Builds a provider per configured endpoint at startup, surfacing config mistakes before the editor even initializes. 
Currently responds to `initialize`; everything else stubs to `not implemented yet` until the agent loop lands in the next commit. 12 unit tests pass — encoder shape, decoder shape (text-only, tool-call progressive, cancellation, malformed-chunk recovery), config parsing (multi-endpoint TOML, env fallback, validation). The `#![allow(dead_code)]` on `provider/mod.rs` is temporary — the agent loop in the next commit reads every field. It's noted in the module-level docstring so the next reader knows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 08:13:47 +03:00
rob thijssen	3cccc2c56b	refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness Stage 1 of the candle-native pivot. Replaces the external-process harness model (mistralrs over HTTP, llamacpp placeholder) with an in-process Harness trait whose sole implementation is candle. The trait keeps its shape so future engines slot in additively, but start/stop default to no-ops and HarnessConfig drops endpoint and systemd_unit since no harness needs external supervision. Behaviour is unchanged on the wire: load_model returns a "not implemented yet (Stage 2)" error and list_models is empty. The gateway-side proxy, poller, and router are untouched. CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are marked superseded; the staged plan lives in ~/.claude/plans/create-a-more-aggressive-calm-naur.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:53:04 +03:00
Gitea Actions	b9d8e30058	chore: bump version to 0.1.16	2026-04-16 15:04:21 +00:00
Gitea Actions	9bf987888c	chore: bump version to 0.1.14	2026-04-16 16:57:24 +03:00
Gitea Actions	357f858a29	chore: bump version to 0.1.12	2026-04-16 15:47:21 +03:00
Gitea Actions	7ece281617	chore: bump version to 0.1.10	2026-04-16 15:06:18 +03:00
Gitea Actions	9fa51ad874	chore: bump version to 0.1.8	2026-04-16 10:56:07 +00:00
Gitea Actions	2ce1060cb8	chore: bump version to 0.1.7	2026-04-16 13:25:34 +03:00
Gitea Actions	52c8b4c983	chore: bump version to 0.1.5	2026-04-16 13:01:42 +03:00
Gitea Actions	f161412f91	chore: bump version to 0.1.3	2026-04-16 11:41:11 +03:00
Gitea Actions	7c60af3464	chore: bump version to 0.1.2	2026-04-16 11:03:29 +03:00
rob thijssen	6dc717ebcd	feat: add neuron daemon with GPU discovery and health endpoints Replace cortex-agent stub with neuron (cortex-neuron binary). cortex-core additions: - discovery.rs: DeviceInfo, DiscoveryResponse, DeviceHealth, HealthResponse - harness.rs: Harness async trait, HarnessConfig, ModelSpec, ModelInfo neuron crate (crates/neuron/): - discovery.rs: nvidia-smi CSV parsing (pure functions) + system discovery via uname/nvidia-smi/nvcc - health.rs: cached GPU health polling every 5s - api.rs: GET /discovery and GET /health axum handlers - main.rs: CLI entrypoint with --port flag (default 9090) - harness stubs for mistralrs (Phase 8) and llamacpp (Phase 11) 12 new tests (9 unit + 3 integration), 35 total. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:23:42 +03:00
rob thijssen	6bb3004cfc	ci: add Gitea CI, RPM spec, license, and repo hygiene - Add .gitea/workflows/ci.yml with fmt/clippy/test on all branches and SRPM build + COPR publish on version tags - Add cortex.spec for Fedora RPM packaging - Add GPL-3.0-or-later LICENSE file - Add cortex.example.toml with generic hostnames; gitignore cortex.toml - Scrub infrastructure-specific hostnames from README.md, CLAUDE.md, and doc comments - Fix unused imports and clippy warnings to pass -D warnings - Fix missing deps (bytes, reqwest, serde_json) exposed during build - Run cargo fmt across workspace - Update SPDX license identifier to GPL-3.0-or-later Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 18:24:04 +03:00
rob thijssen	0da68833af	feat: scaffold cortex workspace Rust reverse-proxy for multi-node mistral.rs inference clusters. Includes crate structure (cortex-core, cortex-gateway, cortex-agent, cortex-cli), config loading, OpenAI/Anthropic translation stubs, model routing, eviction, polling, and streaming proxy scaffolding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 18:13:30 +03:00
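
Commit 99920dd322 above describes a watchdog that bounds the leader step with a timeout and, on expiry, aborts the communicator so the stuck thread becomes responsive again. The sketch below illustrates only that control flow and is not the neuron code. It swaps `tokio::time::timeout` for the standard library's `mpsc::Receiver::recv_timeout` so it runs without external crates, and it uses a shared `AtomicBool` as a stand-in for `Comm::abort()`. The names `run_step_with_watchdog` and `StepOutcome` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical result type for one watched step.
#[derive(Debug, PartialEq)]
enum StepOutcome {
    Completed(u32),
    TimedOut,
}

/// Runs a simulated step on a worker thread. If no result arrives within
/// `limit`, the watchdog sets the abort flag (a stand-in for `Comm::abort()`)
/// and reports `TimedOut`.
fn run_step_with_watchdog(step_delay: Duration, limit: Duration) -> StepOutcome {
    let abort = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel();

    let worker_abort = Arc::clone(&abort);
    let worker = thread::spawn(move || {
        // Simulated collective: keeps waiting until the delay has elapsed
        // or the watchdog requests an abort.
        let tick = Duration::from_millis(5);
        let mut waited = Duration::ZERO;
        while waited < step_delay {
            if worker_abort.load(Ordering::Acquire) {
                return; // unblocked by the abort; no result is sent
            }
            thread::sleep(tick);
            waited += tick;
        }
        let _ = tx.send(42u32);
    });

    let outcome = match rx.recv_timeout(limit) {
        Ok(value) => StepOutcome::Completed(value),
        Err(_) => {
            // Timeout expired: abort from the watching thread, then fail the step.
            abort.store(true, Ordering::Release);
            StepOutcome::TimedOut
        }
    };

    // Joining here only demonstrates that the worker is responsive again;
    // the commit message says the real step fails without draining.
    worker.join().expect("worker thread panicked");
    outcome
}

fn main() {
    let fast = run_step_with_watchdog(Duration::from_millis(10), Duration::from_secs(2));
    let hung = run_step_with_watchdog(Duration::from_secs(5), Duration::from_millis(50));
    println!("fast step: {:?}, hung step: {:?}", fast, hung);
}
```

The important property is that the abort is issued from a thread other than the stuck one, which is why the commit caches the communicator handle before any hang can occur.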

17 Commits
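
Commit 60f5598542 mentions making `Comm::abort` idempotent via an `aborted` flag so that abort-then-Drop cannot double-free. A minimal sketch of that pattern follows, assuming an atomic flag and a hypothetical `FakeComm` type. It is not cudarc's actual `Comm`, and the release counter exists only so the behaviour can be observed.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a communicator handle (not cudarc's `Comm`).
struct FakeComm {
    aborted: AtomicBool,
    /// Counts how many times the underlying resource was released.
    releases: Arc<AtomicUsize>,
}

impl FakeComm {
    fn new(releases: Arc<AtomicUsize>) -> Self {
        FakeComm {
            aborted: AtomicBool::new(false),
            releases,
        }
    }

    /// Releases the underlying resource at most once. Takes `&self` so it can
    /// be called through a shared reference such as an `Arc`.
    /// Returns `true` only for the call that actually performed the release.
    fn abort(&self) -> bool {
        // `swap` returns the previous value, so only the first caller sees `false`.
        if self.aborted.swap(true, Ordering::AcqRel) {
            return false;
        }
        self.releases.fetch_add(1, Ordering::SeqCst);
        true
    }
}

impl Drop for FakeComm {
    fn drop(&mut self) {
        // Drop goes through the same guarded path, so a handle that was
        // already aborted is not released a second time.
        self.abort();
    }
}

/// Aborts twice, drops the handle, and reports how many releases happened.
fn release_count_after_abort_then_drop() -> usize {
    let releases = Arc::new(AtomicUsize::new(0));
    let comm = FakeComm::new(Arc::clone(&releases));
    comm.abort();
    comm.abort();
    drop(comm);
    releases.load(Ordering::SeqCst)
}

fn main() {
    println!("releases: {}", release_count_after_abort_then_drop());
}
```

Routing both the explicit abort and `Drop` through one flag check keeps the `abort()` signature unchanged for callers, which matches the commit's note that the watchdog call site was unaffected.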