cortex

Author	SHA1	Message	Date
rob thijssen	1b0e36c119	fix(neuron): cover TpForwardLogitsWithImages in drain_poisoned match All checks were successful CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 37s Details CI / Format (push) Successful in 37s Details CI / Clippy (push) Successful in 2m41s Details build-prerelease / Build cortex binary (push) Successful in 4m18s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m48s Details build-prerelease / Package cortex RPM (push) Successful in 1m32s Details CI / Test (push) Successful in 6m20s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-ampere (push) Successful in 8m26s Details build-prerelease / Build neuron-ada (push) Successful in 5m21s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m56s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m5s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m0s Details The CUDA type-check caught a non-exhaustive match: drain_poisoned() must reply an error to every Job variant's reply channel, including the new cuda-gated TpForwardLogitsWithImages. The non-cuda build couldn't see it — the variant is #[cfg(feature = "cuda")], so the match is exhaustive without it on CPU. Refs TP-vision plan Stage 2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 15:26:46 +03:00
rob thijssen	ed2d09864e	feat(neuron): TP-vision Stage 3 — wire TP chat + stream vision prefill Some checks failed CI / Format (push) Successful in 30s Details CI / Clippy (push) Successful in 2m51s Details CI / Test (push) Successful in 5m52s Details CI / CUDA type-check (push) Failing after 50s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details End-to-end TP-vision: an image request to a TP-loaded Qwen3.6-27B now conditions on the image across both ranks. - TpLoadedModel carries has_vision / image_token_id / lm_tokens_per_image, populated at load via the shared VisionMeta::from_config_path (same config.json the shards loaded from; Stage 1 materialises the replicated tower on every rank). - LoadedHandle::capabilities() now advertises "vision" for TP loads with a tower (cortex-gateway already unions this into /v1/models via C3). - The TP rejection guards (chat_completion_tp + inference_tp_stream) are now conditional on !has_vision — text-only TP models still 400 cleanly, vision-capable ones fall through. - chat_completion_tp_inner and the streaming orchestration task detect images (request_has_images), expand <\|image_pad\|> to the per-image patch count, and run a single-shot generate_step_with_images prefill (every rank encodes + splices its replicated tower) before the unchanged decode loop. Text requests keep chunked_prefill_tp. - extract_image_data_uris ships the source data URIs to every rank for identical per-rank preprocessing. prompt_tokens now reflects the patch expansion, so usage accounting and KV offsets match the single-GPU baseline. TP entry points are cuda-gated (validated by CI's CUDA type-check); capabilities() + extract_image_data_uris + VisionMeta reuse compile on the non-cuda build. Full workspace test green. Refs TP-vision plan Stage 3. Implements #12. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 15:14:44 +03:00
rob thijssen	4994b94c84	feat(neuron): TP-vision Stage 2 — per-rank image RPC + worker plumbing Carry image content through the TP forward path so every rank encodes and splices locally (replicated tower, no embedding broadcast). - rpc.rs: new WorkerRequest::GenerateStepWithImages carrying the source image data URIs + image_token_id for the single-shot vision prefill; worker still replies GenerateStepOk. Round-trip test added. - tp_qwen3_5.rs: TpQwen3_5ForCausalLM::forward_with_images — encode each preprocessed image through the rank's replicated tower, cat, splice, forward. Shared by leader and worker so every rank runs identical work. - tp/mod.rs: TpLeaderModel::forward_with_images and WorkerPool::generate_step_with_images (mirrors generate_step: fan out GenerateStepWithImages to subprocess ranks, run the leader's image forward on its device worker thread, drain, combine). - worker.rs: WorkerModel::forward_with_images + handle_generate_step_with_images — each subprocess rank preprocesses the same data URIs via the shared deterministic preprocess_data_uri, encodes, splices, forwards. - device_worker: Job::TpForwardLogitsWithImages + tp_forward_logits_with_images dispatch handler + DeviceWorkerHandle::tp_forward_logits_with_images. Determinism: every rank runs the same preprocess on the same source URIs through the same replicated tower, so the spliced hidden state matches across ranks — preserving the replicated-hidden-state invariant the row-parallel AllReduce relies on, with no NCCL broadcast. No caller yet — Stage 3 wires the TP chat/stream entry points to invoke generate_step_with_images for image prefill. cuda-gated plumbing covered by CI's CUDA type-check; rpc/route/forward_with_images compile on the non-cuda build. Refs TP-vision plan Stage 2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 15:08:08 +03:00
rob thijssen	9a24b05866	feat(neuron): TP-vision Stage 1 — replicated vision tower on the TP model Load the full, unsharded model.visual.* vision tower on every TP rank (leader + each subprocess worker mmaps the same local safetensors) when config.vision_config is present. VisionTower::load already takes a ShardedVarBuilder whose plain .get() returns the full replicated tensor, so the tower loads identically regardless of world_size — no sharding, no NCCL broadcast. - TpQwen3_5ForCausalLM gains vision: Option<VisionTower> + image_token_id, plus has_vision/image_token_id/encode_image/forward_with_vision, mirroring the single-GPU Qwen3_5ForCausalLM wrapper. - TpQwen3_5Model::forward_with_vision mirrors the single-GPU forward_inner splice: embed locally, replace rows at image_token_id positions, run the sharded decoder stack. Because every rank encodes the same pixels through its replicated tower, the spliced input embeddings are identical across ranks — preserving the TP replicated-hidden-state invariant the row-parallel AllReduce relies on. - splice_runs is now pub(crate) and shared with the TP model. No caller yet — Stage 2 wires the RPC/worker path that invokes encode_image + forward_with_vision per rank. Most of this compiles on the non-cuda build (only the cuda load variant's tower line is gated); CI's CUDA type-check covers the rest. Refs TP-vision plan Stage 1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 15:00:05 +03:00
rob thijssen	7bb033b4ed	chore: untrack stray .claude/scheduled_tasks.lock and gitignore .claude/ All checks were successful CI / CUDA type-check (push) Successful in 32s Details CI / Format (push) Successful in 30s Details build-prerelease / Resolve version stamps (push) Successful in 30s Details CI / Clippy (push) Successful in 2m45s Details build-prerelease / Build cortex binary (push) Successful in 4m28s Details CI / Test (push) Successful in 6m6s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m11s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m28s Details build-prerelease / Build neuron-ampere (push) Successful in 8m1s Details build-prerelease / Build neuron-ada (push) Successful in 8m9s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m54s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m54s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m52s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s Details A runtime scheduler lock was accidentally swept into the previous commit by `git add -A`. Remove it from tracking (file stays on disk) and ignore the whole `.claude/` dir so local agent runtime state never lands in the repo again. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 14:55:05 +03:00
rob thijssen	f8c0da0ebf	fix(neuron): TP-vision Stage 0 — reject image requests on the TP path Some checks failed build-prerelease / Resolve version stamps (push) Waiting to run Details CI / Format (push) Waiting to run Details CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Build cortex binary (push) Has been cancelled Details build-prerelease / Build neuron-blackwell (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details CI / Clippy (push) Has been cancelled Details CI / Test (push) Has been cancelled Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details The TP inference path has no vision tower, and the TP dispatch in chat_completion / inference_stream returns before the VisionUnsupported guard runs — so an image request to a TP-loaded model (e.g. beast's tp=2 Qwen3.6-27B) was silently dropped and answered from text alone, the exact issue-#3 confident-hallucination pattern Stage C killed for single-GPU. Add the request_has_images → VisionUnsupported guard to both chat_completion_tp and inference_tp_stream, before prefill / before the SSE stream opens, so beast returns a clean 400 vision_unsupported. The guard is unconditional for now (TP has no tower); Stage 3 makes it conditional on the TP model's has_vision once real TP-vision lands. Detection is covered by the existing request_has_images unit test; the guard itself is cuda-gated (validated by CI's CUDA type-check). Refs TP-vision plan Stage 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 14:53:56 +03:00
rob thijssen	dd592d918d	test(neuron): C2 — guard Responses→chat image translation contract All checks were successful CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 39s Details CI / Format (push) Successful in 44s Details CI / Clippy (push) Successful in 2m51s Details build-prerelease / Build cortex binary (push) Successful in 4m42s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m52s Details CI / Test (push) Successful in 6m16s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-ampere (push) Successful in 8m12s Details build-prerelease / Package cortex RPM (push) Successful in 1m26s Details build-prerelease / Build neuron-ada (push) Successful in 5m34s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m59s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m2s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s Details The Responses request translator already emits the chat `image_url` Parts array Stage B5's vision path consumes, and the non-streaming (`chat_completion`) and streaming (`responses_stream` → `inference_stream`, Stage C1) Responses paths both route image content to the vision-aware prefill — so vision works end-to-end through `/v1/responses` with no translator change required. Add a multi-image test asserting order preservation and that the `detail` hint is tolerated (and dropped, since chat image_url has no analogue), locking the translator's output to the exact `image_url.url` shape `extract_images_from_request` walks. Closes part of #16 (Stage C2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 13:57:43 +03:00
rob thijssen	766c20ba47	feat(neuron): C1 — streaming SSE chat completion with vision The streaming worker path now splices image embeddings on prefill, closing the silent text-only degrade for `stream=true` image requests. `inference_stream` gains the same vision-routing block as the non-streaming `chat_completion`: detect `image_url` content, reject it against text-only models with `VisionUnsupported` (before any SSE frame is sent), preprocess each image and expand its `<\|image_pad\|>` sentinel to the per-image patch count, then carry the payload through dispatch. Rather than duplicate the 75-line `route_token!` reasoning/tool-call state machine into a sibling streamer, `stream_inference_via_worker` takes an `Option<(Vec<ImageInput>, u32)>`: when `Some`, prefill is a single-shot `forward_logits_with_images` splice; when `None`, the original chunked text-only prefill. Image embeddings are prefill-only, so every decode step stays on the plain `forward_logits` path and the shared decode loop is untouched. This keeps exactly one copy of the tool-call/reasoning logic to maintain. The Responses API streaming path (`responses_stream`) inherits vision for free since it drives the same `inference_stream`. Unit test covers `request_has_images` (the shared routing gate); the real-weights SSE smoke is the manual curl on beast (cuda-integration). Closes part of #16 (Stage C1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 13:57:02 +03:00
rob thijssen	4972c7d1e7	feat(cortex-gateway): C3 — propagate vision capabilities through /v1/models ModelEntry and CortexModelEntry gain a `capabilities: Vec<String>` field (serde-default for back-compat). The poller copies it verbatim from each neuron's ModelInfo.capabilities; list_models computes the union across every node where a model is loaded so a checkpoint loaded text-only on one neuron and text+vision on another reports both to the fleet. Catalogue-only and mid-prewarm entries default to empty until the catalogue gains a capabilities declaration. Aliases inherit their target's capability union. New gateway test mocks two nodes with differing capability arrays and asserts the unioned /v1/models response. Closes part of #16 (Stage C3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 13:49:54 +03:00
rob thijssen	a26bb9f04b	feat(deploy): capture service startup journal after each restart After both `Start cortex.service` and `Start neuron.service`, sleep 10s and run `journalctl --unit <unit> -I --no-pager` to record the latest invocation's log in the workflow output. Step is guarded by `if: always()` so a failed start still leaves a usable trace. infra-setup.sh now adds gitea_ci to the systemd-journal group during user provisioning, so `journalctl` works without a sudoers entry. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-02 16:48:56 +03:00
rob thijssen	ea1fdf8aa6	chore(deploy): drop deploy.sh and manifest.yml now that workflow runs First end-to-end run of the deploy workflow succeeded (gitea run #289), so the operator-run rolling-deploy script and its YAML manifest are no longer the source of truth — fleet topology lives in .gitea/workflows/deploy.yml and per-host config in script/infra-setup.sh. Per-host neuron config comments updated to point at the new sync path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-02 16:41:04 +03:00
rob thijssen	577781de8d	fix(neuron): derive Clone on ImageInput for the CUDA vision dispatch All checks were successful CI / CUDA type-check (push) Successful in 32s Details CI / Format (push) Successful in 34s Details build-prerelease / Resolve version stamps (push) Successful in 39s Details CI / Clippy (push) Successful in 2m47s Details build-prerelease / Build cortex binary (push) Successful in 4m34s Details CI / Test (push) Successful in 6m14s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m58s Details build-prerelease / Package cortex RPM (push) Successful in 1m22s Details build-prerelease / Build neuron-ampere (push) Successful in 8m5s Details build-prerelease / Build neuron-ada (push) Successful in 8m9s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m6s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m6s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m5s Details CUDA type-check in CI failed on commit `24968e9` with E0308: error[E0308]: mismatched types --> crates/neuron/src/harness/candle.rs:1707:33 1707 \| images.clone(), \| ^^^^^^^^^^^^^^ expected `Vec<ImageInput>`, found `&Vec<ImageInput>` In Stage B5 the cuda branch of `chat_completion` matches `&vision_route` to keep the `vision_route: Option<...>` alive for both arms, which makes `images` bind as `&Vec<ImageInput>`. The subsequent `images.clone()` call doesn't deep-clone because `ImageInput` doesn't derive `Clone` — rustc falls back to cloning the `&Vec` reference, which has the wrong type for the worker job. The CPU build (non-cuda) compiled fine because that branch is behind `#[cfg(feature = "cuda")]`; the cuda-check job is what catches the regression. Fix: derive `Clone` on `ImageInput`. The clone cost is one pixel-buffer memcpy per image (~2.4 MiB at fixed 448×448), which is fine on the chat-completion hot path — vision requests are rare per second relative to text-only decode. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-02 15:51:57 +03:00
rob thijssen	24968e9233	feat(neuron): Stage B — end-to-end text+image chat for Qwen3.6 Some checks failed build-prerelease / Resolve version stamps (push) Successful in 31s Details CI / Format (push) Successful in 33s Details CI / CUDA type-check (push) Failing after 46s Details CI / Clippy (push) Successful in 2m37s Details build-prerelease / Build cortex binary (push) Successful in 4m32s Details build-prerelease / Build neuron-blackwell (push) Failing after 5m35s Details CI / Test (push) Successful in 6m40s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-ampere (push) Failing after 7m46s Details build-prerelease / Package cortex RPM (push) Successful in 1m22s Details build-prerelease / Build neuron-ada (push) Failing after 4m51s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped Details Stage B of the vision plan (doc/vision-qwen3_6-spec.md). Wires the vision tower from Stage A through to a complete non-streaming chat completion: extract images from the request, preprocess, encode on the worker thread, splice embeddings into the LM input at `<\|image_pad\|>` positions, return coherent text response with `prompt_tokens` reflecting patch tokens. Closes the silent-drop class of failures from issue #3 — vision requests against Qwen3.6 now condition the model on the image instead of producing confident text-only hallucinations. Streaming for vision is Stage C. Deferred items tracked under #12 (TP-vision), #13 (27B production), #14 (dynamic resolution), #15 (numerical validation). What landed: - B1 — `Qwen3_5Model::forward_with_vision`: text-only `forward` unchanged; new method takes `(input_ids, offset, image_embeds, image_token_id)`, embeds tokens, locates `image_token_id` positions, splices via the new `splice_runs` helper. MRoPE applies text-positions to image tokens for Stage B (spatial MRoPE is the issue #15 numerical-validation follow-up). 2 unit tests for `splice_runs` covering contiguous + non-contiguous runs. - B2 — `ModelArch::forward_with_vision` dispatch: routes Qwen3_5Dense to the new method; other arches return an error. Defence-in-depth — the HTTP layer (B6) already rejects image content for non-vision models. - B3 — `Job::ForwardLogitsWithImages`: new worker variant carrying tokens + per-image `(pixels, c, h, w)` payloads. The dispatcher encodes each image (device-resident), concatenates the resulting embeddings, calls `arch.forward_with_vision`, and returns CPU logits. Image embeddings never copy back to CPU — the "tensors don't escape the worker" invariant from the per-device worker refactor still holds. Poisoned-worker drain path handles the new variant. - B4 — Prompt builder: - `request_has_images` detects image content cheaply. - `extract_images_from_request(request, profile)` walks `MessageContent::Parts`, decodes data URIs, runs `harness::preprocess::preprocess` per image, returns `Vec<ImageInput>` in request order. - `expand_image_pad_tokens(input_ids, image_token_id, patches_per_image)` walks the tokenized prompt and replaces each `<\|image_pad\|>` (id 248056 for Qwen3.6) with N copies matching the per-image patch count. 4 unit tests. - `VisionMeta::from_config_path` peeks `config.json` at load time for `image_token_id`, vision_config patch/merge sizes, and derives `lm_tokens_per_image` for the Stage B fixed resolution. - B5 — `chat_completion` vision routing: detects image content, validates the loaded model has vision, expands the prompt, and calls a new `run_inference_with_images_via_worker` helper that does single-shot prefill + standard decode loop (KV cache holds the post-splice hidden states from prefill, so decode steps don't re-splice). Stage B skips chunked prefill for vision — at 448×448 fixed resolution the budget stays well under the activation-memory threshold. Long-vision chunking is Stage D follow-up. - B6 — `InferenceError::VisionUnsupported`: structured 400 with `code=vision_unsupported, model_id, suggestion` when an image request hits a non-vision model. Closes the agent0 failure mode where vision requests degraded silently. - B7 — `ModelInfo.capabilities`: per-model array (`["text"]` vs `["text", "vision"]`) in `/v1/models` and forwarded verbatim by cortex-gateway. Lets clients (litellm, agent0) gate image_url submission on the declared capability set. Optional in the wire format; defaults to empty for older clients. CI gate: cargo fmt --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace (all 28 test groups ok, 124 lib tests). New unit-test counts: +2 splice_runs, +4 expand_image_pad. Manual verification (after RPMs deploy on beast): curl http://hanzalova.internal:31313/v1/chat/completions \ -H 'Content-Type: application/json' \ -d "{\"model\":\"Qwen/Qwen3.6-27B\", \"messages\":[{\"role\":\"user\",\"content\":[ {\"type\":\"text\",\"text\":\"What's in this image?\"}, {\"type\":\"image_url\",\"image_url\":{\"url\":\"data:image/jpeg;base64,...\"}} ]}], \"max_tokens\":120}" \| jq Expect prompt_tokens > 196 (text + 196 patch tokens) and a response that references actual image content. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-02 15:33:00 +03:00
rob thijssen	7df84fed8f	feat(neuron): Stage A — vision tower load + preprocessor for Qwen3.6 All checks were successful CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 30s Details CI / Format (push) Successful in 28s Details CI / Clippy (push) Successful in 2m35s Details build-prerelease / Build cortex binary (push) Successful in 5m13s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m23s Details build-prerelease / Build neuron-ampere (push) Successful in 7m56s Details CI / Test (push) Successful in 7m11s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m19s Details build-prerelease / Build neuron-ada (push) Successful in 5m30s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m56s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 4m25s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s Details Stage A of the vision implementation plan (doc/vision-qwen3_6-spec.md). Builds the vision tower scaffolding that today's silent-drop failure mode (issue #3) needs — the Qwen3.6 ViT loads from `model.visual.`, runs forward producing post-merger LM-side image embeddings, and routes through the device worker via a new `Job::EncodeImage`. No LM splice yet — that's Stage B. Refs #3 (umbrella). Deferred sub-stages tracked as #12 (TP-vision), #13 (27B production deploy), #14 (dynamic resolution), #15 (numerical validation). What landed: - A0 — investigation: pulled config.json, preprocessor_config.json, chat_template.jinja, and safetensors index from beast's local Qwen3.6-27B cache. Documented in doc/vision-qwen3_6-spec.md with exact tensor shapes for every `model.visual.` weight. Confirms 27-block ViT with `hidden_size=1152`, `patch_size=16`, `spatial_merge_size=2`, `out_hidden_size=5120`. Vision tower lives in 2 of the 15 safetensors shards. - A1 — deps + scaffolding: added `image = "0.25"` (default- features off, PNG/JPEG/WebP/BMP/GIF) and `base64 = "0.22"` to crates/neuron/Cargo.toml. Created `harness::preprocess` and `harness::arch::qwen3_5::vision` modules. - A2 — preprocess.rs: `decode_data_uri` strips `data:image/...;base64,...` → image bytes → `image::DynamicImage` (rejecting `http(s)://` URLs to avoid SSRF/recursion); `preprocess` resizes to a fixed `PreprocessProfile::qwen3_6()` (448×448), normalises to `[-1, 1]` per the model's mean/std=0.5, emits row-major `(3, H, W)` f32. 9 unit tests covering data URI parse, decode failure paths, grayscale-to-RGB promotion, and the exact-value normalisation contract. - A3 — vision.rs: `VisionTower` struct with `patch_embed: Conv2d`, learned `pos_embed: Embedding`, 27 `VisionBlock`s (pre-LN + multi-head self-attention with fused QKV + GELU-tanh MLP + residuals), and `VisionMerger` (LayerNorm → 2×2 spatial concat → linear_fc1 → GELU-tanh → linear_fc2 to LM hidden_size). Includes the Conv3d→Conv2d fold trick documented at the top of the file — the published patch_embed.proj.weight is 5D `(1152, 3, 2, 16, 16)` but candle 0.10 has no Conv3d; for static images we sum-collapse the temporal axis. Video would need real Conv3d. 5 unit tests including the exact `gelu_pytorch_tanh` reference values from PyTorch. - A4 — wire vision into Qwen3_5ForCausalLM: extended `Config` with optional `vision_config: Option<VisionConfig>` and `image_token_id`; `Qwen3_5ForCausalLM::new` now loads the vision tower when present, exposes `has_vision()` and `vision()` so the HTTP layer can advertise capability and so the encode path can reach it. - A5 — device worker `Job::EncodeImage`: new job variant carrying CPU-side `(C, H, W)` pixels. Dispatch handler reconstructs the tensor on the worker's device, calls `arch.encode_image(image)`, copies the result back to CPU as flat `Vec<f32>`. Keeps the "tensors don't escape the worker" invariant. Poisoned-worker drain path handles the new variant. - A6 — dispatch round-trip test: `encode_image_routes_to_dispatch_ and_errors_on_unknown_handle` proves the channel/dispatch wiring works end-to-end via the CPU device worker (errors on unknown ArchHandle, which is the expected behaviour without a loaded model — real-weights validation happens in Stage B when the LM splice path exists). CI gate: cargo fmt --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace (all 28 test groups ok, zero failures). New test counts: +9 in preprocess, +5 in vision, +1 in device_worker. Out of scope (deferred): - LM-side splice of image embeddings at `<\|image_pad\|>` positions → Stage B. - Streaming SSE for vision-bearing chat completions → Stage C. - Reject `image_url` with HTTP 400 for non-vision models / advertise `capabilities` in /v1/models → Stage C. - TP-vision (#12), 27B production deploy (#13), dynamic resolution (#14), numerical validation (#15). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-02 11:40:47 +03:00
rob thijssen	5c520c7e90	feat(deploy): gitea workflow for rolling RPM deploys + host bootstrap Replace operator-run script/deploy.sh with a CI-driven rolling deploy: - .gitea/workflows/deploy.yml fires on build-prerelease success (and is re-runnable via workflow_dispatch). Cortex upgrades first on hanzalova.internal; the three neuron hosts upgrade in parallel under fail-fast: false so one failing host doesn't sink the rest. Concurrency-grouped to serialize overlapping deploys, never cancelling in-flight runs (a half-applied dnf transaction is worse than a stale deploy). - asset/sudoers.d/{cortex,neuron}-host.conf are the canonical source for the scoped privileges gitea_ci needs on each host kind, installed as /etc/sudoers.d/helexa_gitea_ci. URLs and = signs are backslash-escaped per sudoers reserved-character rules. - script/infra-setup.sh idempotently provisions the gitea_ci user, installs the runner pubkey, drops in the appropriate sudoers fragment with visudo verification, and syncs cortex.toml / models.toml / per-host asset/neuron/<short>.toml — config still ships from operator workstations rather than CI because the first two are gitignored. The CI-only secret is RSYNC_SSH_KEY (already configured for the repo); the matching pubkey is ~/.ssh/id_gitea_ci.pub on the operator's box. script/deploy.sh and asset/manifest.yml are left in place until the first end-to-end deploy workflow run succeeds, then removed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 14:58:23 +03:00
rob thijssen	d0292ed377	feat(cortex): catalogue source field + scheme-qualified /models/load Some checks failed CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 40s Details CI / Format (push) Successful in 40s Details CI / Test (push) Failing after 1m3s Details CI / Clippy (push) Successful in 2m43s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 6m13s Details build-prerelease / Build neuron-ampere (push) Successful in 7m31s Details build-prerelease / Build neuron-ada (push) Successful in 8m16s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m56s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m21s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s Details build-prerelease / Build cortex binary (push) Successful in 4m5s Details build-prerelease / Package cortex RPM (push) Successful in 1m30s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s Details Phase 3 of plan-source-aware-loader-preflight. Adds an optional `source` field to `ModelProfile` and threads it through the router's cold-load path so a profile pointing at the helexa registry forwards `helexa:<id>` to neuron's `/models/load` instead of leaving neuron to substitute its `default_source` (typically `huggingface`). Without this, an operator who declares `source = "helexa"` in models.toml would still see neuron fetch from HuggingFace — the catalogue → ModelSpec translation in `profile_to_spec` was dropping the scheme on the floor. What lands: - `cortex-core::catalogue::ModelProfile.source: Option<String>`. None is the default and preserves pre-Phase-3 behaviour. - `cortex-gateway::router::qualified_model_id(profile)` — small pure helper, extracted from `profile_to_spec` so it can be unit-tested. Empty-string `source` is treated as None so operators who blank out a previously-set value don't trip a scheme-with-no-scheme failure mode in neuron. - `models.example.toml` documents the new field with a commented-out helexa-scheme example pointing back at neuron.example.toml's matching sources block. Tests: - 2 new unit tests in `cortex-core::catalogue`: source-absent round-trip and source-present round-trip through TOML. - 3 new unit tests in `cortex-gateway::router`: pass-through when None, prefix when Some, pass-through on empty-string source. - ModelProfile literal in catalogue's existing test updated to carry `source: None`. CI gate: cargo fmt --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace (24 test groups ok, zero failures). Completes Phase 3. With Phases 1+2+3 landed: - neuron parses `scheme:org/name`, routes per-source hf-hub Api with disambiguated cache. - preflight returns structured errors before any device allocation. - cortex catalogue declares per-model source jurisdiction and forwards it to neuron. The registry itself (registry.helexa.ai service, MinIO, nginx, mirror fabric) is the next moving piece — landing under a separate project per the design discussion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 14:53:58 +03:00
rob thijssen	d4e1b05956	feat(neuron,cortex-core): source-aware loader (scheme:org/name) All checks were successful CI / CUDA type-check (push) Successful in 46s Details CI / Format (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 42s Details CI / Clippy (push) Successful in 2m40s Details build-prerelease / Build cortex binary (push) Successful in 4m23s Details CI / Test (push) Successful in 5m28s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m39s Details build-prerelease / Package cortex RPM (push) Successful in 1m19s Details build-prerelease / Build neuron-ampere (push) Successful in 7m53s Details build-prerelease / Build neuron-ada (push) Successful in 5m18s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m59s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m6s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s Details Phase 1 of plan-source-aware-loader-preflight. Makes neuron's loader treat `huggingface:org/name` and `helexa:org/name` as first-class distinct sources with per-source endpoint + cache, while staying backwards-compatible with bare `org/name` ids. Zero behavior change for existing operator configs. Motivation: helexa is adding an EU-hosted registry (`registry.helexa.ai`) alongside HF. Both speak HF-compatible wire format, but the bytes, jurisdiction, trust root, and cache namespace are distinct. The loader needs to disambiguate which registry serves a given model id, and to keep their caches from colliding on disk when both happen to host the same `org/name`. What lands: - `cortex-core::source` — new module. `ModelSourceId { scheme, org, name }` with `FromStr` accepting both `scheme:org/name` and bare `org/name`. `Display` round-trips. `repo_path()` emits the `org/name` half for the hf-hub `Api::model(...)` call regardless of which scheme/endpoint we're hitting. Rejects malformed input with typed `ParseError` variants (empty scheme, missing slash, scheme with `/`, name with `:`, etc.). - `neuron::config::CandleHarnessConfig` gains `default_source: Option<String>` and `sources: HashMap<String, SourceConfig>`. `SourceConfig` mirrors what `hf_hub::ApiBuilder` consumes: endpoint URL, optional `auth_env` (env var name read at startup so secrets stay out of TOML), and optional cache_dir. Defaults synthesise a `huggingface` entry pointing at `https://huggingface.co` with the legacy `hf_cache` field as its cache_dir — so existing configs that only set `hf_cache` keep working unchanged. - `CandleHarness::new(bind_url, &CandleHarnessConfig)` replaces `CandleHarness::new(bind_url, hf_cache)`. Resolves every configured source's auth env var and cache dir up front so `hf_api_for(scheme)` is a pure HashMap lookup on the hot load path. Only the `huggingface` scheme gets the legacy `HF_HUB_CACHE`/`HF_HOME` env-var fallback chain; other schemes resolve to whatever the operator typed. - `hf_api()` -> `hf_api_for(scheme)`. Builds an `hf_hub::Api` with the source's endpoint, cache_dir, and auth token. Errors with a useful message naming the configured schemes when an unknown scheme is requested. - `CandleHarness::load_model` parses `spec.model_id` into a `ModelSourceId`, substitutes `default_source` for bare ids, and threads the parsed source through `preflight`, `resolve_files`, `resolve_dense_files`, `load_arch_gguf`, `load_arch_dense`, and `load_tp`. The hf-hub `Api::model()` call now uses `source_id.repo_path()` so registry calls hit the right URL shape regardless of scheme. - `preflight()` signature gains a `&ModelSourceId` parameter (it's the canonical id for log lines and error display); `RepoFetchFailed.model_id` etc. now carry the scheme-qualified form so operator-visible errors echo exactly what was configured. - `neuron.example.toml` documents the new `[harness.candle.sources.*]` table with commented-out examples for `huggingface` (explicit override) and `helexa`. Tests: - 13 new unit tests in `cortex-core::source` covering parse / display round-trip, default-scheme substitution semantics, and every `ParseError` variant. - 6 new unit tests in `neuron::config` covering the `effective_sources` synth (legacy `hf_cache` carry-through, explicit override preservation, helexa-alongside-huggingface) and `effective_default_source` fallback. - 2 new unit tests in `harness::candle::tests` covering multi-scheme `hf_api_for` routing, including the "unknown scheme" error path naming configured schemes. - Preflight integration tests updated to construct `ModelSourceId` and assert against the scheme-qualified error form. CI gate: cargo fmt --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace (all 24 test groups ok, zero failures). Out of scope (Phase 3): - Cortex catalogue `source` field — independent of Phase 1+2, ships when the registry comes online. - `helexa` source endpoint itself — separate project; this PR adds the client-side rails only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 13:42:11 +03:00
rob thijssen	61adff347a	feat(neuron): preflight placement check with structured errors Some checks failed CI / CUDA type-check (push) Successful in 31s Details CI / Format (push) Successful in 30s Details build-prerelease / Resolve version stamps (push) Successful in 48s Details CI / Test (push) Failing after 1m10s Details CI / Clippy (push) Successful in 2m49s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m25s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m53s Details build-prerelease / Package cortex RPM (push) Successful in 1m20s Details build-prerelease / Build neuron-ampere (push) Successful in 8m0s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details Phase 2 of plan-source-aware-loader-preflight. Adds a one-RTT placement feasibility check that runs before any device allocation, NCCL handshake, or weight fetch. Replaces today's opaque "fetch config.json … 404" failure mode (when an operator points `tensor_parallel = 2` at a GGUF-only repo) with a structured error that names the failure class and points at the fix. What lands: - `crates/neuron/src/harness/preflight.rs` — new module. Classifies a repo's siblings listing into `SourceFormat` (Gguf \| DenseSafetensors \| Mixed \| Empty), applies the tp/quant feasibility table, returns a `PlacementPlan` on success or a typed `PreflightError` on rejection. `PreflightError` is `serde::Serialize` so the HTTP layer can emit the structured shape verbatim; it's `thiserror::Error` so log lines get a single-line Display when downcasting from anyhow. Includes best-effort Levenshtein-nearest suggestion for malformed quant names (the second sharp edge the HauhauCS scenario surfaced — operator writes `q6k` against filenames containing `Q6_K_P`, and today's matcher just says "no GGUF file matching quant"). - `CandleHarness::load_model` — calls `preflight(...)` first thing after the "already loaded" guard, before any `ensure_device_worker` or `resolve_*`. Failure wraps the typed error in `anyhow::Error` so the existing trait surface is unchanged; the HTTP handler and the startup logger downcast to recover the structured form. - `crates/neuron/src/api.rs::load_model` handler — maps `PreflightError` to 422 Unprocessable Entity with `{"error": {"kind": "...", "model_id": "...", "suggestion": "..." }}`. Other failures keep the existing 400 + free-form `format!("{e:#}")` shape. - `crates/neuron/src/startup.rs::load_default_models` — when the failure is a preflight rejection, log as `reason=<kind> detail=<msg>` instead of the opaque `error=<chain>`, so journalctl on beast will now show `reason=tp_requires_safetensors detail="repo is GGUF-only (8 .gguf files); TP requires dense safetensors..."` instead of `error=fetch config.json from HauhauCS/...: 404 Not Found`. Tests: - 18 unit tests in `harness/preflight.rs` covering classifier, quant matching, Levenshtein, error serialization, and the full feasibility table (gguf+tp rejected, gguf+bad-quant suggests nearest, gguf+good-quant ok, dense+tp ok, empty rejected, mixed prefers safetensors). - 7 integration tests in `tests/preflight.rs` exercising the network path through an axum mock that serves hf-hub-compatible `/api/models/{org}/{name}/revision/main` payloads. Adds `tempfile` as a dev-dependency for per-test cache dirs. Out of scope (deferred to subsequent phases): - Phase 1 (source-aware loader plumbing — `scheme:org/name` parsing, per-scheme `SourceConfig`, cache disambiguation). Preflight runs against the single configured HuggingFace source today; the scheme threading lands cleanly when Phase 1 ships. - Phase 3 (cortex catalogue source field). - GGUF tensor-parallel loading. Preflight rejects this combination with `TpRequiresSafetensors`; the underlying loader gap is the separate `Helexa` curated-registry / heretic-rs conversation. Refs #4-#9 architectural follow-up; no specific issue closed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 13:24:30 +03:00
rob thijssen	0af8c8d6e7	chore(ci): enable colored logs for readability	2026-06-01 09:06:28 +03:00
rob thijssen	435fd10902	fix(neuron): macro-ify CUDA single-GPU route_token so DecodeStream type stays inferred All checks were successful CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 29s Details CI / Format (push) Successful in 29s Details CI / Clippy (push) Successful in 2m47s Details build-prerelease / Build cortex binary (push) Successful in 4m27s Details CI / Test (push) Successful in 5m40s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m47s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ampere (push) Successful in 8m30s Details build-prerelease / Build neuron-ada (push) Successful in 5m39s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m2s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m11s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 4m1s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m5s Details Prerelease build (run 270) failed on commit `cb30383` with: error[E0107]: struct takes 5 generic arguments but 0 generic arguments were supplied --> crates/neuron/src/harness/candle.rs:3554:41 \| 3554 \| decode_stream: &mut tokenizers::DecodeStream<'_>, \| ^^^^^^^^^^^^ The Step-2-era refactor for #6's tool-call extraction added a nested `async fn route_token` inside `stream_inference_via_worker` that named `tokenizers::DecodeStream<'_>` as a parameter type. `DecodeStream` actually has five generic parameters (`'tok, M, N, PT, PP, D`) which makes naming it explicitly painful — the working approach the CPU path uses is a macro, where the body expands inline at the call site and the decoder type stays inferred. This commit replicates the CPU-side macro for the CUDA worker path. Same shape, just with `.await` calls inside (macros tolerate that since they expand inline into the enclosing async context). Control flow uses a labelled-block + `consumer_alive` flag rather than `return` so the macro stays generic over the surrounding return type. The CPU build (default-feature workspace, what `clippy` and `test` jobs exercise) doesn't compile this `#[cfg(feature = "cuda")]` branch, which is why local CI green-lit it. The cuda-check job should catch this category of breakage now that #cb30383+CI-fix landed; this commit just resolves the actual breakage on the prerelease workflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 08:59:56 +03:00
rob thijssen	cb303832bc	feat(neuron): render the model's chat_template with chat_template_kwargs Some checks failed CI / CUDA type-check (push) Failing after 58s Details build-prerelease / Resolve version stamps (push) Successful in 39s Details CI / Format (push) Successful in 40s Details build-prerelease / Build neuron-ampere (push) Failing after 1s Details CI / Clippy (push) Successful in 2m37s Details build-prerelease / Build cortex binary (push) Successful in 4m47s Details CI / Test (push) Successful in 6m13s Details build-prerelease / Build neuron-blackwell (push) Failing after 5m34s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m27s Details build-prerelease / Build neuron-ada (push) Failing after 7m20s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped Details Closes #9. Replaces the hardcoded `format_qwen3_prompt` ChatML glue with `minijinja`-driven rendering of the model's own `chat_template` from `tokenizer_config.json`. The request's `chat_template_kwargs` flow into the Jinja context so model-specific levers (Qwen3's `enable_thinking: false`, etc.) actually take effect. ## Implementation - New `harness::chat_template` module with three entry points: - `load_chat_template_alongside(tokenizer_json_path)` — probes `tokenizer_config.json` in the same hf-hub snapshot directory. Supports both the canonical string-form `chat_template` and the array-form some tokenizers ship (multi-template models). - `render_chat_template(template, messages, tools, kwargs)` — renders via `minijinja`. Messages flatten into the `[{role, content}]` shape HF templates iterate, with per-message extras (`tool_calls`, `tool_call_id`) preserved. `tools` and `kwargs` add into the Jinja context so templates that reference them work without us interpreting their shape. - `chat_templates_enabled()` reads `NEURON_USE_CHAT_TEMPLATE` (default true). Falsy values force the fallback path everywhere — a kill switch for emergency rollback without a rebuild. - `LoadedModel.chat_template: Option<String>` and the TP equivalent are populated once at load time. `None` (no tokenizer_config.json, parse error, missing field) routes the fallback path silently; logs go through `tracing::debug`/`warn` per condition. - New `build_prompt_for_request(chat_template, request)` wraps the decision: when both the template is present AND the kill switch is off, render with kwargs from `request.extra` (looks up `chat_template_kwargs` and `tools` lazily). On render error → warn + fallback to `format_qwen3_prompt`. Wired into all four current prompt-build sites (single-GPU stream + non-stream, TP stream + non-stream). ## Dependency `minijinja = "2"` with the `builtins`, `json`, and `serde` features. Pure-Rust Jinja2 implementation, ~80KB compiled. Used internally by HF's `tokenizers-rs` for its own chat templating; the API surface we touch (`Environment::add_template` + `Template::render(serde_value)`) is stable. ## Validation strategy I can't byte-compare the new path's output against `format_qwen3_prompt` for live models without GPU (CI doesn't have one). The fallback path and kill switch are the mitigations — a deploy can flip `NEURON_USE_CHAT_TEMPLATE=false` in the neuron service env if the chat template renders surprisingly on Qwen3-8B in production. The legacy formatter stays the fail-closed default. ## Scope cuts (documented in module header) - Tool-definition lifting from helexa-acp's system-prompt injection into the chat_template's native tools block is deferred. Today the request's `tools` array threads into the Jinja context, but helexa-acp continues to inject Hermes-format tool descriptions into the system prompt for backwards-compat with non-cortex endpoints. ## Tests 9 unit tests in `chat_template`: kill-switch matrix (truthy / falsy / unset), template loading (string form, array form, missing file, unparseable JSON, missing field), rendering (basic conversation threading, kwargs forwarding, message-extras threading for tool_calls). 215 workspace tests pass; clippy + fmt clean across all workspace features (default). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 23:43:11 +03:00
rob thijssen	44008358c5	feat(neuron): emit response.in_progress between created and output_item.added Some checks failed build-prerelease / Resolve version stamps (push) Successful in 40s Details CI / Format (push) Successful in 44s Details CI / Test (push) Failing after 1m5s Details CI / Clippy (push) Successful in 2m36s Details CI / CUDA type-check (push) Failing after 52s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m32s Details build-prerelease / Package cortex RPM (push) Successful in 1m20s Details build-prerelease / Build neuron-blackwell (push) Failing after 5m42s Details build-prerelease / Build neuron-ampere (push) Failing after 7m14s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details Refs #7. OpenAI's Responses API spec emits `response.in_progress` between `response.created` and the first output-item event to mark "request validated, model is generating". Some Responses-API clients distinguish loading-spinner vs streaming-spinner UI based on which event arrived last; emitting both keeps the wire shape matched. Carries the same shell as `response.created` (status=in_progress, empty output, no usage yet) — both events are payload-light bookkeeping, distinguished only by the event name. The hosted-tool event families remaining in #7 (web_search_call, code_interpreter_call, file_search_call, image_generation_call) stay deferred until the underlying tools exist in neuron. Updated `full_stream_emits_expected_event_sequence` to assert the new event lands in position 1; downstream indexing shifted by one across the existing test assertions. CI green, fmt + clippy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 23:30:34 +03:00
rob thijssen	2f387f33f8	ci: export CUDA paths in cuda-check so cudarc build.rs finds nvcc Some checks failed build-prerelease / Build cortex binary (push) Blocked by required conditions Details CI / Format (push) Successful in 34s Details build-prerelease / Resolve version stamps (push) Successful in 41s Details CI / Clippy (push) Failing after 1m7s Details CI / Test (push) Failing after 56s Details build-prerelease / Build neuron-blackwell (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details CI / CUDA type-check (push) Has been cancelled Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details act launches step shells without sourcing /etc/profile, so the gitea_runner user's PATH lacks /usr/local/cuda-13.0/bin. cudarc's build.rs panics with ENOENT on `nvcc --version` under the neuron crate's cuda-version-from-build-system feature. build-prerelease.yml already does this export — mirror it here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 23:28:04 +03:00
rob thijssen	fc9a8c42a3	feat(neuron): extract `<tool_call>` blocks to structured tool_calls deltas Some checks failed build-prerelease / Build cortex binary (push) Blocked by required conditions Details CI / Clippy (push) Waiting to run Details CI / Test (push) Waiting to run Details CI / CUDA type-check (push) Failing after 17s Details build-prerelease / Resolve version stamps (push) Successful in 32s Details CI / Format (push) Successful in 32s Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-blackwell (push) Has been cancelled Details CI / Build cortex SRPM (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details Closes #6. Same model-agnostic seam as #8 but for tool-call markers (`<tool_call>` / `</tool_call>` on Qwen3-Coder, Hermes-format, DeepSeek-Coder, gpt-oss, …). Lets Zed's tool-use feature and any other vanilla OpenAI chat client get structured `tool_calls` deltas out of cortex without having to parse markers themselves. ## Implementation 1. Tokenizer probe at load time (`detect_tool_call_token_pair` in `wire::event`) — same shape as the reasoning-marker probe from #8. Both open AND close must resolve to single token ids; non-tool-use models get `None` and pass through unchanged. Stored on `LoadedModel.tool_call_tokens` and the TP analogue. 2. New `InferenceEvent::ToolCall` variant — carries `index` (call slot, per-turn counter), generated `id` (`call_<hex>_<idx>`), `name`, and the complete `arguments` JSON string. One event per parsed call. 3. Token-level state machine in all three streaming paths (CPU `run_inference_streaming`, CUDA single-GPU `stream_inference_via_worker`, CUDA TP `chat_completion_tp_stream`) layered on top of #8's reasoning routing: - `<tool_call>` token → enter buffering state, clear buffer. - Tokens while buffering → accumulate into `tool_call_buf` via the decoder (so multi-byte UTF-8 still buffers correctly) without emitting anything visible. - `</tool_call>` token → take the buffer, parse with `parse_tool_call_body` (extract `name` + `arguments`), emit a structured `ToolCall` event with a fresh `call_<hex>` id and the parsed fields. - On parse failure → fall back to re-emitting the original `<tool_call>{buf}</tool_call>` block as plain text content so helexa-acp's existing `ToolCallParser` repair passes still have a chance to recover the call. 4. OpenAI chat projector emits the OpenAI streaming `tool_calls` delta shape on `InferenceEvent::ToolCall` — `{tool_calls: [{index, id, type:"function", function:{name, arguments}}]}`. One chunk per call slot. 5. OpenAI Responses projector drops `ToolCall` events for now (Responses-side function_call event family routing tracked under #7); the chat path is what unblocks Zed's tool use today. ## Acceptance - Vanilla OpenAI chat clients (Zed's tool-use feature, any other OpenAI-compatible tool-call consumer) get structured tool_calls deltas against cortex+neuron without having to parse `<tool_call>` markers in content. - helexa-acp continues to work — when neuron parses cleanly, it consumes the structured deltas through its existing decoder. When the model emits malformed JSON, neuron falls back to text pass-through and helexa-acp's `ToolCallParser` recovers via the same path it always did. - Models without tool-call markers in their tokenizer pass through unchanged. - No hardcoded model knowledge — entirely driven by tokenizer metadata. ## Tests 2 new detection tests in `wire::event` (Qwen3-style marker detection, no-marker case). The streaming paths themselves stay covered by the existing chat-completions integration tests; full end-to-end exercise of the new path requires GPU-loaded models and lives outside the CI test surface. 215 workspace tests pass; clippy + fmt clean across the workspace. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 23:26:31 +03:00
rob thijssen	7733eecba5	feat(neuron): strip reasoning from chat completions by default Some checks failed CI / CUDA type-check (push) Failing after 18s Details build-prerelease / Resolve version stamps (push) Successful in 32s Details CI / Format (push) Successful in 32s Details CI / Clippy (push) Successful in 2m36s Details build-prerelease / Build cortex binary (push) Successful in 4m29s Details CI / Test (push) Successful in 5m19s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m56s Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ampere (push) Successful in 7m45s Details build-prerelease / Build neuron-ada (push) Successful in 5m24s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m53s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m0s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m43s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s Details Closes #8. Reasoning-capable models (Qwen3, DeepSeek-R1, gpt-oss, Mistral Magistral, …) emit `<think>...</think>` blocks inline in their content stream. The chat-completions wire format has no slot for reasoning, so until this change every consumer either parsed the markers themselves (helexa-acp) or wrote the raw scratchpad content into their UI (Zed's commit-message generator — visible as the leaked reasoning block on every generated commit message against benjy's Qwen3-8B). ## Implementation, model-agnostic by design The neuron side now does token-level routing without any hardcoded model knowledge: 1. At load time (`detect_reasoning_token_pair` in `wire::event`), probe the tokenizer's vocabulary for a known reasoning-marker pair: `<think>` / `</think>` (Qwen3, DeepSeek-R1, gpt-oss), `[THINK]` / `[/THINK]` (Mistral Magistral), and a couple of derivatives. Each marker must resolve to a single token id; if both open and close resolve, stash on `LoadedModel.reasoning_tokens` (similarly `TpLoadedModel`). Non-reasoning models get `None` and pass through unchanged. 2. At inference time, the three streaming paths (`run_inference_streaming` CPU, `stream_inference_via_worker` CUDA single-GPU, `chat_completion_tp_stream` CUDA TP) now check each sampled token against the pair via the new `handle_reasoning_marker` helper before feeding it to the detokeniser. Open marker → set `in_reasoning = true`, drop the marker. Close marker → unset, drop. Other tokens go through `emit_delta(_blocking)` which now picks `ReasoningDelta` or `TextDelta` based on state. Markers never appear in the streamed output. 3. In `wire::openai_chat`, the projector splits into: - `project_chat_stream` (unchanged signature; default behaviour — drops `ReasoningDelta`) - `project_chat_stream_with(rx, …, ChatProjectionConfig)` — when `include_thinking: true` and `reasoning_markers: Some(_)`, re-wraps reasoning content with the literal open/close marker text and emits as content deltas. Preserves the on-the-wire shape that helexa-acp's `ThinkParser` expects. 4. HTTP handler reads `x-include-thinking: true` (case- insensitive `1`/`true`/`yes`) from the request headers and threads it into the projection config. cortex-gateway already forwards arbitrary headers verbatim, so the opt-in works end-to-end without gateway changes. 5. helexa-acp's `openai_chat` provider sets `x-include-thinking: true` on every request so its existing `ThinkParser` keeps receiving the marked content stream. `ThinkParser` itself is unchanged — needed for endpoints that aren't reasoning-aware (OpenRouter, OpenAI directly, etc.). ## Acceptance - Zed's commit-message generator (vanilla chat-completions client, no `x-include-thinking`) gets clean commit messages with no `<think>` block. - helexa-acp sessions continue to render thinking in Zed's thought UI via the opt-in path. - Models without reasoning tokens declared in their tokenizer pass through unchanged. - Implementation contains zero references to "qwen3" or any specific model — entirely driven by tokenizer metadata. ## Tests 9 new tests in `wire::event` (token-pair detection across 4 marker conventions, edge cases) and `wire::openai_chat` (default drop, opt-in re-wrap with multi-chunk reasoning, close-marker on Finish, fallback when markers absent, off-switch with markers present). All 213 workspace tests pass; fmt + clippy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 17:55:04 +03:00
rob thijssen	fdc0adb738	docs(helexa-acp): README + example config for end-user onboarding Some checks failed CI / CUDA type-check (push) Failing after 18s Details CI / Format (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 35s Details CI / Clippy (push) Successful in 2m36s Details build-prerelease / Build cortex binary (push) Successful in 4m13s Details CI / Test (push) Successful in 5m6s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m40s Details build-prerelease / Package cortex RPM (push) Successful in 1m19s Details build-prerelease / Build neuron-ampere (push) Successful in 7m53s Details build-prerelease / Build neuron-ada (push) Successful in 5m12s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m55s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m4s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m43s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m0s Details Stage 7. Walks a new user from "never heard of helexa-acp" to "chatting via Zed against helexa or a public API in 10 minutes": - crates/helexa-acp/README.md — install (from source / COPR), quick-start env-var path, multi-endpoint TOML, full Zed setup, endpoint cookbook (cortex/neuron, OpenAI, Anthropic, OpenRouter, LM Studio, multi-cortex), three session modes (Default / Bypass / Plan) with their tool tables, tool surface + path-handling rules, session resume, context compaction, troubleshooting for the five failure modes a new user is likely to hit, and architecture reference for contributors. - helexa-acp.example.toml — copy-paste-and-edit starter config at the repo root, mirroring the existing cortex.example.toml / neuron.example.toml pattern. No code changes. fmt + clippy clean as a sanity check. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 14:25:56 +03:00
rob thijssen	8fa1d1962e	feat(helexa-acp): anthropic-messages provider Some checks failed CI / CUDA type-check (push) Failing after 18s Details CI / Format (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 35s Details CI / Test (push) Failing after 59s Details CI / Clippy (push) Successful in 2m28s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m17s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m32s Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ampere (push) Successful in 7m50s Details build-prerelease / Build neuron-ada (push) Successful in 5m55s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m55s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m2s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m52s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m4s Details Stage 6b. Third provider impl, completing the wire-format trio (openai-chat, openai-responses, anthropic-messages). Lets a helexa-acp endpoint configured with `wire_api = "anthropic-messages"` drive Claude models — either against Anthropic directly or via cortex's /v1/messages translation surface. ## Encoder (CompletionRequest → Anthropic body) - System messages flatten to the top-level `system` field (concatenated with blank lines when there are multiple). - User text → `{role:"user", content:"..."}`. - User MultiPart (text + images) → `content` array with Anthropic's distinct image shape: `{type:"image", source:{type:"base64", media_type, data}}` — structurally different from OpenAI's `image_url` data URI. - Assistant text → `{role:"assistant", content:"..."}`. - Assistant tool_calls → `content` array with optional `{type:"text"}` block plus one `{type:"tool_use", id, name, input:<parsed json>}` per call. The internal arguments JSON string is parsed back to a Value before encoding (Anthropic requires the parsed form); malformed JSON falls back to a String input so the request body still serialises. - Tool result → `{role:"user", content:[{type:"tool_result", tool_use_id, content}]}` per Anthropic's convention (no separate `tool` role). - `max_tokens` is required by Anthropic; defaults to 8192 when the request doesn't specify. ## Decoder (Anthropic SSE → CompletionEvent) Named SSE events: - `message_start` → captures input_tokens from `usage` for the eventual UsageStats. - `content_block_start` (type=text) → TextDelta (initial text, if any). - `content_block_start` (type=tool_use) → ToolCallStart; if a pre-buffered `input` is present, also emits a single ToolCallArgsDelta. - `content_block_start` (type=thinking, for extended-thinking models) → ReasoningDelta. - `content_block_delta` (text_delta) → TextDelta. - `content_block_delta` (input_json_delta) → ToolCallArgsDelta, correlated by block index. - `content_block_delta` (thinking_delta) → ReasoningDelta. - `message_delta` → Usage (final output_tokens) + Finish with stop_reason mapped: end_turn/stop_sequence → "stop", max_tokens → "length", tool_use → "tool_calls". - `message_stop` → stream terminates. - `ping` ignored (Anthropic's keep-alive). - `error` → yields Err and ends the stream. ## Wiring - Authentication: `x-api-key` + `anthropic-version: 2023-06-01` headers (not Bearer). Both ship when api_key is configured; servers that don't care (cortex) ignore them. - `WireApi::AnthropicMessages` in build_provider now constructs the provider instead of erroring "reserved for future". - `provider::mod.rs` registers the new module. 18 new unit tests: encoder (system collapse, multi-system concat, default max_tokens, multipart with image, tool_use blocks, tool results, malformed JSON arg fallback), decoder (text streaming, tool_use lifecycle, max_tokens→length mapping, empty deltas, ping events, error events, cancellation, malformed payload skip, thinking blocks). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 14:01:59 +03:00
rob thijssen	cad7552104	ci: clear sccache env on cuda-check so cargo doesn't try to wrap rustc Some checks failed CI / Test (push) Waiting to run Details CI / CUDA type-check (push) Failing after 18s Details build-prerelease / Resolve version stamps (push) Successful in 30s Details CI / Format (push) Successful in 31s Details CI / Clippy (push) Successful in 2m25s Details build-prerelease / Build cortex binary (push) Successful in 5m19s Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-blackwell (push) Has been cancelled Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details CI run 255 job 3 (CUDA type-check) fails with: error: could not execute process `* rustc -vV` (never executed) Caused by: No such file or directory (os error 2) The redacted `` is `sccache`. The ci.yml workflow-level env block sets `RUSTC_WRAPPER: sccache` because the generic `rust` runner has sccache installed and routes the cache to caveman.kosherinata.internal. The new `cuda-check` job runs on `cuda-13.0` (where nvcc lives), and that runner doesn't carry sccache on PATH — so cargo's first action (`sccache rustc -vV` to probe the compiler version) fails before borrow-check even starts. `build-prerelease.yml`, which uses the same `cuda-13.0` runner for the actual release neuron builds, deliberately does NOT set RUSTC_WRAPPER. That's the pattern this commit applies. Fix: override `RUSTC_WRAPPER` (plus the SCCACHE_ and AWS_* env locally on the job. We lose caching on the cuda-check job (it's borrow-check-only and finishes in a couple minutes anyway), but the gate runs. The job's purpose — fail fast on `#[cfg(feature = "cuda")]` borrowck errors that the default-feature gate misses — is what matters, and that purpose was undermined by the env inheritance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 13:55:18 +03:00
rob thijssen	1818dfb337	feat(helexa-acp): openai-responses provider Some checks failed CI / Format (push) Successful in 38s Details build-prerelease / Resolve version stamps (push) Successful in 45s Details CI / Clippy (push) Successful in 2m35s Details CI / CUDA type-check (push) Failing after 12s Details CI / Test (push) Successful in 5m54s Details build-prerelease / Build cortex binary (push) Successful in 5m9s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m20s Details build-prerelease / Build neuron-blackwell (push) Successful in 4m36s Details build-prerelease / Build neuron-ampere (push) Successful in 7m11s Details build-prerelease / Build neuron-ada (push) Successful in 6m33s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m55s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m56s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 59s Details Stage 6a. Implements the `Provider` trait for OpenAI's Responses API surface, parallel to the existing `OpenAIChatProvider`. Lets a helexa-acp endpoint configured with `wire_api = "openai-responses"` drive a `/v1/responses` server (today: neuron through cortex; later: OpenAI directly) using the same agent-loop machinery the chat provider already supports. ## Encoder (CompletionRequest → Responses body) - System messages collapse into a single top-level `instructions` string. Multiple system messages concatenate with blank lines so ordering is preserved. - User messages become `{type:"message", role:"user", content:…}` input items. Text content stays a bare string; MultiPart content (text + images, post-Stage 5) becomes a `[{type:"input_text"}, {type:"input_image"}]` array with images encoded as `data:{mime};base64,{data}` URIs — exactly the shape neuron's `wire::openai_responses::request_to_chat` accepts. - Assistant text turns become an `output_text` content part inside a `message` item. - Assistant tool-call turns become `function_call` input items. - Tool result turns become `function_call_output` input items. - `max_tokens` translates to `max_output_tokens`. ## Decoder (Responses SSE → CompletionEvent) Reads named events on the SSE `event:` line: - `response.output_text.delta` → `CompletionEvent::TextDelta` - `response.output_item.added` with `type:"function_call"` → `CompletionEvent::ToolCallStart` (and, when the upstream pre-buffers fully, a single `ToolCallArgsDelta`) - `response.function_call_arguments.delta` → `CompletionEvent::ToolCallArgsDelta`, correlated back to the tool-call slot by output_index. - `response.completed` → `CompletionEvent::Usage` (if present) + `CompletionEvent::Finish` with reason mapped from `status`: `"completed"` → `"stop"`, `"incomplete"` → `"length"`. - Bookkeeping events (`response.created`, `response.in_progress`, `.content_part.`, `.output_text.done`, `.output_item.done`, `.function_call_arguments.done`, reasoning_) are skipped. ## Wiring - `EndpointConfig::responses_url()` joins `{base_url}/responses`. - `WireApi::OpenAiResponses` in `build_provider` constructs the new provider (was previously a "reserved for future" error). - `provider::mod.rs` registers the new module. ## Cuts (carried over from neuron-side issues) - The decoder's `ToolCall` handling fires correctly when the upstream emits `function_call` items, but the neuron candle harness doesn't yet (Refs #6). Real tool-call testing against cortex+neuron stays on the chat path until #6 lands. - Reasoning events (`response.reasoning_`) are deliberately dropped today; once neuron emits `InferenceEvent::ReasoningDelta` (Refs #5) the projector on the neuron side will start firing the reasoning event family and this decoder will need a matching case to route them to `CompletionEvent::ReasoningDelta`. 13 new unit tests cover encoder (system collapse, multipart user input, assistant output_text encoding, tool-call round-trip via function_call items) and decoder (text streaming, empty deltas dropped, length finish, function_call lifecycle, inline-arguments shape, cancellation, malformed payload skip). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 11:30:25 +03:00
rob thijssen	5ed1140c97	feat(cortex-gateway): proxy /v1/responses to neuron Some checks failed CI / CUDA type-check (push) Failing after 12s Details build-prerelease / Resolve version stamps (push) Successful in 33s Details CI / Format (push) Successful in 37s Details CI / Clippy (push) Failing after 1m5s Details build-prerelease / Build cortex binary (push) Successful in 4m26s Details CI / Test (push) Successful in 5m17s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m39s Details build-prerelease / Package cortex RPM (push) Successful in 1m24s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details Step 3 of the Responses rollout: plain proxy route on the gateway, no translation. Neuron speaks the Responses API natively after Step 2 (commit `957f704`), so the gateway just needs the same routing shape it uses for /v1/chat/completions — extract `model`, resolve via router::resolve, forward verbatim. - New `POST /v1/responses` handler in handlers.rs::responses. - Mock neuron under tests/common/mod.rs gains a `/v1/responses` endpoint that mirrors the ResponsesResponse shape neuron emits. - New integration test file `tests/responses.rs` exercises: - Happy path (200, body round-trips, ResponsesUsage shape). - Unknown model → 404 (matches chat-completions error shape). - Missing `model` field → 400 (same extract_model helper). Streaming proxy works through the same path as chat completions — the upstream Content-Type (`text/event-stream` for stream:true, `application/json` otherwise) propagates through proxy_with_metrics unchanged. Live-stream integration tests against a streaming mock deferred until we exercise the path against a real neuron, since the chat-completions streaming test already covers the proxy's SSE forwarding mechanics. Three new tests; clippy + fmt clean across the workspace. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 11:21:43 +03:00
rob thijssen	957f704efa	feat(neuron): OpenAI Responses API + ci cuda-check runner label Some checks failed build-prerelease / Package cortex RPM (push) Blocked by required conditions Details CI / CUDA type-check (push) Failing after 11s Details build-prerelease / Resolve version stamps (push) Successful in 30s Details CI / Format (push) Successful in 32s Details CI / Clippy (push) Successful in 2m31s Details build-prerelease / Build cortex binary (push) Successful in 4m32s Details CI / Test (push) Successful in 5m42s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 6m8s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details Step 2 of the Responses rollout: native `/v1/responses` endpoint on neuron that consumes the same InferenceEvent stream as `/v1/chat/completions` but emits it as the Responses API's named SSE event family. No gateway-side translation. ## Surface - `cortex-core::responses` envelope types: `ResponsesRequest`, `ResponsesInput` (text \| items), `ResponsesInputItem` (message \| function_call \| function_call_output \| reasoning), `ResponsesContentPart` (input_text \| input_image \| output_text), `ResponsesResponse`, `ResponsesOutputItem`, `ResponsesUsage`. Plus a `events::*` constant module so the projector and the wire shape stay in sync without string-typos. - `neuron::wire::openai_responses`: - `request_to_chat(req)` flattens Responses input + instructions into a `ChatCompletionRequest` the candle harness already understands. Text-only Parts collapse to a string; mixed text+image Parts go to chat's content-array shape; reasoning items drop; function_call / function_call_output round-trip via tool_calls / tool_call_id metadata so the surface is consistent for the day the harness emits tool calls. - `project_responses_stream(rx, meta)` reads InferenceEvents and emits the eight named events that compose a Responses stream: response.created → output_item.added → content_part.added → output_text.delta×N → output_text.done → content_part.done → output_item.done → response.completed. Synthesises start frames if the producer skips Start (poisoned model, early disconnect) so the stream stays coherent. - `build_response(meta, text, reason, usage)` for the non-streaming path. - `CandleHarness::inference_stream(req)` extracted from `chat_completion_stream`, returning a typed `InferenceStream` (event receiver + id/created/model_id metadata). Both `chat_completion_stream` and the new `responses_stream` are now thin wrappers that pick their wire projection. TP path got the same treatment (`chat_completion_tp_stream` → `inference_tp_stream`). - `POST /v1/responses` route on neuron. Non-streaming returns one buffered `ResponsesResponse`; streaming returns axum SSE with both event names and JSON data per frame (Responses, unlike chat completions, uses named `event:` lines). Reused `inference_error_response` helper hoisted out so the chat and responses handlers share the InferenceError → HTTP mapping. ## CI Also bundles the `cuda-check` runner-label fix from feedback on commit `1859777`: `runs-on: rpm` doesn't ship the CUDA toolkit so cudarc's nvcc-version build script blew up. Switched to `runs-on: cuda-13.0` per the existing labels. ## Scope cuts (documented in the modules) - `previous_response_id` rejected at translate time with 400 (`code: chained_conversation_not_supported`) — stateful chained conversations need a persistence layer we haven't built. - Reasoning items dropped (no Qwen3 `<think>` routing yet). - Single output item per response (one `"message"` carrying text); `function_call` items reserved but not synthesised. - Streaming events cover the core set; `response.in_progress` and the web_search / image_generation event families are out-of-scope. 22 new tests: 5 in cortex-core (envelope round-trips), 13 in neuron::wire (request translator + projector + non-streaming builder), 4 in neuron's tests/api.rs (route surface — 503 when no candle, 400 on previous_response_id, 404 on missing model for both stream and non-stream). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 11:13:44 +03:00
rob thijssen	1859777332	ci: add cuda type-check job so CUDA-only borrowck errors fail fast Some checks failed build-prerelease / Resolve version stamps (push) Successful in 30s Details CI / Format (push) Successful in 37s Details CI / CUDA type-check (push) Failing after 3m8s Details CI / Clippy (push) Successful in 2m27s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m46s Details build-prerelease / Build cortex binary (push) Successful in 5m0s Details build-prerelease / Build neuron-ampere (push) Successful in 7m39s Details CI / Test (push) Successful in 5m37s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m33s Details build-prerelease / Build neuron-ada (push) Successful in 5m12s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m0s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m8s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m43s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m9s Details Run 244 caught a use-of-moved-value in a `#[cfg(feature = "cuda")]` block that the default-feature workspace clippy/test gate had no chance of seeing. The error appeared only when the RPM build workflow compiled with `--features cuda` — 30+ minutes after push. Add a `cuda-check` job to ci.yml that runs `cargo check -p neuron --features cuda --all-targets` on the rpm runner (where nvcc / cudarc build deps live; the generic `rust` runner doesn't have them). Borrow-check only — we never run tests here, the runner has no GPU. Same retry pattern as clippy/test. Both SRPM jobs (`srpm-cortex`, `srpm-neuron`) now gate on `cuda-check` so a CUDA build break can't reach the release pipeline. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 09:49:51 +03:00
rob thijssen	6927286cab	fix(neuron): clone id/model_id before TP spawn so wire projector can use them Some checks failed build-prerelease / Package helexa-neuron-ada RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-ampere RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Blocked by required conditions Details CI / Format (push) Successful in 39s Details build-prerelease / Resolve version stamps (push) Successful in 40s Details CI / Clippy (push) Successful in 2m34s Details CI / Test (push) Successful in 5m40s Details build-prerelease / Build cortex binary (push) Successful in 5m16s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m49s Details build-prerelease / Package cortex RPM (push) Successful in 1m25s Details build-prerelease / Build neuron-ampere (push) Successful in 7m38s Details build-prerelease / Build neuron-ada (push) Successful in 5m34s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details The Step 1 refactor moved the InferenceEvent receiver wrap to after the orchestration spawn in chat_completion_tp_stream, but the spawn moves both `id` and `model_id` into its async closure (used heavily by acquire_pool_lock, NCCL ops, and tracing). Result: borrowck error E0382 use-of-moved-value on the wire_chat::project_chat_stream call. The non-CUDA build doesn't exercise this branch (it lives behind `#[cfg(feature = "cuda")]`) which is why the workspace clippy/test gate passed locally and on the regular CI workflow. The RPM build workflow, which compiles with --features cuda, caught it (run 244 jobs 2/3/4 against beast / ampere / ada respectively, all the same error). Fix: snapshot `id` and `model_id` into `projector_id` / `projector_model_id` before the spawn, use those at the projector call site. The originals stay free to be moved into the closure. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 09:37:10 +03:00
rob thijssen	302ccfb982	refactor(neuron): introduce InferenceEvent + wire projection layer Some checks failed build-prerelease / Resolve version stamps (push) Successful in 31s Details CI / Format (push) Successful in 38s Details CI / Clippy (push) Successful in 3m28s Details build-prerelease / Build neuron-blackwell (push) Failing after 6m4s Details build-prerelease / Build neuron-ampere (push) Failing after 7m20s Details CI / Test (push) Successful in 7m29s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-ada (push) Failing after 4m57s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m19s Details build-prerelease / Package cortex RPM (push) Successful in 1m24s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped Details Step 1 of the OpenAI Responses API rollout. Pure refactor — no new endpoints, no behaviour change on the wire. Lays the seam for emitting Responses-shaped streaming events from the same harness output as chat completions in Step 2. - New `neuron::wire` module tree: - `wire::event::InferenceEvent` — format-agnostic enum (Start, TextDelta, ReasoningDelta, Finish) the candle harness now emits as its native streaming currency. - `wire::event::FinishReason` — typed reason that maps cleanly onto OpenAI `finish_reason`, OpenAI Responses `status`, and Anthropic `stop_reason` strings. - `wire::openai_chat::project_chat_stream` — async task that consumes an InferenceEvent receiver and produces a ChatCompletionChunk receiver, stamping per-request metadata (id, created, model_id) onto every chunk. Output matches the pre-refactor wire shape bit-for-bit. - candle.rs refactored to emit InferenceEvent on its internal channel through all three streaming paths (CPU run_inference_streaming, CUDA single-GPU stream_inference_via_worker, CUDA TP chat_completion_tp_stream). The streaming functions lost their id/created/model_id parameters since wire-format metadata now lives in the projector. - emit_delta + emit_delta_blocking simplified to single-purpose TextDelta emitters with no wire-format coupling. - chat_completion_stream wraps the InferenceEvent receiver in wire_chat::project_chat_stream before returning so the /v1/chat/completions HTTP handler keeps consuming ChatCompletionChunks unchanged. External signature preserved. Also fixes a pre-existing helexa-acp test race (three modules each declared their own static LOCK for HOME mutation, so cross-module parallelism flaked tests that read HOME at runtime). Consolidated onto a single crate-wide path_util::ENV_LOCK. 122 helexa-acp tests + 44 neuron tests pass (5 new wire projection tests). fmt + clippy --workspace -- -D warnings clean. Ran helexa-acp suite 3x to confirm the env race is closed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 11:30:17 +03:00
rob thijssen	df0abfe4d4	feat(helexa-acp): image input for vision-capable models All checks were successful build-prerelease / Resolve version stamps (push) Successful in 34s Details CI / Format (push) Successful in 37s Details CI / Clippy (push) Successful in 2m33s Details CI / Test (push) Successful in 5m4s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 6m2s Details build-prerelease / Build neuron-ampere (push) Successful in 7m49s Details build-prerelease / Build neuron-ada (push) Successful in 5m27s Details build-prerelease / Build cortex binary (push) Successful in 4m16s Details build-prerelease / Package cortex RPM (push) Successful in 1m19s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m2s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m10s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m47s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s Details Stage 5. Zed clipboard/DnD images get forwarded as OpenAI content-array messages on user turns. - New MessageContent::MultiPart variant + MessagePart (Text\|Image) + ImageData struct (mime_type, base64 data, optional uri). - flatten_prompt now produces structured content: collapses to Text when every block is text (some upstreams treat array-form as vision-only and refuse on text-only models), otherwise produces MultiPart preserving block order. - OpenAI encoder emits `[{type:"text",text:…}, {type:"image_url", image_url:{url:"data:{mime};base64,{data}"}}]` for MultiPart user messages. Data URIs are used over remote `uri` because they round-trip through every upstream we care about. - prompt_capabilities.image = true at initialize so Zed actually sends image blocks. - compaction estimates ~512 tokens per image (the middle of the Qwen3-VL / OpenAI detail range) so the budget tracker doesn't pretend images are free. - session/load replays image-bearing user turns by surfacing the text parts verbatim and rendering each image as a "[image: {mime} ({n} bytes)]" placeholder chunk — Zed can show the prior text context even though re-uploading the bytes through ACP isn't meaningful for resume. - 4 new tests: flatten produces MultiPart in block order, image-only prompts still flatten to MultiPart, encoder emits the correct array shape, text-only encoding stays as the string form. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 09:43:00 +03:00
rob thijssen	b9016571f6	feat(helexa-acp): expand ~ / $HOME and fall back to local fs on ACP read errors Some checks failed build-prerelease / Package helexa-neuron-ada RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-ampere RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Blocked by required conditions Details build-prerelease / Resolve version stamps (push) Successful in 44s Details CI / Format (push) Successful in 50s Details CI / Clippy (push) Successful in 2m34s Details build-prerelease / Build cortex binary (push) Successful in 4m29s Details CI / Test (push) Successful in 5m13s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m18s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m4s Details build-prerelease / Build neuron-ampere (push) Successful in 8m15s Details build-prerelease / Build neuron-ada (push) Successful in 5m23s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details Two related polish fixes for daily use: - New `path_util` module expands `~`, `~/…`, `$HOME`, and `$HOME/…` prefixes in every tool that takes a path (read_file, write_file, edit_file, list_dir, bash cwd). The expansion is also applied to the plan-mode write gate so `~/.local/share/helexa-acp/plans/…` comparisons behave correctly regardless of which form the model emits. - `read_file` now falls back to `std::fs::read_to_string` when ACP's `fs/read_text_file` errors out. Zed's workspace-scoped read was the source of "model can't see ~/git/architecture/generic.md" when the session cwd is a different project; the fallback lets the agent pull in shared material that lives outside the active workspace, the same way `list_dir` already does via local `std::fs::read_dir`. Local fallback honours line/limit args. The fallback also produces a combined error message when both ACP and local-fs reads fail, so the model sees what actually broke rather than just the ACP-side error. 14 new unit tests cover path_util's prefix matrix, fallback success/failure paths, and the line/limit slicing in fallback. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 09:28:58 +03:00
rob thijssen	adbc52bfcd	feat(helexa-acp): model picker + session/set_model handler All checks were successful build-prerelease / Resolve version stamps (push) Successful in 37s Details CI / Format (push) Successful in 41s Details CI / Clippy (push) Successful in 2m32s Details build-prerelease / Build cortex binary (push) Successful in 4m45s Details CI / Test (push) Successful in 5m52s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m59s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-ampere (push) Successful in 7m21s Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ada (push) Successful in 4m54s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m54s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m58s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m48s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s Details Stage 4. Zed's model dropdown now lists every model from every configured endpoint, and switching it routes the next prompt to a new endpoint+model. - Enable `unstable_session_model` on the agent-client-protocol dep so SessionModelState / SetSessionModelRequest / ModelInfo are available. - Agent::new becomes async and calls Provider::list_models on every provider at startup; per-endpoint failures warn-and-skip instead of aborting the agent. - With a single endpoint configured, model ids appear bare; with multiple endpoints every id carries the `endpoint:` prefix so the picker is unambiguous and parse_model_selector routes correctly. - NewSessionResponse and LoadSessionResponse attach SessionModelState with the session's current model id + the aggregated catalogue. - session/set_model: validates the requested model id against resolve_provider, mutates session.model_id, and persists so the on-disk transcript reflects the new model. Three new aggregate_models tests cover the prefixing rule (bare vs multi-endpoint) and warn-and-skip on a failing endpoint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 09:10:16 +03:00
rob thijssen	537a0fe7f2	feat(helexa-acp): context compaction for small-context local models All checks were successful build-prerelease / Resolve version stamps (push) Successful in 26s Details CI / Format (push) Successful in 29s Details CI / Clippy (push) Successful in 2m26s Details build-prerelease / Build cortex binary (push) Successful in 5m17s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m51s Details CI / Test (push) Successful in 5m53s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ampere (push) Successful in 7m58s Details build-prerelease / Build neuron-ada (push) Successful in 5m30s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m57s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m7s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m40s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m0s Details A new src/compaction.rs module projects rolling conversation history into a token budget before each completion. Older tool results and assistant prose get elided to one-line markers; system prompts, user turns, and the last KEEP_TAIL=4 messages stay verbatim. tool_call_id pairing is preserved so OpenAI strict-schema providers keep working. Driven by a new per-endpoint `context_window` config field (also HELEXA_ACP_CONTEXT_WINDOW for the env-only single-endpoint case). When set, prompt budget = context_window - max_tokens - 512_safety; when unset, behaviour is unchanged. Without this, a 32 K Qwen3 dies with `prompt_too_long` after the first few read_file results pile up in history — the symptom seen in plan-mode dogfooding on beat. 10 new unit tests cover the compaction strategy and the prompt budget arithmetic. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 08:22:01 +03:00
rob thijssen	cbadfcf112	feat(helexa-acp): plan mode — third session mode for read-and-plan-only flows Some checks failed build-prerelease / Package helexa-neuron-ada RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-ampere RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Blocked by required conditions Details build-prerelease / Resolve version stamps (push) Successful in 37s Details CI / Format (push) Successful in 36s Details CI / Clippy (push) Successful in 2m44s Details CI / Test (push) Successful in 5m3s Details build-prerelease / Build cortex binary (push) Successful in 4m36s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m27s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m37s Details build-prerelease / Build neuron-ampere (push) Successful in 8m12s Details build-prerelease / Build neuron-ada (push) Successful in 5m32s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details Plan mode is the most restrictive of the three session modes: bash is disabled outright, writes are confined to a per-project plan directory under $XDG_DATA_HOME/helexa-acp/plans/<basename>-<8hex>/, and reads / list_dir are unrestricted. The system prompt is rebuilt at the top of every round so a mid-turn switch into (or out of) plan mode takes effect on the next streaming round, and plan mode appends a 3-option menu instructing the model to stop and let the user pick how to proceed once the plan is complete. The project id is basename + FNV-1a-32 of the cwd so it stays stable across runs (SipHash's DefaultHasher reseeds per process), while still disambiguating multiple checkouts that share a final path component. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 08:06:25 +03:00
rob thijssen	3ecbb21ece	fix(helexa-acp): persist per round, cancel previous prompt, log loop All checks were successful build-prerelease / Resolve version stamps (push) Successful in 34s Details CI / Format (push) Successful in 35s Details CI / Clippy (push) Successful in 2m32s Details CI / Test (push) Successful in 5m8s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 6m4s Details build-prerelease / Build neuron-ampere (push) Successful in 8m13s Details build-prerelease / Build neuron-ada (push) Successful in 5m18s Details build-prerelease / Build cortex binary (push) Successful in 16m12s Details build-prerelease / Package cortex RPM (push) Successful in 1m15s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m57s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m2s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m39s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s Details Three changes addressing "session stops mid-turn and disk store doesn't update": 1. Per-round persistence. drive_prompt previously called store::save() once at the very end of the turn. If the loop stalled in a later round (long-running bash, upstream SSE that never finished, wedged ACP roundtrip), earlier successful rounds lived only in the spawned task's `new_turns` and never reached disk. Move the extend-history + save into a helper (extend_and_persist) and call it at the end of every loop iteration. The post-loop save catches whatever the break paths leave behind. Failure is logged not propagated. 2. Cancel previous in-flight prompt on new session/prompt. The handler used to overwrite SessionState.cancel with a fresh token without firing the old one. A wedged prior prompt would then live forever, holding session-state references and never persisting. Now we fire the existing cancel under the lock before installing the new token — the old task observes is_cancelled() on its next .await and unwinds. 3. Per-round and per-tool log lines. drive_prompt now emits: - INFO prompt round: streaming { round, of, history_turns } - INFO dispatch tool { tool, tool_call_id } - INFO dispatch tool complete { tool_call_id, is_error } - INFO prompt round complete; persisting { round, turns } - INFO prompt complete { stop_reason } so the next hang shows up by line number in /tmp/helexa-acp.log instead of as silence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 16:29:22 +03:00
rob thijssen	0d841a4981	feat(helexa-acp): replay session history on session/load Some checks failed CI / Format (push) Successful in 31s Details build-prerelease / Resolve version stamps (push) Successful in 48s Details CI / Test (push) Failing after 1m19s Details CI / Clippy (push) Successful in 2m56s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m17s Details build-prerelease / Package cortex RPM (push) Successful in 1m26s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m52s Details build-prerelease / Build neuron-ampere (push) Successful in 7m49s Details build-prerelease / Build neuron-ada (push) Successful in 5m8s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m57s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m0s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s Details session/list and session/load were both implemented but clicking a session in Zed's thread picker still left the agent panel empty. Zed (and ACP clients in general) doesn't cache the transcript for custom agent_servers entries — it only owns conversation state for first-party agents. For custom agents the expectation is that session/load returns successfully and the agent then re-emits the conversation as a stream of session/update notifications so the client can rebuild its view. Implement that replay path: - handle_load_session now returns (LoadSessionResponse, Vec<Message>) so the caller has the history available after the in-memory hydration finishes. - The session/load closure responds to the request first, then spawns a task that calls replay_history off the dispatch loop. - replay_history walks the persisted history and emits one session/update per turn: Role::User → UserMessageChunk(text) Role::Assistant text → AgentMessageChunk(text) Role::Assistant tool → AgentMessageChunk for any accompanying text + one ToolCall card per call (with kind/title/raw_input rendered the same way as the live dispatch path) Role::Tool result → ToolCallUpdate matching the assistant's call id, status: Completed, content set to the result text Role::System → skipped (system prompts aren't shown) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 16:02:00 +03:00
rob thijssen	0bbb9b752d	feat(helexa-acp): session/list so Zed can discover sessions to resume All checks were successful build-prerelease / Resolve version stamps (push) Successful in 28s Details CI / Format (push) Successful in 28s Details CI / Clippy (push) Successful in 2m45s Details build-prerelease / Build cortex binary (push) Successful in 4m41s Details CI / Test (push) Successful in 4m58s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m4s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ampere (push) Successful in 7m36s Details build-prerelease / Build neuron-ada (push) Successful in 5m40s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m57s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m3s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m40s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s Details Stage 3b only implemented the trailing half of resume: write sessions to disk + handle session/load. But Zed (and any ACP client) needs `session/list` to discover which session belongs to the workspace it's reopening — without it, the client only knows how to mint new sessions and resume never fires even though the JSON sits ready on disk. Add the missing pieces: - store::list / list_in_dir — enumerate {id}.json under sessions_dir(), optionally filter by cwd, sort recent-first. Skips unparseable files with a warn rather than aborting. - store::unix_to_iso8601 — RFC 3339 formatter for SessionInfo.updated_at; pulls chrono in directly (already in the dep tree transitively). - agent::handle_list_sessions — wires the request to the store, builds SessionInfo entries with derived titles (first user turn, truncated to 60 chars). - agent::initialize_response — advertise session_capabilities.list = {} alongside the existing load_session: true. Verified end-to-end against the user's real hxa-1.json (60-turn beat conversation): `session/list` returns the entry with cwd, derived title, and ISO 8601 timestamp. 4 new store unit tests for list filtering, missing-dir handling, unparseable-file skipping, and ISO 8601 formatting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 14:34:41 +03:00
rob thijssen	5aac1ffc59	feat(helexa-acp): session resume via session/load All checks were successful CI / Format (push) Successful in 31s Details build-prerelease / Resolve version stamps (push) Successful in 40s Details CI / Clippy (push) Successful in 2m37s Details CI / Test (push) Successful in 4m59s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m35s Details build-prerelease / Package cortex RPM (push) Successful in 1m19s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m4s Details build-prerelease / Build neuron-ampere (push) Successful in 7m45s Details build-prerelease / Build neuron-ada (push) Successful in 5m31s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m53s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m0s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m43s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s Details Zed restarts (frequent during helexa-acp dogfooding) used to lose every conversation because we'd ignore the load_session capability and treat every project-reopen as a fresh session/new. Persist sessions to disk and honour session/load so the agent panel comes back where it left off. Storage layout: $XDG_DATA_HOME/helexa-acp/sessions/{session_id}.json Each file holds session_id, cwd, model_id, mode_id, full Message history, plus created/updated timestamps. Atomic save via tempfile+rename so a crash mid-write can't corrupt the store. Touch points: - src/store.rs (new) — sessions_dir() resolution, save/load via default and explicit-dir entry points (so unit tests don't have to race on XDG_DATA_HOME). 5 unit tests cover round-trip, not-found errors, atomic overwrite, tool-call/result preservation, and the filename sanitiser's path-traversal handling. - src/provider/mod.rs — Serialize/Deserialize on Role, Message, MessageContent, ToolCall. MessageContent::Text turned into a struct variant ({text: ...}) so internally-tagged JSON works. - src/agent.rs — initialize_response advertises load_session: true; handle_load_session reads the file, snapshots in-memory state, returns LoadSessionResponse with the persisted mode preselected; drive_prompt persists at the end of every prompt round under the session lock with the I/O outside the lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 13:34:42 +03:00
rob thijssen	ec2b6450b2	feat(helexa-acp): infer tool name from arg shape when model omits it Some checks are pending build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Blocked by required conditions Details build-prerelease / Resolve version stamps (push) Successful in 33s Details CI / Format (push) Successful in 36s Details CI / Clippy (push) Successful in 2m33s Details build-prerelease / Build cortex binary (push) Successful in 4m20s Details CI / Test (push) Successful in 5m4s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m40s Details build-prerelease / Build neuron-ampere (push) Successful in 7m53s Details build-prerelease / Build neuron-ada (push) Successful in 5m33s Details build-prerelease / Package cortex RPM (push) Successful in 8m20s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m56s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m57s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m46s Details Qwen3.6-27B occasionally emits a <tool_call> body with the right arguments but no top-level `name` field — observed in the field as mkdir-style bash calls like {"arguments":{"command":"mkdir -p .../doc/plan/{01-discovery,...}"}} with no `name`. The agent had no tool to dispatch and surfaced a Failed card; the model would then hang or retry the same shape. Add a shape-based inference layer: - tools::infer_tool_name(arguments) — given an `arguments` object alone, return Some(name) when the key set uniquely identifies one tool: `{command}` or `{command,cwd}` → bash, `{path,content}` → write_file, `{path,old_text,new_text}` → edit_file. Ambiguous shapes (`{path}` alone — could be read_file or list_dir) return None so the agent still emits a Failed card rather than guessing. - agent::try_repair_missing_name(raw) — parses a malformed body, applies infer_tool_name, returns (name, args_json) on success. - drive_prompt sweeps malformed_calls through this repair before the Failed-card path. Recovered calls go into tool_buckets at the next free index and dispatch through the normal tool loop. 10 new unit tests in tools::tests cover the inference table plus the verbatim mkdir failure from the field log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 13:14:50 +03:00
rob thijssen	a494c8d43c	feat(helexa-acp): repair malformed tool calls and render failures as cards Some checks failed build-prerelease / Package helexa-neuron-blackwell RPM (push) Blocked by required conditions Details build-prerelease / Resolve version stamps (push) Successful in 28s Details CI / Format (push) Successful in 4m7s Details CI / Test (push) Failing after 1m2s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m10s Details CI / Clippy (push) Successful in 2m37s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m24s Details build-prerelease / Build neuron-ampere (push) Successful in 8m18s Details build-prerelease / Package cortex RPM (push) Successful in 1m22s Details build-prerelease / Build neuron-ada (push) Successful in 5m23s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m54s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m56s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details Two related fixes for cases where Qwen3 sometimes emits slightly-off JSON inside <tool_call> blocks: 1. JSON repair pass in qwen3::parse_tool_call_body — strip up to three trailing extra `}` characters (model overshoots its closing braces), and hoist `name` out of `arguments` when it lands nested instead of as a sibling. Both observed in the field; both trivially repairable; both now dispatch as normal tool calls instead of falling back to the malformed path. 2. New CompletionEvent::MalformedToolCall variant for the cases repair can't fix. decode_stream now emits it instead of wrapping the raw body in a TextDelta, and agent.rs surfaces each one as a Failed SessionUpdate::ToolCall card (so Zed renders it as a structured failure UI element rather than dumping the body inline) plus a synthetic tool-call/tool-result history pair so the model gets clear feedback for self-correction on the next round. Empty <tool_call></tool_call> blocks are now a no-op too (no Malformed event), matching the existing empty-<think> behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 12:58:51 +03:00
rob thijssen	abbedf8d8a	chore(neuron): bump default max_tokens from 512 to 8192 All checks were successful build-prerelease / Resolve version stamps (push) Successful in 44s Details CI / Format (push) Successful in 45s Details CI / Clippy (push) Successful in 2m41s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m35s Details build-prerelease / Build cortex binary (push) Successful in 4m32s Details CI / Test (push) Successful in 5m29s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m20s Details build-prerelease / Build neuron-ampere (push) Successful in 8m6s Details build-prerelease / Build neuron-ada (push) Successful in 5m19s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m55s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m57s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s Details 512 is too low for any modern coding model — clients that don't explicitly set max_tokens get clipped responses with no diagnostic. Bump the fallback at all four inference call sites (single-GPU streaming + non-streaming, TP leader + non-leader) to 8192, which fits comfortably within Qwen3-class context windows after a typical agent prompt and lines up with what helexa-acp / a0 / curl clients reasonably expect. Clients that explicitly set max_tokens (now including helexa-acp via HELEXA_ACP_MAX_TOKENS / per-endpoint TOML) override this. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 12:38:28 +03:00
rob thijssen	6cc14e925c	feat(helexa-acp): per-endpoint max_tokens config Some checks failed CI / Format (push) Successful in 34s Details build-prerelease / Resolve version stamps (push) Successful in 35s Details CI / Clippy (push) Failing after 1m3s Details CI / Test (push) Failing after 1m4s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-blackwell (push) Has been cancelled Details The agent was sending max_tokens: None, letting cortex/neuron pick its own default — which trips Zed's "Output Limit Reached" on long turns. Add a per-endpoint max_tokens option in EndpointConfig (TOML key and HELEXA_ACP_MAX_TOKENS env var for the single-endpoint fallback) that the agent threads into every CompletionRequest by endpoint name. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 12:34:23 +03:00
rob thijssen	1c16732668	feat(helexa-acp): route Qwen3 inline <think> blocks to reasoning Some checks failed build-prerelease / Build cortex binary (push) Blocked by required conditions Details CI / Test (push) Waiting to run Details CI / Format (push) Successful in 26s Details build-prerelease / Resolve version stamps (push) Successful in 30s Details CI / Clippy (push) Successful in 2m40s Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-blackwell (push) Has been cancelled Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details Qwen3 emits chain-of-thought as literal <think>...</think> tags inside delta.content rather than via the separate reasoning_content field — so without parsing the markers, the thinking shows up in the message pane as ordinary text. Add a small ThinkParser in qwen3.rs (same chunk-boundary discipline as ToolCallParser) and stage it after the tool-call parser in decode_stream: text events from the tool-call parser are fed in and split into TextDelta / ReasoningDelta. Zed now renders thinking in its dedicated thought UI; visible answer text stays in the message pane. The parking-lot entry from the plan is now closed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 12:30:25 +03:00
rob thijssen	5a0861d639	fix(helexa-acp): forward Dispatch::Response to its awaiting router Some checks failed build-prerelease / Package helexa-neuron-ada RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-ampere RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Blocked by required conditions Details build-prerelease / Resolve version stamps (push) Successful in 39s Details CI / Format (push) Successful in 41s Details CI / Clippy (push) Successful in 2m31s Details build-prerelease / Build cortex binary (push) Successful in 4m36s Details CI / Test (push) Successful in 5m31s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 5m51s Details build-prerelease / Package cortex RPM (push) Successful in 1m29s Details build-prerelease / Build neuron-ampere (push) Successful in 7m18s Details build-prerelease / Build neuron-ada (push) Successful in 5m6s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details The catch-all on_receive_dispatch handler was applying respond_with_error to every Dispatch variant, including Response. For Response variants, that call routes the error to the ResponseRouter for the outgoing request — silently overwriting the real reply from Zed with "Internal error: not implemented yet". Every ACP roundtrip we issue (fs/read_text_file, fs/write_text_file, session/request_permission, terminal/*) was therefore returning an error to the tool runner regardless of what Zed actually responded. The model saw uniformly-failing tools, gave up, and confabulated plausible explanations. Fix: pattern-match the Dispatch. Response → forward to its router via respond_with_result. Request / Notification → keep the "not implemented yet" error response as before. Found via debug logs showing WARN helexa_acp::agent: unhandled ACP message method="fs/read_text_file" right before every tool failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 12:16:21 +03:00
rob thijssen	33652ac651	feat(helexa-acp): HELEXA_ACP_LOG_FILE env for editor-host logging All checks were successful build-prerelease / Resolve version stamps (push) Successful in 37s Details CI / Format (push) Successful in 37s Details CI / Clippy (push) Successful in 2m44s Details CI / Test (push) Successful in 5m3s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m36s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m1s Details build-prerelease / Package cortex RPM (push) Successful in 1m22s Details build-prerelease / Build neuron-ampere (push) Successful in 8m23s Details build-prerelease / Build neuron-ada (push) Successful in 5m26s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m57s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m48s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 6m43s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 59s Details Editors that launch ACP agents (Zed today) don't reliably surface the child's stderr — and `args` in an `agent_servers` config is exec-args, not shell, so the usual `&>>` redirect trick doesn't work. Add a HELEXA_ACP_LOG_FILE env var that, when set to an absolute path, routes the tracing subscriber to append-write that file (ANSI off) instead of stderr. RUST_LOG still controls levels. Unopenable paths fall back to stderr with a warning so a typo doesn't silence the agent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 11:47:28 +03:00

1 2 3 4 5

209 Commits