diff --git a/crates/helexa-acp/README.md b/crates/helexa-acp/README.md new file mode 100644 index 0000000..29350cc --- /dev/null +++ b/crates/helexa-acp/README.md @@ -0,0 +1,546 @@ +# helexa-acp + +ACP (Agent Client Protocol) bridge for editors like +[Zed](https://zed.dev). Lets you point your editor's agent panel at +**any combination** of OpenAI-compatible, OpenAI Responses, and +Anthropic Messages endpoints — public APIs, private LAN deployments, +local Ollama / LM Studio — and switch between them per session via a +model dropdown. + +The "missing ACP binary" for users who don't want to be locked into +one vendor's agent client. + +``` + ┌───────────────────────────────────┐ + │ Zed (or any ACP editor client) │ + └────────────┬──────────────────────┘ + │ stdio JSON-RPC (ACP) + ▼ + ┌─────────────────┐ + │ helexa-acp │ ← one binary, multi-endpoint + └─────┬───────────┘ + │ HTTP / SSE + ┌────────┼─────────────┬──────────────┬──────────────┐ + ▼ ▼ ▼ ▼ ▼ + cortex/ OpenAI Anthropic OpenRouter LM Studio + neuron Responses Messages + (self- (gpt-5,…) (Claude) + hosted) +``` + +## What it does + +- **Speaks ACP** over stdio to editor clients (Zed today; any future + ACP client tomorrow). +- **Multi-endpoint** — one config file lists every LLM endpoint + you want available; pick one per session via the model dropdown + (`endpoint:model` selector). +- **Three wire formats**: `openai-chat` (the broadly compatible + default), `openai-responses` (newer OpenAI surface), and + `anthropic-messages` (Claude). Each is a separate provider impl + in `src/provider/`; adding a fourth (Gemini, Ollama native, …) is + one file plus a `WireApi` enum variant. +- **Built-in tools**: `read_file`, `write_file`, `edit_file`, + `list_dir`, `bash`. Permission-gated by default; the editor user + approves writes/shell per-call. +- **Three session modes**: Default (gated), Bypass Permissions + (auto-allow), and Plan (write-only-to-plan-dir, no shell). +- **Vision** — drag-drop images into the agent panel against any + vision-capable model. +- **Session resume** — multi-day conversations survive editor + restarts via on-disk transcript persistence. +- **Context compaction** — rolling history stays inside the model's + context window automatically so long sessions on small-context + local models don't fall over. + +## Install + +### From source + +```sh +git clone https://git.lair.cafe/helexa/cortex.git +cd cortex +cargo install --path crates/helexa-acp +# Binary lands at ~/.cargo/bin/helexa-acp +``` + +### Pre-built RPM (Fedora 43) + +```sh +dnf copr enable helexa/helexa +dnf install helexa-acp +``` + +The COPR project bundles helexa-acp alongside the cortex gateway +and helexa-neuron flavours; install only the package(s) you need. + +## Quick start + +The fastest path: env-var single-endpoint config. + +```sh +export HELEXA_ACP_BASE_URL=http://hanzalova.internal:31313/v1 +export HELEXA_ACP_MODEL=Qwen/Qwen3.6-27B +helexa-acp # speaks ACP over stdin/stdout; not interactive +``` + +Then in Zed (`~/.config/zed/settings.json`): + +```jsonc +{ + "agent_servers": { + "helexa": { + "command": "helexa-acp", + "args": [] + } + } +} +``` + +Restart Zed → open the agent panel → pick "helexa" → start +chatting. Tool calls (file reads, writes, bash) prompt for +permission per-call in Default mode. + +That's the minimum. The full config story below is what unlocks +the multi-endpoint dropdown. + +## Multi-endpoint config + +Copy `helexa-acp.example.toml` from this repo to +`$XDG_CONFIG_HOME/helexa-acp/config.toml` (typically +`~/.config/helexa-acp/config.toml`) and edit: + +```toml +default_endpoint = "helexa" + +[[endpoints]] +name = "helexa" +base_url = "http://hanzalova.internal:31313/v1" +wire_api = "openai-chat" +default_model = "Qwen/Qwen3.6-27B" +max_tokens = 8192 +context_window = 32768 + +[[endpoints]] +name = "openrouter" +base_url = "https://openrouter.ai/api/v1" +wire_api = "openai-chat" +api_key_env = "OPENROUTER_API_KEY" +default_model = "anthropic/claude-opus-4" + +[[endpoints]] +name = "anthropic" +base_url = "https://api.anthropic.com/v1" +wire_api = "anthropic-messages" +api_key_env = "ANTHROPIC_API_KEY" +default_model = "claude-opus-4" +``` + +Restart Zed. The model dropdown lists every model from every +configured endpoint with the `endpoint:model` selector +(`helexa:Qwen/Qwen3.6-27B`, `openrouter:anthropic/claude-opus-4`, +…). Switch mid-session; the next prompt routes to the new endpoint. + +When only one endpoint is configured the prefix is dropped (model +ids appear bare). + +### Selector syntax + +The `model` field on every internal request is parsed as +`:`: + +- `openrouter:gpt-4o` → routes to the `openrouter` endpoint, + model `gpt-4o`. +- `helexa/large` → no colon → falls through to whichever endpoint + is named in `default_endpoint`, model `helexa/large`. +- `:gpt-5` → leading colon → also falls through to default. + +## Endpoint cookbook + +Copy-pasteable blocks. Mix and match. + +### cortex / neuron (self-hosted) + +```toml +[[endpoints]] +name = "helexa" +base_url = "http://hanzalova.internal:31313/v1" +wire_api = "openai-chat" +default_model = "Qwen/Qwen3.6-27B" +max_tokens = 8192 +context_window = 32768 +``` + +Use `openai-responses` instead of `openai-chat` once cortex 0.1.16+ +is deployed and you want the Responses API surface (vision item +shape, structured reasoning items, etc.). + +### OpenAI directly + +```toml +[[endpoints]] +name = "openai" +base_url = "https://api.openai.com/v1" +wire_api = "openai-responses" +api_key_env = "OPENAI_API_KEY" +default_model = "gpt-5" +``` + +`openai-responses` is the right choice for current OpenAI models; +`openai-chat` works against legacy GPT-3.5/4 deployments and +anything labelled "chat completions". + +### Anthropic directly + +```toml +[[endpoints]] +name = "anthropic" +base_url = "https://api.anthropic.com/v1" +wire_api = "anthropic-messages" +api_key_env = "ANTHROPIC_API_KEY" +default_model = "claude-opus-4" +``` + +helexa-acp sends `x-api-key` + `anthropic-version: 2023-06-01` +automatically. The `api_key_env` indirection keeps your key out of +the config file. + +### OpenRouter (multi-vendor proxy) + +```toml +[[endpoints]] +name = "openrouter" +base_url = "https://openrouter.ai/api/v1" +wire_api = "openai-chat" +api_key_env = "OPENROUTER_API_KEY" +default_model = "anthropic/claude-opus-4" +``` + +OpenRouter speaks OpenAI-compat for every model it fronts, so +`openai-chat` is the right wire format regardless of the +underlying vendor. + +### LM Studio (local) + +```toml +[[endpoints]] +name = "lmstudio" +base_url = "http://localhost:1234/v1" +wire_api = "openai-chat" +default_model = "auto" +``` + +LM Studio's "auto" model id picks whatever's loaded. Same shape +works for Ollama in compat mode (`http://localhost:11434/v1`) and +vLLM. + +### Multiple cortex deployments + +```toml +[[endpoints]] +name = "lan" +base_url = "http://hanzalova.internal:31313/v1" +wire_api = "openai-chat" +default_model = "Qwen/Qwen3.6-27B" + +[[endpoints]] +name = "cloud" +base_url = "https://cortex.example.com/v1" +wire_api = "openai-chat" +api_key_env = "CLOUD_CORTEX_KEY" +default_model = "Qwen/Qwen3-VL-8B" +``` + +Use the `endpoint:model` selector to switch between them mid-session. + +## Zed setup + +`~/.config/zed/settings.json`: + +```jsonc +{ + "agent_servers": { + "helexa": { + "command": "helexa-acp" + } + } +} +``` + +Optional environment overrides for the binary: + +```jsonc +{ + "agent_servers": { + "helexa": { + "command": "helexa-acp", + "env": { + "HELEXA_ACP_LOG_FILE": "/tmp/helexa-acp.log", + "RUST_LOG": "helexa_acp=debug" + } + } + } +} +``` + +`HELEXA_ACP_LOG_FILE` is the one you actually want — Zed doesn't +surface the agent's stderr, so without that env var debug output is +invisible. Point it at a file you can `tail -f`. + +After restarting Zed: ⌘+? (or wherever your "Open Agent Panel" +binding is) → select "helexa" → the model dropdown populates from +your config → start prompting. + +## Modes + +Three session modes ship; the user picks via Zed's mode dropdown +on the agent panel. + +| Mode | Reads | Writes | Bash | Permission prompts | +|------|-------|--------|------|--------------------| +| **Default** | ✓ | with prompt | with prompt | per call | +| **Bypass Permissions** | ✓ | ✓ | ✓ | never | +| **Plan** | ✓ | only into plan dir | disabled | never (plan-dir writes auto-allow) | + +### Default + +Reads are always allowed (`read_file`, `list_dir` are +unrestricted). Writes and shell commands prompt the user before +running. The intended baseline for any session where the agent +might do something you'd rather review first. + +### Bypass Permissions + +Auto-allow every tool call. Use for agentic loops you trust — bulk +edits across many files, scripted workflows, prepared session +templates. Never for code the agent hasn't seen before. + +### Plan + +The "draft an implementation plan before you write code" mode. +Available tools: + +- `read_file`, `list_dir`: unrestricted (read the codebase). +- `write_file`, `edit_file`: allowed *only* under + `$XDG_DATA_HOME/helexa-acp/plans//`. Any path + outside that returns "plan mode: writes are restricted to …" + back to the model so it self-corrects. +- `bash`: disabled outright. Returns "plan mode: shell execution + is disabled" if attempted. + +When the plan is complete, the model presents a 3-option menu: + +1. **Bypass Permissions** — implement the plan now, no prompts. +2. **Default** — implement now with per-tool prompts. +3. **Plan** (stay here) — refine the plan with more guidance. + +Switch the mode dropdown to your preference and reply to proceed. + +## Tools + +Five tools, defined in `src/tools.rs`: + +| Tool | Args | Gated in Default? | +|------|------|-------------------| +| `read_file` | `path`, `line?`, `limit?` | no | +| `list_dir` | `path` | no | +| `write_file` | `path`, `content` | yes | +| `edit_file` | `path`, `old_text`, `new_text` | yes | +| `bash` | `command`, `cwd?` | yes | + +### Path handling + +`~`, `~/`, `$HOME`, and `$HOME/` are expanded server-side before +the path reaches ACP or local fs. Lets the model emit +`~/git/repo/file.rs` and have it Just Work. + +`read_file` first tries the editor's filesystem (ACP's +`fs/read_text_file` — respects open buffers, workspace overlays, +etc.). If that fails — typically because the path is outside Zed's +workspace boundary — it falls back to `std::fs::read_to_string`. +This lets the agent pull in shared material like +`~/git/architecture/generic.md` from a different project's +session. + +The fallback is logged at warn level so you can see when it kicks +in. + +### Tool dispatch + +Tool descriptions reach the model through a Qwen3 Hermes-format +`# Tools` block injected into the system prompt — cortex/neuron +pass the OpenAI `tools` request field through to the encoder +unread, so we work the model into emitting `{json}` +markers it then parses out of the content stream. This applies to +the helexa wire format; OpenAI / Anthropic endpoints with native +tool support would use their own paths once they're wired in. + +The parser is tolerant: malformed JSON (trailing braces, missing +`name`, name nested in `arguments`) gets a repair pass; if that +fails the call surfaces as a "Malformed tool call" card in Zed and +the model gets a synthetic error result so it can self-correct. + +## Session resume + +helexa-acp persists every session to +`$XDG_DATA_HOME/helexa-acp/sessions/.json`. Zed's `session/list` +RPC asks helexa-acp to enumerate them on workspace open; +`session/load` rehydrates and replays the transcript as +`session/update` notifications so the agent panel renders the +prior conversation. + +Behaviour: + +- Persisted per-round, so a mid-turn agent stall (long bash, wedged + ACP roundtrip) doesn't lose earlier rounds. +- Survives editor restart and the helexa-acp binary upgrading + between versions. +- Project-scoped: only sessions whose `cwd` matches the workspace + are listed. + +To wipe history: `rm -rf $XDG_DATA_HOME/helexa-acp/sessions/`. + +## Context compaction + +When an endpoint sets `context_window`, helexa-acp projects the +rolling history into a token budget before each request — old +`ToolResult` content (read_file payloads are the worst offenders) +gets elided to one-line markers, preserving `tool_call_id` pairing +so the wire schema stays valid. + +System prompts, user turns, and the most recent ~4 messages are +never elided. The full history stays on disk; compaction is a +per-request projection, not a destructive edit. + +Set `context_window = 32768` for a 32 K Qwen3, `131072` for a +modern Claude, etc. With `max_tokens` also set, the budget is +`context_window - max_tokens - 512_safety`. + +## Troubleshooting + +### "default endpoint 'helexa' has no usable provider — check config" + +The named default endpoint failed to construct. Usually: + +- `api_key_env` references a variable that isn't set in the env + Zed launched helexa-acp with. +- The TOML's `wire_api` is misspelled (only `openai-chat`, + `openai-responses`, `anthropic-messages` are accepted). + +Test by running `helexa-acp` directly from a shell — startup +errors land on stderr. + +### Model dropdown is empty + +Each provider's `list_models` failed at startup. Look at +`HELEXA_ACP_LOG_FILE` for "list_models failed; this endpoint's +models won't appear in the picker". Likely the endpoint URL is +wrong, the API key is invalid, or the upstream `/v1/models` +endpoint isn't responding. + +The agent still works against `default_model` even when the +dropdown is empty — list-models is for picking, not routing. + +### "prompt_too_long" / agent stalls mid-conversation + +You hit the model's context window. Set `context_window` on the +endpoint and helexa-acp will compact before sending. The log line +`context compaction applied` confirms it's running; if it fires +but the upstream still rejects, the compaction heuristic +under-counted and the budget needs tuning down. + +### Reading files outside the workspace returns "not found" + +Zed's `fs/read_text_file` is workspace-scoped. helexa-acp falls +back to local `std::fs` automatically when that fails — look for +`fs/read_text_file failed; falling back to local std::fs` in the +log. If even local read fails, the file genuinely doesn't exist +or the user process lacks permissions. + +### Tool calls render as text instead of structured cards + +The model is emitting `` markers that the parser can't +decode. Two common causes: + +1. The system prompt isn't reaching the model (cortex/neuron's + tool-block injection didn't fire). Confirm with + `RUST_LOG=helexa_acp=debug` and look at the outgoing + `POST /chat/completions` body. +2. The model itself is too small / undertrained to follow the + Hermes format reliably. helexa-acp has shape-based name + inference and JSON repair, but there's a floor below which + nothing helps. + +### Plan-mode writes refused even inside the plan dir + +The path comparison is byte-for-byte. If the model emits a path +with `~` and the plan_dir has the expanded form, expansion runs +*before* the comparison — but resolved-vs-symlinked-path +mismatches can still bite. The error message names the attempted +path and the expected prefix so you can compare directly. + +## Architecture + +Source layout under `crates/helexa-acp/src/`: + +| File | Responsibility | +|------|----------------| +| `main.rs` | tokio + Stdio transport. Builds providers, hands off to `agent::Agent` | +| `config.rs` | TOML + env-fallback config, endpoint resolver | +| `agent.rs` | ACP handlers (initialize, session/new, session/prompt, session/cancel, session/set_mode, session/set_model, session/load, session/list), prompt loop with tool-call recursion | +| `session.rs` | Per-session state map (Arc>>) | +| `store.rs` | On-disk session persistence, plan-dir resolution | +| `prompt.rs` | System-prompt assembly, plan-mode addendum | +| `tools.rs` | Tool schemas + shape-based name inference | +| `tool_runner.rs` | Dispatch a single tool call through ACP client RPCs; permission gate | +| `qwen3.rs` | Qwen3 Hermes tool-format parser (`` / `` markers) | +| `compaction.rs` | Token-budget compaction for the rolling history | +| `path_util.rs` | `~` / `$HOME` expansion shared across every path-taking tool | +| `provider/openai_chat.rs` | OpenAI chat completions provider | +| `provider/openai_responses.rs` | OpenAI Responses API provider | +| `provider/anthropic_messages.rs` | Anthropic Messages API provider | + +### Adding a new wire format + +1. New file under `src/provider/` implementing the `Provider` + trait (encoder + SSE decoder). +2. Add a `WireApi` variant in `config.rs`. +3. Wire it into `build_provider` in `main.rs`. +4. Done — every other module is wire-format-agnostic. + +### Concurrency + +- `Arc>>>>` — + per-session mutex so concurrent requests across sessions don't + contend; the map's RwLock is read-mostly. +- Every tool call dispatched serially within a session (parallel + dispatch would require Zed to handle interleaved permission + prompts). +- Provider streams are back-pressured by the consumer (bounded + mpsc channels). + +### Self-contained + +The crate has no workspace-internal dependencies (no +`cortex-core`, no `cortex-gateway`). Migration to a dedicated +GitHub repo for cross-platform CI / cargo-dist binaries is +Cargo.toml-only. + +## Status + +- Stages 1–6 shipped: scaffold, agent loop, tools, modes, session + resume, image input, model picker, three wire formats. +- Stage 8 (RPM + multi-platform CI) tracked in the canonical plan; + Linux x86_64 RPM ships today via the cortex monorepo's Gitea + Actions. + +## Contributing + +Repository: https://git.lair.cafe/helexa/cortex (`crates/helexa-acp/`). +Issues / PRs welcome. The canonical staged plan is in +`~/.claude/plans/plan-the-per-device-worker-abstract-micali.md` on +the maintainer's machine; the substages 3a–3e and 6a/6b that the +canonical plan didn't anticipate are documented in commit messages. + +CI: `cargo fmt --check --all`, `cargo clippy --workspace -- -D +warnings`, `cargo test --workspace` must all pass before merge. diff --git a/helexa-acp.example.toml b/helexa-acp.example.toml new file mode 100644 index 0000000..50b7d1c --- /dev/null +++ b/helexa-acp.example.toml @@ -0,0 +1,85 @@ +# helexa-acp.example.toml — example configuration +# +# Copy to $XDG_CONFIG_HOME/helexa-acp/config.toml (typically +# ~/.config/helexa-acp/config.toml) and adjust for your environment. +# +# helexa-acp is the ACP (Agent Client Protocol) bridge that connects +# editors like Zed to multiple LLM endpoints. Each endpoint speaks a +# specific wire format (openai-chat, openai-responses, or +# anthropic-messages); helexa-acp picks the right provider at runtime +# based on the `wire_api` field. +# +# Selecting a model from the editor follows the `endpoint:model` +# syntax — e.g. `openrouter:anthropic/claude-opus-4` routes the +# request to the `openrouter` endpoint with model +# `anthropic/claude-opus-4`. A bare `` (no colon) falls +# through to whichever endpoint is named in `default_endpoint`. + +default_endpoint = "helexa" + +# Optional: override the built-in system prompt with a file of your own. +# When unset, helexa-acp uses a concise coder prompt from src/prompt.rs. +# `{cwd}` in the file gets substituted with the session's working +# directory at request time. +# system_prompt_path = "/home/me/.config/helexa-acp/system-prompt.md" + +# ── helexa (cortex/neuron, self-hosted) ──────────────────────────── +# +# The canonical default. Drives cortex's reverse-proxy / fleet +# gateway, which routes to whichever neuron has the model loaded. +# `openai-chat` works against any cortex deployment; for vision +# models or reasoning surface, switch to `openai-responses` (cortex +# 0.1.16+). + +[[endpoints]] +name = "helexa" +base_url = "http://hanzalova.internal:31313/v1" +wire_api = "openai-chat" +default_model = "Qwen/Qwen3.6-27B" +max_tokens = 8192 +# Compaction kicks in when the rolling history grows past this token +# budget. Set to your model's context window. Disable by removing +# the field entirely. +context_window = 32768 + +# ── OpenRouter (proxy for OpenAI/Anthropic/Google/etc.) ──────────── + +[[endpoints]] +name = "openrouter" +base_url = "https://openrouter.ai/api/v1" +wire_api = "openai-chat" +api_key_env = "OPENROUTER_API_KEY" +default_model = "anthropic/claude-opus-4" + +# ── OpenAI directly (Responses API) ──────────────────────────────── +# +# Use `openai-responses` for the o-series and any model that +# benefits from the newer Responses API surface (web search, +# computer use, reasoning effort, etc.). + +[[endpoints]] +name = "openai" +base_url = "https://api.openai.com/v1" +wire_api = "openai-responses" +api_key_env = "OPENAI_API_KEY" +default_model = "gpt-5" + +# ── Anthropic directly ───────────────────────────────────────────── + +[[endpoints]] +name = "anthropic" +base_url = "https://api.anthropic.com/v1" +wire_api = "anthropic-messages" +api_key_env = "ANTHROPIC_API_KEY" +default_model = "claude-opus-4" + +# ── Local LM Studio / Ollama (compat mode) ───────────────────────── +# +# Most local-LLM servers expose OpenAI-compatible chat completions. +# Use `wire_api = "openai-chat"` and point at the local port. + +# [[endpoints]] +# name = "lmstudio" +# base_url = "http://localhost:1234/v1" +# wire_api = "openai-chat" +# default_model = "auto"