Files
cortex/crates/helexa-acp/README.md
rob thijssen fdc0adb738
Some checks failed
CI / CUDA type-check (push) Failing after 18s
CI / Format (push) Successful in 32s
build-prerelease / Resolve version stamps (push) Successful in 35s
CI / Clippy (push) Successful in 2m36s
build-prerelease / Build cortex binary (push) Successful in 4m13s
CI / Test (push) Successful in 5m6s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 5m40s
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Build neuron-ampere (push) Successful in 7m53s
build-prerelease / Build neuron-ada (push) Successful in 5m12s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m55s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m4s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m43s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m0s
docs(helexa-acp): README + example config for end-user onboarding
Stage 7. Walks a new user from "never heard of helexa-acp" to
"chatting via Zed against helexa or a public API in 10 minutes":

- crates/helexa-acp/README.md — install (from source / COPR),
  quick-start env-var path, multi-endpoint TOML, full Zed setup,
  endpoint cookbook (cortex/neuron, OpenAI, Anthropic, OpenRouter,
  LM Studio, multi-cortex), three session modes (Default / Bypass /
  Plan) with their tool tables, tool surface + path-handling rules,
  session resume, context compaction, troubleshooting for the
  five failure modes a new user is likely to hit, and architecture
  reference for contributors.

- helexa-acp.example.toml — copy-paste-and-edit starter config at
  the repo root, mirroring the existing cortex.example.toml /
  neuron.example.toml pattern.

No code changes. fmt + clippy clean as a sanity check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-31 14:25:56 +03:00

547 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# helexa-acp
ACP (Agent Client Protocol) bridge for editors like
[Zed](https://zed.dev). Lets you point your editor's agent panel at
**any combination** of OpenAI-compatible, OpenAI Responses, and
Anthropic Messages endpoints — public APIs, private LAN deployments,
local Ollama / LM Studio — and switch between them per session via a
model dropdown.
The "missing ACP binary" for users who don't want to be locked into
one vendor's agent client.
```
┌───────────────────────────────────┐
│ Zed (or any ACP editor client) │
└────────────┬──────────────────────┘
│ stdio JSON-RPC (ACP)
┌─────────────────┐
│ helexa-acp │ ← one binary, multi-endpoint
└─────┬───────────┘
│ HTTP / SSE
┌────────┼─────────────┬──────────────┬──────────────┐
▼ ▼ ▼ ▼ ▼
cortex/ OpenAI Anthropic OpenRouter LM Studio
neuron Responses Messages
(self- (gpt-5,…) (Claude)
hosted)
```
## What it does
- **Speaks ACP** over stdio to editor clients (Zed today; any future
ACP client tomorrow).
- **Multi-endpoint** — one config file lists every LLM endpoint
you want available; pick one per session via the model dropdown
(`endpoint:model` selector).
- **Three wire formats**: `openai-chat` (the broadly compatible
default), `openai-responses` (newer OpenAI surface), and
`anthropic-messages` (Claude). Each is a separate provider impl
in `src/provider/`; adding a fourth (Gemini, Ollama native, …) is
one file plus a `WireApi` enum variant.
- **Built-in tools**: `read_file`, `write_file`, `edit_file`,
`list_dir`, `bash`. Permission-gated by default; the editor user
approves writes/shell per-call.
- **Three session modes**: Default (gated), Bypass Permissions
(auto-allow), and Plan (write-only-to-plan-dir, no shell).
- **Vision** — drag-drop images into the agent panel against any
vision-capable model.
- **Session resume** — multi-day conversations survive editor
restarts via on-disk transcript persistence.
- **Context compaction** — rolling history stays inside the model's
context window automatically so long sessions on small-context
local models don't fall over.
## Install
### From source
```sh
git clone https://git.lair.cafe/helexa/cortex.git
cd cortex
cargo install --path crates/helexa-acp
# Binary lands at ~/.cargo/bin/helexa-acp
```
### Pre-built RPM (Fedora 43)
```sh
dnf copr enable helexa/helexa
dnf install helexa-acp
```
The COPR project bundles helexa-acp alongside the cortex gateway
and helexa-neuron flavours; install only the package(s) you need.
## Quick start
The fastest path: env-var single-endpoint config.
```sh
export HELEXA_ACP_BASE_URL=http://hanzalova.internal:31313/v1
export HELEXA_ACP_MODEL=Qwen/Qwen3.6-27B
helexa-acp # speaks ACP over stdin/stdout; not interactive
```
Then in Zed (`~/.config/zed/settings.json`):
```jsonc
{
"agent_servers": {
"helexa": {
"command": "helexa-acp",
"args": []
}
}
}
```
Restart Zed → open the agent panel → pick "helexa" → start
chatting. Tool calls (file reads, writes, bash) prompt for
permission per-call in Default mode.
That's the minimum. The full config story below is what unlocks
the multi-endpoint dropdown.
## Multi-endpoint config
Copy `helexa-acp.example.toml` from this repo to
`$XDG_CONFIG_HOME/helexa-acp/config.toml` (typically
`~/.config/helexa-acp/config.toml`) and edit:
```toml
default_endpoint = "helexa"
[[endpoints]]
name = "helexa"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"
max_tokens = 8192
context_window = 32768
[[endpoints]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
wire_api = "openai-chat"
api_key_env = "OPENROUTER_API_KEY"
default_model = "anthropic/claude-opus-4"
[[endpoints]]
name = "anthropic"
base_url = "https://api.anthropic.com/v1"
wire_api = "anthropic-messages"
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-opus-4"
```
Restart Zed. The model dropdown lists every model from every
configured endpoint with the `endpoint:model` selector
(`helexa:Qwen/Qwen3.6-27B`, `openrouter:anthropic/claude-opus-4`,
…). Switch mid-session; the next prompt routes to the new endpoint.
When only one endpoint is configured the prefix is dropped (model
ids appear bare).
### Selector syntax
The `model` field on every internal request is parsed as
`<endpoint>:<model>`:
- `openrouter:gpt-4o` → routes to the `openrouter` endpoint,
model `gpt-4o`.
- `helexa/large` → no colon → falls through to whichever endpoint
is named in `default_endpoint`, model `helexa/large`.
- `:gpt-5` → leading colon → also falls through to default.
## Endpoint cookbook
Copy-pasteable blocks. Mix and match.
### cortex / neuron (self-hosted)
```toml
[[endpoints]]
name = "helexa"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"
max_tokens = 8192
context_window = 32768
```
Use `openai-responses` instead of `openai-chat` once cortex 0.1.16+
is deployed and you want the Responses API surface (vision item
shape, structured reasoning items, etc.).
### OpenAI directly
```toml
[[endpoints]]
name = "openai"
base_url = "https://api.openai.com/v1"
wire_api = "openai-responses"
api_key_env = "OPENAI_API_KEY"
default_model = "gpt-5"
```
`openai-responses` is the right choice for current OpenAI models;
`openai-chat` works against legacy GPT-3.5/4 deployments and
anything labelled "chat completions".
### Anthropic directly
```toml
[[endpoints]]
name = "anthropic"
base_url = "https://api.anthropic.com/v1"
wire_api = "anthropic-messages"
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-opus-4"
```
helexa-acp sends `x-api-key` + `anthropic-version: 2023-06-01`
automatically. The `api_key_env` indirection keeps your key out of
the config file.
### OpenRouter (multi-vendor proxy)
```toml
[[endpoints]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
wire_api = "openai-chat"
api_key_env = "OPENROUTER_API_KEY"
default_model = "anthropic/claude-opus-4"
```
OpenRouter speaks OpenAI-compat for every model it fronts, so
`openai-chat` is the right wire format regardless of the
underlying vendor.
### LM Studio (local)
```toml
[[endpoints]]
name = "lmstudio"
base_url = "http://localhost:1234/v1"
wire_api = "openai-chat"
default_model = "auto"
```
LM Studio's "auto" model id picks whatever's loaded. Same shape
works for Ollama in compat mode (`http://localhost:11434/v1`) and
vLLM.
### Multiple cortex deployments
```toml
[[endpoints]]
name = "lan"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"
[[endpoints]]
name = "cloud"
base_url = "https://cortex.example.com/v1"
wire_api = "openai-chat"
api_key_env = "CLOUD_CORTEX_KEY"
default_model = "Qwen/Qwen3-VL-8B"
```
Use the `endpoint:model` selector to switch between them mid-session.
## Zed setup
`~/.config/zed/settings.json`:
```jsonc
{
"agent_servers": {
"helexa": {
"command": "helexa-acp"
}
}
}
```
Optional environment overrides for the binary:
```jsonc
{
"agent_servers": {
"helexa": {
"command": "helexa-acp",
"env": {
"HELEXA_ACP_LOG_FILE": "/tmp/helexa-acp.log",
"RUST_LOG": "helexa_acp=debug"
}
}
}
}
```
`HELEXA_ACP_LOG_FILE` is the one you actually want — Zed doesn't
surface the agent's stderr, so without that env var debug output is
invisible. Point it at a file you can `tail -f`.
After restarting Zed: ⌘+? (or wherever your "Open Agent Panel"
binding is) → select "helexa" → the model dropdown populates from
your config → start prompting.
## Modes
Three session modes ship; the user picks via Zed's mode dropdown
on the agent panel.
| Mode | Reads | Writes | Bash | Permission prompts |
|------|-------|--------|------|--------------------|
| **Default** | ✓ | with prompt | with prompt | per call |
| **Bypass Permissions** | ✓ | ✓ | ✓ | never |
| **Plan** | ✓ | only into plan dir | disabled | never (plan-dir writes auto-allow) |
### Default
Reads are always allowed (`read_file`, `list_dir` are
unrestricted). Writes and shell commands prompt the user before
running. The intended baseline for any session where the agent
might do something you'd rather review first.
### Bypass Permissions
Auto-allow every tool call. Use for agentic loops you trust — bulk
edits across many files, scripted workflows, prepared session
templates. Never for code the agent hasn't seen before.
### Plan
The "draft an implementation plan before you write code" mode.
Available tools:
- `read_file`, `list_dir`: unrestricted (read the codebase).
- `write_file`, `edit_file`: allowed *only* under
`$XDG_DATA_HOME/helexa-acp/plans/<project-id>/`. Any path
outside that returns "plan mode: writes are restricted to …"
back to the model so it self-corrects.
- `bash`: disabled outright. Returns "plan mode: shell execution
is disabled" if attempted.
When the plan is complete, the model presents a 3-option menu:
1. **Bypass Permissions** — implement the plan now, no prompts.
2. **Default** — implement now with per-tool prompts.
3. **Plan** (stay here) — refine the plan with more guidance.
Switch the mode dropdown to your preference and reply to proceed.
## Tools
Five tools, defined in `src/tools.rs`:
| Tool | Args | Gated in Default? |
|------|------|-------------------|
| `read_file` | `path`, `line?`, `limit?` | no |
| `list_dir` | `path` | no |
| `write_file` | `path`, `content` | yes |
| `edit_file` | `path`, `old_text`, `new_text` | yes |
| `bash` | `command`, `cwd?` | yes |
### Path handling
`~`, `~/`, `$HOME`, and `$HOME/` are expanded server-side before
the path reaches ACP or local fs. Lets the model emit
`~/git/repo/file.rs` and have it Just Work.
`read_file` first tries the editor's filesystem (ACP's
`fs/read_text_file` — respects open buffers, workspace overlays,
etc.). If that fails — typically because the path is outside Zed's
workspace boundary — it falls back to `std::fs::read_to_string`.
This lets the agent pull in shared material like
`~/git/architecture/generic.md` from a different project's
session.
The fallback is logged at warn level so you can see when it kicks
in.
### Tool dispatch
Tool descriptions reach the model through a Qwen3 Hermes-format
`# Tools` block injected into the system prompt — cortex/neuron
pass the OpenAI `tools` request field through to the encoder
unread, so we work the model into emitting `<tool_call>{json}</tool_call>`
markers it then parses out of the content stream. This applies to
the helexa wire format; OpenAI / Anthropic endpoints with native
tool support would use their own paths once they're wired in.
The parser is tolerant: malformed JSON (trailing braces, missing
`name`, name nested in `arguments`) gets a repair pass; if that
fails the call surfaces as a "Malformed tool call" card in Zed and
the model gets a synthetic error result so it can self-correct.
## Session resume
helexa-acp persists every session to
`$XDG_DATA_HOME/helexa-acp/sessions/<id>.json`. Zed's `session/list`
RPC asks helexa-acp to enumerate them on workspace open;
`session/load` rehydrates and replays the transcript as
`session/update` notifications so the agent panel renders the
prior conversation.
Behaviour:
- Persisted per-round, so a mid-turn agent stall (long bash, wedged
ACP roundtrip) doesn't lose earlier rounds.
- Survives editor restart and the helexa-acp binary upgrading
between versions.
- Project-scoped: only sessions whose `cwd` matches the workspace
are listed.
To wipe history: `rm -rf $XDG_DATA_HOME/helexa-acp/sessions/`.
## Context compaction
When an endpoint sets `context_window`, helexa-acp projects the
rolling history into a token budget before each request — old
`ToolResult` content (read_file payloads are the worst offenders)
gets elided to one-line markers, preserving `tool_call_id` pairing
so the wire schema stays valid.
System prompts, user turns, and the most recent ~4 messages are
never elided. The full history stays on disk; compaction is a
per-request projection, not a destructive edit.
Set `context_window = 32768` for a 32 K Qwen3, `131072` for a
modern Claude, etc. With `max_tokens` also set, the budget is
`context_window - max_tokens - 512_safety`.
## Troubleshooting
### "default endpoint 'helexa' has no usable provider — check config"
The named default endpoint failed to construct. Usually:
- `api_key_env` references a variable that isn't set in the env
Zed launched helexa-acp with.
- The TOML's `wire_api` is misspelled (only `openai-chat`,
`openai-responses`, `anthropic-messages` are accepted).
Test by running `helexa-acp` directly from a shell — startup
errors land on stderr.
### Model dropdown is empty
Each provider's `list_models` failed at startup. Look at
`HELEXA_ACP_LOG_FILE` for "list_models failed; this endpoint's
models won't appear in the picker". Likely the endpoint URL is
wrong, the API key is invalid, or the upstream `/v1/models`
endpoint isn't responding.
The agent still works against `default_model` even when the
dropdown is empty — list-models is for picking, not routing.
### "prompt_too_long" / agent stalls mid-conversation
You hit the model's context window. Set `context_window` on the
endpoint and helexa-acp will compact before sending. The log line
`context compaction applied` confirms it's running; if it fires
but the upstream still rejects, the compaction heuristic
under-counted and the budget needs tuning down.
### Reading files outside the workspace returns "not found"
Zed's `fs/read_text_file` is workspace-scoped. helexa-acp falls
back to local `std::fs` automatically when that fails — look for
`fs/read_text_file failed; falling back to local std::fs` in the
log. If even local read fails, the file genuinely doesn't exist
or the user process lacks permissions.
### Tool calls render as text instead of structured cards
The model is emitting `<tool_call>` markers that the parser can't
decode. Two common causes:
1. The system prompt isn't reaching the model (cortex/neuron's
tool-block injection didn't fire). Confirm with
`RUST_LOG=helexa_acp=debug` and look at the outgoing
`POST /chat/completions` body.
2. The model itself is too small / undertrained to follow the
Hermes format reliably. helexa-acp has shape-based name
inference and JSON repair, but there's a floor below which
nothing helps.
### Plan-mode writes refused even inside the plan dir
The path comparison is byte-for-byte. If the model emits a path
with `~` and the plan_dir has the expanded form, expansion runs
*before* the comparison — but resolved-vs-symlinked-path
mismatches can still bite. The error message names the attempted
path and the expected prefix so you can compare directly.
## Architecture
Source layout under `crates/helexa-acp/src/`:
| File | Responsibility |
|------|----------------|
| `main.rs` | tokio + Stdio transport. Builds providers, hands off to `agent::Agent` |
| `config.rs` | TOML + env-fallback config, endpoint resolver |
| `agent.rs` | ACP handlers (initialize, session/new, session/prompt, session/cancel, session/set_mode, session/set_model, session/load, session/list), prompt loop with tool-call recursion |
| `session.rs` | Per-session state map (Arc<RwLock<HashMap<…>>>) |
| `store.rs` | On-disk session persistence, plan-dir resolution |
| `prompt.rs` | System-prompt assembly, plan-mode addendum |
| `tools.rs` | Tool schemas + shape-based name inference |
| `tool_runner.rs` | Dispatch a single tool call through ACP client RPCs; permission gate |
| `qwen3.rs` | Qwen3 Hermes tool-format parser (`<tool_call>` / `<think>` markers) |
| `compaction.rs` | Token-budget compaction for the rolling history |
| `path_util.rs` | `~` / `$HOME` expansion shared across every path-taking tool |
| `provider/openai_chat.rs` | OpenAI chat completions provider |
| `provider/openai_responses.rs` | OpenAI Responses API provider |
| `provider/anthropic_messages.rs` | Anthropic Messages API provider |
### Adding a new wire format
1. New file under `src/provider/` implementing the `Provider`
trait (encoder + SSE decoder).
2. Add a `WireApi` variant in `config.rs`.
3. Wire it into `build_provider` in `main.rs`.
4. Done — every other module is wire-format-agnostic.
### Concurrency
- `Arc<RwLock<HashMap<SessionId, Arc<Mutex<SessionState>>>>>`
per-session mutex so concurrent requests across sessions don't
contend; the map's RwLock is read-mostly.
- Every tool call dispatched serially within a session (parallel
dispatch would require Zed to handle interleaved permission
prompts).
- Provider streams are back-pressured by the consumer (bounded
mpsc channels).
### Self-contained
The crate has no workspace-internal dependencies (no
`cortex-core`, no `cortex-gateway`). Migration to a dedicated
GitHub repo for cross-platform CI / cargo-dist binaries is
Cargo.toml-only.
## Status
- Stages 16 shipped: scaffold, agent loop, tools, modes, session
resume, image input, model picker, three wire formats.
- Stage 8 (RPM + multi-platform CI) tracked in the canonical plan;
Linux x86_64 RPM ships today via the cortex monorepo's Gitea
Actions.
## Contributing
Repository: https://git.lair.cafe/helexa/cortex (`crates/helexa-acp/`).
Issues / PRs welcome. The canonical staged plan is in
`~/.claude/plans/plan-the-per-device-worker-abstract-micali.md` on
the maintainer's machine; the substages 3a3e and 6a/6b that the
canonical plan didn't anticipate are documented in commit messages.
CI: `cargo fmt --check --all`, `cargo clippy --workspace -- -D
warnings`, `cargo test --workspace` must all pass before merge.