Closes #8.

Reasoning-capable models (Qwen3, DeepSeek-R1, gpt-oss, Mistral Magistral, …) emit `<think>...</think>` blocks inline in their content stream. The chat-completions wire format has no slot for reasoning, so until this change every consumer either parsed the markers themselves (helexa-acp) or wrote the raw scratchpad content into their UI (Zed's commit-message generator, visible as the leaked reasoning block on every generated commit message against benjy's Qwen3-8B).

## Implementation, model-agnostic by design

The neuron side now does token-level routing without any hardcoded model knowledge:

1. **At load time** (`detect_reasoning_token_pair` in `wire::event`), probe the tokenizer's vocabulary for a known reasoning-marker pair: `<think>` / `</think>` (Qwen3, DeepSeek-R1, gpt-oss), `[THINK]` / `[/THINK]` (Mistral Magistral), and a couple of derivatives. Each marker must resolve to a single token id; if both open and close resolve, stash on `LoadedModel.reasoning_tokens` (similarly `TpLoadedModel`). Non-reasoning models get `None` and pass through unchanged.
2. **At inference time**, the three streaming paths (`run_inference_streaming` CPU, `stream_inference_via_worker` CUDA single-GPU, `chat_completion_tp_stream` CUDA TP) now check each sampled token against the pair via the new `handle_reasoning_marker` helper before feeding it to the detokeniser. Open marker → set `in_reasoning = true`, drop the marker. Close marker → unset, drop. Other tokens go through `emit_delta(_blocking)` which now picks `ReasoningDelta` or `TextDelta` based on state. Markers never appear in the streamed output.
3. **In `wire::openai_chat`**, the projector splits into:
   - `project_chat_stream` (unchanged signature; default behaviour: drops `ReasoningDelta`)
   - `project_chat_stream_with(rx, …, ChatProjectionConfig)`: when `include_thinking: true` and `reasoning_markers: Some(_)`, re-wraps reasoning content with the literal open/close marker text and emits as content deltas. Preserves the on-the-wire shape that helexa-acp's `ThinkParser` expects.
4. **HTTP handler** reads `x-include-thinking: true` (case-insensitive `1`/`true`/`yes`) from the request headers and threads it into the projection config. cortex-gateway already forwards arbitrary headers verbatim, so the opt-in works end-to-end without gateway changes.
5. **helexa-acp's `openai_chat` provider** sets `x-include-thinking: true` on every request so its existing `ThinkParser` keeps receiving the marked content stream. `ThinkParser` itself is unchanged; it is still needed for endpoints that aren't reasoning-aware (OpenRouter, OpenAI directly, etc.).

## Acceptance

- Zed's commit-message generator (vanilla chat-completions client, no `x-include-thinking`) gets clean commit messages with no `<think>` block.
- helexa-acp sessions continue to render thinking in Zed's thought UI via the opt-in path.
- Models without reasoning tokens declared in their tokenizer pass through unchanged.
- Implementation contains zero references to "qwen3" or any specific model; it is entirely driven by tokenizer metadata.

## Tests

9 new tests in `wire::event` (token-pair detection across 4 marker conventions, edge cases) and `wire::openai_chat` (default drop, opt-in re-wrap with multi-chunk reasoning, close-marker on Finish, fallback when markers absent, off-switch with markers present). All 213 workspace tests pass; fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
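The routing and projection steps in the commit message above can be sketched roughly as follows. This is a simplified illustration, not the code in `wire::event` or `wire::openai_chat`: the `ReasoningTokens`, `Delta`, `route_token`, `include_thinking` and `project` names are hypothetical, the real paths are streaming, and the choice to drop reasoning when no marker text is available is an assumption.

```rust
/// Token ids of one open/close marker pair (hypothetical shape of
/// `LoadedModel.reasoning_tokens`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ReasoningTokens {
    pub open: u32,
    pub close: u32,
}

/// Simplified stand-ins for the `TextDelta` / `ReasoningDelta` events.
#[derive(Clone, Debug, PartialEq)]
pub enum Delta {
    Text(String),
    Reasoning(String),
}

/// Returns true when `token` is a marker. Markers flip the state and are
/// consumed, so they never reach the detokeniser.
pub fn handle_reasoning_marker(
    token: u32,
    pair: Option<ReasoningTokens>,
    in_reasoning: &mut bool,
) -> bool {
    match pair {
        Some(p) if token == p.open => {
            *in_reasoning = true;
            true
        }
        Some(p) if token == p.close => {
            *in_reasoning = false;
            true
        }
        _ => false,
    }
}

/// Routes one sampled token; `piece` is its detokenised text.
pub fn route_token(
    token: u32,
    piece: &str,
    pair: Option<ReasoningTokens>,
    in_reasoning: &mut bool,
) -> Option<Delta> {
    if handle_reasoning_marker(token, pair, in_reasoning) {
        return None;
    }
    Some(if *in_reasoning {
        Delta::Reasoning(piece.to_string())
    } else {
        Delta::Text(piece.to_string())
    })
}

/// Accepts `1`, `true` or `yes` in any letter case.
pub fn include_thinking(header: Option<&str>) -> bool {
    matches!(
        header.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1") | Some("true") | Some("yes")
    )
}

/// Projects deltas to content chunks. `markers` is the literal
/// (open, close) text used to re-wrap reasoning when opted in.
pub fn project(deltas: &[Delta], include: bool, markers: Option<(&str, &str)>) -> Vec<String> {
    // Assumption: without both the opt-in and marker text, reasoning is dropped.
    let wrap = if include { markers } else { None };
    let mut out = Vec::new();
    let mut open = false;
    for delta in deltas {
        match (delta, wrap) {
            (Delta::Reasoning(text), Some((open_marker, _))) => {
                if !open {
                    out.push(open_marker.to_string());
                    open = true;
                }
                out.push(text.clone());
            }
            (Delta::Reasoning(_), None) => {}
            (Delta::Text(text), _) => {
                if let (true, Some((_, close_marker))) = (open, wrap) {
                    out.push(close_marker.to_string());
                    open = false;
                }
                out.push(text.clone());
            }
        }
    }
    // Close a reasoning span that is still open when the stream finishes.
    if let (true, Some((_, close_marker))) = (open, wrap) {
        out.push(close_marker.to_string());
    }
    out
}
```

A model without a detected pair passes `None` for `pair`, so every token becomes a `Delta::Text` and the projection is a pass-through.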
helexa-acp
ACP (Agent Client Protocol) bridge for editors like Zed. Lets you point your editor's agent panel at any combination of OpenAI-compatible, OpenAI Responses, and Anthropic Messages endpoints — public APIs, private LAN deployments, local Ollama / LM Studio — and switch between them per session via a model dropdown.
The "missing ACP binary" for users who don't want to be locked into one vendor's agent client.
┌───────────────────────────────────┐
│  Zed (or any ACP editor client)   │
└────────────┬──────────────────────┘
             │ stdio JSON-RPC (ACP)
             ▼
       ┌─────────────────┐
       │   helexa-acp    │  ← one binary, multi-endpoint
       └─────┬───────────┘
             │ HTTP / SSE
    ┌────────┼─────────────┬──────────────┬──────────────┐
    ▼        ▼             ▼              ▼              ▼
 cortex/  OpenAI       Anthropic     OpenRouter      LM Studio
 neuron   Responses    Messages
 (self-   (gpt-5,…)    (Claude)
 hosted)
What it does
- Speaks ACP over stdio to editor clients (Zed today; any future ACP client tomorrow).
- Multi-endpoint: one config file lists every LLM endpoint you want available; pick one per session via the model dropdown (`endpoint:model` selector).
- Three wire formats: `openai-chat` (the broadly compatible default), `openai-responses` (newer OpenAI surface), and `anthropic-messages` (Claude). Each is a separate provider impl in `src/provider/`; adding a fourth (Gemini, Ollama native, …) is one file plus a `WireApi` enum variant.
- Built-in tools: `read_file`, `write_file`, `edit_file`, `list_dir`, `bash`. Permission-gated by default; the editor user approves writes/shell per-call.
- Three session modes: Default (gated), Bypass Permissions (auto-allow), and Plan (write-only-to-plan-dir, no shell).
- Vision — drag-drop images into the agent panel against any vision-capable model.
- Session resume — multi-day conversations survive editor restarts via on-disk transcript persistence.
- Context compaction — rolling history stays inside the model's context window automatically so long sessions on small-context local models don't fall over.
Install
From source
git clone https://git.lair.cafe/helexa/cortex.git
cd cortex
cargo install --path crates/helexa-acp
# Binary lands at ~/.cargo/bin/helexa-acp
Pre-built RPM (Fedora 43)
dnf copr enable helexa/helexa
dnf install helexa-acp
The COPR project bundles helexa-acp alongside the cortex gateway and helexa-neuron flavours; install only the package(s) you need.
Quick start
The fastest path: env-var single-endpoint config.
export HELEXA_ACP_BASE_URL=http://hanzalova.internal:31313/v1
export HELEXA_ACP_MODEL=Qwen/Qwen3.6-27B
helexa-acp # speaks ACP over stdin/stdout; not interactive
Then in Zed (~/.config/zed/settings.json):
{
  "agent_servers": {
    "helexa": {
      "command": "helexa-acp",
      "args": []
    }
  }
}
Restart Zed → open the agent panel → pick "helexa" → start chatting. Tool calls (file reads, writes, bash) prompt for permission per-call in Default mode.
That's the minimum. The full config story below is what unlocks the multi-endpoint dropdown.
Multi-endpoint config
Copy helexa-acp.example.toml from this repo to
$XDG_CONFIG_HOME/helexa-acp/config.toml (typically
~/.config/helexa-acp/config.toml) and edit:
default_endpoint = "helexa"
[[endpoints]]
name = "helexa"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"
max_tokens = 8192
context_window = 32768
[[endpoints]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
wire_api = "openai-chat"
api_key_env = "OPENROUTER_API_KEY"
default_model = "anthropic/claude-opus-4"
[[endpoints]]
name = "anthropic"
base_url = "https://api.anthropic.com/v1"
wire_api = "anthropic-messages"
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-opus-4"
Restart Zed. The model dropdown lists every model from every
configured endpoint with the endpoint:model selector
(helexa:Qwen/Qwen3.6-27B, openrouter:anthropic/claude-opus-4,
…). Switch mid-session; the next prompt routes to the new endpoint.
When only one endpoint is configured the prefix is dropped (model ids appear bare).
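The naming rule for dropdown entries can be sketched as below. `EndpointModels` and `dropdown_ids` are hypothetical names used only for illustration; the sketch assumes each endpoint has already reported its model ids.

```rust
/// One configured endpoint and the model ids it reported (hypothetical shape).
pub struct EndpointModels<'a> {
    pub name: &'a str,
    pub models: Vec<&'a str>,
}

/// Builds the ids shown in the model dropdown: `endpoint:model` when several
/// endpoints are configured, the bare model id when there is only one.
pub fn dropdown_ids(endpoints: &[EndpointModels<'_>]) -> Vec<String> {
    let single = endpoints.len() == 1;
    let mut ids = Vec::new();
    for endpoint in endpoints {
        for model in &endpoint.models {
            ids.push(if single {
                model.to_string()
            } else {
                format!("{}:{}", endpoint.name, model)
            });
        }
    }
    ids
}
```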
Selector syntax
The model field on every internal request is parsed as
<endpoint>:<model>:
- `openrouter:gpt-4o` → routes to the `openrouter` endpoint, model `gpt-4o`.
- `helexa/large` → no colon → falls through to whichever endpoint is named in `default_endpoint`, model `helexa/large`.
- `:gpt-5` → leading colon → also falls through to default.
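A minimal sketch of the three selector rules above, assuming the split happens at the first colon. `parse_selector` is a hypothetical name, and the sketch does not check that the endpoint exists in the config, so a bare model id that itself contains a colon would be misread as `endpoint:model`.

```rust
/// Splits a selector into (endpoint, model), falling back to
/// `default_endpoint` when the endpoint part is missing or empty.
pub fn parse_selector<'a>(selector: &'a str, default_endpoint: &'a str) -> (&'a str, &'a str) {
    match selector.split_once(':') {
        // "openrouter:gpt-4o": explicit endpoint.
        Some((endpoint, model)) if !endpoint.is_empty() => (endpoint, model),
        // ":gpt-5": leading colon, use the default endpoint.
        Some((_, model)) => (default_endpoint, model),
        // "helexa/large": no colon, use the default endpoint.
        None => (default_endpoint, selector),
    }
}
```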
Endpoint cookbook
Copy-pasteable blocks. Mix and match.
cortex / neuron (self-hosted)
[[endpoints]]
name = "helexa"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"
max_tokens = 8192
context_window = 32768
Use openai-responses instead of openai-chat once cortex 0.1.16+
is deployed and you want the Responses API surface (vision item
shape, structured reasoning items, etc.).
OpenAI directly
[[endpoints]]
name = "openai"
base_url = "https://api.openai.com/v1"
wire_api = "openai-responses"
api_key_env = "OPENAI_API_KEY"
default_model = "gpt-5"
openai-responses is the right choice for current OpenAI models;
openai-chat works against legacy GPT-3.5/4 deployments and
anything labelled "chat completions".
Anthropic directly
[[endpoints]]
name = "anthropic"
base_url = "https://api.anthropic.com/v1"
wire_api = "anthropic-messages"
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-opus-4"
helexa-acp sends x-api-key + anthropic-version: 2023-06-01
automatically. The api_key_env indirection keeps your key out of
the config file.
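As an illustration of the `api_key_env` indirection, the sketch below resolves the key by variable name at request time and builds the two headers mentioned above. The function name and the injected `lookup` closure are assumptions made so the example stays testable; they are not the provider's actual API.

```rust
/// Builds the Anthropic request headers. `lookup` resolves an environment
/// variable by name, for example `|name| std::env::var(name).ok()`.
pub fn anthropic_headers(
    api_key_env: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<Vec<(String, String)>, String> {
    // Only the variable name lives in the config; the key itself is read here.
    let key = lookup(api_key_env)
        .ok_or_else(|| format!("environment variable {} is not set", api_key_env))?;
    Ok(vec![
        ("x-api-key".to_string(), key),
        ("anthropic-version".to_string(), "2023-06-01".to_string()),
    ])
}
```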
OpenRouter (multi-vendor proxy)
[[endpoints]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
wire_api = "openai-chat"
api_key_env = "OPENROUTER_API_KEY"
default_model = "anthropic/claude-opus-4"
OpenRouter speaks OpenAI-compat for every model it fronts, so
openai-chat is the right wire format regardless of the
underlying vendor.
LM Studio (local)
[[endpoints]]
name = "lmstudio"
base_url = "http://localhost:1234/v1"
wire_api = "openai-chat"
default_model = "auto"
LM Studio's "auto" model id picks whatever's loaded. Same shape
works for Ollama in compat mode (http://localhost:11434/v1) and
vLLM.
Multiple cortex deployments
[[endpoints]]
name = "lan"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"
[[endpoints]]
name = "cloud"
base_url = "https://cortex.example.com/v1"
wire_api = "openai-chat"
api_key_env = "CLOUD_CORTEX_KEY"
default_model = "Qwen/Qwen3-VL-8B"
Use the endpoint:model selector to switch between them mid-session.
Zed setup
~/.config/zed/settings.json:
{
  "agent_servers": {
    "helexa": {
      "command": "helexa-acp"
    }
  }
}
Optional environment overrides for the binary:
{
  "agent_servers": {
    "helexa": {
      "command": "helexa-acp",
      "env": {
        "HELEXA_ACP_LOG_FILE": "/tmp/helexa-acp.log",
        "RUST_LOG": "helexa_acp=debug"
      }
    }
  }
}
HELEXA_ACP_LOG_FILE is the one you actually want — Zed doesn't
surface the agent's stderr, so without that env var debug output is
invisible. Point it at a file you can tail -f.
After restarting Zed: ⌘+? (or wherever your "Open Agent Panel" binding is) → select "helexa" → the model dropdown populates from your config → start prompting.
Modes
Three session modes ship; the user picks via Zed's mode dropdown on the agent panel.
| Mode | Reads | Writes | Bash | Permission prompts |
|---|---|---|---|---|
| Default | ✓ | with prompt | with prompt | per call |
| Bypass Permissions | ✓ | ✓ | ✓ | never |
| Plan | ✓ | only into plan dir | disabled | never (plan-dir writes auto-allow) |
Default
Reads are always allowed (read_file, list_dir are
unrestricted). Writes and shell commands prompt the user before
running. The intended baseline for any session where the agent
might do something you'd rather review first.
Bypass Permissions
Auto-allow every tool call. Use for agentic loops you trust — bulk edits across many files, scripted workflows, prepared session templates. Never for code the agent hasn't seen before.
Plan
The "draft an implementation plan before you write code" mode. Available tools:
- `read_file`, `list_dir`: unrestricted (read the codebase).
- `write_file`, `edit_file`: allowed only under `$XDG_DATA_HOME/helexa-acp/plans/<project-id>/`. Any path outside that returns "plan mode: writes are restricted to …" back to the model so it self-corrects.
- `bash`: disabled outright. Returns "plan mode: shell execution is disabled" if attempted.
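The three modes can be read as one decision function per tool call. The sketch below is an illustration under assumptions: the `Mode`, `Tool`, `Decision` and `DenyReason` types are hypothetical, and it uses `Path::starts_with`, which compares whole path components, whereas the real check may compare paths differently.

```rust
use std::path::Path;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    Default,
    BypassPermissions,
    Plan,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Tool {
    ReadFile,
    ListDir,
    WriteFile,
    EditFile,
    Bash,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DenyReason {
    OutsidePlanDir,
    ShellDisabled,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Decision {
    Allow,
    AskUser,
    Deny(DenyReason),
}

/// Decides what happens to one tool call. `path` is the already-expanded
/// target of a write or edit; `plan_dir` is the session's plan directory.
pub fn decide(mode: Mode, tool: Tool, path: Option<&Path>, plan_dir: &Path) -> Decision {
    match (mode, tool) {
        // Reads are unrestricted in every mode.
        (_, Tool::ReadFile) | (_, Tool::ListDir) => Decision::Allow,
        (Mode::BypassPermissions, _) => Decision::Allow,
        (Mode::Default, _) => Decision::AskUser,
        (Mode::Plan, Tool::Bash) => Decision::Deny(DenyReason::ShellDisabled),
        (Mode::Plan, Tool::WriteFile) | (Mode::Plan, Tool::EditFile) => match path {
            Some(p) if p.starts_with(plan_dir) => Decision::Allow,
            _ => Decision::Deny(DenyReason::OutsidePlanDir),
        },
    }
}
```

Returning a `Deny` value rather than failing the turn matches the behaviour described above, where the refusal is sent back to the model so it can self-correct.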
When the plan is complete, the model presents a 3-option menu:
- Bypass Permissions — implement the plan now, no prompts.
- Default — implement now with per-tool prompts.
- Plan (stay here) — refine the plan with more guidance.
Switch the mode dropdown to your preference and reply to proceed.
Tools
Five tools, defined in src/tools.rs:
| Tool | Args | Gated in Default? |
|---|---|---|
| `read_file` | `path`, `line?`, `limit?` | no |
| `list_dir` | `path` | no |
| `write_file` | `path`, `content` | yes |
| `edit_file` | `path`, `old_text`, `new_text` | yes |
| `bash` | `command`, `cwd?` | yes |
Path handling
~, ~/, $HOME, and $HOME/ are expanded server-side before
the path reaches ACP or local fs. Lets the model emit
~/git/repo/file.rs and have it Just Work.
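A minimal sketch of that expansion, assuming the home directory is passed in rather than read from the environment. `expand_home` is a hypothetical name; `path_util.rs` may handle more cases than this.

```rust
use std::path::PathBuf;

/// Expands a leading `~`, `~/`, `$HOME` or `$HOME/` to `home`.
/// Anything else, including `~otheruser/...`, is returned unchanged.
pub fn expand_home(raw: &str, home: &str) -> PathBuf {
    for prefix in ["~", "$HOME"] {
        if let Some(rest) = raw.strip_prefix(prefix) {
            if rest.is_empty() {
                return PathBuf::from(home);
            }
            // Only expand when the prefix is a whole path component.
            if let Some(tail) = rest.strip_prefix('/') {
                return PathBuf::from(home).join(tail);
            }
        }
    }
    PathBuf::from(raw)
}
```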
read_file first tries the editor's filesystem (ACP's
fs/read_text_file — respects open buffers, workspace overlays,
etc.). If that fails — typically because the path is outside Zed's
workspace boundary — it falls back to std::fs::read_to_string.
This lets the agent pull in shared material like
~/git/architecture/generic.md from a different project's
session.
The fallback is logged at warn level so you can see when it kicks in.
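The read path described above amounts to "try the editor, then the local disk". In the sketch below the editor call is passed in as a closure (a stand-in for the ACP `fs/read_text_file` round trip, which is async in practice) and `eprintln!` stands in for the warn-level log; both are assumptions for illustration.

```rust
use std::{fs, io, path::Path};

/// Reads `path` through the editor first and falls back to the local
/// filesystem when the editor refuses or fails.
pub fn read_with_fallback<F>(path: &Path, editor_read: F) -> io::Result<String>
where
    F: FnOnce(&Path) -> Result<String, String>,
{
    match editor_read(path) {
        Ok(text) => Ok(text),
        Err(reason) => {
            // Stand-in for a warn-level log entry.
            eprintln!(
                "warn: fs/read_text_file failed ({}); falling back to local std::fs",
                reason
            );
            fs::read_to_string(path)
        }
    }
}
```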
Tool dispatch
Tool descriptions reach the model through a Qwen3 Hermes-format
# Tools block injected into the system prompt. cortex/neuron
pass the OpenAI tools request field through to the encoder
unread, so helexa-acp instead prompts the model to emit
<tool_call>{json}</tool_call> markers, which helexa-acp then parses
out of the content stream. This applies to
the helexa wire format; OpenAI / Anthropic endpoints with native
tool support would use their own paths once they're wired in.
The parser is tolerant: malformed JSON (trailing braces, missing
name, name nested in arguments) gets a repair pass; if that
fails the call surfaces as a "Malformed tool call" card in Zed and
the model gets a synthetic error result so it can self-correct.
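To make the marker format concrete, the sketch below separates `<tool_call>` payloads from ordinary text in a complete response. It is not the parser in `qwen3.rs`: the function name is hypothetical, it works on a whole string rather than a stream, it leaves the JSON unparsed (no repair pass), and keeping an unterminated marker as plain text is an assumption.

```rust
/// Splits model output into visible text and raw `<tool_call>` payloads.
pub fn extract_tool_calls(content: &str) -> (String, Vec<String>) {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";
    let mut text = String::new();
    let mut calls = Vec::new();
    let mut rest = content;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                // Text before the marker stays visible; the payload is collected.
                text.push_str(&rest[..start]);
                calls.push(after_open[..end].trim().to_string());
                rest = &after_open[end + CLOSE.len()..];
            }
            None => {
                // Unterminated marker: keep everything that is left as text.
                text.push_str(rest);
                rest = "";
                break;
            }
        }
    }
    text.push_str(rest);
    (text, calls)
}
```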
Session resume
helexa-acp persists every session to
$XDG_DATA_HOME/helexa-acp/sessions/<id>.json. Zed's session/list
RPC asks helexa-acp to enumerate them on workspace open;
session/load rehydrates and replays the transcript as
session/update notifications so the agent panel renders the
prior conversation.
Behaviour:
- Persisted per-round, so a mid-turn agent stall (long bash, wedged ACP roundtrip) doesn't lose earlier rounds.
- Survives editor restart and the helexa-acp binary upgrading between versions.
- Project-scoped: only sessions whose `cwd` matches the workspace are listed.
To wipe history: rm -rf $XDG_DATA_HOME/helexa-acp/sessions/.
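For orientation, the on-disk location can be computed as below. The fallback to `$HOME/.local/share` when `XDG_DATA_HOME` is unset or empty is what the XDG Base Directory specification prescribes; that helexa-acp applies the same fallback is an assumption, and `session_file` is a hypothetical helper.

```rust
use std::path::PathBuf;

/// Resolves `<data home>/helexa-acp/sessions/<id>.json`.
pub fn session_file(xdg_data_home: Option<&str>, home: &str, session_id: &str) -> PathBuf {
    let data_home = match xdg_data_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // XDG default when the variable is unset or empty.
        _ => PathBuf::from(home).join(".local/share"),
    };
    data_home
        .join("helexa-acp")
        .join("sessions")
        .join(format!("{}.json", session_id))
}
```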
Context compaction
When an endpoint sets context_window, helexa-acp projects the
rolling history into a token budget before each request — old
ToolResult content (read_file payloads are the worst offenders)
gets elided to one-line markers, preserving tool_call_id pairing
so the wire schema stays valid.
System prompts, user turns, and the most recent ~4 messages are never elided. The full history stays on disk; compaction is a per-request projection, not a destructive edit.
Set context_window = 32768 for a 32 K Qwen3, 131072 for a
modern Claude, etc. With max_tokens also set, the budget for the
rolling history is context_window - max_tokens - 512, where the
final 512 tokens are a fixed safety margin.
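A worked example of the budget arithmetic, followed by a rough sketch of the elision pass. The `Msg` type, the 4-bytes-per-token estimate and the `[elided]` marker text are all assumptions for illustration; the real heuristic in `compaction.rs` may count and elide differently.

```rust
pub const SAFETY_MARGIN: usize = 512;
const ELIDED: &str = "[elided]";

/// Tokens left for history once the reply and the safety margin are reserved.
pub fn history_budget(context_window: usize, max_tokens: usize) -> usize {
    context_window
        .saturating_sub(max_tokens)
        .saturating_sub(SAFETY_MARGIN)
}

#[derive(Clone, Debug, PartialEq)]
pub enum Msg {
    System(String),
    User(String),
    Assistant(String),
    ToolResult { tool_call_id: String, content: String },
}

/// Crude size estimate: about four bytes per token (an assumption).
fn estimate(msg: &Msg) -> usize {
    let text = match msg {
        Msg::System(t) | Msg::User(t) | Msg::Assistant(t) => t,
        Msg::ToolResult { content, .. } => content,
    };
    text.len() / 4 + 1
}

/// Returns a projected copy of `history` that elides old tool results,
/// oldest first, until the estimate fits `budget`. The last `keep_recent`
/// messages are never touched and `tool_call_id` pairing is preserved.
pub fn compact(history: &[Msg], budget: usize, keep_recent: usize) -> Vec<Msg> {
    let mut out = history.to_vec();
    let mut total: usize = out.iter().map(estimate).sum();
    let protected_from = out.len().saturating_sub(keep_recent);
    for msg in out[..protected_from].iter_mut() {
        if total <= budget {
            break;
        }
        if let Msg::ToolResult { content, .. } = msg {
            let before = content.len() / 4 + 1;
            *content = ELIDED.to_string();
            total = total - before + (content.len() / 4 + 1);
        }
    }
    out
}
```

With the example endpoint settings, `history_budget(32768, 8192)` is 32768 - 8192 - 512 = 24064 tokens. Because `compact` returns a copy, the stored history is left intact, in line with compaction being a per-request projection.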
Troubleshooting
"default endpoint 'helexa' has no usable provider — check config"
The named default endpoint failed to construct. Usually:
- `api_key_env` references a variable that isn't set in the env Zed launched helexa-acp with.
- The TOML's `wire_api` is misspelled (only `openai-chat`, `openai-responses`, `anthropic-messages` are accepted).
Test by running helexa-acp directly from a shell — startup
errors land on stderr.
Model dropdown is empty
Each provider's list_models failed at startup. Look at
HELEXA_ACP_LOG_FILE for "list_models failed; this endpoint's
models won't appear in the picker". Likely the endpoint URL is
wrong, the API key is invalid, or the upstream /v1/models
endpoint isn't responding.
The agent still works against default_model even when the
dropdown is empty — list-models is for picking, not routing.
"prompt_too_long" / agent stalls mid-conversation
You hit the model's context window. Set context_window on the
endpoint and helexa-acp will compact before sending. The log line
context compaction applied confirms it's running; if it fires
but the upstream still rejects, the compaction heuristic
under-counted and the budget needs tuning down.
Reading files outside the workspace returns "not found"
Zed's fs/read_text_file is workspace-scoped. helexa-acp falls
back to local std::fs automatically when that fails — look for
fs/read_text_file failed; falling back to local std::fs in the
log. If even local read fails, the file genuinely doesn't exist
or the user process lacks permissions.
Tool calls render as text instead of structured cards
The model is emitting <tool_call> markers that the parser can't
decode. Two common causes:
- The system prompt isn't reaching the model (cortex/neuron's tool-block injection didn't fire). Confirm with `RUST_LOG=helexa_acp=debug` and look at the outgoing `POST /chat/completions` body.
- The model itself is too small / undertrained to follow the Hermes format reliably. helexa-acp has shape-based name inference and JSON repair, but there's a floor below which nothing helps.
Plan-mode writes refused even inside the plan dir
The path comparison is byte-for-byte. Expansion of ~ runs before
the comparison, so a path the model emits with ~ still matches a
plan_dir held in expanded form. Mismatches between a symlinked
path and its resolved form can still bite. The error message names
the attempted path and the expected prefix so you can compare them
directly.
Architecture
Source layout under crates/helexa-acp/src/:
| File | Responsibility |
|---|---|
| `main.rs` | tokio + Stdio transport. Builds providers, hands off to `agent::Agent` |
| `config.rs` | TOML + env-fallback config, endpoint resolver |
| `agent.rs` | ACP handlers (`initialize`, `session/new`, `session/prompt`, `session/cancel`, `session/set_mode`, `session/set_model`, `session/load`, `session/list`), prompt loop with tool-call recursion |
| `session.rs` | Per-session state map (`Arc<RwLock<HashMap<…>>>`) |
| `store.rs` | On-disk session persistence, plan-dir resolution |
| `prompt.rs` | System-prompt assembly, plan-mode addendum |
| `tools.rs` | Tool schemas + shape-based name inference |
| `tool_runner.rs` | Dispatch a single tool call through ACP client RPCs; permission gate |
| `qwen3.rs` | Qwen3 Hermes tool-format parser (`<tool_call>` / `<think>` markers) |
| `compaction.rs` | Token-budget compaction for the rolling history |
| `path_util.rs` | `~` / `$HOME` expansion shared across every path-taking tool |
| `provider/openai_chat.rs` | OpenAI chat completions provider |
| `provider/openai_responses.rs` | OpenAI Responses API provider |
| `provider/anthropic_messages.rs` | Anthropic Messages API provider |
Adding a new wire format
- New file under `src/provider/` implementing the `Provider` trait (encoder + SSE decoder).
- Add a `WireApi` variant in `config.rs`.
- Wire it into `build_provider` in `main.rs`.
- Done: every other module is wire-format-agnostic.
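The shape of those steps might look like the sketch below. It is heavily simplified: the real `Provider` trait covers request encoding and SSE decoding, whereas this hypothetical version only exposes the request path, and the struct names are invented for illustration. The `wire_api` strings are the three accepted values listed in this README.

```rust
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum WireApi {
    OpenAiChat,
    OpenAiResponses,
    AnthropicMessages,
}

impl FromStr for WireApi {
    type Err = String;

    /// Parses the `wire_api` config value; anything else is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "openai-chat" => Ok(WireApi::OpenAiChat),
            "openai-responses" => Ok(WireApi::OpenAiResponses),
            "anthropic-messages" => Ok(WireApi::AnthropicMessages),
            other => Err(format!("unknown wire_api: {}", other)),
        }
    }
}

/// Simplified stand-in for the real trait.
pub trait Provider {
    /// Path appended to the endpoint's `base_url`.
    fn request_path(&self) -> &'static str;
}

struct OpenAiChat;
struct OpenAiResponses;
struct AnthropicMessages;

impl Provider for OpenAiChat {
    fn request_path(&self) -> &'static str {
        "/chat/completions"
    }
}

impl Provider for OpenAiResponses {
    fn request_path(&self) -> &'static str {
        "/responses"
    }
}

impl Provider for AnthropicMessages {
    fn request_path(&self) -> &'static str {
        "/messages"
    }
}

/// A fourth wire format adds one enum variant, one impl and one arm here.
pub fn build_provider(api: WireApi) -> Box<dyn Provider> {
    match api {
        WireApi::OpenAiChat => Box::new(OpenAiChat),
        WireApi::OpenAiResponses => Box::new(OpenAiResponses),
        WireApi::AnthropicMessages => Box::new(AnthropicMessages),
    }
}
```

Because `build_provider` matches exhaustively, adding a `WireApi` variant without wiring it in fails to compile, which keeps the steps above hard to forget.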
Concurrency
- `Arc<RwLock<HashMap<SessionId, Arc<Mutex<SessionState>>>>>`: per-session mutex so concurrent requests across sessions don't contend; the map's RwLock is read-mostly.
- Every tool call dispatched serially within a session (parallel dispatch would require Zed to handle interleaved permission prompts).
- Provider streams are back-pressured by the consumer (bounded mpsc channels).
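The session map layout can be sketched with the standard library's locks. This is an assumption-laden illustration: the crate runs on tokio and may use async locks, and `Sessions`, `SessionState.rounds` and `get_or_create` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

pub type SessionId = String;

#[derive(Default)]
pub struct SessionState {
    pub rounds: Vec<String>,
}

#[derive(Clone, Default)]
pub struct Sessions {
    inner: Arc<RwLock<HashMap<SessionId, Arc<Mutex<SessionState>>>>>,
}

impl Sessions {
    /// Returns the handle for `id`, creating the session on first use.
    pub fn get_or_create(&self, id: &str) -> Arc<Mutex<SessionState>> {
        // Fast path: a shared read lock is enough for existing sessions.
        if let Some(session) = self.inner.read().unwrap().get(id) {
            return Arc::clone(session);
        }
        // Slow path: take the write lock only to insert a new session.
        let mut map = self.inner.write().unwrap();
        Arc::clone(map.entry(id.to_string()).or_default())
    }
}
```

The map lock is released as soon as the handle is cloned, so a long-running turn holds only its own session's mutex and other sessions keep making progress.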
Self-contained
The crate has no workspace-internal dependencies (no
cortex-core, no cortex-gateway). Migration to a dedicated
GitHub repo for cross-platform CI / cargo-dist binaries is
Cargo.toml-only.
Status
- Stages 1–6 shipped: scaffold, agent loop, tools, modes, session resume, image input, model picker, three wire formats.
- Stage 8 (RPM + multi-platform CI) tracked in the canonical plan; Linux x86_64 RPM ships today via the cortex monorepo's Gitea Actions.
Contributing
Repository: https://git.lair.cafe/helexa/cortex (crates/helexa-acp/).
Issues / PRs welcome. The canonical staged plan is in
~/.claude/plans/plan-the-per-device-worker-abstract-micali.md on
the maintainer's machine; the substages 3a–3e and 6a/6b that the
canonical plan didn't anticipate are documented in commit messages.
CI: cargo fmt --check --all, cargo clippy --workspace -- -D warnings, cargo test --workspace must all pass before merge.