helexa/cortex

Fork 0

Files

History

rob thijssen 7733eecba5

CI / CUDA type-check (push) Failing after 18s

Details

build-prerelease / Resolve version stamps (push) Successful in 32s

Details

CI / Format (push) Successful in 32s

Details

CI / Clippy (push) Successful in 2m36s

Details

build-prerelease / Build cortex binary (push) Successful in 4m29s

Details

CI / Test (push) Successful in 5m19s

Details

CI / Build cortex SRPM (push) Has been skipped

Details

CI / Publish cortex to COPR (push) Has been skipped

Details

CI / Build neuron SRPM (push) Has been skipped

Details

CI / Publish neuron to COPR (push) Has been skipped

Details

CI / Bump version in source (push) Has been skipped

Details

build-prerelease / Build neuron-blackwell (push) Successful in 5m56s

Details

build-prerelease / Package cortex RPM (push) Successful in 1m21s

Details

build-prerelease / Build neuron-ampere (push) Successful in 7m45s

Details

build-prerelease / Build neuron-ada (push) Successful in 5m24s

Details

build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m53s

Details

build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m0s

Details

build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m43s

Details

build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s

Details

feat(neuron): strip reasoning from chat completions by default

Closes #8.

Reasoning-capable models (Qwen3, DeepSeek-R1, gpt-oss, Mistral
Magistral, …) emit `<think>...</think>` blocks inline in their
content stream. The chat-completions wire format has no slot for
reasoning, so until this change every consumer either parsed the
markers themselves (helexa-acp) or wrote the raw scratchpad
content into their UI (Zed's commit-message generator — visible
as the leaked reasoning block on every generated commit message
against benjy's Qwen3-8B).

## Implementation, model-agnostic by design

The neuron side now does token-level routing without any
hardcoded model knowledge:

1. **At load time** (`detect_reasoning_token_pair` in
   `wire::event`), probe the tokenizer's vocabulary for a known
   reasoning-marker pair: `<think>` / `</think>` (Qwen3,
   DeepSeek-R1, gpt-oss), `[THINK]` / `[/THINK]` (Mistral
   Magistral), and a couple of derivatives. Each marker must
   resolve to a single token id; if both open and close resolve,
   stash on `LoadedModel.reasoning_tokens` (similarly
   `TpLoadedModel`). Non-reasoning models get `None` and pass
   through unchanged.

2. **At inference time**, the three streaming paths
   (`run_inference_streaming` CPU, `stream_inference_via_worker`
   CUDA single-GPU, `chat_completion_tp_stream` CUDA TP) now
   check each sampled token against the pair via the new
   `handle_reasoning_marker` helper before feeding it to the
   detokeniser. Open marker → set `in_reasoning = true`, drop
   the marker. Close marker → unset, drop. Other tokens go
   through `emit_delta(_blocking)` which now picks
   `ReasoningDelta` or `TextDelta` based on state. Markers
   never appear in the streamed output.

3. **In `wire::openai_chat`**, the projector splits into:
   - `project_chat_stream` (unchanged signature; default
     behaviour — drops `ReasoningDelta`)
   - `project_chat_stream_with(rx, …, ChatProjectionConfig)` —
     when `include_thinking: true` and `reasoning_markers:
     Some(_)`, re-wraps reasoning content with the literal
     open/close marker text and emits as content deltas.
     Preserves the on-the-wire shape that helexa-acp's
     `ThinkParser` expects.

4. **HTTP handler** reads `x-include-thinking: true` (case-
   insensitive `1`/`true`/`yes`) from the request headers and
   threads it into the projection config. cortex-gateway already
   forwards arbitrary headers verbatim, so the opt-in works
   end-to-end without gateway changes.

5. **helexa-acp's `openai_chat` provider** sets
   `x-include-thinking: true` on every request so its existing
   `ThinkParser` keeps receiving the marked content stream.
   `ThinkParser` itself is unchanged — needed for endpoints that
   aren't reasoning-aware (OpenRouter, OpenAI directly, etc.).

## Acceptance

- Zed's commit-message generator (vanilla chat-completions
  client, no `x-include-thinking`) gets clean commit messages
  with no `<think>` block.
- helexa-acp sessions continue to render thinking in Zed's
  thought UI via the opt-in path.
- Models without reasoning tokens declared in their tokenizer
  pass through unchanged.
- Implementation contains zero references to "qwen3" or any
  specific model — entirely driven by tokenizer metadata.

## Tests

9 new tests in `wire::event` (token-pair detection across 4
marker conventions, edge cases) and `wire::openai_chat` (default
drop, opt-in re-wrap with multi-chunk reasoning, close-marker on
Finish, fallback when markers absent, off-switch with markers
present). All 213 workspace tests pass; fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-31 17:55:04 +03:00

src

feat(neuron): strip reasoning from chat completions by default

2026-05-31 17:55:04 +03:00

Cargo.toml

feat(helexa-acp): model picker + session/set_model handler

2026-05-29 09:10:16 +03:00

README.md

docs(helexa-acp): README + example config for end-user onboarding

2026-05-31 14:25:56 +03:00

README.md

helexa-acp

ACP (Agent Client Protocol) bridge for editors like Zed. Lets you point your editor's agent panel at any combination of OpenAI-compatible, OpenAI Responses, and Anthropic Messages endpoints — public APIs, private LAN deployments, local Ollama / LM Studio — and switch between them per session via a model dropdown.

The "missing ACP binary" for users who don't want to be locked into one vendor's agent client.

       ┌───────────────────────────────────┐
       │  Zed (or any ACP editor client)   │
       └────────────┬──────────────────────┘
                    │  stdio JSON-RPC (ACP)
                    ▼
            ┌─────────────────┐
            │   helexa-acp    │  ← one binary, multi-endpoint
            └─────┬───────────┘
                  │  HTTP / SSE
         ┌────────┼─────────────┬──────────────┬──────────────┐
         ▼        ▼             ▼              ▼              ▼
    cortex/    OpenAI       Anthropic      OpenRouter    LM Studio
    neuron    Responses    Messages
   (self-     (gpt-5,…)    (Claude)
    hosted)

What it does

Speaks ACP over stdio to editor clients (Zed today; any future ACP client tomorrow).
Multi-endpoint — one config file lists every LLM endpoint you want available; pick one per session via the model dropdown (endpoint:model selector).
Three wire formats: openai-chat (the broadly compatible default), openai-responses (newer OpenAI surface), and anthropic-messages (Claude). Each is a separate provider impl in src/provider/; adding a fourth (Gemini, Ollama native, …) is one file plus a WireApi enum variant.
Built-in tools: read_file, write_file, edit_file, list_dir, bash. Permission-gated by default; the editor user approves writes/shell per-call.
Three session modes: Default (gated), Bypass Permissions (auto-allow), and Plan (write-only-to-plan-dir, no shell).
Vision — drag-drop images into the agent panel against any vision-capable model.
Session resume — multi-day conversations survive editor restarts via on-disk transcript persistence.
Context compaction — rolling history stays inside the model's context window automatically so long sessions on small-context local models don't fall over.

Install

From source

git clone https://git.lair.cafe/helexa/cortex.git
cd cortex
cargo install --path crates/helexa-acp
# Binary lands at ~/.cargo/bin/helexa-acp

Pre-built RPM (Fedora 43)

dnf copr enable helexa/helexa
dnf install helexa-acp

The COPR project bundles helexa-acp alongside the cortex gateway and helexa-neuron flavours; install only the package(s) you need.

Quick start

The fastest path: env-var single-endpoint config.

export HELEXA_ACP_BASE_URL=http://hanzalova.internal:31313/v1
export HELEXA_ACP_MODEL=Qwen/Qwen3.6-27B
helexa-acp  # speaks ACP over stdin/stdout; not interactive

Then in Zed (~/.config/zed/settings.json):

{
  "agent_servers": {
    "helexa": {
      "command": "helexa-acp",
      "args": []
    }
  }
}

Restart Zed → open the agent panel → pick "helexa" → start chatting. Tool calls (file reads, writes, bash) prompt for permission per-call in Default mode.

That's the minimum. The full config story below is what unlocks the multi-endpoint dropdown.

Multi-endpoint config

Copy helexa-acp.example.toml from this repo to $XDG_CONFIG_HOME/helexa-acp/config.toml (typically ~/.config/helexa-acp/config.toml) and edit:

default_endpoint = "helexa"

[[endpoints]]
name = "helexa"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"
max_tokens = 8192
context_window = 32768

[[endpoints]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
wire_api = "openai-chat"
api_key_env = "OPENROUTER_API_KEY"
default_model = "anthropic/claude-opus-4"

[[endpoints]]
name = "anthropic"
base_url = "https://api.anthropic.com/v1"
wire_api = "anthropic-messages"
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-opus-4"

Restart Zed. The model dropdown lists every model from every configured endpoint with the endpoint:model selector (helexa:Qwen/Qwen3.6-27B, openrouter:anthropic/claude-opus-4, …). Switch mid-session; the next prompt routes to the new endpoint.

When only one endpoint is configured the prefix is dropped (model ids appear bare).

Selector syntax

The model field on every internal request is parsed as <endpoint>:<model>:

openrouter:gpt-4o → routes to the openrouter endpoint, model gpt-4o.
helexa/large → no colon → falls through to whichever endpoint is named in default_endpoint, model helexa/large.
:gpt-5 → leading colon → also falls through to default.

Endpoint cookbook

Copy-pasteable blocks. Mix and match.

cortex / neuron (self-hosted)

[[endpoints]]
name = "helexa"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"
max_tokens = 8192
context_window = 32768

Use openai-responses instead of openai-chat once cortex 0.1.16+ is deployed and you want the Responses API surface (vision item shape, structured reasoning items, etc.).

OpenAI directly

[[endpoints]]
name = "openai"
base_url = "https://api.openai.com/v1"
wire_api = "openai-responses"
api_key_env = "OPENAI_API_KEY"
default_model = "gpt-5"

openai-responses is the right choice for current OpenAI models; openai-chat works against legacy GPT-3.5/4 deployments and anything labelled "chat completions".

Anthropic directly

[[endpoints]]
name = "anthropic"
base_url = "https://api.anthropic.com/v1"
wire_api = "anthropic-messages"
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-opus-4"

helexa-acp sends x-api-key + anthropic-version: 2023-06-01 automatically. The api_key_env indirection keeps your key out of the config file.

OpenRouter (multi-vendor proxy)

[[endpoints]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
wire_api = "openai-chat"
api_key_env = "OPENROUTER_API_KEY"
default_model = "anthropic/claude-opus-4"

OpenRouter speaks OpenAI-compat for every model it fronts, so openai-chat is the right wire format regardless of the underlying vendor.

LM Studio (local)

[[endpoints]]
name = "lmstudio"
base_url = "http://localhost:1234/v1"
wire_api = "openai-chat"
default_model = "auto"

LM Studio's "auto" model id picks whatever's loaded. Same shape works for Ollama in compat mode (http://localhost:11434/v1) and vLLM.

Multiple cortex deployments

[[endpoints]]
name = "lan"
base_url = "http://hanzalova.internal:31313/v1"
wire_api = "openai-chat"
default_model = "Qwen/Qwen3.6-27B"

[[endpoints]]
name = "cloud"
base_url = "https://cortex.example.com/v1"
wire_api = "openai-chat"
api_key_env = "CLOUD_CORTEX_KEY"
default_model = "Qwen/Qwen3-VL-8B"

Use the endpoint:model selector to switch between them mid-session.

Zed setup

~/.config/zed/settings.json:

{
  "agent_servers": {
    "helexa": {
      "command": "helexa-acp"
    }
  }
}

Optional environment overrides for the binary:

{
  "agent_servers": {
    "helexa": {
      "command": "helexa-acp",
      "env": {
        "HELEXA_ACP_LOG_FILE": "/tmp/helexa-acp.log",
        "RUST_LOG": "helexa_acp=debug"
      }
    }
  }
}

HELEXA_ACP_LOG_FILE is the one you actually want — Zed doesn't surface the agent's stderr, so without that env var debug output is invisible. Point it at a file you can tail -f.

After restarting Zed: ⌘+? (or wherever your "Open Agent Panel" binding is) → select "helexa" → the model dropdown populates from your config → start prompting.

Modes

Three session modes ship; the user picks via Zed's mode dropdown on the agent panel.

Mode	Reads	Writes	Bash	Permission prompts
Default	✓	with prompt	with prompt	per call
Bypass Permissions	✓	✓	✓	never
Plan	✓	only into plan dir	disabled	never (plan-dir writes auto-allow)

Default

Reads are always allowed (read_file, list_dir are unrestricted). Writes and shell commands prompt the user before running. The intended baseline for any session where the agent might do something you'd rather review first.

Bypass Permissions

Auto-allow every tool call. Use for agentic loops you trust — bulk edits across many files, scripted workflows, prepared session templates. Never for code the agent hasn't seen before.

Plan

The "draft an implementation plan before you write code" mode. Available tools:

read_file, list_dir: unrestricted (read the codebase).
write_file, edit_file: allowed only under $XDG_DATA_HOME/helexa-acp/plans/<project-id>/. Any path outside that returns "plan mode: writes are restricted to …" back to the model so it self-corrects.
bash: disabled outright. Returns "plan mode: shell execution is disabled" if attempted.

When the plan is complete, the model presents a 3-option menu:

Bypass Permissions — implement the plan now, no prompts.
Default — implement now with per-tool prompts.
Plan (stay here) — refine the plan with more guidance.

Switch the mode dropdown to your preference and reply to proceed.

Tools

Five tools, defined in src/tools.rs:

Tool	Args	Gated in Default?
`read_file`	`path`, `line?`, `limit?`	no
`list_dir`	`path`	no
`write_file`	`path`, `content`	yes
`edit_file`	`path`, `old_text`, `new_text`	yes
`bash`	`command`, `cwd?`	yes

Path handling

~, ~/, $HOME, and $HOME/ are expanded server-side before the path reaches ACP or local fs. Lets the model emit ~/git/repo/file.rs and have it Just Work.

read_file first tries the editor's filesystem (ACP's fs/read_text_file — respects open buffers, workspace overlays, etc.). If that fails — typically because the path is outside Zed's workspace boundary — it falls back to std::fs::read_to_string. This lets the agent pull in shared material like ~/git/architecture/generic.md from a different project's session.

The fallback is logged at warn level so you can see when it kicks in.

Tool dispatch

Tool descriptions reach the model through a Qwen3 Hermes-format # Tools block injected into the system prompt — cortex/neuron pass the OpenAI tools request field through to the encoder unread, so we work the model into emitting <tool_call>{json}</tool_call> markers it then parses out of the content stream. This applies to the helexa wire format; OpenAI / Anthropic endpoints with native tool support would use their own paths once they're wired in.

The parser is tolerant: malformed JSON (trailing braces, missing name, name nested in arguments) gets a repair pass; if that fails the call surfaces as a "Malformed tool call" card in Zed and the model gets a synthetic error result so it can self-correct.

Session resume

helexa-acp persists every session to $XDG_DATA_HOME/helexa-acp/sessions/<id>.json. Zed's session/list RPC asks helexa-acp to enumerate them on workspace open; session/load rehydrates and replays the transcript as session/update notifications so the agent panel renders the prior conversation.

Behaviour:

Persisted per-round, so a mid-turn agent stall (long bash, wedged ACP roundtrip) doesn't lose earlier rounds.
Survives editor restart and the helexa-acp binary upgrading between versions.
Project-scoped: only sessions whose cwd matches the workspace are listed.

To wipe history: rm -rf $XDG_DATA_HOME/helexa-acp/sessions/.

Context compaction

When an endpoint sets context_window, helexa-acp projects the rolling history into a token budget before each request — old ToolResult content (read_file payloads are the worst offenders) gets elided to one-line markers, preserving tool_call_id pairing so the wire schema stays valid.

System prompts, user turns, and the most recent ~4 messages are never elided. The full history stays on disk; compaction is a per-request projection, not a destructive edit.

Set context_window = 32768 for a 32 K Qwen3, 131072 for a modern Claude, etc. With max_tokens also set, the budget is context_window - max_tokens - 512_safety.

Troubleshooting

"default endpoint 'helexa' has no usable provider — check config"

The named default endpoint failed to construct. Usually:

api_key_env references a variable that isn't set in the env Zed launched helexa-acp with.
The TOML's wire_api is misspelled (only openai-chat, openai-responses, anthropic-messages are accepted).

Test by running helexa-acp directly from a shell — startup errors land on stderr.

Each provider's list_models failed at startup. Look at HELEXA_ACP_LOG_FILE for "list_models failed; this endpoint's models won't appear in the picker". Likely the endpoint URL is wrong, the API key is invalid, or the upstream /v1/models endpoint isn't responding.

The agent still works against default_model even when the dropdown is empty — list-models is for picking, not routing.

"prompt_too_long" / agent stalls mid-conversation

You hit the model's context window. Set context_window on the endpoint and helexa-acp will compact before sending. The log line context compaction applied confirms it's running; if it fires but the upstream still rejects, the compaction heuristic under-counted and the budget needs tuning down.

Reading files outside the workspace returns "not found"

Zed's fs/read_text_file is workspace-scoped. helexa-acp falls back to local std::fs automatically when that fails — look for fs/read_text_file failed; falling back to local std::fs in the log. If even local read fails, the file genuinely doesn't exist or the user process lacks permissions.

Tool calls render as text instead of structured cards

The model is emitting <tool_call> markers that the parser can't decode. Two common causes:

The system prompt isn't reaching the model (cortex/neuron's tool-block injection didn't fire). Confirm with RUST_LOG=helexa_acp=debug and look at the outgoing POST /chat/completions body.
The model itself is too small / undertrained to follow the Hermes format reliably. helexa-acp has shape-based name inference and JSON repair, but there's a floor below which nothing helps.

Plan-mode writes refused even inside the plan dir

The path comparison is byte-for-byte. If the model emits a path with ~ and the plan_dir has the expanded form, expansion runs before the comparison — but resolved-vs-symlinked-path mismatches can still bite. The error message names the attempted path and the expected prefix so you can compare directly.

Architecture

Source layout under crates/helexa-acp/src/:

File	Responsibility
`main.rs`	tokio + Stdio transport. Builds providers, hands off to `agent::Agent`
`config.rs`	TOML + env-fallback config, endpoint resolver
`agent.rs`	ACP handlers (initialize, session/new, session/prompt, session/cancel, session/set_mode, session/set_model, session/load, session/list), prompt loop with tool-call recursion
`session.rs`	Per-session state map (Arc<RwLock<HashMap<…>>>)
`store.rs`	On-disk session persistence, plan-dir resolution
`prompt.rs`	System-prompt assembly, plan-mode addendum
`tools.rs`	Tool schemas + shape-based name inference
`tool_runner.rs`	Dispatch a single tool call through ACP client RPCs; permission gate
`qwen3.rs`	Qwen3 Hermes tool-format parser (`<tool_call>` / `<think>` markers)
`compaction.rs`	Token-budget compaction for the rolling history
`path_util.rs`	`~` / `$HOME` expansion shared across every path-taking tool
`provider/openai_chat.rs`	OpenAI chat completions provider
`provider/openai_responses.rs`	OpenAI Responses API provider
`provider/anthropic_messages.rs`	Anthropic Messages API provider

Adding a new wire format

New file under src/provider/ implementing the Provider trait (encoder + SSE decoder).
Add a WireApi variant in config.rs.
Wire it into build_provider in main.rs.
Done — every other module is wire-format-agnostic.

Concurrency

Arc<RwLock<HashMap<SessionId, Arc<Mutex<SessionState>>>>> — per-session mutex so concurrent requests across sessions don't contend; the map's RwLock is read-mostly.
Every tool call dispatched serially within a session (parallel dispatch would require Zed to handle interleaved permission prompts).
Provider streams are back-pressured by the consumer (bounded mpsc channels).

Self-contained

The crate has no workspace-internal dependencies (no cortex-core, no cortex-gateway). Migration to a dedicated GitHub repo for cross-platform CI / cargo-dist binaries is Cargo.toml-only.

Status

Stages 1–6 shipped: scaffold, agent loop, tools, modes, session resume, image input, model picker, three wire formats.
Stage 8 (RPM + multi-platform CI) tracked in the canonical plan; Linux x86_64 RPM ships today via the cortex monorepo's Gitea Actions.

Contributing

Repository: https://git.lair.cafe/helexa/cortex (crates/helexa-acp/). Issues / PRs welcome. The canonical staged plan is in ~/.claude/plans/plan-the-per-device-worker-abstract-micali.md on the maintainer's machine; the substages 3a–3e and 6a/6b that the canonical plan didn't anticipate are documented in commit messages.

CI: cargo fmt --check --all, cargo clippy --workspace -- -D warnings, cargo test --workspace must all pass before merge.

README.md Unescape Escape

helexa-acp

What it does

Install

From source

Pre-built RPM (Fedora 43)

Quick start

Multi-endpoint config

Selector syntax

Endpoint cookbook

cortex / neuron (self-hosted)

OpenAI directly

Anthropic directly

OpenRouter (multi-vendor proxy)

LM Studio (local)

Multiple cortex deployments

Zed setup

Modes

Default

Bypass Permissions

Plan

Tools

Path handling

Tool dispatch

Session resume

Context compaction

Troubleshooting

"default endpoint 'helexa' has no usable provider — check config"

Model dropdown is empty

"prompt_too_long" / agent stalls mid-conversation

Reading files outside the workspace returns "not found"

Tool calls render as text instead of structured cards

Plan-mode writes refused even inside the plan dir

Architecture

Adding a new wire format

Concurrency

Self-contained

Status

Contributing

README.md