Responses API: synthesise function_call output items from tool-call deltas #6

Closed
opened 2026-05-31 08:19:16 +00:00 by grenade · 0 comments

Scope cut from Step 2 (commit 957f704)

The Responses projector currently emits exactly one output item per response — a single message containing text. function_call items are defined on ResponsesOutputItem (crates/cortex-core/src/responses.rs) but the projector never synthesises them.

Two reasons:

  1. The candle harness doesn't extract tool calls today. Qwen3 emits <tool_call>{json}</tool_call> blocks inline in its content stream; the parsing happens in helexa-acp (crates/helexa-acp/src/qwen3.rs::ToolCallParser) rather than in neuron.
  2. InferenceEvent::ToolCallStart and InferenceEvent::ToolCallArgsDelta are defined on the event enum (crates/neuron/src/wire/event.rs) but never produced.

The OpenAI chat projector is in the same position: it doesn't split tool calls from content either. Today both projectors expose the raw <tool_call> text and rely on every consumer to parse it.
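For illustration, this is roughly what every consumer has to do today with a finished content string: pull out each <tool_call>{json}</tool_call> block and keep the text around it. It is a minimal, non-streaming sketch. split_tool_calls is a hypothetical name, not helexa-acp's ToolCallParser, and it does no JSON repair.

```rust
/// Hypothetical helper: split a finished content string into the plain
/// text and the raw JSON bodies of its <tool_call>...</tool_call> blocks.
/// No JSON validation or repair is attempted.
fn split_tool_calls(content: &str) -> (String, Vec<String>) {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";
    let mut text = String::new();
    let mut calls = Vec::new();
    let mut rest = content;
    while let Some(start) = rest.find(OPEN) {
        text.push_str(&rest[..start]);
        let body_and_tail = &rest[start + OPEN.len()..];
        match body_and_tail.find(CLOSE) {
            Some(end) => {
                calls.push(body_and_tail[..end].trim().to_string());
                rest = &body_and_tail[end + CLOSE.len()..];
            }
            None => {
                // Unterminated block (e.g. generation stopped mid-call):
                // keep it as text rather than guess where it ends.
                text.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    text.push_str(rest);
    (text, calls)
}

fn main() {
    let content = "Let me look.\n<tool_call>\n{\"name\": \"read_file\", \"arguments\": {\"path\": \"a.rs\"}}\n</tool_call>";
    let (text, calls) = split_tool_calls(content);
    println!("text = {:?}", text);
    println!("calls = {:?}", calls);
}
```

Every client duplicating this (plus the repairs) is the cost that moving extraction into neuron removes.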

Why it was cut

Tool-call extraction touches the inference inner loop in three sites (CPU, CUDA single-GPU, CUDA TP) the same way reasoning extraction does. We didn't want to bundle that with Step 1's refactor or Step 2's new surface.

What implementation looks like

  1. Tool-call tag parser — same shape as the proposed <think> parser in #5 but for <tool_call>{json}</tool_call> blocks. Emits InferenceEvent::ToolCallStart on the open tag, ToolCallArgsDelta for the body, and finalises when the close tag arrives.
  2. JSON repair — the parser must handle the same Qwen3 misemissions helexa-acp already deals with (trailing braces, name-nested-in-arguments, shape inference). Lift those repairs from helexa-acp/src/qwen3.rs and helexa-acp/src/tools.rs::infer_tool_name into neuron::wire.
  3. Chat projector extension — emit OpenAI's delta: { tool_calls: [{ index, id, function: { name, arguments } }] } shape on each ToolCall* event, plus the final finish_reason: "tool_calls" chunk.
  4. Responses projector extension — emit:
    • response.output_item.added with a new function_call item (after the message item, or in place of it, depending on whether the model emits text before the call).
    • response.function_call_arguments.delta for each args delta.
    • response.function_call_arguments.done when the call closes.
    • response.output_item.done for the function_call item.
  5. finish_reason mapping: FinishReason::ToolCalls is already defined; it needs to thread through to tool_calls on chat and to the appropriate Responses status.
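A minimal sketch of the tag parser in item 1, assuming token deltas arrive as already-decoded strings. The awkward part is that a tag can be split across deltas, so the parser holds back only a suffix that could still turn into <tool_call> or </tool_call> and passes everything else through immediately. ParsedEvent and ToolCallTagParser are hypothetical names; the real InferenceEvent variants may carry ids or names, and this sketch only marks where ToolCallStart and ToolCallArgsDelta would be emitted.

```rust
/// Hypothetical parser output; the real InferenceEvent variants in
/// neuron::wire may carry different fields.
#[derive(Debug, PartialEq)]
enum ParsedEvent {
    Text(String),
    ToolCallStart,
    ToolCallArgsDelta(String),
    ToolCallEnd,
}

const OPEN: &str = "<tool_call>";
const CLOSE: &str = "</tool_call>";

/// Length of the longest proper prefix of `tag` that `s` ends with.
/// Tags are ASCII and start with '<', so the split is a char boundary.
fn partial_tag_len(s: &str, tag: &str) -> usize {
    (1..tag.len()).rev().find(|&k| s.ends_with(&tag[..k])).unwrap_or(0)
}

#[derive(Default)]
struct ToolCallTagParser {
    in_call: bool,
    /// Input held back because it might be the start of a tag.
    pending: String,
}

impl ToolCallTagParser {
    /// Feed one decoded delta; returns the events it completes.
    fn push(&mut self, delta: &str) -> Vec<ParsedEvent> {
        self.pending.push_str(delta);
        let mut out = Vec::new();
        loop {
            let tag = if self.in_call { CLOSE } else { OPEN };
            if let Some(pos) = self.pending.find(tag) {
                // Flush what precedes the tag in the current mode, drop the
                // tag itself (markers are consumed), then switch mode.
                let before: String = self.pending.drain(..pos).collect();
                self.pending.replace_range(..tag.len(), "");
                self.emit(before, &mut out);
                self.in_call = !self.in_call;
                out.push(if self.in_call { ParsedEvent::ToolCallStart } else { ParsedEvent::ToolCallEnd });
            } else {
                let split = self.pending.len() - partial_tag_len(&self.pending, tag);
                let ready: String = self.pending.drain(..split).collect();
                self.emit(ready, &mut out);
                return out;
            }
        }
    }

    /// Flush held-back input at end of generation. An unterminated call is
    /// left open here; a real implementation needs a policy for that case.
    fn finish(&mut self) -> Vec<ParsedEvent> {
        let rest = std::mem::take(&mut self.pending);
        let mut out = Vec::new();
        self.emit(rest, &mut out);
        out
    }

    fn emit(&self, s: String, out: &mut Vec<ParsedEvent>) {
        if s.is_empty() {
            return;
        }
        out.push(if self.in_call { ParsedEvent::ToolCallArgsDelta(s) } else { ParsedEvent::Text(s) });
    }
}

fn main() {
    let mut parser = ToolCallTagParser::default();
    let mut events = Vec::new();
    // Both tags are split across deltas on purpose.
    for delta in ["Sure.<tool", "_call>{\"name\":", "\"ls\"}</tool_c", "all> done"] {
        events.extend(parser.push(delta));
    }
    events.extend(parser.finish());
    for e in &events {
        println!("{:?}", e);
    }
}
```

Swapping the two tag constants would give the same shape for the <think> parser proposed in #5, which is one argument for landing them together.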
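For item 2, a sketch of just one repair, assuming the trailing-brace misemission means a balanced top-level value followed by surplus } characters. It tracks nesting outside string literals and cuts the body where the top-level value closes. trim_trailing_braces is hypothetical and is not the logic in helexa-acp/src/qwen3.rs; name-nested-in-arguments and shape inference are not covered.

```rust
/// Hypothetical repair for one Qwen3 misemission: a complete top-level JSON
/// value followed by stray '}' characters. Assumes the body starts (after
/// whitespace) with the value. Brackets inside string literals are ignored.
/// Returns None if the body is unbalanced or has other text after the value.
fn trim_trailing_braces(body: &str) -> Option<&str> {
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in body.char_indices() {
        if in_string {
            // Inside a string literal only escapes and the closing quote matter.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' | '[' => depth += 1,
            '}' | ']' => {
                if depth == 0 {
                    return None; // closer before any opener
                }
                depth -= 1;
                if depth == 0 {
                    // Top-level value just closed; strip the tail only if it
                    // is nothing but surplus braces and whitespace.
                    let end = i + c.len_utf8();
                    let tail = &body[end..];
                    return if tail.chars().all(|t| t == '}' || t.is_whitespace()) {
                        Some(body[..end].trim_start())
                    } else {
                        None
                    };
                }
            }
            _ => {}
        }
    }
    None // never balanced
}

fn main() {
    let raw = "{\"name\": \"read_file\", \"arguments\": {\"path\": \"a.rs\"}}}";
    println!("{:?}", trim_trailing_braces(raw));
}
```

Because the ToolCallArgsDelta events go out before the close tag arrives, a repair like this can only be applied to the accumulated body at close time, which is one place the .done event can carry corrected arguments that differ from the concatenated deltas.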
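For items 4 and 5, a sketch of the Responses event order for a single function_call item and of the finish-reason mapping. The local FinishReason mirrors only the variant this issue names (ToolCalls) plus two assumed ones (Stop, Length). The status strings reflect our understanding of OpenAI's Responses API, where a response that ends in function calls still completes and a token-limit stop is incomplete; treat them as assumptions to verify. Item payloads other than the arguments are elided.

```rust
/// Local stand-in for cortex's FinishReason. ToolCalls is named in this
/// issue; Stop and Length are assumed for illustration.
#[derive(Clone, Copy, Debug)]
enum FinishReason {
    Stop,
    Length,
    ToolCalls,
}

/// finish_reason string on the final chat-completions chunk.
fn chat_finish_reason(r: FinishReason) -> &'static str {
    match r {
        FinishReason::Stop => "stop",
        FinishReason::Length => "length",
        FinishReason::ToolCalls => "tool_calls",
    }
}

/// Assumed Responses status: ending in function calls still completes
/// (the calls are output items); hitting the token limit is incomplete.
fn responses_status(r: FinishReason) -> &'static str {
    match r {
        FinishReason::Stop | FinishReason::ToolCalls => "completed",
        FinishReason::Length => "incomplete",
    }
}

/// Event types (and argument payloads) for one function_call item, in
/// emission order, given the args deltas the tag parser produced.
fn function_call_events(arg_deltas: &[&str]) -> Vec<(&'static str, Option<String>)> {
    let mut events = vec![("response.output_item.added", None)];
    let mut full = String::new();
    for d in arg_deltas {
        full.push_str(d);
        events.push(("response.function_call_arguments.delta", Some(String::from(*d))));
    }
    // The done event carries the complete arguments string.
    events.push(("response.function_call_arguments.done", Some(full)));
    events.push(("response.output_item.done", None));
    events
}

fn main() {
    for r in [FinishReason::Stop, FinishReason::Length, FinishReason::ToolCalls] {
        println!("{:?}: chat={} responses={}", r, chat_finish_reason(r), responses_status(r));
    }
    for (name, payload) in function_call_events(&["{\"path\":", "\"a.rs\"}"]) {
        println!("{name} {payload:?}");
    }
}
```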

Acceptance

  • A prompt that triggers Qwen3 tool use produces structured ToolCallStart + ToolCallArgsDelta events on the InferenceEvent stream, with the model's <tool_call>{json}</tool_call> markers consumed (not surfaced as text).
  • Chat completions clients see the OpenAI tool_calls delta shape.
  • Responses clients see response.function_call_arguments.* events.
  • helexa-acp's ToolCallParser becomes redundant for cortex-backed sessions.

Tracking

Blocks: Responses-API-driven tool use against neuron (helexa-acp's openai-responses provider can't get function-call output items today). Coupled with #5 — both are the same harness-level extraction work, just for different tags. May want to land them together.

Reference: helexa/cortex#6