Responses API: emit response.in_progress and built-in-tool event families #7

grenade · 2026-05-31T08:19:37Z

Scope cut from Step 2 (commit `957f704`)

The Responses projector emits a tight 8-event sequence:

response.created → response.output_item.added → response.content_part.added → response.output_text.delta × N → response.output_text.done → response.content_part.done → response.output_item.done → response.completed

That's enough for a Responses client to recover the assistant text. Out-of-scope today:

1. `response.in_progress`: OpenAI emits this between `response.created` and the first content delta to signal "we're past prompt validation, model is generating". Some clients render a different spinner state based on it.
2. Built-in tool event families: `response.web_search_call.*`, `response.code_interpreter_call.*`, `response.file_search_call.*`, `response.image_generation_call.*`. These exist for OpenAI's hosted tools. neuron has none of those tools wired up, so the events would never fire, but clients that look for them currently render an error or fall back to chat-completions semantics.
3. Reasoning event family: tracked separately in #5.
4. Function-call event family: tracked separately in #6.

See crates/cortex-core/src/responses.rs::events for the constants module, and crates/neuron/src/wire/openai_responses.rs::emit_start_frames / emit_finish_frames for the emission sites.
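
As a rough illustration of why the eight-event sequence is enough to recover the assistant text, the sketch below folds a stream of events into a string. `StreamEvent` and `recover_text` are hypothetical names for this example only; they are not the types used in `cortex-core` or `neuron`, and real frames carry JSON payloads rather than bare variants.

```rust
/// Hypothetical, simplified model of the eight events listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StreamEvent {
    Created,
    OutputItemAdded,
    ContentPartAdded,
    OutputTextDelta(String),
    OutputTextDone,
    ContentPartDone,
    OutputItemDone,
    Completed,
}

/// Concatenates every text delta, returning the text only if the stream
/// reached `Completed`. A truncated stream yields `None`.
pub fn recover_text(events: &[StreamEvent]) -> Option<String> {
    let mut text = String::new();
    let mut completed = false;
    for event in events {
        match event {
            StreamEvent::OutputTextDelta(delta) => text.push_str(delta),
            StreamEvent::Completed => completed = true,
            // The remaining events are structural markers with no text.
            _ => {}
        }
    }
    if completed {
        Some(text)
    } else {
        None
    }
}
```

Only the delta events contribute text here; the other seven act as framing, which is consistent with the argument that the omitted marker events add little for a text-only consumer.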

Why it was cut

Most of these events carry no information the consumer doesn't already get from the surrounding events. in_progress is a marker, not a payload; the built-in-tool families are no-ops without the underlying tools. The minimum useful set was what mattered for Stage 6 of helexa-acp to start exercising the route.

What implementation looks like

`response.in_progress`

1. Add the constant `pub const IN_PROGRESS: &str = "response.in_progress";` to `cortex_core::responses::events`.
2. Emit it between `response.created` and the first `output_item` event in `emit_start_frames`. The payload mirrors `response.created`: the response shell with `status: "in_progress"`.
3. Test: extend `full_stream_emits_expected_event_sequence` to assert that the event is present.
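
A minimal sketch of these three steps, assuming a simplified `Frame` type that carries only the event name and an optional status. The real signature of `emit_start_frames` and its payload types are not shown in this issue, so `Frame`, its fields, and every constant name other than `IN_PROGRESS` are hypothetical stand-ins.

```rust
/// Stand-in for `cortex_core::responses::events`. Only `IN_PROGRESS` is
/// named in the issue; the other constant names are assumptions.
pub mod events {
    pub const CREATED: &str = "response.created";
    pub const IN_PROGRESS: &str = "response.in_progress";
    pub const OUTPUT_ITEM_ADDED: &str = "response.output_item.added";
    pub const CONTENT_PART_ADDED: &str = "response.content_part.added";
}

/// Hypothetical frame: the event name plus the status of the response shell.
/// The status is modelled only where the issue states it explicitly.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Frame {
    pub event: &'static str,
    pub status: Option<&'static str>,
}

/// Sketch of the start-of-stream frames with `response.in_progress`
/// inserted between `response.created` and the first output_item event.
pub fn emit_start_frames() -> Vec<Frame> {
    vec![
        Frame { event: events::CREATED, status: None },
        Frame { event: events::IN_PROGRESS, status: Some("in_progress") },
        Frame { event: events::OUTPUT_ITEM_ADDED, status: None },
        Frame { event: events::CONTENT_PART_ADDED, status: None },
    ]
}
```

An extension to `full_stream_emits_expected_event_sequence` could then check both that the event is present and that it sits directly after `response.created`.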

That one is trivial. The hosted-tool families depend on tool wiring that doesn't exist yet:

Hosted-tool families

Don't emit anything until we have the tools themselves. When we do (e.g. a web_search tool that proxies to an external search engine):

1. Extend `InferenceEvent` with `ToolInvocation { tool_id, name, status, … }` variants.
2. The hosted-tool implementation produces these events alongside `TextDelta`s.
3. The Responses projector translates `ToolInvocation` events to the appropriate `response.<tool>_call.*` family.
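
One way the projection in step 3 might look, as a sketch only. The field types, the `ToolStatus` enum, and the `project_tool_event` function are assumptions made for this example; the fields elided by `…` in the issue are left out rather than guessed, and the real `InferenceEvent` may differ.

```rust
/// Hypothetical lifecycle states for a hosted-tool invocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolStatus {
    InProgress,
    Completed,
}

/// Simplified stand-in for the real `InferenceEvent`. Only the three fields
/// named in the issue are modelled, with assumed types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InferenceEvent {
    TextDelta(String),
    ToolInvocation {
        tool_id: String,
        name: String,
        status: ToolStatus,
    },
}

/// Maps a tool invocation to an event name in the `response.<tool>_call.*`
/// family. Text deltas are handled elsewhere, so they map to `None`.
pub fn project_tool_event(event: &InferenceEvent) -> Option<String> {
    match event {
        InferenceEvent::ToolInvocation { name, status, .. } => {
            let suffix = match status {
                ToolStatus::InProgress => "in_progress",
                ToolStatus::Completed => "completed",
            };
            Some(format!("response.{name}_call.{suffix}"))
        }
        InferenceEvent::TextDelta(_) => None,
    }
}
```

Deriving the family name from `name` keeps the projector free of per-tool branches, though tool-specific intermediate states would need extra `ToolStatus` variants or a per-tool mapping.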

This is the natural extension once tool wiring lands. No work to do until then.

Acceptance

- `response.in_progress` lands cleanly with a test.
- The hosted-tool event families are documented as deferred until the corresponding tools exist; this issue tracks the schema work, not the tool implementation.

Tracking

`in_progress` is cosmetic (some clients are fussier than others). The hosted-tool families are blocked on actually having hosted tools and are not relevant to neuron's current scope.

grenade referenced this issue

2026-05-31 14:42:28 +00:00

Responses API: surface Qwen3 `<think>` blocks as reasoning items #5

grenade referenced this issue

2026-05-31 14:43:14 +00:00

Strip reasoning content from chat-completions output by default; opt-in via header #8

grenade referenced this issue

2026-05-31 14:43:38 +00:00

Pass through `chat_template_kwargs` to the chat template at tokenization #9

grenade referenced this issue from a commit

2026-05-31 20:26:34 +00:00

feat(neuron): extract `<tool_call>` blocks to structured tool_calls deltas