Responses API: implement previous_response_id chained conversations #4

grenade · 2026-05-31T08:18:29Z

Scope cut from Step 2 (commit `957f704`, https://git.lair.cafe/helexa/cortex/commit/957f704)

The /v1/responses handler currently rejects any request that sets previous_response_id with a 400:

{
  "error": "previous_response_id is not supported on this neuron",
  "code": "chained_conversation_not_supported"
}

See crates/neuron/src/wire/openai_responses.rs::TranslateError::ChainedConversationNotSupported and the matching test in crates/neuron/tests/api.rs::test_responses_rejects_previous_response_id.
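For orientation, the current behaviour can be sketched roughly as below. Only the variant name, the 400 status, and the two JSON strings come from this issue; the `ResponsesRequest` shape, `check_chaining`, and the accessor methods are hypothetical stand-ins, and the real code in `openai_responses.rs` may be structured differently.

```rust
/// Simplified stand-in for the error type named in this issue.
#[derive(Debug, PartialEq)]
pub enum TranslateError {
    ChainedConversationNotSupported,
}

impl TranslateError {
    /// HTTP status the handler answers with.
    pub fn status(&self) -> u16 {
        match self {
            TranslateError::ChainedConversationNotSupported => 400,
        }
    }

    /// Value of the "code" field in the error body.
    pub fn code(&self) -> &'static str {
        match self {
            TranslateError::ChainedConversationNotSupported => {
                "chained_conversation_not_supported"
            }
        }
    }

    /// Value of the "error" field in the error body.
    pub fn message(&self) -> &'static str {
        match self {
            TranslateError::ChainedConversationNotSupported => {
                "previous_response_id is not supported on this neuron"
            }
        }
    }
}

/// Hypothetical, trimmed-down request: only the fields this sketch needs.
pub struct ResponsesRequest {
    pub previous_response_id: Option<String>,
    pub input: String,
}

/// Hypothetical guard: reject any request that tries to chain.
pub fn check_chaining(req: &ResponsesRequest) -> Result<(), TranslateError> {
    if req.previous_response_id.is_some() {
        Err(TranslateError::ChainedConversationNotSupported)
    } else {
        Ok(())
    }
}
```

Implementing this issue would replace the guard with a lookup rather than an unconditional error.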

Why it was cut

Chained conversations require server-side persistence: when a client sends previous_response_id: "resp_abc", the agent must look up that prior response's full output (including all output_item.* content and tool calls) and prepend it to the new request's input as conversational context. We don't have anywhere to store that today.
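A minimal sketch of the "prepend prior output as context" step, assuming heavily simplified item and message types. `OutputItem`, `ChatMessage`, and `prepend_prior_output` are hypothetical names for illustration, not the crate's actual wire types.

```rust
/// Hypothetical, simplified view of a stored response's output items.
#[derive(Debug, Clone, PartialEq)]
pub enum OutputItem {
    Message { text: String },
    FunctionCall { call_id: String, name: String, arguments: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Hypothetical, simplified chat-completions message.
#[derive(Debug, Clone, PartialEq)]
pub enum ChatMessage {
    User { content: String },
    Assistant { content: String },
    AssistantToolCall { call_id: String, name: String, arguments: String },
    Tool { call_id: String, content: String },
}

/// Turn the prior response's output into leading messages, then append
/// the new user input so the model sees the earlier turn first.
pub fn prepend_prior_output(prior: &[OutputItem], new_input: &str) -> Vec<ChatMessage> {
    let mut messages: Vec<ChatMessage> = prior
        .iter()
        .map(|item| match item {
            OutputItem::Message { text } => ChatMessage::Assistant {
                content: text.clone(),
            },
            OutputItem::FunctionCall { call_id, name, arguments } => {
                ChatMessage::AssistantToolCall {
                    call_id: call_id.clone(),
                    name: name.clone(),
                    arguments: arguments.clone(),
                }
            }
            OutputItem::FunctionCallOutput { call_id, output } => ChatMessage::Tool {
                call_id: call_id.clone(),
                content: output.clone(),
            },
        })
        .collect();
    messages.push(ChatMessage::User {
        content: new_input.to_string(),
    });
    messages
}
```

The ordering is the important part: prior items keep their original order and the new input always comes last.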

What implementation looks like

1. Storage layer: pick a backing store. Options:
   - In-memory Arc<RwLock<HashMap<ResponseId, StoredResponse>>> on NeuronState, with a TTL evictor. Simple, lossy across restarts.
   - On-disk under $XDG_DATA_HOME/neuron/responses/<id>.json like helexa-acp's session store. Survives restarts; trivially auditable.
   - SQLite. Overkill for v1 but the natural endpoint.
2. Persist on completion: after a successful /v1/responses request (streaming or not), serialise the assembled ResponsesResponse to the chosen store, keyed by response.id.
3. Lookup on translate: when request_to_chat sees previous_response_id, fetch the prior response, walk its output items, and prepend them as assistant / function_call / function_call_output items to the new chat-completions message list.
4. Cleanup policy: TTL (24h?) or LRU cap. Document expectations in the module header.
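The in-memory option with a TTL could look roughly like the sketch below. It uses only the standard library; `ResponseStore`, its methods, and the `body` field are hypothetical, and `StoredResponse` is reduced to an opaque serialised string rather than the real `ResponsesResponse`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::time::{Duration, Instant};

pub type ResponseId = String;

/// Hypothetical stored form: the serialised response, kept opaque here.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredResponse {
    pub body: String,
}

/// Cheaply cloneable handle that could live on NeuronState.
#[derive(Clone)]
pub struct ResponseStore {
    ttl: Duration,
    inner: Arc<RwLock<HashMap<ResponseId, (Instant, StoredResponse)>>>,
}

impl ResponseStore {
    pub fn new(ttl: Duration) -> Self {
        ResponseStore {
            ttl,
            inner: Arc::new(RwLock::new(HashMap::new())),
        }
    }

    /// Persist on completion: record the response under its id.
    pub fn put(&self, id: ResponseId, response: StoredResponse) {
        let mut map = self.inner.write().expect("response store lock poisoned");
        map.insert(id, (Instant::now(), response));
    }

    /// Lookup on translate: expired entries are treated as missing.
    pub fn get(&self, id: &str) -> Option<StoredResponse> {
        let map = self.inner.read().expect("response store lock poisoned");
        map.get(id)
            .filter(|(stored_at, _)| stored_at.elapsed() < self.ttl)
            .map(|(_, response)| response.clone())
    }

    /// Cleanup policy: drop expired entries, returning how many were removed.
    pub fn evict_expired(&self) -> usize {
        let ttl = self.ttl;
        let mut map = self.inner.write().expect("response store lock poisoned");
        let before = map.len();
        map.retain(|_, (stored_at, _)| stored_at.elapsed() < ttl);
        before - map.len()
    }
}
```

`get` checks the TTL itself, so a late evictor pass cannot cause a stale hit; the evictor only reclaims memory. One open question this sketch does not settle: the issue describes storing the prior output, but the follow-up check in Acceptance suggests the earlier user input may need to be retained as well.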

Acceptance

- previous_response_id set against an unknown id → 404 with a clear error.
- previous_response_id set against a known id → the model sees the full prior turn as context. Verify by sending a follow-up that depends on the prior assistant message (e.g. "what's my name?" after "my name is Alice").
- Persistence survives a neuron restart (depends on chosen store).
- A new integration test in crates/neuron/tests/api.rs exercising the round-trip.
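The first two criteria can be illustrated with a small lookup function. This is a sketch under assumptions: `ChainError`, `resolve_prior_context`, the error wording, and the use of a plain `HashMap` of strings as the store are all hypothetical; only the 404 for an unknown id comes from the list above.

```rust
use std::collections::HashMap;

/// Hypothetical error for a chained request whose prior response is missing.
#[derive(Debug, PartialEq)]
pub enum ChainError {
    UnknownPreviousResponse { id: String },
}

impl ChainError {
    /// HTTP status for the error; 404 per the acceptance criteria.
    pub fn status(&self) -> u16 {
        match self {
            ChainError::UnknownPreviousResponse { .. } => 404,
        }
    }

    /// Illustrative wording only; the issue asks for "a clear error".
    pub fn message(&self) -> String {
        match self {
            ChainError::UnknownPreviousResponse { id } => {
                format!("previous_response_id {} was not found on this neuron", id)
            }
        }
    }
}

/// Resolve the context to prepend. The store maps a response id to the
/// prior turn, modelled here as plain strings for brevity.
pub fn resolve_prior_context(
    store: &HashMap<String, Vec<String>>,
    previous_response_id: Option<&str>,
) -> Result<Vec<String>, ChainError> {
    match previous_response_id {
        // Unchained request: nothing to prepend.
        None => Ok(Vec::new()),
        Some(id) => store
            .get(id)
            .cloned()
            .ok_or_else(|| ChainError::UnknownPreviousResponse { id: id.to_string() }),
    }
}
```

An integration test would drive the same three cases (no id, known id, unknown id) through the HTTP handler rather than calling a function directly.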

Tracking

Blocks: full Responses API parity for clients that use OpenAI's stateful chaining (most production code). This is also the test surface for helexa-acp's eventual openai-responses provider: that provider can either drive chaining client-side (manually feeding prior output back as input) or use this once it lands.
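For comparison, client-side chaining (the fallback mentioned above) amounts to the client keeping its own history and resending it as input on every request. A minimal sketch, where `InputItem` and `ClientSideChain` are hypothetical names and not part of helexa-acp:

```rust
/// Hypothetical input item: a role plus text content.
#[derive(Debug, Clone, PartialEq)]
pub struct InputItem {
    pub role: String,
    pub content: String,
}

/// Client-held conversation history; no previous_response_id is sent.
#[derive(Debug, Default)]
pub struct ClientSideChain {
    history: Vec<InputItem>,
}

impl ClientSideChain {
    /// Build the full input for the next request: all prior turns
    /// followed by the new user message.
    pub fn next_input(&self, user_text: &str) -> Vec<InputItem> {
        let mut input = self.history.clone();
        input.push(InputItem {
            role: "user".to_string(),
            content: user_text.to_string(),
        });
        input
    }

    /// Record a completed turn so later requests include it.
    pub fn record_turn(&mut self, user_text: &str, assistant_text: &str) {
        self.history.push(InputItem {
            role: "user".to_string(),
            content: user_text.to_string(),
        });
        self.history.push(InputItem {
            role: "assistant".to_string(),
            content: assistant_text.to_string(),
        });
    }
}
```

The trade-off is request size: every request carries the whole conversation, whereas server-side chaining only needs the id.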


grenade referenced this issue

2026-05-31 14:43:38 +00:00

Pass through `chat_template_kwargs` to the chat template at tokenization #9


Reference: helexa/cortex#4
