feat(cortex-gateway): proxy /v1/responses to neuron

Step 3 of the Responses rollout: plain proxy route on the gateway, no translation. Neuron speaks the Responses API natively after Step 2 (commit 957f704), so the gateway just needs the same routing shape it uses for /v1/chat/completions: extract `model`, resolve via router::resolve, forward verbatim.

- New `POST /v1/responses` handler in handlers.rs::responses.
- Mock neuron under tests/common/mod.rs gains a `/v1/responses` endpoint that mirrors the ResponsesResponse shape neuron emits.
- New integration test file `tests/responses.rs` exercises:
  - Happy path (200, body round-trips, ResponsesUsage shape).
  - Unknown model → 404 (matches chat-completions error shape).
  - Missing `model` field → 400 (same extract_model helper).

Streaming proxy works through the same path as chat completions: the upstream Content-Type (`text/event-stream` for stream:true, `application/json` otherwise) propagates through proxy_with_metrics unchanged. Live-stream integration tests against a streaming mock deferred until we exercise the path against a real neuron, since the chat-completions streaming test already covers the proxy's SSE forwarding mechanics.

Three new tests; clippy + fmt clean across the workspace.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-31 11:21:43 +03:00
parent 957f704efa
commit 5ed1140c97
3 changed files with 178 additions and 0 deletions
--- a/crates/cortex-gateway/tests/common/mod.rs
+++ b/crates/cortex-gateway/tests/common/mod.rs
@@ -44,6 +44,7 @@ pub async fn spawn_mock_neuron() -> String {
            post(|Json(_body): Json<Value>| async { Json(json!({"status": "unloaded"})) }),
        )
        .route("/v1/chat/completions", post(mock_chat_completions))
+        .route("/v1/responses", post(mock_responses))
        .route("/v1/models", get(mock_v1_models));

    tokio::spawn(async move {
@@ -93,6 +94,39 @@ async fn mock_chat_completions(Json(body): Json<Value>) -> Json<Value> {
    }))
 }

+async fn mock_responses(Json(body): Json<Value>) -> Json<Value> {
+    let model = body
+        .get("model")
+        .and_then(|v| v.as_str())
+        .unwrap_or("unknown");
+    // Echo the model field back and synthesise a tiny ResponsesResponse.
+    // Mirrors the shape neuron's /v1/responses handler emits so the
+    // gateway test only needs to assert the proxy round-tripped it.
+    Json(json!({
+        "id": "resp-test-001",
+        "object": "response",
+        "created_at": 1700000000_u64,
+        "status": "completed",
+        "model": model,
+        "output": [{
+            "type": "message",
+            "id": "msg-test-001",
+            "role": "assistant",
+            "content": [{
+                "type": "output_text",
+                "text": "Hello from mock backend",
+                "annotations": []
+            }],
+            "status": "completed"
+        }],
+        "usage": {
+            "input_tokens": 5,
+            "output_tokens": 5,
+            "total_tokens": 10
+        }
+    }))
+}
+
 /// Spawns a mock neuron that returns SSE streaming responses for chat completions.
 pub async fn spawn_streaming_mock_neuron(chunk_count: usize, chunk_delay: Duration) -> String {
    let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();