feat(cortex-gateway): proxy /v1/responses to neuron
Some checks failed
CI / CUDA type-check (push) Failing after 12s
build-prerelease / Resolve version stamps (push) Successful in 33s
CI / Format (push) Successful in 37s
CI / Clippy (push) Failing after 1m5s
build-prerelease / Build cortex binary (push) Successful in 4m26s
CI / Test (push) Successful in 5m17s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 5m39s
build-prerelease / Package cortex RPM (push) Successful in 1m24s
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled
Some checks failed
CI / CUDA type-check (push) Failing after 12s
build-prerelease / Resolve version stamps (push) Successful in 33s
CI / Format (push) Successful in 37s
CI / Clippy (push) Failing after 1m5s
build-prerelease / Build cortex binary (push) Successful in 4m26s
CI / Test (push) Successful in 5m17s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 5m39s
build-prerelease / Package cortex RPM (push) Successful in 1m24s
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled
Step 3 of the Responses rollout: plain proxy route on the gateway,
no translation. Neuron speaks the Responses API natively after Step
2 (commit 957f704), so the gateway just needs the same routing
shape it uses for /v1/chat/completions — extract `model`, resolve
via router::resolve, forward verbatim.
- New `POST /v1/responses` handler in handlers.rs::responses.
- Mock neuron under tests/common/mod.rs gains a `/v1/responses`
endpoint that mirrors the ResponsesResponse shape neuron emits.
- New integration test file `tests/responses.rs` exercises:
- Happy path (200, body round-trips, ResponsesUsage shape).
- Unknown model → 404 (matches chat-completions error shape).
- Missing `model` field → 400 (same extract_model helper).
Streaming proxy works through the same path as chat completions —
the upstream Content-Type (`text/event-stream` for stream:true,
`application/json` otherwise) propagates through proxy_with_metrics
unchanged. Live-stream integration tests against a streaming mock
deferred until we exercise the path against a real neuron, since
the chat-completions streaming test already covers the proxy's
SSE forwarding mechanics.
Three new tests; clippy + fmt clean across the workspace.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -20,6 +20,7 @@ pub fn api_routes() -> Router<Arc<CortexState>> {
|
||||
Router::new()
|
||||
.route("/v1/chat/completions", post(chat_completions))
|
||||
.route("/v1/completions", post(completions))
|
||||
.route("/v1/responses", post(responses))
|
||||
.route("/v1/models", get(list_models))
|
||||
.route("/v1/messages", post(anthropic_messages))
|
||||
.route("/health", get(health))
|
||||
@@ -74,6 +75,58 @@ async fn chat_completions(
|
||||
.await
|
||||
}
|
||||
|
||||
/// `POST /v1/responses` — proxy to the appropriate backend node.
|
||||
///
|
||||
/// Same routing shape as [`chat_completions`]: extract `model` from
|
||||
/// the body, resolve to a node, forward verbatim. No translation —
|
||||
/// neuron speaks the Responses API natively (see
|
||||
/// `crates/neuron/src/wire/openai_responses.rs`), so the gateway is
|
||||
/// a pass-through. Streaming and non-streaming are handled
|
||||
/// identically; the upstream `Content-Type` (text/event-stream vs.
|
||||
/// application/json) propagates through the proxy.
|
||||
async fn responses(
|
||||
State(fleet): State<Arc<CortexState>>,
|
||||
headers: HeaderMap,
|
||||
body: Bytes,
|
||||
) -> Response {
|
||||
let model_id = match extract_model(&body) {
|
||||
Some(m) => m,
|
||||
None => {
|
||||
tracing::warn!(
|
||||
handler = "responses",
|
||||
"rejected: missing 'model' field in request body"
|
||||
);
|
||||
return error_response(400, "missing 'model' field in request body");
|
||||
}
|
||||
};
|
||||
|
||||
let route = match router::resolve(&fleet, &model_id).await {
|
||||
Ok(r) => r,
|
||||
Err(e) => {
|
||||
tracing::warn!(
|
||||
handler = "responses",
|
||||
model = %model_id,
|
||||
error = %e,
|
||||
"route resolve failed"
|
||||
);
|
||||
return error_response(404, &e.to_string());
|
||||
}
|
||||
};
|
||||
|
||||
touch_model(&fleet, &route.node_name, &route.resolved_model_id).await;
|
||||
|
||||
let body = rewrite_model_in_body(body, &route.resolved_model_id);
|
||||
proxy_with_metrics(
|
||||
&fleet,
|
||||
&route,
|
||||
"/v1/responses",
|
||||
headers,
|
||||
body,
|
||||
&route.resolved_model_id,
|
||||
)
|
||||
.await
|
||||
}
|
||||
|
||||
/// `POST /v1/completions` — proxy completions endpoint.
|
||||
async fn completions(
|
||||
State(fleet): State<Arc<CortexState>>,
|
||||
|
||||
Reference in New Issue
Block a user