cortex

helexa/cortex

Fork 0

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
rob thijssen	957f704efa	feat(neuron): OpenAI Responses API + ci cuda-check runner label Some checks failed build-prerelease / Package cortex RPM (push) Blocked by required conditions Details CI / CUDA type-check (push) Failing after 11s Details build-prerelease / Resolve version stamps (push) Successful in 30s Details CI / Format (push) Successful in 32s Details CI / Clippy (push) Successful in 2m31s Details build-prerelease / Build cortex binary (push) Successful in 4m32s Details CI / Test (push) Successful in 5m42s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 6m8s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details Step 2 of the Responses rollout: native `/v1/responses` endpoint on neuron that consumes the same InferenceEvent stream as `/v1/chat/completions` but emits it as the Responses API's named SSE event family. No gateway-side translation. ## Surface - `cortex-core::responses` envelope types: `ResponsesRequest`, `ResponsesInput` (text \| items), `ResponsesInputItem` (message \| function_call \| function_call_output \| reasoning), `ResponsesContentPart` (input_text \| input_image \| output_text), `ResponsesResponse`, `ResponsesOutputItem`, `ResponsesUsage`. Plus a `events::*` constant module so the projector and the wire shape stay in sync without string-typos. - `neuron::wire::openai_responses`: - `request_to_chat(req)` flattens Responses input + instructions into a `ChatCompletionRequest` the candle harness already understands. Text-only Parts collapse to a string; mixed text+image Parts go to chat's content-array shape; reasoning items drop; function_call / function_call_output round-trip via tool_calls / tool_call_id metadata so the surface is consistent for the day the harness emits tool calls. - `project_responses_stream(rx, meta)` reads InferenceEvents and emits the eight named events that compose a Responses stream: response.created → output_item.added → content_part.added → output_text.delta×N → output_text.done → content_part.done → output_item.done → response.completed. Synthesises start frames if the producer skips Start (poisoned model, early disconnect) so the stream stays coherent. - `build_response(meta, text, reason, usage)` for the non-streaming path. - `CandleHarness::inference_stream(req)` extracted from `chat_completion_stream`, returning a typed `InferenceStream` (event receiver + id/created/model_id metadata). Both `chat_completion_stream` and the new `responses_stream` are now thin wrappers that pick their wire projection. TP path got the same treatment (`chat_completion_tp_stream` → `inference_tp_stream`). - `POST /v1/responses` route on neuron. Non-streaming returns one buffered `ResponsesResponse`; streaming returns axum SSE with both event names and JSON data per frame (Responses, unlike chat completions, uses named `event:` lines). Reused `inference_error_response` helper hoisted out so the chat and responses handlers share the InferenceError → HTTP mapping. ## CI Also bundles the `cuda-check` runner-label fix from feedback on commit `1859777`: `runs-on: rpm` doesn't ship the CUDA toolkit so cudarc's nvcc-version build script blew up. Switched to `runs-on: cuda-13.0` per the existing labels. ## Scope cuts (documented in the modules) - `previous_response_id` rejected at translate time with 400 (`code: chained_conversation_not_supported`) — stateful chained conversations need a persistence layer we haven't built. - Reasoning items dropped (no Qwen3 `<think>` routing yet). - Single output item per response (one `"message"` carrying text); `function_call` items reserved but not synthesised. - Streaming events cover the core set; `response.in_progress` and the web_search / image_generation event families are out-of-scope. 22 new tests: 5 in cortex-core (envelope round-trips), 13 in neuron::wire (request translator + projector + non-streaming builder), 4 in neuron's tests/api.rs (route surface — 503 when no candle, 400 on previous_response_id, 404 on missing model for both stream and non-stream). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 11:13:44 +03:00
rob thijssen	302ccfb982	refactor(neuron): introduce InferenceEvent + wire projection layer Some checks failed build-prerelease / Resolve version stamps (push) Successful in 31s Details CI / Format (push) Successful in 38s Details CI / Clippy (push) Successful in 3m28s Details build-prerelease / Build neuron-blackwell (push) Failing after 6m4s Details build-prerelease / Build neuron-ampere (push) Failing after 7m20s Details CI / Test (push) Successful in 7m29s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-ada (push) Failing after 4m57s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m19s Details build-prerelease / Package cortex RPM (push) Successful in 1m24s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped Details Step 1 of the OpenAI Responses API rollout. Pure refactor — no new endpoints, no behaviour change on the wire. Lays the seam for emitting Responses-shaped streaming events from the same harness output as chat completions in Step 2. - New `neuron::wire` module tree: - `wire::event::InferenceEvent` — format-agnostic enum (Start, TextDelta, ReasoningDelta, Finish) the candle harness now emits as its native streaming currency. - `wire::event::FinishReason` — typed reason that maps cleanly onto OpenAI `finish_reason`, OpenAI Responses `status`, and Anthropic `stop_reason` strings. - `wire::openai_chat::project_chat_stream` — async task that consumes an InferenceEvent receiver and produces a ChatCompletionChunk receiver, stamping per-request metadata (id, created, model_id) onto every chunk. Output matches the pre-refactor wire shape bit-for-bit. - candle.rs refactored to emit InferenceEvent on its internal channel through all three streaming paths (CPU run_inference_streaming, CUDA single-GPU stream_inference_via_worker, CUDA TP chat_completion_tp_stream). The streaming functions lost their id/created/model_id parameters since wire-format metadata now lives in the projector. - emit_delta + emit_delta_blocking simplified to single-purpose TextDelta emitters with no wire-format coupling. - chat_completion_stream wraps the InferenceEvent receiver in wire_chat::project_chat_stream before returning so the /v1/chat/completions HTTP handler keeps consuming ChatCompletionChunks unchanged. External signature preserved. Also fixes a pre-existing helexa-acp test race (three modules each declared their own static LOCK for HOME mutation, so cross-module parallelism flaked tests that read HOME at runtime). Consolidated onto a single crate-wide path_util::ENV_LOCK. 122 helexa-acp tests + 44 neuron tests pass (5 new wire projection tests). fmt + clippy --workspace -- -D warnings clean. Ran helexa-acp suite 3x to confirm the env race is closed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 11:30:17 +03:00

rob thijssen

957f704efa

feat(neuron): OpenAI Responses API + ci cuda-check runner label

build-prerelease / Package cortex RPM (push) Blocked by required conditions

Details

CI / CUDA type-check (push) Failing after 11s

Details

build-prerelease / Resolve version stamps (push) Successful in 30s

Details

CI / Format (push) Successful in 32s

Details

CI / Clippy (push) Successful in 2m31s

Details

build-prerelease / Build cortex binary (push) Successful in 4m32s

Details

CI / Test (push) Successful in 5m42s

Details

CI / Build cortex SRPM (push) Has been skipped

Details

CI / Build neuron SRPM (push) Has been skipped

Details

CI / Publish cortex to COPR (push) Has been skipped

Details

CI / Publish neuron to COPR (push) Has been skipped

Details

CI / Bump version in source (push) Has been skipped

Details

build-prerelease / Build neuron-blackwell (push) Successful in 6m8s

Details

build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled

Details

build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled

Details

build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled

Details

build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled

Details

build-prerelease / Build neuron-ampere (push) Has been cancelled

Details

build-prerelease / Build neuron-ada (push) Has been cancelled

Details

Step 2 of the Responses rollout: native `/v1/responses` endpoint on
neuron that consumes the same InferenceEvent stream as
`/v1/chat/completions` but emits it as the Responses API's named
SSE event family. No gateway-side translation.

## Surface

- `cortex-core::responses` envelope types: `ResponsesRequest`,
  `ResponsesInput` (text | items), `ResponsesInputItem` (message |
  function_call | function_call_output | reasoning),
  `ResponsesContentPart` (input_text | input_image | output_text),
  `ResponsesResponse`, `ResponsesOutputItem`, `ResponsesUsage`. Plus
  a `events::*` constant module so the projector and the wire shape
  stay in sync without string-typos.

- `neuron::wire::openai_responses`:
  - `request_to_chat(req)` flattens Responses input + instructions
    into a `ChatCompletionRequest` the candle harness already
    understands. Text-only Parts collapse to a string; mixed
    text+image Parts go to chat's content-array shape; reasoning
    items drop; function_call / function_call_output round-trip
    via tool_calls / tool_call_id metadata so the surface is
    consistent for the day the harness emits tool calls.
  - `project_responses_stream(rx, meta)` reads InferenceEvents
    and emits the eight named events that compose a Responses
    stream: response.created → output_item.added → content_part.added
    → output_text.delta×N → output_text.done → content_part.done
    → output_item.done → response.completed. Synthesises start
    frames if the producer skips Start (poisoned model, early
    disconnect) so the stream stays coherent.
  - `build_response(meta, text, reason, usage)` for the
    non-streaming path.

- `CandleHarness::inference_stream(req)` extracted from
  `chat_completion_stream`, returning a typed `InferenceStream`
  (event receiver + id/created/model_id metadata). Both
  `chat_completion_stream` and the new `responses_stream` are now
  thin wrappers that pick their wire projection. TP path got the
  same treatment (`chat_completion_tp_stream` → `inference_tp_stream`).

- `POST /v1/responses` route on neuron. Non-streaming returns one
  buffered `ResponsesResponse`; streaming returns axum SSE with
  both event names and JSON data per frame (Responses, unlike
  chat completions, uses named `event:` lines). Reused
  `inference_error_response` helper hoisted out so the chat and
  responses handlers share the InferenceError → HTTP mapping.

## CI

Also bundles the `cuda-check` runner-label fix from feedback on
commit 1859777: `runs-on: rpm` doesn't ship the CUDA toolkit so
cudarc's nvcc-version build script blew up. Switched to
`runs-on: cuda-13.0` per the existing labels.

## Scope cuts (documented in the modules)

- `previous_response_id` rejected at translate time with 400
  (`code: chained_conversation_not_supported`) — stateful chained
  conversations need a persistence layer we haven't built.
- Reasoning items dropped (no Qwen3 `<think>` routing yet).
- Single output item per response (one `"message"` carrying text);
  `function_call` items reserved but not synthesised.
- Streaming events cover the core set; `response.in_progress`
  and the web_search / image_generation event families are
  out-of-scope.

22 new tests: 5 in cortex-core (envelope round-trips), 13 in
neuron::wire (request translator + projector + non-streaming
builder), 4 in neuron's tests/api.rs (route surface — 503 when no
candle, 400 on previous_response_id, 404 on missing model for
both stream and non-stream).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-31 11:13:44 +03:00

rob thijssen

302ccfb982

refactor(neuron): introduce InferenceEvent + wire projection layer

build-prerelease / Resolve version stamps (push) Successful in 31s

Details

CI / Format (push) Successful in 38s

Details

CI / Clippy (push) Successful in 3m28s

Details

build-prerelease / Build neuron-blackwell (push) Failing after 6m4s

Details

build-prerelease / Build neuron-ampere (push) Failing after 7m20s

Details

CI / Test (push) Successful in 7m29s

Details

CI / Build cortex SRPM (push) Has been skipped

Details

CI / Publish cortex to COPR (push) Has been skipped

Details

CI / Build neuron SRPM (push) Has been skipped

Details

CI / Publish neuron to COPR (push) Has been skipped

Details

CI / Bump version in source (push) Has been skipped

Details

build-prerelease / Build neuron-ada (push) Failing after 4m57s

Details

build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped

Details

build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped

Details

build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped

Details

build-prerelease / Build cortex binary (push) Successful in 4m19s

Details

build-prerelease / Package cortex RPM (push) Successful in 1m24s

Details

build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped

Details

Step 1 of the OpenAI Responses API rollout. Pure refactor — no new
endpoints, no behaviour change on the wire. Lays the seam for
emitting Responses-shaped streaming events from the same harness
output as chat completions in Step 2.

- New `neuron::wire` module tree:
  - `wire::event::InferenceEvent` — format-agnostic enum
    (Start, TextDelta, ReasoningDelta, Finish) the candle harness
    now emits as its native streaming currency.
  - `wire::event::FinishReason` — typed reason that maps cleanly
    onto OpenAI `finish_reason`, OpenAI Responses `status`, and
    Anthropic `stop_reason` strings.
  - `wire::openai_chat::project_chat_stream` — async task that
    consumes an InferenceEvent receiver and produces a
    ChatCompletionChunk receiver, stamping per-request metadata
    (id, created, model_id) onto every chunk. Output matches the
    pre-refactor wire shape bit-for-bit.

- candle.rs refactored to emit InferenceEvent on its internal
  channel through all three streaming paths (CPU
  run_inference_streaming, CUDA single-GPU stream_inference_via_worker,
  CUDA TP chat_completion_tp_stream). The streaming functions lost
  their id/created/model_id parameters since wire-format metadata
  now lives in the projector.

- emit_delta + emit_delta_blocking simplified to single-purpose
  TextDelta emitters with no wire-format coupling.

- chat_completion_stream wraps the InferenceEvent receiver in
  wire_chat::project_chat_stream before returning so the
  /v1/chat/completions HTTP handler keeps consuming
  ChatCompletionChunks unchanged. External signature preserved.

Also fixes a pre-existing helexa-acp test race (three modules each
declared their own static LOCK for HOME mutation, so cross-module
parallelism flaked tests that read HOME at runtime). Consolidated
onto a single crate-wide path_util::ENV_LOCK.

122 helexa-acp tests + 44 neuron tests pass (5 new wire projection
tests). fmt + clippy --workspace -- -D warnings clean. Ran helexa-acp
suite 3x to confirm the env race is closed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-29 11:30:17 +03:00

2 Commits