refactor(neuron): introduce InferenceEvent + wire projection layer

Step 1 of the OpenAI Responses API rollout. Pure refactor: no new
endpoints, no behaviour change on the wire. Lays the seam for emitting
Responses-shaped streaming events from the same harness output as chat
completions in Step 2.

- New `neuron::wire` module tree:
  - `wire::event::InferenceEvent`: format-agnostic enum (Start,
    TextDelta, ReasoningDelta, Finish) the candle harness now emits as
    its native streaming currency.
  - `wire::event::FinishReason`: typed reason that maps cleanly onto
    OpenAI `finish_reason`, OpenAI Responses `status`, and Anthropic
    `stop_reason` strings.
  - `wire::openai_chat::project_chat_stream`: async task that consumes
    an InferenceEvent receiver and produces a ChatCompletionChunk
    receiver, stamping per-request metadata (id, created, model_id)
    onto every chunk. Output matches the pre-refactor wire shape
    bit-for-bit.
- candle.rs refactored to emit InferenceEvent on its internal channel
  through all three streaming paths (CPU run_inference_streaming, CUDA
  single-GPU stream_inference_via_worker, CUDA TP
  chat_completion_tp_stream). The streaming functions lost their
  id/created/model_id parameters since wire-format metadata now lives
  in the projector.
- emit_delta + emit_delta_blocking simplified to single-purpose
  TextDelta emitters with no wire-format coupling.
- chat_completion_stream wraps the InferenceEvent receiver in
  wire_chat::project_chat_stream before returning so the
  /v1/chat/completions HTTP handler keeps consuming
  ChatCompletionChunks unchanged. External signature preserved.

Also fixes a pre-existing helexa-acp test race: three modules each
declared their own static LOCK for HOME mutation, so cross-module
parallelism flaked tests that read HOME at runtime. Consolidated onto
a single crate-wide path_util::ENV_LOCK.

122 helexa-acp tests + 44 neuron tests pass (5 new wire projection
tests). fmt + clippy --workspace -- -D warnings clean. Ran helexa-acp
suite 3x to confirm the env race is closed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 11:30:17 +03:00
parent df0abfe4d4
commit 302ccfb982
7 changed files with 491 additions and 194 deletions
--- a/crates/neuron/src/wire/mod.rs
+++ b/crates/neuron/src/wire/mod.rs
@@ -0,0 +1,23 @@
+//! Wire-format projection layer.
+//!
+//! The candle harness produces a single, format-agnostic stream of
+//! [`InferenceEvent`]s. Each wire format (OpenAI chat completions,
+//! OpenAI Responses, Anthropic messages, …) lives in its own module
+//! under `wire::` and projects that event stream into the chunks /
+//! events its HTTP clients expect.
+//!
+//! The benefit over translating *between* wire shapes (OpenAI chat
+//! → Anthropic, etc.) is that we never have to reason about a
+//! wire-N → wire-M conversion: every translation is wire-N ↔ the
+//! internal event currency, and the projections are independent. A
+//! new wire format adds a new file under `wire::`; nothing else
+//! needs to know about it.
+//!
+//! Today: [`openai_chat`]. Stage 2 adds `openai_responses`. Stage 3
+//! could add a native Anthropic projection that replaces the
+//! gateway-side translation.
+
+pub mod event;
+pub mod openai_chat;
+
+pub use event::{FinishReason, InferenceEvent};
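
The projection idea described in the module docs can be illustrated with a small standalone sketch. This is not the code from this commit: the variant payloads, the two `FinishReason` variants, the `ChatChunk` struct, and the `project_chat_stream_sketch` name are all assumptions made for illustration, and a std thread with `std::sync::mpsc` stands in for the async task so the example has no dependencies. Only the four event names and the "stamp id/created/model onto every chunk" behaviour come from the commit message.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Format-agnostic event. Variant names follow the commit message;
/// the payload types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum InferenceEvent {
    Start,
    TextDelta(String),
    ReasoningDelta(String),
    Finish(FinishReason),
}

/// Hypothetical subset of finish reasons.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FinishReason {
    Stop,
    Length,
}

impl FinishReason {
    /// OpenAI chat completions `finish_reason` string.
    pub fn as_openai_chat(self) -> &'static str {
        match self {
            FinishReason::Stop => "stop",
            FinishReason::Length => "length",
        }
    }

    /// Anthropic messages `stop_reason` string.
    pub fn as_anthropic(self) -> &'static str {
        match self {
            FinishReason::Stop => "end_turn",
            FinishReason::Length => "max_tokens",
        }
    }
}

/// Simplified stand-in for a chat completion chunk (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct ChatChunk {
    pub id: String,
    pub created: u64,
    pub model: String,
    pub role: Option<&'static str>,
    pub content: Option<String>,
    pub reasoning: Option<String>,
    pub finish_reason: Option<&'static str>,
}

/// Consumes events and yields chunks, stamping the per-request
/// metadata onto every chunk so the producer never needs to know it.
pub fn project_chat_stream_sketch(
    events: Receiver<InferenceEvent>,
    id: String,
    created: u64,
    model: String,
) -> Receiver<ChatChunk> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for event in events {
            let mut chunk = ChatChunk {
                id: id.clone(),
                created,
                model: model.clone(),
                role: None,
                content: None,
                reasoning: None,
                finish_reason: None,
            };
            match event {
                InferenceEvent::Start => chunk.role = Some("assistant"),
                InferenceEvent::TextDelta(t) => chunk.content = Some(t),
                InferenceEvent::ReasoningDelta(t) => chunk.reasoning = Some(t),
                InferenceEvent::Finish(r) => {
                    chunk.finish_reason = Some(r.as_openai_chat())
                }
            }
            // Stop projecting once the consumer has gone away.
            if tx.send(chunk).is_err() {
                break;
            }
        }
    });
    rx
}
```

In this shape a second wire format would be another function over the same `Receiver<InferenceEvent>`, using its own string mapping (for example `as_anthropic`), without touching the producer.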