fix(neuron): TP-vision Stage 0 — reject image requests on the TP path

The TP inference path has no vision tower, and the TP dispatch in
chat_completion / inference_stream returns before the VisionUnsupported
guard runs. As a result, an image request to a TP-loaded model (e.g.
beast's tp=2 Qwen3.6-27B) had its image silently dropped and was
answered from text alone: the exact issue-#3 confident-hallucination
pattern that Stage C eliminated for single-GPU.

Add the request_has_images → VisionUnsupported guard to both
chat_completion_tp and inference_tp_stream, before prefill / before the
SSE stream opens, so beast returns a clean 400 vision_unsupported. The
guard is unconditional for now (TP has no tower); Stage 3 makes it
conditional on the TP model's has_vision once real TP-vision lands.
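
For illustration only, the ordering described above (guard first, work second) could be sketched as follows. `TpRequest`, its `has_images` flag, and `status_and_code` are hypothetical stand-ins rather than the real types in this repository; only the `VisionUnsupported` variant name and the 400 `vision_unsupported` response come from this commit.

```rust
/// Hypothetical, simplified request: the real type carries messages,
/// and image detection is done by `request_has_images`.
#[derive(Debug, Clone)]
pub struct TpRequest {
    pub model: String,
    pub has_images: bool,
}

/// Cut-down stand-in for the real error enum.
#[derive(Debug, PartialEq)]
pub enum InferenceError {
    VisionUnsupported { model_id: String },
}

impl InferenceError {
    /// Assumed mapping to the HTTP status and error code the API returns.
    pub fn status_and_code(&self) -> (u16, &'static str) {
        match self {
            InferenceError::VisionUnsupported { .. } => (400, "vision_unsupported"),
        }
    }
}

/// Stand-in for the TP path: the guard runs before any prefill work.
pub fn chat_completion_tp(request: &TpRequest) -> Result<String, InferenceError> {
    if request.has_images {
        // Reject up front instead of dropping the image silently.
        return Err(InferenceError::VisionUnsupported {
            model_id: request.model.clone(),
        });
    }
    // Prefill and decode would happen here; the sketch just echoes.
    Ok(format!("text-only completion from {}", request.model))
}
```

In this sketch a request with `has_images: true` fails with `VisionUnsupported` before any model work starts, while a text-only request proceeds as before.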

Detection is covered by the existing request_has_images unit test; the
guard itself is cuda-gated (validated by CI's CUDA type-check).
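
As a rough idea of what the detection step checks, a helper of this shape would scan every message for an image part. The `ContentPart`, `Message`, and `ChatRequest` types here are assumptions modelled on OpenAI-style chat messages, not the actual definitions behind `request_has_images`.

```rust
/// Hypothetical content part: either plain text or an image reference.
pub enum ContentPart {
    Text(String),
    ImageUrl(String),
}

/// Hypothetical chat message made of one or more content parts.
pub struct Message {
    pub role: String,
    pub parts: Vec<ContentPart>,
}

/// Hypothetical request shape; the real one has more fields.
pub struct ChatRequest {
    pub model: String,
    pub messages: Vec<Message>,
}

/// Returns true if any message in the request carries an image part.
pub fn request_has_images(request: &ChatRequest) -> bool {
    request
        .messages
        .iter()
        .flat_map(|m| m.parts.iter())
        .any(|p| matches!(p, ContentPart::ImageUrl(_)))
}
```

Because the check is a pure function over the request, it can be unit-tested without a GPU, which matches the split described above between tested detection and the cuda-gated guard.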

Refs TP-vision plan Stage 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:53:56 +03:00
parent dd592d918d
commit f8c0da0ebf
2 changed files with 27 additions and 0 deletions

@@ -0,0 +1 @@
{"sessionId":"a27586bb-2ca7-4e92-8d82-12f41b39f392","pid":3106893,"procStart":"59753850","acquiredAt":1780571089798}

@@ -2739,6 +2739,18 @@ impl CandleHarness {
return Err(poisoned_error(&model_id));
}
// Stage 0 (TP-vision): the TP path has no vision tower yet, so
// an image-bearing request can't be honoured. Reject it cleanly
// with `vision_unsupported` instead of silently dropping the
// image and answering from text alone (the issue-#3 confident-
// hallucination pattern). Made conditional on the TP model's
// `has_vision` once Stage 3 wires real TP-vision.
if request_has_images(&request) {
let _g = span.enter();
tracing::warn!("TP chat_completion: rejecting image request, TP vision unsupported");
return Err(InferenceError::VisionUnsupported { model_id });
}
let tp_for_marker = Arc::clone(&tp);
let handle = tokio::spawn(chat_completion_tp_inner(tp, request).instrument(span.clone()));
match handle.await {
@@ -2816,6 +2828,20 @@ impl CandleHarness {
return Err(poisoned_error(&request.model));
}
// Stage 0 (TP-vision): reject image requests on the TP streaming
// path before opening the SSE stream — the TP path has no vision
// tower yet, so honouring the image is impossible and silently
// dropping it would hallucinate. Returns a clean 400; made
// conditional on `has_vision` in Stage 3.
if request_has_images(&request) {
tracing::warn!(
"TP chat_completion (stream): rejecting image request, TP vision unsupported"
);
return Err(InferenceError::VisionUnsupported {
model_id: request.model.clone(),
});
}
let prompt = build_prompt_for_request(tp.chat_template.as_deref(), &request);
let encoding = tp
.tokenizer