The TP inference path has no vision tower, and the TP dispatch in
chat_completion / inference_stream returns before the VisionUnsupported
guard runs — so an image request to a TP-loaded model (e.g. beast's
tp=2 Qwen3.6-27B) was silently dropped and answered from text alone,
the exact issue-#3 confident-hallucination pattern Stage C killed for
single-GPU.
Add the request_has_images → VisionUnsupported guard to both
chat_completion_tp and inference_tp_stream, before prefill / before the
SSE stream opens, so beast returns a clean 400 vision_unsupported. The
guard is unconditional for now (TP has no tower); Stage 3 makes it
conditional on the TP model's has_vision once real TP-vision lands.
Detection is covered by the existing request_has_images unit test; the
guard itself is cuda-gated (validated by CI's CUDA type-check).
Refs TP-vision plan Stage 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>