cortex/crates/neuron
rob thijssen ed2d09864e
feat(neuron): TP-vision Stage 3 — wire TP chat + stream vision prefill
End-to-end TP-vision: an image request to a TP-loaded Qwen3.6-27B now
conditions on the image across both ranks.

- TpLoadedModel carries has_vision / image_token_id / lm_tokens_per_image,
  populated at load via the shared VisionMeta::from_config_path (same
  config.json the shards loaded from; Stage 1 materialises the replicated
  tower on every rank).
- LoadedHandle::capabilities() now advertises "vision" for TP loads with
  a tower (cortex-gateway already unions this into /v1/models via C3).
- The TP rejection guards (chat_completion_tp + inference_tp_stream) are
  now conditional on !has_vision — text-only TP models still 400 cleanly,
  vision-capable ones fall through.
- chat_completion_tp_inner and the streaming orchestration task detect
  images (request_has_images), expand <|image_pad|> to the per-image
  patch count, and run a single-shot generate_step_with_images prefill
  (every rank encodes + splices its replicated tower) before the
  unchanged decode loop. Text requests keep chunked_prefill_tp.
- extract_image_data_uris ships the source data URIs to every rank for
  identical per-rank preprocessing.
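
The conditional guard and the capability advertisement described above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `TpModelMeta`, `GuardOutcome`, `tp_image_guard`, the error text, and the baseline `"chat"` capability are all hypothetical names, and it assumes the guard only fires for requests that carry images.

```rust
/// Hypothetical stand-in for the vision fields carried by the TP model handle.
pub struct TpModelMeta {
    pub has_vision: bool,
}

/// Hypothetical result type; the real handlers return an HTTP response.
#[derive(Debug, PartialEq)]
pub enum GuardOutcome {
    Proceed,
    BadRequest(&'static str),
}

/// Reject image requests only when the TP-loaded model has no vision tower.
/// Assumption: requests without images always pass this guard.
pub fn tp_image_guard(meta: &TpModelMeta, request_has_images: bool) -> GuardOutcome {
    if request_has_images && !meta.has_vision {
        GuardOutcome::BadRequest("TP-loaded model has no vision tower")
    } else {
        GuardOutcome::Proceed
    }
}

/// Advertise "vision" only when a tower is present.
/// Assumption: "chat" is a placeholder for whatever is always advertised.
pub fn capabilities(meta: &TpModelMeta) -> Vec<&'static str> {
    let mut caps = vec!["chat"];
    if meta.has_vision {
        caps.push("vision");
    }
    caps
}
```

Under these assumptions a text-only model rejects an image request and advertises no vision capability, while a vision-capable model falls through to the prefill path.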

prompt_tokens now reflects the patch expansion, so usage accounting and
KV offsets match the single-GPU baseline.
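
To illustrate why the expansion changes prompt_tokens, here is a rough sketch of placeholder expansion. It assumes one placeholder token per image in the tokenized prompt and a fixed per-image count (the role lm_tokens_per_image appears to play); `expand_image_pads` and the token ids are hypothetical.

```rust
/// Replace every occurrence of the image placeholder token with
/// `tokens_per_image` copies of itself; all other tokens pass through.
/// Assumption: each image contributes exactly one placeholder before expansion.
pub fn expand_image_pads(
    tokens: &[u32],
    image_token_id: u32,
    tokens_per_image: usize,
) -> Vec<u32> {
    let mut out = Vec::with_capacity(tokens.len());
    for &t in tokens {
        if t == image_token_id {
            // One placeholder becomes a run of `tokens_per_image` placeholders.
            out.extend(std::iter::repeat(image_token_id).take(tokens_per_image));
        } else {
            out.push(t);
        }
    }
    out
}

/// The prompt token count reported in usage is the length after expansion.
pub fn prompt_token_count(expanded: &[u32]) -> usize {
    expanded.len()
}
```

With these assumptions, a 3-token prompt containing one placeholder and a per-image count of 4 expands to 6 tokens, so the reported count and the KV offset of the first decoded token both move by 3.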

TP entry points are cuda-gated and validated by CI's CUDA type-check;
capabilities(), extract_image_data_uris, and the VisionMeta reuse compile
on the non-cuda build. The full workspace test suite is green.
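
The split between gated and portable code could look something like the sketch below. The feature name `cuda`, the function `tp_prefill_available`, and the helper `data_uris_only` are assumptions for illustration; the commit only states that the TP entry points are cuda-gated and that the URI extraction compiles everywhere.

```rust
// Hypothetical feature name: compiled only when the crate is built with CUDA.
#[cfg(feature = "cuda")]
pub fn tp_prefill_available() -> bool {
    true
}

// Fallback so callers on the non-cuda build still compile.
#[cfg(not(feature = "cuda"))]
pub fn tp_prefill_available() -> bool {
    false
}

/// Portable helper (no cfg gate): keep only `data:` URIs so that every rank
/// can run identical preprocessing on the same source bytes.
pub fn data_uris_only<'a>(urls: &[&'a str]) -> Vec<&'a str> {
    urls.iter()
        .copied()
        .filter(|u| u.starts_with("data:"))
        .collect()
}
```

Keeping the helper outside the gate means it can be unit-tested on machines without a GPU toolchain.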

Refs TP-vision plan Stage 3. Implements #12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 15:14:44 +03:00