cortex/crates
rob thijssen 9a24b05866 feat(neuron): TP-vision Stage 1 — replicated vision tower on the TP model
Load the full, unsharded model.visual.* vision tower on every TP rank
(leader + each subprocess worker mmaps the same local safetensors) when
config.vision_config is present. VisionTower::load already takes a
ShardedVarBuilder whose plain .get() returns the full replicated tensor,
so the tower loads identically regardless of world_size — no sharding,
no NCCL broadcast.

- TpQwen3_5ForCausalLM gains two fields, vision: Option<VisionTower> and
  image_token_id, plus the methods has_vision, image_token_id,
  encode_image, and forward_with_vision, mirroring the single-GPU
  Qwen3_5ForCausalLM wrapper.
- TpQwen3_5Model::forward_with_vision mirrors the single-GPU
  forward_inner splice: embed locally, replace rows at image_token_id
  positions, run the sharded decoder stack. Because every rank encodes
  the same pixels through its replicated tower, the spliced input
  embeddings are identical across ranks — preserving the TP
  replicated-hidden-state invariant the row-parallel AllReduce relies on.
- splice_runs is now pub(crate) and shared with the TP model.
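
The splice described above can be sketched roughly as follows. This is
an illustration only, not the code in this commit: splice_image_rows is
a hypothetical helper, and plain Vec<f32> rows stand in for the real
tensor type. It assumes one image-embedding row per image_token_id
placeholder, consumed in order.

```rust
/// Hypothetical stand-in for the splice step (not the real splice_runs).
/// `embeds` holds one embedding row per token; `image_rows` holds one row
/// per image placeholder, in the order the placeholders appear.
fn splice_image_rows(
    token_ids: &[u32],
    embeds: &mut [Vec<f32>],
    image_rows: &[Vec<f32>],
    image_token_id: u32,
) -> Result<(), String> {
    if token_ids.len() != embeds.len() {
        return Err(format!(
            "{} token ids but {} embedding rows",
            token_ids.len(),
            embeds.len()
        ));
    }
    // Assumption: the number of placeholders must match the encoder output.
    let placeholders = token_ids.iter().filter(|&&t| t == image_token_id).count();
    if placeholders != image_rows.len() {
        return Err(format!(
            "{} placeholders but {} image rows",
            placeholders,
            image_rows.len()
        ));
    }
    // Check row widths up front so a failure never leaves `embeds` half-spliced.
    if let Some(width) = embeds.first().map(|r| r.len()) {
        if image_rows.iter().any(|r| r.len() != width) {
            return Err("image row width differs from hidden size".to_string());
        }
    }
    // Walk the sequence once, overwriting each placeholder row in order.
    let mut next_image_row = image_rows.iter();
    for (row, &tok) in embeds.iter_mut().zip(token_ids) {
        if tok == image_token_id {
            let src = next_image_row.next().expect("count checked above");
            row.clone_from(src);
        }
    }
    Ok(())
}
```

The function is deterministic in its inputs, so two ranks that hold the
same token ids, the same local embeddings, and the same encoder output
end up with identical spliced rows. That is the property the
replicated-hidden-state invariant depends on.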

No caller yet — Stage 2 wires the RPC/worker path that invokes
encode_image + forward_with_vision per rank. Most of this compiles on
the non-cuda build (only the cuda load variant's tower line is gated);
CI's CUDA type-check covers the rest.

Refs TP-vision plan Stage 1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 15:00:05 +03:00