Qwen3_5Model now builds the rotary cos/sin once per forward pass and
threads (cos, sin) through decoder → full-attention → rope, replacing
the scalar offset that previously reached RotaryEmbedding:
- vision forward computes get_rope_index over the (single-shot) prompt,
sets rope_delta, and builds interleaved-M-RoPE cos/sin so image tokens
carry their 14×14 grid (height/width) positions;
- text / decode take plain_cos_sin at offset + rope_delta. With
rope_delta == 0 (no image) this is bit-for-bit the old plain RoPE, and
the device→host id copy is skipped on the text decode hot path.
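
For illustration, the position bookkeeping in the list above can be
sketched as follows. This is a simplified stand-in, not the code in this
commit: rope_index_single_image and decode_position are hypothetical
names, the layout assumes a single non-empty image grid between two runs
of text, and the real get_rope_index may differ in detail.

```rust
/// Hypothetical, simplified stand-in for get_rope_index: one image of
/// `grid_h` x `grid_w` tokens placed between two runs of text tokens.
/// Returns per-token [t, h, w] positions and the resulting rope_delta.
fn rope_index_single_image(
    text_before: usize,
    grid_h: usize,
    grid_w: usize,
    text_after: usize,
) -> (Vec<[i64; 3]>, i64) {
    let mut pos = Vec::with_capacity(text_before + grid_h * grid_w + text_after);

    // Leading text: all three axes advance together.
    for p in 0..text_before as i64 {
        pos.push([p, p, p]);
    }

    // Image tokens: the temporal axis is fixed, height/width follow the grid.
    let base = text_before as i64;
    for r in 0..grid_h as i64 {
        for c in 0..grid_w as i64 {
            pos.push([base, base + r, base + c]);
        }
    }

    // Trailing text resumes after the largest position used by the image.
    let resume = base + grid_h.max(grid_w) as i64;
    for p in 0..text_after as i64 {
        pos.push([resume + p; 3]);
    }

    // rope_delta = (largest position + 1) - sequence length.
    let seq_len = pos.len() as i64;
    let max_pos = pos
        .iter()
        .flat_map(|p| p.iter().copied())
        .max()
        .unwrap_or(-1);
    (pos, max_pos + 1 - seq_len)
}

/// Position used for a decode step at KV-cache offset `offset`.
fn decode_position(offset: usize, rope_delta: i64) -> i64 {
    offset as i64 + rope_delta
}
```

Under these assumptions, 4 text tokens, a 14×14 image and 3 more text
tokens give a 203-token prompt whose largest position is 20, so
rope_delta is negative and the first decode step lands on position 21
rather than 203. A text-only prompt yields rope_delta == 0.
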
rope_delta is stored on the model and reset in clear_kv_cache, so decode
after a vision prefill resumes text positions from the image-compressed
counter. decoder.rs / full_attn.rs take (cos, sin) instead of offset;
linear-attention layers are unchanged (no RoPE). The TP path still uses
the retained apply(offset); its (cos, sin) wiring lands in Stage 4.
Full workspace tests are green; the load-bearing invariant (M-RoPE ==
plain RoPE when all position axes are equal) keeps text-only output
unchanged.
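
A minimal sketch of that invariant, assuming a simplified interleaving
in which rotary frequency i reads axis i % 3 of [t, h, w]. The section
layout used by the model may differ. plain_cos_sin here is a standalone
illustration that reuses the name from this commit; inv_freq and
mrope_cos_sin are hypothetical helpers.

```rust
/// Inverse frequencies for a rotary embedding with `half_dim` rotation pairs.
fn inv_freq(half_dim: usize, theta: f32) -> Vec<f32> {
    (0..half_dim)
        .map(|i| 1.0 / theta.powf(i as f32 / half_dim as f32))
        .collect()
}

/// Plain RoPE: one scalar position drives every frequency.
fn plain_cos_sin(pos: u32, inv: &[f32]) -> (Vec<f32>, Vec<f32>) {
    let cos = inv.iter().map(|f| (pos as f32 * f).cos()).collect();
    let sin = inv.iter().map(|f| (pos as f32 * f).sin()).collect();
    (cos, sin)
}

/// Interleaved M-RoPE (simplified assumption): frequency `i` uses axis
/// `i % 3` of the [t, h, w] position triple.
fn mrope_cos_sin(pos: [u32; 3], inv: &[f32]) -> (Vec<f32>, Vec<f32>) {
    let angle = |i: usize, f: &f32| pos[i % 3] as f32 * f;
    let cos = inv.iter().enumerate().map(|(i, f)| angle(i, f).cos()).collect();
    let sin = inv.iter().enumerate().map(|(i, f)| angle(i, f).sin()).collect();
    (cos, sin)
}
```

When t == h == w, every frequency sees the same scalar position and goes
through the same arithmetic as the plain path, so the two tables compare
equal element for element. With unequal axes (an image token) they
diverge.
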
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>