cortex/crates/neuron
rob thijssen 4c12c7e2f0 feat(neuron): M-RoPE Stage 3 — wire interleaved M-RoPE into single-GPU
Qwen3_5Model now builds the rotary cos/sin once per forward and threads
(cos, sin) through the decoder → full-attention → rope, replacing the
scalar offset that reached RotaryEmbedding:

- vision forward computes get_rope_index over the (single-shot) prompt,
  sets rope_delta, and builds interleaved-M-RoPE cos/sin so image tokens
  carry their 14×14 grid (height/width) positions;
- text / decode take plain_cos_sin at offset + rope_delta — with
  rope_delta == 0 (no image) this is bit-for-bit the old plain RoPE, and
  the device→host id copy is skipped on the text decode hot path.

rope_delta is stored on the model and reset in clear_kv_cache, so decode
after a vision prefill resumes text positions from the image-compressed
counter. decoder.rs / full_attn.rs take (cos, sin) instead of offset;
linear-attention layers are unchanged (no RoPE). The TP path still uses
the retained apply(offset); it is to be wired in Stage 4.
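
The rope_delta bookkeeping described above can be sketched as follows. This
is an illustration only, not the crate's code: RopeState, set_from_prefill
and decode_position are hypothetical names, and the delta formula
(max position + 1 - prompt length) is an assumption borrowed from the usual
Qwen-VL convention rather than something this commit states.

```rust
/// Hypothetical stand-in for the model field described in the commit.
#[derive(Default)]
pub struct RopeState {
    /// Shift between KV-cache offset and text position; 0 when no image was seen.
    pub rope_delta: i64,
}

impl RopeState {
    /// After a vision prefill: image tokens share compressed grid positions,
    /// so the largest position used is smaller than the prompt length.
    /// Assumed formula: delta = max_position + 1 - prompt_len.
    pub fn set_from_prefill(&mut self, max_position: i64, prompt_len: i64) {
        self.rope_delta = max_position + 1 - prompt_len;
    }

    /// Position fed to plain RoPE during text prefill and decode.
    /// With rope_delta == 0 this is just the cache offset.
    pub fn decode_position(&self, offset: usize) -> i64 {
        offset as i64 + self.rope_delta
    }

    /// Mirrors the reset described for clear_kv_cache.
    pub fn clear_kv_cache(&mut self) {
        self.rope_delta = 0;
    }
}
```

Under these assumptions, a 200-token prompt whose largest position is 49
gives rope_delta = -150, so the first decoded token (cache offset 200) gets
text position 50.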

Full workspace tests green; the load-bearing invariant (M-RoPE == plain
for equal axes) keeps text unchanged.
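
A minimal sketch of why the invariant holds, assuming a simplified
interleave where frequency index i reads axis i % 3 of a (t, h, w) position
triple. The real section layout in the crate may differ, and inv_freqs,
plain_angles and mrope_angles are hypothetical names used only here.

```rust
/// Inverse frequencies for `half_dim` rotary pairs: theta^(-i / half_dim).
pub fn inv_freqs(half_dim: usize, theta: f32) -> Vec<f32> {
    (0..half_dim)
        .map(|i| theta.powf(-(i as f32) / half_dim as f32))
        .collect()
}

/// Plain RoPE angles: every frequency uses the same scalar position.
pub fn plain_angles(pos: u32, inv_freq: &[f32]) -> Vec<f32> {
    inv_freq.iter().map(|f| pos as f32 * f).collect()
}

/// Interleaved M-RoPE angles (assumed layout): frequency i uses axis i % 3
/// of the (temporal, height, width) position triple.
pub fn mrope_angles(pos: [u32; 3], inv_freq: &[f32]) -> Vec<f32> {
    inv_freq
        .iter()
        .enumerate()
        .map(|(i, f)| pos[i % 3] as f32 * f)
        .collect()
}
```

When all three axes carry the same value, every frequency multiplies the
same position by the same inverse frequency, so
mrope_angles([p, p, p], f) equals plain_angles(p, f) exactly, and the
resulting cos/sin tables match as well. Only tokens with distinct height
and width positions (image tokens) see different angles.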

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 18:39:52 +03:00