cortex

Files

feat(neuron): M-RoPE Stage 3 — wire interleaved M-RoPE into single-GPU

Qwen3_5Model now builds the rotary cos/sin once per forward and threads
(cos, sin) through the decoder → full-attention → rope, replacing the
scalar offset that previously reached RotaryEmbedding:

- vision forward computes get_rope_index over the (single-shot) prompt,
  sets rope_delta, and builds interleaved-M-RoPE cos/sin so image tokens
  carry their 14×14 grid (height/width) positions;
- text / decode take plain_cos_sin at offset + rope_delta — with
  rope_delta == 0 (no image) this is bit-for-bit the old plain RoPE, and
  the device→host id copy is skipped on the text decode hot path.
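
A minimal sketch of the invariant the two bullets rely on, not the code in
this commit. It assumes the interleaved layout used by the Qwen reference
implementation (rotary channel i takes the height axis when i % 3 == 1, the
width axis when i % 3 == 2, each only while i is below three times that
axis's section size, and the temporal axis otherwise). The names inv_freqs,
axis_for_channel, and mrope_cos_sin are hypothetical; plain_cos_sin here only
borrows the name used above, and its signature is a guess.

```rust
/// Inverse frequencies for `half_dim` rotary channel pairs.
fn inv_freqs(half_dim: usize, theta: f64) -> Vec<f64> {
    (0..half_dim)
        .map(|i| theta.powf(-(i as f64) / half_dim as f64))
        .collect()
}

/// Plain RoPE: every channel rotates by the same scalar position.
fn plain_cos_sin(pos: i64, inv: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let cos: Vec<f64> = inv.iter().map(|f| (pos as f64 * *f).cos()).collect();
    let sin: Vec<f64> = inv.iter().map(|f| (pos as f64 * *f).sin()).collect();
    (cos, sin)
}

/// Which axis (0 = temporal, 1 = height, 2 = width) drives channel `i`.
/// ASSUMPTION: interleaved layout as described in the lead-in.
fn axis_for_channel(i: usize, sections: [usize; 3]) -> usize {
    match i % 3 {
        1 if i < sections[1] * 3 => 1,
        2 if i < sections[2] * 3 => 2,
        _ => 0,
    }
}

/// Interleaved M-RoPE: each channel rotates by the position of its own axis.
fn mrope_cos_sin(
    pos: [i64; 3],
    inv: &[f64],
    sections: [usize; 3],
) -> (Vec<f64>, Vec<f64>) {
    let angles: Vec<f64> = inv
        .iter()
        .enumerate()
        .map(|(i, f)| pos[axis_for_channel(i, sections)] as f64 * *f)
        .collect();
    let cos: Vec<f64> = angles.iter().map(|a| a.cos()).collect();
    let sin: Vec<f64> = angles.iter().map(|a| a.sin()).collect();
    (cos, sin)
}
```

When all three axes carry the same position, every channel computes the same
angle as plain RoPE, so the tables match exactly; this is why a text-only
forward can skip the M-RoPE index computation altogether.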

rope_delta is stored on the model and reset in clear_kv_cache, so decode
after a vision prefill resumes text positions from the image-compressed
counter. decoder.rs / full_attn.rs take (cos, sin) instead of offset;
linear-attention layers are unchanged (no RoPE). The TP path still uses
the retained apply(offset); it will be wired to M-RoPE in Stage 4.
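
The rope_delta bookkeeping described above can be sketched as follows. This
is an illustration under stated assumptions, not the model's code: the
PositionState struct and its method names are hypothetical (only
clear_kv_cache and rope_delta are named in this commit), and the formula
rope_delta = max_position + 1 - seq_len is assumed from the Qwen reference
implementation.

```rust
/// Hypothetical holder for the position offset kept on the model.
struct PositionState {
    /// Difference between the next M-RoPE position and the KV-cache length.
    /// Zero for text-only prompts, typically negative after an image.
    rope_delta: i64,
}

impl PositionState {
    fn new() -> Self {
        Self { rope_delta: 0 }
    }

    /// Called once after a vision prefill.
    /// ASSUMPTION: delta = max_position + 1 - seq_len.
    fn set_after_vision_prefill(&mut self, max_position: i64, seq_len: usize) {
        self.rope_delta = max_position + 1 - seq_len as i64;
    }

    /// Position fed to plain RoPE for the token at KV-cache `offset`.
    fn decode_position(&self, offset: usize) -> i64 {
        offset as i64 + self.rope_delta
    }

    /// Resetting the cache also resets the delta, so the next prompt
    /// starts from ordinary text positions.
    fn clear_kv_cache(&mut self) {
        self.rope_delta = 0;
    }
}
```

As a worked case under the same assumptions: 4 text tokens, a 14×14 image
(196 tokens that advance the position counter by only 14), and 2 trailing
text tokens give seq_len = 202 and max_position = 19, so rope_delta = -182
and the first decoded token takes position 20 rather than 202.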

Full workspace tests are green; the load-bearing invariant (M-RoPE ==
plain RoPE when the position axes are equal) keeps text-only output
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-04 18:39:52 +03:00

src

feat(neuron): M-RoPE Stage 3 — wire interleaved M-RoPE into single-GPU

2026-06-04 18:39:52 +03:00

tests

feat(neuron,cortex-core): source-aware loader (scheme:org/name)

2026-06-01 13:42:11 +03:00

build.rs

feat(stage-8d-1): import mistralrs GDN CUDA kernels — build infra only

2026-05-21 11:34:11 +03:00

Cargo.toml

fix(neuron): render HF chat templates via minijinja pycompat

2026-06-04 16:32:23 +03:00