cortex/crates/neuron
rob thijssen dc048ffcc9
fix(neuron): vision-tower 2D positions + M-RoPE default on
Two fixes to the spatial handling of images, validated against the HF
transformers 4.57.1 qwen3_vl reference on beast.

**Vision tower (the real cause of poor spatial vision).** The Stage-A
tower encoded position incorrectly in two ways, so the model saw image
*content* but not *layout* (a row of 5 people read as "a line of 23",
the sky inverted), regardless of the LM-side RoPE:

- Learned pos-embed was a naive sequential lookup of the first
  `n_patches` rows of the 48×48 (`num_position_embeddings=2304`) grid —
  wrong stride for a 28×28 patch grid. Now bilinearly interpolates the
  grid to `gh×gw` (port of HF `fast_pos_embed_interpolate`), row-major.
- The 2D vision rotary was absent entirely. Added
  `VisionRotaryEmbedding` (θ=10000, dim=head_dim/2) applying per-patch
  `(row, col)` rotary to q/k in every ViT block via `rope_slow`, matching
  HF `apply_rotary_pos_emb_vision`.
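
As a rough illustration of the interpolation fix, here is a minimal sketch of bilinear resampling of a learned position table onto a `gh×gw` grid. It uses plain slices rather than candle tensors, the function name `interpolate_pos_embed` and its signature are hypothetical, and it leaves out any reordering the real port may do after interpolation:

```rust
/// Bilinearly resample a learned `side x side` position table (row-major,
/// `dim` floats per cell) onto a `gh x gw` patch grid, also row-major.
/// Hypothetical helper for illustration; not the crate's actual API.
fn interpolate_pos_embed(
    table: &[f32],
    side: usize,
    dim: usize,
    gh: usize,
    gw: usize,
) -> Vec<f32> {
    assert!(side > 0);
    assert_eq!(table.len(), side * side * dim);

    // Evenly spaced sample coordinates in [0, side - 1] (linspace-style).
    let coords = |n: usize| -> Vec<f32> {
        if n == 1 {
            return vec![0.0];
        }
        let step = (side - 1) as f32 / (n - 1) as f32;
        (0..n).map(|i| i as f32 * step).collect()
    };
    let (ys, xs) = (coords(gh), coords(gw));

    let mut out = Vec::with_capacity(gh * gw * dim);
    for &y in &ys {
        let y0 = y.floor() as usize;
        let y1 = (y0 + 1).min(side - 1); // clamp at the bottom edge
        let dy = y - y0 as f32;
        for &x in &xs {
            let x0 = x.floor() as usize;
            let x1 = (x0 + 1).min(side - 1); // clamp at the right edge
            let dx = x - x0 as f32;
            for d in 0..dim {
                let at = |r: usize, c: usize| table[(r * side + c) * dim + d];
                // Weighted sum of the four surrounding table cells.
                let v = (1.0 - dy) * (1.0 - dx) * at(y0, x0)
                    + (1.0 - dy) * dx * at(y0, x1)
                    + dy * (1.0 - dx) * at(y1, x0)
                    + dy * dx * at(y1, x1);
                out.push(v);
            }
        }
    }
    out
}
```

When `gh == gw == side` the sample coordinates land exactly on table cells, so the output equals the table itself, which is the property the new interpolation unit test is described as checking.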

Both default on; `NEURON_VISION_LEGACY_POS=1` / `NEURON_VISION_LEGACY_ROPE=1`
revert each for A/B (no rebuild). New unit tests: interpolation reduces
to the sequential lookup at the native grid; rotary row/col structure.
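
The row/col structure of the vision rotary can be sketched in plain Rust as follows. This assumes the rotate-half convention and a layout in which the first half of the rotary slots follows the row index and the second half the column index; the names `inv_freq`, `patch_angles` and `apply_rotary` are hypothetical and are not the crate's `VisionRotaryEmbedding` API:

```rust
/// Inverse frequencies for a rotary of width `dim`: theta^(-2i/dim).
fn inv_freq(dim: usize, theta: f32) -> Vec<f32> {
    (0..dim / 2)
        .map(|i| theta.powf(-((2 * i) as f32) / dim as f32))
        .collect()
}

/// Rotation angles for one patch at (row, col). The rotary width is
/// head_dim / 2, giving head_dim / 4 frequencies; the first half of the
/// returned angles depends only on `row`, the second half only on `col`.
fn patch_angles(row: usize, col: usize, head_dim: usize, theta: f32) -> Vec<f32> {
    let f = inv_freq(head_dim / 2, theta);
    let rows = f.iter().map(|w| row as f32 * w);
    let cols = f.iter().map(|w| col as f32 * w);
    rows.chain(cols).collect()
}

/// Rotate-half rotary on a single head vector: element i is paired with
/// element i + len/2 and the pair is rotated by angles[i].
fn apply_rotary(x: &[f32], angles: &[f32]) -> Vec<f32> {
    let half = x.len() / 2;
    assert_eq!(angles.len(), half);
    let mut out = vec![0.0; x.len()];
    for i in 0..half {
        let (s, c) = angles[i].sin_cos();
        out[i] = x[i] * c - x[i + half] * s;
        out[i + half] = x[i + half] * c + x[i] * s;
    }
    out
}
```

Under these assumptions a patch at `(0, 0)` is left unrotated, and moving a patch along a row changes only the column half of its angles, so q/k dot products carry 2D offsets rather than a flat sequence index.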

**M-RoPE default on.** The interleaved M-RoPE matches HF
`apply_interleaved_mrope` / `get_rope_index` exactly and, in A/B runs,
was never worse than plain RoPE. `NEURON_MROPE` is now a kill switch
(`=0` for plain), not opt-in: defaults should encode the model's
trained behaviour, not freeze the broken state.
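
The difference between a default-on kill switch and an opt-in flag can be sketched as below. The helper names are hypothetical, and the exact value parsing (only `0` disables, only `1` enables the legacy paths) is an assumption based on the flags as written in this message, not on the crate's code:

```rust
/// Default-on kill switch: enabled unless the variable is set to "0".
/// An unset variable means "use the trained behaviour".
fn kill_switch_enabled(value: Option<&str>) -> bool {
    !matches!(value.map(str::trim), Some("0"))
}

/// Opt-in flag: enabled only when the variable is set to "1".
fn legacy_flag_enabled(value: Option<&str>) -> bool {
    matches!(value.map(str::trim), Some("1"))
}

/// Read the flags from the process environment at runtime (no rebuild).
fn mrope_enabled() -> bool {
    kill_switch_enabled(std::env::var("NEURON_MROPE").ok().as_deref())
}

fn legacy_vision_pos() -> bool {
    legacy_flag_enabled(std::env::var("NEURON_VISION_LEGACY_POS").ok().as_deref())
}
```

With this shape, an unconfigured process gets M-RoPE and the fixed vision positions, and each environment variable only ever moves it back toward the old behaviour for comparison.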

Vision tower is plain candle (CPU-testable): built, clippy-clean, full
workspace tests green locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 20:53:07 +03:00