Vision: numerical validation against transformers reference #15
Context
Deferred during planning of the initial vision capability (umbrella:
#3). Stages A–D ship with "loose" validation (coherent
image-grounded responses and a per-image quality benchmark) but no
rigorous numerical-correctness check against the reference Python
implementation. Refs:
~/.claude/plans/foamy-twirling-catmull.md.
Problem
The qwen3_5 arch module's own doc-comment already notes "numerical
correctness vs the reference Python is not yet validated." Adding a
vision tower stacks more hand-rolled tensor math on top of that.
Without a rigorous comparison fixture, a subtle numerical bug
(wrong RoPE base for vision attention, off-by-one in patch
position embedding, projector bias missing, etc.) could go
unnoticed and surface as gradual quality degradation rather than a
crash — exactly the failure mode that's hardest to debug.
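To illustrate why a wrong RoPE base is hard to spot, here is a minimal sketch (not the qwen3_5 implementation). It assumes an interleaved-pair rotary layout, and the two base values are illustrative only, not taken from the model config. Both outputs are finite and keep the same norm, so nothing crashes, yet the values diverge far beyond any reasonable tolerance.

```rust
/// Applies rotary position embedding to `x` (even length) at position `pos`.
/// Assumption: adjacent pairs (x[2i], x[2i+1]) are rotated by the angle
/// pos * base^(-2i/d). The real vision tower may use a different layout.
fn rope(x: &[f32], pos: f32, base: f32) -> Vec<f32> {
    let d = x.len();
    let mut out = vec![0.0; d];
    for i in 0..d / 2 {
        let theta = pos * base.powf(-(2.0 * i as f32) / d as f32);
        let (s, c) = theta.sin_cos();
        out[2 * i] = x[2 * i] * c - x[2 * i + 1] * s;
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c;
    }
    out
}

/// Euclidean norm; rotations preserve it, so it cannot reveal a wrong base.
fn norm(v: &[f32]) -> f32 {
    v.iter().map(|a| a * a).sum::<f32>().sqrt()
}
```

Calling `rope` on the same vector with, say, `1.0e4` and `1.0e6` as the base gives two outputs with equal `norm` but clearly different elements, which is why only an element-wise comparison against reference tensors would catch this class of bug.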
Scope
script/ that loads the same model via transformers, encodes a known image, and dumps:
crates/neuron/tests/vision_numerical.rs that loads the same model, replays the same image+prompt, and
asserts the produced tensors match within 1e-3 (tunable).
crates/neuron/tests/fixtures/vision/ directory rather than running Python in CI; document how to regenerate.
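The tolerance check could look roughly like the sketch below. The function name `compare_tensors` and the flat `&[f32]` representation are hypothetical choices for illustration, not existing neuron APIs. The comparison uses a plain absolute tolerance and treats non-finite values as failures.

```rust
/// Compares two flattened tensors element-wise against an absolute tolerance.
/// Hypothetical helper: name and signature are illustrative only.
/// Returns Err with the worst offending index so a failure is easy to localize.
pub fn compare_tensors(expected: &[f32], actual: &[f32], tol: f32) -> Result<(), String> {
    if expected.len() != actual.len() {
        return Err(format!(
            "length mismatch: expected {}, got {}",
            expected.len(),
            actual.len()
        ));
    }
    // Track the largest deviation that exceeds the tolerance, if any.
    let mut worst: Option<(usize, f32)> = None;
    for (i, (&e, &a)) in expected.iter().zip(actual.iter()).enumerate() {
        // NaN or infinity on either side is always a failure.
        if !e.is_finite() || !a.is_finite() {
            return Err(format!("non-finite value at index {i}: expected {e}, got {a}"));
        }
        let d = (e - a).abs();
        if d > tol && worst.map_or(true, |(_, w)| d > w) {
            worst = Some((i, d));
        }
    }
    match worst {
        None => Ok(()),
        Some((i, d)) => Err(format!(
            "max |diff| {d:e} at index {i} exceeds tolerance {tol:e} (expected {}, got {})",
            expected[i], actual[i]
        )),
    }
}
```

A test would then call something like `compare_tensors(&reference, &produced, 1e-3).unwrap()` per dumped tensor, so the panic message names the first stage and index that drifted. A combined absolute and relative tolerance may suit activations with a wide dynamic range better; that is a design choice left open here.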
Acceptance
Qwen3.6-27B (the latter once deployment lands).
surfaces a clear failure when a deliberately-mutated weight or
off-by-one is introduced.
model is retrained or the iteration target changes.
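The mutation criterion can be sanity-checked on a toy layer before wiring it to the real model. The sketch below is an assumption-laden stand-in: `project` is a hypothetical 2x2 linear layer, not the actual projector, and the weights are made up. It shows that dropping the bias or nudging one weight pushes the maximum deviation past 1e-3.

```rust
/// Toy stand-in for a projector layer: y = W x + b, with W stored row-major.
/// Hypothetical helper for illustration; shapes and values are made up.
fn project(w: &[f32], b: &[f32], x: &[f32]) -> Vec<f32> {
    b.iter()
        .enumerate()
        .map(|(r, &bias)| {
            let row = &w[r * x.len()..(r + 1) * x.len()];
            row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f32>() + bias
        })
        .collect()
}

/// Largest element-wise absolute difference between two equal-length slices.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

#[test]
fn mutations_are_detected() {
    let w = [0.5, -0.25, 1.0, 0.75];
    let b = [0.01, -0.02];
    let x = [0.2, 0.4];
    let reference = project(&w, &b, &x);

    // Mutation 1: projector bias missing.
    let no_bias = project(&w, &[0.0, 0.0], &x);
    assert!(max_abs_diff(&reference, &no_bias) > 1e-3);

    // Mutation 2: a single weight perturbed by 0.01.
    let mut w_mut = w;
    w_mut[3] += 0.01;
    let perturbed = project(&w_mut, &b, &x);
    assert!(max_abs_diff(&reference, &perturbed) > 1e-3);
}
```

In the real test the same pattern would apply to loaded weights: mutate one tensor in memory, rerun the forward pass, and assert that the comparison against the fixtures fails.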
Blocked by
Stage A–C of the vision plan. The Stage D quality benchmark is a
coarser substitute; this issue tracks the rigorous version.
References
~/.claude/plans/foamy-twirling-catmull.md
crates/neuron/src/harness/arch/qwen3_5/mod.rs:1-65