Vision: dynamic image resolution (Qwen-VL min/max pixels) #14
Context
Deferred during planning of the initial vision capability (umbrella:
#3). Stages A–C ship fixed-resolution preprocessing; this issue
covers Qwen-VL's native variable-resolution behaviour. Refs:
~/.claude/plans/foamy-twirling-catmull.md.
Problem
Qwen2-VL / Qwen3-VL natively support variable image sizes via
min_pixels and max_pixels bounds. The reference Python
Qwen2VLImageProcessor picks a bucket within those bounds based on
the image's aspect ratio and produces a variable patch count per
image. This is meaningful for quality:
more vertical, preserving aspect-ratio-relevant detail.
thumbnails downsample sensibly.
Stage A ships fixed resolution (e.g. 448×448 → 256 patches) to keep
the preprocessor and patch-count math simple. This issue tracks the
upgrade to dynamic resolution to match the reference and avoid
quality regressions on non-square input.
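The fixed-resolution figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a 14-pixel patch and a 2×2 spatial merge (the Qwen2-VL defaults; the values for other checkpoints should be read from preprocessor_config.json). The function name fixed_image_tokens is hypothetical and does not exist in the crate.

```rust
/// Number of <|image_pad|> tokens produced by a fixed-resolution square input.
///
/// ASSUMPTION: `patch_size` and `merge_size` mirror the Qwen2-VL defaults
/// (14 and 2). Other checkpoints may use different values.
fn fixed_image_tokens(side: u32, patch_size: u32, merge_size: u32) -> u32 {
    // Patches along one side of the square image.
    let grid = side / patch_size;
    // Raw ViT patches before spatial merging.
    let raw_patches = grid * grid;
    // Each merge_size x merge_size block of patches becomes one LLM token.
    raw_patches / (merge_size * merge_size)
}
```

With these assumptions, fixed_image_tokens(448, 14, 2) gives a 32×32 patch grid (1024 raw patches) that merges down to 256 tokens, which matches the "448×448 → 256" example.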
Scope
min_pixels / max_pixels (or the equivalent keys in Qwen3.6, confirmed at Stage A0) from
preprocessor_config.json.
Qwen2VLImageProcessor (or equivalent for Qwen3.6) into crates/neuron/src/harness/preprocess.rs.
build_prompt_for_request so the per-image
<|image_pad|> expansion uses the actual patch count for that image rather than a fixed constant.
chat_template.rs invocation so the template receives the computed
grid_thw (temporal-height-width tuple) per image, which is what the Qwen-VL templates branch on.
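The resize and patch-count steps in the scope above could be sketched as follows. This is modeled on the smart_resize helper in the reference Python processor, not a verified port: the type GridThw and the functions grid_for_image and image_pad_count are hypothetical names, the factor (patch_size × merge_size) and pixel bounds are assumed to come from preprocessor_config.json, and Rust's f64::round rounds halves away from zero whereas Python's round uses ties-to-even, so exact-half inputs may differ from the reference.

```rust
/// Hypothetical (t, h, w) patch grid for one image. For still images t = 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GridThw {
    t: u32,
    h: u32,
    w: u32,
}

/// Sketch of the reference `smart_resize` logic: snap both sides to a
/// multiple of `factor`, then rescale (preserving aspect ratio) if the
/// area falls outside [min_pixels, max_pixels].
/// Returns None for degenerate input or extreme aspect ratios.
fn smart_resize(
    height: u32,
    width: u32,
    factor: u32,
    min_pixels: u64,
    max_pixels: u64,
) -> Option<(u32, u32)> {
    if height == 0 || width == 0 || factor == 0 {
        return None;
    }
    let (h, w, f) = (height as f64, width as f64, factor as f64);
    // The reference rejects aspect ratios above 200:1.
    if h.max(w) / h.min(w) > 200.0 {
        return None;
    }
    // Snap to the nearest multiple of `factor`, never below one factor.
    // NOTE: rounding of exact halves differs from Python's round().
    let mut h_bar = (((h / f).round() as u32) * factor).max(factor);
    let mut w_bar = (((w / f).round() as u32) * factor).max(factor);
    let area = h_bar as u64 * w_bar as u64;
    if area > max_pixels {
        // Too large: shrink both sides by the same ratio, rounding down.
        let beta = ((h * w) / max_pixels as f64).sqrt();
        h_bar = (((h / beta / f).floor() as u32) * factor).max(factor);
        w_bar = (((w / beta / f).floor() as u32) * factor).max(factor);
    } else if area < min_pixels {
        // Too small: grow both sides by the same ratio, rounding up.
        let beta = (min_pixels as f64 / (h * w)).sqrt();
        h_bar = ((h * beta / f).ceil() as u32) * factor;
        w_bar = ((w * beta / f).ceil() as u32) * factor;
    }
    Some((h_bar, w_bar))
}

/// Patch grid for an already-resized still image.
fn grid_for_image(resized_h: u32, resized_w: u32, patch_size: u32) -> GridThw {
    GridThw {
        t: 1,
        h: resized_h / patch_size,
        w: resized_w / patch_size,
    }
}

/// Number of <|image_pad|> tokens after merge_size x merge_size merging.
fn image_pad_count(grid: GridThw, merge_size: u32) -> u32 {
    grid.t * grid.h * grid.w / (merge_size * merge_size)
}
```

Assuming factor 28, patch size 14, merge size 2, min_pixels 56×56 and max_pixels 14×14×4×1280, a 448×448 input stays at 448×448 and yields 256 tokens, while a 1000×500 (height × width) input snaps to 1008×504, a 72×36 grid, and 648 tokens. Returning the grid alongside the count keeps the value available for the chat template as well as for prompt expansion.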
Acceptance
produce different patch counts, both reflected in
prompt_tokens.
fixed-resolution baseline on documents / OCR-style content.
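For prompt_tokens to reflect per-image patch counts, the placeholder expansion has to consume one count per image. A minimal sketch of that step, assuming the rendered template contains exactly one <|image_pad|> per image, in order; expand_image_pads is a hypothetical helper, not the existing build_prompt_for_request:

```rust
/// Replace the i-th single <|image_pad|> placeholder with `counts[i]`
/// repeated placeholders.
///
/// ASSUMPTION: the rendered prompt holds exactly one placeholder per image.
/// Returns an error if the placeholder and image counts disagree.
fn expand_image_pads(prompt: &str, counts: &[usize]) -> Result<String, String> {
    const PAD: &str = "<|image_pad|>";
    // Splitting on the placeholder yields (placeholders + 1) text segments.
    let parts: Vec<&str> = prompt.split(PAD).collect();
    let placeholders = parts.len() - 1;
    if placeholders != counts.len() {
        return Err(format!(
            "prompt has {} image placeholder(s) but {} image(s) were supplied",
            placeholders,
            counts.len()
        ));
    }
    let mut out = String::new();
    for (i, part) in parts.iter().enumerate() {
        out.push_str(part);
        // Every segment except the last is followed by one image's pads.
        if i < counts.len() {
            out.push_str(&PAD.repeat(counts[i]));
        }
    }
    Ok(out)
}
```

Failing loudly on a count mismatch, rather than padding or truncating, makes a template that emits an unexpected number of placeholders visible during testing.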
Blocked by
Stage B of the vision plan must ship first; this is a refinement
that lives on top of the working fixed-resolution path.
References
~/.claude/plans/foamy-twirling-catmull.md
transformers/models/qwen2_vl/image_processing_qwen2_vl.py in the Python HF repo (smart_resize + select_best_resolution).
crates/neuron/src/harness/preprocess.rs, crates/neuron/src/harness/chat_template.rs, crates/neuron/src/harness/candle.rs::build_prompt_for_request.