Vision: deploy on Qwen3.6-27B (production validation) #13


grenade · 2026-06-01T13:18:18Z


Context

Deferred during planning of the initial vision capability (umbrella:
#3). Stages A–C of the vision plan are developed against a smaller
Qwen-VL iteration target to keep cycle time tractable; this issue
tracks the deploy-on-target step. Refs:
`~/.claude/plans/foamy-twirling-catmull.md`.

Problem

Iterating directly against Qwen3.6-27B costs a full model load
(minutes) per test cycle and consumes a TP setup for each attempt.
Stage A therefore nominates a smaller variant (the Stage A0
investigation identifies which: most likely a Qwen3-VL family member
if one has been released, else Qwen2-VL-2B-Instruct as an
architecture-adjacent fallback). Once Stages A–C work end-to-end
against the iteration model, this issue covers what's needed to put
the same Rust code in front of the real 27B in production.

Scope

- Load the actual Qwen3.6-27B vision weights from
  `/archive3/llm-cache/models--Qwen--Qwen3.6-27B/...` (or wherever
  the production cache lives on beast/benjy/quadbrat).
- Reconcile any architecture mismatches between the iteration model
  and the 27B: vision-tower depth, hidden size, patch size, and
  projector layout. If the iteration model's `vision.rs` doesn't
  generalise cleanly, factor it.
- Validate end-to-end on beast (single-GPU ISQ first, then TP via
  the TP-vision issue), and also on benjy and quadbrat.
- Update `models.example.toml` to mark Qwen3.6-27B as vision-capable
  and document the deployment.
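One way to approach the "reconcile architecture mismatches" item is to read the vision-tower hyperparameters from each checkpoint's config rather than hard-coding them in `vision.rs`, then diff the iteration model against the 27B. The sketch below is hypothetical: `VisionTowerConfig`, its field names, and `config_mismatches` are assumptions for illustration, not existing neuron code, and the numbers in `main` are placeholders rather than the real settings of either checkpoint.

```rust
/// Hypothetical subset of vision-tower hyperparameters that `vision.rs`
/// could read from each checkpoint's config instead of hard-coding.
#[derive(Debug, Clone, PartialEq)]
struct VisionTowerConfig {
    depth: usize,         // number of transformer blocks in the vision tower
    hidden_size: usize,   // vision-tower embedding width
    patch_size: usize,    // patch edge length in pixels
    merge_size: usize,    // spatial merge factor ahead of the projector
    projector_out: usize, // projector output width fed to the language model
}

/// Returns one human-readable line per field that differs between the
/// iteration model and the deployment target.
fn config_mismatches(iteration: &VisionTowerConfig, target: &VisionTowerConfig) -> Vec<String> {
    let fields = [
        ("depth", iteration.depth, target.depth),
        ("hidden_size", iteration.hidden_size, target.hidden_size),
        ("patch_size", iteration.patch_size, target.patch_size),
        ("merge_size", iteration.merge_size, target.merge_size),
        ("projector_out", iteration.projector_out, target.projector_out),
    ];
    fields
        .into_iter()
        .filter(|(_, a, b)| a != b)
        .map(|(name, a, b)| format!("{name}: iteration={a} target={b}"))
        .collect()
}

fn main() {
    // Placeholder values only; real numbers come from each checkpoint's config.
    let iteration = VisionTowerConfig {
        depth: 2,
        hidden_size: 8,
        patch_size: 14,
        merge_size: 2,
        projector_out: 16,
    };
    let target = VisionTowerConfig { depth: 4, ..iteration.clone() };
    for line in config_mismatches(&iteration, &target) {
        println!("mismatch {line}");
    }
}
```

Parameterising the tower this way surfaces a mismatch as a readable config diff at load time rather than as a tensor-shape error deep in the forward pass; whether it suits the existing `vision.rs` is a call for whoever does the factoring.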

Acceptance

- The issue #3 curl repro against beast (or hanzalova → beast)
  with `Qwen/Qwen3.6-27B` returns coherent image-grounded text
  with image-token-bearing `prompt_tokens`.
- Quality benchmark from Stage D passes its threshold on the 27B.
- Operators can flip `capabilities` in their `/v1/models` and see
  Qwen3.6-27B advertised as vision-capable without any code change.
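For the `prompt_tokens` criterion, a rough expectation of how many tokens an image should add can be computed up front and compared against a text-only baseline. This minimal sketch assumes a Qwen2-VL-style scheme: the image is resized to a multiple of `patch_size * merge_size`, and each `merge_size x merge_size` block of patches becomes one token. It ignores the processor's min/max-pixel clamping and the vision start/end marker tokens. Whether Qwen3.6-27B follows this scheme, and with which patch and merge sizes, should be confirmed against its config. `estimate_image_tokens` and `prompt_includes_image` are hypothetical helper names.

```rust
/// Round `x` to the nearest multiple of `factor`, never going below one
/// `factor` (Qwen2-VL-style resizing, minus min/max-pixel clamping).
fn round_to_factor(x: u32, factor: u32) -> u32 {
    let rounded = ((x as f64) / (factor as f64)).round() as u32 * factor;
    rounded.max(factor)
}

/// Estimated number of image placeholder tokens for one image, assuming
/// each merge_size x merge_size block of patches becomes one LM token.
fn estimate_image_tokens(height: u32, width: u32, patch_size: u32, merge_size: u32) -> u32 {
    let factor = patch_size * merge_size;
    let grid_h = round_to_factor(height, factor) / patch_size;
    let grid_w = round_to_factor(width, factor) / patch_size;
    (grid_h * grid_w) / (merge_size * merge_size)
}

/// True if the reported prompt_tokens leave room for at least the
/// expected image tokens on top of a text-only baseline.
fn prompt_includes_image(prompt_tokens: u32, text_only_tokens: u32, expected_image_tokens: u32) -> bool {
    prompt_tokens >= text_only_tokens + expected_image_tokens
}

fn main() {
    // Placeholder numbers: a 300x500 image, 14px patches, 2x2 merging.
    let expected = estimate_image_tokens(300, 500, 14, 2);
    println!("expected image tokens: {expected}");
    // A 20-token text-only baseline is illustrative only.
    println!("{}", prompt_includes_image(20 + expected, 20, expected));
}
```

If `prompt_tokens` for the image-bearing request barely exceeds the text-only baseline, the image was most likely dropped rather than tokenized, which is the failure mode #3 describes.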

Blocked by

Stages A–C of the vision plan. The TP-vision issue is the natural
co-traveller for full-quality 27B serving on the existing fleet.

References

- Plan: `~/.claude/plans/foamy-twirling-catmull.md`
- Umbrella: Image content (`image_url`) is dropped — multimodal chat requests are processed as text-only (#3)
- Stage A0 investigation note about iteration target selection


grenade referenced this issue from a commit

2026-06-02 08:40:50 +00:00

feat(neuron): Stage A — vision tower load + preprocessor for Qwen3.6

grenade referenced this issue from a commit

2026-06-02 12:33:04 +00:00

feat(neuron): Stage B — end-to-end text+image chat for Qwen3.6


Reference: helexa/cortex#13