Vision: deploy on Qwen3.6-27B (production validation) #13

Open
opened 2026-06-01 13:18:18 +00:00 by grenade · 0 comments

Context

Deferred during planning of the initial vision capability (umbrella:
#3). Stages A–C of the vision plan develop against a smaller Qwen-VL
iteration target to keep cycle time tractable; this issue tracks
the deploy-on-target step. Refs:
~/.claude/plans/foamy-twirling-catmull.md.

Problem

Iterating directly against Qwen3.6-27B costs full model load time
(minutes) per test cycle and burns a TP setup for each attempt.
Stage A nominates a smaller variant (Stage A0 investigation
identifies which — most likely a Qwen3-VL family member if released,
else Qwen2-VL-2B-Instruct as architecture-adjacent fallback). Once
Stages A–C work end-to-end against the iteration model, this issue
covers what's needed to put the same Rust code in front of the
real 27B in production.

Scope

  • Load actual Qwen3.6-27B vision weights from
    /archive3/llm-cache/models--Qwen--Qwen3.6-27B/... (or wherever
    the production cache lives on beast/benjy/quadbrat).
  • Reconcile any architecture mismatches between the iteration model
    and the 27B: vision-tower depth, hidden size, patch size,
    projector layout. If the iteration model's vision.rs doesn't
    generalise cleanly, factor it.
  • Validate end-to-end on beast (single-GPU ISQ first, then TP via
    the TP-vision issue), benjy, and quadbrat.
  • Update models.example.toml to mark Qwen3.6-27B as vision-capable
    and document the deployment.
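
One way to make the architecture reconciliation concrete is to describe each vision tower with a small struct and list the fields that differ between the iteration model and the 27B. The sketch below is only an illustration under assumptions: VisionArch, Projector, and arch_mismatches are hypothetical names, not taken from the project's vision.rs, and the two projector variants are guesses at what "projector layout" might need to distinguish.

```rust
/// Hypothetical description of a vision tower's shape. Field and type names
/// are illustrative only and are not taken from the project's vision.rs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VisionArch {
    /// Number of transformer blocks in the vision tower.
    pub depth: usize,
    /// Width of the vision tower's hidden state.
    pub hidden_size: usize,
    /// Side length, in pixels, of one square patch.
    pub patch_size: usize,
    /// How vision features are mapped into the language model.
    pub projector: Projector,
}

/// Assumed projector layouts; a real model may need other variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Projector {
    /// MLP over groups of spatial_merge x spatial_merge neighbouring patches.
    PatchMerger { spatial_merge: usize, out_dim: usize },
    /// Plain linear projection into the language model's hidden size.
    Linear { out_dim: usize },
}

/// Return one human-readable line per field that differs between the
/// iteration model and the deployment target. An empty result means the
/// two descriptions are identical.
pub fn arch_mismatches(iteration: &VisionArch, target: &VisionArch) -> Vec<String> {
    let mut out = Vec::new();
    if iteration.depth != target.depth {
        out.push(format!("depth: {} -> {}", iteration.depth, target.depth));
    }
    if iteration.hidden_size != target.hidden_size {
        out.push(format!(
            "hidden_size: {} -> {}",
            iteration.hidden_size, target.hidden_size
        ));
    }
    if iteration.patch_size != target.patch_size {
        out.push(format!(
            "patch_size: {} -> {}",
            iteration.patch_size, target.patch_size
        ));
    }
    if iteration.projector != target.projector {
        out.push(format!(
            "projector: {:?} -> {:?}",
            iteration.projector, target.projector
        ));
    }
    out
}
```

If the mismatch list only ever contains numeric fields, the existing code probably generalises by reading those values from each model's config; a projector mismatch is the case most likely to need a real refactor.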

Acceptance

  • The issue #3 curl repro against beast (or hanzalova → beast)
    with Qwen/Qwen3.6-27B returns coherent image-grounded text
    with image-token-bearing prompt_tokens.
  • Quality benchmark from Stage D passes its threshold on the 27B.
  • Operators can flip capabilities in their /v1/models and see
    Qwen3.6-27B advertised as vision-capable without any code change.
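
The "image-token-bearing prompt_tokens" criterion can be turned into a rough numeric check. The sketch below assumes a Qwen2-VL-style layout, where the image is cut into square patches and each merge x merge group of patches becomes one prompt token; whether Qwen3.6-27B follows that layout is exactly what the reconciliation work has to confirm. The function names and the example numbers are hypothetical, and real preprocessors typically resize images first, so treat the result as an estimate.

```rust
/// Estimate how many tokens one image contributes to the prompt, ASSUMING a
/// Qwen2-VL-style layout: the image is split into `patch` x `patch` pixel
/// tiles and every `merge` x `merge` group of tiles becomes a single token.
///
/// Returns None when the inputs do not tile evenly (or are zero), because the
/// estimate would then depend on the preprocessor's resize rules.
pub fn image_tokens(width: usize, height: usize, patch: usize, merge: usize) -> Option<usize> {
    // Side length, in pixels, of the region covered by one output token.
    let cell = patch.checked_mul(merge)?;
    if cell == 0 || width % cell != 0 || height % cell != 0 {
        return None;
    }
    Some((width / cell) * (height / cell))
}

/// Acceptance-style check: the prompt_tokens reported for a request with an
/// image should exceed the text-only count by at least the expected number
/// of image tokens. `>=` leaves room for any extra wrapper tokens the chat
/// template may add around the image.
pub fn prompt_tokens_include_image(
    with_image: usize,
    text_only: usize,
    expected_image_tokens: usize,
) -> bool {
    with_image >= text_only + expected_image_tokens
}
```

For example, under these assumptions a 448 x 448 image with patch size 14 and merge size 2 gives 16 x 16 = 256 image tokens, so a prompt that tokenises to 20 tokens without the image should report at least 276 prompt_tokens with it.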

Blocked by

Stages A–C of the vision plan. The TP-vision issue is the natural
co-traveller for full-quality 27B serving on the existing fleet.

References

  • Plan: ~/.claude/plans/foamy-twirling-catmull.md
  • Umbrella: #3
  • Stage A0 investigation note about iteration target selection


Reference: helexa/cortex#13