feat(neuron): Stage A — vision tower load + preprocessor for Qwen3.6
All checks were successful
CI / CUDA type-check (push) Successful in 32s
build-prerelease / Resolve version stamps (push) Successful in 30s
CI / Format (push) Successful in 28s
CI / Clippy (push) Successful in 2m35s
build-prerelease / Build cortex binary (push) Successful in 5m13s
build-prerelease / Build neuron-blackwell (push) Successful in 6m23s
build-prerelease / Build neuron-ampere (push) Successful in 7m56s
CI / Test (push) Successful in 7m11s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Build neuron-ada (push) Successful in 5m30s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m56s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 4m25s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s

Stage A of the vision implementation plan
(doc/vision-qwen3_6-spec.md). Builds the vision tower scaffolding
that today's silent-drop failure mode (issue #3) needs — the
Qwen3.6 ViT loads from `model.visual.*`, runs forward producing
post-merger LM-side image embeddings, and routes through the
device worker via a new `Job::EncodeImage`. No LM splice yet —
that's Stage B.

Refs #3 (umbrella). Deferred sub-stages tracked as #12 (TP-vision),
#13 (27B production deploy), #14 (dynamic resolution), #15
(numerical validation).

What landed:

- **A0 — investigation**: pulled config.json, preprocessor_config.json,
  chat_template.jinja, and safetensors index from beast's local
  Qwen3.6-27B cache. Documented in doc/vision-qwen3_6-spec.md with
  exact tensor shapes for every `model.visual.*` weight. Confirms
  27-block ViT with `hidden_size=1152`, `patch_size=16`,
  `spatial_merge_size=2`, `out_hidden_size=5120`. Vision tower lives
  in 2 of the 15 safetensors shards.

- **A1 — deps + scaffolding**: added `image = "0.25"` (default-
  features off, PNG/JPEG/WebP/BMP/GIF) and `base64 = "0.22"` to
  crates/neuron/Cargo.toml. Created `harness::preprocess` and
  `harness::arch::qwen3_5::vision` modules.

- **A2 — preprocess.rs**: `decode_data_uri` strips
  `data:image/...;base64,...` → image bytes → `image::DynamicImage`
  (rejecting `http(s)://` URLs to avoid SSRF/recursion); `preprocess`
  resizes to a fixed `PreprocessProfile::qwen3_6()` (448×448),
  normalises to `[-1, 1]` per the model's mean/std=0.5, emits
  row-major `(3, H, W)` f32. 9 unit tests covering data URI parse,
  decode failure paths, grayscale-to-RGB promotion, and the
  exact-value normalisation contract.

- **A3 — vision.rs**: `VisionTower` struct with `patch_embed: Conv2d`,
  learned `pos_embed: Embedding`, 27 `VisionBlock`s (pre-LN +
  multi-head self-attention with fused QKV + GELU-tanh MLP +
  residuals), and `VisionMerger` (LayerNorm → 2×2 spatial concat →
  linear_fc1 → GELU-tanh → linear_fc2 to LM hidden_size).
  Includes the Conv3d→Conv2d fold trick documented at the top of
  the file — the published patch_embed.proj.weight is 5D
  `(1152, 3, 2, 16, 16)` but candle 0.10 has no Conv3d; for static
  images we sum-collapse the temporal axis. Video would need real
  Conv3d. 5 unit tests including the exact `gelu_pytorch_tanh`
  reference values from PyTorch.

- **A4 — wire vision into Qwen3_5ForCausalLM**: extended `Config`
  with optional `vision_config: Option<VisionConfig>` and
  `image_token_id`; `Qwen3_5ForCausalLM::new` now loads the vision
  tower when present, exposes `has_vision()` and `vision()` so the
  HTTP layer can advertise capability and so the encode path can
  reach it.

- **A5 — device worker `Job::EncodeImage`**: new job variant carrying
  CPU-side `(C, H, W)` pixels. Dispatch handler reconstructs the
  tensor on the worker's device, calls `arch.encode_image(image)`,
  copies the result back to CPU as flat `Vec<f32>`. Keeps the
  "tensors don't escape the worker" invariant. Poisoned-worker
  drain path handles the new variant.

- **A6 — dispatch round-trip test**: `encode_image_routes_to_dispatch_
  and_errors_on_unknown_handle` proves the channel/dispatch wiring
  works end-to-end via the CPU device worker (errors on unknown
  ArchHandle, which is the expected behaviour without a loaded
  model — real-weights validation happens in Stage B when the LM
  splice path exists).

CI gate: cargo fmt --check, cargo clippy --workspace --all-targets
-- -D warnings, cargo test --workspace (all 28 test groups ok,
zero failures). New test counts: +9 in preprocess, +5 in vision,
+1 in device_worker.

Out of scope (deferred):
- LM-side splice of image embeddings at `<|image_pad|>` positions
  → Stage B.
- Streaming SSE for vision-bearing chat completions → Stage C.
- Reject `image_url` with HTTP 400 for non-vision models /
  advertise `capabilities` in /v1/models → Stage C.
- TP-vision (#12), 27B production deploy (#13), dynamic resolution
  (#14), numerical validation (#15).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-06-02 11:40:47 +03:00
parent 5c520c7e90
commit 7df84fed8f
11 changed files with 1413 additions and 1 deletions

117
Cargo.lock generated
View File

@@ -472,6 +472,12 @@ version = "1.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1fd0f2584146f6f2ef48085050886acf353beff7305ebd1ae69500e27c67f64b"
[[package]]
name = "byteorder-lite"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8f1fe948ff07f4bd06c30984e69f5b4899c516a3ef74f34df92a2df2ab535495"
[[package]]
name = "bytes"
version = "1.11.1"
@@ -668,6 +674,12 @@ dependencies = [
"cc",
]
[[package]]
name = "color_quant"
version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3d7b894f5411737b7867f4827955924d7c254fc9f4d91a6aad6b097804b1018b"
[[package]]
name = "colorchoice"
version = "1.0.5"
@@ -1223,6 +1235,15 @@ version = "2.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9f1f227452a390804cdb637b74a86990f2a7d7ba4b7d5693aac9b4dd6defd8d6"
[[package]]
name = "fdeflate"
version = "0.3.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e6853b52649d4ac5c0bd02320cddc5ba956bdb407c4b75a2c6b75bf51500f8c"
dependencies = [
"simd-adler32",
]
[[package]]
name = "figment"
version = "0.10.19"
@@ -1731,6 +1752,16 @@ dependencies = [
"wasip3",
]
[[package]]
name = "gif"
version = "0.14.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ee8cfcc411d9adbbaba82fb72661cc1bcca13e8bba98b364e62b2dba8f960159"
dependencies = [
"color_quant",
"weezl",
]
[[package]]
name = "glob"
version = "0.3.3"
@@ -2135,6 +2166,34 @@ dependencies = [
"icu_properties",
]
[[package]]
name = "image"
version = "0.25.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85ab80394333c02fe689eaf900ab500fbd0c2213da414687ebf995a65d5a6104"
dependencies = [
"bytemuck",
"byteorder-lite",
"color_quant",
"gif",
"image-webp",
"moxcms",
"num-traits",
"png",
"zune-core",
"zune-jpeg",
]
[[package]]
name = "image-webp"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "525e9ff3e1a4be2fbea1fdf0e98686a6d98b4d8f937e1bf7402245af1909e8c3"
dependencies = [
"byteorder-lite",
"quick-error",
]
[[package]]
name = "indexmap"
version = "1.9.3"
@@ -2498,6 +2557,16 @@ dependencies = [
"syn",
]
[[package]]
name = "moxcms"
version = "0.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bb85c154ba489f01b25c0d36ae69a87e4a1c73a72631fc6c0eb6dde34a73e44b"
dependencies = [
"num-traits",
"pxfm",
]
[[package]]
name = "native-tls"
version = "0.2.18"
@@ -2522,6 +2591,7 @@ dependencies = [
"anyhow",
"async-trait",
"axum",
"base64 0.22.1",
"candle-core",
"candle-nn",
"candle-transformers",
@@ -2533,6 +2603,7 @@ dependencies = [
"futures",
"half",
"hf-hub",
"image",
"minijinja",
"reqwest",
"safetensors 0.7.0",
@@ -2861,6 +2932,19 @@ version = "0.3.33"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "19f132c84eca552bf34cab8ec81f1c1dcc229b811638f9d283dceabe58c5569e"
[[package]]
name = "png"
version = "0.18.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "60769b8b31b2a9f263dae2776c37b1b28ae246943cf719eb6946a1db05128a61"
dependencies = [
"bitflags",
"crc32fast",
"fdeflate",
"flate2",
"miniz_oxide",
]
[[package]]
name = "polling"
version = "3.11.0"
@@ -2974,6 +3058,12 @@ version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "40e24eee682d89fb193496edf918a7f407d30175b2e785fe057e4392dfd182e0"
[[package]]
name = "pxfm"
version = "0.1.29"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e0c5ccf5294c6ccd63a74f1565028353830a9c2f5eb0c682c355c471726a6e3f"
[[package]]
name = "quanta"
version = "0.12.6"
@@ -2989,6 +3079,12 @@ dependencies = [
"winapi",
]
[[package]]
name = "quick-error"
version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a993555f31e5a609f617c12db6250dedcac1b0a85076912c436e6fc9b2c8e6a3"
[[package]]
name = "quinn"
version = "0.11.9"
@@ -4627,6 +4723,12 @@ dependencies = [
"rustls-pki-types",
]
[[package]]
name = "weezl"
version = "0.1.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a28ac98ddc8b9274cb41bb4d9d4d5c425b6020c50c46f25559911905610b4a88"
[[package]]
name = "which"
version = "7.0.3"
@@ -5164,3 +5266,18 @@ name = "zmij"
version = "1.0.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa"
[[package]]
name = "zune-core"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cb8a0807f7c01457d0379ba880ba6322660448ddebc890ce29bb64da71fb40f9"
[[package]]
name = "zune-jpeg"
version = "0.5.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "27bc9d5b815bc103f142aa054f561d9187d191692ec7c2d1e2b4737f8dbd7296"
dependencies = [
"zune-core",
]