fix(qwen3_5): tensor names are under model.language_model.*, not model.*
Qwen3-Next is a multimodal architecture whose text core sits under
`model.language_model.*` — sibling to `model.visual.*` (vision tower)
and to top-level `lm_head` / `mtp.*`. Every text-side tensor in the
safetensors files carries that prefix:
model.language_model.embed_tokens.weight
model.language_model.layers.{i}.{input,post_attention}_layernorm.weight
model.language_model.layers.{i}.linear_attn.{in_proj_*, conv1d.weight, A_log, dt_bias, norm.weight, out_proj.weight}
model.language_model.layers.{i}.self_attn.{q,k,v,o}_proj.weight + {q,k}_norm.weight
model.language_model.layers.{i}.mlp.{gate,up,down}_proj.weight
model.language_model.norm.weight
lm_head.weight (top-level; not under language_model)
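Given that layout, every per-layer tensor name splits cleanly into a layer index and a layer-relative path. A minimal sketch of that split; the helper name is hypothetical and not part of this codebase:

```rust
/// Prefix shared by every text-side tensor (per the listing above).
const TEXT_PREFIX: &str = "model.language_model.";

/// Hypothetical helper: split `model.language_model.layers.{i}.<rest>` into
/// `(i, "<rest>")`. Returns `None` for anything outside the text core's layer
/// stack, e.g. `model.visual.*`, `mtp.*`, the final norm or `lm_head.weight`.
fn parse_layer_tensor(name: &str) -> Option<(usize, &str)> {
    // Both prefixes must be present, otherwise this is not a layer tensor.
    let rest = name.strip_prefix(TEXT_PREFIX)?.strip_prefix("layers.")?;
    // The first dotted segment after `layers.` is the layer index.
    let (idx, tail) = rest.split_once('.')?;
    Some((idx.parse::<usize>().ok()?, tail))
}
```

Note that the old-style name `model.layers.0.mlp.gate_proj.weight` yields `None`, which is exactly the mismatch this commit fixes.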
The fix is a single change in `Qwen3_5Model::load`: derive
`text_vb = vb.pp("model.language_model")` once and load
embed_tokens / layers / norm from it. `lm_head` stays on the
top-level VB; that path was already correct.
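A toy model of the prefix composition, assuming `vb.pp(...)` appends a segment and full tensor names are the segments joined with `.` (which is how candle's `VarBuilder` builds names, as far as we rely on it here). The `Prefix` type is hypothetical and is not candle's API:

```rust
/// Hypothetical stand-in for a VarBuilder-style prefix: each `pp` call
/// appends a segment, and full tensor names are the segments joined by '.'.
#[derive(Clone, Debug, Default)]
struct Prefix {
    segments: Vec<String>,
}

impl Prefix {
    /// Empty prefix, i.e. the top level of the checkpoint.
    fn root() -> Self {
        Self::default()
    }

    /// Return a new prefix with `seg` appended (mirrors `vb.pp(seg)`).
    fn pp<S: ToString>(&self, seg: S) -> Self {
        let mut segments = self.segments.clone();
        segments.push(seg.to_string());
        Self { segments }
    }

    /// The dotted prefix itself, e.g. "model.language_model".
    fn prefix(&self) -> String {
        self.segments.join(".")
    }

    /// Fully qualified tensor name for `leaf` under this prefix.
    fn name(&self, leaf: &str) -> String {
        if self.segments.is_empty() {
            leaf.to_string()
        } else {
            format!("{}.{}", self.prefix(), leaf)
        }
    }
}
```

Deriving `text = root.pp("model.language_model")` once means `text.pp("embed_tokens").name("weight")` resolves to `model.language_model.embed_tokens.weight`, while `root.pp("lm_head").name("weight")` still resolves to the top-level `lm_head.weight`.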
The non-text tensors (`model.visual.*`, `mtp.*`) are ignored. We
never reference them, so although the safetensors mmap maps their
bytes into the address space, they are never materialised as tensors.
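The same rule can be stated as a filter over the checkpoint's tensor names. A minimal sketch, assuming the text path needs only names under `model.language_model.` plus the top-level `lm_head.weight`; both helper names are hypothetical:

```rust
/// Hypothetical predicate: true for tensors the text-only load path reads.
/// Assumes the text core lives under `model.language_model.` and that the
/// only other text-side tensor is the top-level `lm_head.weight`.
fn is_text_side(name: &str) -> bool {
    name.starts_with("model.language_model.") || name == "lm_head.weight"
}

/// Split checkpoint tensor names into (used by text inference, ignored).
/// `partition` keeps the original relative order within each half.
fn partition_names<'a>(names: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    names.iter().copied().partition(|n| is_text_side(n))
}
```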
After this change, the load that was failing with
"cannot find tensor model.embed_tokens.weight" should get as far as
materialising the actual layer weights, where any further failures
should be genuine architecture issues rather than naming ones.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -223,7 +223,15 @@ impl Qwen3_5Model {
         let dtype = vb.dtype();
         let device = vb.device().clone();
 
-        let embed_vb = vb.pp("model.embed_tokens");
+        // Qwen3-Next is a multimodal architecture whose text core lives
+        // under `model.language_model.*` — sibling to `model.visual.*`
+        // (the vision tower) and to top-level `lm_head` / `mtp.*`.
+        // Every text-side tensor in the safetensors files is under
+        // this prefix; we ignore the vision and MTP weights for
+        // language-model inference.
+        let text_vb = vb.pp("model.language_model");
+
+        let embed_vb = text_vb.pp("embed_tokens");
         let embed_weight = embed_vb
             .get((cfg.vocab_size, cfg.hidden_size), "weight")
             .with_context(|| format!("load '{}/weight'", embed_vb.prefix()))?;
@@ -240,7 +248,7 @@ impl Qwen3_5Model {
             );
         }
 
-        let vb_l = vb.pp("model.layers");
+        let vb_l = text_vb.pp("layers");
         let mut layers = Vec::with_capacity(cfg.num_hidden_layers);
         for i in 0..cfg.num_hidden_layers {
             layers.push(Qwen3_5DecoderLayer::load(
@@ -251,7 +259,8 @@ impl Qwen3_5Model {
             )?);
         }
 
-        let norm = Qwen3_5RmsNorm::load(&vb.pp("model.norm"), cfg.hidden_size, cfg.rms_norm_eps)?;
+        let norm =
+            Qwen3_5RmsNorm::load(&text_vb.pp("norm"), cfg.hidden_size, cfg.rms_norm_eps)?;
 
         Ok(Self {
             embed_tokens,
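One way to catch this class of bug before any weights are materialised is to enumerate the names the patched loader should request and diff them against the checkpoint. A hypothetical sketch covering only the tensors from the listing that every layer shares; attention tensors are left out because which layers use `linear_attn` versus `self_attn` depends on configuration not shown in this commit:

```rust
use std::collections::HashSet;

/// Hypothetical pre-flight check: tensor names the patched loader is expected
/// to request for the parts every layer shares (embeddings, the two per-layer
/// norms, the MLP, the final norm and lm_head). Attention tensors are omitted
/// because linear_attn vs self_attn placement is config-dependent.
fn expected_shared_names(num_layers: usize) -> Vec<String> {
    let text = "model.language_model";
    let mut names = vec![format!("{text}.embed_tokens.weight")];
    for i in 0..num_layers {
        for norm in ["input_layernorm", "post_attention_layernorm"] {
            names.push(format!("{text}.layers.{i}.{norm}.weight"));
        }
        for proj in ["gate_proj", "up_proj", "down_proj"] {
            names.push(format!("{text}.layers.{i}.mlp.{proj}.weight"));
        }
    }
    names.push(format!("{text}.norm.weight"));
    // Top-level, not under `model.language_model`.
    names.push("lm_head.weight".to_string());
    names
}

/// Names from `expected` that are absent from the checkpoint, in expected order.
fn missing_names(expected: &[String], available: &HashSet<String>) -> Vec<String> {
    expected
        .iter()
        .filter(|n| !available.contains(*n))
        .cloned()
        .collect()
}
```

Run against the tensor names listed in a checkpoint's safetensors headers, a non-empty result points at a naming mismatch rather than an architecture problem.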