feat(stage-8c): full-attention layer + decoder + Model + ForCausalLM for qwen3_5
All checks were successful
build-prerelease / Resolve version stamps (push) Successful in 37s
CI / Format (push) Successful in 39s
CI / Clippy (push) Successful in 2m19s
CI / Test (push) Successful in 4m50s
build-prerelease / Build cortex binary (push) Successful in 4m21s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 3m41s
build-prerelease / Package cortex RPM (push) Successful in 1m27s
build-prerelease / Build neuron-ampere (push) Successful in 4m58s
build-prerelease / Build neuron-ada (push) Successful in 5m8s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m53s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m52s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m44s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 58s
Completes the single-GPU dense path for Qwen3-Next (Qwen3.6's
architecture). The four new modules wrap the substantive
`linear_attn.rs` (landed previously) with the rest of the
transformer:
- `arch/qwen3_5/rope.rs`: text-side rotary embedding. MRoPE is
  simplified to plain RoPE (the three position grids collapse to one
  for text-only inference); uses candle's `rope_slow` for the
  GLM-style rotate-half rotation.
- `arch/qwen3_5/mlp.rs`: Qwen3_5MLP (SwiGLU: gate/up/down, bias=False).
- `arch/qwen3_5/full_attn.rs`: Qwen3_5Attention with the two
  Qwen3-Next quirks:
  - `q_proj` widened to `2 * num_heads * head_dim`; second half
    sigmoid'd and multiplied into the attention output before `o_proj`.
  - q_norm/k_norm use the `(1+w)*x` RmsNorm variant.
- `arch/qwen3_5/decoder.rs`: Qwen3_5DecoderLayer dispatching on
  `layer_types[i]` to either Full attention or GatedDeltaNet.
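
The two full-attention quirks listed above can be illustrated on plain slices. This is a minimal sketch, not the code in `full_attn.rs`: the function names, the `eps` parameter, and the use of `f32` slices instead of candle tensors are assumptions made for illustration, and the query and gate halves are taken as already split (how the widened `q_proj` output is laid out is not shown here).

```rust
/// RMSNorm variant that scales by (1 + w) rather than w, so a weight of
/// zero leaves the normalised value unchanged.
/// `eps` is an illustrative parameter; the real value comes from the config.
fn rms_norm_one_plus_w(x: &[f32], w: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), w.len(), "input and weight must have equal length");
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter()
        .zip(w)
        .map(|(v, wi)| *v * inv_rms * (1.0 + *wi))
        .collect()
}

/// Logistic sigmoid.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Output gate: the attention output is multiplied element-wise by
/// sigmoid(gate) before it would be passed to `o_proj`.
/// `gate` stands for the extra half of the widened `q_proj` output.
fn apply_output_gate(attn_out: &[f32], gate: &[f32]) -> Vec<f32> {
    assert_eq!(attn_out.len(), gate.len(), "gate must match attention output");
    attn_out
        .iter()
        .zip(gate)
        .map(|(a, g)| *a * sigmoid(*g))
        .collect()
}
```

With a zero weight vector the norm reduces to plain RMS normalisation, and a zero gate halves the attention output because `sigmoid(0) = 0.5`.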
`arch/qwen3_5/mod.rs` gets the real `Qwen3_5Model` (embedding + layer
stack + final norm) and `Qwen3_5ForCausalLM` (model + lm_head). The
forward returns `[B, 1, vocab]` to match `qwen3_dense`; the harness's
`squeeze_to_vocab` handles either shape.
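
To make "handles either shape" concrete, here is a hedged sketch of the shape logic only. It is a hypothetical stand-in, not the harness's actual `squeeze_to_vocab`: it assumes the two accepted shapes are `[B, 1, vocab]` and `[B, vocab]`, and it works on dimension lists rather than candle tensors.

```rust
/// Hypothetical stand-in for the shape handling in `squeeze_to_vocab`.
/// Accepts logits dims of [B, 1, vocab] or [B, vocab] (an assumption about
/// what "either shape" means) and returns the squeezed [B, vocab] dims.
fn squeeze_to_vocab_shape(dims: &[usize]) -> Result<[usize; 2], String> {
    match dims {
        // Single-position output: drop the length-1 sequence axis.
        [b, 1, v] => Ok([*b, *v]),
        // Already squeezed: pass through unchanged.
        [b, v] => Ok([*b, *v]),
        // Anything else is rejected instead of being guessed at.
        _ => Err(format!("unexpected logits shape {dims:?}")),
    }
}
```

Dropping a length-1 axis does not reorder the underlying data, so the shape check is the only part that differs between the two cases.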
Switch: `candle.rs::load_arch_dense` for `model_type=qwen3_5` now
builds a `ShardedVarBuilder` instead of a plain VarBuilder. The
sharded backend falls through to the unsharded path when
`world_size=1`, so single-GPU load is zero-cost; this lets the
forthcoming `tp_qwen3_5.rs` reuse the same load functions without a
second copy.
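
The fall-through described above can be sketched as a slice-range computation. This is an illustration under stated assumptions, not candle's implementation: `Shard` and `shard_range` are hypothetical names, and even division across ranks is assumed.

```rust
/// Hypothetical description of one rank's place in a tensor-parallel group.
#[derive(Clone, Copy, Debug)]
struct Shard {
    rank: usize,
    world_size: usize,
}

/// Returns the half-open [start, end) range this rank would load along the
/// sharded dimension of length `dim_len`.
fn shard_range(dim_len: usize, shard: Shard) -> Result<(usize, usize), String> {
    if shard.world_size == 0 || shard.rank >= shard.world_size {
        return Err(format!("invalid shard {shard:?}"));
    }
    // world_size == 1: nothing to slice, so the whole tensor is loaded,
    // which is the unsharded path.
    if shard.world_size == 1 {
        return Ok((0, dim_len));
    }
    if dim_len % shard.world_size != 0 {
        return Err(format!(
            "dimension {dim_len} is not divisible by world_size {}",
            shard.world_size
        ));
    }
    let block = dim_len / shard.world_size;
    Ok((shard.rank * block, (shard.rank + 1) * block))
}
```

Because the single-rank case returns the full range, load functions written against the sharded interface behave like unsharded loads on one GPU, which is what lets a tensor-parallel variant reuse them.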
Verified: `cargo build` succeeds for CPU and with `--features cuda`
inside the patched container; clippy is clean on both; the 32 lib
tests still pass. The ForCausalLM forward no longer bails, but
numerical correctness against the Python reference has not been
validated yet (that is the next step, with the Tbilisi probe).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -617,12 +617,22 @@ impl CandleHarness {
         })))
     }
     "qwen3_5" => {
-        // Stage 8c scaffold: config parses, model
-        // constructs, but forward bails. See
-        // `arch/qwen3_5.rs` for the open architecture work.
+        // Qwen3-Next needs a ShardedVarBuilder because its
+        // load functions use the sharded backend (so they
+        // can be reused unchanged by the future TP variant).
+        // With world_size=1 the backend falls through to
+        // the unsharded path, so there is no per-load cost.
         let cfg: super::arch::qwen3_5::Config = serde_json::from_str(&cfg_text)
             .context("parse Qwen3-Next (qwen3_5) config.json")?;
-        let model = super::arch::qwen3_5::Qwen3_5ForCausalLM::new(cfg, vb)
+        let sharded_vb = unsafe {
+            candle_nn::var_builder::ShardedSafeTensors::var_builder(
+                &safetensors_paths,
+                dtype,
+                &device_for_load,
+            )
+            .context("build ShardedVarBuilder for Qwen3-Next")?
+        };
+        let model = super::arch::qwen3_5::Qwen3_5ForCausalLM::new(cfg, sharded_vb)
             .context("build Qwen3-Next dense model")?;
         Ok(ModelArch::Qwen3_5Dense(model))
     }