cortex/crates
rob thijssen 495d3f7c05
fix(qwen3_5): promote beta to F32 alongside q/k/v in delta rule
The single-GPU dense load of Qwen/Qwen3.5-0.8B succeeded, but the first
inference forward pass failed with `dtype mismatch in mul, lhs: F32, rhs:
BF16`. Tracing through the recurrent delta-rule loop:

  let q = (q.to_dtype(F32)? * scale)?;        // F32
  let k = k.to_dtype(F32)?;                    // F32
  let v = v.to_dtype(F32)?;                    // F32
  // g built from A_log/dt_bias                 // F32
  // beta = sigmoid(b)                          // BF16 (sigmoid preserves dtype)
  ...
  let delta = (v_t - kv_mem)?.broadcast_mul(&beta_col)?;
                ^^^^^^^^^^^^^                    ^^^^^^^^^
                F32                              BF16   ← mismatch

`g` was already F32 because it was constructed from `a_log.to_dtype(F32)`
+ `dt_bias.to_dtype(F32)` earlier in the function. `beta` came from
`sigmoid(b)`, where `b` was still in the model dtype (BF16), so `beta`
stayed BF16 and the multiplication tripped candle's dtype-mismatch check.

Promote beta to F32 at the same point we promote q/k/v.
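
For illustration only, here is a minimal self-contained sketch of the failure
and the fix. It uses a toy `Tensor` type with a strict dtype check, not
candle's actual API; the names `Tensor`, `DType`, `delta` and the
`promote_beta` flag are hypothetical and exist only to show why the
multiplication fails until beta is cast.

```rust
// Toy model of a tensor library with strict dtype checks.
// This is NOT candle; every name here is a hypothetical stand-in.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DType {
    BF16,
    F32,
}

#[derive(Clone, Debug)]
struct Tensor {
    dtype: DType,
    data: Vec<f32>,
}

impl Tensor {
    fn new(dtype: DType, data: Vec<f32>) -> Self {
        Tensor { dtype, data }
    }

    // Cast: in this toy model only the dtype tag changes.
    fn to_dtype(&self, dtype: DType) -> Tensor {
        Tensor { dtype, data: self.data.clone() }
    }

    // Sigmoid keeps the dtype of its input, as described in the message above.
    fn sigmoid(&self) -> Tensor {
        let data = self.data.iter().map(|&x| 1.0 / (1.0 + (-x).exp())).collect();
        Tensor { dtype: self.dtype, data }
    }

    // Elementwise binary op that refuses to mix dtypes.
    fn binary(&self, op: &str, rhs: &Tensor, f: impl Fn(f32, f32) -> f32) -> Result<Tensor, String> {
        if self.dtype != rhs.dtype {
            return Err(format!(
                "dtype mismatch in {}, lhs: {:?}, rhs: {:?}",
                op, self.dtype, rhs.dtype
            ));
        }
        let data = self.data.iter().zip(&rhs.data).map(|(&a, &b)| f(a, b)).collect();
        Ok(Tensor { dtype: self.dtype, data })
    }

    fn sub(&self, rhs: &Tensor) -> Result<Tensor, String> {
        self.binary("sub", rhs, |a, b| a - b)
    }

    fn mul(&self, rhs: &Tensor) -> Result<Tensor, String> {
        self.binary("mul", rhs, |a, b| a * b)
    }
}

// One delta-rule step: (v_t - kv_mem) * beta, with beta = sigmoid(b).
// `promote_beta` toggles the fix: cast beta to F32 before the multiply.
fn delta(v_t: &Tensor, kv_mem: &Tensor, b: &Tensor, promote_beta: bool) -> Result<Tensor, String> {
    let beta = b.sigmoid();
    let beta = if promote_beta { beta.to_dtype(DType::F32) } else { beta };
    v_t.sub(kv_mem)?.mul(&beta)
}
```

With `v_t` and `kv_mem` in F32 and `b` in BF16, `delta(.., false)` returns the
mismatch error from `mul`, while `delta(.., true)` succeeds and yields an F32
result.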

Caught by the validate-neuron.sh probe against Qwen/Qwen3.5-0.8B on
beast: the load returned 200, then `POST /v1/chat/completions` returned
the dtype error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 21:13:19 +03:00