cortex

helexa/cortex


rob thijssen e7eb3dab6a

- build-prerelease / Resolve version stamps (push): Successful in 37s
- CI / Format (push): Successful in 39s
- CI / Clippy (push): Successful in 2m19s
- CI / Test (push): Successful in 4m50s
- build-prerelease / Build cortex binary (push): Successful in 4m21s
- CI / Build cortex SRPM (push): Has been skipped
- CI / Publish cortex to COPR (push): Has been skipped
- CI / Build neuron SRPM (push): Has been skipped
- CI / Publish neuron to COPR (push): Has been skipped
- CI / Bump version in source (push): Has been skipped
- build-prerelease / Build neuron-blackwell (push): Successful in 3m41s
- build-prerelease / Package cortex RPM (push): Successful in 1m27s
- build-prerelease / Build neuron-ampere (push): Successful in 4m58s
- build-prerelease / Build neuron-ada (push): Successful in 5m8s
- build-prerelease / Package helexa-neuron-ada RPM (push): Successful in 2m53s
- build-prerelease / Package helexa-neuron-ampere RPM (push): Successful in 2m52s
- build-prerelease / Package helexa-neuron-blackwell RPM (push): Successful in 3m44s
- build-prerelease / Publish to rpm.lair.cafe (unstable) (push): Successful in 58s

feat(stage-8c): full-attention layer + decoder + Model + ForCausalLM for qwen3_5

Completes the single-GPU dense path for Qwen3-Next (Qwen3.6's
architecture). The four new modules wrap the substantive
`linear_attn.rs` (landed previously) with the rest of the
transformer:

- `arch/qwen3_5/rope.rs` — text-side rotary embedding. MRoPE is
  simplified to plain RoPE (the three position grids collapse to one
  for text-only inference); uses candle's `rope_slow` for the
  GLM-style rotate-half rotation.
- `arch/qwen3_5/mlp.rs` — Qwen3_5MLP (SwiGLU: gate/up/down, bias=False).
- `arch/qwen3_5/full_attn.rs` — Qwen3_5Attention with the two
  Qwen3-Next quirks:
  - `q_proj` is widened to `2 * num_heads * head_dim`; the second half
    is passed through a sigmoid and multiplied into the attention
    output before `o_proj`.
  - `q_norm`/`k_norm` use the `(1+w)*x` RmsNorm variant.
- `arch/qwen3_5/decoder.rs` — Qwen3_5DecoderLayer dispatching on
  `layer_types[i]` to either Full attention or GatedDeltaNet.
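
To make the two quirks and the per-layer dispatch concrete, here is a minimal std-only sketch on plain `f32` slices. It is not the code in `full_attn.rs` or `decoder.rs`: the function names, the `eps` parameter, and the `"full_attention"` / `"linear_attention"` strings (assumed to follow the upstream config's `layer_types` values) are illustrative assumptions, and the real modules operate on candle tensors.

```rust
/// RmsNorm with the learned weight applied as (1 + w) instead of w.
/// `eps` is a hypothetical stabiliser; the real value comes from the config.
fn rms_norm_one_plus_w(x: &[f32], w: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), w.len(), "weight must match the normalised dimension");
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter()
        .zip(w)
        .map(|(v, wi)| *v * inv_rms * (1.0 + *wi))
        .collect()
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Output gate: the attention output is scaled elementwise by sigmoid(gate),
/// where `gate` stands for the extra half of the widened `q_proj` output.
/// How that half is sliced out of the projection is not modelled here.
fn gate_attention_output(attn_out: &[f32], gate: &[f32]) -> Vec<f32> {
    assert_eq!(attn_out.len(), gate.len());
    attn_out
        .iter()
        .zip(gate)
        .map(|(a, g)| *a * sigmoid(*g))
        .collect()
}

/// Which token mixer a decoder layer uses.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LayerKind {
    FullAttention,
    GatedDeltaNet,
}

/// Map one `layer_types[i]` entry to a mixer kind (strings are assumed).
fn layer_kind(layer_type: &str) -> Result<LayerKind, String> {
    match layer_type {
        "full_attention" => Ok(LayerKind::FullAttention),
        "linear_attention" => Ok(LayerKind::GatedDeltaNet),
        other => Err(format!("unknown layer type: {other}")),
    }
}
```

With `w` initialised to zero, the `(1+w)` form reduces to an unweighted RmsNorm, which is the practical difference from the plain `w*x` variant.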

`arch/qwen3_5/mod.rs` gets the real `Qwen3_5Model` (embedding + layer
stack + final norm) and `Qwen3_5ForCausalLM` (model + lm_head). The
forward returns `[B, 1, vocab]` to match `qwen3_dense`; the harness's
`squeeze_to_vocab` handles either shape.
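
As a rough illustration of what "handles either shape" could mean, the sketch below normalises a logits shape to `(batch, vocab)`. This is not the harness's `squeeze_to_vocab`; the function name is hypothetical, and it assumes the second accepted shape is `[B, vocab]`, which the commit message does not spell out.

```rust
/// Reduce a logits shape to (batch, vocab).
/// Accepts `[B, vocab]` or `[B, 1, vocab]`; anything else is an error.
/// Hypothetical helper operating on shapes only, not on tensors.
fn squeezed_vocab_shape(dims: &[usize]) -> Result<(usize, usize), String> {
    match dims {
        [b, v] => Ok((*b, *v)),
        [b, 1, v] => Ok((*b, *v)),
        _ => Err(format!("unexpected logits shape: {dims:?}")),
    }
}
```

Rejecting every other rank keeps a silent shape mismatch (for example a full `[B, T, vocab]` tensor) from being mistaken for single-step logits.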

Loader change: `candle.rs::load_arch_dense` for `model_type=qwen3_5`
now builds a `ShardedVarBuilder` instead of a plain `VarBuilder`. The
sharded backend falls through to the unsharded path when
`world_size=1`, so single-GPU loading pays no extra cost; this lets
the forthcoming `tp_qwen3_5.rs` reuse the same load functions instead
of keeping a second copy.
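
The fall-through can be pictured as a slicing rule along the sharded dimension. The sketch below is an assumption about the general technique, not candle's `ShardedVarBuilder` implementation: `shard_range` is a hypothetical helper, and the even-split requirement is a simplification.

```rust
use std::ops::Range;

/// Index range of a weight dimension owned by `rank` out of `world_size`.
/// With `world_size == 1` the whole dimension is returned, so the load is
/// equivalent to an unsharded one.
fn shard_range(dim: usize, rank: usize, world_size: usize) -> Result<Range<usize>, String> {
    if world_size == 0 || rank >= world_size {
        return Err(format!("invalid rank {rank} for world size {world_size}"));
    }
    if world_size == 1 {
        // Fall-through: nothing to slice on a single GPU.
        return Ok(0..dim);
    }
    if dim % world_size != 0 {
        return Err(format!("dim {dim} is not divisible by {world_size}"));
    }
    let block = dim / world_size;
    Ok(rank * block..(rank + 1) * block)
}
```

Because the single-rank case returns the full range, one set of load functions can serve both the dense path and a later tensor-parallel path.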

Verified: `cargo build` succeeds for CPU and with `--features cuda`
inside the patched container; clippy is clean on both; the 32 lib
tests still pass. The ForCausalLM forward no longer bails, but
numerical correctness against the Python reference has not been
validated yet (that is the next step, with the Tbilisi probe).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 15:52:33 +03:00

| Path | Last commit | Date |
| --- | --- | --- |
| src | feat(stage-8c): full-attention layer + decoder + Model + ForCausalLM for qwen3_5 | 2026-05-20 15:52:33 +03:00 |
| tests | Stage 7a-ii: real NCCL handshake behind the worker pool | 2026-05-19 16:40:01 +03:00 |
| Cargo.toml | fix(tp): add half dep + drop double-wrapped .w() on CudaDevice::alloc | 2026-05-19 19:11:59 +03:00 |