feat(stage-8d-7): direct safetensors fused-region loader
Some checks failed
build-prerelease / Package cortex RPM (push) Blocked by required conditions
CI / Format (push) Successful in 35s
build-prerelease / Resolve version stamps (push) Successful in 39s
CI / Clippy (push) Successful in 2m18s
CI / Test (push) Successful in 4m28s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 3m51s
build-prerelease / Build cortex binary (push) Successful in 4m13s
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled
Some checks failed
build-prerelease / Package cortex RPM (push) Blocked by required conditions
CI / Format (push) Successful in 35s
build-prerelease / Resolve version stamps (push) Successful in 39s
CI / Clippy (push) Successful in 2m18s
CI / Test (push) Successful in 4m28s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 3m51s
build-prerelease / Build cortex binary (push) Successful in 4m13s
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled
Replaces load_fused_qkv_slice_2d/_3d with reads from a separate MmapedSafetensors handle. Each per-rank fused tensor is built by reading the three region byte-slices directly from the mmap, concatenating them host-side, and uploading as one device allocation — no full-fused-tensor device materialisation. The prior approach allocated a ~100 MB transient device tensor per linear-attention layer; on Qwen3.6-27B with 48 linear-attn layers that's ~4.8 GB of allocator churn during load — enough to fragment the cuda caching allocator on a tight-VRAM 32 GB consumer GPU, which is what triggered the layer-22 up_proj OOM seen on beast. Threading: MmapedSafetensors flows worker → ForCausalLM → Model → DecoderLayer → GatedDeltaNet::load. Both leader (mod.rs) and worker (worker.rs) construct their own mmap; Linux's page cache shares the underlying pages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -76,6 +76,11 @@ cudarc = { version = "0.19", optional = true, default-features = false, features
|
||||
half = { version = "2.5", optional = true }
|
||||
tokenizers = { version = "0.22", default-features = false, features = ["onig"] }
|
||||
hf-hub = { version = "0.4", features = ["tokio"] }
|
||||
# Direct dep on `safetensors` (re-exported by candle but its `TensorView`
|
||||
# / `slice::IndexOp` types are public-but-not-re-exported). Used by the
|
||||
# tp `fused_load` module to read per-rank slices of fused QKV tensors
|
||||
# without materialising the full tensor on device.
|
||||
safetensors = "0.7"
|
||||
|
||||
[dev-dependencies]
|
||||
tokio = { workspace = true, features = ["test-util"] }
|
||||
|
||||
Reference in New Issue
Block a user