fix(tp): import BackendStorage trait for CudaStorage methods
Some checks failed
build-prerelease / Resolve version stamps (push) Successful in 32s
CI / Format (push) Successful in 37s
CI / Clippy (push) Successful in 3m9s
CI / Test (push) Successful in 4m28s
build-prerelease / Build neuron-blackwell (push) Failing after 3m41s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build cortex binary (push) Successful in 4m32s
build-prerelease / Package cortex RPM (push) Successful in 1m23s
build-prerelease / Build neuron-ampere (push) Failing after 4m45s
build-prerelease / Build neuron-ada (push) Failing after 5m13s
build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped

Stage 7b-iii (1/2) introduced AllReduce with `s.device()` and
`s.dtype()` calls on `&CudaStorage`. Both come from the
`candle_core::backend::BackendStorage` trait, which wasn't imported —
fine on CPU builds (the cuda_fwd block was cfg-gated out) but the
prerelease cuda build hit E0599.

Also drop the unused `cudarc::driver::DeviceSlice` import inside
cuda_fwd — `CudaSlice::len()` is an inherent method on cudarc 0.19,
not a trait method.

Caught by run 2894 (build-neuron-{blackwell,ampere}); CPU clippy +
tests stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-19 18:32:05 +03:00
parent 46527d7804
commit 12549c9aed

View File

@@ -20,6 +20,7 @@
#![cfg(feature = "cuda")] #![cfg(feature = "cuda")]
use candle_core::backend::BackendStorage;
use candle_core::cuda_backend::WrapErr; use candle_core::cuda_backend::WrapErr;
use candle_core::{CpuStorage, CudaStorage, CustomOp1, DType, Layout, Result, Shape}; use candle_core::{CpuStorage, CudaStorage, CustomOp1, DType, Layout, Result, Shape};
use cudarc::nccl::{Comm, ReduceOp}; use cudarc::nccl::{Comm, ReduceOp};
@@ -61,8 +62,6 @@ impl CustomOp1 for AllReduce {
} }
fn cuda_fwd(&self, s: &CudaStorage, l: &Layout) -> Result<(CudaStorage, Shape)> { fn cuda_fwd(&self, s: &CudaStorage, l: &Layout) -> Result<(CudaStorage, Shape)> {
use cudarc::driver::DeviceSlice;
// Reject non-contiguous inputs explicitly — copying them // Reject non-contiguous inputs explicitly — copying them
// server-side would mask shape bugs (a TP layer feeding a // server-side would mask shape bugs (a TP layer feeding a
// strided activation into all_reduce is almost certainly a // strided activation into all_reduce is almost certainly a