fix(tp): import BackendStorage trait for CudaStorage methods

Stage 7b-iii (1/2) introduced AllReduce with `s.device()` and `s.dtype()` calls on `&CudaStorage`. Both methods come from the `candle_core::backend::BackendStorage` trait, which wasn't imported. CPU builds were unaffected because the cuda_fwd block was cfg-gated out, but the prerelease cuda build hit E0599.

Also drop the unused `cudarc::driver::DeviceSlice` import inside cuda_fwd: `CudaSlice::len()` is an inherent method on cudarc 0.19, not a trait method.

Caught by run 2894 (build-neuron-{blackwell,ampere}); CPU clippy + tests stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:32:05 +03:00
parent 46527d7804
commit 12549c9aed
1 changed file with 1 addition and 2 deletions
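
The E0599 failure described in the commit message comes from Rust's rule that a trait method can only be called when the trait is in scope, while an inherent method needs no import. A minimal sketch of that rule follows; `Describe`, `Storage`, and `describe` are hypothetical stand-ins, not candle or cudarc items.

```rust
mod backend {
    // Hypothetical stand-in for a trait such as `BackendStorage`.
    pub trait Describe {
        fn dtype(&self) -> &'static str;
    }
}

mod storage {
    // Hypothetical stand-in for a concrete storage type.
    pub struct Storage;

    impl Storage {
        // Inherent method: callable without importing any trait.
        pub fn len(&self) -> usize {
            4
        }
    }

    // Trait method: callers must have `Describe` in scope to use it.
    impl super::backend::Describe for Storage {
        fn dtype(&self) -> &'static str {
            "f32"
        }
    }
}

fn describe(s: &storage::Storage) -> (usize, &'static str) {
    // Removing this `use` makes `s.dtype()` fail with E0599,
    // while `s.len()` keeps compiling because it is inherent.
    use backend::Describe;
    (s.len(), s.dtype())
}

fn main() {
    let (len, dtype) = describe(&storage::Storage);
    println!("len = {len}, dtype = {dtype}");
}
```

The same reasoning explains both halves of the change: the trait import is required for the trait methods, and an import that only served a method which is now inherent becomes unused.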
--- a/crates/neuron/src/harness/tp/all_reduce.rs
+++ b/crates/neuron/src/harness/tp/all_reduce.rs
@@ -20,6 +20,7 @@
 #![cfg(feature = "cuda")]
+use candle_core::backend::BackendStorage;
 use candle_core::cuda_backend::WrapErr;
 use candle_core::{CpuStorage, CudaStorage, CustomOp1, DType, Layout, Result, Shape};
 use cudarc::nccl::{Comm, ReduceOp};
@@ -61,8 +62,6 @@ impl CustomOp1 for AllReduce {
    }
    fn cuda_fwd(&self, s: &CudaStorage, l: &Layout) -> Result<(CudaStorage, Shape)> {
-       use cudarc::driver::DeviceSlice;
        // Reject non-contiguous inputs explicitly — copying them
        // server-side would mask shape bugs (a TP layer feeding a
        // strided activation into all_reduce is almost certainly a