fix(neuron): correct nccl_state path on WorkerPool.leader_comm (#17 S2)
Some checks failed
CI / CUDA type-check (push) Successful in 32s
build-prerelease / Resolve version stamps (push) Successful in 35s
CI / Format (push) Successful in 44s
build-prerelease / Build cortex binary (push) Successful in 4m57s
build-prerelease / Package cortex RPM (push) Successful in 1m36s
CI / Test (push) Successful in 7m10s
CI / Clippy (push) Failing after 1m21s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-ampere (push) Successful in 8m40s
build-prerelease / Build neuron-ada (push) Successful in 9m5s
build-prerelease / Build neuron-blackwell (push) Failing after 12m2s
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
`super::nccl_state` from tp/mod.rs resolves to `crate::harness::nccl_state` (nonexistent); the module is the child `nccl_state` (cf. the existing `nccl_state::generate_comm_id_hex` call).

The field is cuda-gated so the non-cuda build couldn't catch it; the branch CUDA type-check flaked on the runner before compiling. Self-audited fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -253,7 +253,7 @@ pub struct WorkerPool {
     /// recovery's `unload` doesn't itself hang (#17 Stage 2). `None` if
     /// init couldn't cache it; the watchdog then logs that it can't abort.
     #[cfg(feature = "cuda")]
-    leader_comm: Option<super::nccl_state::SendComm>,
+    leader_comm: Option<nccl_state::SendComm>,
 }
 
 /// Per-step deadline for a TP forward (#17 Stage 2). A healthy decode
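The path resolution described in the commit message can be reproduced in a small standalone sketch. The module layout below (`harness::tp::nccl_state`) follows the paths named in the message, but everything else is a hypothetical stand-in: the `SendComm` body, the `new` constructor, and the `leader_id` helper are invented for illustration and are not the project's real definitions. The `cuda` feature gate is left out so the field is always compiled; in the real code, a field behind `#[cfg(feature = "cuda")]` is stripped before name resolution when the feature is off, which is presumably why a non-cuda build accepted the bad path.

```rust
// Hypothetical layout mirroring the paths in the commit message:
// crate::harness::tp (tp/mod.rs) with a child module `nccl_state`.
mod harness {
    pub mod tp {
        // In a real crate this would live in its own file and be declared
        // from tp/mod.rs with `mod nccl_state;`.
        pub mod nccl_state {
            /// Stand-in for the real communicator handle (assumed shape).
            #[derive(Debug, PartialEq)]
            pub struct SendComm(pub u32);
        }

        pub struct WorkerPool {
            // Writing `super::nccl_state::SendComm` here would search
            // `crate::harness` (the parent of `tp`) and fail to resolve,
            // because `nccl_state` is a child of `tp`, not a sibling.
            // The bare path (or `self::nccl_state::SendComm`) starts from
            // the current module and finds the child.
            pub leader_comm: Option<nccl_state::SendComm>,
        }

        impl WorkerPool {
            /// Hypothetical constructor, only here to exercise the field.
            pub fn new(id: u32) -> Self {
                Self {
                    leader_comm: Some(nccl_state::SendComm(id)),
                }
            }
        }
    }
}

/// Hypothetical helper: builds a pool and reads back the cached id.
pub fn leader_id() -> Option<u32> {
    let pool = harness::tp::WorkerPool::new(7);
    pool.leader_comm.map(|comm| comm.0)
}
```

Changing the field type to `Option<super::nccl_state::SendComm>` in this sketch should make it fail to compile, which is the same class of error the commit corrects.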