docs: per-device worker thread architecture (phase 5 of refactor)

Closes the per-device CUDA context-ownership refactor planned at
~/.claude/plans/plan-the-per-device-worker-abstract-micali.md.

CLAUDE.md:
- New "Per-device worker thread (neuron)" section under Key design
  decisions, covering the three load-bearing properties (context
  locality, drop safety, poisoning blast radius), the CPU-fallback
  exception, and pointers to the canonical narrative in the module
  doc-comment of crates/neuron/src/harness/device_worker/mod.rs.
- New 2026-05-27 addendum dating the migration and naming the four
  PR commits (Phase 1: 081b532, Phase 2: b179204, Phase 3: 76ab24d,
  Phase 4: b4f3576). Same convention as the 2026-04-15 and 2026-05-18
  addenda.

README.md:
- One paragraph in "Node setup" noting the per-device thread pattern
  with a pointer to CLAUDE.md and the device_worker module.

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:15:43 +03:00
parent b4f3576d82
commit c4954e0eed
2 changed files with 106 additions and 0 deletions

@@ -61,6 +61,16 @@ Each GPU node runs `neuron` (listening on `:13131`). Neuron uses
huggingface/candle for in-process inference — there is no external
inference subprocess to manage.
Inside the daemon, every CUDA device gets one dedicated OS thread
(named `cuda-dev-N`) that owns the device's CUDA context for the
daemon's lifetime. Model loads, forward passes, KV-cache resets,
NCCL collectives, VRAM queries, and unloads all route through that
thread via a job channel; tensors never escape it alive. This pins
context binding to a known thread, makes the CUDA Drop contract
structurally safe, and isolates driver-error poisoning to one worker
rather than the whole process. See `CLAUDE.md` for the design
rationale and `crates/neuron/src/harness/device_worker/` for the code.
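
As an illustration only, the thread-plus-job-channel pattern described
above can be sketched with the Rust standard library. This is not the
code in `device_worker/`: `DeviceState`, `DeviceWorker`, `spawn`, and
`run` are hypothetical names, and a plain struct stands in for the CUDA
context and tensors. Only the `cuda-dev-N` thread name comes from the
text.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for everything that must stay on the device
/// thread (in the real daemon: the CUDA context, tensors, KV caches).
pub struct DeviceState {
    pub ordinal: usize,
    pub loaded_models: Vec<String>,
}

/// A job is a closure that runs on the worker thread against its state.
type Job = Box<dyn FnOnce(&mut DeviceState) + Send + 'static>;

/// Handle held by the rest of the process. It carries only a channel
/// sender and a join handle, never the device state itself.
pub struct DeviceWorker {
    jobs: Option<mpsc::Sender<Job>>,
    handle: Option<thread::JoinHandle<()>>,
}

impl DeviceWorker {
    /// Spawn one OS thread named `cuda-dev-N` for device `ordinal`.
    pub fn spawn(ordinal: usize) -> std::io::Result<Self> {
        let (tx, rx) = mpsc::channel::<Job>();
        let handle = thread::Builder::new()
            .name(format!("cuda-dev-{ordinal}"))
            .spawn(move || {
                // State is created on the worker thread and never leaves it.
                let mut state = DeviceState { ordinal, loaded_models: Vec::new() };
                // Runs until every sender has been dropped.
                for job in rx {
                    job(&mut state);
                }
                // `state` is dropped here, on the thread that owns it.
            })?;
        Ok(Self { jobs: Some(tx), handle: Some(handle) })
    }

    /// Run `f` on the worker thread and wait for its result.
    /// Returns `None` if the worker is gone (for example, a job panicked).
    pub fn run<R, F>(&self, f: F) -> Option<R>
    where
        R: Send + 'static,
        F: FnOnce(&mut DeviceState) -> R + Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        let job: Job = Box::new(move |state| {
            // Only the result crosses back; the state stays put.
            let _ = reply_tx.send(f(state));
        });
        self.jobs.as_ref()?.send(job).ok()?;
        reply_rx.recv().ok()
    }
}

impl Drop for DeviceWorker {
    fn drop(&mut self) {
        // Closing the channel ends the worker loop; then wait for it to exit.
        drop(self.jobs.take());
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}
```

In this sketch, callers would write something like
`worker.run(|s| s.loaded_models.len())` and receive only the returned
value. If a job panics, that one worker's `run` calls start returning
`None` while other workers keep serving, which loosely mirrors the
"one worker rather than the whole process" property.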
The neuron RPM (`helexa-neuron`) ships a systemd unit:
```sh