Closes the per-device CUDA context-ownership refactor planned at
~/.claude/plans/plan-the-per-device-worker-abstract-micali.md.
CLAUDE.md:
- New "Per-device worker thread (neuron)" section under Key design
decisions, covering the three load-bearing properties (context
locality, drop safety, poisoning blast radius), the CPU-fallback
exception, and pointers to the canonical narrative in
crates/neuron/src/harness/device_worker/mod.rs's module doc-comment.
- New 2026-05-27 addendum dating the migration and naming the four
PR commits (Phase 1: 081b532, Phase 2: b179204, Phase 3: 76ab24d,
Phase 4: b4f3576). Same convention as the 2026-04-15 and 2026-05-18
addenda.
README.md:
- One paragraph in "Node setup" noting the per-device thread pattern
with a pointer to CLAUDE.md and the device_worker module.
No code changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 1 of the candle-native pivot. Replaces the external-process
harness model (mistralrs over HTTP, llamacpp placeholder) with an
in-process Harness trait whose sole implementation is candle. The
trait keeps its shape so future engines slot in additively, but
start/stop default to no-ops and HarnessConfig drops endpoint and
systemd_unit since no harness needs external supervision.
Behaviour is unchanged on the wire: load_model returns a "not
implemented yet (Stage 2)" error and list_models is empty. The
gateway-side proxy, poller, and router are untouched.
CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are
marked superseded; the staged plan lives in
~/.claude/plans/create-a-more-aggressive-calm-naur.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous defaults collided with well-trodden infra services and with
the Linux ephemeral port range:
- cortex API 8000 — common dev-server default (Django, minio UI)
- cortex metrics 9100 — Prometheus node_exporter default
- neuron API 9090 — Cockpit default on Fedora, Prometheus self
Move to helexa-themed palindromic ports, all below Linux's
32768-60999 ephemeral range and not registered to any well-known
service:
- cortex API 31313
- cortex metrics 31314
- neuron API 13131
Updated places:
- cortex.example.toml, neuron.example.toml defaults
- default impls in cortex-core and neuron config
- cortex-cli --endpoint default for the status subcommand
- doc comments citing example URLs
- README.md and CLAUDE.md snippets
Consumers already on the old ports need a one-line edit in their
/etc/cortex/cortex.toml or /etc/neuron/neuron.toml to match;
firewall rules and prometheus scrape configs will also need
updating.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>