helexa

Author	SHA1	Message	Date
grenade	49a8dbcd28	Merge pull request 'perf(neuron): parallel in-situ quantization + cold-load phase timing (#1 )' (#40 ) from perf/1-parallel-isq into main Some checks are pending build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 30s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m11s Details build-prerelease / Build cortex binary (push) Successful in 2m21s Details build-prerelease / Test (push) Successful in 3m56s Details build-prerelease / Build neuron-blackwell (push) Successful in 9m38s Details build-prerelease / Build neuron-ada (push) Successful in 14m21s Details build-prerelease / Build neuron-ampere (push) Successful in 19m0s Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m24s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m12s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 4m29s Details	2026-06-12 20:12:44 +00:00
rob thijssen	90e971dcf5	perf(neuron): parallel in-situ quantization + cold-load phase timing (#1 ) All checks were successful CI / Format (push) Successful in 32s Details CI / Format (pull_request) Successful in 35s Details CI / CUDA type-check (push) Successful in 1m50s Details CI / CUDA type-check (pull_request) Successful in 2m7s Details CI / Clippy (pull_request) Successful in 2m18s Details CI / Clippy (push) Successful in 2m46s Details CI / Test (push) Successful in 5m33s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Test (pull_request) Successful in 5m33s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details QTensor::quantize runs its per-block math strictly sequentially on one core (CUDA storage round-trips through the same CPU path), which made Q6K ISQ the dominant phase of the 27B TP cold load. Blocks are independent, so quantize_parallel re-implements the same encoding through candle's public per-block API (k_quants::GgmlType::from_float) with rayon fanning blocks across the CPU pool — byte-identical output, pinned by parity tests against QTensor::quantize for Q6K/Q5K/Q4K/Q8_0. Threading discipline holds: the device-to-host read and the QStorage::from_data upload stay on the calling thread (device worker / subprocess main); rayon workers touch host memory only. Also adds the per-phase timing the issue asked for first: per-layer debug + layer-loop total + lm_head info lines, so the next cold load shows where the time actually goes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:47:57 +03:00
rob thijssen	92273eb936	chore(ci): retrigger build-prerelease — ampere/blackwell packaging skipped after transient build failure on `128b381` All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 29s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m10s Details build-prerelease / Build cortex binary (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Has been skipped Details build-prerelease / Test (push) Successful in 3m54s Details build-prerelease / Build neuron-blackwell (push) Successful in 9m36s Details build-prerelease / Build neuron-ada (push) Successful in 14m6s Details build-prerelease / Build neuron-ampere (push) Successful in 19m8s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m8s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m8s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m52s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m5s Details	2026-06-12 22:38:31 +03:00
grenade	128b3818cb	Merge pull request 'perf(neuron): chunked delta-rule prefill for Gated DeltaNet (#23 )' (#39 ) from perf/23-chunked-gdn-prefill into main All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 30s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m13s Details build-prerelease / Build cortex binary (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Has been skipped Details build-prerelease / Test (push) Successful in 3m57s Details build-prerelease / Build neuron-blackwell (push) Successful in 10m3s Details build-prerelease / Build neuron-ada (push) Successful in 14m11s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details build-prerelease / Build neuron-ampere (push) Successful in 14m3s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m10s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 58s Details	2026-06-12 18:44:22 +00:00
rob thijssen	812d191e50	fix(neuron): UT transform by forward substitution, not nilpotent squaring All checks were successful CI / Format (push) Successful in 32s Details CI / Format (pull_request) Successful in 53s Details CI / CUDA type-check (push) Successful in 1m52s Details CI / CUDA type-check (pull_request) Successful in 2m12s Details CI / Clippy (push) Successful in 2m18s Details CI / Clippy (pull_request) Successful in 2m36s Details CI / Test (push) Successful in 4m18s Details CI / Test (pull_request) Successful in 4m22s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details Live A/B on beast produced NaN logits ("!!!" replies) on real prompts: the nilpotent-squaring form of (I - T)^-1 computes raw powers of T, whose entries grow combinatorially (path counts ~ C(62,31)) before nilpotency collapses them — fine on uncorrelated test data, f32 precision death on real prompts whose repetitive text makes keys highly correlated. The reference's forward-substitution loop never forms raw powers; its intermediates are the convergent M entries. Port the reference loop faithfully (rows accumulate into a fresh tensor). New adversarial parity test with near-identical keys and beta ~= 1 diverges to 8e30 under the squaring form and passes under forward substitution. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 21:18:32 +03:00
rob thijssen	2a9def6d2d	perf(neuron): chunked delta-rule prefill for Gated DeltaNet (#23 ) All checks were successful CI / Format (push) Successful in 32s Details CI / Format (pull_request) Successful in 24s Details CI / CUDA type-check (push) Successful in 1m38s Details CI / CUDA type-check (pull_request) Successful in 2m10s Details CI / Clippy (push) Successful in 2m34s Details CI / Test (push) Successful in 4m20s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Clippy (pull_request) Successful in 2m29s Details CI / Test (pull_request) Successful in 4m21s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details Prefill (seq_len >= 64) now runs the chunk-parallel gated delta rule ported from the HF reference torch_chunk_gated_delta_rule (chunk_size=64): identical math reorganised into per-chunk batched matmuls (cuBLAS/tensor cores on CUDA, gemm on CPU) instead of the O(L)-sequential per-token recurrence. Decode steps and short prompts keep the recurrent paths (CUDA kernel / Rust loop) unchanged. One deliberate deviation from the reference: its in-place row-by-row UT-transform computes (I - T)^-1 - I by forward substitution; T is strictly lower triangular and therefore nilpotent at chunk size 64, so the same inverse is the product of six squarings prod_{j=0..5}(I + T^(2^j)) — batched matmuls instead of 63 sequential row updates, which suits candle's immutable tensors. Chunk-local math runs rank-3 over a flattened BHN batch dim (candle matmul supports at most two batch dims). Initial-state continuation is supported, so chunked prefill composes with #11's restored prefix snapshots. Both single-GPU and TP paths pick this up through the shared run_delta_rule dispatch. NEURON_GDN_CHUNKED=0 forces the recurrent paths for A/B measurement. Parity tests pin chunked against recurrent (2e-4 abs) across padding (L=130), exact multiples with non-zero initial state (L=128 after a 50-token prefix), and a single exact chunk. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 20:51:51 +03:00
grenade	ddb331e1a3	Merge pull request 'docs(bench): record post-#11 fleet numbers' (#38 ) from docs/benchmarks-post-11 into main All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 30s Details build-prerelease / Build neuron-blackwell (push) Has been skipped Details build-prerelease / Build neuron-ampere (push) Has been skipped Details build-prerelease / Build neuron-ada (push) Has been skipped Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details build-prerelease / Build cortex binary (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Has been skipped Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m47s Details build-prerelease / Test (push) Successful in 4m25s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped Details	2026-06-12 17:14:00 +00:00
rob thijssen	df0bf4c518	docs(bench): record post-#11 fleet numbers All checks were successful CI / Format (push) Successful in 37s Details CI / Format (pull_request) Successful in 37s Details CI / CUDA type-check (push) Successful in 1m31s Details CI / CUDA type-check (pull_request) Successful in 2m7s Details CI / Clippy (push) Successful in 2m24s Details CI / Test (push) Successful in 4m17s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Clippy (pull_request) Successful in 2m24s Details CI / Test (pull_request) Successful in 3m57s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details Appends the 2026-06-12 post-prefix-cache run: 27B @4k warm TTFT 7.07 s -> 1.43 s, no-cache control models unchanged, with a methodology note that repeated-prompt cells now measure warm TTFT on qwen3_5-arch models. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 20:06:53 +03:00
grenade	a1952a4522	Merge pull request 'fix(neuron): snapshot at the last special-token boundary (#11 )' (#37 ) from fix/11-snapshot-cut-retokenization into main All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 30s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m37s Details build-prerelease / Test (push) Successful in 4m21s Details build-prerelease / Build cortex binary (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 10m22s Details build-prerelease / Build neuron-ampere (push) Successful in 13m8s Details build-prerelease / Build neuron-ada (push) Successful in 21m31s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m57s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m0s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m46s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s Details	2026-06-12 16:24:15 +00:00
rob thijssen	4f266dbd82	fix(neuron): snapshot at the last special-token boundary (#11 ) All checks were successful CI / Format (push) Successful in 42s Details CI / Format (pull_request) Successful in 34s Details CI / CUDA type-check (push) Successful in 1m31s Details CI / Clippy (push) Successful in 2m19s Details CI / CUDA type-check (pull_request) Successful in 2m10s Details CI / Test (push) Successful in 4m13s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Clippy (pull_request) Successful in 2m9s Details CI / Test (pull_request) Successful in 4m5s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details Second finding from live 27B validation: prompt-covering snapshots still never matched. The rendered prompt ends with `<\|im_start\|>assistant\n`, and when the next turn re-tokenizes that text followed by the assistant's reply, BPE merges the trailing newline with the reply's first characters — the final token(s) of the cached sequence differ from the next prompt's, so the exact-prefix match never fires. (A reply starting with an atomic special token like <think> masks this, which is why the 0.8B check passed.) Snapshot one past the last <\|im_start\|> instead: special tokens are hard segmentation points, so ids up to and including it are provably identical across renders. Prefill pauses at that boundary to capture the snapshot, then finishes the ~2-token `assistant\n` tail. Applied to all six request paths; unit tests for the cut helper. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 19:16:45 +03:00
grenade	43a6d96d5f	Merge pull request 'fix(neuron): snapshot prefix cache at the prefill boundary (#11 )' (#36 ) from fix/11-prefix-snapshot-at-prefill into main All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 36s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m16s Details build-prerelease / Build cortex binary (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Has been skipped Details build-prerelease / Test (push) Successful in 4m2s Details build-prerelease / Build neuron-ampere (push) Successful in 13m22s Details build-prerelease / Build neuron-blackwell (push) Successful in 13m31s Details build-prerelease / Build neuron-ada (push) Successful in 14m25s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m6s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m12s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m50s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m6s Details	2026-06-12 15:34:59 +00:00
rob thijssen	3fd1989b2b	fix(neuron): snapshot prefix cache at the prefill boundary (#11 ) All checks were successful CI / Format (push) Successful in 41s Details CI / Format (pull_request) Successful in 42s Details CI / CUDA type-check (push) Successful in 1m39s Details CI / CUDA type-check (pull_request) Successful in 2m6s Details CI / Clippy (push) Successful in 3m10s Details CI / Clippy (pull_request) Successful in 3m3s Details CI / Test (pull_request) Successful in 4m2s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details CI / Test (push) Successful in 5m1s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Live validation on beast's Qwen3.6-27B showed reused=0 on every turn: the post-generation snapshot includes reasoning tokens (<think>...) that get stripped when the client echoes the assistant message back, so the cached sequence is never a token-prefix of the next prompt. quadbrat's 0.8B only matched because its think block round-tripped as literal text. Snapshot after prefill instead (covering exactly the prompt tokens) — that is the state the next turn provably extends under a stable chat template, regardless of how reasoning or tool-call content is transformed on echo. Taken after the first healthy sample so NaN-poisoned prefills never cache their state; this also retires the forwarded-token bookkeeping and the consumer-hangup store sites. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 18:29:00 +03:00
grenade	f7952547e7	Merge pull request 'feat(neuron): prefix KV caching for the TP path (#11 )' (#35 ) from feat/11-prefix-kv-cache-tp into main All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 31s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m12s Details build-prerelease / Build cortex binary (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Has been skipped Details build-prerelease / Test (push) Successful in 3m58s Details build-prerelease / Build neuron-blackwell (push) Successful in 9m5s Details build-prerelease / Build neuron-ada (push) Successful in 14m22s Details build-prerelease / Build neuron-ampere (push) Successful in 19m0s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m56s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m0s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m51s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m5s Details	2026-06-12 14:49:19 +00:00
rob thijssen	7e66f77851	fix(neuron): CUDA type-check fixes for TP prefix cache All checks were successful CI / Format (push) Successful in 38s Details CI / Format (pull_request) Successful in 39s Details CI / CUDA type-check (pull_request) Successful in 1m26s Details CI / CUDA type-check (push) Successful in 1m34s Details CI / Clippy (push) Successful in 3m14s Details CI / Clippy (pull_request) Successful in 3m18s Details CI / Test (push) Successful in 5m15s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Test (pull_request) Successful in 3m56s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details Two errors only the cuda config surfaces: the TpSnapshotKv dispatch arms mixed candle and anyhow error types, and restore_or_clear_tp held the registry MutexGuard across the cleanup await inside a let-chain (making the TP request futures non-Send). Bind the removed ref before awaiting, same discipline as the other lock sites. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 17:39:32 +03:00
rob thijssen	e629e1872c	feat(neuron): prefix KV caching for the TP path (#11 ) Some checks failed CI / Format (push) Successful in 37s Details CI / Format (pull_request) Successful in 31s Details CI / CUDA type-check (push) Failing after 1m55s Details CI / CUDA type-check (pull_request) Failing after 1m47s Details CI / Clippy (push) Successful in 2m11s Details CI / Test (push) Successful in 4m15s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Clippy (pull_request) Successful in 2m23s Details CI / Test (pull_request) Successful in 4m0s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details Extends the prefix cache to tensor-parallel models — Qwen3.6-27B on beast, where the TTFT win is largest. Closes #11. Every rank holds its shard's snapshot under one pool-minted id: the leader's lives in the device worker beside the TP slab (Job::TpSnapshotKv / TpRestoreKv / TpDropKvSnapshot), each subprocess rank stores its own in-process via new WorkerRequest variants (SnapshotKvCache / RestoreKvCache / DropKvSnapshot). Shard state has the same shape as single-GPU (attention ConcatKvCache + GDN conv/recurrent state + rope_delta), so the snapshot types are reused; all ranks sit at the same token boundary because step fan-out is synchronous. Consistency on partial failure: a failed restore falls back to clear-all-ranks + full prefill (and drops the entry); a failed snapshot drops the id on every rank so nothing half-stored leaks. DropTp / UnloadModel invalidate a model's snapshots with it, covering auto-recovery. Vision requests bypass as on single-GPU. Budget accounting uses leader bytes x world_size (shards are symmetric). Wired into both TP request paths (non-streaming inner + streaming orchestration task); chunked_prefill_tp gains the restored-offset start. Closes #11 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 17:34:49 +03:00
grenade	bb558451db	Merge pull request 'feat(neuron): prefix KV caching across requests — single-GPU + CPU paths (#11 )' (#34 ) from feat/11-prefix-kv-cache into main All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 29s Details build-prerelease / Build cortex binary (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Has been skipped Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m15s Details build-prerelease / Test (push) Successful in 4m0s Details build-prerelease / Build neuron-blackwell (push) Successful in 9m44s Details build-prerelease / Build neuron-ampere (push) Successful in 12m47s Details build-prerelease / Build neuron-ada (push) Successful in 19m6s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 4m2s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m10s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 4m8s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s Details	2026-06-12 14:20:24 +00:00
rob thijssen	c5378d532d	feat(neuron): prefix KV caching across requests — single-GPU + CPU paths (#11 ) All checks were successful CI / Format (push) Successful in 32s Details CI / Format (pull_request) Successful in 34s Details CI / Clippy (push) Successful in 2m29s Details CI / CUDA type-check (pull_request) Successful in 1m31s Details CI / CUDA type-check (push) Successful in 1m37s Details CI / Clippy (pull_request) Successful in 2m32s Details CI / Test (push) Successful in 4m24s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Test (pull_request) Successful in 4m23s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details Stop discarding cache state between requests. When an incoming prompt's token sequence starts with the exact tokens of a stored snapshot, restore it and prefill only the divergent suffix. For the hybrid qwen3_5 arch a snapshot is attention ConcatKvCache k/v + GatedDeltaNet conv/recurrent state + the rope_delta counter, all at one token boundary; the recurrent state cannot rewind, so matching is exact-prefix only. GDN states are deep-copied both directions (the CUDA delta-rule kernels mutate the state buffer in place); attention k/v snapshots share storage safely (append-by-cat never mutates). Snapshots live in the device worker's state next to the model slab (Job::SnapshotKv / RestoreKv / DropKvSnapshot); the async side holds only an opaque id + token sequence + byte size. DropArch drops a model's snapshots with it, so unload and auto-recovery invalidate for free. CPU loads hold snapshots inline on the legacy path. Per-model LRU registry (harness/prefix_cache.rs) bounded by [harness.candle.prefix_cache] budget_mb / max_entries, enabled by default; inserting a snapshot drops entries it strictly extends. Vision requests and candle-transformers archs bypass the cache entirely (clear-every-request, unchanged). Covers the single-GPU worker path (streaming + non-streaming) and the CPU-local path. The TP path (Qwen3.6-27B on beast) is a follow-up PR that closes #11 with before/after bench numbers. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 17:14:07 +03:00
grenade	9f383e7bc7	Merge pull request 'feat(gateway): Anthropic streaming SSE translation (#24 )' (#33 ) from feat/gateway-24-anthropic-sse into main All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 33s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m13s Details build-prerelease / Test (push) Successful in 3m59s Details build-prerelease / Build cortex binary (push) Successful in 2m15s Details build-prerelease / Build neuron-blackwell (push) Successful in 9m59s Details build-prerelease / Build neuron-ada (push) Successful in 14m24s Details build-prerelease / Build neuron-ampere (push) Successful in 19m3s Details build-prerelease / Package cortex RPM (push) Successful in 1m24s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m5s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m6s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m59s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m8s Details	2026-06-12 12:57:09 +00:00
rob thijssen	569c528c4b	feat(gateway): Anthropic streaming SSE translation (#24 ) All checks were successful CI / Format (push) Successful in 36s Details CI / CUDA type-check (push) Successful in 2m25s Details CI / Clippy (push) Successful in 2m25s Details CI / Format (pull_request) Successful in 41s Details CI / CUDA type-check (pull_request) Successful in 2m9s Details CI / Clippy (pull_request) Successful in 2m45s Details CI / Test (push) Successful in 5m3s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Test (pull_request) Successful in 4m29s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details The /v1/messages handler translated request envelopes but proxied raw OpenAI SSE frames back to streaming Anthropic clients — the gap between the README's "point your tooling at it once" contract and what Claude Code actually received. cortex-core gains AnthropicStreamTranslator, a pure per-stream state machine: OpenAI chunks in, ordered (event, payload) pairs out — message_start → content_block_start/delta/stop (text and tool_use blocks, indexed; tool_calls map to input_json_delta) → message_delta (stop_reason mapped via the now-shared map_stop_reason, which also teaches the non-streaming path tool_calls→tool_use) → message_stop. Without an upstream usage frame the output count falls back to the delta count (engine-exact for neuron's one-chunk-per-token streams, #31); with one, input/output tokens ride message_delta. cortex-gateway gains anthropic_sse: the wire pump that splits the upstream byte stream into SSE events, parses data: payloads (leniently — engines omit fields on special frames), feeds the translator, and frames results as `event:`/`data:` pairs through a bounded channel (slow client back-pressures the upstream read). Upstream truncation without [DONE] still closes the Anthropic event sequence. Nothing is buffered beyond the current event's bytes. Tests: 5 state-machine unit tests (text flow, stop-reason mapping + defaults, tool_use blocks, usage propagation, idempotent finish) and 2 gateway integration tests (full event sequence + text reassembly, usage propagation into message_delta). Validated end-to-end by running this branch's gateway against a production neuron and streaming a live Anthropic request. Closes #24 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 15:47:30 +03:00
grenade	06e4ffc25c	Merge pull request 'feat(bench): reproducible benchmark harness + first fleet numbers (#22 )' (#32 ) from feat/22-benchmark-harness into main Some checks failed build-prerelease / Build neuron-blackwell (push) Blocked by required conditions Details build-prerelease / Build neuron-ampere (push) Blocked by required conditions Details build-prerelease / Build neuron-ada (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 32s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m27s Details build-prerelease / Build cortex binary (push) Successful in 2m41s Details build-prerelease / Package cortex RPM (push) Successful in 1m29s Details build-prerelease / Test (push) Successful in 4m44s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details	2026-06-12 12:46:33 +00:00
rob thijssen	a2e73a8907	feat(bench): reproducible batch-1 benchmark harness + first fleet numbers (#22 ) All checks were successful CI / Format (push) Successful in 40s Details CI / Format (pull_request) Successful in 38s Details CI / CUDA type-check (push) Successful in 2m8s Details CI / CUDA type-check (pull_request) Successful in 2m8s Details CI / Clippy (push) Successful in 2m23s Details CI / Test (pull_request) Successful in 3m54s Details CI / Test (push) Successful in 6m23s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Clippy (pull_request) Successful in 4m23s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details script/bench.py: stdlib-only, works against any OpenAI-compatible /v1 endpoint (helexa, llama.cpp, Ollama, vLLM) so cross-engine tables are a concatenation via the --label column. Measures the operator-felt trio per (model, prompt-size) cell: TTFT (first SSE content chunk), decode tok/s (visible tokens over the first→last chunk window, chunk-per-token engine invariant since streaming usage frames aren't emitted yet — #31), total wall-clock. Medians over N runs after one warmup; append-only JSONL for longitudinal tracking. Measurement traps found against the live fleet and handled: - thinking models burn the budget invisibly (reasoning deltas are off-wire by default) — the prompt appends Qwen's /no_think soft switch - short coalesced replies collapse the decode window to one TCP read — rates require a ≥200 ms window and the prompt demands ~300 words doc/benchmarks.md: method, fleet table, and the first published numbers (2026-06-12, `8f6f1d3`): 1.7B@3060 81 tok/s, 8B@4090 62 tok/s, 27B@2×5090 Q6K TP=2 35 tok/s with flat decode from 128→4k context — and the 7.1 s 4k-prefill TTFT recorded as #23's before-number. Refs #22 (competitor baselines still pending — the harness is ready for them) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 15:39:13 +03:00
rob thijssen	8f6f1d3205	feat(deploy): validate neuron capability after every deploy Some checks failed build-prerelease / Build neuron-ampere (push) Blocked by required conditions Details build-prerelease / Build neuron-ada (push) Blocked by required conditions Details build-prerelease / Package cortex RPM (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 29s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m14s Details build-prerelease / Build neuron-blackwell (push) Successful in 10m36s Details build-prerelease / Build cortex binary (push) Successful in 2m35s Details build-prerelease / Test (push) Successful in 6m35s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details A deploy previously went green the moment systemd reported the service started — a merge that broke model loading or inference itself would deploy "successfully" and only surface when a human noticed. Each neuron deploy now earns its green: 1. Wait for default models: poll /health until activation.state is ready, with per-host timeouts in the matrix (beast 900s for the 27B Q6K TP=2 cold-load, benjy/quadbrat 300s). Any entry in activation.failed fails the deploy with the per-model error — the structured equivalent of watching the journal for "loaded default model", plus failure detail the journal line can't carry. 2. LLM smoke probe: ask the first loaded model to reply with one specific word (max_tokens 512 so thinking models have room, temperature 0) and grep the response for it. Not a quality bar — just proof the deploy didn't lobotomize inference. Hosts whose package is already current still skip everything — the validation cost is only paid when a restart actually happened. The probe was dry-run against benjy's production neuron before landing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 15:28:20 +03:00
grenade	b0d0b939af	Merge pull request 'feat(gateway): per-request token metrics — TTFT and tok/s (#21 )' (#30 ) from feat/gateway-21-token-metrics into main Some checks failed build-prerelease / Lint (fmt + clippy) (push) Blocked by required conditions Details build-prerelease / Test (push) Blocked by required conditions Details build-prerelease / Build cortex binary (push) Blocked by required conditions Details build-prerelease / Build neuron-blackwell (push) Blocked by required conditions Details build-prerelease / Build neuron-ampere (push) Blocked by required conditions Details build-prerelease / Build neuron-ada (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 33s Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details	2026-06-12 12:25:32 +00:00
rob thijssen	6a36d15ef1	feat(gateway): per-request token metrics — TTFT and tok/s (#21 ) All checks were successful CI / Format (push) Successful in 45s Details CI / Format (pull_request) Successful in 37s Details CI / CUDA type-check (push) Successful in 2m25s Details CI / Clippy (push) Successful in 2m37s Details CI / Test (push) Successful in 4m22s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Clippy (pull_request) Successful in 2m23s Details CI / Test (pull_request) Successful in 4m19s Details CI / CUDA type-check (pull_request) Successful in 1m57s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details The deferred Phase 6b, and the unblock for the 7→8 milestone's benchmark work (#22): until cortex measures itself per request, nothing downstream can be benchmarked or graphed. The proxy wraps the upstream byte stream in a pass-through inspector (TokenMetricsStream): chunks are forwarded verbatim — never buffered or re-serialised — while the inspector records arrival times and keeps a bounded (64 KiB) tail of the body text. At stream end (or client disconnect, via Drop) it extracts the final OpenAI usage object — present on the last SSE chunk and non-streaming JSON bodies alike — for engine-truth token counts. Per request, labelled {model, node}: - cortex_time_to_first_token_seconds (histogram) — first body chunk - cortex_tokens_per_second (histogram) — completion tokens over the decode window (first→last chunk); falls back to total request duration for single-chunk non-streaming bodies - cortex_prompt_tokens_total / cortex_completion_tokens_total (counters) The extractor is pure and chunk-boundary-safe; quoted-needle matching keeps completion_tokens_details from shadowing completion_tokens, and the last usage object wins. Covers chat completions, completions, the Responses API, and the Anthropic streaming path (which currently proxies OpenAI SSE). Tests: 4 extractor unit tests; integration test with a streaming mock emitting a stream_options-style final usage chunk, asserting both histograms and exact-or-greater counter values (the test recorder is process-global and shared across the binary's tests). Closes #21 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 15:11:52 +03:00
grenade	b463439416	Merge pull request 'feat(neuron): startup preflight for NVIDIA driver/library mismatch (#19 )' (#29 ) from feat/neuron-19-driver-preflight into main Some checks failed build-prerelease / Build neuron-ampere (push) Blocked by required conditions Details build-prerelease / Build neuron-ada (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 29s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m11s Details build-prerelease / Build cortex binary (push) Successful in 2m33s Details build-prerelease / Test (push) Successful in 4m24s Details build-prerelease / Package cortex RPM (push) Successful in 1m27s Details build-prerelease / Build neuron-blackwell (push) Successful in 10m18s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details	2026-06-12 12:08:20 +00:00
rob thijssen	716558c8ff	feat(neuron): startup preflight for NVIDIA driver/library mismatch (#19 ) All checks were successful CI / Format (push) Successful in 38s Details CI / Format (pull_request) Successful in 38s Details CI / CUDA type-check (push) Successful in 2m11s Details CI / Clippy (push) Successful in 2m13s Details CI / Clippy (pull_request) Successful in 2m37s Details CI / Test (push) Successful in 4m17s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Test (pull_request) Successful in 3m56s Details CI / CUDA type-check (pull_request) Successful in 1m44s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details The un-rebooted driver update (userspace libs bumped, kernel module still old) kills every CUDA call on the host including nvidia-smi, and neuron surfaced it only as `Comm::from_rank ... NcclError` deep inside the first model load — 30 minutes of forensics on beast (2026-06-08) to diagnose. Make it instantly legible instead: - discovery distinguishes nvidia-smi absent (CPU-only, fine) from present-but-failing, classifies the "Driver/library version mismatch" signature, and pairs the userspace NVML version with the loaded kernel-module version from /proc/driver/nvidia/version. - DiscoveryResponse gains `cuda_unavailable_reason` (omitted when None — wire-compatible) so cortex can see why the node has no devices and route around it. - startup logs one loud ERROR line with the actionable reason ("reboot the host to reload the kernel module") and skips default model loads entirely, marking each failed with that reason so /health activation shows the real cause. - POST /models/load fast-rejects with 503 + code=cuda_unavailable on a mismatch host instead of dying minutes later in cuInit/NCCL. No false positives: other nvidia-smi failures (no devices, perms) keep their existing behaviour, CPU-only hosts stay silent. Closes #19 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 15:00:00 +03:00
rob thijssen	112e4e124a	fix(ci): export RUSTC_WRAPPER in the build step itself — GITHUB_ENV doesn't propagate Some checks failed build-prerelease / Package helexa-neuron-ada RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-ampere RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 32s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m22s Details build-prerelease / Build cortex binary (push) Successful in 2m20s Details build-prerelease / Test (push) Successful in 3m50s Details build-prerelease / Build neuron-blackwell (push) Successful in 10m10s Details build-prerelease / Package cortex RPM (push) Successful in 1m25s Details build-prerelease / Build neuron-ada (push) Successful in 14m29s Details build-prerelease / Build neuron-ampere (push) Successful in 14m31s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details Run 375 proved the CUDA image ships sccache (probe step printed "sccache enabled") but the wrapper never reached cargo: the runner does not propagate GITHUB_ENV across steps, so the builds ran unwrapped (server stats: 4 compile requests for a ~600-crate build, durations unchanged). Probe and export inside the build step's own shell instead, in both build-neuron and ci.yml's cuda-check. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 14:50:25 +03:00
rob thijssen	dc6feec6dc	fix(deploy): gate on the publish manifest, not unprivileged dnf check-update All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 31s Details build-prerelease / Build cortex binary (push) Successful in 2m18s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m33s Details build-prerelease / Test (push) Successful in 4m20s Details build-prerelease / Package cortex RPM (push) Successful in 1m23s Details build-prerelease / Build neuron-blackwell (push) Successful in 9m46s Details build-prerelease / Build neuron-ampere (push) Successful in 13m57s Details build-prerelease / Build neuron-ada (push) Successful in 15m29s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m5s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m5s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m49s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m8s Details The `f5fa840` deploy exposed both failure modes of gating with `dnf check-update` as the gitea_ci user in one run: it hung indefinitely on quadbrat (blocked process, 0 CPU, killed manually), and on benjy/beast it silently reported "no updates" two minutes after new RPMs were published — both hosts skipped a real (luckily binary-identical) update. Gate with data we own instead: fetch packages.json from rpm.lair.cafe (plain curl, no privileges, no dnf locks), take the newest release per package by buildTime, and skip the stop/upgrade/start cycle only when it exactly equals `rpm -q %{VERSION}-%{RELEASE}`. Unreachable or unparsable manifest fails open to a full deploy. The dnf transaction itself still runs under the scoped sudoers rules, unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 14:20:21 +03:00
grenade	02f20bc9e1	Merge pull request 'feat: keep auto-recovering models visible as recovering (#20 )' (#28 ) from feat/neuron-20-recovering-status into main Some checks failed build-prerelease / Test (push) Blocked by required conditions Details build-prerelease / Build neuron-blackwell (push) Blocked by required conditions Details build-prerelease / Build neuron-ampere (push) Blocked by required conditions Details build-prerelease / Build neuron-ada (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 30s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m39s Details build-prerelease / Build cortex binary (push) Successful in 3m46s Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details	2026-06-12 11:15:38 +00:00
rob thijssen	2a231e49de	merge main (sccache enablement supersedes branch cuda-check pin) All checks were successful CI / Format (push) Successful in 40s Details CI / Format (pull_request) Successful in 37s Details CI / Clippy (push) Successful in 2m17s Details CI / CUDA type-check (push) Successful in 2m39s Details CI / CUDA type-check (pull_request) Successful in 2m30s Details CI / Test (push) Successful in 4m51s Details CI / Clippy (pull_request) Successful in 2m12s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Test (pull_request) Successful in 4m49s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details # Conflicts: # .gitea/workflows/ci.yml	2026-06-12 14:05:55 +03:00
rob thijssen	2dadea5d8d	ci: enable sccache on the build jobs (conditional on the CUDA image) Some checks failed build-prerelease / Build neuron-blackwell (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 34s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m57s Details build-prerelease / Test (push) Has been cancelled Details build-prerelease / Build cortex binary (push) Has been cancelled Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details The 3 CUDA flavour builds (10-14 min each, the critical path of every full run) and build-cortex compiled entirely uncached. With the gongfoo-side sccache hardening in place, wire them up: - build-cortex: full sccache env (rust image ships it) + the standard escalation loop (retry -> server restart -> uncached final attempt). - build-neuron: probe for sccache before enabling the wrapper — the CUDA image may not ship it, and a missing binary must degrade to an uncached build, not fail cargo at `sccache rustc -vV` (the original reason the wrapper was cleared here). rustc compilations are shared across all three flavours; candle-kernels' nvcc output stays uncached (build-script artifact). - ci.yml cuda-check: same probe pattern replaces the blanket env clear; also pins CUDA_COMPUTE_CAP=86 since the image no longer ships nvidia-smi for candle-kernels' fallback detection (mirrors `9bb9678` on the #20 branch). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 14:05:26 +03:00
rob thijssen	9bb9678f93	fix(ci): pin CUDA_COMPUTE_CAP in cuda-check — builder image has no nvidia-smi All checks were successful CI / Format (push) Successful in 37s Details CI / Format (pull_request) Successful in 38s Details CI / CUDA type-check (push) Successful in 1m45s Details CI / Clippy (push) Successful in 2m24s Details CI / Clippy (pull_request) Successful in 2m19s Details CI / Test (push) Successful in 4m40s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details CI / Test (pull_request) Successful in 4m35s Details CI / CUDA type-check (pull_request) Successful in 1m50s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details candle-kernels' build script shells out to nvidia-smi for compute-cap detection when CUDA_COMPUTE_CAP is unset; the current GPU-less builder image doesn't ship it, so the type-check died in the build script before borrow-checking anything. Pin an arbitrary valid cap — the check is feature-gate compilation only; real caps live in build-prerelease.yml's flavour matrix. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 13:55:23 +03:00
rob thijssen	df9c490614	feat(neuron+gateway): keep auto-recovering models visible as `recovering` (#20 ) Some checks failed CI / Format (push) Successful in 37s Details CI / CUDA type-check (pull_request) Failing after 28s Details CI / Format (pull_request) Successful in 37s Details CI / Clippy (push) Successful in 2m54s Details CI / Clippy (pull_request) Successful in 3m36s Details CI / Test (push) Successful in 4m37s Details CI / Test (pull_request) Successful in 5m20s Details CI / Build cortex SRPM (pull_request) Has been skipped Details CI / Build neuron SRPM (pull_request) Has been skipped Details CI / Publish cortex to COPR (pull_request) Has been skipped Details CI / Publish neuron to COPR (pull_request) Has been skipped Details CI / Bump version in source (pull_request) Has been skipped Details CI / CUDA type-check (push) Failing after 31s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details During the #17 auto-recovery window (unload → reload, minutes for a large TP model) the model's registry slot is absent, so it vanished from neuron's /models — and cortex, routing by /models presence, answered "model not found on any node" while a direct request to neuron would have correctly said "recovering, retry shortly". neuron: the recovery set becomes a map carrying a devices/capabilities snapshot taken at trigger time (while the registry slot still exists). list_models reports `recovering` for models in the set — both while the poisoned slot is still present and during the reload gap, where the snapshot keeps the model listed. gateway: ModelStatus grows a Recovering variant (parsed from the wire); the router holds the route — new RouteError::ModelRecovering mapped to 503 instead of 404 — and deliberately does not fall through to the catalogue cold-load, which would race a second placement against the in-flight recovery. The evictor already ignores non-Loaded entries. Tests: neuron unit test (recovering model stays listed with snapshot), gateway integration tests (poller parses `recovering`; request gets 503 retry-shortly and the model stays on /v1/models). Closes #20 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 13:42:03 +03:00
rob thijssen	f5fa840dfb	ci: escalate sccache retries — restart server, then fall back uncached All checks were successful build-prerelease / Resolve version stamps + change detection (push) Successful in 30s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 2m6s Details build-prerelease / Test (push) Successful in 4m50s Details build-prerelease / Build cortex binary (push) Successful in 3m45s Details build-prerelease / Build neuron-blackwell (push) Successful in 9m59s Details build-prerelease / Build neuron-ada (push) Successful in 14m11s Details build-prerelease / Build neuron-ampere (push) Successful in 14m13s Details build-prerelease / Package cortex RPM (push) Successful in 1m30s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m28s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m50s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m54s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s Details Run 361's Test job failed all 3 attempts with the sccache dead-server signature (sccache fatal error, ENOENT on its own tmp files under target/debug/deps). Retrying the same invocation only helps for transient races; against a wedged server every same-VM retry fails identically — and under the new pipeline that blocks publish and the deploy behind it. Escalate instead: attempt 1 plain, attempt 2 after an sccache server restart, attempt 3 with RUSTC_WRAPPER unset (uncached). A sick cache now costs build minutes, never the deploy. Applied to the lint/test jobs in build-prerelease.yml and ci.yml alike. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 13:24:02 +03:00
rob thijssen	7557c5e877	ci: cut iteration latency — change-aware builds, gated deploys, dev fast path Some checks failed build-prerelease / Build neuron-blackwell (push) Blocked by required conditions Details build-prerelease / Resolve version stamps + change detection (push) Successful in 28s Details build-prerelease / Test (push) Failing after 1m16s Details build-prerelease / Lint (fmt + clippy) (push) Successful in 3m7s Details build-prerelease / Build cortex binary (push) Successful in 3m57s Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details Push-to-testable was ~20.5 min for every commit (measured on the 2026-06-08 green chain) plus a ~5 min 27B cold-load, regardless of what changed. Three structural fixes: - build-prerelease: a change-detection step in `prepare` diffs HEAD against the git sha embedded in the last published unstable RPM (per package, from packages.json) and skips builds whose inputs didn't change. Docs-only commits build nothing; gateway-only commits skip the 3 CUDA flavour builds. Detection failures fall open to a full build. - ci.yml no longer runs on pushes to main; fmt/clippy/test live in build-prerelease as parallel jobs gating publish. The two workflows previously queued against each other on the same runner labels, delaying the cortex build ~12 min. Branches, PRs, and tags keep the full ci.yml gate. - deploy: each host self-gates with `dnf check-update` and leaves the service untouched when the installed package is already current — no more neuron restarts (and 27B cold-loads) for commits that didn't change neuron. - deploy-dev (new): manual single-host fast path — build one CUDA flavour, scp the binary, restart the service. Skips packaging, signing, publish, and dnf entirely. Backed by a new exact-form sudoers rule in asset/sudoers.d/neuron-host.conf (already applied to all three hosts). Expected loop times when runners behave: docs ≈ 1 min (nothing deploys), gateway-only ≈ 6-8 min, single-neuron dev ≈ 8-10 min, full fleet ≈ 13-15 min. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 13:17:22 +03:00
rob thijssen	91e95ca979	docs: rewrite README around project positioning Some checks failed CI / CUDA type-check (push) Failing after 46s Details CI / Format (push) Successful in 47s Details CI / Clippy (push) Successful in 2m53s Details CI / Test (push) Successful in 4m31s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package helexa-neuron-ada RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-ampere RPM (push) Blocked by required conditions Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Blocked by required conditions Details build-prerelease / Resolve version stamps (push) Successful in 39s Details build-prerelease / Build cortex binary (push) Successful in 3m52s Details build-prerelease / Package cortex RPM (push) Successful in 1m18s Details build-prerelease / Build neuron-blackwell (push) Successful in 11m34s Details build-prerelease / Build neuron-ampere (push) Successful in 15m31s Details build-prerelease / Build neuron-ada (push) Successful in 15m37s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details Lead with what helexa is for — near-frontier open-weight models on consumer hardware you own — instead of a feature list. Adds the scope section (intentional divergence from vLLM/SGLang; CUDA-only today as a test-coverage constraint, not a principle), an engine section covering the per-device worker threads and consumer-GPU tensor parallelism, the previously-missing helexa-acp crate, and a status section pointing at git.lair.cafe as the source of truth with GitHub as read-only mirror. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 11:37:00 +03:00
rob thijssen	1a74cb0c56	chore: rename repo cortex -> helexa Some checks failed CI / CUDA type-check (push) Failing after 30s Details build-prerelease / Resolve version stamps (push) Successful in 45s Details CI / Format (push) Successful in 32s Details build-prerelease / Build neuron-blackwell (push) Failing after 31s Details build-prerelease / Build neuron-ada (push) Failing after 34s Details build-prerelease / Build neuron-ampere (push) Failing after 38s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details CI / Clippy (push) Failing after 1m11s Details build-prerelease / Build cortex binary (push) Successful in 3m47s Details CI / Test (push) Successful in 5m32s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m22s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details helexa is the project; cortex (per-operator control plane / LLM proxy) and neuron (per-host LLM harness) are its components. The Gitea repo is now helexa/helexa. Update repository URLs in Cargo metadata, RPM specs, and docs; make the CI changelog push URL rename-proof via the github.repository context; reframe README.md and CLAUDE.md around the project name. Binary, package, service, and config-path names are unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 10:54:01 +03:00
rob thijssen	60f5598542	build(neuron): bump cudarc fork to 63327a2 (idempotent abort + Comm Send+Sync) Some checks failed build-prerelease / Resolve version stamps (push) Successful in 29s Details CI / CUDA type-check (push) Successful in 31s Details CI / Format (push) Successful in 35s Details CI / Test (push) Failing after 1m9s Details CI / Clippy (push) Successful in 2m36s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-blackwell (push) Successful in 6m10s Details build-prerelease / Build neuron-ampere (push) Successful in 7m35s Details build-prerelease / Build neuron-ada (push) Successful in 5m7s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m53s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m14s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m48s Details build-prerelease / Build cortex binary (push) Successful in 4m33s Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s Details The fork's new commit makes `Comm: Send + Sync` (asserting NCCL's thread-safety invariant upstream) and makes `Comm::abort` idempotent via an `aborted` flag (so abort-then-Drop can't double-free) — strictly better than the previous Drop-no-panic workaround, and the `abort()` signature is unchanged so the watchdog call site is unaffected. Because `Comm` is now `Send + Sync`, `Arc<Comm>` and the `SendComm` / `NcclState` wrappers auto-derive `Send`/`Sync`, which conflicts (E0119) with neuron's manual `unsafe impl`s. Remove the four now-redundant impls — the safety assertion lives upstream in cudarc where it belongs. The conflict is in cuda-gated code, so only the CUDA type-check catches it (non-cuda build + clippy + tests stay green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 16:33:14 +03:00
rob thijssen	7945240646	chore: re-trigger deploy (#17 Stage 2, attempt 3) All checks were successful CI / CUDA type-check (push) Successful in 31s Details build-prerelease / Resolve version stamps (push) Successful in 31s Details CI / Format (push) Successful in 33s Details CI / Clippy (push) Successful in 2m41s Details build-prerelease / Build cortex binary (push) Successful in 4m45s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m50s Details CI / Test (push) Successful in 6m44s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m23s Details build-prerelease / Build neuron-ampere (push) Successful in 8m38s Details build-prerelease / Build neuron-ada (push) Successful in 5m36s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m55s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m59s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m43s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 59s Details No code change. Each deploy run, the degraded CI runner kills a different single arch build (blackwell, then ada) ~fast, and the all-arch-gated packaging skips → no publish. Every arch HAS built green across runs (blackwell ✅ in 342, ampere ✅, ada ✅ in 339) and the gate + CUDA type-check pass. Re-running to catch all three green in one run so the Stage-2 RPMs publish. Runner FS/cache health is the real fix (separate infra work). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 15:06:04 +03:00
rob thijssen	0c74d89d15	chore: re-trigger deploy (#17 Stage 2) Some checks failed CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 29s Details CI / Format (push) Successful in 30s Details build-prerelease / Build neuron-ada (push) Failing after 51s Details CI / Clippy (push) Successful in 2m41s Details build-prerelease / Build cortex binary (push) Successful in 4m28s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m32s Details build-prerelease / Build neuron-ampere (push) Successful in 7m42s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details CI / Test (push) Successful in 6m6s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped Details No code change. The `c94a2ae` deploy's neuron-blackwell build died ~12min into the Blackwell kernel compile on the degraded runner, while neuron-ampere + neuron-ada built the identical Rust + patched cudarc cleanly and the CUDA type-check passed. Transient infra; re-running to get a healthy blackwell build so the RPMs publish and beast (Blackwell) picks it up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 14:45:16 +03:00
rob thijssen	c94a2ae755	fix(neuron): correct nccl_state path on WorkerPool.leader_comm (#17 S2) Some checks failed CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 35s Details CI / Format (push) Successful in 44s Details build-prerelease / Build cortex binary (push) Successful in 4m57s Details build-prerelease / Package cortex RPM (push) Successful in 1m36s Details CI / Test (push) Successful in 7m10s Details CI / Clippy (push) Failing after 1m21s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-ampere (push) Successful in 8m40s Details build-prerelease / Build neuron-ada (push) Successful in 9m5s Details build-prerelease / Build neuron-blackwell (push) Failing after 12m2s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details `super::nccl_state` from tp/mod.rs resolves to `crate::harness::nccl_state` (nonexistent); the module is the child `nccl_state` (cf. the existing `nccl_state::generate_comm_id_hex` call). The field is cuda-gated so the non-cuda build couldn't catch it; the branch CUDA type-check flaked on the runner before compiling. Self-audited fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 14:21:43 +03:00
rob thijssen	99920dd322	feat(neuron): TP step watchdog aborts wedged collectives (#17 Stage 2) Some checks failed CI / CUDA type-check (push) Failing after 47s Details CI / Format (push) Successful in 31s Details CI / Test (push) Failing after 1m3s Details CI / Clippy (push) Successful in 2m44s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Make a hung NCCL collective recoverable instead of a permanent brick. Today a wedged collective hangs the in-process leader thread forever, and even Stage 1's recovery can't help — its unload's DropTp queues behind the stuck thread and hangs too. - Cache the leader's NCCL Comm handle async-side at init (new cuda-gated Job::GetLeaderComm → DeviceWorkerHandle::get_leader_comm → stored on WorkerPool.leader_comm). Fetched while the thread is responsive — a wedged thread can't service the fetch, which is why it's cached up front. - Wrap the leader forward in both generate_step and generate_step_with_images in tokio::time::timeout (default 120s, NEURON_TP_STEP_TIMEOUT_S). On expiry the watchdog calls Comm::abort() (ncclCommAbort) on the cached handle from the async thread — the one NCCL op sanctioned concurrently with an in-flight collective — which unblocks the leader thread, then fails the step WITHOUT draining (workers are wedged too; recovery's unload kills them). The error is a device fault → poison → Stage 1 auto-recovery, which now completes because the leader thread is responsive again. - Bumps the cudarc patch to dbc425a (adds the Drop-must-not-panic fix so the post-abort comm teardown during recovery doesn't double-abort-panic). Logs the whole sequence at ERROR with greppable `tp watchdog:` / `ncclCommAbort` markers so a real-world hang leaves a forensic trail — verification is by inspecting journals after real hangs, not a synthetic harness. cuda-gated → validated by the blackwell build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 14:15:29 +03:00
rob thijssen	c4f239ceb9	build(neuron): patch cudarc to expose Comm::abort/get_async_error (#17 Stage 2) All checks were successful CI / CUDA type-check (push) Successful in 33s Details CI / Format (push) Successful in 35s Details CI / Clippy (push) Successful in 2m34s Details CI / Test (push) Successful in 6m1s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details #17 Stage 2 (TP hang-recovery) needs to call ncclCommAbort on a LIVE communicator from another thread — to unblock a collective wedged on a dead/hung peer so the ranks can resync. No cudarc release (incl. main) exposes this: the safe Comm only aborts in Drop, which can't fire while a stuck thread holds an Arc<Comm> clone. Pin neuron's cudarc 0.19.7 to a fork (grenade/cudarc @ nccl-comm-abort, rev 4dff0be) adding three thin methods — Comm::abort, get_async_error, and a raw comm() accessor — to be submitted upstream. The patch targets 0.19.x only; candle's transitive cudarc 0.17.8 stays on crates.io. Foundation only; the watchdog + abort + comm-rebuild that consume these land in follow-up commits (cuda-gated → validated by the blackwell build). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 13:49:59 +03:00
rob thijssen	ac445c1569	chore: re-trigger deploy (#17 Stage 1) Some checks failed CI / CUDA type-check (push) Failing after 19s Details CI / Format (push) Successful in 37s Details build-prerelease / Resolve version stamps (push) Successful in 42s Details CI / Clippy (push) Successful in 3m54s Details build-prerelease / Build cortex binary (push) Successful in 4m43s Details CI / Test (push) Successful in 6m35s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m58s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ampere (push) Successful in 8m10s Details build-prerelease / Build neuron-ada (push) Successful in 5m21s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m56s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m1s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m46s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m4s Details No code change. The `abc6e60` deploy's neuron-ada build died on the degraded CI runner (container dropped mid-checkout), skipping the gated publish — even though neuron-blackwell + neuron-ampere compiled the Stage-1 fault-recovery code cleanly. Re-running to get a healthy ada build so the RPMs publish and beast picks up the build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 09:34:20 +03:00
rob thijssen	abc6e605b8	test(neuron): NEURON_DEBUG_POISON hook to verify auto-recovery (#17 ) Some checks failed CI / CUDA type-check (push) Failing after 19s Details build-prerelease / Resolve version stamps (push) Successful in 43s Details CI / Format (push) Successful in 50s Details CI / Clippy (push) Failing after 57s Details build-prerelease / Build neuron-ada (push) Failing after 48s Details build-prerelease / Build cortex binary (push) Successful in 5m5s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m38s Details build-prerelease / Package cortex RPM (push) Successful in 1m27s Details build-prerelease / Build neuron-ampere (push) Successful in 7m27s Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been skipped Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been skipped Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been skipped Details CI / Test (push) Successful in 10m27s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details One-shot, env-gated fault injector for beast verification: when NEURON_DEBUG_POISON names a model, the first request for it triggers the auto-recovery path as if a device fault had occurred — exercising unload→reload→healthy without corrupting the GPU. Latched so it fires exactly once (no recovery loop). No-op unless the env var is set; wired into both the single-GPU and TP chat poison gates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 09:08:40 +03:00
rob thijssen	4f2957af9e	feat(neuron): auto-recover poisoned models (#17 Stage 1c) When an inference hit a device fault, the model was flagged poisoned and every subsequent request rejected with "unload and reload the model to recover" — until a human did exactly that. Now the harness rebuilds the context automatically. - Retain the loading `ModelSpec` on `LoadedModel`/`TpLoadedModel` (+ `LoadedHandle::spec()`) so a poisoned model can be reloaded without an operator reconstructing the spec. - A background recovery task (held via `Weak<CandleHarness>`, spawned in `new()` when a runtime is present) drains poisoned model ids and runs `unload_model` → `load_model(spec)`. Unload drops the model → cudarc `Comm::drop` aborts NCCL + releases the context; reload re-runs NCCL init + sanity inside the load path, so a successful reload yields a fresh, healthy model. A failed reload leaves it unloaded (next load retries) — never poisoned forever. - The request-entry poison gates now `trigger_recovery` (single-flight per model via a `recovering` set) and return a transient "recovering, retry shortly" error instead of the manual-reload message. Requests that arrive during the brief reload gap (model absent from the registry) also get "recovering" rather than a misleading "not loaded". `new()` now returns `Arc<Self>`. Recovery runs only on the background task — never inline on the request path, which holds `inference_lock` and would deadlock on the `models` write lock. Stage 1c of the #17 plan (verified-healthy auto-recovery). Watchdog (1b) + a fault-injection hook for beast verification follow. The in-process rank-0 leader's own context fault still needs a reload that can't rebind it (Stage 3); comm-desync + worker faults recover here. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 09:05:02 +03:00
rob thijssen	75cd088b61	fix(neuron): cap vision max_pixels to the pos_embed patch budget (#14 ) All checks were successful CI / CUDA type-check (push) Successful in 31s Details build-prerelease / Resolve version stamps (push) Successful in 29s Details CI / Format (push) Successful in 30s Details CI / Clippy (push) Successful in 2m32s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m5s Details CI / Test (push) Successful in 5m49s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build neuron-ampere (push) Successful in 8m11s Details build-prerelease / Build neuron-ada (push) Successful in 5m40s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m4s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m2s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m57s Details build-prerelease / Build cortex binary (push) Successful in 4m21s Details build-prerelease / Package cortex RPM (push) Successful in 1m25s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m16s Details Beast testing surfaced a real regression in the dynamic-resolution default: a tall 808×1600 image resized (within the 1024² max_pixels) to a 90×44 patch grid = 3960 patches, exceeding the vision tower's hard `num_position_embeddings = 2304` pos-embed budget. The per-rank `patch count 3960 exceeds pos_embed budget 2304` error fired mid-TP- forward and poisoned the device context, bricking the model until reload. Hard-cap `max_pixels` to `2304 × 16² = 589_824` px (≤ 2304 patches → ≤ 576 LM tokens), clamping even the operator env override. `smart_resize` floors the pixel count under the cap, so no resized image can ever exceed the budget — the tower check never fires, no poison. The pos-embed grid (48×48) is the resolution Qwen3.6 was trained at, so the cap is principled, not just defensive. Still ~3× the old fixed 196 tokens, and the book-cover OCR test (1176 patches) already reads full title+subtitle. Test: a huge/tall/wide/extreme image battery stays within the 2304 patch budget. (Per-rank-error poison robustness itself remains issue #17.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 23:30:47 +03:00
rob thijssen	d311c8ca7a	feat(neuron): operator pixel-budget env override + doc cleanup (#14 C5) Some checks failed CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 38s Details CI / Format (push) Successful in 45s Details CI / Test (push) Failing after 58s Details CI / Clippy (push) Successful in 2m41s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Build cortex binary (push) Successful in 4m14s Details build-prerelease / Package cortex RPM (push) Successful in 1m23s Details build-prerelease / Build neuron-blackwell (push) Successful in 6m20s Details build-prerelease / Build neuron-ampere (push) Successful in 7m18s Details build-prerelease / Build neuron-ada (push) Successful in 5m10s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m6s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m7s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m5s Details - PreprocessProfile::qwen3_6() reads NEURON_VISION_MIN_PIXELS / NEURON_VISION_MAX_PIXELS (clamped to factor² ≤ min ≤ max), matching the NEURON_VISION_LEGACY_* / NEURON_MROPE knob convention. Defaults remain 256²…1024² (64…1024 LM tokens/image). - Test: a max-resolution source caps within the token budget (can't blow NEURON_MAX_PROMPT_TOKENS). - Strip stale fixed-resolution / "MRoPE gap (#15)" / 14×14 language from the preprocess, mod, and rope doc-comments now that resolution is dynamic and M-RoPE is implemented. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 22:50:03 +03:00
rob thijssen	c97a8654f5	feat(neuron): dynamic-resolution images via Qwen smart_resize (#14 ) Some checks failed CI / Clippy (push) Waiting to run Details CI / Test (push) Waiting to run Details CI / CUDA type-check (push) Successful in 32s Details CI / Format (push) Successful in 34s Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details Replace the fixed 448×448-square preprocess with native-aspect `smart_resize`, and thread the resulting per-image grid through the LM so spatial structure survives non-square images (documents, screenshots, charts, panoramas, OCR) instead of being squished into a square. - preprocess.rs: port Qwen `smart_resize` (factor = patch×merge = 32; pixel budget [min,max], default 256²–1024² → 64–1024 LM tokens). `PreprocessProfile` drops the fixed target dims for `factor`/`min_pixels`/ `max_pixels`; `preprocess`/`preprocess_data_uri` now return the resized `(h, w)`; add `resized_dims_for_uri` (decode + resize, no normalize) for the TP leader's token count. - rope.rs: `compute_mrope_index`/`get_rope_index` take per-image `grids: &[(lm_gh, lm_gw)]` instead of assuming a square `isqrt(run)`. Walk image runs in order, validate `run == gh*gw`, emit row-major positions, resume the shared counter at `base + max(gh,gw)`. Correct for multiple images of differing grids interleaved with text. - candle.rs: `VisionMeta`/`LoadedModel`/`TpLoadedModel` carry the `image_grid_factor` (patch×merge) instead of the constant 196; all four prompt-build sites compute per-image counts from each image's resized grid (single-GPU from the extracted `ImageInput.h/w`, TP from `resized_dims_for_uri`). `ModelArch` gains `vision_grid_factor`. - single-GPU (`mod.rs`, `dispatch.rs`) and TP (`tp_qwen3_5.rs::prefill_with_images_chunked`, `dispatch.rs`, `tp/worker.rs`) thread the grids into `get_rope_index`. Each TP rank recomputes grids from its own deterministic preprocess — no rpc.rs change, single source of truth. The vision tower itself was already grid-general (recent pos-embed interpolation + 2D rotary fix). No patch-count cap: pos-embed is interpolated to any grid; `max_pixels` bounds cost (O(patches²) ViT attention + prefill) instead. Tests: smart_resize (aspect/cap/floor/reject), `compute_mrope_index` non-square + two-image + mismatch cases, square-grid regression guard. Non-cuda build + clippy + full workspace tests green; TP load/dispatch paths are cuda-gated → Gitea CUDA type-check. Operator pixel-budget config + remaining doc cleanup follow in C5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 22:47:27 +03:00
rob thijssen	dc048ffcc9	fix(neuron): vision-tower 2D positions + M-RoPE default on All checks were successful CI / CUDA type-check (push) Successful in 32s Details build-prerelease / Resolve version stamps (push) Successful in 32s Details CI / Format (push) Successful in 33s Details CI / Clippy (push) Successful in 2m36s Details build-prerelease / Build cortex binary (push) Successful in 4m48s Details build-prerelease / Build neuron-blackwell (push) Successful in 5m59s Details CI / Test (push) Successful in 6m35s Details build-prerelease / Build neuron-ampere (push) Successful in 7m51s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details build-prerelease / Package cortex RPM (push) Successful in 1m21s Details build-prerelease / Build neuron-ada (push) Successful in 5m13s Details build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m0s Details build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m5s Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m49s Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m6s Details Two fixes to the spatial handling of images, validated against the HF transformers 4.57.1 qwen3_vl reference on beast. Vision tower (the real cause of poor spatial vision). The Stage-A tower encoded position two ways wrong, so the model saw image content but not layout (a row of 5 people read as "a line of 23", sky inverted), regardless of the LM-side rope: - Learned pos-embed was a naive sequential lookup of the first `n_patches` rows of the 48×48 (`num_position_embeddings=2304`) grid — wrong stride for a 28×28 patch grid. Now bilinearly interpolates the grid to `gh×gw` (port of HF `fast_pos_embed_interpolate`), row-major. - The 2D vision rotary was absent entirely. Added `VisionRotaryEmbedding` (θ=10000, dim=head_dim/2) applying per-patch `(row, col)` rotary to q/k in every ViT block via rope_slow, matching HF `apply_rotary_pos_emb_vision`. Both default on; `NEURON_VISION_LEGACY_POS=1` / `NEURON_VISION_LEGACY_ROPE=1` revert each for A/B (no rebuild). New unit tests: interpolation reduces to the sequential lookup at the native grid; rotary row/col structure. M-RoPE default on. The interleaved M-RoPE matches HF apply_interleaved_mrope / get_rope_index exactly and A/B'd strictly ≥ plain. `NEURON_MROPE` is now a kill switch (`=0` for plain), not opt-in — defaults should encode the model's trained behaviour, not freeze the broken state. Vision tower is plain candle (CPU-testable): built, clippy-clean, full workspace tests green locally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 20:53:07 +03:00

1 2 3 4 5 ...

267 Commits