first post: a watchdog, a torrent tracker, and an unhinged friday

2026-06-12 22:29:25 +03:00
commit 3a46a5d4bc
2 changed files with 57 additions and 0 deletions

last-week.md

@@ -0,0 +1,57 @@
---
title: a watchdog, a torrent tracker, and an unhinged friday
slug: last-week
date: 2026-06-12
---

i pulled my activity feed for the week to see what i actually did, as opposed to what i *feel* like i did (which is mostly "stared at ci logs"). the data tells a story in four acts, one of which is an intermission.

## monday: teaching gpus to un-wedge themselves

the week opened deep in helexa's neuron engine, on the least glamorous problem in distributed inference: what to do when a tensor-parallel collective just... stops. monday's answer was a step watchdog that aborts wedged nccl collectives, plus auto-recovery for a poisoned model: kill it, reload it, carry on, no human required.

this required going one layer further down than i'd have liked: cudarc didn't expose `Comm::abort`, so i forked it and patched that in. there's also now a `NEURON_DEBUG_POISON` hook whose entire job is to deliberately poison a model so i can watch it resurrect itself. building a self-healing system means first building a self-harming one.

honesty in the commit log, monday edition: `chore: re-trigger deploy (#17 Stage 2, attempt 3)`.
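
for anyone who wants the shape of that watchdog without reading the engine, below is a minimal sketch in plain std rust. it is not helexa's code. `run_step_with_watchdog`, `StepOutcome`, and the `abort` closure are hypothetical names; the closure stands in for whatever actually tears down the communicator (presumably the patched-in `Comm::abort`), and no nccl or cudarc is involved.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// what happened to one guarded step (hypothetical type, for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum StepOutcome {
    /// the step signalled completion before the deadline.
    Completed,
    /// the deadline passed first and the watchdog called `abort`.
    Aborted,
}

/// runs `step` on the calling thread while a watchdog thread waits up to
/// `deadline` for a completion signal. on timeout the watchdog calls `abort`,
/// which is expected to make the blocked step return.
pub fn run_step_with_watchdog<S, A>(deadline: Duration, step: S, abort: A) -> StepOutcome
where
    S: FnOnce(),
    A: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let watchdog = thread::spawn(move || match done_rx.recv_timeout(deadline) {
        // the step finished in time: nothing to do.
        Ok(()) => StepOutcome::Completed,
        // the sender vanished without a signal (the step panicked): nothing to abort.
        Err(RecvTimeoutError::Disconnected) => StepOutcome::Completed,
        // no signal before the deadline: treat the step as wedged.
        Err(RecvTimeoutError::Timeout) => {
            abort();
            StepOutcome::Aborted
        }
    });

    step();
    // if the watchdog already fired, the receiver is gone and this send fails; that is fine.
    let _ = done_tx.send(());
    watchdog.join().expect("watchdog thread panicked")
}
```

to try it, pass a `step` that spins until a shared flag flips and an `abort` that flips it; with a short deadline you get `StepOutcome::Aborted`. a real version also has to handle the abort racing a step that finishes right at the deadline, which this sketch ignores.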

## tuesday & wednesday: side quests

tuesday was three commits of toolchain housekeeping on gongfoo: switching from distro rust packages to rustup, adding musl support, and then the inevitable `fix: rustup syntax` because no one writes a rustup invocation correctly on the first try.

wednesday belonged to monsoon, my torrent client, which got two releases (v0.2.13 and v0.3.0) in one day. the desktop and web uis both learned to sort and filter their torrent lists, logs grew up (rotating files on desktop, journald on the server), there's a proper firewalld service definition now, and the dht peer log was told to stop narrating every tick of its life.

## thursday: [no events recorded]

zero commits. zero issues. zero comments. the feed thinks nothing happened. the feed doesn't index twisty mountain roads.

![two bikes, a tree, a stone fountain, and a valley that goes on forever](thursday.jpg)

thursday was my personal form of meditation, which happens on two wheels. a few hundred kilometres of corners, a stop under a tree by a stone fountain, and a valley stretching to the horizon while the bikes ticked themselves cool in the shade. no watchdogs, no wedged collectives, just the kind of head-clearing that, judging by what came next, works better than any retro.

## friday: 192 events

apparently a day in the saddle is rocket fuel, because friday i came in swinging. the feed logged 192 events, more than three times the rest of the week combined.

the project formerly known as cortex got renamed to **helexa**, picked up a github org mirror, and had its readme rewritten around a sharpened positioning. then came the great triage: thirteen issues closed in one sweep, each with a real closing comment ("closing as completed", "closing as out-of-scope — counterproductive under the project's stability contract"), and a fresh tracking issue laying out a prioritised path to closing every issue that survived.

and then, the actual shipping. ten pull requests opened *and* merged in a single day:

- **prefix kv caching** across requests: four prs' worth, covering single-gpu, cpu, and the tensor-parallel path, plus two follow-up fixes about exactly where to snapshot the cache (the prefill boundary, then no wait, the last special-token boundary)
- **anthropic streaming sse translation** in the gateway
- a **reproducible benchmark harness** with the first published fleet numbers, then *updated* fleet numbers once the kv caching landed
- **per-request token metrics** — ttft and tok/s, end to end
- a **startup preflight** that fails loudly on nvidia driver/library mismatch instead of letting nccl fail cryptically twenty minutes later
- monday's auto-recovering models now stay **visible as `recovering`** so the router holds the route instead of dropping it

in between, the gongfoo ci got a parallel overhaul: sccache on the build jobs, change-aware builds with gated deploys, a reaper for zombie runners, a fix for a build-cancellation livelock, and (my favourite) patching cuda's `math_functions.h` because glibc 2.41 decided it also had opinions about `rsqrt`.
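
circling back to the prefix kv caching item in the friday list: here's a toy version of the idea, under loud assumptions. `PrefixCache`, `Snapshot`, and `snapshot_boundary` are made-up names, the snapshot holds a token count instead of tensors, and which token ids count as special is up to the caller. it snapshots at the last special-token boundary and reuses the longest cached prefix of a new prompt; how helexa really does it may differ.

```rust
use std::collections::HashMap;

/// stand-in for a real kv-cache snapshot (hypothetical; a real one holds tensors).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Snapshot {
    /// number of prompt tokens whose keys/values this snapshot covers.
    pub covered: usize,
}

/// toy prefix cache: maps a token prefix to the snapshot taken at its end.
#[derive(Default)]
pub struct PrefixCache {
    entries: HashMap<Vec<u32>, Snapshot>,
}

impl PrefixCache {
    /// index just past the last special token, or 0 if the prompt has none.
    pub fn snapshot_boundary(tokens: &[u32], is_special: impl Fn(u32) -> bool) -> usize {
        tokens
            .iter()
            .rposition(|&t| is_special(t))
            .map_or(0, |i| i + 1)
    }

    /// stores a snapshot covering `tokens[..boundary]`; a zero boundary stores nothing.
    pub fn insert(&mut self, tokens: &[u32], boundary: usize) {
        let boundary = boundary.min(tokens.len());
        if boundary == 0 {
            return;
        }
        self.entries
            .insert(tokens[..boundary].to_vec(), Snapshot { covered: boundary });
    }

    /// returns the snapshot for the longest cached prefix of `prompt`, if any.
    pub fn lookup(&self, prompt: &[u32]) -> Option<&Snapshot> {
        // try the longest candidate prefix first, then shrink.
        (1..=prompt.len())
            .rev()
            .find_map(|n| self.entries.get(&prompt[..n]))
    }
}
```

the lookup is quadratic in prompt length because it hashes every prefix. that's fine for a sketch; a real cache would more likely use a trie or block hashes.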

## the scoreboard

- **248 events** across github and gitea
- **10 prs** opened and merged (all on friday, because apparently i batch)
- **13 issues closed**, 7 opened — net negative, the good direction
- **2 releases** tagged
- **1 repo** renamed, **1 fork** created out of nccl-induced necessity
- **1 thursday** spent leaning into corners instead of commits

next week starts with a tagged release and a public writeup with benchmark numbers. there's an issue for it, so it's basically already done. right?

thursday.jpg (binary file, not shown; 724 KiB)