helexa

helexa/helexa

Fork 0

Files

History

rob thijssen b3dc835375

build-prerelease / Resolve version stamps + change detection (push) Successful in 30s

Details

build-prerelease / Lint (fmt + clippy) (push) Successful in 2m23s

Details

build-prerelease / Build cortex binary (push) Successful in 2m29s

Details

build-prerelease / Build helexa-bench binary (push) Successful in 2m34s

Details

build-prerelease / Test (push) Successful in 4m33s

Details

build-prerelease / Build neuron-blackwell (push) Successful in 1m31s

Details

build-prerelease / Build neuron-ada (push) Successful in 2m13s

Details

build-prerelease / Build neuron-ampere (push) Successful in 2m50s

Details

build-prerelease / Package helexa-bench RPM (push) Successful in 1m17s

Details

build-prerelease / Package cortex RPM (push) Successful in 1m27s

Details

build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 1m38s

Details

build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 1m42s

Details

build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 1m44s

Details

build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 55s

Details

ci: bound job runtime + stop dropping sccache on rustc signal-death

A neuron-blackwell build hung ~90 min (siblings finished in 2) and there
was no job timeout to kill it, so it sat burning a runner. Root cause of
the hang: the inline retry loop treated every failure identically and, on
its final attempt, rebuilt with sccache disabled. When the real failure
is a rustc SIGSEGV or an OOM-kill, an uncached rebuild does *more* work
under the same memory pressure — turning one transient compiler crash
into a wedged job.

Two fixes:

1. timeout-minutes on every job in build-prerelease.yml and ci.yml
   (builds 25, neuron CUDA build/cuda-check 35, packaging 20, COPR 60,
   fast jobs 10-15). A hang now dies in minutes, not hours.

2. New script/ci-cargo-escalate.sh replaces the five (prerelease) + three
   (ci) inline escalation loops. It classifies the failure:
     - signal death (exit >=128, or cargo reporting `signal: N`/SIGSEGV/
       SIGKILL) → compiler crash, NOT an sccache fault: keep the cache,
       one warm retry, then fail fast. Never escalate to uncached.
     - sccache fault (recognisable sccache error) → restart the server,
       retry, then one final uncached attempt.
     - deterministic compile/test error → fail fast (no wasteful retry).
   It also folds in the CUDA-image sccache probe the neuron/cuda-check
   jobs did inline. Classification verified locally against success,
   plain failure, exit-139, and the cargo-wrapped `signal: 11` form.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-15 13:02:50 +03:00

bench.py

feat(bench): reproducible batch-1 benchmark harness + first fleet numbers (#22 )

2026-06-12 15:39:13 +03:00

ci-cargo-escalate.sh

ci: bound job runtime + stop dropping sccache on rustc signal-death

2026-06-15 13:02:50 +03:00

dump_reference.py

feat(neuron): numerical validation against the transformers reference (#15 )

2026-06-12 23:35:57 +03:00

generate-packages-json.py

ci: add build-prerelease workflow for CUDA RPMs on rpm.lair.cafe