Two CI hygiene fixes uncovered while validating against the live fleet.
1. Same-day prerelease packages were being ordered by RPM-vercmp's
alpha-vs-digit precedence on the git SHA fragment, not by commit
chronology. With release stamps like "0.1.${YYYYMMDD}git${SHA}",
two commits on the same day produce the same numeric prefix and
rpmvercmp falls back to comparing the alphanumeric SHA suffixes,
where digit-leading SHAs are ranked above alpha-leading ones —
completely unrelated to which commit landed first. Verified with
rpmdev-vercmp:
gitabc1234 < gitdef5678 (old scheme — purely lexicographic)
Bumping the timestamp prefix to second-precision (%Y%m%d%H%M%S)
makes the numeric prefix strictly monotonic for any chronologically-
ordered commits, so the SHA fragment becomes a debug identifier
only — never participates in version ordering.
2. ci.yml and build-prerelease.yml both target the `rust` runner label
and both auto-trigger on push to main. The act-based runner reuses
/root/.cache/act/<hash>/hostexecutor/ across concurrent jobs, so
ci.yml's clippy and build-prerelease.yml's build-cortex were racing
each other's checkout/cleanup steps and corrupting in-flight
compile artifacts. Real fix is in gongfoo; workflow-level workaround
is a shared concurrency group with cancel-in-progress=false so the
two workflows queue sequentially on the same ref.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The build-prerelease workflow was workflow_dispatch-only, which meant
every commit needed a manual run dispatch before any host could
upgrade. That left rolling fixes (e.g. f9f5fa4's StateDirectory fix)
sitting on main with no published RPM behind them, so deploy.sh
silently fell back to an older prerelease.
Add 'push: branches: [main]' alongside the existing workflow_dispatch
trigger; the unstable channel now tracks head automatically. The
concurrency group is keyed on ${{ github.ref }} with
cancel-in-progress so successive rapid-fire pushes coalesce to one
build (latest wins) rather than queueing every intermediate commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The act runner container has no sudo binary; the runner user already
runs as root inside the container. Existing steps (rpmbuild, gpg, etc)
already invoke privileged commands directly without sudo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The currently-published runner-cuda-13.0 image (gongfoo) is missing
rust/cargo despite inheriting from runner-rust. Build-neuron fails
immediately with 'cargo: command not found' even though build-cortex
on the bare 'rust' runner builds fine.
Add a defensive `dnf install rust cargo clippy` step at the top of
build-neuron. Idempotent — on a properly-built runner image this is
a fast no-op; on the current broken image it installs the toolchain
in a few seconds. The runner image itself should be rebuilt in
gongfoo so this step becomes redundant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The build-cortex and build-neuron jobs were running a copied-from-
mistralrs rustup install step. Both jobs use runner images that
already provide rust via dnf:
- runner-rust installs rust/cargo/clippy/rustfmt directly.
- runner-cuda-13.0 extends runner-rust.
Running 'rustup update stable' on top would install a parallel
rustup-managed toolchain and shadow the dnf one — confusing and
unnecessary. The existing ci.yml already trusts the dnf toolchain
without any install step, so match that behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ampere (CUDA compute capability sm_86) to both the build-neuron
and package-neuron matrices, so helexa-neuron-ampere RPMs are built
and published alongside helexa-neuron-ada and helexa-neuron-blackwell.
The prerelease spec already lists ampere in its Conflicts: clause, so
no spec change is needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the candle deps were added, cargo builds run long enough that
the parallel fmt/clippy/test jobs (all on the `rust` runner label,
which appears to use act in host-executor mode) start racing each
other's intermediate temp files under
/root/.cache/act/<hash>/hostexecutor/target/debug/deps/
Concretely the test job hit:
error: No such file or directory at path
"target/debug/deps/.tmprlicL7"
Compiling unicode-ident
because another job's cargo invocation cleaned up the temp file
mid-compile. fmt and clippy happened to finish without their own
target races landing fatally, so only test failed visibly.
Set CARGO_TARGET_DIR=target-${{ github.job }} at the workflow level
so each job writes to its own target directory. sccache still backs
the actual rustc cache, so the rebuild penalty is just metadata not
full recompiles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous "Import signing key" step inlined ${{ secrets.RPM_SIGNING_KEY }}
and ${{ secrets.RPM_SIGNING_KEY_ID }} directly into the run: block.
Template expansion writes the literal secret value into the rendered
shell script, and Gitea logs the rendered script — Gitea's masker may
not reliably scrub multi-line keys, so values can leak.
Move both secrets into the step's env: block (the same pattern the
"Set up SSH" step already uses) and reference $VARs in the script.
The script body now contains only variable names; the secret values
live in the process environment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a manually-triggered workflow that builds CUDA-flavoured neuron
binaries and a CPU cortex binary, packages them as Fedora RPMs, signs
them, and rsyncs to the unstable channel at
https://rpm.lair.cafe/fedora/43/x86_64/unstable/. Mirrors the build
pipeline used by grenade/mistralrs-package.
Pipeline:
- prepare: derive {version,short_sha,commit_date} from the checkout;
the prerelease Release stamp "0.1.YYYYMMDDgitSHORTSHA" sorts below
the eventual "1" stable release.
- build-cortex: cargo build --release -p cortex-cli on a rust runner.
- build-neuron: matrix over ada (sm_89) and blackwell (sm_120) on
cuda-13.0 runners; cargo build with features "cuda cudnn flash-attn"
and CUDA_COMPUTE_CAP set per flavour.
- package-{cortex,neuron}: rpmbuild on the rpm runner against the new
prebuilt-binary specs in rpm/.
- publish: import signing key, sign RPMs, rsync to oolon, createrepo_c
--update, then regenerate packages.json for the UI.
New specs are prebuilt-binary variants — they consume the artifact
from the build job rather than running cargo at rpmbuild time. Each
helexa-neuron-{flavour} package Conflicts with the other flavours and
with helexa-neuron (the future source-build stable package) so one
flavour is installed at a time on a given host.
neuron crate gains cudnn and flash-attn feature flags forwarding to
the corresponding candle features, so the CI build command compiles
those kernels into the binary.
sccache is intentionally NOT used in the prerelease jobs — CUDA
compute cap isn't in its cache key, so flavours would mis-hit each
other. Each prerelease build is a clean cargo build.
Required Gitea secrets (already in place for cortex.spec / COPR
workflow):
- RPM_SIGNING_KEY, RPM_SIGNING_KEY_ID
- RSYNC_SSH_KEY
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cache round-trip (download + unpack) was consistently taking
around 6 minutes, noticeably longer than the ~3 minute cold build
it was meant to accelerate. Net-negative on CI time — remove it.
sccache with the S3 backend still provides dep-level caching at a
much lower overhead, so we keep the majority of the cache benefit
without paying the actions/cache tarball cost.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidates the previous helexa/cortex and helexa/helexa-neuron COPR
projects into one shared project. Hosts enable a single repo and get
access to both packages — cortex for gateway hosts and helexa-neuron
for GPU nodes. Reduces the "which copr do I enable on this host"
friction, and makes it clear the two packages are parts of the same
helexa project suite.
CI keeps two independent publish jobs (copr-cortex and copr-neuron)
running in parallel; they now both target helexa/helexa with their
respective SRPMs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fedora's official repos ship a package named `neuron` — the NEURON
neural-simulation environment from Yale (see
https://src.fedoraproject.org/rpms/neuron). Having our own `neuron`
in the helexa COPR caused dnf5 to silently no-op `dnf install neuron`
because of the name collision, even with the COPR repo enabled and
keys imported. The only workarounds were full NEVRA (`dnf install
neuron-0.1.12-1.fc43.x86_64`) or a local file install — neither
acceptable for end-users.
Rename the RPM package to `helexa-neuron`. Keep binary (/usr/bin/neuron),
systemd unit (neuron.service), system user (neuron), and config dir
(/etc/neuron) unchanged — those are project-local contexts where the
short name is unambiguous. Follows Fedora subpackage-style naming
except with a vendor prefix rather than a parent-package prefix,
because neuron is an independent package from cortex (installed on
different hosts) and neither depends on the other.
Changes:
- neuron.spec -> helexa-neuron.spec (git rename)
- Name: neuron -> helexa-neuron (with comment explaining why)
- CI: srpm-neuron job now builds helexa-neuron-VERSION.tar.gz with the
matching top-level dir prefix, publishes to helexa/helexa-neuron COPR
- CI: bump-version job references helexa-neuron.spec
- CLAUDE.md: install instructions updated
Old helexa/neuron COPR project can be deleted after the first
helexa/helexa-neuron build lands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the srpm-* jobs generated a fresh %changelog entry and
shipped it to COPR, but the version-stamped spec pushed back to main
by the bump-version job only updated the Version: line — not the
%changelog section. The result: SRPM and in-tree spec diverged and
a fresh clone of the repo showed a perpetually empty changelog.
Run the rpm-changelog action in bump-version too. Now the committed
specs track the SRPMs: each release leaves a dated %changelog entry
in main covering commits since the previous tag, visible in git log
and in the repo's spec browser.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the local .gitea/scripts/generate-rpm-changelog.sh with the
shared composite action at https://git.lair.cafe/actions/rpm-changelog@v1.
Behaviour is identical — collect commits since the previous v* tag,
filter bump-version and merge noise, prepend a dated entry to the
spec — but the logic now lives in one place that other projects can
consume.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On every tag push, build a %changelog entry from the git log since
the previous v* tag and prepend it to each spec. Stops the initial
entry from drifting further and catches bogus-date / stale-version
warnings automatically since the generated date always matches the
day the CI runs.
The generator drops "chore: bump version" commits (bot-authored,
noisy in user-facing changelogs) and merge commits. Author defaults
to the gitea-actions identity but can be overridden via
CHANGELOG_AUTHOR env var if a human release is desired.
Requires fetch-depth: 0 on checkout so git describe can see prior
tags and git log can reach them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the in-repo .gitea/scripts/copr-build.sh and per-job
copr-cli configuration with the shared composite action at
https://git.lair.cafe/actions/copr-publish@v1. Behaviour is
identical — submit, watch, dump per-chroot logs — but the logic
now lives in a single place that other projects can consume.
Removes the actions/checkout step from both COPR jobs since the
build script is no longer local to this repo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the COPR publish steps only surfaced copr-cli's status
updates (pending/importing/running). When a build failed, diagnosing
required clicking through to the COPR web UI. Now we submit with
--nowait, watch the build, then use copr-cli download-build to fetch
each chroot's builder-live.log and cat them as collapsible ::group::
blocks in the CI output.
Logic is factored into .gitea/scripts/copr-build.sh so cortex and
neuron jobs share it. Both COPR jobs now check out the repo to access
the script.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three complementary tweaks to close the gap sccache alone can't:
- CARGO_INCREMENTAL=0: reclaims the 17 incremental-mode cache misses
per run and prevents cargo from writing incremental fingerprints
that defeat sccache. Incremental mode is useless in CI anyway since
each run starts from scratch.
- actions/cache for ~/.cargo and target/: sidesteps sccache's
structural limits (proc-macro non-cacheables, clippy-vs-rustc
separate namespaces) by caching the whole build output keyed on
Cargo.lock. Also caches ~/.cargo/bin so the installed sccache
binary survives between runs.
- Drop the separate 'cargo build' step: 'cargo test --workspace'
builds everything anyway, so the standalone build was a full
redundant workspace compile pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The workflow-level env set RUSTC_WRAPPER=sccache for every step,
including the install step itself. cargo install sccache then
tried to invoke `sccache rustc -vV` to detect the toolchain before
sccache existed on PATH, failing with "No such file or directory".
Override RUSTC_WRAPPER to empty on the install step so cargo uses
rustc directly; subsequent steps still inherit the wrapper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The distro sccache package lacks S3 support. Install from cargo
with --features s3 if the existing binary can't connect to the
S3 backend. Skips install if already present and working.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All Rust compilation steps now use sccache backed by MinIO S3
at caveman.kosherinata.internal:9000. Credentials via repo secrets
SCCACHE_S3_ACCESS_KEY and SCCACHE_S3_SECRET_KEY. Cache is shared
across all bare metal runners.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Token is only needed for the authenticated push, not the public
checkout. Set remote URL with token inline before pushing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cortex.spec: gateway binary, cortex.service systemd unit,
cortex.toml + models.toml config files
- neuron.spec: neuron binary, neuron.service systemd unit,
neuron.toml config file
- Parallel CI: srpm-cortex and srpm-neuron jobs build SRPMs
concurrently, then publish to separate COPR repos
(helexa/cortex and helexa/neuron)
- bump-version job: after both COPR publishes succeed, stamps
tag version into Cargo.toml, specs, Cargo.lock and pushes
to main via GITEA_TOKEN
- Shared cortex user/group across both packages
- Example configs: cortex.example.toml, neuron.example.toml,
models.example.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cortex.spec: gateway binary, cortex.service systemd unit,
cortex.toml + models.toml config files
- neuron.spec: neuron binary, neuron.service systemd unit,
neuron.toml config file
- Parallel CI: srpm-cortex and srpm-neuron jobs build SRPMs
concurrently, then publish to separate COPR repos
(helexa/cortex and helexa/neuron)
- Shared cortex user/group across both packages
- Example configs: cortex.example.toml, neuron.example.toml,
models.example.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add .gitea/workflows/ci.yml with fmt/clippy/test on all branches
and SRPM build + COPR publish on version tags
- Add cortex.spec for Fedora RPM packaging
- Add GPL-3.0-or-later LICENSE file
- Add cortex.example.toml with generic hostnames; gitignore cortex.toml
- Scrub infrastructure-specific hostnames from README.md, CLAUDE.md,
and doc comments
- Fix unused imports and clippy warnings to pass -D warnings
- Fix missing deps (bytes, reqwest, serde_json) exposed during build
- Run cargo fmt across workspace
- Update SPDX license identifier to GPL-3.0-or-later
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>