Adds automated, longitudinal performance tracking across neuron builds,
replacing manual script/bench.py runs and hand edits to benchmarks.md.
neuron build metadata + GET /version:
- cortex-core: shared BuildInfo type (build_info.rs).
- neuron build.rs captures git SHA (preferring injected HELEXA_BUILD_SHA,
else git, else "unknown"), dirty flag, build timestamp, rustc version,
profile, target, enabled cargo features, and best-effort candle-core
version from Cargo.lock.
- New GET /version endpoint (version.rs) + clap --version long form.
- SHA injected in CI (build-neuron step) and helexa-neuron.spec
(%{?helexa_commit}) so tarball RPMs report the real SHA. /version is
now the canonical "which build is live" probe.
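
The SHA precedence described above (injected HELEXA_BUILD_SHA, else git, else "unknown") can be sketched as follows. This is an illustration in Python, not the actual build.rs code; the function name `resolve_build_sha`, its `env` and `run_git` parameters, and the choice to treat an empty injected value as unset are all assumptions made for the example.

```python
import os
import subprocess


def resolve_build_sha(env=None, run_git=None):
    """Return the build SHA: injected value, else git, else "unknown".

    `env` and `run_git` are hypothetical injection points so the
    precedence can be exercised without a real repository.
    """
    env = os.environ if env is None else env

    # 1. An injected SHA (CI step or RPM spec macro) always wins.
    injected = env.get("HELEXA_BUILD_SHA", "").strip()
    if injected:
        return injected

    # 2. Otherwise ask git for the checked-out commit.
    if run_git is None:
        def run_git():
            return subprocess.run(
                ["git", "rev-parse", "HEAD"],
                capture_output=True, text=True, check=True,
            ).stdout

    try:
        sha = run_git().strip()
    except (OSError, subprocess.CalledProcessError):
        # git missing, or not inside a work tree (e.g. a tarball build).
        sha = ""

    # 3. Last resort: a fixed placeholder.
    return sha or "unknown"
```

Injecting the SHA matters most for tarball builds, where there is no `.git` directory and the sketch would otherwise fall through to "unknown".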
helexa-bench crate:
- Continuous daemon: hits each neuron directly on :13131, exercises each
warm (status==loaded) model, records every run into a SQLite
system-of-record stamped with the neuron's full BuildInfo.
- Version-aware: skips any (target, build SHA, model, scenario) cell
already at samples_per_version, so a steady fleet costs only cheap
/version + /models polls until a new SHA ships.
- Extensible Scenario trait; phase-1 chat-latency family ported verbatim
from bench.py (synthetic 128/4096-tok prompts, /no_think, streamed
TTFT + decode-window tok/s). `report` regenerates the benchmarks table.
- kind="openai" comparison targets scaffolded, not yet wired.
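
The version-aware skip can be pictured as a count per (target, build SHA, model, scenario) cell in the SQLite store. The sketch below is a guess at the shape of that logic, not the crate's real schema: the `runs` table, its column names, and the helpers `open_store`, `cell_is_full`, and `record_run` are hypothetical; only `samples_per_version` comes from the description above.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with a hypothetical one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               target       TEXT NOT NULL,
               build_sha    TEXT NOT NULL,
               model        TEXT NOT NULL,
               scenario     TEXT NOT NULL,
               ttft_ms      REAL NOT NULL,
               decode_tok_s REAL NOT NULL
           )"""
    )
    return conn


def cell_is_full(conn, cell, samples_per_version):
    """True once the cell already holds samples_per_version runs."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM runs "
        "WHERE target = ? AND build_sha = ? AND model = ? AND scenario = ?",
        cell,
    ).fetchone()
    return count >= samples_per_version


def record_run(conn, cell, ttft_ms, decode_tok_s):
    """Append one measured run to the cell."""
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?)",
        (*cell, ttft_ms, decode_tok_s),
    )
    conn.commit()
```

With this shape, a sweep over a steady fleet finds every cell full and does no inference work; a new build SHA produces an empty cell, so sampling resumes for that build only.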
Packaging: data/helexa-bench.service (+ sysusers), prebuilt-binary RPM
spec (outbound-only, no firewalld), and build/package/publish wiring in
build-prerelease.yml with change detection.
Tests: cortex-core BuildInfo round-trip, neuron GET /version integration,
helexa-bench unit (prompt/SSE/config/store) + end-to-end sweep
(record -> skip -> resume on new SHA). Docs updated (benchmarks.md,
CLAUDE.md addendum).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
727 lines
28 KiB
YAML
name: build-prerelease

# Builds CUDA-flavoured neuron binaries (and a single cortex binary),
# packages each as a Fedora RPM, signs them, and publishes to the
# `unstable` channel at rpm.lair.cafe.
#
# Change-aware: the `prepare` job diffs HEAD against the git sha
# embedded in the most recently *published* unstable RPM (per package)
# and skips builds whose inputs didn't change. Docs-only commits build
# nothing; gateway-only commits skip the 3 CUDA builds (and, via
# deploy.yml's own check-update gate, the neuron restarts + model
# cold-loads). Diffing against the published sha — not the previous
# push — means a failed run can never cause a change to be missed.
#
# Lint (fmt+clippy) and test run here as parallel jobs and gate
# `publish`; ci.yml no longer runs on pushes to main (see its trigger
# comment), so the two workflows stop competing for the same runners.
#
# The published packages are versioned as e.g.
#   helexa-neuron-blackwell-0.1.16-0.1.20260518140530.gitabcdef0.fc43.x86_64
#                                      ^^^^^^^^^^^^^^    ^^^^^^^
#                                      commit time (s)   commit sha
# so they sort BELOW the eventual 0.1.16-1 stable release, and so two
# commits on the same day are still strictly ordered by their commit
# timestamps (rather than by RPM-vercmp's alpha-vs-digit precedence
# on the SHA fragment).

on:
  # Auto-build on every push to main so the unstable channel tracks
  # head without a manual dispatch step.
  push:
    branches: [main]
  # Manual dispatch still available to build from a non-main ref.
  # Dispatched runs skip change detection and build everything.
  workflow_dispatch:
    inputs:
      ref:
        description: "Git ref to build (branch / tag / commit). Defaults to the workflow's branch."
        required: false
        default: ""

# Coalesce same-ref pushes: a newer push cancels the older in-flight
# run — the newest commit is the one we want on the fleet. The publish
# job keeps its own `rpm-publish` group (cancel=false) so an in-flight
# repo update is never interrupted. Runners are ephemeral (one VM per
# job) so concurrent runs no longer race on a shared workspace; the
# old shared `cortex-runner-pool` group with ci.yml is gone.
concurrency:
  group: build-prerelease-${{ github.ref }}
  cancel-in-progress: true

env:
  CARGO_INCREMENTAL: "0"
  CARGO_TERM_COLOR: "always"

jobs:
  prepare:
    name: Resolve version stamps + change detection
    runs-on: rust
    outputs:
      version: ${{ steps.info.outputs.version }}
      release: ${{ steps.info.outputs.release }}
      short_sha: ${{ steps.info.outputs.short_sha }}
      commit_timestamp: ${{ steps.info.outputs.commit_timestamp }}
      build_cortex: ${{ steps.changes.outputs.build_cortex }}
      build_neuron: ${{ steps.changes.outputs.build_neuron }}
      build_bench: ${{ steps.changes.outputs.build_bench }}
      check_rust: ${{ steps.changes.outputs.check_rust }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
          fetch-depth: 0

      - id: info
        run: |
          set -eux
          VERSION=$(awk -F\" '/^version[[:space:]]*=/ { print $2; exit }' Cargo.toml)
          SHORT_SHA=$(git rev-parse --short=7 HEAD)
          # Second-precise commit timestamp gives the release stamp a
          # strictly monotonic numeric prefix. The earlier %Y%m%d-only
          # form let same-day builds be ordered by RPM's rpmvercmp
          # rules over the SHA, which is non-chronological — e.g.
          # "git602e8e1" sorts newer than "gitf9f5fa4" purely because
          # rpmvercmp ranks digit-prefixed segments above alpha ones.
          # The SHA stays only as a debug identifier; sort order is
          # decided entirely by the timestamp.
          COMMIT_TIMESTAMP=$(git log -1 --format=%cd --date=format:%Y%m%d%H%M%S HEAD)
          RELEASE="0.1.${COMMIT_TIMESTAMP}.git${SHORT_SHA}"
          echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
          echo "release=${RELEASE}" >> "$GITHUB_OUTPUT"
          echo "short_sha=${SHORT_SHA}" >> "$GITHUB_OUTPUT"
          echo "commit_timestamp=${COMMIT_TIMESTAMP}" >> "$GITHUB_OUTPUT"

      - id: changes
        run: |
          set -ux
          # Default: build everything. Detection only ever narrows
          # this, and any failure along the way (manifest unreachable,
          # unparsable, sha not in history after a force-push) leaves
          # the full build in place. Manual dispatches always build
          # everything — predictable when building odd refs.
          BUILD_CORTEX=true
          BUILD_NEURON=true
          BUILD_BENCH=true
          CHECK_RUST=true

          if [ "${GITHUB_EVENT_NAME}" = "push" ]; then
            MANIFEST_URL="https://rpm.lair.cafe/fedora/43/x86_64/unstable/packages.json"
            if curl -fsS --max-time 20 -o /tmp/packages.json "$MANIFEST_URL"; then
              # Latest published sha per package, by buildTime.
              base_for() {
                python3 - "$1" <<'PY'
          import json, re, sys
          name = sys.argv[1]
          try:
              with open("/tmp/packages.json") as f:
                  pkgs = json.load(f)["packages"]
              cands = [p for p in pkgs if p.get("name") == name]
              if cands:
                  latest = max(cands, key=lambda p: p.get("buildTime", 0))
                  m = re.search(r"git\.?([0-9a-f]{7,40})", latest.get("release", ""))
                  if m:
                      print(m.group(1))
          except Exception:
              pass
          PY
              }

              # true if no usable base, else true iff the diff since
              # the published sha touches the given path pattern.
              decide() {
                local base="$1" pattern="$2"
                if [ -z "$base" ] \
                  || ! git cat-file -e "${base}^{commit}" 2>/dev/null \
                  || ! git merge-base --is-ancestor "$base" HEAD 2>/dev/null; then
                  echo true; return
                fi
                if git diff --name-only "${base}..HEAD" | grep -qE "$pattern"; then
                  echo true
                else
                  echo false
                fi
              }

              # cortex-core is shared by both binaries; Cargo.{toml,lock}
              # affect both; this workflow file affects both.
              NEURON_RE='^crates/neuron/|^crates/cortex-core/|^Cargo\.toml$|^Cargo\.lock$|^rpm/helexa-neuron-prerelease\.spec$|^data/neuron|^neuron\.example\.toml$|^\.gitea/workflows/build-prerelease\.yml$'
              CORTEX_RE='^crates/cortex-gateway/|^crates/cortex-cli/|^crates/cortex-core/|^Cargo\.toml$|^Cargo\.lock$|^rpm/cortex-prerelease\.spec$|^data/cortex|^cortex\.example\.toml$|^models\.example\.toml$|^\.gitea/workflows/build-prerelease\.yml$'
              BENCH_RE='^crates/helexa-bench/|^crates/cortex-core/|^Cargo\.toml$|^Cargo\.lock$|^rpm/helexa-bench-prerelease\.spec$|^data/helexa-bench|^helexa-bench\.example\.toml$|^\.gitea/workflows/build-prerelease\.yml$'
              # Any Rust change (incl. crates not packaged here, e.g.
              # helexa-acp) still needs lint+test on main.
              RUST_RE='\.rs$|^crates/|Cargo\.toml$|^Cargo\.lock$'

              CORTEX_BASE=$(base_for cortex)
              NEURON_BASE=$(base_for helexa-neuron-blackwell)
              BENCH_BASE=$(base_for helexa-bench)
              BUILD_CORTEX=$(decide "$CORTEX_BASE" "$CORTEX_RE")
              BUILD_NEURON=$(decide "$NEURON_BASE" "$NEURON_RE")
              BUILD_BENCH=$(decide "$BENCH_BASE" "$BENCH_RE")
              if [ "$BUILD_CORTEX" = "true" ] || [ "$BUILD_NEURON" = "true" ] || [ "$BUILD_BENCH" = "true" ]; then
                CHECK_RUST=true
              else
                CHECK_RUST=$(decide "$CORTEX_BASE" "$RUST_RE")
              fi
            fi
          fi

          echo "build_cortex=${BUILD_CORTEX}" >> "$GITHUB_OUTPUT"
          echo "build_neuron=${BUILD_NEURON}" >> "$GITHUB_OUTPUT"
          echo "build_bench=${BUILD_BENCH}" >> "$GITHUB_OUTPUT"
          echo "check_rust=${CHECK_RUST}" >> "$GITHUB_OUTPUT"
          echo "### change detection: build_cortex=${BUILD_CORTEX} build_neuron=${BUILD_NEURON} build_bench=${BUILD_BENCH} check_rust=${CHECK_RUST}"

  # fmt + clippy + test moved here from ci.yml for main pushes so the
  # two workflows stop queueing against each other (ci.yml's checks
  # used to delay build-cortex by ~12 minutes on the shared runner
  # pool). They run in parallel with the builds and gate `publish`,
  # not the builds themselves — a clippy warning still can't reach the
  # fleet, but it also doesn't serialize the pipeline.
  lint:
    name: Lint (fmt + clippy)
    needs: prepare
    if: needs.prepare.outputs.check_rust == 'true'
    runs-on: rust
    env:
      RUSTC_WRAPPER: sccache
      SCCACHE_BUCKET: sccache
      SCCACHE_ENDPOINT: http://caveman.kosherinata.internal:9000
      SCCACHE_REGION: auto
      SCCACHE_S3_USE_SSL: "false"
      AWS_ACCESS_KEY_ID: ${{ secrets.SCCACHE_S3_ACCESS_KEY }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.SCCACHE_S3_SECRET_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
      - run: cargo fmt --check --all
      # sccache failures come in two modes: transient races (a plain
      # retry clears them) and a wedged/dead server, where every
      # same-VM retry fails identically (sccache fatal error, ENOENT
      # on its own tmp files). Escalate accordingly: retry → restart
      # the server → final attempt uncached. A sick cache costs build
      # time, never the run.
      - name: Clippy (with sccache escalation)
        run: |
          for attempt in 1 2 3; do
            echo "::group::clippy attempt ${attempt}"
            if [ "${attempt}" -eq 3 ]; then
              echo "final attempt: building without sccache"
              export RUSTC_WRAPPER=""
            fi
            if cargo clippy --workspace -- -D warnings; then
              echo "::endgroup::"
              exit 0
            fi
            echo "::endgroup::"
            echo "clippy failed on attempt ${attempt}"
            if [ "${attempt}" -eq 1 ]; then
              sccache --stop-server || true
              sccache --start-server || true
            fi
            sleep 5
          done
          echo "clippy failed after 3 attempts"
          exit 1
      - run: sccache --show-stats || true

  test:
    name: Test
    needs: prepare
    if: needs.prepare.outputs.check_rust == 'true'
    runs-on: rust
    env:
      RUSTC_WRAPPER: sccache
      SCCACHE_BUCKET: sccache
      SCCACHE_ENDPOINT: http://caveman.kosherinata.internal:9000
      SCCACHE_REGION: auto
      SCCACHE_S3_USE_SSL: "false"
      AWS_ACCESS_KEY_ID: ${{ secrets.SCCACHE_S3_ACCESS_KEY }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.SCCACHE_S3_SECRET_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
      # See the lint job for the escalation rationale.
      - name: Test (with sccache escalation)
        run: |
          for attempt in 1 2 3; do
            echo "::group::test attempt ${attempt}"
            if [ "${attempt}" -eq 3 ]; then
              echo "final attempt: building without sccache"
              export RUSTC_WRAPPER=""
            fi
            if cargo test --workspace; then
              echo "::endgroup::"
              exit 0
            fi
            echo "::endgroup::"
            echo "test failed on attempt ${attempt}"
            if [ "${attempt}" -eq 1 ]; then
              sccache --stop-server || true
              sccache --start-server || true
            fi
            sleep 5
          done
          echo "test failed after 3 attempts"
          exit 1
      - run: sccache --show-stats || true

  build-cortex:
    name: Build cortex binary
    needs: prepare
    if: needs.prepare.outputs.build_cortex == 'true'
    # runner-rust image already provides rust/cargo/clippy/rustfmt via
    # dnf — no rustup install step needed.
    runs-on: rust
    env:
      RUSTC_WRAPPER: sccache
      SCCACHE_BUCKET: sccache
      SCCACHE_ENDPOINT: http://caveman.kosherinata.internal:9000
      SCCACHE_REGION: auto
      SCCACHE_S3_USE_SSL: "false"
      AWS_ACCESS_KEY_ID: ${{ secrets.SCCACHE_S3_ACCESS_KEY }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.SCCACHE_S3_SECRET_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}

      # Escalation mirrors the lint/test jobs: retry → restart the
      # sccache server → final attempt uncached. A sick cache costs
      # build time, never the run.
      - name: Build cortex (release, with sccache escalation)
        run: |
          for attempt in 1 2 3; do
            echo "::group::build attempt ${attempt}"
            if [ "${attempt}" -eq 3 ]; then
              echo "final attempt: building without sccache"
              export RUSTC_WRAPPER=""
            fi
            if cargo build --release -p cortex-cli; then
              echo "::endgroup::"
              sccache --show-stats || true
              exit 0
            fi
            echo "::endgroup::"
            echo "build failed on attempt ${attempt}"
            if [ "${attempt}" -eq 1 ]; then
              sccache --stop-server || true
              sccache --start-server || true
            fi
            sleep 5
          done
          echo "build failed after 3 attempts"
          exit 1

      - name: Stage binary
        run: |
          mkdir --parents artifacts
          cp target/release/cortex artifacts/cortex
          ./artifacts/cortex --version || true

      - uses: actions/upload-artifact@v3
        with:
          name: cortex-fc43
          path: artifacts/cortex
          retention-days: 1

  build-bench:
    name: Build helexa-bench binary
    needs: prepare
    if: needs.prepare.outputs.build_bench == 'true'
    # Pure-Rust, non-CUDA binary — same runner as cortex.
    runs-on: rust
    env:
      RUSTC_WRAPPER: sccache
      SCCACHE_BUCKET: sccache
      SCCACHE_ENDPOINT: http://caveman.kosherinata.internal:9000
      SCCACHE_REGION: auto
      SCCACHE_S3_USE_SSL: "false"
      AWS_ACCESS_KEY_ID: ${{ secrets.SCCACHE_S3_ACCESS_KEY }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.SCCACHE_S3_SECRET_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}

      - name: Build helexa-bench (release, with sccache escalation)
        run: |
          # Stamp the SHA helexa-bench records as bench_sha against every
          # run (option_env! in sweep.rs reads it at compile time).
          export HELEXA_BUILD_SHA="$(git rev-parse HEAD)"
          for attempt in 1 2 3; do
            echo "::group::build attempt ${attempt}"
            if [ "${attempt}" -eq 3 ]; then
              echo "final attempt: building without sccache"
              export RUSTC_WRAPPER=""
            fi
            if cargo build --release -p helexa-bench; then
              echo "::endgroup::"
              sccache --show-stats || true
              exit 0
            fi
            echo "::endgroup::"
            echo "build failed on attempt ${attempt}"
            if [ "${attempt}" -eq 1 ]; then
              sccache --stop-server || true
              sccache --start-server || true
            fi
            sleep 5
          done
          echo "build failed after 3 attempts"
          exit 1

      - name: Stage binary
        run: |
          mkdir --parents artifacts
          cp target/release/helexa-bench artifacts/helexa-bench
          ./artifacts/helexa-bench --version || true

      - uses: actions/upload-artifact@v3
        with:
          name: bench-fc43
          path: artifacts/helexa-bench
          retention-days: 1

  build-neuron:
    name: Build neuron-${{ matrix.flavour }}
    needs: prepare
    if: needs.prepare.outputs.build_neuron == 'true'
    strategy:
      fail-fast: false
      matrix:
        include:
          - flavour: ampere
            compute_cap: "86"
            runner: cuda-13.0
            cuda_home: /usr/local/cuda-13.0
            build_jobs: 8
            nvcc_threads: 4
            cargo_features: "cuda cudnn"
          - flavour: ada
            compute_cap: "89"
            runner: cuda-13.0
            cuda_home: /usr/local/cuda-13.0
            build_jobs: 8
            nvcc_threads: 4
            cargo_features: "cuda cudnn"
          - flavour: blackwell
            compute_cap: "120"
            runner: cuda-13.0
            cuda_home: /usr/local/cuda-13.0
            build_jobs: 8
            nvcc_threads: 4
            cargo_features: "cuda cudnn"
    runs-on: ${{ matrix.runner }}
    env:
      SCCACHE_BUCKET: sccache
      SCCACHE_ENDPOINT: http://caveman.kosherinata.internal:9000
      SCCACHE_REGION: auto
      SCCACHE_S3_USE_SSL: "false"
      AWS_ACCESS_KEY_ID: ${{ secrets.SCCACHE_S3_ACCESS_KEY }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.SCCACHE_S3_SECRET_KEY }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}

      # Escalation mirrors the lint/test jobs: retry → restart the
      # sccache server → final attempt uncached.
      #
      # The CUDA image may or may not ship sccache — probe inside this
      # step (NOT via GITHUB_ENV from a prior step, which this runner
      # does not propagate; observed: probe step said "enabled", build
      # ran unwrapped, server stats showed 4 compile requests). A
      # missing binary degrades to an uncached build rather than
      # failing cargo at `sccache rustc -vV`. The cache covers the
      # ~600-crate host-side dep tree (the bulk of the 10-14 min
      # build); rustc compilations are shared across all three
      # flavours, so even one run seeds the next.
      - name: Build neuron with CUDA (${{ matrix.flavour }})
        run: |
          set -ux
          if command -v sccache >/dev/null 2>&1; then
            export RUSTC_WRAPPER=sccache
            sccache --start-server 2>/dev/null || true
            echo "sccache enabled"
          else
            echo "sccache not on PATH — building uncached"
          fi
          export PATH="${{ matrix.cuda_home }}/bin:${PATH}"
          export LD_LIBRARY_PATH="${{ matrix.cuda_home }}/targets/x86_64-linux/lib:${{ matrix.cuda_home }}/lib64:${LD_LIBRARY_PATH:-}"
          export LIBRARY_PATH="${{ matrix.cuda_home }}/targets/x86_64-linux/lib:${{ matrix.cuda_home }}/lib64:${LIBRARY_PATH:-}"
          # Pin the build SHA neuron reports from GET /version. The git
          # fallback in build.rs would also work on a full checkout, but
          # injecting the exact checked-out commit is unambiguous under
          # shallow/detached states and makes the artifact self-describing.
          export HELEXA_BUILD_SHA="$(git rev-parse HEAD)"
          for attempt in 1 2 3; do
            echo "::group::build attempt ${attempt}"
            if [ "${attempt}" -eq 3 ]; then
              echo "final attempt: building without sccache"
              export RUSTC_WRAPPER=""
            fi
            if cargo build --release -p neuron --features "${{ matrix.cargo_features }}"; then
              echo "::endgroup::"
              command -v sccache >/dev/null 2>&1 && sccache --show-stats || true
              exit 0
            fi
            echo "::endgroup::"
            echo "build failed on attempt ${attempt}"
            if [ "${attempt}" -eq 1 ] && command -v sccache >/dev/null 2>&1; then
              sccache --stop-server || true
              sccache --start-server || true
            fi
            sleep 5
          done
          echo "build failed after 3 attempts"
          exit 1
        env:
          CUDA_COMPUTE_CAP: ${{ matrix.compute_cap }}
          CARGO_BUILD_JOBS: ${{ matrix.build_jobs }}
          NVCC_THREADS: ${{ matrix.nvcc_threads }}

      - name: Stage binary
        run: |
          mkdir --parents artifacts
          cp target/release/neuron artifacts/neuron-${{ matrix.flavour }}
          file "artifacts/neuron-${{ matrix.flavour }}"

      - uses: actions/upload-artifact@v3
        with:
          name: neuron-${{ matrix.flavour }}-fc43
          path: artifacts/neuron-${{ matrix.flavour }}
          retention-days: 1

  package-cortex:
    name: Package cortex RPM
    needs: [prepare, build-cortex]
    runs-on: rpm
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}

      - uses: actions/download-artifact@v3
        with:
          name: cortex-fc43
          path: artifacts/

      - name: Build RPM
        run: |
          set -eux
          rm -f ~/.rpmmacros
          rpmdev-setuptree
          cp artifacts/cortex ~/rpmbuild/SOURCES/
          cp data/cortex.service ~/rpmbuild/SOURCES/
          cp data/cortex-sysusers.conf ~/rpmbuild/SOURCES/
          cp data/cortex-firewalld.xml ~/rpmbuild/SOURCES/
          cp cortex.example.toml ~/rpmbuild/SOURCES/
          cp models.example.toml ~/rpmbuild/SOURCES/
          cp LICENSE ~/rpmbuild/SOURCES/
          rpmbuild -bb rpm/cortex-prerelease.spec \
            --define "cortex_version ${{ needs.prepare.outputs.version }}" \
            --define "cortex_prerelease ${{ needs.prepare.outputs.release }}" \
            --undefine dist \
            --define "dist .fc43"

      - uses: actions/upload-artifact@v3
        with:
          name: rpm-cortex-fc43
          path: ~/rpmbuild/RPMS/x86_64/*.rpm
          retention-days: 7

  package-bench:
    name: Package helexa-bench RPM
    needs: [prepare, build-bench]
    runs-on: rpm
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}

      - uses: actions/download-artifact@v3
        with:
          name: bench-fc43
          path: artifacts/

      - name: Build RPM
        run: |
          set -eux
          rm -f ~/.rpmmacros
          rpmdev-setuptree
          cp artifacts/helexa-bench ~/rpmbuild/SOURCES/
          cp data/helexa-bench.service ~/rpmbuild/SOURCES/
          cp data/helexa-bench-sysusers.conf ~/rpmbuild/SOURCES/
          cp helexa-bench.example.toml ~/rpmbuild/SOURCES/
          cp LICENSE ~/rpmbuild/SOURCES/
          rpmbuild -bb rpm/helexa-bench-prerelease.spec \
            --define "bench_version ${{ needs.prepare.outputs.version }}" \
            --define "bench_prerelease ${{ needs.prepare.outputs.release }}" \
            --undefine dist \
            --define "dist .fc43"

      - uses: actions/upload-artifact@v3
        with:
          name: rpm-bench-fc43
          path: ~/rpmbuild/RPMS/x86_64/*.rpm
          retention-days: 7

  package-neuron:
    name: Package helexa-neuron-${{ matrix.flavour }} RPM
    needs: [prepare, build-neuron]
    runs-on: rpm
    strategy:
      fail-fast: false
      matrix:
        include:
          - flavour: ampere
          - flavour: ada
          - flavour: blackwell
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}

      - uses: actions/download-artifact@v3
        with:
          name: neuron-${{ matrix.flavour }}-fc43
          path: artifacts/

      - name: Build RPM
        run: |
          set -eux
          rm -f ~/.rpmmacros
          rpmdev-setuptree
          cp artifacts/neuron-${{ matrix.flavour }} ~/rpmbuild/SOURCES/
          cp data/neuron.service ~/rpmbuild/SOURCES/
          cp data/neuron-sysusers.conf ~/rpmbuild/SOURCES/
          cp data/neuron-firewalld.xml ~/rpmbuild/SOURCES/
          cp neuron.example.toml ~/rpmbuild/SOURCES/
          cp LICENSE ~/rpmbuild/SOURCES/
          rpmbuild -bb rpm/helexa-neuron-prerelease.spec \
            --define "neuron_version ${{ needs.prepare.outputs.version }}" \
            --define "neuron_flavour ${{ matrix.flavour }}" \
            --define "neuron_prerelease ${{ needs.prepare.outputs.release }}" \
            --undefine dist \
            --define "dist .fc43"

      - uses: actions/upload-artifact@v3
        with:
          name: rpm-neuron-${{ matrix.flavour }}-fc43
          path: ~/rpmbuild/RPMS/x86_64/*.rpm
          retention-days: 7

  publish:
    name: Publish to rpm.lair.cafe (unstable)
    needs: [lint, test, package-cortex, package-neuron, package-bench]
    # Runs when at least one package was built and nothing failed.
    # lint/test may be skipped (docs-only refs never get here because
    # no packages build), but a real failure in any blocks the
    # fleet from receiving the RPMs.
    if: >-
      ${{
        !cancelled()
        && (needs.lint.result == 'success' || needs.lint.result == 'skipped')
        && (needs.test.result == 'success' || needs.test.result == 'skipped')
        && (needs.package-cortex.result == 'success' || needs.package-neuron.result == 'success' || needs.package-bench.result == 'success')
        && needs.package-cortex.result != 'failure'
        && needs.package-neuron.result != 'failure'
        && needs.package-bench.result != 'failure'
      }}
    runs-on: rpm
    concurrency:
      group: rpm-publish
      cancel-in-progress: false
    env:
      RPM_REPO_HOST: oolon.kosherinata.internal
      FEDORA_VERSION: "43"
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}

      - name: Download all built RPMs
        uses: actions/download-artifact@v3
        with:
          path: rpms/
          pattern: rpm-*-fc43

      - name: Flatten RPM artifacts
        run: |
          set -eux
          find rpms/ -name '*.rpm' -exec mv --target-directory=rpms/ {} +
          find rpms/ -mindepth 1 -type d -empty -delete
          ls -la rpms/

      - name: Check for sequoia-sq
        run: |
          if ! command -v sq &> /dev/null; then
            echo "ERROR: sequoia-sq is not installed. Install with: sudo dnf install sequoia-sq"
            exit 1
          fi

      - name: Import signing key
        env:
          # Pass secrets via env so values stay out of the rendered shell
          # script (which Gitea includes in step logs). Template
          # expansion of ${{ secrets.X }} inside `run:` writes the literal
          # value into the script and depends on Gitea's log masker to
          # scrub it — fragile for multi-line keys.
          RPM_SIGNING_KEY: ${{ secrets.RPM_SIGNING_KEY }}
          RPM_SIGNING_KEY_ID: ${{ secrets.RPM_SIGNING_KEY_ID }}
        run: |
          echo "$RPM_SIGNING_KEY" | gpg --batch --import
          fpr=$(gpg --batch --with-colons --list-keys "$RPM_SIGNING_KEY_ID" | awk -F: '/^fpr:/ { print $10; exit }')
          echo "${fpr}:6:" | gpg --batch --import-ownertrust
          sed "s/@GPG_NAME@/$RPM_SIGNING_KEY_ID/" rpm/rpmmacros > ~/.rpmmacros

      - name: Sign RPMs
        run: |
          set -eux
          for rpm in rpms/*.rpm; do
            echo "signing ${rpm}..."
            rpm --addsign "${rpm}"
          done

      - name: Set up SSH for rsync
        run: |
          install --directory --mode 700 ~/.ssh
          echo "${RSYNC_SSH_KEY}" | install --mode 600 /dev/stdin ~/.ssh/id_ed25519
        env:
          RSYNC_SSH_KEY: ${{ secrets.RSYNC_SSH_KEY }}

      - name: Test SSH connectivity
        run: |
          ssh -o StrictHostKeyChecking=accept-new "gitea_ci@${RPM_REPO_HOST}" exit

      - name: Ensure unstable repo directory exists
        run: |
          ssh "gitea_ci@${RPM_REPO_HOST}" \
            "mkdir --parents /var/www/rpm/fedora/${FEDORA_VERSION}/x86_64/unstable"

      - name: Sync RPMs to unstable repo
        run: |
          rsync \
            --archive \
            --verbose \
            --chmod D755,F644 \
            rpms/*.rpm \
            "gitea_ci@${RPM_REPO_HOST}:/var/www/rpm/fedora/${FEDORA_VERSION}/x86_64/unstable/"

      - name: Update unstable repo metadata
        run: |
          ssh "gitea_ci@${RPM_REPO_HOST}" \
            "cd /var/www/rpm/fedora/${FEDORA_VERSION}/x86_64/unstable && createrepo_c --update ."

      - name: Generate packages.json manifest
        run: |
          scp script/generate-packages-json.py "gitea_ci@${RPM_REPO_HOST}:/tmp/"
          ssh "gitea_ci@${RPM_REPO_HOST}" \
            "python3 /tmp/generate-packages-json.py \
              --repodata-dir /var/www/rpm/fedora/${FEDORA_VERSION}/x86_64/unstable/repodata \
              --output /var/www/rpm/fedora/${FEDORA_VERSION}/x86_64/unstable/packages.json \
              --base-url https://rpm.lair.cafe/fedora/${FEDORA_VERSION}/x86_64/unstable"