Files
cortex/.gitea/workflows/ci.yml
rob thijssen cad7552104
Some checks failed
CI / Test (push) Waiting to run
CI / CUDA type-check (push) Failing after 18s
build-prerelease / Resolve version stamps (push) Successful in 30s
CI / Format (push) Successful in 31s
CI / Clippy (push) Successful in 2m25s
build-prerelease / Build cortex binary (push) Successful in 5m19s
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Package cortex RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-blackwell (push) Has been cancelled
CI / Build cortex SRPM (push) Has been cancelled
CI / Build neuron SRPM (push) Has been cancelled
CI / Publish cortex to COPR (push) Has been cancelled
CI / Publish neuron to COPR (push) Has been cancelled
CI / Bump version in source (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled
ci: clear sccache env on cuda-check so cargo doesn't try to wrap rustc
CI run 255 job 3 (CUDA type-check) fails with:

  error: could not execute process `*** rustc -vV` (never executed)
    Caused by: No such file or directory (os error 2)

The redacted `***` is `sccache`. The ci.yml workflow-level env block
sets `RUSTC_WRAPPER: sccache` because the generic `rust` runner has
sccache installed and routes the cache to caveman.kosherinata.internal.
The new `cuda-check` job runs on `cuda-13.0` (where nvcc lives), and
that runner doesn't carry sccache on PATH — so cargo's first action
(`sccache rustc -vV` to probe the compiler version) fails before
borrow-check even starts.

`build-prerelease.yml`, which uses the same `cuda-13.0` runner for
the actual release neuron builds, deliberately does NOT set
RUSTC_WRAPPER. That's the pattern this commit applies.

Fix: override `RUSTC_WRAPPER` (plus the SCCACHE_* and AWS_* env
locally on the job. We lose caching on the cuda-check job (it's
borrow-check-only and finishes in a couple minutes anyway), but
the gate runs.

The job's purpose — fail fast on `#[cfg(feature = "cuda")]`
borrowck errors that the default-feature gate misses — is what
matters, and that purpose was undermined by the env inheritance.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-31 13:55:18 +03:00

346 lines
11 KiB
YAML

name: CI
on:
push:
branches: ["**"]
tags: ["v*"]
pull_request:
branches: [main]
# Share a concurrency group with build-prerelease.yml so the two
# workflows don't race on the same `rust` runner workspace (act's
# /root/.cache/act/<hash>/hostexecutor/ is shared across concurrent
# jobs and one job's checkout step nukes another's in-flight build
# files). cancel-in-progress=false → they queue; same-ref pushes
# coalesce per workflow via cancel-in-progress on each.
concurrency:
group: cortex-runner-pool-${{ github.ref }}
cancel-in-progress: false
env:
CARGO_INCREMENTAL: "0"
RUSTC_WRAPPER: sccache
SCCACHE_BUCKET: sccache
SCCACHE_ENDPOINT: http://caveman.kosherinata.internal:9000
SCCACHE_REGION: auto
SCCACHE_S3_USE_SSL: "false"
AWS_ACCESS_KEY_ID: ${{ secrets.SCCACHE_S3_ACCESS_KEY }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.SCCACHE_S3_SECRET_KEY }}
# fmt, clippy, and test all run in parallel on the same `rust` runner
# and would otherwise share /root/.cache/act/<hash>/hostexecutor/target/,
# racing each other's cargo temp files (.tmpXXXXXX) and failing builds
# mid-compile. Give each job its own target directory so the invocations
# don't collide. sccache still backs the actual rustc cache, so the
# rebuild penalty is small.
CARGO_TARGET_DIR: target-${{ github.job }}
jobs:
fmt:
name: Format
runs-on: rust
steps:
- uses: actions/checkout@v4
- run: cargo fmt --check --all
clippy:
name: Clippy
runs-on: rust
steps:
- uses: actions/checkout@v4
# sccache occasionally fails with spurious race-condition errors;
# retrying the same invocation succeeds without code changes.
# Allow up to 3 attempts before declaring real failure.
- name: Clippy (with retry)
run: |
for attempt in 1 2 3; do
echo "::group::clippy attempt ${attempt}"
if cargo clippy --workspace -- -D warnings; then
echo "::endgroup::"
exit 0
fi
echo "::endgroup::"
echo "clippy failed on attempt ${attempt}"
if [ "${attempt}" -lt 3 ]; then
sleep 5
fi
done
echo "clippy failed after 3 attempts"
exit 1
- run: sccache --show-stats
test:
name: Test
runs-on: rust
steps:
- uses: actions/checkout@v4
# See the clippy job for why this is retried.
- name: Test (with retry)
run: |
for attempt in 1 2 3; do
echo "::group::test attempt ${attempt}"
if cargo test --workspace; then
echo "::endgroup::"
exit 0
fi
echo "::endgroup::"
echo "test failed on attempt ${attempt}"
if [ "${attempt}" -lt 3 ]; then
sleep 5
fi
done
echo "test failed after 3 attempts"
exit 1
- run: sccache --show-stats
# Type-check the CUDA-only code path. Borrow-check-only — we
# never run the tests here (the runner has no GPU). This catches
# the category of bug where a refactor compiles fine under the
# default feature set (which is what the `clippy` and `test` jobs
# exercise) but fails inside a `#[cfg(feature = "cuda")]` block.
# `runs-on: cuda-13.0` selects the runner that ships nvcc /
# cudarc's build prerequisites. The generic `rust` and `rpm`
# runners don't have them (the previous label `rpm` was tried
# first and tripped cudarc's `nvcc --version` build script —
# see commit history).
cuda-check:
name: CUDA type-check
runs-on: cuda-13.0
# The workflow-level env sets `RUSTC_WRAPPER: sccache` for the
# `rust` runner (where fmt/clippy/test live and sccache is
# installed). The `cuda-13.0` runner doesn't have sccache on
# PATH, so inheriting the wrapper makes cargo bail with
# `could not execute process `sccache rustc -vV` (never executed)`
# before borrow-check even starts. Clear it locally. Also clear
# SCCACHE_* so cargo doesn't try to contact the cache (the
# remote auth headers come from secrets that aren't present on
# this runner either). Lose the cache, keep the gate.
env:
RUSTC_WRAPPER: ""
SCCACHE_BUCKET: ""
SCCACHE_ENDPOINT: ""
SCCACHE_REGION: ""
SCCACHE_S3_USE_SSL: ""
AWS_ACCESS_KEY_ID: ""
AWS_SECRET_ACCESS_KEY: ""
steps:
- uses: actions/checkout@v4
- name: cargo check --features cuda (with retry)
run: |
for attempt in 1 2 3; do
echo "::group::cuda-check attempt ${attempt}"
if cargo check -p neuron --features cuda --all-targets; then
echo "::endgroup::"
exit 0
fi
echo "::endgroup::"
echo "cuda-check failed on attempt ${attempt}"
if [ "${attempt}" -lt 3 ]; then
sleep 5
fi
done
echo "cuda-check failed after 3 attempts"
exit 1
srpm-cortex:
name: Build cortex SRPM
runs-on: rpm
needs: [fmt, clippy, test, cuda-check]
if: startsWith(github.ref, 'refs/tags/v')
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Determine version
id: version
run: |
VERSION="${GITHUB_REF#refs/tags/v}"
echo "VERSION=${VERSION}" >> "$GITHUB_OUTPUT"
- name: Stamp version
run: |
VERSION="${{ steps.version.outputs.VERSION }}"
sed -i '/\[workspace\.package\]/,/\[/{ s/^version = ".*"/version = "'"${VERSION}"'"/ }' Cargo.toml
sed -i "s/^Version:.*/Version: ${VERSION}/" cortex.spec
- name: Generate changelog entry
uses: https://git.lair.cafe/actions/rpm-changelog@v1
with:
spec: cortex.spec
version: ${{ steps.version.outputs.VERSION }}
- name: Generate source tarball
run: |
set -ex
VERSION="${{ steps.version.outputs.VERSION }}"
tar czf /tmp/cortex-${VERSION}.tar.gz \
--transform "s,^\.,cortex-${VERSION}," \
--exclude='./target' \
--exclude='./.git' \
--exclude='*.tar.gz' \
--exclude='*.src.rpm' \
.
mv /tmp/cortex-${VERSION}.tar.gz .
- name: Vendor Rust dependencies
run: |
VERSION="${{ steps.version.outputs.VERSION }}"
cargo vendor vendor/
tar czf cortex-${VERSION}-vendor.tar.gz vendor/
rm -rf vendor/
- name: Build SRPM
run: |
rpmbuild -bs cortex.spec \
--define "_sourcedir $(pwd)" \
--define "_srcrpmdir $(pwd)"
- name: Upload SRPM artifact
uses: actions/upload-artifact@v3
with:
name: srpm-cortex
path: "*.src.rpm"
srpm-neuron:
name: Build neuron SRPM
runs-on: rpm
needs: [fmt, clippy, test, cuda-check]
if: startsWith(github.ref, 'refs/tags/v')
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Determine version
id: version
run: |
VERSION="${GITHUB_REF#refs/tags/v}"
echo "VERSION=${VERSION}" >> "$GITHUB_OUTPUT"
- name: Stamp version
run: |
VERSION="${{ steps.version.outputs.VERSION }}"
sed -i '/\[workspace\.package\]/,/\[/{ s/^version = ".*"/version = "'"${VERSION}"'"/ }' Cargo.toml
sed -i "s/^Version:.*/Version: ${VERSION}/" helexa-neuron.spec
- name: Generate changelog entry
uses: https://git.lair.cafe/actions/rpm-changelog@v1
with:
spec: helexa-neuron.spec
version: ${{ steps.version.outputs.VERSION }}
- name: Generate source tarball
run: |
set -ex
VERSION="${{ steps.version.outputs.VERSION }}"
tar czf /tmp/helexa-neuron-${VERSION}.tar.gz \
--transform "s,^\.,helexa-neuron-${VERSION}," \
--exclude='./target' \
--exclude='./.git' \
--exclude='*.tar.gz' \
--exclude='*.src.rpm' \
.
mv /tmp/helexa-neuron-${VERSION}.tar.gz .
- name: Vendor Rust dependencies
run: |
VERSION="${{ steps.version.outputs.VERSION }}"
cargo vendor vendor/
tar czf helexa-neuron-${VERSION}-vendor.tar.gz vendor/
rm -rf vendor/
- name: Build SRPM
run: |
rpmbuild -bs helexa-neuron.spec \
--define "_sourcedir $(pwd)" \
--define "_srcrpmdir $(pwd)"
- name: Upload SRPM artifact
uses: actions/upload-artifact@v3
with:
name: srpm-neuron
path: "*.src.rpm"
copr-cortex:
name: Publish cortex to COPR
runs-on: fedora-43
needs: srpm-cortex
steps:
- name: Download SRPM
uses: actions/download-artifact@v3
with:
name: srpm-cortex
- name: Publish to COPR
uses: https://git.lair.cafe/actions/copr-publish@v1
with:
project: helexa/helexa
srpm: "*.src.rpm"
copr-config: ${{ secrets.COPR_CONFIG }}
copr-neuron:
name: Publish neuron to COPR
runs-on: fedora-43
needs: srpm-neuron
steps:
- name: Download SRPM
uses: actions/download-artifact@v3
with:
name: srpm-neuron
- name: Publish to COPR
uses: https://git.lair.cafe/actions/copr-publish@v1
with:
project: helexa/helexa
srpm: "*.src.rpm"
copr-config: ${{ secrets.COPR_CONFIG }}
bump-version:
name: Bump version in source
runs-on: rust
needs: [copr-cortex, copr-neuron]
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Determine version
id: version
run: echo "VERSION=${GITHUB_REF#refs/tags/v}" >> "$GITHUB_OUTPUT"
- name: Stamp version
run: |
VERSION="${{ steps.version.outputs.VERSION }}"
sed -i '/\[workspace\.package\]/,/\[/{ s/^version = ".*"/version = "'"${VERSION}"'"/ }' Cargo.toml
sed -i "s/^Version:.*/Version: ${VERSION}/" cortex.spec
sed -i "s/^Version:.*/Version: ${VERSION}/" helexa-neuron.spec
cargo check --workspace 2>/dev/null || true
- name: Generate cortex changelog entry
uses: https://git.lair.cafe/actions/rpm-changelog@v1
with:
spec: cortex.spec
version: ${{ steps.version.outputs.VERSION }}
- name: Generate helexa-neuron changelog entry
uses: https://git.lair.cafe/actions/rpm-changelog@v1
with:
spec: helexa-neuron.spec
version: ${{ steps.version.outputs.VERSION }}
- name: Commit and push
env:
GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
run: |
VERSION="${{ steps.version.outputs.VERSION }}"
git config user.name "Gitea Actions"
git config user.email "actions@git.lair.cafe"
git add Cargo.toml Cargo.lock cortex.spec helexa-neuron.spec
if git diff --cached --quiet; then
echo "Nothing to commit for ${VERSION}"
else
git commit -m "chore: bump version to ${VERSION}"
git remote set-url origin "https://gitea-actions:${GITEA_TOKEN}@git.lair.cafe/helexa/cortex.git"
git push origin HEAD:main
fi