diff --git a/CLAUDE.md b/CLAUDE.md
index 6a4b4bb..af5142b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -10,20 +10,25 @@ This repo packages [mistral.rs](https://github.com/EricLBuehler/mistral.rs) (a R

 ### Pipeline flow

-1. **poll-upstream** (`.gitea/workflows/poll-upstream.yml`) — cron every 15 min, checks GitHub for latest mistral.rs release tag. If the corresponding RPM doesn't exist on `rpm.lair.cafe`, triggers `build-release`.
+1. **poll-upstream** (`.gitea/workflows/poll-upstream.yml`) — cron every 15 min, checks GitHub for latest mistral.rs release tag. If the corresponding RPMs don't exist on `rpm.lair.cafe`, triggers `build-release`. Also checks upstream `main` branch HEAD and triggers `build-prerelease` for the unstable repo.
 2. **build-release** (`.gitea/workflows/build-release.yml`) — three-stage pipeline:
-   - **plan** — reads `flavours.yml`, emits a JSON matrix of flavours + stripped version.
-   - **build** — runs on a `cuda-13.0` runner. Clones upstream at tag, calls `script/build-binary.sh` to `cargo build --release --locked` with flavour-specific CUDA features.
+   - **build** — runs on a `cuda-13.0` runner. Clones upstream at tag, runs `cargo build --release --locked` with flavour-specific CUDA features.
    - **package** — runs `rpmbuild -bb rpm/mistralrs.spec` with `--define` for version and flavour.
    - **publish** — GPG-signs RPMs, rsyncs to `rpm.lair.cafe`, runs `createrepo_c --update`. Uses concurrency group `rpm-publish` to prevent metadata races.
+3. **build-prerelease** (`.gitea/workflows/build-prerelease.yml`) — same structure as build-release but clones at a specific commit from `main`, omits `--locked`, uses prerelease release suffix, and publishes to the unstable repo at `rpm.lair.cafe/fedora/$releasever/$basearch/unstable/`.

 ### Flavours

-Defined in `flavours.yml`. Each flavour specifies a name, `cuda_home`, `cargo_features`, and `compute_caps`. The RPM spec uses `update-alternatives` so multiple flavours can coexist, with priority: base=10, fa=20, nccl=30.
+Defined in the workflow matrix. Each flavour targets a specific GPU generation using the same CUDA 13.0 toolkit and features (cuda, cudnn, flash-attn, nccl), varying only the compute capability.
+
+| Flavour    | Compute cap | GPU generation            |
+|------------|-------------|---------------------------|
+| ampere     | sm_86       | RTX 3060, A2000–A6000     |
+| ada        | sm_89       | RTX 4060–4090, L40        |
+| blackwell  | sm_120      | RTX 5090, B100, B200      |

 ### Key files

-- `flavours.yml` — flavour matrix definition (drives CI matrix)
 - `rpm/mistralrs.spec` — RPM spec (binary-only package, no rebuild)
 - `rpm/systemd/mistralrs@.service` — templated systemd unit (`@BINARY@` and `@FLAVOUR@` are sed-replaced during rpmbuild)
 - `rpm/systemd/mistralrs@.conf.example` — example env file for instances
@@ -31,23 +36,20 @@ Defined in `flavours.yml`. Each flavour specifies a name, `cuda_home`, `cargo_fe

 ## Commands

-Build a binary locally (requires CUDA toolkit):
-```bash
-FLAVOUR_NAME=cuda13 CUDA_HOME=/usr/local/cuda-13.0 CARGO_FEATURES="cuda cudnn flash-attn nccl" CUDA_COMPUTE_CAP=120 SRC_DIR=./src ./script/build-binary.sh
-```
-
 Build an RPM from a pre-built binary:
 ```bash
 rpmdev-setuptree
-cp artifacts/mistralrs-cuda13 ~/rpmbuild/SOURCES/
+cp artifacts/mistralrs-ada ~/rpmbuild/SOURCES/
 cp rpm/systemd/mistralrs@.service ~/rpmbuild/SOURCES/
 cp rpm/systemd/mistralrs@.conf.example ~/rpmbuild/SOURCES/
-rpmbuild -bb rpm/mistralrs.spec --define "mistralrs_version 0.7.0" --define "mistralrs_flavour cuda13"
+rpmbuild -bb rpm/mistralrs.spec --define "mistralrs_version 0.8.0" --define "mistralrs_flavour ada"
 ```

 ## Infrastructure

 - CI runs on Gitea Actions (self-hosted), not GitHub Actions
 - RPM repo hosted at `rpm.lair.cafe` on host `oolon.kosherinata.internal`
+- Stable repo: `rpm.lair.cafe/fedora/$releasever/$basearch/`
+- Unstable repo: `rpm.lair.cafe/fedora/$releasever/$basearch/unstable/`
 - TLS via Let's Encrypt with Cloudflare DNS challenge
 - Publish uses rsync over SSH as `gitea_ci` user
diff --git a/readme.md b/readme.md
index 021fdc0..4050c83 100644
--- a/readme.md
+++ b/readme.md
@@ -8,31 +8,34 @@ This repo does not contain the mistral.rs source. It clones upstream at a given

 Two Gitea Actions workflows drive the pipeline:

-1. **poll-upstream** runs every 15 minutes, checks GitHub for the latest mistral.rs release tag, and triggers a build if the corresponding RPM doesn't already exist on `rpm.lair.cafe`.
+1. **poll-upstream** runs every 15 minutes, checks GitHub for the latest mistral.rs release tag, and triggers a build if the corresponding RPM doesn't already exist on `rpm.lair.cafe`. It also checks the upstream `main` branch HEAD and triggers prerelease builds for the unstable repo.
 2. **build-release** runs in three stages:
    - **build** — clones upstream at the tag and compiles `mistralrs` with flavour-specific CUDA features on a `cuda-13.0` runner.
    - **package** — builds an RPM from the compiled binary using `rpmbuild`.
    - **publish** — GPG-signs the RPMs, rsyncs them to `rpm.lair.cafe`, and updates the repo metadata with `createrepo_c`.
+3. **build-prerelease** — same structure as build-release but clones at a specific commit from `main`, uses versioning from `Cargo.toml` with a prerelease release suffix (e.g. `0.8.1-0.1.20260511git1a2b3c4`), and publishes to the unstable repo.

 ### Flavours

-Build flavours are defined in the workflow matrix. Each flavour specifies a name, CUDA home path, cargo features, and compute capabilities. The RPM spec uses `update-alternatives` so multiple flavours can coexist, with priority: base=10, fa=20, nccl=30.
+Build flavours are defined in the workflow matrix. Each flavour targets a specific GPU generation with the same CUDA 13.0 toolkit and features (cuda, cudnn, flash-attn, nccl).

 Currently defined:

-| Flavour  | Features                      | Compute cap |
-|----------|-------------------------------|-------------|
-| cuda13   | cuda, cudnn, flash-attn, nccl | sm_120      |
+| Flavour    | Compute cap | GPU generation            |
+|------------|-------------|---------------------------|
+| ampere     | sm_86       | RTX 3060, A2000–A6000     |
+| ada        | sm_89       | RTX 4060–4090, L40        |
+| blackwell  | sm_120      | RTX 5090, B100, B200      |

 ### Systemd integration

-Each RPM installs a templated systemd unit (`mistralrs-@.service`). Instances are configured via environment files in `/etc/mistralrs/`:
+Each RPM installs a templated systemd unit (`mistralrs@.service`). Instances are configured via environment files in `/etc/mistralrs/`:

 ```bash
 # copy the example config
-sudo cp /etc/mistralrs/cuda13.conf.example /etc/mistralrs/mymodel.conf
+sudo cp /etc/mistralrs/default.conf.example /etc/mistralrs/mymodel.conf
 # edit MISTRALRS_ARGS, HF_TOKEN, etc.
-sudo systemctl start mistralrs-cuda13@mymodel
+sudo systemctl start mistralrs@mymodel
 ```

 ## Infrastructure setup
@@ -95,25 +98,6 @@ Then update the `RPM_SIGNING_KEY` secret in Gitea with the new subkey. The publi

 ### 5. Runner prerequisites

-#### nvm (for UI builds)
-
-Runners that build the UI need [nvm](https://github.com/nvm-sh/nvm) installed for the `gitea_runner` user and an `nvm` label in their runner config:
-
-```bash
-sudo -u gitea_runner bash -c 'curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash'
-```
-
-Then add `nvm` to the labels in `/etc/act_runner/config.yml`:
-
-```yaml
-runner:
-  labels:
-    - "fedora-43:host"
-    - "nvm"
-```
-
-Restart the runner after changing labels. The `deploy-ui` workflow uses `runs-on: [fedora-43, nvm]` to select runners with Node.js capability.
-
 #### sequoia-sq (for RPM signing)

 Runners that run the publish job need `sequoia-sq` installed:
@@ -124,6 +108,8 @@ sudo dnf install sequoia-sq

 ## Client setup

+### Stable packages
+
 ```bash
 sudo rpm --import https://rpm.lair.cafe/.gpg
 sudo tee /etc/yum.repos.d/lair-cafe.repo > /dev/null <<'EOF'
@@ -134,7 +120,29 @@ enabled=1
 gpgcheck=1
 gpgkey=https://rpm.lair.cafe/.gpg
 EOF
-sudo dnf install mistralrs-cuda13
+
+# install the package for your GPU generation
+sudo dnf install mistralrs-ampere # RTX 3000 series
+sudo dnf install mistralrs-ada # RTX 4000 series
+sudo dnf install mistralrs-blackwell # RTX 5000 series
+```
+
+### Unstable (prerelease) packages
+
+Unstable packages are built from the latest upstream `main` commit and published to a separate repo. The RPM release field uses the Fedora snapshot convention (e.g. `0.8.1-0.1.20260511git1a2b3c4.fc43`) so stable releases automatically supersede any installed prerelease.
+
+```bash
+sudo tee /etc/yum.repos.d/lair-cafe-unstable.repo > /dev/null <<'EOF'
+[lair-cafe-unstable]
+name=lair.cafe RPM Repository (unstable)
+baseurl=https://rpm.lair.cafe/fedora/$releasever/$basearch/unstable/
+enabled=0
+gpgcheck=1
+gpgkey=https://rpm.lair.cafe/.gpg
+EOF
+
+# install from unstable on demand
+sudo dnf --enablerepo=lair-cafe-unstable install mistralrs-ada
 ```

 ## Forcing a rebuild
@@ -143,7 +151,7 @@ To force a rebuild of an already-published RPM (e.g. after a packaging change),

 ```bash
 ssh oolon "
-  sudo rm /var/www/rpm/fedora/43/x86_64/mistralrs-cuda13--1.fc43.x86_64.rpm \
+  sudo rm /var/www/rpm/fedora/43/x86_64/mistralrs-ada--1.fc43.x86_64.rpm \
   && cd /var/www/rpm/fedora/43/x86_64 \
   && sudo createrepo_c --update .;
 "
@@ -155,7 +163,7 @@ Do not delete the RPM without running `createrepo_c --update` afterwards — thi

 ## CI secrets

-The build-release workflow requires the following secrets:
+The build-release and build-prerelease workflows require the following secrets:

 | Secret           | Purpose                                      |
 |------------------|----------------------------------------------|