cortex

Author	SHA1	Message	Date
rob thijssen	62ca125a68	chore: keep models.example.toml generic; deploy.sh syncs local models.toml. Reverts the previous commit's naming of specific helexa neuron hosts in the shipped example catalogue (`models.example.toml`): the example is supposed to be a generic starting point that any operator copies and adapts, not a record of one particular fleet's layout. - `pinned_on` in the TP example uses the placeholder `"your-multi-gpu-neuron"`. Other entries keep the model ids (since those are HuggingFace-canonical, not fleet-specific). - New `models.toml` at repo root holds the helexa-fleet catalogue (beast / benjy / quadbrat). Added to `.gitignore` alongside `cortex.toml`; both are operator-owned, gitignored, RPM-marked `%config(noreplace)`, and synced by `deploy.sh`. - `deploy.sh` now rsyncs `models.toml` to `/etc/cortex/models.toml` on the gateway host on the same lifecycle as `cortex.toml`. It skips cleanly when no local file exists, so users without a catalogue aren't surprised by silent overwrites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:47:08 +03:00
rob thijssen	18ae3c30ee	post-validation cleanup: cuDNN runtime + repetition penalty. Two followups from the live single-GPU validation pass. 1. deploy.sh now ensures libcudnn.so.9 is available on each neuron host before installing/upgrading the package. It probes ldconfig first so hosts with a manual (tar/runfile) cuDNN install are untouched, then adds NVIDIA's RHEL9 CUDA repo (the Fedora 43 CUDA repo doesn't ship cuDNN; only the RHEL9 one does) and installs libcudnn9-cuda-13. benjy hit "cannot open shared object file: libcudnn.so.9" during validation; this prevents that recurring. 2. candle.rs applies a 1.1 repetition penalty over the last 64 generated tokens before sampling, in both the non-streaming chat_completion path and the streaming chat_completion_stream path. Without it small Q4_K_M models degenerate into "Wait, no, no..." loops once they hit a confident-but-wrong path; with it sampling stays coherent. Defaults match mistral.rs and llama.cpp; exposing the value via the OpenAI request (frequency/presence penalty mapping) is Stage 8 territory. Both paths route through a new sample_with_penalty() helper so future sampling tweaks land in one place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 14:48:08 +03:00
rob thijssen	1a0400131e	fix(deploy): use dnf upgrade for stale installs, install only when absent. dnf5's `dnf install <pkg>` is a no-op when the package is already installed at ANY version; it does NOT auto-upgrade to the latest available. The deploy script's install branch was therefore silently leaving hosts on older builds even though needs_update correctly reported an upgrade was available. Add an is_installed() probe and an install_or_upgrade() helper that picks the right verb: `dnf install` when fresh, `dnf upgrade` when stale. Captured combined-stream output is exposed via __DNF_OUTPUT__ for the existing failure-diagnostic path. Verified end-to-end against the live fleet: hanzalova/beast/benjy/quadbrat all upgraded cleanly from prior prerelease NVRs to 0.1.16-0.1.20260519134302.git1866b99.fc43, validation script returned "Paris" from all three neurons. Followup (not in this commit): all hosts running helexa-neuron-* need libcudnn.so.9 available at runtime. Currently: - quadbrat: libcudnn9-cuda-13 RPM (rhel9 CUDA repo) - beast: /usr/lib64/libcudnn.so.9 (manual install) - benjy: needed rhel9 CUDA repo added + libcudnn9-cuda-13 installed as part of this validation pass. The spec currently excludes cuDNN from auto-detected deps. It should add a soft `Recommends: libcudnn9-cuda-13` and ensure the rhel9 CUDA repo is configured on each neuron host, similar to how ensure_lair_repo handles the unstable channel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 14:10:48 +03:00
rob thijssen	8a2334eacb	deploy: dnf-native version check + lair.cafe repo bootstrap. Replaces the string compare of 'git describe --tags' vs the binary's self-reported --version (which lies about prereleases: every 0.1.16-* RPM reports just "0.1.16") with the dnf-native question of "is the installed package current against what the repo offers". Mechanism: - installed_nvr(): rpm -q --qf '%{version}-%{release}' for the resident package, falling back to "(not installed)". Capturing rpm's output through a variable keeps its "package X is not installed" stdout message out of the result on failure. - needs_update(): probes rpm -q first (treats absent as "needs work"), then asks dnf check-update --refresh -q. Other dnf failures collapse into "needs update" so the subsequent install surfaces a real error rather than this check swallowing one silently. - ensure_lair_repo(): probes for /etc/yum.repos.d/lair-cafe-unstable.repo and adds it with `dnf config-manager addrepo` when missing. The upstream .repo file ships enabled=0 (unstable channel doesn't auto-engage on fetch), so we then run `dnf config-manager setopt lair-cafe-unstable.enabled=1` every run (cheap, idempotent). - Cortex and neuron install branches now guard `systemctl stop` with `[ ! -f /usr/lib/systemd/system/...service ] || sudo systemctl stop` so fresh installs (no unit file yet) don't short-circuit the install step under set -e. - dnf output is captured into a variable and only printed (with a [host] prefix per line) on failure, so success stays quiet and failures show the actual diagnostic instead of being eaten by &> /dev/null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:55:02 +03:00
rob thijssen	249c9442e8	chore: track deployment script	2026-05-18 17:50:35 +03:00
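
The install-versus-upgrade choice described in commits 1a0400131e and 8a2334eacb can be read as a small decision table. `deploy.sh` itself is a shell script that is not shown in this log, so the following is only a sketch of that table in Rust. `DnfAction` and `choose_action` are hypothetical names, and the two inputs stand in for the results of the `rpm -q` probe and the `dnf check-update` query that the commit messages mention.

```rust
/// What the deploy step should do for one package on one host.
/// Hypothetical type, introduced only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DnfAction {
    /// Package absent: `dnf install` is the right verb.
    Install,
    /// Package present but stale (or staleness unknown): `dnf upgrade`.
    Upgrade,
    /// Package present and current: nothing to do.
    Skip,
}

/// Pick an action from two probe results.
///
/// * `installed`: whether an `rpm -q` style probe found the package.
/// * `update_available`: `Some(true)` or `Some(false)` when the update check
///   gave a clear answer, `None` when the check itself failed.
///
/// A failed check is treated as "needs update", mirroring the commit's
/// stated intent that the follow-up dnf call should surface the real error.
pub fn choose_action(installed: bool, update_available: Option<bool>) -> DnfAction {
    if !installed {
        // `dnf upgrade` cannot act on an absent package, so install.
        return DnfAction::Install;
    }
    match update_available {
        Some(false) => DnfAction::Skip,
        // `dnf install` would be a no-op here, so upgrade is required.
        Some(true) | None => DnfAction::Upgrade,
    }
}
```

Keeping the decision separate from the commands that run it makes each branch easy to check in isolation, which is the property the original bug (a silent no-op `dnf install` on stale hosts) lacked.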

5 Commits
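
Commit 18ae3c30ee describes a 1.1 repetition penalty over the last 64 generated tokens. The actual `sample_with_penalty()` helper in candle.rs is not shown in this log, so the following is a minimal, std-only sketch of the common technique: logits of recently seen tokens are divided by the penalty when non-negative and multiplied by it when negative. The function names, the plain `Vec<f32>` logits, and the greedy argmax that stands in for real sampling are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Values taken from the commit message.
pub const REPEAT_PENALTY: f32 = 1.1;
pub const REPEAT_LAST_N: usize = 64;

/// Lower the score of every token seen in the last `window` generated tokens.
/// Each distinct token is penalised once, however often it repeats.
pub fn apply_repetition_penalty(
    logits: &mut [f32],
    generated: &[u32],
    penalty: f32,
    window: usize,
) {
    let start = generated.len().saturating_sub(window);
    let recent: HashSet<u32> = generated[start..].iter().copied().collect();
    for token in recent {
        // Token ids outside the vocabulary are ignored rather than panicking.
        if let Some(logit) = logits.get_mut(token as usize) {
            if *logit >= 0.0 {
                *logit /= penalty; // shrink positive scores towards zero
            } else {
                *logit *= penalty; // push negative scores further down
            }
        }
    }
}

/// Hypothetical stand-in for a sampling helper: applies the penalty to a copy
/// of the logits, then picks the highest-scoring token (greedy argmax, not
/// true sampling). Returns `None` for an empty logits slice.
pub fn sample_with_penalty_sketch(logits: &[f32], generated: &[u32]) -> Option<u32> {
    let mut adjusted = logits.to_vec();
    apply_repetition_penalty(&mut adjusted, generated, REPEAT_PENALTY, REPEAT_LAST_N);
    adjusted
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index as u32)
}
```

With logits `[1.0, 1.05, 0.2]` and no history, the sketch picks token 1. Once token 1 appears in the history, its score drops below 1.0 and token 0 is picked instead, which is the loop-breaking effect the commit describes.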