Commit Graph

6 Commits

Author SHA1 Message Date
7557c5e877 ci: cut iteration latency — change-aware builds, gated deploys, dev fast path
Push-to-testable was ~20.5 min for every commit (measured on the
2026-06-08 green chain) plus a ~5 min 27B cold-load, regardless of
what changed. Four structural fixes:

- build-prerelease: a change-detection step in `prepare` diffs HEAD
  against the git sha embedded in the last *published* unstable RPM
  (per package, from packages.json) and skips builds whose inputs
  didn't change. Docs-only commits build nothing; gateway-only
  commits skip the 3 CUDA flavour builds. Detection failures fall
  open to a full build.
- ci.yml no longer runs on pushes to main; fmt/clippy/test live in
  build-prerelease as parallel jobs gating publish. The two workflows
  previously queued against each other on the same runner labels,
  delaying the cortex build ~12 min. Branches, PRs, and tags keep the
  full ci.yml gate.
- deploy: each host self-gates with `dnf check-update` and leaves the
  service untouched when the installed package is already current —
  no more neuron restarts (and 27B cold-loads) for commits that
  didn't change neuron.
- deploy-dev (new): manual single-host fast path — build one CUDA
  flavour, scp the binary, restart the service. Skips packaging,
  signing, publish, and dnf entirely. Backed by a new exact-form
  sudoers rule in asset/sudoers.d/neuron-host.conf (already applied
  to all three hosts).
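
A minimal sketch of the change-detection idea from the first bullet, assuming each package's changed-file list has already been computed by diffing HEAD against the sha of its last published RPM. The path prefixes in `PACKAGE_INPUTS` and both function names are hypothetical; only the package names and the fall-open behaviour come from this commit.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical inputs per package: the path prefixes are invented for
# illustration and are not taken from the repository.
PACKAGE_INPUTS: Dict[str, Tuple[str, ...]] = {
    "cortex": ("cortex/", "Cargo.lock"),
    "helexa-neuron-ada": ("neuron/", "Cargo.lock"),
    "helexa-neuron-ampere": ("neuron/", "Cargo.lock"),
    "helexa-neuron-blackwell": ("neuron/", "Cargo.lock"),
}


def needs_build(changed: Optional[Iterable[str]], prefixes: Tuple[str, ...]) -> bool:
    """Return True when any changed path falls under the package's inputs."""
    if changed is None:
        # Detection failed (for example, the published sha is unknown):
        # fall open to a build rather than risk skipping a real change.
        return True
    return any(path.startswith(prefixes) for path in changed)


def plan_builds(
    changed_by_package: Dict[str, Optional[List[str]]],
    inputs: Dict[str, Tuple[str, ...]] = PACKAGE_INPUTS,
) -> List[str]:
    """List the packages to rebuild; a missing entry counts as a detection failure."""
    return sorted(
        pkg
        for pkg, prefixes in inputs.items()
        if needs_build(changed_by_package.get(pkg), prefixes)
    )
```

Under these assumed prefixes, a docs-only change yields an empty plan, a change under `cortex/` rebuilds only cortex, and any package with no detection result is rebuilt.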

Expected loop times when runners behave: docs ≈ 1 min (nothing
deploys), gateway-only ≈ 6-8 min, single-neuron dev ≈ 8-10 min,
full fleet ≈ 13-15 min.
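
The per-host self-gate in the deploy bullet above relies on the exit status of `dnf check-update`: 0 means nothing to update, 100 means updates are available, and 1 signals an error. A rough sketch of that decision follows. The function names are hypothetical, and raising on any other status is an assumption, since the commit does not say what the workflow does when the check itself fails.

```python
import subprocess


def action_for_check_update(returncode: int) -> str:
    """Map a `dnf check-update` exit status to a deploy action."""
    if returncode == 0:
        return "skip"      # installed package is current: leave the service alone
    if returncode == 100:
        return "upgrade"   # an update is available: proceed with the deploy
    # Assumption: any other status is treated as a hard failure.
    raise RuntimeError(f"dnf check-update failed with status {returncode}")


def host_needs_upgrade(package: str) -> bool:
    """Run the check locally for one package and report whether to upgrade."""
    result = subprocess.run(
        ["dnf", "check-update", package],
        capture_output=True,
        text=True,
    )
    return action_for_check_update(result.returncode) == "upgrade"
```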

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 13:17:22 +03:00
ea1fdf8aa6 chore(deploy): drop deploy.sh and manifest.yml now that workflow runs
First end-to-end run of the deploy workflow succeeded (gitea run #289),
so the operator-run rolling-deploy script and its YAML manifest are no
longer the source of truth — fleet topology lives in
.gitea/workflows/deploy.yml and per-host config in script/infra-setup.sh.

Per-host neuron config comments updated to point at the new sync path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 16:41:04 +03:00
5c520c7e90 feat(deploy): gitea workflow for rolling RPM deploys + host bootstrap
Replace operator-run script/deploy.sh with a CI-driven rolling deploy:

- .gitea/workflows/deploy.yml fires on build-prerelease success (and is
  re-runnable via workflow_dispatch). Cortex upgrades first on
  hanzalova.internal; the three neuron hosts upgrade in parallel under
  fail-fast: false so one failing host doesn't sink the rest.
  Concurrency-grouped to serialize overlapping deploys, never cancelling
  in-flight runs (a half-applied dnf transaction is worse than a stale
  deploy).

- asset/sudoers.d/{cortex,neuron}-host.conf are the canonical source for
  the scoped privileges gitea_ci needs on each host kind, installed as
  /etc/sudoers.d/helexa_gitea_ci. URLs (for their colons) and = signs are
  backslash-escaped, since sudoers reserves both characters in command
  arguments.

- script/infra-setup.sh idempotently provisions the gitea_ci user,
  installs the runner pubkey, drops in the appropriate sudoers fragment
  with visudo verification, and syncs cortex.toml / models.toml /
  per-host asset/neuron/<short>.toml — config still ships from operator
  workstations rather than CI because the first two are gitignored.
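
The ordering described in the first bullet (cortex first, then the neuron hosts in parallel without fail-fast) can be sketched roughly as below. This illustrates the scheduling only and is not the workflow's implementation: `rolling_deploy` and the `upgrade` callback are hypothetical, and aborting before the neurons when the cortex upgrade fails is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def rolling_deploy(
    cortex_host: str,
    neuron_hosts: List[str],
    upgrade: Callable[[str], None],
) -> Dict[str, str]:
    """Upgrade cortex, then all neuron hosts in parallel, recording each outcome."""
    # Assumption: an exception here propagates, so no neuron host is touched.
    upgrade(cortex_host)
    results = {cortex_host: "ok"}

    def attempt(host: str) -> str:
        # Equivalent of fail-fast: false. One host's failure is recorded
        # and does not cancel the others.
        try:
            upgrade(host)
            return "ok"
        except Exception as exc:
            return f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=max(1, len(neuron_hosts))) as pool:
        for host, outcome in zip(neuron_hosts, pool.map(attempt, neuron_hosts)):
            results[host] = outcome
    return results
```

A caller would pass a function that performs the real per-host upgrade and inspect the returned mapping for any `failed:` entries.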

The CI-only secret is RSYNC_SSH_KEY (already configured for the repo);
the matching pubkey is ~/.ssh/id_gitea_ci.pub on the operator's box.

script/deploy.sh and asset/manifest.yml are left in place until the
first end-to-end deploy workflow run succeeds, then removed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 14:58:23 +03:00
740299bd9d chore(neuron/beast): switch default-model quant from q5k to q6k
q5k produced NaN logits on Qwen/Qwen3.6-27B under candle TP=2 (sampler
fell over with "logits unhealthy nan: 248320/248320"). q6k is the
quant that worked well in production under mistral.rs on the same
hardware, so it's the right baseline for verifying the mempool-trim
fix.
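
For illustration, a logits health check of the kind that produced the message above might look like the sketch below. This is not the sampler's actual code: `check_logits` is a hypothetical name, and only the message format is copied from the log line quoted in this commit.

```python
import math
from typing import Sequence


def check_logits(logits: Sequence[float]) -> None:
    """Raise if any logit is NaN, reporting how many of the total are affected."""
    nan_count = sum(1 for value in logits if math.isnan(value))
    if nan_count:
        raise ValueError(f"logits unhealthy nan: {nan_count}/{len(logits)}")
```

A count equal to the vocabulary size, as in the quoted error, means every logit was NaN rather than a few outliers.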

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 12:36:18 +03:00
d3f2d50749 feat(deploy): per-host neuron config + pre-warm headline models
Adds asset/neuron/{beast,benjy,quadbrat}.toml — per-host neuron.toml
files keyed by the first dot-component of the hostname. deploy.sh now
rsyncs the matching file to /etc/neuron/neuron.toml on each neuron and
stops+starts the service so default_models is re-read.
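
A small sketch of the keying rule, assuming hosts are addressed by names such as `beast.internal` (only `hanzalova.internal` appears in this log, so the neuron FQDNs are an assumption). The function name is hypothetical.

```python
from pathlib import PurePosixPath


def neuron_config_for(host: str) -> PurePosixPath:
    """Pick the per-host config file from the first dot-component of the hostname."""
    short = host.split(".", 1)[0]
    return PurePosixPath("asset/neuron") / f"{short}.toml"
```

A bare short name such as `quadbrat` maps to the same file as its fully qualified form.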

Headline model per host (drives /v1/models output immediately after a
clean deploy):

  beast     Qwen/Qwen3.6-27B  (q5k, tp=2, devices=[0,1])
  benjy     Qwen/Qwen3-8B     (bf16, devices=[0])
  quadbrat  Qwen/Qwen3-1.7B   (bf16, devices=[0])

Removes the need to follow deploy.sh with `validate-neuron.sh beast
Qwen/Qwen3.6-27B q5k 2` to surface the 27B in the catalogue — the
neuron loads it itself on activation.

The neuron loop now mirrors the cortex flow (stop → install/upgrade →
sync config → start) so config-only changes pick up on subsequent
deploys; previously a no-package-change deploy would silently leave
the host on the old default_models.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 14:05:54 +03:00
03bed93fee add asset/manifest.yml describing fleet hosts and neuron flavours
Adds a single source of truth for which hosts run cortex vs neuron
and which CUDA compute-capability flavour each neuron host needs:

  cortex   : hanzalova.internal
  neurons  :
    beast      → helexa-neuron-blackwell  (2x RTX 5090, sm_120)
    benjy      → helexa-neuron-ada        (RTX 4090,    sm_89)
    quadbrat   → helexa-neuron-ampere     (RTX 3060,    sm_86)

script/deploy.sh (gitignored, local-only) now reads hosts and flavours
from this manifest and runs dnf install for the correct
helexa-neuron-<flavour> package on each host. Using
'dnf install --refresh --allowerasing' lets it swap out the previous
bare helexa-neuron RPM or a different flavour without manual
intervention; the spec Conflicts: clauses keep at most one flavour
resident.
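
As a rough illustration of how the manifest drives the install, the sketch below maps each neuron host to its flavour and builds the dnf command line. The host-to-flavour pairs and the dnf flags come from this commit; the dictionary and function names are hypothetical, and the real script reads asset/manifest.yml instead of a hard-coded table.

```python
from typing import Dict, List

# Host-to-flavour pairs as listed in the manifest summary above.
NEURON_FLAVOURS: Dict[str, str] = {
    "beast": "blackwell",
    "benjy": "ada",
    "quadbrat": "ampere",
}


def install_command(host: str) -> List[str]:
    """Build the dnf argv that installs the right neuron flavour for a host."""
    short = host.split(".", 1)[0]
    flavour = NEURON_FLAVOURS[short]  # KeyError for a host not in the manifest
    return [
        "dnf", "install", "--refresh", "--allowerasing",
        f"helexa-neuron-{flavour}",
    ]
```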

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:37:14 +03:00