cortex

Author	SHA1	Message	Date
rob thijssen	39010c779f	add script/validate-neuron.sh — end-to-end candle harness smoke test Loads a small public Qwen3 GGUF on a target neuron host, fires a deterministic reasoning probe ("What is the capital of France?"), and asserts the response contains 'Paris'. Used to validate the candle harness on a real GPU host before the Stage 7 TP work begins, and as a regression check after future neuron builds. Defaults to beast.hanzalova.internal + Qwen/Qwen3-1.7B-GGUF + Q4_K_M; all three are positional args so the same script tests any node / model combination. Polls /models after triggering the load since /models/load returns once the materialisation is queued, not finished. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:58:05 +03:00
rob thijssen	57d7ef8d3c	chore: revert dnf. runner user has no system privs All checks were successful CI / Format (push) Successful in 38s Details CI / Clippy (push) Successful in 2m20s Details CI / Test (push) Successful in 4m42s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details	2026-05-19 07:16:38 +03:00
rob thijssen	0e9671dd7d	fix(ci): drop sudo from dnf install (runner runs as root, no sudo) All checks were successful CI / Format (push) Successful in 36s Details CI / Clippy (push) Successful in 2m13s Details CI / Test (push) Successful in 4m17s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details The act runner container has no sudo binary; the runner user already runs as root inside the container. Existing steps (rpmbuild, gpg, etc) already invoke privileged commands directly without sudo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:06:52 +03:00
rob thijssen	e29c9e35f0	fix(ci): ensure rust toolchain present on cuda-13.0 runner The currently-published runner-cuda-13.0 image (gongfoo) is missing rust/cargo despite inheriting from runner-rust. Build-neuron fails immediately with 'cargo: command not found' even though build-cortex on the bare 'rust' runner builds fine. Add a defensive `dnf install rust cargo clippy` step at the top of build-neuron. Idempotent — on a properly-built runner image this is a fast no-op; on the current broken image it installs the toolchain in a few seconds. The runner image itself should be rebuilt in gongfoo so this step becomes redundant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:04:57 +03:00
rob thijssen	8a2334eacb	deploy: dnf-native version check + lair.cafe repo bootstrap Replaces the string compare of 'git describe --tags' vs the binary's self-reported --version (which lies about prereleases — every 0.1.16-* RPM reports just "0.1.16") with the dnf-native question of "is the installed package current against what the repo offers". Mechanism: - installed_nvr(): rpm -q --qf '%{version}-%{release}' for the resident package, falling back to "(not installed)". Capturing rpm's output through a variable keeps its "package X is not installed" stdout message out of the result on failure. - needs_update(): probes rpm -q first (treats absent as "needs work"), then asks dnf check-update --refresh -q. Other dnf failures collapse into "needs update" so the subsequent install surfaces a real error rather than this check swallowing one silently. - ensure_lair_repo(): probes for /etc/yum.repos.d/lair-cafe-unstable.repo and adds it with `dnf config-manager addrepo` when missing. The upstream .repo file ships enabled=0 (unstable channel doesn't auto-engage on fetch), so we then run `dnf config-manager setopt lair-cafe-unstable.enabled=1` every run — cheap, idempotent. - Cortex and neuron install branches now guard `systemctl stop` with `[ ! -f /usr/lib/systemd/system/...service ] \|\| sudo systemctl stop` so fresh installs (no unit file yet) don't short-circuit the install step under set -e. - dnf output is captured into a variable and only printed (with a [host] prefix per line) on failure, so success stays quiet and failures show the actual diagnostic instead of being eaten by &> /dev/null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:55:02 +03:00
rob thijssen	aad314cdfa	feat(neuron): graceful unload-on-shutdown via SIGTERM/SIGINT Stage 6 of the candle-native pivot. Adds first-class deactivation: neuron now drains in-flight requests on SIGTERM (systemd stop) or SIGINT (Ctrl-C), then unloads every loaded model before the process exits — releasing CUDA contexts and VRAM cleanly rather than leaving the OS to reclaim them. Mechanism: - startup::shutdown_signal() resolves on either ctrl_c() or a SIGTERM listener. - axum::serve(...).with_graceful_shutdown(shutdown_signal()) stops accepting new connections, lets active requests finish, then returns control to main. - startup::unload_all_models(&registry) iterates list_all_models() and calls unload per entry. Per-model failures are logged warnings; cleanup continues. Empty registry is a fast no-op. - main holds an Arc<NeuronState> reference past axum's lifetime so the registry is still reachable for the unload sweep. data/neuron.service: - TimeoutStopSec=120s — generous bound for big-model unloads before systemd escalates to SIGKILL. - KillSignal=SIGTERM — explicit, matches the handler. Two non-gated tests cover the empty-registry no-op and the no-models- loaded path. Real load-then-unload-on-shutdown is exercised by the cuda-integration test from Stage 2 (which calls unload_model directly) and observable on a real GPU host by stopping the service and watching nvidia-smi. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:58:07 +03:00
rob thijssen	6779b7526a	feat(neuron): load default_models on service activation All checks were successful CI / Format (push) Successful in 34s Details CI / Clippy (push) Successful in 2m13s Details CI / Test (push) Successful in 4m6s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Stage 5 of the candle-native pivot. Adds first-class support for auto-loading a configured set of models when the neuron service activates. Config: - NeuronConfig.default_models: Vec<ModelSpec> (defaults to []). - neuron.example.toml ships a commented [[default_models]] example. Activation flow (crates/neuron/src/startup.rs::load_default_models): - Sequential — VRAM contention makes parallel loads risky. - Per-entry timing logged at info level on success. - Failures logged as warnings; the next entry is still attempted. - An empty list short-circuits without log noise. Called from main.rs after the registry is built and before the axum listener binds, so /models reflects the loaded state from the very first request. data/neuron.service gains TimeoutStartSec=1800s. With activation blocked on potentially slow first-time HF downloads + GGUF materialisation, systemd's default 90s would kill larger model loads mid-flight. Two non-gated tests in tests/activation.rs cover the continues-past-failure and empty-list paths using a synthetically unknown harness name to fail loads fast without touching the network. The cuda-integration test from earlier stages still exercises the real load/unload lifecycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:56:08 +03:00
rob thijssen	84f5662df1	feat(neuron): OpenAI-compatible SSE streaming chat completions Stage 4 of the candle-native pivot. /v1/chat/completions now switches to text/event-stream when the request sets stream: true, emitting one chat.completion.chunk per generated token followed by the OpenAI [DONE] terminator. Pipeline: - chat_completion_stream creates a bounded mpsc::channel<ChatCompletionChunk>(32), sends the leading role chunk, then spawns a blocking task that acquires the per-model arch lock and runs the streaming generation loop. - run_inference_streaming tracks a cumulative decoded prefix so each chunk's delta.content is the substring added since the last chunk — safe across BPE byte-fallback boundaries that would otherwise split multi-byte UTF-8 chars. - The blocking task aborts cleanly if blocking_send fails (client disconnected), so generation stops when the SSE consumer hangs up. - Final chunk carries finish_reason ("stop" on EOS, "length" on max_tokens). The handler appends data: [DONE] after the channel closes. The Stage 3 streaming 501 placeholder test is repurposed: with the streaming path live, an unloaded model now hits the same 404 surface as the non-streaming path (the model lookup happens first). cortex-gateway's existing proxy is unchanged — it already forwards SSE bytes verbatim from Phase 2 work, so the candle SSE format passes through unmodified. Neuron Cargo.toml gains futures + tokio-stream (both already in workspace deps) for ReceiverStream and stream combinators. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:53:14 +03:00
rob thijssen	249c9442e8	chore: track deployment script All checks were successful CI / Format (push) Successful in 37s Details CI / Clippy (push) Successful in 2m2s Details CI / Test (push) Successful in 3m59s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details	2026-05-18 17:50:35 +03:00
rob thijssen	5e17081fb4	ci(prerelease): drop redundant rustup install step The build-cortex and build-neuron jobs were running a copied-from- mistralrs rustup install step. Both jobs use runner images that already provide rust via dnf: - runner-rust installs rust/cargo/clippy/rustfmt directly. - runner-cuda-13.0 extends runner-rust. Running 'rustup update stable' on top would install a parallel rustup-managed toolchain and shadow the dnf one — confusing and unnecessary. The existing ci.yml already trusts the dnf toolchain without any install step, so match that behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:47:29 +03:00
rob thijssen	03bed93fee	add asset/manifest.yml describing fleet hosts and neuron flavours All checks were successful CI / Format (push) Successful in 28s Details CI / Clippy (push) Successful in 2m54s Details CI / Test (push) Successful in 5m37s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Adds a single source of truth for which hosts run cortex vs neuron and which CUDA compute-capability flavour each neuron host needs: cortex : hanzalova.internal neurons : beast → helexa-neuron-blackwell (2x RTX 5090, sm_120) benjy → helexa-neuron-ada (RTX 4090, sm_89) quadbrat → helexa-neuron-ampere (RTX 3060, sm_86) script/deploy.sh (gitignored, local-only) is updated locally to read hosts and flavours from this manifest and dnf install the correct helexa-neuron-<flavour> package per host. Using 'dnf install --refresh --allowerasing' lets it swap out the previous bare helexa-neuron RPM or a different flavour without manual intervention; the spec Conflicts: clauses keep at most one flavour resident. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:37:14 +03:00
rob thijssen	4a5211d830	ci(prerelease): add ampere flavour alongside ada and blackwell Adds ampere (CUDA compute capability sm_86) to both the build-neuron and package-neuron matrices, so helexa-neuron-ampere RPMs are built and published alongside helexa-neuron-ada and helexa-neuron-blackwell. The prerelease spec already lists ampere in its Conflicts: clause, so no spec change is needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:28:19 +03:00
rob thijssen	6d2dc5ff1a	fix(ci): give fmt/clippy/test distinct CARGO_TARGET_DIR to avoid races After the candle deps were added, cargo builds run long enough that the parallel fmt/clippy/test jobs (all on the `rust` runner label, which appears to use act in host-executor mode) start racing each other's intermediate temp files under /root/.cache/act/<hash>/hostexecutor/target/debug/deps/ Concretely the test job hit: error: No such file or directory at path "target/debug/deps/.tmprlicL7" Compiling unicode-ident because another job's cargo invocation cleaned up the temp file mid-compile. fmt and clippy happened to finish without their own target races landing fatally, so only test failed visibly. Set CARGO_TARGET_DIR=target-${{ github.job }} at the workflow level so each job writes to its own target directory. sccache still backs the actual rustc cache, so the rebuild penalty is just metadata not full recompiles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:26:29 +03:00
rob thijssen	b713dbe669	fix(ci): pass GPG secrets via env to avoid Gitea log leakage Some checks failed CI / Format (push) Successful in 28s Details CI / Test (push) Failing after 43s Details CI / Clippy (push) Successful in 2m9s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details The previous "Import signing key" step inlined ${{ secrets.RPM_SIGNING_KEY }} and ${{ secrets.RPM_SIGNING_KEY_ID }} directly into the run: block. Template expansion writes the literal secret value into the rendered shell script, and Gitea logs the rendered script — Gitea's masker may not reliably scrub multi-line keys, so values can leak. Move both secrets into the step's env: block (the same pattern the "Set up SSH" step already uses) and reference $VARs in the script. The script body now contains only variable names; the secret values live in the process environment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:13:52 +03:00
rob thijssen	5c957d08ec	ci: add build-prerelease workflow for CUDA RPMs on rpm.lair.cafe Some checks failed CI / Format (push) Successful in 36s Details CI / Test (push) Failing after 53s Details CI / Clippy (push) Successful in 2m35s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Adds a manually-triggered workflow that builds CUDA-flavoured neuron binaries and a CPU cortex binary, packages them as Fedora RPMs, signs them, and rsyncs to the unstable channel at https://rpm.lair.cafe/fedora/43/x86_64/unstable/. Mirrors the build pipeline used by grenade/mistralrs-package. Pipeline: - prepare: derive {version,short_sha,commit_date} from the checkout; the prerelease Release stamp "0.1.YYYYMMDDgitSHORTSHA" sorts below the eventual "1" stable release. - build-cortex: cargo build --release -p cortex-cli on a rust runner. - build-neuron: matrix over ada (sm_89) and blackwell (sm_120) on cuda-13.0 runners; cargo build with features "cuda cudnn flash-attn" and CUDA_COMPUTE_CAP set per flavour. - package-{cortex,neuron}: rpmbuild on the rpm runner against the new prebuilt-binary specs in rpm/. - publish: import signing key, sign RPMs, rsync to oolon, createrepo_c --update, then regenerate packages.json for the UI. New specs are prebuilt-binary variants — they consume the artifact from the build job rather than running cargo at rpmbuild time. Each helexa-neuron-{flavour} package Conflicts with the other flavours and with helexa-neuron (the future source-build stable package) so one flavour is installed at a time on a given host. neuron crate gains cudnn and flash-attn feature flags forwarding to the corresponding candle features, so the CI build command compiles those kernels into the binary. sccache is intentionally NOT used in the prerelease jobs — CUDA compute cap isn't in its cache key, so flavours would mis-hit each other. Each prerelease build is a clean cargo build. Required Gitea secrets (already in place for cortex.spec / COPR workflow): - RPM_SIGNING_KEY, RPM_SIGNING_KEY_ID - RSYNC_SSH_KEY Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:01:35 +03:00
rob thijssen	729317d1ef	feat(neuron): OpenAI-compatible non-streaming chat completion Stage 3 of the candle-native pivot. neuron now serves POST /v1/chat/completions backed by candle's quantized_qwen3 forward pass on a per-model serialised generation loop, returning the standard OpenAI ChatCompletionResponse envelope. Pipeline per request: - Look up the LoadedModel by request.model (404 if absent). - Apply the Qwen3 chat template across all messages. - Tokenize, then spawn_blocking onto tokio's blocking pool to acquire the per-model arch lock and run prefill + greedy/temperature/top-p sampling via LogitsProcessor. - Stop on <\|im_end\|>/<\|endoftext\|> EOS or max_tokens (finish_reason "stop" vs "length"). - Decode with skip_special_tokens=true, build OpenAI response with prompt/completion/total usage counts. Supporting changes: - HarnessRegistry now stores Arc<dyn Harness> and caches a typed Arc<CandleHarness> so inference routes bypass dyn-Trait dispatch. - LoadedModel.arch becomes Arc<Mutex<ModelArch>> so the lock guard can be moved into spawn_blocking. - NeuronState gains an Option<Arc<CandleHarness>> field for the new inference route. - Typed InferenceError lets the handler map ModelNotLoaded → 404 and other failures → 500 without string-matching anyhow messages. - stream=true returns 501 until Stage 4 wires up SSE. - Two leftover mistral.rs string references in proxy.rs and cortex-cli (missed during the Stage 1 sweep) are corrected here. Three new default-feature tests cover the no-candle 503, model-not- loaded 404, and stream=true 501 paths. The cuda-integration test from Stage 2 still covers real load/unload; a streaming-feature gated test exercising actual generation will arrive with Stage 4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:47:58 +03:00
rob thijssen	5c2bd1a1da	feat(neuron): wire candle harness load/unload via GGUF Stage 2 of the candle-native pivot. Fleshes out CandleHarness with a LoadedModel registry keyed by model_id, hf-hub-backed GGUF download, and Qwen3 quantized weight construction via candle-transformers' quantized_qwen3 module. unload_model drops the entry; Drop on the candle ModelWeights frees device memory. Device selection prefers CUDA (gated behind the new `cuda` feature), falling back to CPU when CUDA is unavailable so default builds work on non-GPU hosts. The candle CUDA toolchain isn't pulled in unless `--features cuda` is passed, keeping CI green on CPU runners. Config gains a [harness.candle] block with an optional hf_cache path. HarnessRegistry::from_configs now takes HarnessSettings so per-harness config flows through. A gated tests/candle_lifecycle.rs exercises real load → list → unload → list-empty when run with `--features cuda-integration` against a host with HF network access. The default-feature test in tests/api.rs covers the wrong-harness rejection path without needing the network. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:02:49 +03:00
rob thijssen	3cccc2c56b	refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness Stage 1 of the candle-native pivot. Replaces the external-process harness model (mistralrs over HTTP, llamacpp placeholder) with an in-process Harness trait whose sole implementation is candle. The trait keeps its shape so future engines slot in additively, but start/stop default to no-ops and HarnessConfig drops endpoint and systemd_unit since no harness needs external supervision. Behaviour is unchanged on the wire: load_model returns a "not implemented yet (Stage 2)" error and list_models is empty. The gateway-side proxy, poller, and router are untouched. CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are marked superseded; the staged plan lives in ~/.claude/plans/create-a-more-aggressive-calm-naur.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:53:04 +03:00
rob thijssen	7f797b0265	ci: parallelise fmt/clippy/test and drop sccache install step All checks were successful CI / Format (push) Successful in 33s Details CI / Clippy (push) Successful in 1m31s Details CI / Test (push) Successful in 2m11s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 13:55:17 +03:00
rob thijssen	5a0360c1d5	ci: use container runner labels for CI jobs Some checks failed CI / Format, lint, build, test (push) Successful in 4m20s Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 13:29:42 +03:00
rob thijssen	472c0e8737	fix(rpm): ship firewalld service definitions with correct ports Some checks failed CI / Format, lint, build, test (push) Has been cancelled Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details cortex: opens 31313/tcp (API) and 31314/tcp (metrics) neuron: opens 13131/tcp Installs to /usr/lib/firewalld/services/ so firewall-cmd --add-service=cortex / --add-service=helexa-neuron works out of the box. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 12:52:20 +03:00
Gitea Actions	b9d8e30058	chore: bump version to 0.1.16	2026-04-16 15:04:21 +00:00
rob thijssen	25f75fe552	chore: ignore local deploy script All checks were successful CI / Format, lint, build, test (push) Successful in 1m15s Details CI / Build cortex SRPM (push) Successful in 43s Details CI / Build neuron SRPM (push) Successful in 44s Details CI / Publish cortex to COPR (push) Successful in 7m23s Details CI / Publish neuron to COPR (push) Successful in 15m58s Details CI / Bump version in source (push) Successful in 31s Details v0.1.16	2026-04-16 17:45:25 +03:00
rob thijssen	3f94c50817	chore: move default ports out of common-collision ranges Previous defaults collided with well-trodden infra services and with the Linux ephemeral port range: - cortex API 8000 — common dev-server default (Django, minio UI) - cortex metrics 9100 — Prometheus node_exporter default - neuron API 9090 — Cockpit default on Fedora, Prometheus self Move to helexa-themed palindromic ports, all below Linux's 32768-60999 ephemeral range and not registered to any well-known service: - cortex API 31313 - cortex metrics 31314 - neuron API 13131 Updated places: - cortex.example.toml, neuron.example.toml defaults - default impls in cortex-core and neuron config - cortex-cli --endpoint default for the status subcommand - doc comments citing example URLs - README.md and CLAUDE.md snippets Consumers already on the old ports need a one-line edit in their /etc/cortex/cortex.toml or /etc/neuron/neuron.toml to match; firewall rules and prometheus scrape configs will also need updating. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 17:45:25 +03:00
rob thijssen	3e1fb60076	ci: drop actions/cache for cargo registry and target The cache round-trip (download + unpack) was consistently taking around 6 minutes, noticeably longer than the ~3 minute cold build it was meant to accelerate. Net-negative on CI time — remove it. sccache with the S3 backend still provides dep-level caching at a much lower overhead, so we keep the majority of the cache benefit without paying the actions/cache tarball cost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 17:45:25 +03:00
Gitea Actions	9bf987888c	chore: bump version to 0.1.14	2026-04-16 16:57:24 +03:00
rob thijssen	abe4ff7ccc	ci: publish both packages to a single helexa/helexa COPR project All checks were successful CI / Format, lint, build, test (push) Successful in 9m50s Details CI / Build neuron SRPM (push) Successful in 43s Details CI / Build cortex SRPM (push) Successful in 48s Details CI / Publish neuron to COPR (push) Successful in 6m14s Details CI / Publish cortex to COPR (push) Successful in 7m53s Details CI / Bump version in source (push) Successful in 31s Details Consolidates the previous helexa/cortex and helexa/helexa-neuron COPR projects into one shared project. Hosts enable a single repo and get access to both packages — cortex for gateway hosts and helexa-neuron for GPU nodes. Reduces the "which copr do I enable on this host" friction, and makes it clear the two packages are parts of the same helexa project suite. CI keeps two independent publish jobs (copr-cortex and copr-neuron) running in parallel; they now both target helexa/helexa with their respective SRPMs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.14	2026-04-16 16:37:47 +03:00
rob thijssen	7c3390a4e1	fix(rpm): rename neuron package to helexa-neuron Fedora's official repos ship a package named `neuron` — the NEURON neural-simulation environment from Yale (see https://src.fedoraproject.org/rpms/neuron). Having our own `neuron` in the helexa COPR caused dnf5 to silently no-op `dnf install neuron` because of the name collision, even with the COPR repo enabled and keys imported. The only workarounds were full NEVRA (`dnf install neuron-0.1.12-1.fc43.x86_64`) or a local file install — neither acceptable for end-users. Rename the RPM package to `helexa-neuron`. Keep binary (/usr/bin/neuron), systemd unit (neuron.service), system user (neuron), and config dir (/etc/neuron) unchanged — those are project-local contexts where the short name is unambiguous. Follows Fedora subpackage-style naming except with a vendor prefix rather than a parent-package prefix, because neuron is an independent package from cortex (installed on different hosts) and neither depends on the other. Changes: - neuron.spec -> helexa-neuron.spec (git rename) - Name: neuron -> helexa-neuron (with comment explaining why) - CI: srpm-neuron job now builds helexa-neuron-VERSION.tar.gz with the matching top-level dir prefix, publishes to helexa/helexa-neuron COPR - CI: bump-version job references helexa-neuron.spec - CLAUDE.md: install instructions updated Old helexa/neuron COPR project can be deleted after the first helexa/helexa-neuron build lands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:37:47 +03:00
rob thijssen	2ff062da0e	ci: commit generated %changelog entries back to main Previously the srpm-* jobs generated a fresh %changelog entry and shipped it to COPR, but the version-stamped spec pushed back to main by the bump-version job only updated the Version: line — not the %changelog section. The result: SRPM and in-tree spec diverged and a fresh clone of the repo showed a perpetually empty changelog. Run the rpm-changelog action in bump-version too. Now the committed specs track the SRPMs: each release leaves a dated %changelog entry in main covering commits since the previous tag, visible in git log and in the repo's spec browser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:37:03 +03:00
Gitea Actions	357f858a29	chore: bump version to 0.1.12	2026-04-16 15:47:21 +03:00
rob thijssen	556e5293dc	fix(rpm): explicitly Provides user(name) to satisfy systemd unit Requires All checks were successful CI / Format, lint, build, test (push) Successful in 2m59s Details CI / Build cortex SRPM (push) Successful in 44s Details CI / Build neuron SRPM (push) Successful in 49s Details CI / Publish neuron to COPR (push) Successful in 8m17s Details CI / Publish cortex to COPR (push) Successful in 9m56s Details CI / Bump version in source (push) Successful in 30s Details Diagnosing the persistent "Nothing to do" on v0.1.10 surfaced that removing %attr(,,name) from %files wasn't enough. systemd-rpm-macros ships its own rpm dep generator (/usr/lib/rpm/systemd.req) that parses User=/Group= directives from every .service file the package ships and emits Requires: user(NAME)/group(NAME) accordingly. Rpmbuild log from v0.1.10 shows these Requires are still emitted even after the %attr removal. Meanwhile the sysusers provides-generator emits group(NAME) in both unversioned and versioned forms, but only a versioned user(NAME) = <base64> when the u-line has GECOS/home/shell fields. The asymmetry leaves Requires: user(NAME) unresolvable. Add explicit Provides: user(NAME) back to both specs, with a comment documenting the actual cause (systemd unit parsing, not file attrs) so the next person touching these specs doesn't repeat the mistake. Why monsoon didn't hit this: it creates its user in %pre via groupadd/useradd (not sysusers.d), so no Provides are generated at all — matching the Requires: user(monsoon) by luck of the rpm solver treating unknown symbols as soft-fails for that path. Ours went through the sysusers Provides code path and hit the asymmetry instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.12	2026-04-16 15:32:51 +03:00
rob thijssen	1d90238b01	ci: migrate rpm changelog generation to reusable action Replace the local .gitea/scripts/generate-rpm-changelog.sh with the shared composite action at https://git.lair.cafe/actions/rpm-changelog@v1. Behaviour is identical — collect commits since the previous v* tag, filter bump-version and merge noise, prepend a dated entry to the spec — but the logic now lives in one place that other projects can consume. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:32:51 +03:00
rob thijssen	d99b25fb8a	ci: auto-generate rpm changelog entry per release On every tag push, build a %changelog entry from the git log since the previous v* tag and prepend it to each spec. Stops the initial entry from drifting further and catches bogus-date / stale-version warnings automatically since the generated date always matches the day the CI runs. The generator drops "chore: bump version" commits (bot-authored, noisy in user-facing changelogs) and merge commits. Author defaults to the gitea-actions identity but can be overridden via CHANGELOG_AUTHOR env var if a human release is desired. Requires fetch-depth: 0 on checkout so git describe can see prior tags and git log can reach them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:32:51 +03:00
rob thijssen	034da319f1	fix(rpm): correct weekday in changelog entry April 15 2026 was a Wednesday, not Tuesday. rpmbuild validates the day-of-week against the date and warns on mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:32:51 +03:00
Gitea Actions	7ece281617	chore: bump version to 0.1.10	2026-04-16 15:06:18 +03:00
rob thijssen	3bb5b3c425	fix(rpm): drop %attr(,,user) on config files to avoid dnf silent filter All checks were successful CI / Format, lint, build, test (push) Successful in 1m11s Details CI / Publish cortex to COPR (push) Successful in 11m3s Details CI / Build cortex SRPM (push) Successful in 43s Details CI / Build neuron SRPM (push) Successful in 43s Details CI / Publish neuron to COPR (push) Successful in 8m56s Details CI / Bump version in source (push) Successful in 30s Details Using %attr(,,cortex) / %attr(,,neuron) on config files caused rpm's auto-dep-generator to emit Requires: user(name) and group(name) on each package. When those Requires couldn't be resolved — whether due to sysusers Provides mismatches, missing GPG keys, or dnf5 cache state — dnf5 silently filtered the package out of the candidate set and reported "Nothing to do" rather than an unsatisfied-dep error. Adopt the pattern that already works reliably across our infra (grenade/monsoon): ship config files as default root:root with 0644 perms, don't declare user/group ownership in the rpm file list. systemd-sysusers still creates the service user via the shipped sysusers.d file; the service drops to that user at runtime via the User= directive in the unit. This removes the user(cortex)/user(neuron) Requires entirely, which is the root cause of the dnf5 filtering. File permission tightening can be reintroduced later — either via a separate secrets file with different mode bits, or by moving secret material to /var/lib/<svc>/ where the service drop-privileges account already has write access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.10	2026-04-16 14:50:17 +03:00
Gitea Actions	9fa51ad874	chore: bump version to 0.1.8	2026-04-16 10:56:07 +00:00
rob thijssen	9697fbae73	fix(neuron): run service as neuron user, not cortex All checks were successful CI / Format, lint, build, test (push) Successful in 2m22s Details CI / Build cortex SRPM (push) Successful in 43s Details CI / Build neuron SRPM (push) Successful in 43s Details CI / Publish neuron to COPR (push) Successful in 8m49s Details CI / Publish cortex to COPR (push) Successful in 11m22s Details CI / Bump version in source (push) Successful in 31s Details neuron and cortex are independent packages installable on different hosts. Having neuron run under a 'cortex' system user implied a shared identity that doesn't exist. Give neuron its own user/group. - New data/neuron-sysusers.conf declares the neuron user/group with home /var/lib/neuron. - systemd unit User/Group changed to neuron. - Spec file attrs, explicit Provides, and %sysusers_create_compat updated to reference the neuron user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.8	2026-04-16 13:32:36 +03:00
Gitea Actions	2ce1060cb8	chore: bump version to 0.1.7	2026-04-16 13:25:34 +03:00
rob thijssen	142e91c3f7	fix(neuron): install config at /etc/neuron/, not /etc/cortex/ All checks were successful CI / Format, lint, build, test (push) Successful in 4m45s Details CI / Build neuron SRPM (push) Successful in 44s Details CI / Build cortex SRPM (push) Successful in 45s Details CI / Publish neuron to COPR (push) Successful in 8m52s Details CI / Publish cortex to COPR (push) Successful in 11m17s Details CI / Bump version in source (push) Successful in 30s Details The neuron package was shipping its config at /etc/cortex/neuron.toml, which implied a shared config directory between two independent packages. Move to /etc/neuron/neuron.toml — neuron owns its own etc dir, consistent with its own /usr/lib/sysusers.d/neuron.conf and /usr/lib/systemd/system/neuron.service. Updated the systemd unit's ExecStart path and the example toml header to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.7	2026-04-16 13:07:06 +03:00
Gitea Actions	52c8b4c983	chore: bump version to 0.1.5	2026-04-16 13:01:42 +03:00
rob thijssen	4a9a4fc775	ci: migrate copr publish to reusable action All checks were successful CI / Format, lint, build, test (push) Successful in 1m26s Details CI / Build neuron SRPM (push) Successful in 45s Details CI / Build cortex SRPM (push) Successful in 44s Details CI / Publish neuron to COPR (push) Successful in 8m22s Details CI / Publish cortex to COPR (push) Successful in 11m0s Details CI / Bump version in source (push) Successful in 30s Details Replace the in-repo .gitea/scripts/copr-build.sh and per-job copr-cli configuration with the shared composite action at https://git.lair.cafe/actions/copr-publish@v1. Behaviour is identical — submit, watch, dump per-chroot logs — but the logic now lives in a single place that other projects can consume. Removes the actions/checkout step from both COPR jobs since the build script is no longer local to this repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.5	2026-04-16 12:34:39 +03:00
rob thijssen	53a3c1e157	fix(rpm): explicitly Provides user(cortex)/group(cortex) All checks were successful CI / Format, lint, build, test (push) Successful in 57s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details dnf5 was silently rejecting neuron-0.1.3 with "Nothing to do" because it had an unresolvable Requires. Inspection showed: Requires: user(cortex) ← unversioned Provides: user(cortex) = <base64> ← versioned only, no unversioned rpm's sysusers provides-generator only emits the unversioned user() provide when the u-line is minimal. Our sysusers.conf specifies GECOS, home dir, and shell, which pushes the generator to versioned-only. The matching Requires (auto-generated from %attr(,,cortex) on config files) is unversioned, so resolution failed silently. Explicitly declare Provides: user(cortex) and Provides: group(cortex) to guarantee the unversioned forms exist. group(cortex) was already emitted unversioned but adding it for symmetry and to protect against future generator changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:06:05 +03:00
rob thijssen	5c7d63c658	ci: dump COPR per-chroot build logs to CI output Previously the COPR publish steps only surfaced copr-cli's status updates (pending/importing/running). When a build failed, diagnosing required clicking through to the COPR web UI. Now we submit with --nowait, watch the build, then use copr-cli download-build to fetch each chroot's builder-live.log and cat them as collapsible ::group:: blocks in the CI output. Logic is factored into .gitea/scripts/copr-build.sh so cortex and neuron jobs share it. Both COPR jobs now check out the repo to access the script. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:06:05 +03:00
Gitea Actions	f161412f91	chore: bump version to 0.1.3	2026-04-16 11:41:11 +03:00
rob thijssen	ba5020138f	fix(rpm): rename sysusers files to match package names All checks were successful CI / Format, lint, build, test (push) Successful in 3m35s Details CI / Build cortex SRPM (push) Successful in 1m46s Details CI / Build neuron SRPM (push) Successful in 1m41s Details CI / Publish cortex to COPR (push) Successful in 7m14s Details CI / Publish neuron to COPR (push) Successful in 5m44s Details CI / Bump version in source (push) Successful in 30s Details cortex-gateway.conf/cortex-neuron.conf implied a hierarchy or coupling that doesn't exist — cortex and neuron are independent packages. Each package's sysusers.d file now matches the package name: cortex ships cortex.conf, neuron ships neuron.conf. Content is still identical (both create the cortex system user/group), and filenames remain distinct so the packages can coinstall. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.3	2026-04-16 11:20:08 +03:00
rob thijssen	209150771e	fix(rpm): use sysusers.d for cortex user/group creation Both packages set %attr(...,cortex) on their config files, which caused RPM's auto-dep-generator to emit Requires: group(cortex) / user(cortex). The %pre scriptlets that actually created the group ran too late — dnf rejected neuron installation on hosts without cortex because nothing Provided group(cortex). Switch to systemd-sysusers declarative user creation: each package ships its own named sysusers.d file (cortex-gateway.conf and cortex-neuron.conf — different names so both packages can coinstall) with identical content defining the cortex user/group. RPM's user/group dep generator now emits Provides: user(cortex) and Provides: group(cortex) automatically from the sysusers.d files, satisfying the auto-generated Requires. Either package installs standalone; both can coinstall on the gateway host if desired. Also added Requires: systemd since %sysusers_create_compat depends on systemd-sysusers being present on the target. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 11:18:37 +03:00
Gitea Actions	7c60af3464	chore: bump version to 0.1.2	2026-04-16 11:03:29 +03:00
rob thijssen	ada76b0153	fix(rpm): add missing native build dependencies All checks were successful CI / Format, lint, build, test (push) Successful in 4m34s Details CI / Build neuron SRPM (push) Successful in 1m49s Details CI / Build cortex SRPM (push) Successful in 44s Details CI / Publish cortex to COPR (push) Successful in 7m14s Details CI / Publish neuron to COPR (push) Successful in 5m43s Details CI / Bump version in source (push) Successful in 52s Details COPR build failed on openssl-sys because openssl headers were not available in the mock chroot. Adding: - pkgconfig(openssl): fixes the immediate openssl-sys failure. Kept as a build dep because we plan to add optional mTLS between cortex and neuron, which requires native-tls/openssl at build time. - cmake, gcc-c++: aws-lc-sys (pulled via rustls) compiles libcrypto via cmake and includes C++ sources. Would be the next failure after openssl. - perl-interpreter: catchall for -sys crate build scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.2	2026-04-16 10:49:20 +03:00
rob thijssen	15ded3a5bd	ci: cache target/, disable incremental, drop redundant build Three complementary tweaks to close the gap sccache alone can't: - CARGO_INCREMENTAL=0: reclaims the 17 incremental-mode cache misses per run and prevents cargo from writing incremental fingerprints that defeat sccache. Incremental mode is useless in CI anyway since each run starts from scratch. - actions/cache for ~/.cargo and target/: sidesteps sccache's structural limits (proc-macro non-cacheables, clippy-vs-rustc separate namespaces) by caching the whole build output keyed on Cargo.lock. Also caches ~/.cargo/bin so the installed sccache binary survives between runs. - Drop the separate 'cargo build' step: 'cargo test --workspace' builds everything anyway, so the standalone build was a full redundant workspace compile pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 09:44:45 +03:00

1 2

70 Commits