Commit Graph

68 Commits

Author SHA1 Message Date
0e9671dd7d fix(ci): drop sudo from dnf install (runner runs as root, no sudo)
All checks were successful
CI / Format (push) Successful in 36s
CI / Clippy (push) Successful in 2m13s
CI / Test (push) Successful in 4m17s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
The act runner container has no sudo binary; the runner user already
runs as root inside the container. Existing steps (rpmbuild, gpg, etc)
already invoke privileged commands directly without sudo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 07:06:52 +03:00
e29c9e35f0 fix(ci): ensure rust toolchain present on cuda-13.0 runner
The currently-published runner-cuda-13.0 image (gongfoo) is missing
rust/cargo despite inheriting from runner-rust. Build-neuron fails
immediately with 'cargo: command not found' even though build-cortex
on the bare 'rust' runner builds fine.

Add a defensive `dnf install rust cargo clippy` step at the top of
build-neuron. Idempotent — on a properly-built runner image this is
a fast no-op; on the current broken image it installs the toolchain
in a few seconds. The runner image itself should be rebuilt in
gongfoo so this step becomes redundant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 07:04:57 +03:00
8a2334eacb deploy: dnf-native version check + lair.cafe repo bootstrap
Replaces the string compare of 'git describe --tags' vs the binary's
self-reported --version (which lies about prereleases — every
0.1.16-* RPM reports just "0.1.16") with the dnf-native question of
"is the installed package current against what the repo offers".

Mechanism:
- installed_nvr(): rpm -q --qf '%{version}-%{release}' for the
  resident package, falling back to "(not installed)". Capturing rpm's
  output through a variable keeps its "package X is not installed"
  stdout message out of the result on failure.
- needs_update(): probes rpm -q first (treats absent as "needs work"),
  then asks dnf check-update --refresh -q. Other dnf failures collapse
  into "needs update" so the subsequent install surfaces a real error
  rather than this check swallowing one silently.
- ensure_lair_repo(): probes for /etc/yum.repos.d/lair-cafe-unstable.repo
  and adds it with `dnf config-manager addrepo` when missing. The
  upstream .repo file ships enabled=0 (unstable channel doesn't
  auto-engage on fetch), so we then run `dnf config-manager setopt
  lair-cafe-unstable.enabled=1` every run — cheap, idempotent.
- Cortex and neuron install branches now guard `systemctl stop` with
  `[ ! -f /usr/lib/systemd/system/...service ] || sudo systemctl stop`
  so fresh installs (no unit file yet) don't short-circuit the install
  step under set -e.
- dnf output is captured into a variable and only printed (with a
  [host]   prefix per line) on failure, so success stays quiet and
  failures show the actual diagnostic instead of being eaten by
  &> /dev/null.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:55:02 +03:00
aad314cdfa feat(neuron): graceful unload-on-shutdown via SIGTERM/SIGINT
Stage 6 of the candle-native pivot. Adds first-class deactivation:
neuron now drains in-flight requests on SIGTERM (systemd stop) or
SIGINT (Ctrl-C), then unloads every loaded model before the process
exits — releasing CUDA contexts and VRAM cleanly rather than leaving
the OS to reclaim them.

Mechanism:
- startup::shutdown_signal() resolves on either ctrl_c() or a
  SIGTERM listener.
- axum::serve(...).with_graceful_shutdown(shutdown_signal()) stops
  accepting new connections, lets active requests finish, then
  returns control to main.
- startup::unload_all_models(&registry) iterates list_all_models()
  and calls unload per entry. Per-model failures are logged warnings;
  cleanup continues. Empty registry is a fast no-op.
- main holds an Arc<NeuronState> reference past axum's lifetime so
  the registry is still reachable for the unload sweep.

data/neuron.service:
- TimeoutStopSec=120s — generous bound for big-model unloads before
  systemd escalates to SIGKILL.
- KillSignal=SIGTERM — explicit, matches the handler.

Two non-gated tests cover the empty-registry no-op and the no-models-
loaded path. Real load-then-unload-on-shutdown is exercised by the
cuda-integration test from Stage 2 (which calls unload_model directly)
and observable on a real GPU host by stopping the service and
watching nvidia-smi.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:58:07 +03:00
6779b7526a feat(neuron): load default_models on service activation
All checks were successful
CI / Format (push) Successful in 34s
CI / Clippy (push) Successful in 2m13s
CI / Test (push) Successful in 4m6s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
Stage 5 of the candle-native pivot. Adds first-class support for
auto-loading a configured set of models when the neuron service
activates.

Config:
- NeuronConfig.default_models: Vec<ModelSpec> (defaults to []).
- neuron.example.toml ships a commented [[default_models]] example.

Activation flow (crates/neuron/src/startup.rs::load_default_models):
- Sequential — VRAM contention makes parallel loads risky.
- Per-entry timing logged at info level on success.
- Failures logged as warnings; the next entry is still attempted.
- An empty list short-circuits without log noise.

Called from main.rs after the registry is built and before the axum
listener binds, so /models reflects the loaded state from the very
first request.

data/neuron.service gains TimeoutStartSec=1800s. With activation
blocked on potentially slow first-time HF downloads + GGUF
materialisation, systemd's default 90s would kill larger model loads
mid-flight.

Two non-gated tests in tests/activation.rs cover the
continues-past-failure and empty-list paths using a synthetically
unknown harness name to fail loads fast without touching the network.
The cuda-integration test from earlier stages still exercises the
real load/unload lifecycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:56:08 +03:00
84f5662df1 feat(neuron): OpenAI-compatible SSE streaming chat completions
Stage 4 of the candle-native pivot. /v1/chat/completions now switches
to text/event-stream when the request sets stream: true, emitting one
chat.completion.chunk per generated token followed by the OpenAI
[DONE] terminator.

Pipeline:
- chat_completion_stream creates a bounded mpsc::channel<ChatCompletionChunk>(32),
  sends the leading role chunk, then spawns a blocking task that
  acquires the per-model arch lock and runs the streaming generation
  loop.
- run_inference_streaming tracks a cumulative decoded prefix so each
  chunk's delta.content is the substring added since the last chunk —
  safe across BPE byte-fallback boundaries that would otherwise split
  multi-byte UTF-8 chars.
- The blocking task aborts cleanly if blocking_send fails (client
  disconnected), so generation stops when the SSE consumer hangs up.
- Final chunk carries finish_reason ("stop" on EOS, "length" on
  max_tokens). The handler appends data: [DONE] after the channel
  closes.

The Stage 3 streaming 501 placeholder test is repurposed: with the
streaming path live, an unloaded model now hits the same 404 surface
as the non-streaming path (the model lookup happens first).

cortex-gateway's existing proxy is unchanged — it already forwards
SSE bytes verbatim from Phase 2 work, so the candle SSE format passes
through unmodified.

Neuron Cargo.toml gains futures + tokio-stream (both already in
workspace deps) for ReceiverStream and stream combinators.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:53:14 +03:00
249c9442e8 chore: track deployment script
All checks were successful
CI / Format (push) Successful in 37s
CI / Clippy (push) Successful in 2m2s
CI / Test (push) Successful in 3m59s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
2026-05-18 17:50:35 +03:00
5e17081fb4 ci(prerelease): drop redundant rustup install step
The build-cortex and build-neuron jobs were running a copied-from-
mistralrs rustup install step. Both jobs use runner images that
already provide rust via dnf:

- runner-rust installs rust/cargo/clippy/rustfmt directly.
- runner-cuda-13.0 extends runner-rust.

Running 'rustup update stable' on top would install a parallel
rustup-managed toolchain and shadow the dnf one — confusing and
unnecessary. The existing ci.yml already trusts the dnf toolchain
without any install step, so match that behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:47:29 +03:00
03bed93fee add asset/manifest.yml describing fleet hosts and neuron flavours
All checks were successful
CI / Format (push) Successful in 28s
CI / Clippy (push) Successful in 2m54s
CI / Test (push) Successful in 5m37s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
Adds a single source of truth for which hosts run cortex vs neuron
and which CUDA compute-capability flavour each neuron host needs:

  cortex   : hanzalova.internal
  neurons  :
    beast      → helexa-neuron-blackwell  (2x RTX 5090, sm_120)
    benjy      → helexa-neuron-ada        (RTX 4090,    sm_89)
    quadbrat   → helexa-neuron-ampere     (RTX 3060,    sm_86)

script/deploy.sh (gitignored, local-only) is updated locally to read
hosts and flavours from this manifest and dnf install the correct
helexa-neuron-<flavour> package per host. Using
'dnf install --refresh --allowerasing' lets it swap out the previous
bare helexa-neuron RPM or a different flavour without manual
intervention; the spec Conflicts: clauses keep at most one flavour
resident.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:37:14 +03:00
4a5211d830 ci(prerelease): add ampere flavour alongside ada and blackwell
Adds ampere (CUDA compute capability sm_86) to both the build-neuron
and package-neuron matrices, so helexa-neuron-ampere RPMs are built
and published alongside helexa-neuron-ada and helexa-neuron-blackwell.

The prerelease spec already lists ampere in its Conflicts: clause, so
no spec change is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:28:19 +03:00
6d2dc5ff1a fix(ci): give fmt/clippy/test distinct CARGO_TARGET_DIR to avoid races
After the candle deps were added, cargo builds run long enough that
the parallel fmt/clippy/test jobs (all on the `rust` runner label,
which appears to use act in host-executor mode) start racing each
other's intermediate temp files under
  /root/.cache/act/<hash>/hostexecutor/target/debug/deps/

Concretely the test job hit:
  error: No such file or directory at path
  "target/debug/deps/.tmprlicL7"
  Compiling unicode-ident
because another job's cargo invocation cleaned up the temp file
mid-compile. fmt and clippy happened to finish without their own
target races landing fatally, so only test failed visibly.

Set CARGO_TARGET_DIR=target-${{ github.job }} at the workflow level
so each job writes to its own target directory. sccache still backs
the actual rustc cache, so the rebuild penalty is just metadata not
full recompiles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:26:29 +03:00
b713dbe669 fix(ci): pass GPG secrets via env to avoid Gitea log leakage
Some checks failed
CI / Format (push) Successful in 28s
CI / Test (push) Failing after 43s
CI / Clippy (push) Successful in 2m9s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
The previous "Import signing key" step inlined ${{ secrets.RPM_SIGNING_KEY }}
and ${{ secrets.RPM_SIGNING_KEY_ID }} directly into the run: block.
Template expansion writes the literal secret value into the rendered
shell script, and Gitea logs the rendered script — Gitea's masker may
not reliably scrub multi-line keys, so values can leak.

Move both secrets into the step's env: block (the same pattern the
"Set up SSH" step already uses) and reference $VARs in the script.
The script body now contains only variable names; the secret values
live in the process environment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:13:52 +03:00
5c957d08ec ci: add build-prerelease workflow for CUDA RPMs on rpm.lair.cafe
Some checks failed
CI / Format (push) Successful in 36s
CI / Test (push) Failing after 53s
CI / Clippy (push) Successful in 2m35s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
Adds a manually-triggered workflow that builds CUDA-flavoured neuron
binaries and a CPU cortex binary, packages them as Fedora RPMs, signs
them, and rsyncs to the unstable channel at
https://rpm.lair.cafe/fedora/43/x86_64/unstable/. Mirrors the build
pipeline used by grenade/mistralrs-package.

Pipeline:
- prepare: derive {version,short_sha,commit_date} from the checkout;
  the prerelease Release stamp "0.1.YYYYMMDDgitSHORTSHA" sorts below
  the eventual "1" stable release.
- build-cortex: cargo build --release -p cortex-cli on a rust runner.
- build-neuron: matrix over ada (sm_89) and blackwell (sm_120) on
  cuda-13.0 runners; cargo build with features "cuda cudnn flash-attn"
  and CUDA_COMPUTE_CAP set per flavour.
- package-{cortex,neuron}: rpmbuild on the rpm runner against the new
  prebuilt-binary specs in rpm/.
- publish: import signing key, sign RPMs, rsync to oolon, createrepo_c
  --update, then regenerate packages.json for the UI.

New specs are prebuilt-binary variants — they consume the artifact
from the build job rather than running cargo at rpmbuild time. Each
helexa-neuron-{flavour} package Conflicts with the other flavours and
with helexa-neuron (the future source-build stable package) so one
flavour is installed at a time on a given host.

neuron crate gains cudnn and flash-attn feature flags forwarding to
the corresponding candle features, so the CI build command compiles
those kernels into the binary.

sccache is intentionally NOT used in the prerelease jobs — CUDA
compute cap isn't in its cache key, so flavours would mis-hit each
other. Each prerelease build is a clean cargo build.

Required Gitea secrets (already in place for cortex.spec / COPR
workflow):
- RPM_SIGNING_KEY, RPM_SIGNING_KEY_ID
- RSYNC_SSH_KEY

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:01:35 +03:00
729317d1ef feat(neuron): OpenAI-compatible non-streaming chat completion
Stage 3 of the candle-native pivot. neuron now serves
POST /v1/chat/completions backed by candle's quantized_qwen3 forward
pass on a per-model serialised generation loop, returning the standard
OpenAI ChatCompletionResponse envelope.

Pipeline per request:
- Look up the LoadedModel by request.model (404 if absent).
- Apply the Qwen3 chat template across all messages.
- Tokenize, then spawn_blocking onto tokio's blocking pool to acquire
  the per-model arch lock and run prefill + greedy/temperature/top-p
  sampling via LogitsProcessor.
- Stop on <|im_end|>/<|endoftext|> EOS or max_tokens (finish_reason
  "stop" vs "length").
- Decode with skip_special_tokens=true, build OpenAI response with
  prompt/completion/total usage counts.

Supporting changes:
- HarnessRegistry now stores Arc<dyn Harness> and caches a typed
  Arc<CandleHarness> so inference routes bypass dyn-Trait dispatch.
- LoadedModel.arch becomes Arc<Mutex<ModelArch>> so the lock guard
  can be moved into spawn_blocking.
- NeuronState gains an Option<Arc<CandleHarness>> field for the new
  inference route.
- Typed InferenceError lets the handler map ModelNotLoaded → 404 and
  other failures → 500 without string-matching anyhow messages.
- stream=true returns 501 until Stage 4 wires up SSE.
- Two leftover mistral.rs string references in proxy.rs and cortex-cli
  (missed during the Stage 1 sweep) are corrected here.

Three new default-feature tests cover the no-candle 503, model-not-
loaded 404, and stream=true 501 paths. The cuda-integration test from
Stage 2 still covers real load/unload; a streaming-feature gated test
exercising actual generation will arrive with Stage 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:47:58 +03:00
5c2bd1a1da feat(neuron): wire candle harness load/unload via GGUF
Stage 2 of the candle-native pivot. Fleshes out CandleHarness with a
LoadedModel registry keyed by model_id, hf-hub-backed GGUF download,
and Qwen3 quantized weight construction via candle-transformers'
quantized_qwen3 module. unload_model drops the entry; Drop on the
candle ModelWeights frees device memory.

Device selection prefers CUDA (gated behind the new `cuda` feature),
falling back to CPU when CUDA is unavailable so default builds work
on non-GPU hosts. The candle CUDA toolchain isn't pulled in unless
`--features cuda` is passed, keeping CI green on CPU runners.

Config gains a [harness.candle] block with an optional hf_cache path.
HarnessRegistry::from_configs now takes HarnessSettings so per-harness
config flows through.

A gated tests/candle_lifecycle.rs exercises real load → list → unload
→ list-empty when run with `--features cuda-integration` against a
host with HF network access. The default-feature test in tests/api.rs
covers the wrong-harness rejection path without needing the network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:02:49 +03:00
3cccc2c56b refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness
Stage 1 of the candle-native pivot. Replaces the external-process
harness model (mistralrs over HTTP, llamacpp placeholder) with an
in-process Harness trait whose sole implementation is candle. The
trait keeps its shape so future engines slot in additively, but
start/stop default to no-ops and HarnessConfig drops endpoint and
systemd_unit since no harness needs external supervision.

Behaviour is unchanged on the wire: load_model returns a "not
implemented yet (Stage 2)" error and list_models is empty. The
gateway-side proxy, poller, and router are untouched.

CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are
marked superseded; the staged plan lives in
~/.claude/plans/create-a-more-aggressive-calm-naur.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:53:04 +03:00
7f797b0265 ci: parallelise fmt/clippy/test and drop sccache install step
All checks were successful
CI / Format (push) Successful in 33s
CI / Clippy (push) Successful in 1m31s
CI / Test (push) Successful in 2m11s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 13:55:17 +03:00
5a0360c1d5 ci: use container runner labels for CI jobs
Some checks failed
CI / Format, lint, build, test (push) Successful in 4m20s
CI / Build cortex SRPM (push) Has been cancelled
CI / Build neuron SRPM (push) Has been cancelled
CI / Publish cortex to COPR (push) Has been cancelled
CI / Publish neuron to COPR (push) Has been cancelled
CI / Bump version in source (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 13:29:42 +03:00
472c0e8737 fix(rpm): ship firewalld service definitions with correct ports
Some checks failed
CI / Format, lint, build, test (push) Has been cancelled
CI / Build cortex SRPM (push) Has been cancelled
CI / Build neuron SRPM (push) Has been cancelled
CI / Publish cortex to COPR (push) Has been cancelled
CI / Publish neuron to COPR (push) Has been cancelled
CI / Bump version in source (push) Has been cancelled
cortex: opens 31313/tcp (API) and 31314/tcp (metrics)
neuron: opens 13131/tcp

Installs to /usr/lib/firewalld/services/ so firewall-cmd
--add-service=cortex / --add-service=helexa-neuron works
out of the box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 12:52:20 +03:00
Gitea Actions
b9d8e30058 chore: bump version to 0.1.16 2026-04-16 15:04:21 +00:00
25f75fe552 chore: ignore local deploy script
All checks were successful
CI / Format, lint, build, test (push) Successful in 1m15s
CI / Build cortex SRPM (push) Successful in 43s
CI / Build neuron SRPM (push) Successful in 44s
CI / Publish cortex to COPR (push) Successful in 7m23s
CI / Publish neuron to COPR (push) Successful in 15m58s
CI / Bump version in source (push) Successful in 31s
v0.1.16
2026-04-16 17:45:25 +03:00
3f94c50817 chore: move default ports out of common-collision ranges
Previous defaults collided with well-trodden infra services and with
the Linux ephemeral port range:

- cortex API     8000 — common dev-server default (Django, minio UI)
- cortex metrics 9100 — Prometheus node_exporter default
- neuron API     9090 — Cockpit default on Fedora, Prometheus self

Move to helexa-themed palindromic ports, all below Linux's
32768-60999 ephemeral range and not registered to any well-known
service:

- cortex API     31313
- cortex metrics 31314
- neuron API     13131

Updated places:
- cortex.example.toml, neuron.example.toml defaults
- default impls in cortex-core and neuron config
- cortex-cli --endpoint default for the status subcommand
- doc comments citing example URLs
- README.md and CLAUDE.md snippets

Consumers already on the old ports need a one-line edit in their
/etc/cortex/cortex.toml or /etc/neuron/neuron.toml to match;
firewall rules and prometheus scrape configs will also need
updating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:45:25 +03:00
3e1fb60076 ci: drop actions/cache for cargo registry and target
The cache round-trip (download + unpack) was consistently taking
around 6 minutes, noticeably longer than the ~3 minute cold build
it was meant to accelerate. Net-negative on CI time — remove it.

sccache with the S3 backend still provides dep-level caching at a
much lower overhead, so we keep the majority of the cache benefit
without paying the actions/cache tarball cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:45:25 +03:00
Gitea Actions
9bf987888c chore: bump version to 0.1.14 2026-04-16 16:57:24 +03:00
abe4ff7ccc ci: publish both packages to a single helexa/helexa COPR project
All checks were successful
CI / Format, lint, build, test (push) Successful in 9m50s
CI / Build neuron SRPM (push) Successful in 43s
CI / Build cortex SRPM (push) Successful in 48s
CI / Publish neuron to COPR (push) Successful in 6m14s
CI / Publish cortex to COPR (push) Successful in 7m53s
CI / Bump version in source (push) Successful in 31s
Consolidates the previous helexa/cortex and helexa/helexa-neuron COPR
projects into one shared project. Hosts enable a single repo and get
access to both packages — cortex for gateway hosts and helexa-neuron
for GPU nodes. Reduces the "which copr do I enable on this host"
friction, and makes it clear the two packages are parts of the same
helexa project suite.

CI keeps two independent publish jobs (copr-cortex and copr-neuron)
running in parallel; they now both target helexa/helexa with their
respective SRPMs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.14
2026-04-16 16:37:47 +03:00
7c3390a4e1 fix(rpm): rename neuron package to helexa-neuron
Fedora's official repos ship a package named `neuron` — the NEURON
neural-simulation environment from Yale (see
https://src.fedoraproject.org/rpms/neuron). Having our own `neuron`
in the helexa COPR caused dnf5 to silently no-op `dnf install neuron`
because of the name collision, even with the COPR repo enabled and
keys imported. The only workarounds were full NEVRA (`dnf install
neuron-0.1.12-1.fc43.x86_64`) or a local file install — neither
acceptable for end-users.

Rename the RPM package to `helexa-neuron`. Keep binary (/usr/bin/neuron),
systemd unit (neuron.service), system user (neuron), and config dir
(/etc/neuron) unchanged — those are project-local contexts where the
short name is unambiguous. Follows Fedora subpackage-style naming
except with a vendor prefix rather than a parent-package prefix,
because neuron is an independent package from cortex (installed on
different hosts) and neither depends on the other.

Changes:
- neuron.spec -> helexa-neuron.spec (git rename)
- Name: neuron -> helexa-neuron (with comment explaining why)
- CI: srpm-neuron job now builds helexa-neuron-VERSION.tar.gz with the
  matching top-level dir prefix, publishes to helexa/helexa-neuron COPR
- CI: bump-version job references helexa-neuron.spec
- CLAUDE.md: install instructions updated

Old helexa/neuron COPR project can be deleted after the first
helexa/helexa-neuron build lands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:37:47 +03:00
2ff062da0e ci: commit generated %changelog entries back to main
Previously the srpm-* jobs generated a fresh %changelog entry and
shipped it to COPR, but the version-stamped spec pushed back to main
by the bump-version job only updated the Version: line — not the
%changelog section. The result: SRPM and in-tree spec diverged and
a fresh clone of the repo showed a perpetually empty changelog.

Run the rpm-changelog action in bump-version too. Now the committed
specs track the SRPMs: each release leaves a dated %changelog entry
in main covering commits since the previous tag, visible in git log
and in the repo's spec browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:37:03 +03:00
Gitea Actions
357f858a29 chore: bump version to 0.1.12 2026-04-16 15:47:21 +03:00
556e5293dc fix(rpm): explicitly Provides user(name) to satisfy systemd unit Requires
All checks were successful
CI / Format, lint, build, test (push) Successful in 2m59s
CI / Build cortex SRPM (push) Successful in 44s
CI / Build neuron SRPM (push) Successful in 49s
CI / Publish neuron to COPR (push) Successful in 8m17s
CI / Publish cortex to COPR (push) Successful in 9m56s
CI / Bump version in source (push) Successful in 30s
Diagnosing the persistent "Nothing to do" on v0.1.10 surfaced that
removing %attr(,,name) from %files wasn't enough. systemd-rpm-macros
ships its own rpm dep generator (/usr/lib/rpm/systemd.req) that parses
User=/Group= directives from every .service file the package ships
and emits Requires: user(NAME)/group(NAME) accordingly.

Rpmbuild log from v0.1.10 shows these Requires are still emitted even
after the %attr removal. Meanwhile the sysusers provides-generator
emits group(NAME) in both unversioned and versioned forms, but only
a versioned user(NAME) = <base64> when the u-line has GECOS/home/shell
fields. The asymmetry leaves Requires: user(NAME) unresolvable.

Add explicit Provides: user(NAME) back to both specs, with a comment
documenting the actual cause (systemd unit parsing, not file attrs)
so the next person touching these specs doesn't repeat the mistake.

Why monsoon didn't hit this: it creates its user in %pre via
groupadd/useradd (not sysusers.d), so no Provides are generated at
all — matching the Requires: user(monsoon) by luck of the rpm solver
treating unknown symbols as soft-fails for that path. Ours went through
the sysusers Provides code path and hit the asymmetry instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.12
2026-04-16 15:32:51 +03:00
1d90238b01 ci: migrate rpm changelog generation to reusable action
Replace the local .gitea/scripts/generate-rpm-changelog.sh with the
shared composite action at https://git.lair.cafe/actions/rpm-changelog@v1.
Behaviour is identical — collect commits since the previous v* tag,
filter bump-version and merge noise, prepend a dated entry to the
spec — but the logic now lives in one place that other projects can
consume.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:32:51 +03:00
d99b25fb8a ci: auto-generate rpm changelog entry per release
On every tag push, build a %changelog entry from the git log since
the previous v* tag and prepend it to each spec. Stops the initial
entry from drifting further and catches bogus-date / stale-version
warnings automatically since the generated date always matches the
day the CI runs.

The generator drops "chore: bump version" commits (bot-authored,
noisy in user-facing changelogs) and merge commits. Author defaults
to the gitea-actions identity but can be overridden via
CHANGELOG_AUTHOR env var if a human release is desired.

Requires fetch-depth: 0 on checkout so git describe can see prior
tags and git log can reach them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:32:51 +03:00
034da319f1 fix(rpm): correct weekday in changelog entry
April 15 2026 was a Wednesday, not Tuesday. rpmbuild validates the
day-of-week against the date and warns on mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:32:51 +03:00
Gitea Actions
7ece281617 chore: bump version to 0.1.10 2026-04-16 15:06:18 +03:00
3bb5b3c425 fix(rpm): drop %attr(,,user) on config files to avoid dnf silent filter
All checks were successful
CI / Format, lint, build, test (push) Successful in 1m11s
CI / Publish cortex to COPR (push) Successful in 11m3s
CI / Build cortex SRPM (push) Successful in 43s
CI / Build neuron SRPM (push) Successful in 43s
CI / Publish neuron to COPR (push) Successful in 8m56s
CI / Bump version in source (push) Successful in 30s
Using %attr(,,cortex) / %attr(,,neuron) on config files caused rpm's
auto-dep-generator to emit Requires: user(name) and group(name) on
each package. When those Requires couldn't be resolved — whether due
to sysusers Provides mismatches, missing GPG keys, or dnf5 cache
state — dnf5 silently filtered the package out of the candidate set
and reported "Nothing to do" rather than an unsatisfied-dep error.

Adopt the pattern that already works reliably across our infra
(grenade/monsoon): ship config files as default root:root with 0644
perms, don't declare user/group ownership in the rpm file list.
systemd-sysusers still creates the service user via the shipped
sysusers.d file; the service drops to that user at runtime via the
User= directive in the unit.

This removes the user(cortex)/user(neuron) Requires entirely, which
is the root cause of the dnf5 filtering. File permission tightening
can be reintroduced later — either via a separate secrets file with
different mode bits, or by moving secret material to /var/lib/<svc>/
where the service drop-privileges account already has write access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.10
2026-04-16 14:50:17 +03:00
Gitea Actions
9fa51ad874 chore: bump version to 0.1.8 2026-04-16 10:56:07 +00:00
9697fbae73 fix(neuron): run service as neuron user, not cortex
All checks were successful
CI / Format, lint, build, test (push) Successful in 2m22s
CI / Build cortex SRPM (push) Successful in 43s
CI / Build neuron SRPM (push) Successful in 43s
CI / Publish neuron to COPR (push) Successful in 8m49s
CI / Publish cortex to COPR (push) Successful in 11m22s
CI / Bump version in source (push) Successful in 31s
neuron and cortex are independent packages installable on different
hosts. Having neuron run under a 'cortex' system user implied a
shared identity that doesn't exist. Give neuron its own user/group.

- New data/neuron-sysusers.conf declares the neuron user/group with
  home /var/lib/neuron.
- systemd unit User/Group changed to neuron.
- Spec file attrs, explicit Provides, and %sysusers_create_compat
  updated to reference the neuron user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.8
2026-04-16 13:32:36 +03:00
Gitea Actions
2ce1060cb8 chore: bump version to 0.1.7 2026-04-16 13:25:34 +03:00
142e91c3f7 fix(neuron): install config at /etc/neuron/, not /etc/cortex/
All checks were successful
CI / Format, lint, build, test (push) Successful in 4m45s
CI / Build neuron SRPM (push) Successful in 44s
CI / Build cortex SRPM (push) Successful in 45s
CI / Publish neuron to COPR (push) Successful in 8m52s
CI / Publish cortex to COPR (push) Successful in 11m17s
CI / Bump version in source (push) Successful in 30s
The neuron package was shipping its config at /etc/cortex/neuron.toml,
which implied a shared config directory between two independent
packages. Move to /etc/neuron/neuron.toml — neuron owns its own etc
dir, consistent with its own /usr/lib/sysusers.d/neuron.conf and
/usr/lib/systemd/system/neuron.service. Updated the systemd unit's
ExecStart path and the example toml header to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.7
2026-04-16 13:07:06 +03:00
Gitea Actions
52c8b4c983 chore: bump version to 0.1.5 2026-04-16 13:01:42 +03:00
4a9a4fc775 ci: migrate copr publish to reusable action
All checks were successful
CI / Format, lint, build, test (push) Successful in 1m26s
CI / Build neuron SRPM (push) Successful in 45s
CI / Build cortex SRPM (push) Successful in 44s
CI / Publish neuron to COPR (push) Successful in 8m22s
CI / Publish cortex to COPR (push) Successful in 11m0s
CI / Bump version in source (push) Successful in 30s
Replace the in-repo .gitea/scripts/copr-build.sh and per-job
copr-cli configuration with the shared composite action at
https://git.lair.cafe/actions/copr-publish@v1. Behaviour is
identical — submit, watch, dump per-chroot logs — but the logic
now lives in a single place that other projects can consume.

Removes the actions/checkout step from both COPR jobs since the
build script is no longer local to this repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.5
2026-04-16 12:34:39 +03:00
53a3c1e157 fix(rpm): explicitly Provides user(cortex)/group(cortex)
All checks were successful
CI / Format, lint, build, test (push) Successful in 57s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
dnf5 was silently rejecting neuron-0.1.3 with "Nothing to do" because
it had an unresolvable Requires. Inspection showed:

  Requires: user(cortex)               ← unversioned
  Provides: user(cortex) = <base64>    ← versioned only, no unversioned

rpm's sysusers provides-generator only emits the unversioned user()
provide when the u-line is minimal. Our sysusers.conf specifies GECOS,
home dir, and shell, which pushes the generator to versioned-only.
The matching Requires (auto-generated from %attr(,,cortex) on config
files) is unversioned, so resolution failed silently.

Explicitly declare Provides: user(cortex) and Provides: group(cortex)
to guarantee the unversioned forms exist. group(cortex) was already
emitted unversioned but adding it for symmetry and to protect against
future generator changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:06:05 +03:00
5c7d63c658 ci: dump COPR per-chroot build logs to CI output
Previously the COPR publish steps only surfaced copr-cli's status
updates (pending/importing/running). When a build failed, diagnosing
required clicking through to the COPR web UI. Now we submit with
--nowait, watch the build, then use copr-cli download-build to fetch
each chroot's builder-live.log and cat them as collapsible ::group::
blocks in the CI output.

Logic is factored into .gitea/scripts/copr-build.sh so cortex and
neuron jobs share it. Both COPR jobs now check out the repo to access
the script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:06:05 +03:00
Gitea Actions
f161412f91 chore: bump version to 0.1.3 2026-04-16 11:41:11 +03:00
ba5020138f fix(rpm): rename sysusers files to match package names
All checks were successful
CI / Format, lint, build, test (push) Successful in 3m35s
CI / Build cortex SRPM (push) Successful in 1m46s
CI / Build neuron SRPM (push) Successful in 1m41s
CI / Publish cortex to COPR (push) Successful in 7m14s
CI / Publish neuron to COPR (push) Successful in 5m44s
CI / Bump version in source (push) Successful in 30s
cortex-gateway.conf/cortex-neuron.conf implied a hierarchy or coupling
that doesn't exist — cortex and neuron are independent packages.
Each package's sysusers.d file now matches the package name:
cortex ships cortex.conf, neuron ships neuron.conf. Content is still
identical (both create the cortex system user/group), and filenames
remain distinct so the packages can coinstall.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.3
2026-04-16 11:20:08 +03:00
209150771e fix(rpm): use sysusers.d for cortex user/group creation
Both packages set %attr(...,cortex) on their config files, which
caused RPM's auto-dep-generator to emit Requires: group(cortex) /
user(cortex). The %pre scriptlets that actually created the group
ran too late — dnf rejected neuron installation on hosts without
cortex because nothing Provided group(cortex).

Switch to systemd-sysusers declarative user creation: each package
ships its own named sysusers.d file (cortex-gateway.conf and
cortex-neuron.conf — different names so both packages can coinstall)
with identical content defining the cortex user/group. RPM's
user/group dep generator now emits Provides: user(cortex) and
Provides: group(cortex) automatically from the sysusers.d files,
satisfying the auto-generated Requires. Either package installs
standalone; both can coinstall on the gateway host if desired.

Also added Requires: systemd since %sysusers_create_compat depends
on systemd-sysusers being present on the target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:18:37 +03:00
Gitea Actions
7c60af3464 chore: bump version to 0.1.2 2026-04-16 11:03:29 +03:00
ada76b0153 fix(rpm): add missing native build dependencies
All checks were successful
CI / Format, lint, build, test (push) Successful in 4m34s
CI / Build neuron SRPM (push) Successful in 1m49s
CI / Build cortex SRPM (push) Successful in 44s
CI / Publish cortex to COPR (push) Successful in 7m14s
CI / Publish neuron to COPR (push) Successful in 5m43s
CI / Bump version in source (push) Successful in 52s
COPR build failed on openssl-sys because openssl headers were not
available in the mock chroot. Adding:

- pkgconfig(openssl): fixes the immediate openssl-sys failure.
  Kept as a build dep because we plan to add optional mTLS between
  cortex and neuron, which requires native-tls/openssl at build time.
- cmake, gcc-c++: aws-lc-sys (pulled via rustls) compiles libcrypto
  via cmake and includes C++ sources. Would be the next failure after
  openssl.
- perl-interpreter: catchall for -sys crate build scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.2
2026-04-16 10:49:20 +03:00
15ded3a5bd ci: cache target/, disable incremental, drop redundant build
Three complementary tweaks to close the gap sccache alone can't:

- CARGO_INCREMENTAL=0: reclaims the 17 incremental-mode cache misses
  per run and prevents cargo from writing incremental fingerprints
  that defeat sccache. Incremental mode is useless in CI anyway since
  each run starts from scratch.
- actions/cache for ~/.cargo and target/: sidesteps sccache's
  structural limits (proc-macro non-cacheables, clippy-vs-rustc
  separate namespaces) by caching the whole build output keyed on
  Cargo.lock. Also caches ~/.cargo/bin so the installed sccache
  binary survives between runs.
- Drop the separate 'cargo build' step: 'cargo test --workspace'
  builds everything anyway, so the standalone build was a full
  redundant workspace compile pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:44:45 +03:00
7befa882d5 fix: yaml syntax
Some checks failed
CI / Format, lint, build, test (push) Successful in 1m42s
CI / Build neuron SRPM (push) Successful in 42s
CI / Build cortex SRPM (push) Successful in 1m40s
CI / Publish neuron to COPR (push) Failing after 4m11s
CI / Publish cortex to COPR (push) Failing after 3m16s
CI / Bump version in source (push) Has been skipped
v0.1.1
2026-04-16 09:25:02 +03:00
d03fae960a fix(ci): unset RUSTC_WRAPPER during sccache install
All checks were successful
CI / Format, lint, build, test (push) Successful in 2m40s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
The workflow-level env set RUSTC_WRAPPER=sccache for every step,
including the install step itself. cargo install sccache then
tried to invoke `sccache rustc -vV` to detect the toolchain before
sccache existed on PATH, failing with "No such file or directory".
Override RUSTC_WRAPPER to empty on the install step so cargo uses
rustc directly; subsequent steps still inherit the wrapper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:31:26 +03:00