cortex

Author	SHA1	Message	Date
rob thijssen	e9d0a75dd5	ci(prerelease): auto-build on every push to main Some checks failed build-prerelease / Build cortex binary (push) Blocked by required conditions Details CI / Clippy (push) Waiting to run Details CI / Test (push) Waiting to run Details build-prerelease / Resolve version stamps (push) Successful in 33s Details CI / Format (push) Successful in 36s Details build-prerelease / Build neuron-ampere (push) Has been cancelled Details build-prerelease / Build neuron-ada (push) Has been cancelled Details build-prerelease / Package cortex RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled Details build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled Details build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details build-prerelease / Build neuron-blackwell (push) Has been cancelled Details The build-prerelease workflow was workflow_dispatch-only, which meant every commit needed a manual run dispatch before any host could upgrade. That left rolling fixes (e.g. f9f5fa4's StateDirectory fix) sitting on main with no published RPM behind them, so deploy.sh silently fell back to an older prerelease. Add 'push: branches: [main]' alongside the existing workflow_dispatch trigger; the unstable channel now tracks head automatically. The concurrency group is keyed on ${{ github.ref }} with cancel-in-progress so successive rapid-fire pushes coalesce to one build (latest wins) rather than queueing every intermediate commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 13:13:36 +03:00
rob thijssen	6cf87e328f	chore(neuron): log load_model failures server-side with full chain The HTTP handler now emits a tracing::warn on load_model failures with the expanded anyhow chain (format!("{e:#}")) before returning the 400. journalctl -u neuron will surface the underlying hf-hub / materialisation error without needing to capture the curl response body separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 13:08:54 +03:00
rob thijssen	f9f5fa41b6	fix(neuron): surface full anyhow chain + ensure $HOME exists at start Some checks failed CI / Format (push) Successful in 30s Details CI / Test (push) Failing after 49s Details CI / Clippy (push) Successful in 2m16s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Two fixes uncovered by the live validation against beast/benjy/quadbrat: 1. api.rs swallowed everything beyond the outermost anyhow context. The validation script reported '{"error":"fetch GGUF ...gguf"}' but the actual underlying hf-hub failure (cache dir creation, network, auth, etc.) was hidden. Switching every error response to format!("{e:#}") expands the full cause chain via anyhow's alternate Display format. 2. The neuron systemd unit declared the service user but never ensured /var/lib/neuron (its $HOME) existed. hf-hub defaults its cache to ~/.cache/huggingface/hub — when $HOME is absent the cache dir creation fails and the download aborts. Adding `StateDirectory=neuron` makes systemd create + chown that directory at activation; no spec change needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 08:17:37 +03:00
rob thijssen	ed4d71db09	fix(validate-neuron): default to unsloth GGUF + capture curl errors Two reasons the previous run silently bailed after POST /models/load: 1. Default model was Qwen/Qwen3-0.6B-GGUF (official). That repo ships ONLY Q8_0 — no Q4_K_M, no Q4_0, nothing else. The GGUF filename matcher in CandleHarness::resolve_files returned "no GGUF file matching quant Q4_K_M" and the load endpoint returned an error, but the script used `curl --silent --fail` and swallowed it. 2. /models/load is synchronous (it awaits the full HF download + GGUF parse). curl --max-time 30 was way too short for a 400 MB fresh download. Fixes: - Default model is now unsloth/Qwen3-0.6B-GGUF, which mirrors the full Q-spectrum (Q2_K through Q8_0 plus BF16) so Q4_K_M actually exists. - trigger_load / run_probe now use --write-out to capture HTTP code and emit the response body on non-2xx, so failures surface a real diagnostic instead of an opaque set -e abort. - LOAD_TIMEOUT bumped to 600s; INFER_TIMEOUT to 120s. - Probe payload built via `yq -n` so JSON quoting is reliable regardless of the prompt text. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 08:14:31 +03:00
rob thijssen	39010c779f	add script/validate-neuron.sh — end-to-end candle harness smoke test Loads a small public Qwen3 GGUF on a target neuron host, fires a deterministic reasoning probe ("What is the capital of France?"), and asserts the response contains 'Paris'. Used to validate the candle harness on a real GPU host before the Stage 7 TP work begins, and as a regression check after future neuron builds. Defaults to beast.hanzalova.internal + Qwen/Qwen3-1.7B-GGUF + Q4_K_M; all three are positional args so the same script tests any node / model combination. Polls /models after triggering the load since /models/load returns once the materialisation is queued, not finished. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:58:05 +03:00
rob thijssen	57d7ef8d3c	chore: revert dnf. runner user has no system privs All checks were successful CI / Format (push) Successful in 38s Details CI / Clippy (push) Successful in 2m20s Details CI / Test (push) Successful in 4m42s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details	2026-05-19 07:16:38 +03:00
rob thijssen	0e9671dd7d	fix(ci): drop sudo from dnf install (runner runs as root, no sudo) All checks were successful CI / Format (push) Successful in 36s Details CI / Clippy (push) Successful in 2m13s Details CI / Test (push) Successful in 4m17s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details The act runner container has no sudo binary; the runner user already runs as root inside the container. Existing steps (rpmbuild, gpg, etc) already invoke privileged commands directly without sudo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:06:52 +03:00
rob thijssen	e29c9e35f0	fix(ci): ensure rust toolchain present on cuda-13.0 runner The currently-published runner-cuda-13.0 image (gongfoo) is missing rust/cargo despite inheriting from runner-rust. Build-neuron fails immediately with 'cargo: command not found' even though build-cortex on the bare 'rust' runner builds fine. Add a defensive `dnf install rust cargo clippy` step at the top of build-neuron. Idempotent — on a properly-built runner image this is a fast no-op; on the current broken image it installs the toolchain in a few seconds. The runner image itself should be rebuilt in gongfoo so this step becomes redundant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:04:57 +03:00
rob thijssen	8a2334eacb	deploy: dnf-native version check + lair.cafe repo bootstrap Replaces the string compare of 'git describe --tags' vs the binary's self-reported --version (which lies about prereleases — every 0.1.16-* RPM reports just "0.1.16") with the dnf-native question of "is the installed package current against what the repo offers". Mechanism: - installed_nvr(): rpm -q --qf '%{version}-%{release}' for the resident package, falling back to "(not installed)". Capturing rpm's output through a variable keeps its "package X is not installed" stdout message out of the result on failure. - needs_update(): probes rpm -q first (treats absent as "needs work"), then asks dnf check-update --refresh -q. Other dnf failures collapse into "needs update" so the subsequent install surfaces a real error rather than this check swallowing one silently. - ensure_lair_repo(): probes for /etc/yum.repos.d/lair-cafe-unstable.repo and adds it with `dnf config-manager addrepo` when missing. The upstream .repo file ships enabled=0 (unstable channel doesn't auto-engage on fetch), so we then run `dnf config-manager setopt lair-cafe-unstable.enabled=1` every run — cheap, idempotent. - Cortex and neuron install branches now guard `systemctl stop` with `[ ! -f /usr/lib/systemd/system/...service ] \|\| sudo systemctl stop` so fresh installs (no unit file yet) don't short-circuit the install step under set -e. - dnf output is captured into a variable and only printed (with a [host] prefix per line) on failure, so success stays quiet and failures show the actual diagnostic instead of being eaten by &> /dev/null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:55:02 +03:00
rob thijssen	aad314cdfa	feat(neuron): graceful unload-on-shutdown via SIGTERM/SIGINT Stage 6 of the candle-native pivot. Adds first-class deactivation: neuron now drains in-flight requests on SIGTERM (systemd stop) or SIGINT (Ctrl-C), then unloads every loaded model before the process exits — releasing CUDA contexts and VRAM cleanly rather than leaving the OS to reclaim them. Mechanism: - startup::shutdown_signal() resolves on either ctrl_c() or a SIGTERM listener. - axum::serve(...).with_graceful_shutdown(shutdown_signal()) stops accepting new connections, lets active requests finish, then returns control to main. - startup::unload_all_models(&registry) iterates list_all_models() and calls unload per entry. Per-model failures are logged warnings; cleanup continues. Empty registry is a fast no-op. - main holds an Arc<NeuronState> reference past axum's lifetime so the registry is still reachable for the unload sweep. data/neuron.service: - TimeoutStopSec=120s — generous bound for big-model unloads before systemd escalates to SIGKILL. - KillSignal=SIGTERM — explicit, matches the handler. Two non-gated tests cover the empty-registry no-op and the no-models- loaded path. Real load-then-unload-on-shutdown is exercised by the cuda-integration test from Stage 2 (which calls unload_model directly) and observable on a real GPU host by stopping the service and watching nvidia-smi. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:58:07 +03:00
rob thijssen	6779b7526a	feat(neuron): load default_models on service activation All checks were successful CI / Format (push) Successful in 34s Details CI / Clippy (push) Successful in 2m13s Details CI / Test (push) Successful in 4m6s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Stage 5 of the candle-native pivot. Adds first-class support for auto-loading a configured set of models when the neuron service activates. Config: - NeuronConfig.default_models: Vec<ModelSpec> (defaults to []). - neuron.example.toml ships a commented [[default_models]] example. Activation flow (crates/neuron/src/startup.rs::load_default_models): - Sequential — VRAM contention makes parallel loads risky. - Per-entry timing logged at info level on success. - Failures logged as warnings; the next entry is still attempted. - An empty list short-circuits without log noise. Called from main.rs after the registry is built and before the axum listener binds, so /models reflects the loaded state from the very first request. data/neuron.service gains TimeoutStartSec=1800s. With activation blocked on potentially slow first-time HF downloads + GGUF materialisation, systemd's default 90s would kill larger model loads mid-flight. Two non-gated tests in tests/activation.rs cover the continues-past-failure and empty-list paths using a synthetically unknown harness name to fail loads fast without touching the network. The cuda-integration test from earlier stages still exercises the real load/unload lifecycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:56:08 +03:00
rob thijssen	84f5662df1	feat(neuron): OpenAI-compatible SSE streaming chat completions Stage 4 of the candle-native pivot. /v1/chat/completions now switches to text/event-stream when the request sets stream: true, emitting one chat.completion.chunk per generated token followed by the OpenAI [DONE] terminator. Pipeline: - chat_completion_stream creates a bounded mpsc::channel<ChatCompletionChunk>(32), sends the leading role chunk, then spawns a blocking task that acquires the per-model arch lock and runs the streaming generation loop. - run_inference_streaming tracks a cumulative decoded prefix so each chunk's delta.content is the substring added since the last chunk — safe across BPE byte-fallback boundaries that would otherwise split multi-byte UTF-8 chars. - The blocking task aborts cleanly if blocking_send fails (client disconnected), so generation stops when the SSE consumer hangs up. - Final chunk carries finish_reason ("stop" on EOS, "length" on max_tokens). The handler appends data: [DONE] after the channel closes. The Stage 3 streaming 501 placeholder test is repurposed: with the streaming path live, an unloaded model now hits the same 404 surface as the non-streaming path (the model lookup happens first). cortex-gateway's existing proxy is unchanged — it already forwards SSE bytes verbatim from Phase 2 work, so the candle SSE format passes through unmodified. Neuron Cargo.toml gains futures + tokio-stream (both already in workspace deps) for ReceiverStream and stream combinators. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:53:14 +03:00
rob thijssen	249c9442e8	chore: track deployment script All checks were successful CI / Format (push) Successful in 37s Details CI / Clippy (push) Successful in 2m2s Details CI / Test (push) Successful in 3m59s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details	2026-05-18 17:50:35 +03:00
rob thijssen	5e17081fb4	ci(prerelease): drop redundant rustup install step The build-cortex and build-neuron jobs were running a copied-from- mistralrs rustup install step. Both jobs use runner images that already provide rust via dnf: - runner-rust installs rust/cargo/clippy/rustfmt directly. - runner-cuda-13.0 extends runner-rust. Running 'rustup update stable' on top would install a parallel rustup-managed toolchain and shadow the dnf one — confusing and unnecessary. The existing ci.yml already trusts the dnf toolchain without any install step, so match that behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:47:29 +03:00
rob thijssen	03bed93fee	add asset/manifest.yml describing fleet hosts and neuron flavours All checks were successful CI / Format (push) Successful in 28s Details CI / Clippy (push) Successful in 2m54s Details CI / Test (push) Successful in 5m37s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Adds a single source of truth for which hosts run cortex vs neuron and which CUDA compute-capability flavour each neuron host needs: cortex : hanzalova.internal neurons : beast → helexa-neuron-blackwell (2x RTX 5090, sm_120) benjy → helexa-neuron-ada (RTX 4090, sm_89) quadbrat → helexa-neuron-ampere (RTX 3060, sm_86) script/deploy.sh (gitignored, local-only) is updated locally to read hosts and flavours from this manifest and dnf install the correct helexa-neuron-<flavour> package per host. Using 'dnf install --refresh --allowerasing' lets it swap out the previous bare helexa-neuron RPM or a different flavour without manual intervention; the spec Conflicts: clauses keep at most one flavour resident. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:37:14 +03:00
rob thijssen	4a5211d830	ci(prerelease): add ampere flavour alongside ada and blackwell Adds ampere (CUDA compute capability sm_86) to both the build-neuron and package-neuron matrices, so helexa-neuron-ampere RPMs are built and published alongside helexa-neuron-ada and helexa-neuron-blackwell. The prerelease spec already lists ampere in its Conflicts: clause, so no spec change is needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:28:19 +03:00
rob thijssen	6d2dc5ff1a	fix(ci): give fmt/clippy/test distinct CARGO_TARGET_DIR to avoid races After the candle deps were added, cargo builds run long enough that the parallel fmt/clippy/test jobs (all on the `rust` runner label, which appears to use act in host-executor mode) start racing each other's intermediate temp files under /root/.cache/act/<hash>/hostexecutor/target/debug/deps/ Concretely the test job hit: error: No such file or directory at path "target/debug/deps/.tmprlicL7" Compiling unicode-ident because another job's cargo invocation cleaned up the temp file mid-compile. fmt and clippy happened to finish without their own target races landing fatally, so only test failed visibly. Set CARGO_TARGET_DIR=target-${{ github.job }} at the workflow level so each job writes to its own target directory. sccache still backs the actual rustc cache, so the rebuild penalty is just metadata not full recompiles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:26:29 +03:00
rob thijssen	b713dbe669	fix(ci): pass GPG secrets via env to avoid Gitea log leakage Some checks failed CI / Format (push) Successful in 28s Details CI / Test (push) Failing after 43s Details CI / Clippy (push) Successful in 2m9s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details The previous "Import signing key" step inlined ${{ secrets.RPM_SIGNING_KEY }} and ${{ secrets.RPM_SIGNING_KEY_ID }} directly into the run: block. Template expansion writes the literal secret value into the rendered shell script, and Gitea logs the rendered script — Gitea's masker may not reliably scrub multi-line keys, so values can leak. Move both secrets into the step's env: block (the same pattern the "Set up SSH" step already uses) and reference $VARs in the script. The script body now contains only variable names; the secret values live in the process environment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:13:52 +03:00
rob thijssen	5c957d08ec	ci: add build-prerelease workflow for CUDA RPMs on rpm.lair.cafe Some checks failed CI / Format (push) Successful in 36s Details CI / Test (push) Failing after 53s Details CI / Clippy (push) Successful in 2m35s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Adds a manually-triggered workflow that builds CUDA-flavoured neuron binaries and a CPU cortex binary, packages them as Fedora RPMs, signs them, and rsyncs to the unstable channel at https://rpm.lair.cafe/fedora/43/x86_64/unstable/. Mirrors the build pipeline used by grenade/mistralrs-package. Pipeline: - prepare: derive {version,short_sha,commit_date} from the checkout; the prerelease Release stamp "0.1.YYYYMMDDgitSHORTSHA" sorts below the eventual "1" stable release. - build-cortex: cargo build --release -p cortex-cli on a rust runner. - build-neuron: matrix over ada (sm_89) and blackwell (sm_120) on cuda-13.0 runners; cargo build with features "cuda cudnn flash-attn" and CUDA_COMPUTE_CAP set per flavour. - package-{cortex,neuron}: rpmbuild on the rpm runner against the new prebuilt-binary specs in rpm/. - publish: import signing key, sign RPMs, rsync to oolon, createrepo_c --update, then regenerate packages.json for the UI. New specs are prebuilt-binary variants — they consume the artifact from the build job rather than running cargo at rpmbuild time. Each helexa-neuron-{flavour} package Conflicts with the other flavours and with helexa-neuron (the future source-build stable package) so one flavour is installed at a time on a given host. neuron crate gains cudnn and flash-attn feature flags forwarding to the corresponding candle features, so the CI build command compiles those kernels into the binary. sccache is intentionally NOT used in the prerelease jobs — CUDA compute cap isn't in its cache key, so flavours would mis-hit each other. Each prerelease build is a clean cargo build. Required Gitea secrets (already in place for cortex.spec / COPR workflow): - RPM_SIGNING_KEY, RPM_SIGNING_KEY_ID - RSYNC_SSH_KEY Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:01:35 +03:00
rob thijssen	729317d1ef	feat(neuron): OpenAI-compatible non-streaming chat completion Stage 3 of the candle-native pivot. neuron now serves POST /v1/chat/completions backed by candle's quantized_qwen3 forward pass on a per-model serialised generation loop, returning the standard OpenAI ChatCompletionResponse envelope. Pipeline per request: - Look up the LoadedModel by request.model (404 if absent). - Apply the Qwen3 chat template across all messages. - Tokenize, then spawn_blocking onto tokio's blocking pool to acquire the per-model arch lock and run prefill + greedy/temperature/top-p sampling via LogitsProcessor. - Stop on <\|im_end\|>/<\|endoftext\|> EOS or max_tokens (finish_reason "stop" vs "length"). - Decode with skip_special_tokens=true, build OpenAI response with prompt/completion/total usage counts. Supporting changes: - HarnessRegistry now stores Arc<dyn Harness> and caches a typed Arc<CandleHarness> so inference routes bypass dyn-Trait dispatch. - LoadedModel.arch becomes Arc<Mutex<ModelArch>> so the lock guard can be moved into spawn_blocking. - NeuronState gains an Option<Arc<CandleHarness>> field for the new inference route. - Typed InferenceError lets the handler map ModelNotLoaded → 404 and other failures → 500 without string-matching anyhow messages. - stream=true returns 501 until Stage 4 wires up SSE. - Two leftover mistral.rs string references in proxy.rs and cortex-cli (missed during the Stage 1 sweep) are corrected here. Three new default-feature tests cover the no-candle 503, model-not- loaded 404, and stream=true 501 paths. The cuda-integration test from Stage 2 still covers real load/unload; a streaming-feature gated test exercising actual generation will arrive with Stage 4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:47:58 +03:00
rob thijssen	5c2bd1a1da	feat(neuron): wire candle harness load/unload via GGUF Stage 2 of the candle-native pivot. Fleshes out CandleHarness with a LoadedModel registry keyed by model_id, hf-hub-backed GGUF download, and Qwen3 quantized weight construction via candle-transformers' quantized_qwen3 module. unload_model drops the entry; Drop on the candle ModelWeights frees device memory. Device selection prefers CUDA (gated behind the new `cuda` feature), falling back to CPU when CUDA is unavailable so default builds work on non-GPU hosts. The candle CUDA toolchain isn't pulled in unless `--features cuda` is passed, keeping CI green on CPU runners. Config gains a [harness.candle] block with an optional hf_cache path. HarnessRegistry::from_configs now takes HarnessSettings so per-harness config flows through. A gated tests/candle_lifecycle.rs exercises real load → list → unload → list-empty when run with `--features cuda-integration` against a host with HF network access. The default-feature test in tests/api.rs covers the wrong-harness rejection path without needing the network. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:02:49 +03:00
rob thijssen	3cccc2c56b	refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness Stage 1 of the candle-native pivot. Replaces the external-process harness model (mistralrs over HTTP, llamacpp placeholder) with an in-process Harness trait whose sole implementation is candle. The trait keeps its shape so future engines slot in additively, but start/stop default to no-ops and HarnessConfig drops endpoint and systemd_unit since no harness needs external supervision. Behaviour is unchanged on the wire: load_model returns a "not implemented yet (Stage 2)" error and list_models is empty. The gateway-side proxy, poller, and router are untouched. CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are marked superseded; the staged plan lives in ~/.claude/plans/create-a-more-aggressive-calm-naur.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:53:04 +03:00
rob thijssen	7f797b0265	ci: parallelise fmt/clippy/test and drop sccache install step All checks were successful CI / Format (push) Successful in 33s Details CI / Clippy (push) Successful in 1m31s Details CI / Test (push) Successful in 2m11s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 13:55:17 +03:00
rob thijssen	5a0360c1d5	ci: use container runner labels for CI jobs Some checks failed CI / Format, lint, build, test (push) Successful in 4m20s Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 13:29:42 +03:00
rob thijssen	472c0e8737	fix(rpm): ship firewalld service definitions with correct ports Some checks failed CI / Format, lint, build, test (push) Has been cancelled Details CI / Build cortex SRPM (push) Has been cancelled Details CI / Build neuron SRPM (push) Has been cancelled Details CI / Publish cortex to COPR (push) Has been cancelled Details CI / Publish neuron to COPR (push) Has been cancelled Details CI / Bump version in source (push) Has been cancelled Details cortex: opens 31313/tcp (API) and 31314/tcp (metrics) neuron: opens 13131/tcp Installs to /usr/lib/firewalld/services/ so firewall-cmd --add-service=cortex / --add-service=helexa-neuron works out of the box. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 12:52:20 +03:00
Gitea Actions	b9d8e30058	chore: bump version to 0.1.16	2026-04-16 15:04:21 +00:00
rob thijssen	25f75fe552	chore: ignore local deploy script All checks were successful CI / Format, lint, build, test (push) Successful in 1m15s Details CI / Build cortex SRPM (push) Successful in 43s Details CI / Build neuron SRPM (push) Successful in 44s Details CI / Publish cortex to COPR (push) Successful in 7m23s Details CI / Publish neuron to COPR (push) Successful in 15m58s Details CI / Bump version in source (push) Successful in 31s Details v0.1.16	2026-04-16 17:45:25 +03:00
rob thijssen	3f94c50817	chore: move default ports out of common-collision ranges Previous defaults collided with well-trodden infra services and with the Linux ephemeral port range: - cortex API 8000 — common dev-server default (Django, minio UI) - cortex metrics 9100 — Prometheus node_exporter default - neuron API 9090 — Cockpit default on Fedora, Prometheus self Move to helexa-themed palindromic ports, all below Linux's 32768-60999 ephemeral range and not registered to any well-known service: - cortex API 31313 - cortex metrics 31314 - neuron API 13131 Updated places: - cortex.example.toml, neuron.example.toml defaults - default impls in cortex-core and neuron config - cortex-cli --endpoint default for the status subcommand - doc comments citing example URLs - README.md and CLAUDE.md snippets Consumers already on the old ports need a one-line edit in their /etc/cortex/cortex.toml or /etc/neuron/neuron.toml to match; firewall rules and prometheus scrape configs will also need updating. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 17:45:25 +03:00
rob thijssen	3e1fb60076	ci: drop actions/cache for cargo registry and target The cache round-trip (download + unpack) was consistently taking around 6 minutes, noticeably longer than the ~3 minute cold build it was meant to accelerate. Net-negative on CI time — remove it. sccache with the S3 backend still provides dep-level caching at a much lower overhead, so we keep the majority of the cache benefit without paying the actions/cache tarball cost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 17:45:25 +03:00
Gitea Actions	9bf987888c	chore: bump version to 0.1.14	2026-04-16 16:57:24 +03:00
rob thijssen	abe4ff7ccc	ci: publish both packages to a single helexa/helexa COPR project All checks were successful CI / Format, lint, build, test (push) Successful in 9m50s Details CI / Build neuron SRPM (push) Successful in 43s Details CI / Build cortex SRPM (push) Successful in 48s Details CI / Publish neuron to COPR (push) Successful in 6m14s Details CI / Publish cortex to COPR (push) Successful in 7m53s Details CI / Bump version in source (push) Successful in 31s Details Consolidates the previous helexa/cortex and helexa/helexa-neuron COPR projects into one shared project. Hosts enable a single repo and get access to both packages — cortex for gateway hosts and helexa-neuron for GPU nodes. Reduces the "which copr do I enable on this host" friction, and makes it clear the two packages are parts of the same helexa project suite. CI keeps two independent publish jobs (copr-cortex and copr-neuron) running in parallel; they now both target helexa/helexa with their respective SRPMs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.14	2026-04-16 16:37:47 +03:00
rob thijssen	7c3390a4e1	fix(rpm): rename neuron package to helexa-neuron Fedora's official repos ship a package named `neuron` — the NEURON neural-simulation environment from Yale (see https://src.fedoraproject.org/rpms/neuron). Having our own `neuron` in the helexa COPR caused dnf5 to silently no-op `dnf install neuron` because of the name collision, even with the COPR repo enabled and keys imported. The only workarounds were full NEVRA (`dnf install neuron-0.1.12-1.fc43.x86_64`) or a local file install — neither acceptable for end-users. Rename the RPM package to `helexa-neuron`. Keep binary (/usr/bin/neuron), systemd unit (neuron.service), system user (neuron), and config dir (/etc/neuron) unchanged — those are project-local contexts where the short name is unambiguous. Follows Fedora subpackage-style naming except with a vendor prefix rather than a parent-package prefix, because neuron is an independent package from cortex (installed on different hosts) and neither depends on the other. Changes: - neuron.spec -> helexa-neuron.spec (git rename) - Name: neuron -> helexa-neuron (with comment explaining why) - CI: srpm-neuron job now builds helexa-neuron-VERSION.tar.gz with the matching top-level dir prefix, publishes to helexa/helexa-neuron COPR - CI: bump-version job references helexa-neuron.spec - CLAUDE.md: install instructions updated Old helexa/neuron COPR project can be deleted after the first helexa/helexa-neuron build lands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:37:47 +03:00
rob thijssen	2ff062da0e	ci: commit generated %changelog entries back to main Previously the srpm-* jobs generated a fresh %changelog entry and shipped it to COPR, but the version-stamped spec pushed back to main by the bump-version job only updated the Version: line — not the %changelog section. The result: SRPM and in-tree spec diverged and a fresh clone of the repo showed a perpetually empty changelog. Run the rpm-changelog action in bump-version too. Now the committed specs track the SRPMs: each release leaves a dated %changelog entry in main covering commits since the previous tag, visible in git log and in the repo's spec browser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:37:03 +03:00
Gitea Actions	357f858a29	chore: bump version to 0.1.12	2026-04-16 15:47:21 +03:00
rob thijssen	556e5293dc	fix(rpm): explicitly Provides user(name) to satisfy systemd unit Requires All checks were successful CI / Format, lint, build, test (push) Successful in 2m59s Details CI / Build cortex SRPM (push) Successful in 44s Details CI / Build neuron SRPM (push) Successful in 49s Details CI / Publish neuron to COPR (push) Successful in 8m17s Details CI / Publish cortex to COPR (push) Successful in 9m56s Details CI / Bump version in source (push) Successful in 30s Details Diagnosing the persistent "Nothing to do" on v0.1.10 surfaced that removing %attr(,,name) from %files wasn't enough. systemd-rpm-macros ships its own rpm dep generator (/usr/lib/rpm/systemd.req) that parses User=/Group= directives from every .service file the package ships and emits Requires: user(NAME)/group(NAME) accordingly. Rpmbuild log from v0.1.10 shows these Requires are still emitted even after the %attr removal. Meanwhile the sysusers provides-generator emits group(NAME) in both unversioned and versioned forms, but only a versioned user(NAME) = <base64> when the u-line has GECOS/home/shell fields. The asymmetry leaves Requires: user(NAME) unresolvable. Add explicit Provides: user(NAME) back to both specs, with a comment documenting the actual cause (systemd unit parsing, not file attrs) so the next person touching these specs doesn't repeat the mistake. Why monsoon didn't hit this: it creates its user in %pre via groupadd/useradd (not sysusers.d), so no Provides are generated at all — matching the Requires: user(monsoon) by luck of the rpm solver treating unknown symbols as soft-fails for that path. Ours went through the sysusers Provides code path and hit the asymmetry instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.12	2026-04-16 15:32:51 +03:00
rob thijssen	1d90238b01	ci: migrate rpm changelog generation to reusable action Replace the local .gitea/scripts/generate-rpm-changelog.sh with the shared composite action at https://git.lair.cafe/actions/rpm-changelog@v1. Behaviour is identical — collect commits since the previous v* tag, filter bump-version and merge noise, prepend a dated entry to the spec — but the logic now lives in one place that other projects can consume. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:32:51 +03:00
rob thijssen	d99b25fb8a	ci: auto-generate rpm changelog entry per release On every tag push, build a %changelog entry from the git log since the previous v* tag and prepend it to each spec. Stops the initial entry from drifting further and catches bogus-date / stale-version warnings automatically since the generated date always matches the day the CI runs. The generator drops "chore: bump version" commits (bot-authored, noisy in user-facing changelogs) and merge commits. Author defaults to the gitea-actions identity but can be overridden via CHANGELOG_AUTHOR env var if a human release is desired. Requires fetch-depth: 0 on checkout so git describe can see prior tags and git log can reach them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:32:51 +03:00
rob thijssen	034da319f1	fix(rpm): correct weekday in changelog entry April 15 2026 was a Wednesday, not Tuesday. rpmbuild validates the day-of-week against the date and warns on mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:32:51 +03:00
Gitea Actions	7ece281617	chore: bump version to 0.1.10	2026-04-16 15:06:18 +03:00
rob thijssen	3bb5b3c425	fix(rpm): drop %attr(,,user) on config files to avoid dnf silent filter All checks were successful CI / Format, lint, build, test (push) Successful in 1m11s Details CI / Publish cortex to COPR (push) Successful in 11m3s Details CI / Build cortex SRPM (push) Successful in 43s Details CI / Build neuron SRPM (push) Successful in 43s Details CI / Publish neuron to COPR (push) Successful in 8m56s Details CI / Bump version in source (push) Successful in 30s Details Using %attr(,,cortex) / %attr(,,neuron) on config files caused rpm's auto-dep-generator to emit Requires: user(name) and group(name) on each package. When those Requires couldn't be resolved — whether due to sysusers Provides mismatches, missing GPG keys, or dnf5 cache state — dnf5 silently filtered the package out of the candidate set and reported "Nothing to do" rather than an unsatisfied-dep error. Adopt the pattern that already works reliably across our infra (grenade/monsoon): ship config files as default root:root with 0644 perms, don't declare user/group ownership in the rpm file list. systemd-sysusers still creates the service user via the shipped sysusers.d file; the service drops to that user at runtime via the User= directive in the unit. This removes the user(cortex)/user(neuron) Requires entirely, which is the root cause of the dnf5 filtering. File permission tightening can be reintroduced later — either via a separate secrets file with different mode bits, or by moving secret material to /var/lib/<svc>/ where the service drop-privileges account already has write access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.10	2026-04-16 14:50:17 +03:00
Gitea Actions	9fa51ad874	chore: bump version to 0.1.8	2026-04-16 10:56:07 +00:00
rob thijssen	9697fbae73	fix(neuron): run service as neuron user, not cortex All checks were successful CI / Format, lint, build, test (push) Successful in 2m22s Details CI / Build cortex SRPM (push) Successful in 43s Details CI / Build neuron SRPM (push) Successful in 43s Details CI / Publish neuron to COPR (push) Successful in 8m49s Details CI / Publish cortex to COPR (push) Successful in 11m22s Details CI / Bump version in source (push) Successful in 31s Details neuron and cortex are independent packages installable on different hosts. Having neuron run under a 'cortex' system user implied a shared identity that doesn't exist. Give neuron its own user/group. - New data/neuron-sysusers.conf declares the neuron user/group with home /var/lib/neuron. - systemd unit User/Group changed to neuron. - Spec file attrs, explicit Provides, and %sysusers_create_compat updated to reference the neuron user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.8	2026-04-16 13:32:36 +03:00
Gitea Actions	2ce1060cb8	chore: bump version to 0.1.7	2026-04-16 13:25:34 +03:00
rob thijssen	142e91c3f7	fix(neuron): install config at /etc/neuron/, not /etc/cortex/ All checks were successful CI / Format, lint, build, test (push) Successful in 4m45s Details CI / Build neuron SRPM (push) Successful in 44s Details CI / Build cortex SRPM (push) Successful in 45s Details CI / Publish neuron to COPR (push) Successful in 8m52s Details CI / Publish cortex to COPR (push) Successful in 11m17s Details CI / Bump version in source (push) Successful in 30s Details The neuron package was shipping its config at /etc/cortex/neuron.toml, which implied a shared config directory between two independent packages. Move to /etc/neuron/neuron.toml — neuron owns its own etc dir, consistent with its own /usr/lib/sysusers.d/neuron.conf and /usr/lib/systemd/system/neuron.service. Updated the systemd unit's ExecStart path and the example toml header to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.7	2026-04-16 13:07:06 +03:00
Gitea Actions	52c8b4c983	chore: bump version to 0.1.5	2026-04-16 13:01:42 +03:00
rob thijssen	4a9a4fc775	ci: migrate copr publish to reusable action All checks were successful CI / Format, lint, build, test (push) Successful in 1m26s Details CI / Build neuron SRPM (push) Successful in 45s Details CI / Build cortex SRPM (push) Successful in 44s Details CI / Publish neuron to COPR (push) Successful in 8m22s Details CI / Publish cortex to COPR (push) Successful in 11m0s Details CI / Bump version in source (push) Successful in 30s Details Replace the in-repo .gitea/scripts/copr-build.sh and per-job copr-cli configuration with the shared composite action at https://git.lair.cafe/actions/copr-publish@v1. Behaviour is identical — submit, watch, dump per-chroot logs — but the logic now lives in a single place that other projects can consume. Removes the actions/checkout step from both COPR jobs since the build script is no longer local to this repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.5	2026-04-16 12:34:39 +03:00
rob thijssen	53a3c1e157	fix(rpm): explicitly Provides user(cortex)/group(cortex) All checks were successful CI / Format, lint, build, test (push) Successful in 57s Details CI / Build cortex SRPM (push) Has been skipped Details CI / Publish cortex to COPR (push) Has been skipped Details CI / Build neuron SRPM (push) Has been skipped Details CI / Publish neuron to COPR (push) Has been skipped Details CI / Bump version in source (push) Has been skipped Details dnf5 was silently rejecting neuron-0.1.3 with "Nothing to do" because it had an unresolvable Requires. Inspection showed: Requires: user(cortex) ← unversioned Provides: user(cortex) = <base64> ← versioned only, no unversioned rpm's sysusers provides-generator only emits the unversioned user() provide when the u-line is minimal. Our sysusers.conf specifies GECOS, home dir, and shell, which pushes the generator to versioned-only. The matching Requires (auto-generated from %attr(,,cortex) on config files) is unversioned, so resolution failed silently. Explicitly declare Provides: user(cortex) and Provides: group(cortex) to guarantee the unversioned forms exist. group(cortex) was already emitted unversioned but adding it for symmetry and to protect against future generator changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:06:05 +03:00
rob thijssen	5c7d63c658	ci: dump COPR per-chroot build logs to CI output Previously the COPR publish steps only surfaced copr-cli's status updates (pending/importing/running). When a build failed, diagnosing required clicking through to the COPR web UI. Now we submit with --nowait, watch the build, then use copr-cli download-build to fetch each chroot's builder-live.log and cat them as collapsible ::group:: blocks in the CI output. Logic is factored into .gitea/scripts/copr-build.sh so cortex and neuron jobs share it. Both COPR jobs now check out the repo to access the script. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:06:05 +03:00
Gitea Actions	f161412f91	chore: bump version to 0.1.3	2026-04-16 11:41:11 +03:00
rob thijssen	ba5020138f	fix(rpm): rename sysusers files to match package names All checks were successful CI / Format, lint, build, test (push) Successful in 3m35s Details CI / Build cortex SRPM (push) Successful in 1m46s Details CI / Build neuron SRPM (push) Successful in 1m41s Details CI / Publish cortex to COPR (push) Successful in 7m14s Details CI / Publish neuron to COPR (push) Successful in 5m44s Details CI / Bump version in source (push) Successful in 30s Details cortex-gateway.conf/cortex-neuron.conf implied a hierarchy or coupling that doesn't exist — cortex and neuron are independent packages. Each package's sysusers.d file now matches the package name: cortex ships cortex.conf, neuron ships neuron.conf. Content is still identical (both create the cortex system user/group), and filenames remain distinct so the packages can coinstall. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> v0.1.3	2026-04-16 11:20:08 +03:00

1 2 3

124 Commits