act launches step shells without sourcing /etc/profile, so the
gitea_runner user's PATH lacks /usr/local/cuda-13.0/bin. cudarc's
build.rs panics with ENOENT on `nvcc --version` under the neuron
crate's cuda-version-from-build-system feature. build-prerelease.yml
already exports the CUDA bin directory onto PATH; mirror that here.
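The following is a minimal Rust sketch of why the export matters. `prepend_path` and `nvcc_spawnable` are hypothetical helpers, not code from cudarc or the workflow, and they assume a Unix `:`-separated PATH. Spawning `nvcc` against a PATH that lacks the CUDA bin directory fails with `ErrorKind::NotFound`, which is the same ENOENT that cudarc's build.rs reports.

```rust
use std::env;
use std::ffi::OsString;
use std::io::ErrorKind;
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical helper: prepend `dir` to a PATH-style string unless it is
/// already listed, like `export PATH=/usr/local/cuda-13.0/bin:$PATH`.
fn prepend_path(path: &str, dir: &str) -> OsString {
    let dir = PathBuf::from(dir);
    let mut entries: Vec<PathBuf> = env::split_paths(path).collect();
    if !entries.contains(&dir) {
        entries.insert(0, dir);
    }
    env::join_paths(entries).expect("no PATH entry may contain the separator")
}

/// Hypothetical check: can `nvcc --version` be spawned with this PATH?
/// A missing binary surfaces as ErrorKind::NotFound (ENOENT).
fn nvcc_spawnable(path: &OsString) -> bool {
    match Command::new("nvcc").arg("--version").env("PATH", path).output() {
        Ok(_) => true,
        Err(err) if err.kind() == ErrorKind::NotFound => false,
        Err(err) => panic!("unexpected error spawning nvcc: {err}"),
    }
}
```

Making the prepend idempotent means re-running the step, or a runner whose profile already added the directory, leaves PATH unchanged.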
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI run 255 job 3 (CUDA type-check) fails with:
error: could not execute process `*** rustc -vV` (never executed)
Caused by: No such file or directory (os error 2)
The redacted `***` is `sccache`. The ci.yml workflow-level env block
sets `RUSTC_WRAPPER: sccache` because the generic `rust` runner has
sccache installed and routes the cache to caveman.kosherinata.internal.
The new `cuda-check` job runs on `cuda-13.0` (where nvcc lives), and
that runner doesn't carry sccache on PATH — so cargo's first action
(`sccache rustc -vV` to probe the compiler version) fails before
borrow-check even starts.
`build-prerelease.yml`, which uses the same `cuda-13.0` runner for
the actual release neuron builds, deliberately does NOT set
RUSTC_WRAPPER. That's the pattern this commit applies.
Fix: override `RUSTC_WRAPPER` (plus the SCCACHE_* and AWS_* env vars)
locally on the job so cargo invokes rustc directly. We lose caching on
the cuda-check job (it's borrow-check-only and finishes in a couple of
minutes anyway), but the gate runs.
The job's purpose — fail fast on `#[cfg(feature = "cuda")]`
borrowck errors that the default-feature gate misses — is what
matters, and that purpose was undermined by the env inheritance.
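The failure can be modelled with a simplified Rust sketch. `probe_argv` approximates the command cargo runs to probe the compiler. It assumes, as cargo's environment-variable documentation describes, that an empty `RUSTC_WRAPPER` means "no wrapper". `find_on_path` is a hypothetical PATH lookup: on the `cuda-13.0` runner it finds no `sccache`, so the wrapped probe dies before rustc ever runs.

```rust
use std::env;
use std::path::PathBuf;

/// Simplified model (not cargo's code): with a non-empty wrapper the probe
/// is `<wrapper> rustc -vV`; an unset or empty wrapper yields `rustc -vV`.
fn probe_argv(wrapper: Option<&str>) -> Vec<String> {
    let mut argv = Vec::new();
    if let Some(w) = wrapper.filter(|w| !w.is_empty()) {
        argv.push(w.to_string());
    }
    argv.push("rustc".to_string());
    argv.push("-vV".to_string());
    argv
}

/// Hypothetical PATH lookup: first directory containing `program` as a file.
fn find_on_path(program: &str, path: &str) -> Option<PathBuf> {
    env::split_paths(path)
        .map(|dir| dir.join(program))
        .find(|candidate| candidate.is_file())
}
```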
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 2 of the Responses rollout: native `/v1/responses` endpoint on
neuron that consumes the same InferenceEvent stream as
`/v1/chat/completions` but emits it as the Responses API's named
SSE event family. No gateway-side translation.
## Surface
- `cortex-core::responses` envelope types: `ResponsesRequest`,
`ResponsesInput` (text | items), `ResponsesInputItem` (message |
function_call | function_call_output | reasoning),
`ResponsesContentPart` (input_text | input_image | output_text),
`ResponsesResponse`, `ResponsesOutputItem`, `ResponsesUsage`. Plus
an `events::*` constants module so the projector and the wire shape
stay in sync without string typos.
- `neuron::wire::openai_responses`:
  - `request_to_chat(req)` flattens Responses input + instructions
    into a `ChatCompletionRequest` the candle harness already
    understands. Text-only parts collapse to a string; mixed
    text+image parts go to chat's content-array shape; reasoning
    items drop; function_call / function_call_output round-trip
    via tool_calls / tool_call_id metadata so the surface is
    consistent for the day the harness emits tool calls.
  - `project_responses_stream(rx, meta)` reads InferenceEvents
    and emits the eight named events that compose a Responses
    stream: response.created → output_item.added → content_part.added
    → output_text.delta×N → output_text.done → content_part.done
    → output_item.done → response.completed. It synthesises start
    frames if the producer skips Start (poisoned model, early
    disconnect) so the stream stays coherent.
  - `build_response(meta, text, reason, usage)` for the
    non-streaming path.
- `CandleHarness::inference_stream(req)` extracted from
`chat_completion_stream`, returning a typed `InferenceStream`
(event receiver + id/created/model_id metadata). Both
`chat_completion_stream` and the new `responses_stream` are now
thin wrappers that pick their wire projection. TP path got the
same treatment (`chat_completion_tp_stream` → `inference_tp_stream`).
- `POST /v1/responses` route on neuron. Non-streaming returns one
buffered `ResponsesResponse`; streaming returns axum SSE with
both event names and JSON data per frame (Responses, unlike
chat completions, uses named `event:` lines). The
`inference_error_response` helper is hoisted out so the chat and
responses handlers share the InferenceError → HTTP mapping.
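The projector's event ordering can be sketched in Rust. `InferenceEvent` here is a hypothetical three-variant stand-in, not neuron's real type. Payloads are plain strings rather than the JSON objects the real endpoint sends. The sketch emits the three opening frames exactly once, whether or not the producer sent `Start`. It then emits one delta per token, followed by `output_text.done` with the accumulated text and the three remaining closing frames. `sse_frame` shows the named `event:` line that distinguishes Responses SSE from chat completions.

```rust
/// Hypothetical, heavily simplified stand-in for neuron's InferenceEvent.
#[derive(Debug, Clone, PartialEq)]
enum InferenceEvent {
    Start,
    Token(String),
    Finished,
}

const OPENING: [&str; 3] = [
    "response.created",
    "response.output_item.added",
    "response.content_part.added",
];
const CLOSING: [&str; 3] = [
    "response.content_part.done",
    "response.output_item.done",
    "response.completed",
];

/// One SSE frame: a named `event:` line followed by a `data:` line.
fn sse_frame(event: &str, data: &str) -> String {
    format!("event: {event}\ndata: {data}\n\n")
}

/// Map inference events onto (event name, payload) pairs.
fn project(events: &[InferenceEvent]) -> Vec<(&'static str, String)> {
    let mut frames: Vec<(&'static str, String)> = Vec::new();
    let mut started = false;
    let mut text = String::new();
    for ev in events {
        // Synthesise the opening frames on the first event of any kind,
        // so a stream that never saw Start still opens coherently.
        if !started {
            for name in OPENING {
                frames.push((name, String::new()));
            }
            started = true;
        }
        match ev {
            InferenceEvent::Start => {}
            InferenceEvent::Token(t) => {
                text.push_str(t);
                frames.push(("response.output_text.delta", t.clone()));
            }
            InferenceEvent::Finished => {
                frames.push(("response.output_text.done", text.clone()));
                for name in CLOSING {
                    frames.push((name, String::new()));
                }
                break;
            }
        }
    }
    frames
}
```

In this sketch the closing frames depend on a terminal `Finished` event. It does not model how the real projector closes a stream that disconnects mid-generation.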
## CI
Also bundles the `cuda-check` runner-label fix from feedback on
commit 1859777: `runs-on: rpm` doesn't ship the CUDA toolkit so
cudarc's nvcc-version build script blew up. Switched to
`runs-on: cuda-13.0` per the existing labels.
## Scope cuts (documented in the modules)
- `previous_response_id` rejected at translate time with 400
(`code: chained_conversation_not_supported`) — stateful chained
conversations need a persistence layer we haven't built.
- Reasoning items dropped (no Qwen3 `<think>` routing yet).
- Single output item per response (one `"message"` carrying text);
`function_call` items reserved but not synthesised.
- Streaming events cover the core set; `response.in_progress`
and the web_search / image_generation event families are
out of scope.
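The translator rules listed above can be sketched in Rust with hypothetical, simplified types. The real `ResponsesRequest` and `ChatCompletionRequest` carry far more, including the function_call items omitted here. The sketch does four things:

- rejects `previous_response_id` with a 400 and the `chained_conversation_not_supported` code;
- drops reasoning items;
- collapses all-text parts into one string;
- keeps mixed text+image parts as an array.

Two further behaviours are assumptions, not taken from the module: mapping `instructions` to a leading system message, and joining text parts with a newline.

```rust
/// Hypothetical simplified content part; `Image` holds an image URL.
#[derive(Debug, Clone, PartialEq)]
enum Part {
    Text(String),
    Image(String),
}

enum InputItem {
    Message { role: String, parts: Vec<Part> },
    Reasoning(String),
}

struct ResponsesRequest {
    instructions: Option<String>,
    input: Vec<InputItem>,
    previous_response_id: Option<String>,
}

#[derive(Debug, PartialEq)]
enum ChatContent {
    Text(String),
    Parts(Vec<Part>),
}

#[derive(Debug, PartialEq)]
struct ChatMessage {
    role: String,
    content: ChatContent,
}

#[derive(Debug, PartialEq)]
struct TranslateError {
    status: u16,
    code: &'static str,
}

/// All-text parts collapse to one string; any image keeps the array shape.
fn collapse(parts: &[Part]) -> ChatContent {
    let texts: Option<Vec<&str>> = parts
        .iter()
        .map(|part| match part {
            Part::Text(text) => Some(text.as_str()),
            Part::Image(_) => None,
        })
        .collect();
    match texts {
        Some(texts) => ChatContent::Text(texts.join("\n")),
        None => ChatContent::Parts(parts.to_vec()),
    }
}

fn request_to_chat(req: &ResponsesRequest) -> Result<Vec<ChatMessage>, TranslateError> {
    // Chained conversations need persistence that doesn't exist yet.
    if req.previous_response_id.is_some() {
        return Err(TranslateError { status: 400, code: "chained_conversation_not_supported" });
    }
    let mut messages = Vec::new();
    if let Some(text) = &req.instructions {
        messages.push(ChatMessage { role: "system".to_string(), content: ChatContent::Text(text.clone()) });
    }
    for item in &req.input {
        match item {
            // No <think> routing yet, so reasoning items are dropped.
            InputItem::Reasoning(_) => {}
            InputItem::Message { role, parts } => {
                messages.push(ChatMessage { role: role.clone(), content: collapse(parts) })
            }
        }
    }
    Ok(messages)
}
```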
22 new tests: 5 in cortex-core (envelope round-trips), 13 in
neuron::wire (request translator + projector + non-streaming
builder), 4 in neuron's tests/api.rs (route surface — 503 when no
candle, 400 on previous_response_id, 404 on missing model for
both stream and non-stream).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run 244 caught a use-of-moved-value in a `#[cfg(feature = "cuda")]`
block that the default-feature workspace clippy/test gate had no
chance of seeing. The error appeared only when the RPM build
workflow compiled with `--features cuda` — 30+ minutes after push.
Add a `cuda-check` job to ci.yml that runs `cargo check -p neuron
--features cuda --all-targets` on the rpm runner (where nvcc /
cudarc build deps live; the generic `rust` runner doesn't have
them). Borrow-check only — we never run tests here, the runner
has no GPU. Same retry pattern as clippy/test.
Both SRPM jobs (`srpm-cortex`, `srpm-neuron`) now gate on
`cuda-check` so a CUDA build break can't reach the release pipeline.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sccache occasionally fails mid-compile with race-condition errors that
clear on a re-run without any code changes. Rather than tracking that
down right now, wrap the two affected steps in a bash loop that retries
up to three times with a 5-second pause. Real failures still surface;
they just take ~10s longer to fail.
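The workflow uses a bash loop. Expressing the same control flow as a hypothetical Rust `retry` helper makes the cost of a genuine failure explicit. Three attempts with a 5-second pause means two pauses, so a real error surfaces about 10s later than it otherwise would.

```rust
use std::thread;
use std::time::Duration;

/// Run `step` up to `attempts` times, sleeping `pause` between failures.
/// Returns the first success, or the last error once attempts run out.
fn retry<T, E>(
    attempts: u32,
    pause: Duration,
    mut step: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    assert!(attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match step() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the real failure.
            Err(err) if attempt >= attempts => return Err(err),
            // Possibly a transient sccache race: wait and try again.
            Err(_) => {
                attempt += 1;
                thread::sleep(pause);
            }
        }
    }
}
```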
fmt is left as a single invocation — it's a one-shot syntactic check,
not a build, and isn't subject to the same sccache races.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two CI hygiene fixes uncovered while validating against the live fleet.
1. Same-day prerelease packages were being ordered by rpmvercmp's
segment comparison of the git SHA fragment, not by commit
chronology. With release stamps like "0.1.${YYYYMMDD}git${SHA}",
two commits on the same day produce the same numeric prefix and
rpmvercmp falls back to comparing the SHA suffixes. Because the SHA
is glued to `git`, an alpha-leading SHA extends the `git` alphabetic
segment and outranks a digit-leading one, while two alpha-leading
SHAs compare lexicographically. Either way the result is unrelated
to which commit landed first. Verified with rpmdev-vercmp:
gitabc1234 < gitdef5678 (old scheme: purely lexicographic)
Bumping the timestamp prefix to second precision (%Y%m%d%H%M%S)
makes the numeric prefix strictly monotonic for chronologically
ordered commits (at one-second resolution), so the SHA fragment
becomes a debug identifier only and never participates in version
ordering.
2. ci.yml and build-prerelease.yml both target the `rust` runner label
and both auto-trigger on push to main. The act-based runner reuses
/root/.cache/act/<hash>/hostexecutor/ across concurrent jobs, so
ci.yml's clippy and build-prerelease.yml's build-cortex were racing
each other's checkout/cleanup steps and corrupting in-flight
compile artifacts. Real fix is in gongfoo; workflow-level workaround
is a shared concurrency group with cancel-in-progress=false so the
two workflows queue sequentially on the same ref.
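A simplified Rust model of rpmvercmp's segment comparison shows why the second-precision stamp in item 1 fixes ordering. It is a sketch, not rpm's implementation: `~` and `^` handling is omitted, and `rpmdev-vercmp` remains the authority. The model follows four rules:

- runs of digits and runs of letters form separate segments;
- numeric segments compare by value;
- alphabetic segments compare byte-wise;
- a numeric segment beats an alphabetic one.

```rust
use std::cmp::Ordering;

/// Strip leading zeros from a numeric segment ("0042" -> "42").
fn trim_zeros(s: &[u8]) -> &[u8] {
    let n = s.iter().take_while(|&&c| c == b'0').count();
    &s[n..]
}

/// Simplified rpmvercmp: compare maximal digit/letter runs left to right;
/// any other character is a separator.
fn rpmvercmp(a: &str, b: &str) -> Ordering {
    if a == b {
        return Ordering::Equal;
    }
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let (mut i, mut j) = (0, 0);
    loop {
        while i < a.len() && !a[i].is_ascii_alphanumeric() { i += 1; }
        while j < b.len() && !b[j].is_ascii_alphanumeric() { j += 1; }
        if i == a.len() || j == b.len() {
            break;
        }
        let numeric = a[i].is_ascii_digit();
        let in_run = |c: u8| if numeric { c.is_ascii_digit() } else { c.is_ascii_alphabetic() };
        let (si, sj) = (i, j);
        while i < a.len() && in_run(a[i]) { i += 1; }
        while j < b.len() && in_run(b[j]) { j += 1; }
        if j == sj {
            // Segment types differ: numeric is newer than alphabetic.
            return if numeric { Ordering::Greater } else { Ordering::Less };
        }
        let (mut x, mut y) = (&a[si..i], &b[sj..j]);
        if numeric {
            x = trim_zeros(x);
            y = trim_zeros(y);
            // More significant digits means a larger number.
            match x.len().cmp(&y.len()) {
                Ordering::Equal => {}
                other => return other,
            }
        }
        match x.cmp(y) {
            Ordering::Equal => {}
            other => return other,
        }
    }
    // Shared segments tie: whichever string still has segments is newer.
    match (i == a.len(), j == b.len()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Less,
        _ => Ordering::Greater,
    }
}
```

With the old daily stamp the date segments tie and the SHA decides. With `%Y%m%d%H%M%S` the numeric segment differs first, so the SHA is never consulted.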
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the candle deps were added, cargo builds run long enough that
the parallel fmt/clippy/test jobs (all on the `rust` runner label,
which appears to use act in host-executor mode) start racing each
other's intermediate temp files under
/root/.cache/act/<hash>/hostexecutor/target/debug/deps/
Concretely, the test job hit:
error: No such file or directory at path
"target/debug/deps/.tmprlicL7"
Compiling unicode-ident
because another job's cargo invocation cleaned up the temp file
mid-compile. fmt and clippy happened to finish without their own
target races landing fatally, so only test failed visibly.
Set CARGO_TARGET_DIR=target-${{ github.job }} at the workflow level
so each job writes to its own target directory. sccache still backs
the actual rustc cache, so the rebuild penalty is just metadata, not
full recompiles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The actions/cache round-trip (download + unpack) was consistently
taking around 6 minutes, roughly twice the ~3-minute cold build it was
meant to accelerate. That is net-negative on CI time, so remove it.
sccache with the S3 backend still provides dep-level caching at a
much lower overhead, so we keep the majority of the cache benefit
without paying the actions/cache tarball cost.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidates the previous helexa/cortex and helexa/helexa-neuron COPR
projects into one shared project. Hosts enable a single repo and get
access to both packages — cortex for gateway hosts and helexa-neuron
for GPU nodes. Reduces the "which copr do I enable on this host"
friction, and makes it clear the two packages are parts of the same
helexa project suite.
CI keeps two independent publish jobs (copr-cortex and copr-neuron)
running in parallel; they now both target helexa/helexa with their
respective SRPMs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fedora's official repos ship a package named `neuron` — the NEURON
neural-simulation environment from Yale (see
https://src.fedoraproject.org/rpms/neuron). Having our own `neuron`
in the helexa COPR caused dnf5 to silently no-op `dnf install neuron`
because of the name collision, even with the COPR repo enabled and
keys imported. The only workarounds were full NEVRA (`dnf install
neuron-0.1.12-1.fc43.x86_64`) or a local file install — neither
acceptable for end-users.
Rename the RPM package to `helexa-neuron`. Keep binary (/usr/bin/neuron),
systemd unit (neuron.service), system user (neuron), and config dir
(/etc/neuron) unchanged — those are project-local contexts where the
short name is unambiguous. Follows Fedora subpackage-style naming
except with a vendor prefix rather than a parent-package prefix,
because neuron is an independent package from cortex (installed on
different hosts) and neither depends on the other.
Changes:
- neuron.spec -> helexa-neuron.spec (git rename)
- Name: neuron -> helexa-neuron (with comment explaining why)
- CI: srpm-neuron job now builds helexa-neuron-VERSION.tar.gz with the
matching top-level dir prefix, publishes to helexa/helexa-neuron COPR
- CI: bump-version job references helexa-neuron.spec
- CLAUDE.md: install instructions updated
Old helexa/neuron COPR project can be deleted after the first
helexa/helexa-neuron build lands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the srpm-* jobs generated a fresh %changelog entry and
shipped it to COPR, but the version-stamped spec pushed back to main
by the bump-version job only updated the Version: line — not the
%changelog section. The result: SRPM and in-tree spec diverged and
a fresh clone of the repo showed a perpetually empty changelog.
Run the rpm-changelog action in bump-version too. Now the committed
specs track the SRPMs: each release leaves a dated %changelog entry
in main covering commits since the previous tag, visible in git log
and in the repo's spec browser.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the local .gitea/scripts/generate-rpm-changelog.sh with the
shared composite action at https://git.lair.cafe/actions/rpm-changelog@v1.
Behaviour is identical — collect commits since the previous v* tag,
filter bump-version and merge noise, prepend a dated entry to the
spec — but the logic now lives in one place that other projects can
consume.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On every tag push, build a %changelog entry from the git log since
the previous v* tag and prepend it to each spec. This stops the initial
entry from drifting further and avoids bogus-date / stale-version
warnings automatically, since the generated date always matches the
day CI runs.
The generator drops "chore: bump version" commits (bot-authored,
noisy in user-facing changelogs) and merge commits. The author defaults
to the gitea-actions identity but can be overridden via the
CHANGELOG_AUTHOR env var when a release should credit a human.
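The generator's filtering and entry format can be sketched in Rust. The `Commit` type, the subject-prefix match and the default identity are hypothetical. Only the rules come from this commit: drop `chore: bump version` and merge commits, and let `CHANGELOG_AUTHOR` override the author. The `* <date> <author> - <version>` header follows RPM's %changelog convention.

```rust
use std::env;

/// Hypothetical commit record; the real generator reads `git log`.
struct Commit<'a> {
    subject: &'a str,
    parents: usize, // more than one parent means a merge commit
}

/// Build one %changelog entry, skipping version bumps and merges.
fn changelog_entry(date: &str, author: &str, version: &str, commits: &[Commit]) -> String {
    let mut entry = format!("* {date} {author} - {version}\n");
    for commit in commits {
        let is_merge = commit.parents > 1;
        let is_bump = commit.subject.starts_with("chore: bump version");
        if !(is_merge || is_bump) {
            entry.push_str(&format!("- {}\n", commit.subject));
        }
    }
    entry
}

/// Author from CHANGELOG_AUTHOR, falling back to a default identity.
fn changelog_author(default: &str) -> String {
    env::var("CHANGELOG_AUTHOR").unwrap_or_else(|_| default.to_string())
}
```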
Requires fetch-depth: 0 on checkout so git describe can see prior
tags and git log can reach them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the in-repo .gitea/scripts/copr-build.sh and per-job
copr-cli configuration with the shared composite action at
https://git.lair.cafe/actions/copr-publish@v1. Behaviour is
identical — submit, watch, dump per-chroot logs — but the logic
now lives in a single place that other projects can consume.
Removes the actions/checkout step from both COPR jobs since the
build script is no longer local to this repo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the COPR publish steps only surfaced copr-cli's status
updates (pending/importing/running). When a build failed, diagnosing
required clicking through to the COPR web UI. Now we submit with
--nowait, watch the build, then use copr-cli download-build to fetch
each chroot's builder-live.log and cat them as collapsible ::group::
blocks in the CI output.
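The log-dumping step amounts to wrapping each chroot's log in `::group::` / `::endgroup::` workflow commands. The following is a minimal Rust sketch; `grouped_log`, `grouped_logs` and the group-title format are hypothetical.

```rust
/// Wrap one chroot's builder-live.log in a collapsible CI group.
fn grouped_log(chroot: &str, log: &str) -> String {
    let mut out = format!("::group::{chroot} builder-live.log\n");
    out.push_str(log);
    // Keep ::endgroup:: on its own line even if the log lacks a final newline.
    if !log.ends_with('\n') {
        out.push('\n');
    }
    out.push_str("::endgroup::\n");
    out
}

/// Concatenate one group per (chroot, log) pair.
fn grouped_logs(logs: &[(&str, &str)]) -> String {
    logs.iter().map(|(chroot, log)| grouped_log(chroot, log)).collect()
}
```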
Logic is factored into .gitea/scripts/copr-build.sh so cortex and
neuron jobs share it. Both COPR jobs now check out the repo to access
the script.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three complementary tweaks to close the gap sccache alone can't:
- CARGO_INCREMENTAL=0: reclaims the 17 incremental-mode cache misses
per run and prevents cargo from writing incremental fingerprints
that defeat sccache. Incremental mode is useless in CI anyway since
each run starts from scratch.
- actions/cache for ~/.cargo and target/: sidesteps sccache's
structural limits (proc-macro non-cacheables, clippy-vs-rustc
separate namespaces) by caching the whole build output keyed on
Cargo.lock. Also caches ~/.cargo/bin so the installed sccache
binary survives between runs.
- Drop the separate 'cargo build' step: 'cargo test --workspace'
builds everything anyway, so the standalone build was a full
redundant workspace compile pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The workflow-level env set RUSTC_WRAPPER=sccache for every step,
including the install step itself. cargo install sccache then
tried to invoke `sccache rustc -vV` to detect the toolchain before
sccache existed on PATH, failing with "No such file or directory".
Override RUSTC_WRAPPER to empty on the install step so cargo uses
rustc directly; subsequent steps still inherit the wrapper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The distro sccache package lacks S3 support. Install from cargo
with --features s3 if the existing binary can't connect to the
S3 backend. Skips install if already present and working.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All Rust compilation steps now use sccache backed by MinIO S3
at caveman.kosherinata.internal:9000. Credentials via repo secrets
SCCACHE_S3_ACCESS_KEY and SCCACHE_S3_SECRET_KEY. Cache is shared
across all bare metal runners.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Token is only needed for the authenticated push, not the public
checkout. Set the remote URL with the token inline before pushing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cortex.spec: gateway binary, cortex.service systemd unit,
cortex.toml + models.toml config files
- neuron.spec: neuron binary, neuron.service systemd unit,
neuron.toml config file
- Parallel CI: srpm-cortex and srpm-neuron jobs build SRPMs
concurrently, then publish to separate COPR repos
(helexa/cortex and helexa/neuron)
- bump-version job: after both COPR publishes succeed, stamps
tag version into Cargo.toml, specs, Cargo.lock and pushes
to main via GITEA_TOKEN
- Shared cortex user/group across both packages
- Example configs: cortex.example.toml, neuron.example.toml,
models.example.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cortex.spec: gateway binary, cortex.service systemd unit,
cortex.toml + models.toml config files
- neuron.spec: neuron binary, neuron.service systemd unit,
neuron.toml config file
- Parallel CI: srpm-cortex and srpm-neuron jobs build SRPMs
concurrently, then publish to separate COPR repos
(helexa/cortex and helexa/neuron)
- Shared cortex user/group across both packages
- Example configs: cortex.example.toml, neuron.example.toml,
models.example.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add .gitea/workflows/ci.yml with fmt/clippy/test on all branches
and SRPM build + COPR publish on version tags
- Add cortex.spec for Fedora RPM packaging
- Add GPL-3.0-or-later LICENSE file
- Add cortex.example.toml with generic hostnames; gitignore cortex.toml
- Scrub infrastructure-specific hostnames from README.md, CLAUDE.md,
and doc comments
- Fix unused imports and clippy warnings to pass -D warnings
- Fix missing deps (bytes, reqwest, serde_json) exposed during build
- Run cargo fmt across workspace
- Update SPDX license identifier to GPL-3.0-or-later
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>