Stage 1 of the candle-native pivot. Replaces the external-process
harness model (mistralrs over HTTP, llamacpp placeholder) with an
in-process Harness trait whose sole implementation is candle. The
trait keeps its shape so future engines slot in additively, but
start/stop default to no-ops and HarnessConfig drops endpoint and
systemd_unit since no harness needs external supervision.
Behaviour is unchanged on the wire: load_model returns a "not
implemented yet (Stage 2)" error and list_models is empty. The
gateway-side proxy, poller, and router are untouched.
CLAUDE.md Phase 11 (llama.cpp) and Phase 12 (mistral.rs COPR) are
marked superseded; the staged plan lives in
~/.claude/plans/create-a-more-aggressive-calm-naur.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous defaults collided with well-trodden infra services and with
the Linux ephemeral port range:
- cortex API 8000 — common dev-server default (Django, minio UI)
- cortex metrics 9100 — Prometheus node_exporter default
- neuron API 9090 — Cockpit default on Fedora, Prometheus self
Move to helexa-themed palindromic ports, all below Linux's
32768-60999 ephemeral range and not registered to any well-known
service:
- cortex API 31313
- cortex metrics 31314
- neuron API 13131
Updated places:
- cortex.example.toml, neuron.example.toml defaults
- default impls in cortex-core and neuron config
- cortex-cli --endpoint default for the status subcommand
- doc comments citing example URLs
- README.md and CLAUDE.md snippets
Consumers already on the old ports need a one-line edit in their
/etc/cortex/cortex.toml or /etc/neuron/neuron.toml to match;
firewall rules and prometheus scrape configs will also need
updating.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidates the previous helexa/cortex and helexa/helexa-neuron COPR
projects into one shared project. Hosts enable a single repo and get
access to both packages — cortex for gateway hosts and helexa-neuron
for GPU nodes. Reduces the "which copr do I enable on this host"
friction, and makes it clear the two packages are parts of the same
helexa project suite.
CI keeps two independent publish jobs (copr-cortex and copr-neuron)
running in parallel; they now both target helexa/helexa with their
respective SRPMs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fedora's official repos ship a package named `neuron` — the NEURON
neural-simulation environment from Yale (see
https://src.fedoraproject.org/rpms/neuron). Having our own `neuron`
in the helexa COPR caused dnf5 to silently no-op `dnf install neuron`
because of the name collision, even with the COPR repo enabled and
keys imported. The only workarounds were full NEVRA (`dnf install
neuron-0.1.12-1.fc43.x86_64`) or a local file install — neither
acceptable for end-users.
Rename the RPM package to `helexa-neuron`. Keep binary (/usr/bin/neuron),
systemd unit (neuron.service), system user (neuron), and config dir
(/etc/neuron) unchanged — those are project-local contexts where the
short name is unambiguous. Follows Fedora subpackage-style naming
except with a vendor prefix rather than a parent-package prefix,
because neuron is an independent package from cortex (installed on
different hosts) and neither depends on the other.
Changes:
- neuron.spec -> helexa-neuron.spec (git rename)
- Name: neuron -> helexa-neuron (with comment explaining why)
- CI: srpm-neuron job now builds helexa-neuron-VERSION.tar.gz with the
matching top-level dir prefix, publishes to helexa/helexa-neuron COPR
- CI: bump-version job references helexa-neuron.spec
- CLAUDE.md: install instructions updated
Old helexa/neuron COPR project can be deleted after the first
helexa/helexa-neuron build lands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cortex.spec: gateway binary, cortex.service systemd unit,
cortex.toml + models.toml config files
- neuron.spec: neuron binary, neuron.service systemd unit,
neuron.toml config file
- Parallel CI: srpm-cortex and srpm-neuron jobs build SRPMs
concurrently, then publish to separate COPR repos
(helexa/cortex and helexa/neuron)
- Shared cortex user/group across both packages
- Example configs: cortex.example.toml, neuron.example.toml,
models.example.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Package name, lib name, and binary all now just "neuron" without
the cortex- prefix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace NodeConfig (static vram_mb, pinned) with NeuronEndpoint.
Hardware discovery and model pinning now come from neuron API and
models.toml catalogue respectively.
- config.rs: nodes -> neurons, add models_config path
- catalogue.rs: ModelProfile with pinned_on, ModelCatalogue
- poller.rs: poll neuron GET /models (ModelInfo format)
- router.rs: resolve inference endpoint via neuron GET /models/{id}/endpoint
- evictor.rs: call neuron POST /models/unload
- node.rs: remove vram_mb, pinned fields (come from discovery/catalogue)
- All 22 gateway tests updated to mock neuron API
- Remove MistralModelsResponse, ModelLifecycleRequest (no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MistralRsHarness: Harness trait impl wrapping mistral.rs HTTP API
(list/load/unload models, health check, start/stop via systemd)
- HarnessRegistry: maps harness name -> Box<dyn Harness>, built from
neuron.toml config
- Neuron API endpoints: GET /models, POST /models/load,
POST /models/unload, GET /models/:id/endpoint
- NeuronConfig: figment-based config loading from neuron.toml
- Integration test: full model lifecycle through mock mistral.rs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Emit cortex_requests_total, cortex_request_duration_seconds,
cortex_request_errors_total, and cortex_cold_starts_total with
model and node labels on every proxied request.
Add install_test_recorder() for testing metrics without HTTP listener.
Integration test verifies counters and histograms appear after proxy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire up openai_to_anthropic in the /v1/messages handler: buffer
upstream OpenAI response, parse, translate to Anthropic format
(stop_reason mapping, usage field names, content blocks).
5 integration tests covering round-trip translation, system prompt,
content blocks, and error cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract public poll_once() from poll_loop() for testability.
4 tests proving the poller correctly discovers models, updates
gateway state, marks unreachable nodes unhealthy, and prunes
stale models.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Confirms the existing proxy streams SSE chunks incrementally:
- 5-chunk test with 50ms delays verifies time spread between first
and last chunk arrival (not buffered)
- Verifies data: [DONE] terminator is forwarded
No src/ changes needed — Body::from_stream(bytes_stream()) already
handles SSE correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6 tests proving the scaffold works end-to-end:
- chat completion proxied through gateway to mock backend
- /health endpoint with healthy node
- /v1/models returns seeded model list
- 404 for unknown model
- 404 when no healthy nodes available
- 400 when request body missing model field
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>