helexa/cortex

Commit Graph

Author

SHA1

Message

Date

rob thijssen

800498f530

feat(neuron): bind listener before pre-warm, surface activation in /health

Some checks failed
build-prerelease / Resolve version stamps (push) Successful in 33s
CI / Format (push) Successful in 41s
CI / Clippy (push) Successful in 2m26s
build-prerelease / Build neuron-blackwell (push) Successful in 3m34s
CI / Test (push) Successful in 4m44s
CI / Build cortex SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build cortex binary (push) Successful in 4m29s
build-prerelease / Package cortex RPM (push) Successful in 1m23s
build-prerelease / Build neuron-ada (push) Has been cancelled
build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled
build-prerelease / Build neuron-ampere (push) Has been cancelled

Two coupled changes addressing the 2026-05-26 validate-neuron failure
where a fresh deploy of beast had /health unreachable for ~5 minutes
while Qwen3.6-27B q5k materialised, even though systemd reported the
unit as active.

1. main.rs no longer awaits load_default_models before binding axum.
   The listener binds first; pre-warm runs in a spawned background
   task that holds a read lock on the harness registry for the
   duration of its sequential load loop. Concurrent on-demand
   /models/load and /v1/chat/completions traffic still flows.

2. /health gains an `activation` field carrying:
     state         pre_warming | ready
     pending       model ids queued but not started
     in_progress   model id currently loading (Option)
     completed     model ids loaded successfully this activation
     failed        [{model_id, error}] for failed entries
   The field is `#[serde(default)]` so a pre-change cortex polling a
   new neuron — or vice versa — keeps working.
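The ordering in item 1 (bind the listener, then pre-warm in the background) can be illustrated with a minimal sketch. This is not the code from main.rs: it assumes std threads and `std::net::TcpListener` as stand-ins for the axum listener and the spawned task, and `start_service`, `loaded`, and the `load` callback are hypothetical names introduced only for this example.

```rust
use std::net::TcpListener;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical helper: bind first, then pre-warm on a background thread.
/// `load` stands in for whatever actually materialises a model.
pub fn start_service<F>(
    addr: &str,
    models: Vec<String>,
    load: F,
) -> std::io::Result<(TcpListener, Arc<Mutex<Vec<String>>>, JoinHandle<()>)>
where
    F: Fn(&str) -> Result<(), String> + Send + 'static,
{
    // Step 1: bind the listener so health checks can connect immediately.
    let listener = TcpListener::bind(addr)?;

    // Shared record of models that finished loading (assumption: a plain Vec).
    let loaded = Arc::new(Mutex::new(Vec::new()));
    let loaded_bg = Arc::clone(&loaded);

    // Step 2: pre-warm sequentially in the background; a failure is reported
    // and the next model is still attempted.
    let handle = thread::spawn(move || {
        for id in models {
            match load(id.as_str()) {
                Ok(()) => loaded_bg.lock().unwrap().push(id),
                Err(e) => eprintln!("pre-warm failed for {id}: {e}"),
            }
        }
    });

    Ok((listener, loaded, handle))
}
```

The caller gets the bound listener back before any model has loaded, which is the property the validate-neuron failure was missing: the port accepts connections while the load loop is still running.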

`ActivationTracker` (new module `neuron::activation`) owns the
RwLock-wrapped state; load_default_models takes a tracker reference
and updates it per-model. NeuronState holds an Arc clone for the
/health handler.
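A rough sketch of how such a tracker might look, using only the standard library. The field names follow the `activation` field described above; the method names (`start`, `finish`, `snapshot`), the `FailedLoad` and `ActivationStatus` types, and the exact transition rules are assumptions for illustration, not the contents of `neuron::activation`.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActivationState {
    PreWarming,
    Ready,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FailedLoad {
    pub model_id: String,
    pub error: String,
}

/// Snapshot shape mirroring the fields listed for /health.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ActivationStatus {
    pub state: ActivationState,
    pub pending: Vec<String>,
    pub in_progress: Option<String>,
    pub completed: Vec<String>,
    pub failed: Vec<FailedLoad>,
}

/// Cheap to clone: every clone shares the same RwLock-wrapped state.
#[derive(Clone)]
pub struct ActivationTracker {
    inner: Arc<RwLock<ActivationStatus>>,
}

impl ActivationTracker {
    /// An empty queue is treated as immediately ready (assumed behaviour).
    pub fn new(model_ids: Vec<String>) -> Self {
        let state = if model_ids.is_empty() {
            ActivationState::Ready
        } else {
            ActivationState::PreWarming
        };
        let status = ActivationStatus {
            state,
            pending: model_ids,
            in_progress: None,
            completed: Vec::new(),
            failed: Vec::new(),
        };
        Self { inner: Arc::new(RwLock::new(status)) }
    }

    /// Move a model from `pending` to `in_progress`.
    pub fn start(&self, model_id: &str) {
        let mut s = self.inner.write().unwrap();
        s.pending.retain(|id| id.as_str() != model_id);
        s.in_progress = Some(model_id.to_string());
    }

    /// Record the outcome; flip to `Ready` once nothing is left pending.
    pub fn finish(&self, model_id: &str, result: Result<(), String>) {
        let mut s = self.inner.write().unwrap();
        s.in_progress = None;
        match result {
            Ok(()) => s.completed.push(model_id.to_string()),
            Err(error) => s.failed.push(FailedLoad { model_id: model_id.to_string(), error }),
        }
        if s.pending.is_empty() {
            s.state = ActivationState::Ready;
        }
    }

    /// What a /health handler would read.
    pub fn snapshot(&self) -> ActivationStatus {
        self.inner.read().unwrap().clone()
    }
}
```

In this sketch the load loop would call `start` and `finish` around each model while the handler only ever calls `snapshot`, so readers take the read side of the lock and never block each other.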

Tests updated to construct trackers and assert state transitions
(empty noop, two failures → ready with both in `failed`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-26 15:18:04 +03:00

rob thijssen

6779b7526a

feat(neuron): load default_models on service activation

All checks were successful
CI / Format (push) Successful in 34s
CI / Clippy (push) Successful in 2m13s
CI / Test (push) Successful in 4m6s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped

Stage 5 of the candle-native pivot. Adds first-class support for
auto-loading a configured set of models when the neuron service
activates.

Config:
- NeuronConfig.default_models: Vec<ModelSpec> (defaults to []).
- neuron.example.toml ships a commented [[default_models]] example.

Activation flow (crates/neuron/src/startup.rs::load_default_models):
- Sequential — VRAM contention makes parallel loads risky.
- Per-entry timing logged at info level on success.
- Failures logged as warnings; the next entry is still attempted.
- An empty list short-circuits without log noise.
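The four properties in the list above can be sketched as a small loop. This is an illustration under assumptions, not the body of startup.rs: the `ModelSpec` fields, the `LoadReport` type, the `load_sequentially` name, and the `load` callback are hypothetical, and `println!`/`eprintln!` stand in for info- and warn-level logging.

```rust
use std::time::Instant;

/// Hypothetical fields; the real ModelSpec is not shown in the commit message.
#[derive(Debug, Clone)]
pub struct ModelSpec {
    pub model_id: String,
    pub harness: String,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct LoadReport {
    pub loaded: Vec<String>,
    pub failed: Vec<(String, String)>, // (model_id, error)
}

/// Load each spec in order, one at a time, continuing past failures.
pub fn load_sequentially<F>(specs: &[ModelSpec], mut load: F) -> LoadReport
where
    F: FnMut(&ModelSpec) -> Result<(), String>,
{
    let mut report = LoadReport::default();

    // Empty list: return without logging anything.
    if specs.is_empty() {
        return report;
    }

    // One load at a time, so two models never compete for VRAM.
    for spec in specs {
        let started = Instant::now();
        match load(spec) {
            Ok(()) => {
                // Per-entry timing on success.
                println!("info: loaded {} in {:?}", spec.model_id, started.elapsed());
                report.loaded.push(spec.model_id.clone());
            }
            Err(e) => {
                // A failure is only a warning; the next entry is still attempted.
                eprintln!("warn: failed to load {}: {e}", spec.model_id);
                report.failed.push((spec.model_id.clone(), e));
            }
        }
    }
    report
}
```

Passing the loader as a closure makes it easy to fail fast in tests, for example by rejecting an unknown harness name without touching the network.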

Called from main.rs after the registry is built and before the axum
listener binds, so /models reflects the loaded state from the very
first request.

data/neuron.service gains TimeoutStartSec=1800s. With activation
blocked on potentially slow first-time HF downloads + GGUF
materialisation, systemd's default 90s would kill larger model loads
mid-flight.

Two non-gated tests in tests/activation.rs cover the
continues-past-failure and empty-list paths using a synthetically
unknown harness name to fail loads fast without touching the network.
The cuda-integration test from earlier stages still exercises the
real load/unload lifecycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 17:56:08 +03:00

2 Commits