helexa

helexa/helexa

rob thijssen 4b28a64b34

CI / Format (push) Successful in 39s
CI / CUDA type-check (push) Successful in 1m38s
CI / Clippy (push) Successful in 2m19s
CI / Test (push) Successful in 4m17s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Resolve version stamps + change detection (push) Successful in 31s
build-prerelease / Lint (fmt + clippy) (push) Successful in 2m14s
build-prerelease / Build neuron-blackwell (push) Successful in 1m42s
build-prerelease / Build neuron-ada (push) Successful in 2m15s
build-prerelease / Build neuron-ampere (push) Successful in 2m17s
build-prerelease / Build helexa-bench binary (push) Successful in 2m23s
build-prerelease / Build cortex binary (push) Successful in 2m29s
build-prerelease / Test (push) Successful in 4m28s
build-prerelease / Package cortex RPM (push) Successful in 1m15s
build-prerelease / Package helexa-bench RPM (push) Successful in 1m17s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 1m41s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 1m40s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 1m45s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 51s

feat(#67 phase 5b): enforce the derived input as the prompt cap

The request path now rejects prompts above the model's self-derived input
budget, not the static NEURON_MAX_PROMPT_TOKENS — so a VRAM-tight host
(where the VRAM ceiling binds below the static cap) rejects an
over-budget prompt up front instead of accepting it and OOMing
mid-prefill.

- derived_input_cap: AtomicUsize on LoadedModel + TpLoadedModel; refreshed
  by LoadedHandle::derived_limit (runs on every /models poll). 0 = not
  derived yet.
- effective_prompt_cap(): cached derived input when >0, else the static
  max_prompt_tokens() (cold-start / no-profile fallback).
- validate_request takes the cap as a param; all 4 call sites
  (chat_completion, inference_stream, inference_tp_stream, TP
  chat_completion) pass the in-scope model's effective_prompt_cap().
- doc/context-limits.md: enforcement note updated from "remaining" to
  landed.
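
A minimal sketch of the cap selection described in the list above, for readers who want the shape without opening the diff. It is not the repository's code: the struct here is a stripped-down stand-in for `LoadedModel`, and `store_derived_cap`, the `static_max_prompt_tokens` field, the `Relaxed` ordering, and the `String` error type are assumptions made for illustration. Only the names `derived_input_cap`, `effective_prompt_cap`, and `validate_request`, and the "0 = not derived yet" convention, come from the commit message.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for LoadedModel; only the cap fields are modelled.
struct LoadedModel {
    /// Static fallback (plays the role of NEURON_MAX_PROMPT_TOKENS).
    static_max_prompt_tokens: usize,
    /// Cached derived input budget; 0 means "not derived yet".
    derived_input_cap: AtomicUsize,
}

impl LoadedModel {
    fn new(static_max_prompt_tokens: usize) -> Self {
        Self {
            static_max_prompt_tokens,
            derived_input_cap: AtomicUsize::new(0),
        }
    }

    /// Assumed helper: what the poll-driven derivation would call to refresh the cache.
    fn store_derived_cap(&self, cap: usize) {
        self.derived_input_cap.store(cap, Ordering::Relaxed);
    }

    /// Cached derived cap when it is > 0, else the static cap (cold-start fallback).
    fn effective_prompt_cap(&self) -> usize {
        match self.derived_input_cap.load(Ordering::Relaxed) {
            0 => self.static_max_prompt_tokens,
            derived => derived,
        }
    }
}

/// Takes the cap as a parameter, so the caller decides which model's cap applies.
fn validate_request(prompt_tokens: usize, cap: usize) -> Result<(), String> {
    if prompt_tokens > cap {
        Err(format!("prompt has {prompt_tokens} tokens, cap is {cap}"))
    } else {
        Ok(())
    }
}
```

In this sketch a 6000-token prompt passes against a static cap of 8192 before any derivation has run, and is rejected once a derived cap of 4096 has been stored, which mirrors the VRAM-tight case the commit targets.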

Reads the cap lock-free from the sync validate path (no per-request VRAM
query); the cap tracks live state via the poll-driven derivation. With
this, advertise and enforce agree and both track the resident model.

fmt/clippy/test green; CUDA paths type-checked in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-17 14:26:37 +03:00

cortex-cli
feat(neuron): OpenAI-compatible non-streaming chat completion
2026-05-18 16:47:58 +03:00

cortex-core
feat(#67 phase 4): advertise neuron-computed limit on /models; drop catalogue override
2026-06-17 14:10:20 +03:00

cortex-gateway
feat(#67 phase 4): advertise neuron-computed limit on /models; drop catalogue override
2026-06-17 14:10:20 +03:00

helexa-acp
chore: rename repo cortex -> helexa
2026-06-12 10:54:01 +03:00

helexa-bench
feat(cortex): per-model limit + cost on /v1/models; remove max_model_len
2026-06-17 09:26:55 +03:00

neuron
feat(#67 phase 5b): enforce the derived input as the prompt cap
2026-06-17 14:26:37 +03:00