cortex

helexa/cortex

rob thijssen 34f9b77d9d

build-prerelease / Resolve version stamps (push) Successful in 37s
CI / Format (push) Successful in 41s
CI / Clippy (push) Successful in 2m20s
CI / Test (push) Successful in 4m40s
build-prerelease / Build cortex binary (push) Successful in 4m20s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 3m58s
build-prerelease / Build neuron-ampere (push) Successful in 5m14s
build-prerelease / Package cortex RPM (push) Successful in 9m25s
build-prerelease / Build neuron-ada (push) Successful in 5m12s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m56s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m55s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s

feat(stage-8e-2d): route quantized matmul by M (prefill vs decode)

MaybeQuantLinear::forward picks between two QMatMul paths:

- M > 8 (prefill): QMatMul::forward_via_f16 dequantises the weight
  once into f16 and runs a real cuBLAS-backed GEMM. The dequant cost
  is fixed per call, so it's amortised across the M tokens.
- M <= 8 (decode): QMatMul::forward uses candle's GGUF GEMV kernel
  on the quantized blocks directly. That kernel requires f32 inputs,
  so this arm still casts in and out at the boundary.
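
A minimal Rust sketch of the routing decision described above. It models only the branch, not the kernels: `MatmulPath`, `choose_path`, `rows_of` and `DECODE_MAX_M` are hypothetical names, and treating M as the product of every activation dimension except the last (hidden) one is an assumption about how the row count is derived.

```rust
/// Row-count threshold from the commit message: M > 8 is treated as prefill.
const DECODE_MAX_M: usize = 8;

/// Which QMatMul path a quantized linear layer would take (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatmulPath {
    /// Prefill: dequantise the weight to f16 once, then run a dense GEMM.
    DequantF16Gemm,
    /// Decode: GGUF GEMV on the quantized blocks; activations cast to f32 and back.
    QuantGemvF32,
}

/// Number of matmul rows M for an activation of shape [..., hidden]:
/// the product of every dimension except the last (assumed layout).
/// A 1-D input counts as a single row; an empty shape yields 0.
fn rows_of(shape: &[usize]) -> usize {
    shape
        .split_last()
        .map(|(_hidden, leading)| leading.iter().product::<usize>())
        .unwrap_or(0)
}

/// Route by M: many rows amortise the one-off dequant, few rows favour GEMV.
fn choose_path(m: usize) -> MatmulPath {
    if m > DECODE_MAX_M {
        MatmulPath::DequantF16Gemm
    } else {
        MatmulPath::QuantGemvF32
    }
}
```

For a single decode token of shape [1, 1, 4096], `rows_of` gives 1 and the GEMV arm is chosen. A 512-token prompt of shape [1, 512, 4096] gives 512 and takes the f16 GEMM arm.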

Stage 8e-2c sent everything through the GGUF GEMV kernel, which is
excellent at GEMV (decode) but has no real batched GEMM path, so
prefill regressed ~4x. This change restores prefill to roughly the
bf16 cuBLAS GEMM throughput while keeping the decode gain.
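
The amortisation argument can be made concrete with a toy per-token cost model. The unit costs used below are hypothetical and purely illustrative, not measurements from this repository, and the commit does not say how the threshold of 8 was picked. The model also simplifies the GEMV arm to a fixed per-row cost, standing in for the missing batched GEMM path.

```rust
/// Per-token cost of the dequantise-then-GEMM arm: a fixed dequantisation
/// cost paid once per call, spread over `m` rows, plus a per-row GEMM cost.
fn per_token_cost_f16(m: usize, dequant_cost: f64, gemm_row_cost: f64) -> f64 {
    assert!(m > 0, "m must be at least 1");
    dequant_cost / m as f64 + gemm_row_cost
}

/// Per-token cost of the quantized GEMV arm: no up-front dequantisation,
/// but every row pays the full GEMV cost (no batched GEMM path).
fn per_token_cost_gemv(gemv_row_cost: f64) -> f64 {
    gemv_row_cost
}

/// M at which both arms cost the same per token; above it the f16 GEMM arm
/// is cheaper. Returns None if the GEMM row cost is not lower than GEMV's,
/// in which case the f16 arm never wins under this model.
fn break_even_m(dequant_cost: f64, gemm_row_cost: f64, gemv_row_cost: f64) -> Option<f64> {
    if gemv_row_cost > gemm_row_cost {
        Some(dequant_cost / (gemv_row_cost - gemm_row_cost))
    } else {
        None
    }
}
```

With a hypothetical dequantisation cost of 40 units, a GEMM row cost of 1 and a GEMV row cost of 6, the break-even point is M = 8. At M = 1 the f16 arm costs 41 units per token versus 6 for GEMV, while at M = 512 it drops to about 1.08, which is the shape of the trade-off the routing exploits.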

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-21 21:15:32 +03:00

cortex-cli

feat(neuron): OpenAI-compatible non-streaming chat completion

2026-05-18 16:47:58 +03:00

cortex-core

feat(cortex): unified /v1/models — catalogue × topology feasibility + cold-load

2026-05-20 07:39:04 +03:00

cortex-gateway

feat(cortex): unified /v1/models — catalogue × topology feasibility + cold-load

2026-05-20 07:39:04 +03:00

neuron

feat(stage-8e-2d): route quantized matmul by M (prefill vs decode)

2026-05-21 21:15:32 +03:00