helexa/cortex

rob thijssen 05dc0bad18

CI / Clippy (push): Waiting to run
CI / Test (push): Waiting to run
build-prerelease / Resolve version stamps (push): Successful in 37s
CI / Format (push): Successful in 38s
build-prerelease / Build cortex binary (push): Has started running
build-prerelease / Build neuron-blackwell (push): Has been cancelled
build-prerelease / Build neuron-ampere (push): Has been cancelled
build-prerelease / Build neuron-ada (push): Has been cancelled
build-prerelease / Package cortex RPM (push): Has been cancelled
build-prerelease / Package helexa-neuron-ada RPM (push): Has been cancelled
build-prerelease / Package helexa-neuron-ampere RPM (push): Has been cancelled
build-prerelease / Package helexa-neuron-blackwell RPM (push): Has been cancelled
build-prerelease / Publish to rpm.lair.cafe (unstable) (push): Has been cancelled
CI / Build cortex SRPM (push): Has been cancelled
CI / Build neuron SRPM (push): Has been cancelled
CI / Publish cortex to COPR (push): Has been cancelled
CI / Publish neuron to COPR (push): Has been cancelled
CI / Bump version in source (push): Has been cancelled

feat(stage-8d-3): wire causal_conv1d_update/full CUDA kernels

Replaces the per-layer conv1d + silu sequence in both single-GPU and
TP linear-attention forward paths with a shared run_causal_conv1d
helper that dispatches to:

- causal_conv1d_update for decode (seq_len=1 with existing conv_state)
- causal_conv1d_full for prefill / fresh start (zero-pads internally)

Both kernels fuse the depthwise conv + SiLU into a single launch, for 4×
fewer CUDA launches per linear-attention layer than the candle conv1d +
candle_nn::ops::silu combo. Falls back to the original Rust path on
CPU.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
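
The decode/prefill dispatch described in the commit message can be illustrated with a small CPU reference. This is only a sketch under stated assumptions, not the repository's code and not the CUDA kernels' interface: the names `ConvState`, `causal_conv1d_full_ref`, `causal_conv1d_update_ref` and `run_causal_conv1d_ref` are hypothetical, as are the `[channels][time]` layout, the absence of a bias term, and the convention that the last weight tap multiplies the current input.

```rust
/// SiLU activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Hypothetical state: the last `k - 1` inputs per channel, oldest first.
struct ConvState {
    hist: Vec<Vec<f32>>, // [channels][k - 1]
}

/// Prefill path: causal depthwise conv + SiLU over a whole sequence.
/// `x` is [channels][time], `w` is [channels][k] with k >= 1 (assumed).
fn causal_conv1d_full_ref(x: &[Vec<f32>], w: &[Vec<f32>]) -> (Vec<Vec<f32>>, ConvState) {
    let mut out = Vec::with_capacity(x.len());
    let mut hist = Vec::with_capacity(x.len());
    for (xc, wc) in x.iter().zip(w) {
        let k = wc.len();
        // Left-pad with k - 1 zeros so output t only sees inputs <= t.
        let mut padded = vec![0.0f32; k - 1];
        padded.extend_from_slice(xc);
        let yc: Vec<f32> = (0..xc.len())
            .map(|t| silu((0..k).map(|j| wc[j] * padded[t + j]).sum::<f32>()))
            .collect();
        out.push(yc);
        // Keep the trailing k - 1 inputs for later decode steps.
        hist.push(padded[padded.len() - (k - 1)..].to_vec());
    }
    (out, ConvState { hist })
}

/// Decode path: one new input per channel; reads and advances the state.
fn causal_conv1d_update_ref(x: &[f32], w: &[Vec<f32>], state: &mut ConvState) -> Vec<f32> {
    let mut out = Vec::with_capacity(x.len());
    for ((&xc, wc), h) in x.iter().zip(w).zip(state.hist.iter_mut()) {
        h.push(xc); // window now holds the last k inputs
        let acc: f32 = wc.iter().zip(h.iter()).map(|(a, b)| a * b).sum();
        out.push(silu(acc));
        h.remove(0); // drop the oldest input, back to k - 1 entries
    }
    out
}

/// Dispatch: seq_len == 1 with an existing state takes the update path;
/// anything else takes the full path and (re)creates the state.
fn run_causal_conv1d_ref(
    x: &[Vec<f32>],
    w: &[Vec<f32>],
    state: &mut Option<ConvState>,
) -> Vec<Vec<f32>> {
    let seq_len = x.first().map_or(0, |c| c.len());
    if seq_len == 1 {
        if let Some(st) = state.as_mut() {
            let step: Vec<f32> = x.iter().map(|c| c[0]).collect();
            return causal_conv1d_update_ref(&step, w, st)
                .into_iter()
                .map(|y| vec![y])
                .collect();
        }
    }
    let (y, st) = causal_conv1d_full_ref(x, w);
    *state = Some(st);
    y
}
```

Under these assumptions, running the full path on a prefix and then the update path on the next token should match the full path on the whole sequence, which is a convenient property to check when comparing a fused kernel against a reference implementation.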

2026-05-21 11:49:41 +03:00

src

feat(stage-8d-3): wire causal_conv1d_update/full CUDA kernels

2026-05-21 11:49:41 +03:00

tests

Stage 7a-ii: real NCCL handshake behind the worker pool

2026-05-19 16:40:01 +03:00

build.rs

feat(stage-8d-1): import mistralrs GDN CUDA kernels — build infra only

2026-05-21 11:34:11 +03:00

Cargo.toml

feat(stage-8d-1): import mistralrs GDN CUDA kernels — build infra only

2026-05-21 11:34:11 +03:00