helexa/cortex

rob thijssen f72dee094f

build-prerelease / Package cortex RPM (push) Blocked by required conditions

build-prerelease / Resolve version stamps (push) Successful in 35s

CI / Format (push) Successful in 37s

CI / Clippy (push) Successful in 2m12s

CI / Test (push) Successful in 5m3s

build-prerelease / Build neuron-blackwell (push) Successful in 3m39s

CI / Build cortex SRPM (push) Has been skipped

CI / Publish cortex to COPR (push) Has been skipped

CI / Build neuron SRPM (push) Has been skipped

CI / Publish neuron to COPR (push) Has been skipped

CI / Bump version in source (push) Has been skipped

build-prerelease / Build cortex binary (push) Successful in 5m7s

build-prerelease / Build neuron-ada (push) Has been cancelled

build-prerelease / Package helexa-neuron-ada RPM (push) Has been cancelled

build-prerelease / Package helexa-neuron-ampere RPM (push) Has been cancelled

build-prerelease / Package helexa-neuron-blackwell RPM (push) Has been cancelled

build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Has been cancelled

build-prerelease / Build neuron-ampere (push) Has been cancelled

feat(tp): Stage 7c-i — streaming SSE through TP

`chat_completion_stream` no longer returns an error for TP loads. The
new `chat_completion_tp_stream` mirrors the non-streaming TP path
(clear_kv_cache, prefill, sample, decode loop) but emits one
`ChatCompletionChunk` per generated token over an mpsc channel so the
handler can write a streaming SSE response.
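
The per-token emission described above can be sketched as follows. This is an illustration only, not the code in this commit: it swaps in `std::sync::mpsc` and a plain thread for the tokio channel and task so that it builds without dependencies, and `Chunk` and `stream_tokens` are hypothetical stand-ins for `ChatCompletionChunk` and the real decode loop.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for `ChatCompletionChunk`: one piece of text per token.
#[derive(Debug, PartialEq)]
pub struct Chunk {
    pub index: usize,
    pub delta: String,
}

/// Hypothetical producer: sends one `Chunk` per generated token.
/// The real path would call the model once per step; here the
/// "generated" pieces are passed in up front to keep the sketch small.
pub fn stream_tokens(pieces: Vec<String>) -> mpsc::Receiver<Chunk> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for (index, delta) in pieces.into_iter().enumerate() {
            // A send error means the receiver was dropped (for example the
            // client disconnected), so generation stops early.
            if tx.send(Chunk { index, delta }).is_err() {
                break;
            }
        }
        // Dropping `tx` here closes the channel and ends the stream.
    });
    rx
}
```

A handler would drain the receiver and write each chunk to the response as it arrives; iteration ends once the producer drops its sender.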

Unlike the single-GPU streaming path (which runs candle's forward
inside `spawn_blocking` and uses `blocking_send`), the TP loop is
itself async: every `pool.generate_step` already awaits the leader's
own `spawn_blocking` forward plus every worker's `recv_only`. The
orchestration therefore runs as a plain `tokio::spawn` task using
`Sender::send`.

The shared `emit_chunk` helper tracks the cumulative decoded prefix and
emits only the new delta, with the same UTF-8-safe BPE boundary
handling as the single-GPU streaming path.
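
A minimal sketch of the cumulative-prefix idea, under stated assumptions: `DeltaEmitter` and its `push` method are hypothetical names, the `decode` closure stands in for the tokenizer, and holding back on a trailing U+FFFD is one common way to avoid splitting a multi-byte character. The actual `emit_chunk` helper may differ.

```rust
/// Hypothetical helper: remembers every token so far and how many bytes
/// of the decoded text have already been sent to the client.
pub struct DeltaEmitter {
    tokens: Vec<u32>,
    emitted_len: usize,
}

impl DeltaEmitter {
    pub fn new() -> Self {
        Self { tokens: Vec::new(), emitted_len: 0 }
    }

    /// Adds one token, decodes the whole sequence, and returns only the
    /// text that has not been emitted yet. Returns `None` while the tail
    /// is an incomplete character or when nothing new is ready.
    pub fn push<F>(&mut self, token: u32, decode: F) -> Option<String>
    where
        F: Fn(&[u32]) -> String,
    {
        self.tokens.push(token);
        let text = decode(&self.tokens);
        // Assumption: a partial multi-byte sequence decodes to U+FFFD,
        // so wait for the next token before emitting anything.
        if text.ends_with('\u{FFFD}') {
            return None;
        }
        // `get` returns None if the offset is out of range or not on a
        // character boundary, which keeps the slice UTF-8 safe.
        let delta = text.get(self.emitted_len..)?;
        if delta.is_empty() {
            return None;
        }
        let delta = delta.to_owned();
        self.emitted_len = text.len();
        Some(delta)
    }
}
```

With a toy decoder that treats each token id as one byte, pushing `0x68` yields `"h"`, pushing `0xC3` yields nothing because the character is incomplete, and pushing `0xA9` then yields `"é"`.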

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 07:32:46 +03:00

src

feat(tp): Stage 7c-i — streaming SSE through TP

2026-05-20 07:32:46 +03:00

tests

Stage 7a-ii: real NCCL handshake behind the worker pool

2026-05-19 16:40:01 +03:00

Cargo.toml

fix(tp): add half dep + drop double-wrapped .w() on CudaDevice::alloc

2026-05-19 19:11:59 +03:00