helexa/cortex

rob thijssen ed2d09864e

CI / Format (push) Successful in 30s

CI / Clippy (push) Successful in 2m51s

CI / Test (push) Successful in 5m52s

CI / CUDA type-check (push) Failing after 50s

CI / Build cortex SRPM (push) Has been skipped

CI / Publish cortex to COPR (push) Has been skipped

CI / Build neuron SRPM (push) Has been skipped

CI / Publish neuron to COPR (push) Has been skipped

CI / Bump version in source (push) Has been skipped

feat(neuron): TP-vision Stage 3 — wire TP chat + stream vision prefill

End-to-end TP-vision: an image request to a TP-loaded Qwen3.6-27B now
conditions on the image across both ranks.

- TpLoadedModel carries has_vision / image_token_id / lm_tokens_per_image,
  populated at load via the shared VisionMeta::from_config_path (same
  config.json the shards loaded from; Stage 1 materialises the replicated
  tower on every rank).
- LoadedHandle::capabilities() now advertises "vision" for TP loads with
  a tower (cortex-gateway already unions this into /v1/models via C3).
- The TP rejection guards (chat_completion_tp + inference_tp_stream) are
  now conditional on !has_vision — text-only TP models still 400 cleanly,
  vision-capable ones fall through.
- chat_completion_tp_inner and the streaming orchestration task detect
  images (request_has_images), expand <|image_pad|> to the per-image
  patch count, and run a single-shot generate_step_with_images prefill
  (every rank encodes + splices its replicated tower) before the
  unchanged decode loop. Text requests keep chunked_prefill_tp.
- extract_image_data_uris ships the source data URIs to every rank for
  identical per-rank preprocessing.
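
The image detection, guard, and data-URI steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `ContentPart` and `Message` types, the function signatures, and the `tp_image_guard` helper are assumptions made up for the example, and the guard logic (reject only when the request carries images and the TP model has no tower) is inferred from the description.

```rust
/// Hypothetical, simplified request types; the real neuron types are not
/// shown in this commit.
enum ContentPart {
    Text(String),
    ImageUrl(String),
}

struct Message {
    parts: Vec<ContentPart>,
}

/// Assumed shape of `request_has_images`: true if any message part is an image.
fn request_has_images(messages: &[Message]) -> bool {
    messages
        .iter()
        .flat_map(|m| m.parts.iter())
        .any(|p| matches!(p, ContentPart::ImageUrl(_)))
}

/// Assumed shape of `extract_image_data_uris`: collect the `data:` URIs in
/// request order so every rank can preprocess identical inputs.
fn extract_image_data_uris(messages: &[Message]) -> Vec<String> {
    messages
        .iter()
        .flat_map(|m| m.parts.iter())
        .filter_map(|p| match p {
            ContentPart::ImageUrl(url) if url.starts_with("data:") => Some(url.clone()),
            _ => None,
        })
        .collect()
}

/// Hypothetical guard: an image request is rejected with HTTP 400 only when
/// the TP-loaded model has no vision tower.
fn tp_image_guard(has_vision: bool, has_images: bool) -> Result<(), u16> {
    if has_images && !has_vision {
        Err(400)
    } else {
        Ok(())
    }
}
```

Under these assumptions, a text-only TP model still rejects image requests, while a vision-capable one passes the guard and continues to the image prefill path.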

prompt_tokens now reflects the patch expansion, so usage accounting and
KV offsets match the single-GPU baseline.
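
To illustrate why prompt_tokens changes, here is a small sketch of the placeholder expansion. It assumes each image appears as a single image_token_id in the tokenized prompt and that every image expands to the same lm_tokens_per_image count; the helper name `expand_image_pads` and the token ids used with it are hypothetical.

```rust
/// Hypothetical helper (not the repository's code): replace each image
/// placeholder token with `lm_tokens_per_image` copies of itself, leaving
/// all other tokens untouched.
fn expand_image_pads(
    tokens: &[u32],
    image_token_id: u32,
    lm_tokens_per_image: usize,
) -> Vec<u32> {
    let mut expanded = Vec::with_capacity(tokens.len());
    for &tok in tokens {
        if tok == image_token_id {
            // One placeholder becomes one slot per image patch token.
            expanded.extend(std::iter::repeat(image_token_id).take(lm_tokens_per_image));
        } else {
            expanded.push(tok);
        }
    }
    expanded
}
```

With this sketch, prompt_tokens would be the length of the expanded sequence, which is also the number of KV positions the prefill occupies before decoding starts.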

TP entry points are cuda-gated (validated by CI's CUDA type-check);
capabilities() + extract_image_data_uris + VisionMeta reuse compile on
the non-cuda build. Full workspace test green.
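
The split between gated and ungated code might look like the sketch below. The feature name `cuda`, the `TpHandle` type, and the method bodies are assumptions for illustration; only the idea that capabilities() builds everywhere while TP entry points need CUDA comes from the commit message.

```rust
/// Hypothetical stand-in for a TP-loaded handle carrying the vision flag.
struct TpHandle {
    has_vision: bool,
}

impl TpHandle {
    /// Not gated: compiles on both cuda and non-cuda builds.
    fn capabilities(&self) -> Vec<&'static str> {
        let mut caps = Vec::new();
        if self.has_vision {
            caps.push("vision");
        }
        caps
    }

    /// Gated: only compiled when the (assumed) `cuda` feature is enabled,
    /// so the non-cuda build never type-checks this body.
    #[cfg(feature = "cuda")]
    fn chat_completion_tp(&self) {
        // GPU-backed TP prefill and decode would live here.
    }
}
```

A consequence of this layout is that errors inside gated functions only surface in a CUDA-enabled type-check, which is why a separate CI job covers them.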

Refs TP-vision plan Stage 3. Implements #12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-04 15:14:44 +03:00

Path | Last commit | Date
cortex-cli | feat(neuron): OpenAI-compatible non-streaming chat completion | 2026-05-18 16:47:58 +03:00
cortex-core | feat(cortex-gateway): C3 — propagate vision capabilities through /v1/models | 2026-06-04 13:49:54 +03:00
cortex-gateway | feat(cortex-gateway): C3 — propagate vision capabilities through /v1/models | 2026-06-04 13:49:54 +03:00
helexa-acp | feat(neuron): strip reasoning from chat completions by default | 2026-05-31 17:55:04 +03:00
neuron | feat(neuron): TP-vision Stage 3 — wire TP chat + stream vision prefill | 2026-06-04 15:14:44 +03:00