helexa/helexa

rob thijssen 8f9e956d17

fix(neuron): emit OpenAI-standard nested error envelopes (#60)

InferenceError responses carried the error as a flat string:
`{"error": "..."}`. OpenAI clients (opencode, the openai SDK) reach into
`error.type`/`error.code` to drive behaviour. Most importantly,
`code == "context_length_exceeded"` triggers auto-compaction + retry
instead of a hard failure. A flat string is invisible to that logic.

Rewrite `inference_error_response` to emit the nested envelope
`{"error": {"message","type","code","param", ...diagnostics}}` and map:

- ModelNotLoaded       → 404 invalid_request_error / model_not_found
- PromptTooLong        → 400 invalid_request_error / context_length_exceeded
  (message: "maximum context length is N tokens", + prompt_len/max)
- InsufficientVram     → 503 api_error / insufficient_vram
- VisionUnsupported    → 400 invalid_request_error / vision_unsupported
- TemplateRenderFailed → 422 invalid_request_error / template_render_failed
- Other                → 500 api_error / null code
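
The mapping above can be sketched as a single match. This is only an
illustration, not the code in the commit: the variant payloads and the
`classify` helper are assumptions, and only the variant names, status
codes, types and codes come from the list above.

```rust
/// Hypothetical mirror of the error variants named in the list above.
/// The payload shapes (String, prompt_len, max) are assumptions.
#[derive(Debug)]
pub enum InferenceError {
    ModelNotLoaded(String),
    PromptTooLong { prompt_len: usize, max: usize },
    InsufficientVram,
    VisionUnsupported,
    TemplateRenderFailed(String),
    Other(String),
}

/// Returns (HTTP status, OpenAI `type`, OpenAI `code`) for a variant.
/// `classify` is a made-up name used for this sketch only.
pub fn classify(err: &InferenceError) -> (u16, &'static str, Option<&'static str>) {
    use InferenceError::*;
    match err {
        ModelNotLoaded(_) => (404, "invalid_request_error", Some("model_not_found")),
        PromptTooLong { .. } => {
            (400, "invalid_request_error", Some("context_length_exceeded"))
        }
        InsufficientVram => (503, "api_error", Some("insufficient_vram")),
        VisionUnsupported => (400, "invalid_request_error", Some("vision_unsupported")),
        TemplateRenderFailed(_) => {
            (422, "invalid_request_error", Some("template_render_failed"))
        }
        // No specific code: serialised as JSON null in the envelope.
        Other(_) => (500, "api_error", None),
    }
}
```

Keeping the triple in one place means the status line and the envelope
body cannot drift apart between handlers.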

Diagnostic extras ride inside the error object so the envelope shape is
stable. Both inline match blocks in the chat-completions handler
(streaming + non-streaming) now defer to the shared helper, which the
responses handler already used, so there is one source of truth.
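
A std-only sketch of the envelope shape, with diagnostics placed inside
the `error` object. `render_envelope` and `escape` are hypothetical
names, `param` is assumed to be null, and the extras are assumed to be
numeric; the real helper may well build the body with a JSON library
instead.

```rust
/// Minimal JSON string escaping, enough for this sketch.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            // Remaining control characters as \u00XX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders `{"error": {"message","type","code","param", ...extras}}`.
/// `extras` are appended inside the error object, after `param`.
pub fn render_envelope(
    message: &str,
    kind: &str,
    code: Option<&str>,
    extras: &[(&str, u64)],
) -> String {
    let code_json = match code {
        Some(c) => format!("\"{}\"", escape(c)),
        None => "null".to_string(),
    };
    let mut body = format!(
        "{{\"error\":{{\"message\":\"{}\",\"type\":\"{}\",\"code\":{},\"param\":null",
        escape(message),
        escape(kind),
        code_json
    );
    for (key, value) in extras {
        body.push_str(&format!(",\"{}\":{}", escape(key), value));
    }
    body.push_str("}}");
    body
}
```

For a PromptTooLong error, a caller might pass
`&[("prompt_len", 9000), ("max", 8192)]` as extras; the exact key names
used in the commit are not shown here, so treat these as placeholders.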

Adds 4 unit tests covering the envelope shape and codes. Also fixes a
pre-existing clippy lint (cloned_ref_to_slice_refs) in the qwen3_5
snapshot test, surfaced by a newer clippy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-15 20:42:14 +03:00

cortex-cli

feat(neuron): OpenAI-compatible non-streaming chat completion

2026-05-18 16:47:58 +03:00

cortex-core

feat: advertise max_model_len on /v1/models so clients can compact

2026-06-15 19:11:13 +03:00

cortex-gateway

feat(neuron): emit usage on the streaming path so clients can track context

2026-06-15 19:43:59 +03:00

helexa-acp

chore: rename repo cortex -> helexa

2026-06-12 10:54:01 +03:00

helexa-bench

feat(bench): show GPUs as the resource name instead of hostnames

2026-06-14 16:29:13 +03:00

neuron

fix(neuron): emit OpenAI-standard nested error envelopes (#60)

2026-06-15 20:42:14 +03:00