chore(neuron): bump default max_tokens from 512 to 8192
All checks were successful
build-prerelease / Resolve version stamps (push) Successful in 44s
CI / Format (push) Successful in 45s
CI / Clippy (push) Successful in 2m41s
build-prerelease / Build neuron-blackwell (push) Successful in 5m35s
build-prerelease / Build cortex binary (push) Successful in 4m32s
CI / Test (push) Successful in 5m29s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Package cortex RPM (push) Successful in 1m20s
build-prerelease / Build neuron-ampere (push) Successful in 8m6s
build-prerelease / Build neuron-ada (push) Successful in 5m19s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m55s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m57s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s
512 is too low for any modern coding model: clients that don't explicitly set max_tokens get clipped responses with no diagnostic.

Bump the fallback at all four inference call sites (single-GPU streaming + non-streaming, TP leader + non-leader) to 8192, which fits comfortably within Qwen3-class context windows after a typical agent prompt and lines up with what helexa-acp / a0 / curl clients reasonably expect.

Clients that explicitly set max_tokens (now including helexa-acp via HELEXA_ACP_MAX_TOKENS / per-endpoint TOML) override this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1381,7 +1381,7 @@ impl CandleHarness {
 
 let temperature = request.temperature.unwrap_or(0.7);
 let top_p = request.top_p;
-let max_new = request.max_tokens.unwrap_or(512) as usize;
+let max_new = request.max_tokens.unwrap_or(8192) as usize;
 let seed = unix_subsec_nanos();
 
 let eos_id = loaded
@@ -1620,7 +1620,7 @@ impl CandleHarness {
 
 let temperature = request.temperature.unwrap_or(0.7);
 let top_p = request.top_p;
-let max_new = request.max_tokens.unwrap_or(512) as usize;
+let max_new = request.max_tokens.unwrap_or(8192) as usize;
 let seed = unix_subsec_nanos();
 
 let eos_id = loaded
@@ -2264,7 +2264,7 @@ impl CandleHarness {
 
 let temperature = request.temperature.unwrap_or(0.7);
 let top_p = request.top_p;
-let max_new = request.max_tokens.unwrap_or(512) as usize;
+let max_new = request.max_tokens.unwrap_or(8192) as usize;
 let seed = unix_subsec_nanos();
 
 let eos_id = tp
@@ -2598,7 +2598,7 @@ async fn chat_completion_tp_inner(
 
 let temperature = request.temperature.unwrap_or(0.7);
 let top_p = request.top_p;
-let max_new = request.max_tokens.unwrap_or(512) as usize;
+let max_new = request.max_tokens.unwrap_or(8192) as usize;
 let seed = unix_subsec_nanos();
 
 let eos_id = tp
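
The fallback described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `ChatRequest`, `DEFAULT_MAX_TOKENS`, and `resolve_max_new` are hypothetical names, and the `Option<u32>` field type is an assumption, since the real request struct is not shown in this diff. The commit itself inlines the literal 8192 at each of the four call sites.

```rust
// Hypothetical stand-in for the request type; only the field relevant
// to this change is modeled, and its integer type is an assumption.
struct ChatRequest {
    max_tokens: Option<u32>,
}

// Hypothetical named constant. The commit keeps the literal 8192 inline
// at each call site rather than sharing a constant like this one.
const DEFAULT_MAX_TOKENS: u32 = 8192;

// Mirrors the changed line: use the client's value when present,
// otherwise fall back to the default.
fn resolve_max_new(request: &ChatRequest) -> usize {
    request.max_tokens.unwrap_or(DEFAULT_MAX_TOKENS) as usize
}
```

A request that leaves `max_tokens` unset resolves to 8192, while one that sets it (for example to 256) keeps its own value, which matches the override behavior the commit message describes.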