fix: increase max_tokens to 8192 for R1 reasoning overhead
R1 models use 500-2000 tokens for <think> blocks before the final response. 4096 was too tight: the model would exhaust the budget mid-thought and never emit the JSON.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -63,7 +63,7 @@ impl ClaudeClient {
     ) -> Result<(String, Option<Usage>)> {
         let body = MessagesRequest {
             model: self.model.clone(),
-            max_tokens: 4096,
+            max_tokens: 8192,
             system: system.to_string(),
             messages: messages.to_vec(),
         };
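The arithmetic behind the new limit can be sketched as follows. This is an illustrative check, not part of the commit: the constant names and the 4096-token response budget are assumptions chosen to mirror the commit message's figures (500-2000 tokens of <think> overhead on top of the desired response).

```rust
// Hypothetical budget check illustrating why 4096 was too tight.
// REASONING_OVERHEAD_MAX is the upper end of the <think> overhead
// observed for R1 models; RESPONSE_BUDGET is the room we want left
// for the actual JSON response. Both names are illustrative.
const REASONING_OVERHEAD_MAX: u32 = 2000;
const RESPONSE_BUDGET: u32 = 4096;

fn required_max_tokens() -> u32 {
    REASONING_OVERHEAD_MAX + RESPONSE_BUDGET
}

fn main() {
    // Old limit: a worst-case think block leaves only ~2096 tokens,
    // so a long response gets truncated before the JSON completes.
    assert!(4096 - REASONING_OVERHEAD_MAX < RESPONSE_BUDGET);
    // New limit: 8192 covers worst-case reasoning plus the response.
    assert!(8192 >= required_max_tokens());
    println!("required max_tokens >= {}", required_max_tokens());
}
```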