fix: log R1 thinking, catch repeated DSL errors, add unsupported indicators

Three improvements from the 2026-03-09T18:45:04 run analysis:

**R1 thinking visibility (claude.rs, agent.rs)**
extract_think_content() returns the trimmed content of the first <think> block. agent.rs logs it at DEBUG level, so running with 'RUST_LOG=debug' shows why the model keeps repeating a mistake. Previously the think block was silently discarded when it was stripped.

**Prompt: unsupported indicators and bollinger_upper Expr mistake (prompts.rs)**
- bollinger_upper / bollinger_lower used as {"kind":"bollinger_upper",...} was the dominant failure in iters 9-15. Added an explicit correction: use {"kind":"func","name":"bollinger_upper","period":20} in Expr context, never as a standalone kind.
- roc, hma, vwap, macd, cci, stoch are NOT in the swym schema. Added a clear "NOT supported" list alongside the supported func names.

**Repeated API error detection in diagnose_history (agent.rs)**
If the same "unknown variant `X`" error appears 2+ times in the last 4 iterations, a targeted diagnosis note is emitted that names the bad variant and points to the DSL reference. The note surfaces in the next iteration prompt, so the model gets actionable feedback before it spends more backtest budget on the same mistake.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 18:58:50 +02:00
parent 51e452b607
commit 55e41b6795
3 changed files with 62 additions and 0 deletions
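As a rough illustration of the R1 thinking extraction described in the commit message, the sketch below reproduces `extract_think_content` from this commit's claude.rs change and runs it on a few response strings. The sample responses are hypothetical, made up for the example, and are not taken from a real run.

```rust
/// Same logic as `extract_think_content` in this commit, copied here so the
/// sketch compiles on its own.
fn extract_think_content(text: &str) -> Option<String> {
    let start = text.find("<think>")? + "<think>".len();
    let end = text[start..].find("</think>").map(|i| start + i)?;
    Some(text[start..end].trim().to_string())
}

fn main() {
    // Hypothetical R1-style response: reasoning first, then the strategy JSON.
    let response = "<think>\nbollinger_upper is a func name\n</think>\n{\"kind\":\"func\"}";
    assert_eq!(
        extract_think_content(response).as_deref(),
        Some("bollinger_upper is a func name")
    );

    // A response without a think block yields None, so nothing is logged.
    assert_eq!(extract_think_content("{\"kind\":\"func\"}"), None);

    // An unterminated think block (e.g. a truncated response) also yields None.
    assert_eq!(extract_think_content("<think>still reasoning"), None);
}
```

Only the first `<think>` block is returned, and an unterminated block is treated as absent, so a truncated response produces no debug output rather than a partial one.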
--- a/src/agent.rs
+++ b/src/agent.rs
@@ -265,6 +265,12 @@ pub async fn run(cli: &Cli) -> Result<()> {
            content: response_text.clone(),
        });

+        // Log R1 reasoning chain at debug level so it can be inspected when
+        // the model makes repeated DSL mistakes (run with RUST_LOG=debug).
+        if let Some(thinking) = claude::extract_think_content(&response_text) {
+            debug!("R1 thinking ({} chars):\n{}", thinking.len(), thinking);
+        }
+
        // Extract strategy JSON
        let strategy = match claude::extract_json(&response_text) {
            Ok(s) => s,
@@ -665,6 +671,48 @@ pub fn diagnose_history(history: &[IterationRecord]) -> (String, bool) {
        }
    }

+    // --- Repeated API error detection ---
+    // If the same unknown-variant DSL error has appeared 2+ times within the
+    // last 4 iterations, call it out explicitly so the model knows what to fix.
+    {
+        let recent_errors: Vec<String> = history
+            .iter()
+            .rev()
+            .take(4)
+            .flat_map(|rec| rec.results.iter())
+            .filter_map(|r| r.error_message.as_deref())
+            .filter(|e| e.contains("unknown variant"))
+            .map(|e| {
+                // Extract the variant name: "unknown variant `foo`, expected ..."
+                e.split('`')
+                    .nth(1)
+                    .unwrap_or(e)
+                    .to_string()
+            })
+            .collect();
+
+        if recent_errors.len() >= 2 {
+            // Find the most frequent bad variant
+            let mut counts: std::collections::HashMap<&str, usize> = std::collections::HashMap::new();
+            for v in &recent_errors {
+                *counts.entry(v.as_str()).or_default() += 1;
+            }
+            if let Some((bad_variant, count)) = counts.into_iter().max_by_key(|(_, c)| *c) {
+                if count >= 2 {
+                    notes.push(format!(
+                        "⚠ DSL ERROR (repeated {count}×): the swym API rejected \
+                         `{bad_variant}` as an unknown variant. \
+                         Check the 'Critical: expression kinds' section — \
+                         `{bad_variant}` may be a FuncName (use inside \
+                         {{\"kind\":\"func\",\"name\":\"{bad_variant}\",...}}) \
+                         or it may not be supported at all. \
+                         Use ONLY the documented kinds and func names."
+                    ));
+                }
+            }
+        }
+    }
+
    // --- Zero-trade check ---
    let zero_trade_iters = history
        .iter()
--- a/src/claude.rs
+++ b/src/claude.rs
@@ -213,6 +213,14 @@ fn lmstudio_context_length(json: &Value, model_id: &str) -> Option<u32> {
    None
 }

+/// Return the content of the first `<think>` block, if any.
+/// Used for debug logging of R1 reasoning chains.
+pub fn extract_think_content(text: &str) -> Option<String> {
+    let start = text.find("<think>")? + "<think>".len();
+    let end = text[start..].find("</think>").map(|i| start + i)?;
+    Some(text[start..end].trim().to_string())
+}
+
 /// Extract a JSON object from a model response text.
 /// Handles markdown code fences and R1-style `<think>...</think>` blocks.
 pub fn extract_json(text: &str) -> Result<Value> {
--- a/src/prompts.rs
+++ b/src/prompts.rs
@@ -151,6 +151,12 @@ Common mistakes to NEVER make:
  `bollinger_upper`, `bollinger_lower`.
 - `volume` is a candle FIELD, not a func name. Access it as `{{"kind":"field","field":"volume"}}`.
  To compute EMA of volume: `{{"kind":"apply_func","name":"ema","period":20,"expr":{{"kind":"field","field":"volume"}}}}`.
+- `bollinger_upper` and `bollinger_lower` are FUNC NAMES, not Expr kinds. To compare close to the upper band:
+  `{{"kind":"compare","left":{{"kind":"field","field":"close"}},"op":">","right":{{"kind":"func","name":"bollinger_upper","period":20}}}}`
+  NEVER write `{{"kind":"bollinger_upper",...}}` — `bollinger_upper` is not an Expr kind.
+- `roc` (rate of change), `hma` (Hull MA), `vwap`, `macd`, `cci`, `stoch` are NOT supported.
+  Use `sma`, `ema`, `wma`, `rsi`, `atr`, `adx`, `supertrend`, `std_dev`, `sum`, `highest`, `lowest`,
+  `bollinger_upper`, `bollinger_lower` only.

 ## Working examples
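The repeated-error check added to `diagnose_history` can be sketched in isolation as follows. This is a simplified version under stated assumptions: it works on plain error strings instead of `IterationRecord` (whose full definition is not shown in this diff), and the helper names `variant_name` and `most_repeated_variant` are hypothetical, introduced only for this example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: pull `foo` out of "unknown variant `foo`, expected ...".
/// Unlike the committed code, messages without backticks are skipped here
/// instead of falling back to the whole message.
fn variant_name(error: &str) -> Option<&str> {
    if !error.contains("unknown variant") {
        return None;
    }
    error.split('`').nth(1)
}

/// Hypothetical helper: return the most frequent bad variant and its count,
/// but only if it occurred at least twice.
fn most_repeated_variant<'a>(errors: &[&'a str]) -> Option<(&'a str, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in errors.iter().copied().filter_map(variant_name) {
        *counts.entry(name).or_default() += 1;
    }
    counts
        .into_iter()
        .filter(|(_, count)| *count >= 2)
        .max_by_key(|(_, count)| *count)
}

fn main() {
    // Made-up error messages standing in for the last four iterations.
    let errors = [
        "unknown variant `bollinger_upper`, expected one of `field`, `func`",
        "request timed out",
        "unknown variant `roc`, expected one of `sma`, `ema`",
        "unknown variant `bollinger_upper`, expected one of `field`, `func`",
    ];
    match most_repeated_variant(&errors) {
        Some((variant, count)) => println!("repeated {count} times: {variant}"),
        None => println!("no repeated unknown-variant error"),
    }
}
```

Because `HashMap` iteration order is unspecified, a tie between two equally frequent variants is resolved arbitrarily, both in this sketch and in the committed `max_by_key` call. Only one note is emitted per diagnosis either way.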