docs: add CLAUDE.md for future Claude Code instances

Add comprehensive guidance document covering architecture, data flows, development commands, DSL schema reference, and common patterns for working with the scout strategy search agent.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
chore: attempt dedupe guidance in prompt
2026-03-12 05:38:28 +02:00 · 2026-03-11 18:15:24 +02:00 · 2026-03-10 18:40:15 +02:00 · 2026-03-10 18:28:54 +02:00 · 2026-03-10 18:13:06 +02:00 · 2026-03-10 14:21:55 +02:00
8 changed files with 1295 additions and 76 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,116 @@
 # CLAUDE.md
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 ## Project Overview
 `scout` is an autonomous strategy search agent for the [swym](https://swym.rs) backtesting platform. It runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter overfitting.
 ## Architecture
 ### Core Modules
 - **`agent.rs`** - Main orchestration logic. Contains the `run()` function that implements the search loop, strategy validation, and learning feedback. Key types: `IterationRecord`, `LedgerEntry`. Key functions: `validate_strategy()`, `diagnose_history()`.
 - **`claude.rs`** - Claude API client. Handles model communication, context length detection, and JSON extraction from responses, including R1-family responses that carry thinking blocks.
 - **`swym.rs`** - Swym backtesting API client. Wraps all swym API calls: candle coverage, strategy validation, backtest submission, polling, and metrics retrieval.
 - **`prompts.rs`** - System and user prompts for the LLM. Generates the DSL schema context and iteration-specific prompts with prior results.
 - **`config.rs`** - CLI argument parsing and configuration. Defines `Cli` struct with all command-line flags and environment variables.
 ### Key Data Flows
 1. **Strategy Generation**: `agent::run()` → `claude::chat()` → extracts JSON strategy → validates → submits to swym
 2. **Backtest Execution**: `swym::submit_backtest()` → `swym::poll_until_done()` → `BacktestResult::from_response()`
 3. **Learning Loop**: `load_prior_summary()` reads `run_ledger.jsonl` → fetches metrics via `swym::compare_runs()` → formats compact summary → appends to iteration prompt
 4. **OOS Validation**: Promising in-sample results trigger re-backtest on held-out data → strategies passing both phases saved to `validated_*.json`
 ### Important Patterns
 - **Deduplication**: Strategies are deduplicated by full JSON serialization using a HashMap (`tested_strategies`). Identical strategies are skipped with a warning.
 - **Validation**: Two-stage validation—client-side (structure, quantity parsing, exit rules) and server-side (DSL schema validation via `/strategies/validate`).
 - **Context Management**: Conversation history is trimmed to the last 6 messages (3 exchanges) to stay within token limits. Prior results are summarized in the next prompt.
 - **Error Recovery**: Consecutive failures (3×) trigger abort. Transient API errors are logged but don't stop the run.
 - **Ledger Persistence**: Each backtest writes a `LedgerEntry` to `run_ledger.jsonl` for cross-run learning. Uses atomic O_APPEND writes.
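
The Context Management pattern above can be sketched as follows. This is a minimal illustration, not the code in `agent.rs`: the `Message` struct here is a hypothetical stand-in for `claude::Message`, and `trim_history` is an assumed helper name.

```rust
/// Hypothetical stand-in for the crate's `Message` type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Keep only the most recent `max_messages` entries, dropping the oldest first.
/// `trim_history` is an illustrative name, not a function from the repository.
pub fn trim_history(conversation: &mut Vec<Message>, max_messages: usize) {
    if conversation.len() > max_messages {
        let excess = conversation.len() - max_messages;
        // `drain` removes the oldest messages from the front in one pass.
        conversation.drain(..excess);
    }
}
```

Calling `trim_history(&mut conversation, 6)` after each exchange would keep three user/assistant pairs, matching the limit described above.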
 ## Development Commands
 ```bash
 # Build
 cargo build
 # Run with default config
 cargo run
 # Run with custom flags
 cargo run -- \
  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
  --max-iterations 50 \
  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC
 # Run tests
 cargo test
 # Run with debug logging
 RUST_LOG=debug cargo run
 ```
 ## DSL Schema
 Strategies are JSON objects with the schema defined in `src/dsl-schema.json`. The DSL uses a rule-based structure with `when` (entry conditions) and `then` (exit actions). Key concepts:
 - **Indicators**: `{"kind":"indicator","name":"...","params":{...}}`
 - **Comparators**: `{"kind":"compare","lhs":"...","op":"...","rhs":"..."}`
 - **Functions**: `{"kind":"func","name":"...","args":[...]}`
 See `src/dsl-schema.json` for the complete schema and `prompts.rs::system_prompt()` for how it's presented to Claude.
 ## Model Families
 The code supports different model families via the `ModelFamily` enum in `config.rs`:
 - **Sonnet**: Standard model, no special handling
 - **Opus**: Larger context, higher cost
 - **R1**: Has thinking blocks (`<think>...</think>`) that need to be stripped before JSON extraction
 Context length is auto-detected from the server's `/api/v1/models` endpoint (LM Studio) or `/v1/models/{id}` (OpenAI-compatible). Output token budget is set to half the context window.
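
As a rough sketch of the two behaviours described above (stripping `<think>` blocks before JSON extraction, and budgeting half the context window for output), assuming plain string handling. The function names `strip_think_blocks` and `output_budget` are hypothetical and may not match what `claude.rs` actually does.

```rust
/// Remove every `<think>...</think>` block from a model response.
/// Assumption: an unterminated `<think>` means the rest of the text is
/// truncated reasoning, so it is dropped.
pub fn strip_think_blocks(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("<think>") {
        // Keep everything before the opening tag.
        out.push_str(&rest[..start]);
        match rest[start..].find("</think>") {
            // Skip past the closing tag and continue scanning.
            Some(end) => rest = &rest[start + end + "</think>".len()..],
            // No closing tag: discard the remainder.
            None => rest = "",
        }
    }
    out.push_str(rest);
    out.trim().to_string()
}

/// Output token budget: half the detected context window.
pub fn output_budget(context_length: u32) -> u32 {
    context_length / 2
}
```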
 ## Output Files
 - `strategy_001.json` through `strategy_NNN.json` - Every strategy attempted (full JSON)
 - `validated_001.json` through `validated_NNN.json` - Strategies that passed OOS validation (includes in-sample + OOS metrics)
 - `best_strategy.json` - Strategy with highest average Sharpe across instruments
 - `run_ledger.jsonl` - Persistent record of all backtests for learning across runs
 ## Common Tasks
 ### Adding a new CLI flag
 1. Add field to `Cli` struct in `config.rs`
 2. Add clap derive attribute with `#[arg(short, long, env = "VAR_NAME")]`
 3. Use the flag in `agent::run()` via `cli.flag_name`
 ### Extending the DSL
 1. Update `src/dsl-schema.json` with new expression kinds
 2. Add validation logic in `validate_strategy()` if needed
 3. Update prompts in `prompts.rs` to guide the model
 ### Modifying the learning loop
 1. Edit `load_prior_summary()` in `agent.rs` to change how prior results are formatted
 2. Adjust `diagnose_history()` to add new diagnostics or change convergence detection
 3. Update `prompts.rs::iteration_prompt()` to incorporate new information
 ### Adding new validation checks
 Add to `validate_strategy()` in `agent.rs`. Returns `(hard_errors, warnings)` where hard errors block submission and warnings are logged but allow the backtest to proceed.
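
The `(hard_errors, warnings)` contract can be illustrated with a simplified sketch. The real function works on the strategy JSON; the `StrategySketch` struct, the `validate_sketch` name, and the choice to treat a missing exit rule as a warning rather than a hard error are all assumptions made for this example.

```rust
/// Hypothetical, simplified view of a strategy (the real code inspects JSON).
pub struct StrategySketch {
    /// Order quantities as decimal strings, e.g. "0.001".
    pub quantities: Vec<String>,
    /// Whether any rule closes a position.
    pub has_exit_rule: bool,
}

/// Returns `(hard_errors, warnings)`: hard errors block submission,
/// warnings are logged but let the backtest proceed.
pub fn validate_sketch(s: &StrategySketch) -> (Vec<String>, Vec<String>) {
    let mut hard_errors = Vec::new();
    let mut warnings = Vec::new();

    for q in &s.quantities {
        match q.parse::<f64>() {
            Ok(v) if v > 0.0 => {}
            Ok(_) => hard_errors.push(format!("quantity `{q}` must be positive")),
            Err(_) => hard_errors.push(format!("quantity `{q}` is not a decimal string")),
        }
    }

    // Assumption: a missing exit rule is only a warning in this sketch.
    if !s.has_exit_rule {
        warnings.push("no exit rule: positions may never close".to_string());
    }

    (hard_errors, warnings)
}
```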
 ## Testing Strategy
 The codebase uses `anyhow` for error handling and `tracing` for logging. Key test areas:
 - Strategy JSON extraction from various response formats
 - Context length detection from LM Studio/OpenAI endpoints
 - Ledger entry serialization/deserialization
 - Backtest result parsing from swym API responses
 - Deduplication logic
 - Convergence detection in `diagnose_history()`
--- a/docs/plan/cross-run-learning.md
+++ b/docs/plan/cross-run-learning.md
@@ -0,0 +1,133 @@
 # Plan: Cross-run learning via run ledger and compare endpoint
 ## Context
 Scout currently starts from scratch every run — no memory of prior iterations. The upstream
 patch `e47c18` adds:
 1. **Enriched `result_summary`**: sortino_ratio, calmar_ratio, max_drawdown, pnl_return,
   avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
 2. **Compare endpoint**: `GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,...` returns
   `RunMetricsSummary` for up to 50 runs in one call
 Goal: persist enough state across runs so that iteration 1 of a new run starts informed by
 all previous runs' strategies and outcomes.
 ## Changes
 ### 1. Run ledger — persist strategy + run_id per backtest (`src/agent.rs`)
 After each successful `run_single_backtest`, append a JSONL entry to `{output_dir}/run_ledger.jsonl`:
 ```json
 {"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
 ```
 One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is
 duplicated across instrument entries for the same iteration — this keeps the format flat and
 self-contained.
 Use `OpenOptions::append(true).create(true)` — no locking needed since scout is single-threaded.
 ### 2. Load prior runs on startup (`src/agent.rs`)
 At the top of `run()`, before the iteration loop:
 1. Read `run_ledger.jsonl` if it exists (ignore if missing — first run)
 2. Collect all `run_id`s
 3. Call `swym.compare_runs(&run_ids)` (batching in groups of 50)
 4. Join metrics back to strategies from the ledger
 5. Group by strategy (entries with the same strategy JSON share an iteration)
 6. Rank by average sharpe across instruments
 7. Build a `prior_results_summary: Option<String>` for the initial prompt
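
Step 3 (batching in groups of 50) could look roughly like the sketch below. It is synchronous and uses `String` IDs plus a caller-supplied closure so that it stays self-contained; the real `compare_runs` is async and takes `Uuid`s. `fetch_in_batches` and `COMPARE_BATCH_SIZE` are hypothetical names.

```rust
/// Maximum number of run IDs per compare call (the limit stated in this plan).
pub const COMPARE_BATCH_SIZE: usize = 50;

/// Split `run_ids` into batches of at most `COMPARE_BATCH_SIZE` and concatenate
/// the results. `fetch` stands in for a call to the compare endpoint (assumption).
pub fn fetch_in_batches<T, E>(
    run_ids: &[String],
    mut fetch: impl FnMut(&[String]) -> Result<Vec<T>, E>,
) -> Result<Vec<T>, E> {
    let mut all = Vec::with_capacity(run_ids.len());
    for batch in run_ids.chunks(COMPARE_BATCH_SIZE) {
        // Stop at the first failing batch and propagate its error.
        all.extend(fetch(batch)?);
    }
    Ok(all)
}
```

With 120 ledger entries this would issue three calls of 50, 50, and 20 IDs.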
 ### 3. Compare endpoint client (`src/swym.rs`)
 Add `RunMetricsSummary` struct:
 ```rust
 pub struct RunMetricsSummary {
    pub id: Uuid,
    pub status: String,
    pub candle_interval: Option<String>,
    pub total_positions: Option<u32>,
    pub win_rate: Option<f64>,
    pub profit_factor: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
    pub sortino_ratio: Option<f64>,
    pub calmar_ratio: Option<f64>,
    pub max_drawdown: Option<f64>,
    pub pnl_return: Option<f64>,
    pub avg_win: Option<f64>,
    pub avg_loss: Option<f64>,
    pub max_win: Option<f64>,
    pub max_loss: Option<f64>,
    pub avg_hold_duration_secs: Option<f64>,
 }
 ```
 Add `SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>`:
 - `GET {base_url}/paper-runs/compare?ids={comma_separated}`
 - Parse JSON array response using `parse_number()` for decimal strings
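
A minimal sketch of the request URL and decimal-string parsing described above. IDs are plain `&str` here to avoid external crates, and `parse_decimal` is a hypothetical stand-in: the signature of the existing `parse_number()` is not shown in this plan.

```rust
/// Build the compare URL from a base URL and a list of run IDs.
/// `compare_url` is an illustrative helper name (assumption).
pub fn compare_url(base_url: &str, run_ids: &[&str]) -> String {
    format!(
        "{}/paper-runs/compare?ids={}",
        base_url.trim_end_matches('/'),
        run_ids.join(",")
    )
}

/// Parse a decimal string such as "1.25" into a finite `f64`.
/// Stand-in for `parse_number()`, whose real behaviour may differ.
pub fn parse_decimal(s: &str) -> Option<f64> {
    s.trim().parse::<f64>().ok().filter(|v| v.is_finite())
}
```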
 ### 4. Enrich `BacktestResult` with new fields (`src/swym.rs`)
 Add to `BacktestResult`: `sortino_ratio`, `calmar_ratio`, `max_drawdown`, `pnl_return`,
 `avg_win`, `avg_loss`, `max_win`, `max_loss`, `avg_hold_duration_secs`.
 Parse all in `from_response()` via existing `parse_number()`.
 Update `summary_line()` to include `max_dd={:.1}%` and `sortino={:.2}` when present —
 these two are the most useful additions for the model's reasoning.
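
One way the optional fields might be appended, shown on a reduced struct. `MetricsSketch` is a hypothetical subset of `BacktestResult`, and the sketch assumes `max_drawdown` is stored as a fraction (0.25 meaning 25%).

```rust
/// Hypothetical subset of `BacktestResult` used only for this sketch.
pub struct MetricsSketch {
    pub sharpe_ratio: Option<f64>,
    pub sortino_ratio: Option<f64>,
    /// Assumed to be a fraction, e.g. 0.25 for a 25% drawdown.
    pub max_drawdown: Option<f64>,
}

/// Format a one-line summary, adding `max_dd` and `sortino` only when present.
pub fn summary_line(m: &MetricsSketch) -> String {
    let mut line = format!("sharpe={:.3}", m.sharpe_ratio.unwrap_or(0.0));
    if let Some(dd) = m.max_drawdown {
        line.push_str(&format!(" max_dd={:.1}%", dd * 100.0));
    }
    if let Some(s) = m.sortino_ratio {
        line.push_str(&format!(" sortino={:.2}", s));
    }
    line
}
```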
 ### 5. Prior-results-aware initial prompt (`src/prompts.rs`)
 Modify `initial_prompt()` to accept `prior_summary: Option<&str>`.
 When present, insert before the "Design a trading strategy" instruction:
 ```
 ## Learnings from {N} prior backtests across {M} strategies
 {top 5 strategies ranked by avg sharpe, each showing:}
 - Interval, rule count, avg metrics across instruments
 - One-line description of the strategy approach (extracted from rule comments)
 - Full strategy JSON for the top 1-2
 {compact table of all prior strategies' avg metrics}
 Use these insights to avoid repeating failed approaches and to build on what worked.
 ```
 Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs,
 show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
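
The top-5 plus bottom-3 selection and the size cap can be sketched as below. Both helper names are hypothetical, and the cap is expressed in bytes as a crude proxy for the token budget. Truncation backs up to a UTF-8 character boundary, because slicing a `str` in the middle of a character panics.

```rust
/// Given items sorted best-first, return `(top, worst)` without overlap:
/// the worst items are taken only from what remains after the top slice.
pub fn top_and_bottom<T>(sorted: &[T], top_n: usize, bottom_n: usize) -> (&[T], &[T]) {
    let top_len = sorted.len().min(top_n);
    let (top, rest) = sorted.split_at(top_len);
    let worst_start = rest.len().saturating_sub(bottom_n);
    (top, &rest[worst_start..])
}

/// Truncate to at most `max_bytes`, backing up to a UTF-8 char boundary.
pub fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut cut = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    &s[..cut]
}
```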
 ### 6. Ledger entry struct (`src/agent.rs`)
 ```rust
 #[derive(Serialize, Deserialize)]
 struct LedgerEntry {
    run_id: Uuid,
    instrument: String,
    candle_interval: String,
    strategy: Value,
    timestamp: String,
 }
 ```
 ## Files to modify
 - `src/swym.rs` — `RunMetricsSummary` struct, `compare_runs()` method, enrich `BacktestResult`
  with new fields, update `summary_line()`
 - `src/agent.rs` — `LedgerEntry` struct, append-to-ledger after backtest, load-ledger-on-startup,
  call compare endpoint, build prior summary, pass to initial prompt
 - `src/prompts.rs` — `initial_prompt()` accepts optional prior summary
 ## Verification
 1. `cargo build --release`
 2. Run once → confirm `run_ledger.jsonl` is created with entries
 3. Run again → confirm:
   - Ledger is loaded, compare endpoint is called
   - Iteration 1 prompt includes prior results summary (visible at debug log level)
   - New entries are appended (not overwritten)
 4. Check that enriched metrics (sortino, max_drawdown) appear in summary_line output
--- a/src/agent.rs
+++ b/src/agent.rs
@@ -1,14 +1,26 @@
 use std::io::Write as IoWrite;
 use std::path::Path;
 use std::time::Duration;
 use anyhow::{Context, Result};
 use serde::{Deserialize, Serialize};
 use serde_json::Value;
 use tracing::{debug, error, info, warn};
 use uuid::Uuid;
 use crate::claude::{self, ClaudeClient, Message};
 use crate::config::{Cli, Instrument};
 use crate::prompts;
-use crate::swym::{BacktestResult, SwymClient};
+use crate::swym::{BacktestResult, RunMetricsSummary, SwymClient};
 /// Persistent record of a single completed backtest, written to the run ledger.
 #[derive(Debug, Serialize, Deserialize)]
 struct LedgerEntry {
    run_id: Uuid,
    instrument: String,
    candle_interval: String,
    strategy: Value,
 }
 /// A single iteration's record: strategy + results across instruments.
 #[derive(Debug)]
@@ -190,14 +202,24 @@ pub async fn run(cli: &Cli) -> Result<()> {
    // Load DSL schema for the system prompt
    let schema = include_str!("dsl-schema.json");
-    let system = prompts::system_prompt(schema, claude.family());
+    let has_futures = instruments.iter().any(|i| i.is_futures());
    let system = prompts::system_prompt(schema, claude.family(), has_futures);
    info!("model family: {}", claude.family().name());
    // Resolve ledger path: explicit --ledger-file takes precedence, else <output_dir>/run_ledger.jsonl
    let ledger_path = cli.ledger_file.clone().unwrap_or_else(|| cli.output_dir.join("run_ledger.jsonl"));
    info!("ledger: {}", ledger_path.display());
    // Load prior runs from ledger and build cross-run context for iteration 1
    let prior_summary = load_prior_summary(&ledger_path, &swym).await;
    // Agent state
    let mut history: Vec<IterationRecord> = Vec::new();
    let mut conversation: Vec<Message> = Vec::new();
    let mut best_strategy: Option<(f64, Value)> = None; // (avg_sharpe, strategy)
    let mut consecutive_failures = 0u32;
    // Deduplication: track canonical strategy JSON → first iteration it was tested.
    let mut tested_strategies: std::collections::HashMap<String, u32> = std::collections::HashMap::new();
    let instrument_names: Vec<String> = instruments.iter().map(|i| i.symbol.clone()).collect();
@@ -206,7 +228,7 @@ pub async fn run(cli: &Cli) -> Result<()> {
        // Build the user prompt
        let user_msg = if iteration == 1 {
-            prompts::initial_prompt(&instrument_names, &available_intervals)
+            prompts::initial_prompt(&instrument_names, &available_intervals, prior_summary.as_deref(), has_futures)
        } else {
            let results_text = history
                .iter()
@@ -265,6 +287,12 @@ pub async fn run(cli: &Cli) -> Result<()> {
            content: response_text.clone(),
        });
        // Log R1 reasoning chain at debug level so it can be inspected when
        // the model makes repeated DSL mistakes (run with RUST_LOG=debug).
        if let Some(thinking) = claude::extract_think_content(&response_text) {
            debug!("R1 thinking ({} chars):\n{}", thinking.len(), thinking);
        }
        // Extract strategy JSON
        let strategy = match claude::extract_json(&response_text) {
            Ok(s) => s,
@@ -319,7 +347,7 @@ pub async fn run(cli: &Cli) -> Result<()> {
        let strat_path = cli.output_dir.join(format!("strategy_{iteration:03}.json"));
        std::fs::write(&strat_path, serde_json::to_string_pretty(&strategy)?)?;
-        // Hard validation errors: skip the expensive backtest and give immediate feedback.
+        // Hard client-side validation errors: skip without hitting the API.
        if !hard_errors.is_empty() {
            let record = IterationRecord {
                iteration,
@@ -332,6 +360,61 @@ pub async fn run(cli: &Cli) -> Result<()> {
            continue;
        }
        // Server-side validation: call /strategies/validate to get ALL DSL errors
        // at once before submitting a backtest. This avoids burning a full backtest
        // round-trip on a structurally invalid strategy and gives the model a
        // complete list of errors to fix in one shot.
        match swym.validate_strategy(&strategy).await {
            Ok(api_errors) if !api_errors.is_empty() => {
                for e in &api_errors {
                    warn!("  DSL error at {}: {}", e.path.as_deref().unwrap_or("(top-level)"), e.message);
                }
                let error_notes: Vec<String> = api_errors
                    .iter()
                    .map(|e| format!("DSL error at {}: {}", e.path.as_deref().unwrap_or("(top-level)"), e.message))
                    .collect();
                validation_notes.extend(error_notes);
                let record = IterationRecord {
                    iteration,
                    strategy: strategy.clone(),
                    results: vec![],
                    validation_notes,
                };
                info!("{}", record.summary());
                history.push(record);
                continue;
            }
            Ok(_) => {
                // Valid — proceed to backtest
            }
            Err(e) => {
                // Network/parse failure from the validate endpoint — log and proceed
                // anyway so a transient API issue doesn't stall the run.
                warn!("  strategy validation request failed (proceeding): {e:#}");
            }
        }
        // Deduplication check: skip strategies identical to one already tested this run.
        let strategy_key = serde_json::to_string(&strategy).unwrap_or_default();
        if let Some(&first_iter) = tested_strategies.get(&strategy_key) {
            warn!("duplicate strategy (identical to iteration {first_iter}), skipping backtest");
            let record = IterationRecord {
                iteration,
                strategy: strategy.clone(),
                results: vec![],
                validation_notes: vec![format!(
                    "DUPLICATE: this exact strategy was already tested in iteration {first_iter}. \
                     You submitted identical JSON. You MUST design a completely different strategy — \
                     different indicator family, different entry conditions, or different timeframe. \
                     Do NOT submit the same JSON again."
                )],
            };
            info!("{}", record.summary());
            history.push(record);
            continue;
        }
        tested_strategies.insert(strategy_key, iteration);
        // Run backtests against all instruments (in-sample)
        let mut results: Vec<BacktestResult> = Vec::new();
@@ -357,12 +440,13 @@ pub async fn run(cli: &Cli) -> Result<()> {
                            info!("  condition audit: {}", serde_json::to_string_pretty(audit).unwrap_or_default());
                        }
                    }
                    append_ledger_entry(&ledger_path, &result, &strategy);
                    results.push(result);
                }
                Err(e) => {
                    warn!("  backtest failed for {}: {e:#}", inst.symbol);
                    results.push(BacktestResult {
-                        run_id: uuid::Uuid::nil(),
+                        run_id: Uuid::nil(),
                        instrument: inst.symbol.clone(),
                        status: "failed".to_string(),
                        total_positions: None,
@@ -373,6 +457,15 @@ pub async fn run(cli: &Cli) -> Result<()> {
                        total_pnl: None,
                        net_pnl: None,
                        sharpe_ratio: None,
                        sortino_ratio: None,
                        calmar_ratio: None,
                        max_drawdown: None,
                        pnl_return: None,
                        avg_win: None,
                        avg_loss: None,
                        max_win: None,
                        max_loss: None,
                        avg_hold_duration_secs: None,
                        total_fees: None,
                        avg_bars_in_trade: None,
                        error_message: Some(e.to_string()),
@@ -510,6 +603,7 @@ async fn run_single_backtest(
            &inst.symbol,
            &inst.base(),
            &inst.quote(),
            inst.market_kind(),
            strategy,
            starts_at,
            finishes_at,
@@ -530,13 +624,180 @@ async fn run_single_backtest(
        .await
        .context("poll")?;
-    Ok(BacktestResult::from_response(
-        &final_resp,
-        &inst.symbol,
-        &inst.exchange,
-        &inst.base(),
-        &inst.quote(),
-    ))
+    Ok(BacktestResult::from_response(&final_resp, &inst.symbol))
+}
+
+/// Append a ledger entry for a completed backtest so future runs can learn from it.
+fn append_ledger_entry(ledger: &Path, result: &BacktestResult, strategy: &Value) {
+    // Skip nil run_ids (error placeholders)
+    if result.run_id == Uuid::nil() {
        return;
    }
    let entry = LedgerEntry {
        run_id: result.run_id,
        instrument: result.instrument.clone(),
        candle_interval: strategy["candle_interval"]
            .as_str()
            .unwrap_or("?")
            .to_string(),
        strategy: strategy.clone(),
    };
    // Append the newline to the serialised bytes so the whole entry goes out in one
    // write_all() call, which normally maps to a single write() syscall. With O_APPEND
    // a single write() is atomic on Linux local filesystems, so concurrent instances
    // are safe for typical entry sizes.
    let mut bytes = match serde_json::to_vec(&entry) {
        Ok(b) => b,
        Err(e) => {
            warn!("could not serialize ledger entry: {e}");
            return;
        }
    };
    bytes.push(b'\n');
    if let Err(e) = std::fs::OpenOptions::new()
        .append(true)
        .create(true)
        .open(ledger)
        .and_then(|mut f| f.write_all(&bytes))
    {
        warn!("could not write ledger entry: {e}");
    }
 }
 /// Load the run ledger, fetch metrics via the compare endpoint, and return a compact
 /// prior-results summary string for the initial prompt.  Returns `None` if the ledger
 /// is absent, empty, or the compare call fails.
 async fn load_prior_summary(ledger: &Path, swym: &SwymClient) -> Option<String> {
    let path = ledger;
    let contents = std::fs::read_to_string(&path).ok()?;
    // Parse all ledger entries
    let entries: Vec<LedgerEntry> = contents
        .lines()
        .filter(|l| !l.trim().is_empty())
        .filter_map(|l| serde_json::from_str(l).ok())
        .collect();
    if entries.is_empty() {
        return None;
    }
    info!("loaded {} ledger entries from previous runs", entries.len());
    // Fetch metrics for all run_ids
    let run_ids: Vec<Uuid> = entries.iter().map(|e| e.run_id).collect();
    let metrics = match swym.compare_runs(&run_ids).await {
        Ok(m) => m,
        Err(e) => {
            warn!("could not fetch prior run metrics: {e}");
            return None;
        }
    };
    // Build a map from run_id → metrics
    let metrics_map: std::collections::HashMap<Uuid, &RunMetricsSummary> =
        metrics.iter().map(|m| (m.id, m)).collect();
    // Group entries by strategy, using the full strategy JSON as the grouping key.
    let mut strategy_groups: std::collections::HashMap<String, Vec<(&LedgerEntry, Option<&RunMetricsSummary>)>> =
        std::collections::HashMap::new();
    // Cap at 3 entries per unique strategy (one per instrument is enough).
    // Without this, a strategy repeated across many iterations swamps the summary.
    for entry in &entries {
        let key = serde_json::to_string(&entry.strategy).unwrap_or_default();
        let group = strategy_groups.entry(key).or_default();
        if group.len() < 3 {
            let m = metrics_map.get(&entry.run_id).copied();
            group.push((entry, m));
        }
    }
    // Compute avg sharpe per strategy group
    let mut strategies: Vec<(f64, &Value, Vec<(&LedgerEntry, Option<&RunMetricsSummary>)>)> = strategy_groups
        .into_values()
        .map(|group| {
            let sharpes: Vec<f64> = group
                .iter()
                .filter_map(|(_, m)| m.and_then(|m| m.sharpe_ratio))
                .collect();
            let avg_sharpe = if sharpes.is_empty() {
                f64::NEG_INFINITY
            } else {
                sharpes.iter().sum::<f64>() / sharpes.len() as f64
            };
            let strategy = &group[0].0.strategy;
            (avg_sharpe, strategy, group)
        })
        .collect();
    strategies.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
    let total_strategies = strategies.len();
    let total_backtests = entries.len();
    // Build summary text: top 5 + bottom 3 (if distinct), truncated to ~6000 chars below
    let mut lines = vec![format!(
        "## Learnings from {} prior backtests across {} strategies\n",
        total_backtests, total_strategies
    )];
    lines.push("### Best strategies (ranked by avg Sharpe):".to_string());
    let show_top = strategies.len().min(5);
    for (avg_sharpe, strategy, group) in strategies.iter().take(show_top) {
        let interval = strategy["candle_interval"].as_str().unwrap_or("?");
        let rule_count = strategy["rules"].as_array().map(|r| r.len()).unwrap_or(0);
        // Collect per-instrument metrics
        let inst_lines: Vec<String> = group
            .iter()
            .filter_map(|(entry, m)| {
                let m = (*m)?;
                Some(format!(
                    "    {}: trades={} sharpe={:.3} net_pnl={:.2}{}",
                    entry.instrument,
                    m.total_positions.unwrap_or(0),
                    m.sharpe_ratio.unwrap_or(0.0),
                    m.net_pnl.unwrap_or(0.0),
                    m.max_drawdown.map(|d| format!(" max_dd={:.1}%", d * 100.0)).unwrap_or_default(),
                ))
            })
            .collect();
        // Pull the first rule comment as a strategy description
        let description = strategy["rules"][0]["comment"]
            .as_str()
            .unwrap_or("(no description)");
        lines.push(format!(
            "\n  [{interval}, {rule_count} rules, avg_sharpe={avg_sharpe:.3}] {description}"
        ));
        lines.extend(inst_lines);
        // Include full JSON only for the top 2
        let rank = strategies.iter().position(|(_, s, _)| std::ptr::eq(*s, *strategy)).unwrap_or(99);
        if rank < 2 {
            lines.push(format!(
                "  strategy JSON: {}",
                serde_json::to_string(strategy).unwrap_or_default()
            ));
        }
    }
    // Worst 3 (if we have more than 5)
    if strategies.len() > 5 {
        lines.push("\n### Worst strategies (avoid repeating these):".to_string());
        let worst_start = strategies.len().saturating_sub(3);
        for (avg_sharpe, strategy, _) in strategies.iter().skip(worst_start) {
            let interval = strategy["candle_interval"].as_str().unwrap_or("?");
            let description = strategy["rules"][0]["comment"].as_str().unwrap_or("(no description)");
            lines.push(format!("  [{interval}, avg_sharpe={avg_sharpe:.3}] {description}"));
        }
    }
    lines.push(format!(
        "\nUse these results to avoid repeating failed approaches and build on what worked.\n"
    ));
    let summary = lines.join("\n");
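    // Truncate to ~6000 bytes to stay within prompt budget. Back up to a char
    // boundary first: slicing a `str` mid-character would panic.
    if summary.len() > 6000 {
        let mut cut = 5900;
        while !summary.is_char_boundary(cut) {
            cut -= 1;
        }
        Some(format!("{}…\n[truncated — {} total strategies]\n", &summary[..cut], total_strategies))
    } else {
        Some(summary)
    }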
 }
 fn save_validated_strategy(
@@ -665,6 +926,48 @@ pub fn diagnose_history(history: &[IterationRecord]) -> (String, bool) {
        }
    }
    // --- Repeated API error detection ---
    // If the same unknown DSL variant has been rejected 2+ times across the last
    // 4 iterations, call it out explicitly so the model knows exactly what to fix.
    {
        let recent_errors: Vec<String> = history
            .iter()
            .rev()
            .take(4)
            .flat_map(|rec| rec.results.iter())
            .filter_map(|r| r.error_message.as_deref())
            .filter(|e| e.contains("unknown variant"))
            .map(|e| {
                // Extract the variant name: "unknown variant `foo`, expected ..."
                e.split('`')
                    .nth(1)
                    .unwrap_or(e)
                    .to_string()
            })
            .collect();
        if recent_errors.len() >= 2 {
            // Find the most frequent bad variant
            let mut counts: std::collections::HashMap<&str, usize> = std::collections::HashMap::new();
            for v in &recent_errors {
                *counts.entry(v.as_str()).or_default() += 1;
            }
            if let Some((bad_variant, count)) = counts.into_iter().max_by_key(|(_, c)| *c) {
                if count >= 2 {
                    notes.push(format!(
                        "⚠ DSL ERROR (repeated {count}×): the swym API rejected \
                         `{bad_variant}` as an unknown variant. \
                         Check the 'Critical: expression kinds' section — \
                         `{bad_variant}` may be a FuncName (use inside \
                         {{\"kind\":\"func\",\"name\":\"{bad_variant}\",...}}) \
                         or it may not be supported at all. \
                         Use ONLY the documented kinds and func names."
                    ));
                }
            }
        }
    }
    // --- Zero-trade check ---
    let zero_trade_iters = history
        .iter()
--- a/src/claude.rs
+++ b/src/claude.rs
@@ -213,6 +213,14 @@ fn lmstudio_context_length(json: &Value, model_id: &str) -> Option<u32> {
    None
 }
 /// Return the content of the first `<think>` block, if any.
 /// Used for debug logging of R1 reasoning chains.
 pub fn extract_think_content(text: &str) -> Option<String> {
    let start = text.find("<think>")? + "<think>".len();
    let end = text[start..].find("</think>").map(|i| start + i)?;
    Some(text[start..end].trim().to_string())
 }
 /// Extract a JSON object from a model response text.
 /// Handles markdown code fences and R1-style `<think>...</think>` blocks.
 pub fn extract_json(text: &str) -> Result<Value> {
--- a/src/config.rs
+++ b/src/config.rs
@@ -118,6 +118,13 @@ pub struct Cli {
    #[arg(long, default_value = "./results")]
    pub output_dir: PathBuf,
    /// Path to the run ledger JSONL file used for cross-run learning.
    /// Defaults to <output_dir>/run_ledger.jsonl when not specified.
    /// Pass a different path to seed a new run from a specific ledger
    /// (e.g. a curated export from a previous campaign).
    #[arg(long)]
    pub ledger_file: Option<PathBuf>,
    /// Poll interval in seconds when waiting for backtest completion.
    #[arg(long, default_value_t = 2)]
    pub poll_interval_secs: u64,
@@ -167,4 +174,22 @@ impl Instrument {
        }
        "usdc".to_string()
    }
    /// Instrument kind for the paper-run config `instrument.kind` field.
    /// Derived from the exchange identifier (case-insensitive).
    pub fn market_kind(&self) -> &'static str {
        let e = self.exchange.to_ascii_lowercase();
        if e.contains("futures_usd") || e.contains("futures_um") {
            "futures_um"
        } else if e.contains("futures_coin") || e.contains("futures_cm") {
            "futures_cm"
        } else {
            "spot"
        }
    }
    /// True when this instrument is traded on a futures market.
    pub fn is_futures(&self) -> bool {
        self.market_kind() != "spot"
    }
 }
--- a/src/dsl-schema.json
+++ b/src/dsl-schema.json
@@ -66,11 +66,53 @@
      "properties": {
        "side": { "type": "string", "enum": ["buy", "sell"] },
        "quantity": {
-          "$ref": "#/definitions/DecimalString",
-          "description": "Per-order size in base asset units, e.g. \"0.001\" for BTC."
+          "description": "Per-order size in base asset units. Fixed decimal string (e.g. \"0.001\"), a declarative SizingMethod object, or a dynamic Expr object. When a method or Expr returns None the order is skipped; negative values are clamped to zero.",
+          "oneOf": [
+            { "$ref": "#/definitions/DecimalString" },
+            { "$ref": "#/definitions/SizingFixedSum" },
+            { "$ref": "#/definitions/SizingPercentOfBalance" },
+            { "$ref": "#/definitions/SizingFixedUnits" },
+            { "$ref": "#/definitions/Expr" }
+          ]
        },
        "reverse": {
          "type": "boolean",
          "default": false,
          "description": "Flip-through-zero flag (futures only). When true and an opposite position is currently open, the submitted order quantity becomes position_qty + configured_qty, closing the existing position and immediately opening a new one in the opposite direction in a single order. When flat the flag has no effect and configured_qty is used as normal. Omit or set false for standard close-only behaviour."
        }
      }
    },
    "SizingFixedSum": {
      "description": "Buy base asset worth `amount` of quote currency at the current price. qty = amount / current_price.",
      "type": "object",
      "required": ["method", "amount"],
      "additionalProperties": false,
      "properties": {
        "method": { "const": "fixed_sum" },
        "amount": { "$ref": "#/definitions/DecimalString", "description": "Quote-currency amount, e.g. \"500\" means buy $500 worth." }
      }
    },
    "SizingPercentOfBalance": {
      "description": "Buy percent% of the named asset's free balance worth of base asset. qty = balance(asset) * percent/100 / current_price.",
      "type": "object",
      "required": ["method", "percent", "asset"],
      "additionalProperties": false,
      "properties": {
        "method": { "const": "percent_of_balance" },
        "percent": { "$ref": "#/definitions/DecimalString", "description": "Percentage, e.g. \"2\" means 2% of the free balance." },
        "asset": { "type": "string", "description": "Asset name to look up, e.g. \"usdc\". Matched case-insensitively." }
      }
    },
    "SizingFixedUnits": {
      "description": "Buy exactly `units` of base asset. Semantic alias for a fixed decimal quantity.",
      "type": "object",
      "required": ["method", "units"],
      "additionalProperties": false,
      "properties": {
        "method": { "const": "fixed_units" },
        "units": { "$ref": "#/definitions/DecimalString", "description": "Base asset quantity, e.g. \"0.01\" means 0.01 BTC." }
      }
    },
    "Rule": {
      "type": "object",
      "required": ["when", "then"],
@@ -280,7 +322,12 @@
        { "$ref": "#/definitions/ExprBinOp" },
        { "$ref": "#/definitions/ExprApplyFunc" },
        { "$ref": "#/definitions/ExprUnaryOp" },
-        { "$ref": "#/definitions/ExprBarsSince" }
+        { "$ref": "#/definitions/ExprBarsSince" },
        { "$ref": "#/definitions/ExprEntryPrice" },
        { "$ref": "#/definitions/ExprPositionQuantity" },
        { "$ref": "#/definitions/ExprUnrealisedPnl" },
        { "$ref": "#/definitions/ExprBarsSinceEntry" },
        { "$ref": "#/definitions/ExprBalance" }
      ]
    },
    "ExprLiteral": {
@@ -417,6 +464,55 @@
          "description": "Maximum bars to look back."
        }
      }
    },
    "ExprEntryPrice": {
      "description": "Volume-weighted average entry price of the current open position. Returns None when flat.",
      "type": "object",
      "required": ["kind"],
      "additionalProperties": false,
      "properties": {
        "kind": { "const": "entry_price" }
      }
    },
    "ExprPositionQuantity": {
      "description": "Absolute quantity of the current open position in base asset units. Returns None when flat.",
      "type": "object",
      "required": ["kind"],
      "additionalProperties": false,
      "properties": {
        "kind": { "const": "position_quantity" }
      }
    },
    "ExprUnrealisedPnl": {
      "description": "Estimated unrealised PnL of the current open position in quote asset. Returns None when flat.",
      "type": "object",
      "required": ["kind"],
      "additionalProperties": false,
      "properties": {
        "kind": { "const": "unrealised_pnl" }
      }
    },
    "ExprBarsSinceEntry": {
      "description": "Number of complete primary-interval bars elapsed since the current position was opened. Computed as floor((now - time_enter) / primary_interval_secs). Returns None when flat.",
      "type": "object",
      "required": ["kind"],
      "additionalProperties": false,
      "properties": {
        "kind": { "const": "bars_since_entry" }
      }
    },
    "ExprBalance": {
      "description": "Free balance of the named asset (matched case-insensitively). Returns None when the asset is not found or balance data is unavailable.",
      "type": "object",
      "required": ["kind", "asset"],
      "additionalProperties": false,
      "properties": {
        "kind": { "const": "balance" },
        "asset": {
          "type": "string",
          "description": "Internal asset name, e.g. \"usdt\", \"btc\". Case-insensitive."
        }
      }
    }
  }
 }
--- a/src/prompts.rs
+++ b/src/prompts.rs
@@ -4,7 +4,7 @@ use crate::config::ModelFamily;
 ///
 /// Accepts a `ModelFamily` so each family can receive tailored guidance
 /// while sharing the common DSL schema and strategy evaluation rules.
-pub fn system_prompt(dsl_schema: &str, family: &ModelFamily) -> String {
+pub fn system_prompt(dsl_schema: &str, family: &ModelFamily, has_futures: bool) -> String {
    let output_instructions = match family {
        ModelFamily::DeepSeekR1 => {
            "## Output format\n\n\
@@ -52,6 +52,10 @@ sma, ema, wma, rsi, std_dev, sum, highest, lowest, atr, supertrend, adx,
 bollinger_upper, bollinger_lower — applied to any candle field (open/high/low/close/volume)
 with configurable period and optional offset.
 These are FuncNames used INSIDE `{{"kind":"func","name":"...","period":N}}` expressions.
 `atr`, `adx`, and `supertrend` use OHLC internally and ignore the `field` parameter.
 To use ADX as a trend-strength filter: `{{"kind":"compare","left":{{"kind":"func","name":"adx","period":14}},"op":">","right":{{"kind":"literal","value":"25"}}}}`
 ### Composed indicators (apply_func)
 Apply rolling functions to arbitrary expressions: EMA of EMA, Hull MA (WMA of expression),
 VWAP (sum of close*volume / sum of volume), standard deviation of returns, etc.
@@ -70,11 +74,78 @@ bars_since_entry — complete bars elapsed since position was opened
 balance — free balance of a named asset (e.g. "usdt", "usdc")
 ### Quantity
-Action quantity MUST be a fixed decimal string that parses as a floating-point number,
-e.g. `"quantity": "0.001"`.
-NEVER use an expression object for quantity — only plain decimal strings are accepted.
-NEVER use placeholder strings like `"ATR_SIZED"`, `"FULL_BALANCE"`, `"percent_of_balance"`,
-`"dynamic"`, or any non-numeric string — these will be rejected immediately.
+Action quantity accepts four forms — pick the simplest one for your intent:
+
+**1. Declarative sizing methods (preferred — instrument-agnostic, readable):**
+
+Spend a fixed quote amount (e.g. $500 worth of base at current price):
 ```json
 {{"method":"fixed_sum","amount":"500"}}
 ```
 Spend a percentage of free quote balance (e.g. 5% of USDC):
 ```json
 {{"method":"percent_of_balance","percent":"5","asset":"usdc"}}
 ```
 Buy a fixed number of base units (semantic alias for a decimal string):
 ```json
 {{"method":"fixed_units","units":"0.01"}}
 ```
 **2. Plain decimal string** — use only when you have a specific reason:
 `"0.01"` (0.01 BTC, 3.0 ETH, 50.0 SOL — instrument-specific, not portable)
 **3. Expr** — for dynamic sizing not covered by the methods above, e.g. ATR-based:
 ```json
 {{"kind":"bin_op","op":"div",
  "left":{{"kind":"literal","value":"200"}},
  "right":{{"kind":"func","name":"atr","period":14}}}}
 ```
 CRITICAL: ATR sizing and balance limits. `N/atr(14)` expresses quantity in BASE asset units.
 For BTC, 4h ATR ≈ $1500–3000, so `1000/atr(14)` ≈ 0.33–0.67 BTC ≈ $27k–53k notional
 (assuming BTC ≈ $80k). That is silently rejected on a $10k account (fill returns None,
 0 positions open, no error shown).
 The numerator N represents your intended dollar risk per trade. For a $10k account keep N ≤ 200.
 `200/atr(14)` ≈ 0.07–0.13 BTC ≈ $5.6k–10k notional, which fits within a $10k account.
 Prefer `percent_of_balance` for most sizing. Only reach for ATR-based Expr sizing when you need
 volatility-scaled position risk, and keep the numerator proportional to your risk tolerance.
 **4. Exit rules** — use `position_quantity` to close the exact open size:
 ```json
 {{"kind":"position_quantity"}}
 ```
 Alternatively, `"9999"` works for exits: sell quantities are automatically capped to the open
 position size, so a large fixed number is equivalent to `position_quantity`.
 CRITICAL — the `"method"` vs `"kind"` distinction:
 - `"method"` belongs ONLY to the three declarative sizing objects: `fixed_sum`, `percent_of_balance`, `fixed_units`.
 - `"kind"` belongs to Expr objects: `position_quantity`, `bin_op`, `func`, `field`, `literal`, etc.
 - `{{"method":"position_quantity"}}` is ALWAYS WRONG. It will be rejected every time.
  CORRECT: `{{"kind":"position_quantity"}}`.
 - If you used `{{"method":"percent_of_balance",...}}` for the buy, use `{{"kind":"position_quantity"}}` for the sell.
  These are different object types — buy uses a SizingMethod (`method`), sell uses an Expr (`kind`).
 - `{{"method":"fixed_sum","amount":"100","multiplier":"2.0"}}` is WRONG — `fixed_sum` has no
  `multiplier` field. Only `amount` is accepted alongside `method`.
 - NEVER add extra fields to SizingMethod objects — they use `additionalProperties: false`.
 ### Reverse / flip-through-zero (futures only)
 Setting `"reverse": true` on a rule action enables a single-order position flip on futures.
 When an opposite position is open, quantity = `position_qty + configured_qty`, which closes
 the existing position and opens a new one in the opposite direction in one order (fees split
 proportionally). When flat the flag has no effect — `configured_qty` is used normally.
 This lets you collapse a 4-rule long+short strategy (separate open/close for each leg) into
 2 rules, reducing round-trip fees and keeping logic compact:
 ```json
 {{"side": "sell", "quantity": {{"method": "percent_of_balance", "percent": "10", "asset": "usdc"}}, "reverse": true}}
 ```
 Use `reverse` when you always want to be in a position — the signal flips you from long to
 short (or vice versa) rather than first exiting and then re-entering separately. Do NOT use
 `reverse` on spot markets (short selling is not supported there).
 ### Multi-timeframe
 Any expression can reference a different timeframe via "timeframe" field.
@@ -100,6 +171,13 @@ Use higher timeframes as trend filters, lower timeframes for entry precision.
 6. **Composite / hybrid**: Combine families. Trend filter + mean-reversion entry.
   Momentum confirmation + volatility sizing.
 7. **Supertrend consensus flip (futures only)**: Use `any_of` across multiple
   Supertrend configs (e.g. period=7/mul=1.5, period=10/mul=2.0, period=20/mul=3.0)
   so that ANY flip triggers a long or short entry. Combine with `"reverse": true`
   for an always-in-market approach where the opposite signal is the stop-loss.
   Varying multiplier tightens/loosens the band; varying period controls sensitivity.
   Risk: choppy markets generate many whipsaws — best on daily or 4h.
 ## Risk management (always include)
 Every strategy MUST have:
@@ -107,6 +185,10 @@ Every strategy MUST have:
 - A time-based exit: use bars_since_entry to avoid holding losers indefinitely
 - Reasonable position sizing: prefer ATR-based or percent-of-balance over fixed quantity
 Exception: always-in-market flip strategies (using `"reverse": true`) do not need an
 explicit stop-loss or time exit — the opposite signal acts as the stop. These are
 only valid on futures. See Example 6 and Example 7.
 {output_instructions}
 ## Interpreting backtest results
@@ -115,7 +197,11 @@ When I share results from previous iterations, use them to guide your next strat
 - **Zero trades**: The entry conditions are too restrictive or never co-occur.
  Relax thresholds, simplify conditions, or check if the indicator periods make
-  sense for the candle interval.
+  sense for the candle interval. Also check your position sizing — if using an
  ATR-based Expr quantity (`N/atr(14)`), a large N can produce a notional value
  exceeding your account balance (e.g. `1000/atr(14)` on BTC ≈ 0.4 BTC ≈ $32k),
  which is silently rejected by the fill engine. Switch to `percent_of_balance`
  or reduce N to ≤ 200 for a $10k account.
 - **Many trades but negative PnL**: The entry signal has no edge, or the exit
  logic is poor. Try different indicator combinations, add trend filters, or
@@ -146,11 +232,31 @@ Common mistakes to NEVER make:
 - `"kind": "bars_since_entry"` is a valid standalone Expr (no extra fields needed).
  Do NOT put `"bars_since_entry"` as a `"name"` inside `{{"kind":"func",...}}` — that is WRONG.
 - `"kind": "expr_field"` does NOT exist. Use `{{"kind":"field","field":"close"}}`.
 - Every Expr object MUST have a `"kind"` field. `{{"field":"close"}}` is WRONG — missing `"kind"`.
  CORRECT: `{{"kind":"field","field":"close"}}`. The `"kind"` is never optional.
  This applies to ALL field access including offset lookups:
  `{{"field":"volume","offset":-1}}` is WRONG. CORRECT: `{{"kind":"field","field":"volume","offset":-1}}`.
  `{{"field":"high","offset":-2}}` is WRONG. CORRECT: `{{"kind":"field","field":"high","offset":-2}}`.
 - `rsi`, `adx`, `supertrend` are NOT valid inside `apply_func`. Use only `apply_func`
  with `ApplyFuncName` values: `highest`, `lowest`, `sma`, `ema`, `wma`, `std_dev`, `sum`,
  `bollinger_upper`, `bollinger_lower`.
 - `volume` is a candle FIELD, not a func name. Access it as `{{"kind":"field","field":"volume"}}`.
-  To compute EMA of volume: `{{"kind":"apply_func","name":"ema","period":20,"expr":{{"kind":"field","field":"volume"}}}}`.
+  To compute EMA of volume: `{{"kind":"apply_func","name":"ema","period":20,"input":{{"kind":"field","field":"volume"}}}}`.
 - `bollinger_upper` and `bollinger_lower` are FUNC NAMES, not Expr kinds. To compare close to the upper band:
  `{{"kind":"compare","left":{{"kind":"field","field":"close"}},"op":">","right":{{"kind":"func","name":"bollinger_upper","period":20}}}}`
  NEVER write `{{"kind":"bollinger_upper",...}}` — `bollinger_upper` is not an Expr kind.
  NEVER set `"field":"bollinger_upper"` on a func Expr — `bollinger_upper`/`bollinger_lower` have no `field`
  parameter; they compute from close internally. Just `{{"kind":"func","name":"bollinger_upper","period":20}}`.
 - The `{{"kind":"bollinger",...}}` Condition (shorthand) only accepts `"band": "above_upper"` or
  `"band": "below_lower"`. There is NO `above_lower` or `below_upper` — those are invalid and will be
  rejected. Use `above_upper` (price above the upper band) or `below_lower` (price below the lower band).
 - `adx` is a FUNC NAME, not a Condition kind. To filter for strong trends (ADX > 25):
  `{{"kind":"compare","left":{{"kind":"func","name":"adx","period":14}},"op":">","right":{{"kind":"literal","value":"25"}}}}`
  NEVER write `{{"kind":"adx",...}}` — `adx` is not a Condition kind, it is a FuncName used inside `{{"kind":"func",...}}`.
 - `roc` (rate of change), `hma` (Hull MA), `ma` (generic), `vwap`, `macd`, `cci`, `stoch` are NOT supported.
  Use `sma`, `ema`, `wma`, `rsi`, `atr`, `adx`, `supertrend`, `std_dev`, `sum`, `highest`, `lowest`,
  `bollinger_upper`, `bollinger_lower` only. There is no generic `ma` — use `sma` or `ema` explicitly.
  Hull MA can be approximated as: WMA(2*WMA(n/2) - WMA(n)) using `apply_func`.
 ## Working examples
@@ -171,7 +277,7 @@ Common mistakes to NEVER make:
          {{"kind": "ema_trend", "period": 50, "direction": "above"}}
        ]
      }},
-      "then": {{"side": "buy", "quantity": "0.001"}}
+      "then": {{"side": "buy", "quantity": "0.01"}}
    }},
    {{
      "comment": "Sell: EMA9 crosses below EMA21, OR 2% stop-loss, OR 72-bar time exit",
@@ -199,7 +305,7 @@ Common mistakes to NEVER make:
          }}
        ]
      }},
-      "then": {{"side": "sell", "quantity": "0.001"}}
+      "then": {{"side": "sell", "quantity": {{"kind": "position_quantity"}}}}
    }}
  ]
 }}
@@ -222,7 +328,7 @@ Common mistakes to NEVER make:
          {{"kind": "bollinger", "period": 20, "band": "below_lower"}}
        ]
      }},
-      "then": {{"side": "buy", "quantity": "0.001"}}
+      "then": {{"side": "buy", "quantity": "0.01"}}
    }},
    {{
      "comment": "Sell: RSI recovers above 55, OR 3% stop-loss, OR 48-bar time exit",
@@ -250,7 +356,7 @@ Common mistakes to NEVER make:
          }}
        ]
      }},
-      "then": {{"side": "sell", "quantity": "0.001"}}
+      "then": {{"side": "sell", "quantity": {{"kind": "position_quantity"}}}}
    }}
  ]
 }}
@@ -277,7 +383,7 @@ Common mistakes to NEVER make:
          }}
        ]
      }},
-      "then": {{"side": "buy", "quantity": "0.001"}}
+      "then": {{"side": "buy", "quantity": "0.01"}}
    }},
    {{
      "comment": "Sell: 2-ATR stop-loss below entry price, OR 48-bar time exit",
@@ -312,38 +418,343 @@ Common mistakes to NEVER make:
          }}
        ]
      }},
-      "then": {{"side": "sell", "quantity": "0.001"}}
+      "then": {{"side": "sell", "quantity": {{"kind": "position_quantity"}}}}
    }}
  ]
 }}
 ```
 ### Example 4 — MACD crossover (composed from primitives)
 MACD has no native support, but can be composed from `func` and `apply_func`.
 The MACD line is `EMA(12) - EMA(26)`; the signal line is `EMA(9)` of the MACD line.
 ```json
 {{
  "type": "rule_based",
  "candle_interval": "4h",
  "rules": [
    {{
      "comment": "Buy: MACD line crosses above signal line",
      "when": {{
        "kind": "all_of",
        "conditions": [
          {{"kind": "position", "state": "flat"}},
          {{
            "kind": "cross_over",
            "left": {{
              "kind": "bin_op", "op": "sub",
              "left":  {{"kind": "func", "name": "ema", "period": 12}},
              "right": {{"kind": "func", "name": "ema", "period": 26}}
            }},
            "right": {{
              "kind": "apply_func", "name": "ema", "period": 9,
              "input": {{
                "kind": "bin_op", "op": "sub",
                "left":  {{"kind": "func", "name": "ema", "period": 12}},
                "right": {{"kind": "func", "name": "ema", "period": 26}}
              }}
            }}
          }}
        ]
      }},
      "then": {{"side": "buy", "quantity": "0.01"}}
    }},
    {{
      "comment": "Sell: MACD crosses below signal, OR 2% stop-loss, OR 72-bar time exit",
      "when": {{
        "kind": "all_of",
        "conditions": [
          {{"kind": "position", "state": "long"}},
          {{
            "kind": "any_of",
            "conditions": [
              {{
                "kind": "cross_under",
                "left": {{
                  "kind": "bin_op", "op": "sub",
                  "left":  {{"kind": "func", "name": "ema", "period": 12}},
                  "right": {{"kind": "func", "name": "ema", "period": 26}}
                }},
                "right": {{
                  "kind": "apply_func", "name": "ema", "period": 9,
                  "input": {{
                    "kind": "bin_op", "op": "sub",
                    "left":  {{"kind": "func", "name": "ema", "period": 12}},
                    "right": {{"kind": "func", "name": "ema", "period": 26}}
                  }}
                }}
              }},
              {{
                "kind": "compare",
                "left": {{"kind": "field", "field": "close"}},
                "op": "<",
                "right": {{"kind": "bin_op", "op": "mul",
                           "left": {{"kind": "entry_price"}},
                           "right": {{"kind": "literal", "value": "0.98"}}}}
              }},
              {{
                "kind": "compare",
                "left": {{"kind": "bars_since_entry"}},
                "op": ">=",
                "right": {{"kind": "literal", "value": "72"}}
              }}
            ]
          }}
        ]
      }},
      "then": {{"side": "sell", "quantity": {{"kind": "position_quantity"}}}}
    }}
  ]
 }}
 ```
 Key pattern: `apply_func` wraps any `Expr` tree using the `"input"` field (NOT `"expr"`).
 This enables EMA-of-expression (signal line), WMA-of-expression (Hull MA), or std_dev-of-returns.
 There is NO native `macd` func name — always compose it as `bin_op(sub, func(ema,12), func(ema,26))` as shown above.
 CRITICAL: `apply_func` uses `"input"`, not `"expr"`. Writing `"expr":` will be rejected by the API.
 ## Anti-patterns to avoid
 - Don't use the same indicator for both entry and exit (circular logic)
 - Don't set RSI thresholds at extreme values (< 10 or > 90) — too rare to fire
 - Don't use very short periods (< 5) on high timeframes — noisy
 - Don't use very long periods (> 100) on low timeframes — too slow to react
 - Don't switch to 15m or shorter intervals when results are poor — higher frequency amplifies
  fees and noise, making edge harder to find. Prefer 1h or 4h. If Sharpe is negative across
  intervals, the issue is signal logic, not timeframe — fix the signal before changing interval.
 - Don't create strategies with more than 5-6 conditions — overfitting risk
 - Don't ignore fees — a strategy needs to overcome 0.1% per round trip
-- Always gate buy rules with position state "flat" and sell rules with "long"
-- Never add a short-entry (sell when flat) rule — spot markets are long-only
-- Never use an expression object for `quantity` — it must always be a plain decimal string like `"0.001"`
-- Never use a placeholder string for `quantity` — `"ATR_SIZED"`, `"FULL_BALANCE"`, `"dynamic"`, etc. are all invalid and will be rejected. Use `"0.001"` or similar.
-"##
+- Spot markets are long-only: gate buy (entry) rules with state "flat" and sell (exit) rules with state "long". Never add a short-entry (sell when flat) rule on spot.
+- Futures markets support both directions: long entry = buy when flat; long exit = sell when long; short entry = sell when flat; short exit (cover) = buy when short. Always include a stop-loss and time exit for both long and short legs.
+- Never use a placeholder string for `quantity` — `"ATR_SIZED"`, `"FULL_BALANCE"`, `"dynamic"`, etc. are all invalid and will be rejected.
+- Don't use large ATR-based sizing numerators. `N/atr(14)` gives BASE units; for BTC (ATR ≈ $2000
+  on 4h), `1000/atr(14)` ≈ 0.5 BTC ≈ $40k — silently rejected on a $10k account. Keep N ≤ 200
+  or use `percent_of_balance`. The condition audit may show entry conditions firing while 0 positions
+  open — this is the typical symptom of an oversized ATR quantity.
+- `{{"method":"position_quantity"}}` is WRONG for exit rules — use `{{"kind":"position_quantity"}}` (see Quantity section above).
+{futures_examples}"##,
+        futures_examples = if has_futures { FUTURES_SHORT_EXAMPLES } else { "" },
    )
 }
 /// Short-entry and short-exit strategy examples, injected into the system prompt when
 /// futures instruments are present.
 const FUTURES_SHORT_EXAMPLES: &str = r##"
 ### Example 5 — Futures short: EMA trend-following short with ATR stop
 On futures you can also short. Short entry = `"side": "sell"` when `"state": "flat"`;
 short exit (cover) = `"side": "buy"` when `"state": "short"`. Stop-loss for a short
 is price rising above entry, e.g. entry_price * 1.02. You may run long and short legs
 in the same strategy (4 rules total), or a short-only strategy (2 rules).
 ```json
 {
  "type": "rule_based",
  "candle_interval": "4h",
  "rules": [
    {
      "comment": "Short entry: EMA9 crosses below EMA21 while price is below EMA50 (downtrend)",
      "when": {
        "kind": "all_of",
        "conditions": [
          {"kind": "position", "state": "flat"},
          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "below"},
          {"kind": "ema_trend", "period": 50, "direction": "below"}
        ]
      },
      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}}
    },
    {
      "comment": "Short exit: EMA9 crosses back above EMA21, OR 2% stop-loss, OR 48-bar time exit",
      "when": {
        "kind": "all_of",
        "conditions": [
          {"kind": "position", "state": "short"},
          {
            "kind": "any_of",
            "conditions": [
              {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "above"},
              {
                "kind": "compare",
                "left": {"kind": "field", "field": "close"},
                "op": ">",
                "right": {"kind": "bin_op", "op": "mul", "left": {"kind": "entry_price"}, "right": {"kind": "literal", "value": "1.02"}}
              },
              {
                "kind": "compare",
                "left": {"kind": "bars_since_entry"},
                "op": ">=",
                "right": {"kind": "literal", "value": "48"}
              }
            ]
          }
        ]
      },
      "then": {"side": "buy", "quantity": {"kind": "position_quantity"}}
    }
  ]
 }
 ```
 Key short-specific notes:
 - Stop-loss for short = close > entry_price * (1 + stop_pct), e.g. `* 1.02` for 2% stop
 - Take-profit for short = close < entry_price * (1 - target_pct), e.g. `* 0.97` for 3% target
 - Short exit uses `"side": "buy"` with `{"kind": "position_quantity"}` (just as a long exit uses `"side": "sell"`)
 - `percent_of_balance` for short entry uses `"usdc"` as the asset (the collateral currency)
 ### Example 6 — Futures flip-through-zero: 2-rule EMA trend-follower using `reverse`
 When you always want to be in a position (long during uptrends, short during downtrends),
 use `"reverse": true` to flip from one side to the other in a single order. This uses half
 the round-trip fee count compared to a 4-rule separate-entry/exit approach.
 ```json
 {
  "type": "rule_based",
  "candle_interval": "4h",
  "rules": [
    {
      "comment": "Go long (or flip short→long): EMA9 crosses above EMA21 while above EMA50",
      "when": {
        "kind": "all_of",
        "conditions": [
          {"kind": "any_of", "conditions": [
            {"kind": "position", "state": "flat"},
            {"kind": "position", "state": "short"}
          ]},
          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "above"},
          {"kind": "ema_trend", "period": 50, "direction": "above"}
        ]
      },
      "then": {"side": "buy", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}, "reverse": true}
    },
    {
      "comment": "Go short (or flip long→short): EMA9 crosses below EMA21 while below EMA50",
      "when": {
        "kind": "all_of",
        "conditions": [
          {"kind": "any_of", "conditions": [
            {"kind": "position", "state": "flat"},
            {"kind": "position", "state": "long"}
          ]},
          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "below"},
          {"kind": "ema_trend", "period": 50, "direction": "below"}
        ]
      },
      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}, "reverse": true}
    }
  ]
 }
 ```
 Key flip-strategy notes:
 - Gate each rule on `flat OR opposite` (using `any_of`) so it fires both on initial entry and on flip
 - `reverse: true` handles the flip math automatically — no need to size for `position_qty + new_qty`
 - This pattern works best for trend-following where you want continuous market exposure
 - Still add a time-based or ATR stop if you want a safety exit when the trend stalls
 ### Example 7 — Futures triple-Supertrend consensus flip
 Multiple Supertrend instances with different period/multiplier combos act as a tiered
 signal. `any_of` fires on the FIRST flip — the fastest line (7/1.5) reacts quickly,
 the slowest (20/3.0) confirms strong trends. `reverse: true` makes it always-in-market:
 the opposite signal is the stop-loss. No explicit stop or time exit needed.
 Varying parameters to tune:
 - Tighter multipliers (1.0–2.0) → more signals, more whipsaws
 - Looser multipliers (2.5–4.0) → fewer signals, longer holds
 - Try `all_of` instead of `any_of` to require consensus across all three (stronger filter)
 ```json
 {
  "type": "rule_based",
  "candle_interval": "4h",
  "rules": [
    {
      "comment": "LONG (or flip short→long): any Supertrend flips bullish",
      "when": {
        "kind": "all_of",
        "conditions": [
          {"kind": "any_of", "conditions": [
            {"kind": "position", "state": "flat"},
            {"kind": "position", "state": "short"}
          ]},
          {
            "kind": "any_of",
            "conditions": [
              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 7,  "multiplier": "1.5"}},
              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 10, "multiplier": "2.0"}},
              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 20, "multiplier": "3.0"}}
            ]
          }
        ]
      },
      "then": {"side": "buy", "quantity": {"method": "percent_of_balance", "percent": "5", "asset": "usdc"}, "reverse": true}
    },
    {
      "comment": "SHORT (or flip long→short): any Supertrend flips bearish",
      "when": {
        "kind": "all_of",
        "conditions": [
          {"kind": "any_of", "conditions": [
            {"kind": "position", "state": "flat"},
            {"kind": "position", "state": "long"}
          ]},
          {
            "kind": "any_of",
            "conditions": [
              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 7,  "multiplier": "1.5"}},
              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 10, "multiplier": "2.0"}},
              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 20, "multiplier": "3.0"}}
            ]
          }
        ]
      },
      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "5", "asset": "usdc"}, "reverse": true}
    }
  ]
 }
 ```
 Key Supertrend-specific notes:
 - `supertrend` ignores `field` — it uses OHLC internally; omit the `field` param
 - `multiplier` controls band width: lower = tighter, more reactive; higher = wider, more stable
 - `any_of` → first flip triggers (responsive); `all_of` → all three must agree (conservative)
 - Gate on position state to prevent re-entries scaling into an existing position"##;
 /// Build the user message for the first iteration (no prior results).
-pub fn initial_prompt(instruments: &[String], candle_intervals: &[String]) -> String {
+/// `prior_summary` contains a formatted summary of results from previous runs, if any.
+pub fn initial_prompt(instruments: &[String], candle_intervals: &[String], prior_summary: Option<&str>, has_futures: bool) -> String {
    let prior_section = match prior_summary {
        Some(s) => format!("{s}\n\n"),
        None => String::new(),
    };
    let starting_instruction = if prior_summary.is_some() {
        "Based on the prior results above:\n\
 - A strategy is \"promising\" if avg_sharpe >= 0.5 AND it traded >= 10 times per instrument. \
 If the best prior strategy meets both thresholds, refine it (tighten entry conditions, \
 adjust the exit, or tune the interval).\n\
 - If no prior strategy reaches avg_sharpe >= 0.5, do NOT repeat the same indicator family. \
 Scan the best-strategies list: if they all use the same core indicator (e.g. all use \
 Bollinger Bands, or all use EMA crossovers, or all use RSI threshold), your FIRST strategy \
 MUST use a completely different indicator family, for example: MACD crossover, ATR \
 breakout, volume spike, Donchian channel breakout, or stochastic oscillator (compose \
 MACD and stochastic from supported primitives; there is no native `macd` or `stoch`). \
 Only after that novelty attempt may you refine prior work.\n\
 - Never repeat an approach that produced 0 trades or fewer than 5 trades per instrument."
    } else {
        "Start with a multi-timeframe trend-following approach with proper risk management \
 (stop-loss, time exit, and ATR-based position sizing)."
    };
    let market_type = if has_futures { "futures" } else { "spot" };
    format!(
-        r#"Design a trading strategy for crypto spot markets.
+        r#"{prior_section}Design a trading strategy for crypto {market_type} markets.
 Available instruments: {}
 Available candle intervals: {}
-Start with a multi-timeframe trend-following approach with proper risk management
+{starting_instruction} Use "usdc" as the quote asset.
-(stop-loss, time exit, and ATR-based position sizing). Use "usdc" as the quote asset.
 Respond with ONLY the strategy JSON."#,
        instruments.join(", "),
--- a/src/swym.rs
+++ b/src/swym.rs
@@ -4,6 +4,21 @@ use serde::{Deserialize, Serialize};
 use serde_json::Value;
 use uuid::Uuid;
 /// Response from `POST /api/v1/strategies/validate`.
 #[derive(Debug, Deserialize)]
 pub struct ValidationResponse {
    pub valid: bool,
    #[serde(default)]
    pub errors: Vec<ValidationError>,
 }
 #[derive(Debug, Deserialize, Clone)]
 pub struct ValidationError {
    /// Dotted JSON path to the offending field. Absent for top-level structural errors.
    pub path: Option<String>,
    pub message: String,
 }
 /// Client for the swym backtesting API.
 pub struct SwymClient {
    client: Client,
@@ -30,6 +45,39 @@ pub struct CandleCoverage {
    pub first_open: String,
    pub last_close: String,
    pub count: u64,
    pub expected_count: Option<u64>,
    pub coverage_pct: Option<f64>,
 }
 /// Response from `GET /api/v1/paper-runs/compare?ids=...`.
 #[derive(Debug, Deserialize)]
 pub struct RunMetricsSummary {
    pub id: Uuid,
    pub status: String,
    pub candle_interval: Option<String>,
    pub total_positions: Option<u32>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub win_rate: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub profit_factor: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub net_pnl: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub sharpe_ratio: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub sortino_ratio: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub calmar_ratio: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub max_drawdown: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub pnl_return: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub avg_win: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub avg_loss: Option<f64>,
    #[serde(default, deserialize_with = "deserialize_opt_number")]
    pub avg_hold_duration_secs: Option<f64>,
 }
 #[derive(Debug, Clone, Serialize, Deserialize)]
@@ -45,6 +93,15 @@ pub struct BacktestResult {
    pub total_pnl: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
    pub sortino_ratio: Option<f64>,
    pub calmar_ratio: Option<f64>,
    pub max_drawdown: Option<f64>,
    pub pnl_return: Option<f64>,
    pub avg_win: Option<f64>,
    pub avg_loss: Option<f64>,
    pub max_win: Option<f64>,
    pub max_loss: Option<f64>,
    pub avg_hold_duration_secs: Option<f64>,
    pub total_fees: Option<f64>,
    pub avg_bars_in_trade: Option<f64>,
    pub error_message: Option<String>,
@@ -52,16 +109,10 @@ pub struct BacktestResult {
 }
 impl BacktestResult {
-    /// Parse a backtest response.
+    /// Parse a backtest response using the flat summary fields added in swym patch 8fb410311.
    ///
    /// `exchange`, `base`, `quote` are needed to derive the instrument key used
    /// in the `result_summary.instruments` map (e.g. `binancespot-eth_usdc`).
    pub fn from_response(
        resp: &PaperRunResponse,
        instrument: &str,
        exchange: &str,
        base: &str,
        quote: &str,
    ) -> Self {
        let summary = resp.result_summary.as_ref();
        if let Some(s) = summary {
@@ -70,28 +121,47 @@ impl BacktestResult {
            tracing::debug!("[{instrument}] result_summary: null");
        }
-        // The API key for per-instrument stats: "binance_spot" + "eth" + "usdc" → "binancespot-eth_usdc"
+        let total_positions = summary.and_then(|s| s["total_positions"].as_u64().map(|v| v as u32));
-        let inst_key = format!("{}-{}_{}", exchange.replace('_', ""), base, quote);
+        let winning_positions = summary.and_then(|s| s["winning_positions"].as_u64().map(|v| v as u32));
-
+        let losing_positions = summary.and_then(|s| s["losing_positions"].as_u64().map(|v| v as u32));
-        let total_positions = summary.and_then(|s| {
+        let win_rate = summary.and_then(|s| parse_number(&s["win_rate"]));
-            s["backtest_metadata"]["position_count"].as_u64().map(|v| v as u32)
+        let profit_factor = summary.and_then(|s| parse_number(&s["profit_factor"]));
-        });
+        let net_pnl = summary.and_then(|s| parse_number(&s["net_pnl"]));
-
+        let total_pnl = summary.and_then(|s| parse_number(&s["total_pnl"]));
-        let inst_stats = summary.and_then(|s| s["instruments"].get(&inst_key));
+        let sharpe_ratio = summary.and_then(|s| parse_number(&s["sharpe_ratio"]));
        let sortino_ratio = summary.and_then(|s| parse_number(&s["sortino_ratio"]));
        let calmar_ratio = summary.and_then(|s| parse_number(&s["calmar_ratio"]));
        let max_drawdown = summary.and_then(|s| parse_number(&s["max_drawdown"]));
        let pnl_return = summary.and_then(|s| parse_number(&s["pnl_return"]));
        let avg_win = summary.and_then(|s| parse_number(&s["avg_win"]));
        let avg_loss = summary.and_then(|s| parse_number(&s["avg_loss"]));
        let max_win = summary.and_then(|s| parse_number(&s["max_win"]));
        let max_loss = summary.and_then(|s| parse_number(&s["max_loss"]));
        let avg_hold_duration_secs = summary.and_then(|s| parse_number(&s["avg_hold_duration_secs"]));
        let total_fees = summary.and_then(|s| parse_number(&s["total_fees"]));
        Self {
            run_id: resp.id,
            instrument: instrument.to_string(),
            status: resp.status.clone(),
            total_positions,
-            winning_positions: None,
+            winning_positions,
-            losing_positions: None,
+            losing_positions,
-            win_rate: inst_stats.and_then(|s| parse_ratio_value(&s["win_rate"])),
+            win_rate,
-            profit_factor: inst_stats.and_then(|s| parse_ratio_value(&s["profit_factor"])),
+            profit_factor,
-            total_pnl: inst_stats.and_then(|s| parse_decimal_str(&s["pnl"])),
+            total_pnl,
-            net_pnl: inst_stats.and_then(|s| parse_decimal_str(&s["pnl"])),
+            net_pnl,
-            sharpe_ratio: inst_stats.and_then(|s| parse_ratio_value(&s["sharpe_ratio"])),
+            sharpe_ratio,
-            total_fees: None,
+            sortino_ratio,
            calmar_ratio,
            max_drawdown,
            pnl_return,
            avg_win,
            avg_loss,
            max_win,
            max_loss,
            avg_hold_duration_secs,
            total_fees,
            avg_bars_in_trade: None,
            error_message: resp.error_message.clone(),
            condition_audit_summary: summary.and_then(|s| s.get("condition_audit_summary").cloned()),
@@ -116,6 +186,12 @@ impl BacktestResult {
            self.net_pnl.unwrap_or(0.0),
            self.sharpe_ratio.unwrap_or(0.0),
        );
        if let Some(sortino) = self.sortino_ratio {
            s.push_str(&format!(" sortino={:.2}", sortino));
        }
        if let Some(dd) = self.max_drawdown {
            s.push_str(&format!(" max_dd={:.1}%", dd * 100.0));
        }
        if self.total_positions.unwrap_or(0) == 0 {
            if let Some(audit) = &self.condition_audit_summary {
                let audit_str = format_audit_summary(audit);
@@ -129,27 +205,32 @@ impl BacktestResult {
    }
    /// Is this result promising enough to warrant out-of-sample validation?
    /// Requires a completed run, at least `min_trades` positions, and net_pnl > 0.
    /// The `min_sharpe` threshold is applied only when a Sharpe ratio is present.
    pub fn is_promising(&self, min_sharpe: f64, min_trades: u32) -> bool {
-        self.status == "complete"
+        if self.status != "complete" { return false; }
-            && self.sharpe_ratio.unwrap_or(0.0) > min_sharpe
+        if self.total_positions.unwrap_or(0) < min_trades { return false; }
-            && self.total_positions.unwrap_or(0) >= min_trades
+        if self.net_pnl.unwrap_or(0.0) <= 0.0 { return false; }
-            && self.net_pnl.unwrap_or(0.0) > 0.0
+        match self.sharpe_ratio {
            Some(sr) => sr > min_sharpe,
            None => true, // sharpe absent; positive net_pnl plus the trade-count check is sufficient signal
        }
    }
 }
-/// Parse a `{"interval": null, "value": "123.45"}` ratio wrapper.
+/// Parse a numeric JSON value — accepts either a plain JSON number or a decimal string.
-/// Returns `None` for null, missing, or sentinel values (Decimal::MAX ≈ 7.9e28).
+/// Returns `None` for null, missing, or sentinel values (>1e20 in magnitude).
-fn parse_ratio_value(v: &Value) -> Option<f64> {
+fn parse_number(v: &Value) -> Option<f64> {
-    let s = v.get("value")?.as_str()?;
+    let f = v.as_f64().or_else(|| v.as_str()?.parse().ok())?;
-    let f: f64 = s.parse().ok()?;
    if f.abs() > 1e20 { None } else { Some(f) }
 }
-/// Parse a plain decimal string JSON value.
+/// Serde deserializer for `Option<f64>` that accepts both JSON numbers and decimal strings.
-/// Returns `None` for null, missing, or sentinel values.
+fn deserialize_opt_number<'de, D>(deserializer: D) -> Result<Option<f64>, D::Error>
-fn parse_decimal_str(v: &Value) -> Option<f64> {
+where
-    let f: f64 = v.as_str()?.parse().ok()?;
+    D: serde::Deserializer<'de>,
-    if f.abs() > 1e20 { None } else { Some(f) }
+{
    let v = Value::deserialize(deserializer)?;
    Ok(parse_number(&v))
 }
 /// Render a condition_audit_summary Value into a compact one-line string.
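The hunk above changes two behaviours that are easy to check in isolation: numeric fields are accepted as either JSON numbers or decimal strings (with sentinel-sized values dropped), and `is_promising` now applies the Sharpe threshold only when a Sharpe ratio exists. The following is a minimal stdlib-only sketch of that logic, not the code from this change. `Field` and `MiniResult` are hypothetical stand-ins for `serde_json::Value` and `BacktestResult`, introduced here only so the example compiles without dependencies.

```rust
/// Hypothetical stand-in for the JSON shapes a summary field can take.
/// The code in the diff works on `serde_json::Value` instead.
#[derive(Debug, Clone, PartialEq)]
pub enum Field {
    Null,
    Number(f64),
    Text(String),
}

/// Accept a plain number or a decimal string; return `None` for null,
/// unparsable text, or sentinel-sized magnitudes (> 1e20).
pub fn parse_number(v: &Field) -> Option<f64> {
    let f = match v {
        Field::Number(n) => *n,
        Field::Text(s) => s.parse::<f64>().ok()?,
        Field::Null => return None,
    };
    if f.abs() > 1e20 { None } else { Some(f) }
}

/// Reduced, illustrative result type holding only the fields the gate reads.
pub struct MiniResult {
    pub status: String,
    pub total_positions: Option<u32>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
}

impl MiniResult {
    /// Same early-return order as the diff: status, trade count, net PnL,
    /// then the Sharpe threshold only when a Sharpe ratio is present.
    pub fn is_promising(&self, min_sharpe: f64, min_trades: u32) -> bool {
        if self.status != "complete" { return false; }
        if self.total_positions.unwrap_or(0) < min_trades { return false; }
        if self.net_pnl.unwrap_or(0.0) <= 0.0 { return false; }
        match self.sharpe_ratio {
            Some(sr) => sr > min_sharpe,
            None => true,
        }
    }
}
```

With this shape, a run with 12 positions, positive net PnL, and no Sharpe value passes `is_promising(0.5, 10)`, while the same run with a Sharpe of 0.4 does not.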
@@ -254,6 +335,32 @@ impl SwymClient {
        resp.json().await.context("parse candle coverage")
    }
    /// Validate a strategy against the swym DSL schema.
    ///
    /// Calls `POST /api/v1/strategies/validate` and returns a structured list
    /// of all validation errors. Returns `Ok(vec![])` when the strategy is valid.
    /// Returns `Err` only on network or parse failures, not on DSL errors.
    pub async fn validate_strategy(&self, strategy: &Value) -> Result<Vec<ValidationError>> {
        let url = format!("{}/strategies/validate", self.base_url);
        let resp = self
            .client
            .post(&url)
            .json(strategy)
            .send()
            .await
            .context("validate strategy request")?;
        if !resp.status().is_success() {
            let status = resp.status();
            let body = resp.text().await.unwrap_or_default();
            anyhow::bail!("validate strategy {status}: {body}");
        }
        let parsed: ValidationResponse =
            resp.json().await.context("parse validation response")?;
        Ok(parsed.errors)
    }
    /// Submit a backtest run.
    pub async fn submit_backtest(
        &self,
@@ -261,6 +368,7 @@ impl SwymClient {
        instrument_symbol: &str,
        base_asset: &str,
        quote_asset: &str,
        market_kind: &str,
        strategy: &Value,
        starts_at: &str,
        finishes_at: &str,
@@ -278,7 +386,7 @@ impl SwymClient {
                    "name_exchange": instrument_symbol,
                    "underlying": { "base": base_asset, "quote": quote_asset },
                    "quote": "underlying_quote",
-                    "kind": "spot"
+                    "kind": market_kind
                },
                "execution": {
                    "mocked_exchange": instrument_exchange,
@@ -352,6 +460,25 @@ impl SwymClient {
        }
    }
    /// Fetch metrics for multiple completed runs via the compare endpoint.
    /// Batches requests in groups of 50 (API maximum).
    pub async fn compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>> {
        let mut results = Vec::new();
        for chunk in run_ids.chunks(50) {
            let ids = chunk.iter().map(|id| id.to_string()).collect::<Vec<_>>().join(",");
            let url = format!("{}/paper-runs/compare?ids={}", self.base_url, ids);
            let resp = self.client.get(&url).send().await.context("compare runs request")?;
            if !resp.status().is_success() {
                let status = resp.status();
                let body = resp.text().await.unwrap_or_default();
                anyhow::bail!("compare runs {status}: {body}");
            }
            let mut batch: Vec<RunMetricsSummary> = resp.json().await.context("parse compare response")?;
            results.append(&mut batch);
        }
        Ok(results)
    }
    /// Fetch condition audit summary for a completed run.
    pub async fn condition_audit(&self, run_id: Uuid) -> Result<Value> {
        let url = format!("{}/paper-runs/{}/condition-audit", self.base_url, run_id);
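The batching in `compare_runs` can be looked at separately from the HTTP calls. The sketch below only builds the request URLs, one per group of ids, and is an illustration rather than the client code. The function name `compare_urls`, the use of plain `String` ids in place of `Uuid`, and the `batch_size` parameter are assumptions made for the example; the diff hardcodes a batch size of 50.

```rust
/// Build one compare URL per batch of at most `batch_size` run ids.
/// `batch_size` must be non-zero, because `slice::chunks` panics on 0.
/// String ids are a placeholder; the diff formats `Uuid` values instead.
pub fn compare_urls(base_url: &str, run_ids: &[String], batch_size: usize) -> Vec<String> {
    run_ids
        .chunks(batch_size)
        .map(|chunk| format!("{}/paper-runs/compare?ids={}", base_url, chunk.join(",")))
        .collect()
}
```

For 120 ids and a batch size of 50 this yields three URLs carrying 50, 50, and 20 ids; an empty id list yields no URLs, so no request would be sent.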
Author	SHA1	Message	Date
rob thijssen	11fe79ed25	docs: add CLAUDE.md for future Claude Code instances Add comprehensive guidance document covering architecture, data flows, development commands, DSL schema reference, and common patterns for working with the scout strategy search agent. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-03-12 05:38:28 +02:00
rob thijssen	fcb9a2f553	chore: attempt dedupe guidance in prompt	2026-03-11 18:15:24 +02:00
rob thijssen	75c95f7935	feat: add triple-Supertrend consensus flip as strategy family 7 Adds awareness of the multi-Supertrend any_of flip pattern (based on the reference strategy at swym/assets/reference/supertrend-triple.json, itself a DSL port of the popular TradingView triple-Supertrend script). - prompts.rs: add strategy family 7 (Supertrend consensus flip) with guidance on any_of vs all_of, period/multiplier tuning, and the always-in-market / reverse-as-stop-loss trade-off - prompts.rs: add risk management exception for always-in-market flip strategies (reverse: true means the opposite signal is the stop) - prompts.rs: add Example 7 — correctly gated 2-rule triple-Supertrend flip with position state guards to prevent unintended scale-ins Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:40:15 +02:00
rob thijssen	6601da21cc	feat: add reverse flag and symmetric short support to DSL Update scout's schema and system prompt to reflect two upstream swym changes from 2026-03-10: - b535207: symmetric short quantity fix — buy-to-cover now correctly uses position_qty (executor was broken; scout's DSL patterns were already correct and will now work as intended) - 6f58949: reverse flag on Action — new optional "reverse": true field that submits position_qty + configured_qty when an opposite position is open, closing it and opening a new one in the opposite direction in a single order (flip-through-zero) Changes: - dsl-schema.json: add "reverse" boolean to Action definition - prompts.rs: add "Reverse / flip-through-zero" capability section and Example 6 (2-rule EMA flip strategy) to FUTURES_SHORT_EXAMPLES Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:28:54 +02:00
rob thijssen	8de3ae5fe1	Add Binance Futures support (long and short) - config.rs: add Instrument::market_kind() mapping exchange name to "spot"/"futures_um"/"futures_cm", and is_futures() helper - swym.rs: submit_backtest() accepts market_kind param; passes it as instrument.kind in the RunConfig instead of hardcoding "spot" - agent.rs: derive has_futures from instruments; pass to both system_prompt() and initial_prompt() - prompts.rs: - system_prompt() accepts has_futures; injects FUTURES_SHORT_EXAMPLES (Example 5: EMA trend-following short with ATR stop) when true - Rewrite position-state anti-patterns to cover both spot (long-only) and futures (long + short) semantics - initial_prompt() accepts has_futures; labels market as "spot" or "futures" and passes flag through to starting instruction context Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:13:06 +02:00
rob thijssen	a435d3a99d	Define concrete 'promising' threshold and enforce indicator diversity in ledger-informed prompt - Replace vague "promising metrics" with avg_sharpe >= 0.5 AND >= 10 trades per instrument - Add indicator-family diversity rule: if all prior strategies share the same core indicator (e.g. all Bollinger Bands), the first strategy of the new run must use a different family - Give explicit examples of alternative families: MACD, ATR breakout, volume spike, donchian channel breakout, stochastic oscillator - Extend the no-repeat ban to strategies with fewer than 5 trades per instrument Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 14:21:55 +02:00
rob thijssen	b476199de8	Fix ledger context being overridden by prescriptive initial prompt The 13:20:03 run showed the ledger context was counterproductive: the initial prompt's "Start with a multi-timeframe trend-following approach" instruction caused the model to ignore the prior summary and repeat EMA50-based strategies that produced 0 trades across all 15 iterations. Two fixes: - When prior_summary is present, replace the prescriptive starting instruction with one that explicitly defers to the ledger: refine the best prior strategy or try a different approach if all prior results were poor. Prevents the fixed instruction from overriding the context. - Cap ledger entries per unique strategy at 3. A strategy repeated across 11 iterations would contribute 33 entries, drowning out other approaches in the prior summary. 3 entries (one per instrument) is sufficient. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:54:35 +02:00
rob thijssen	d76d3b9061	Use write_all for ledger entries to improve concurrent-write safety writeln!(f, ...) makes two syscalls (data + newline) which can interleave between concurrent processes even with O_APPEND. Serialise entry to bytes and append the newline before write_all() so the entire entry lands in a single write() syscall, which O_APPEND makes atomic on Linux local filesystems for typical entry sizes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:12:38 +02:00
rob thijssen	0945c94cc8	Add --ledger-file arg for explicit ledger path control Defaults to <output_dir>/run_ledger.jsonl as before. Pass --ledger-file to read from (and write to) a specific ledger, enabling multiple ledger files to seed different search campaigns or merge results from separate runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:10:22 +02:00
rob thijssen	a0316be798	Add cross-run learning via run ledger and compare endpoint Persist strategy + run_id to results/run_ledger.jsonl after each backtest. On startup, load the ledger, fetch metrics via the new compare endpoint (batched in groups of 50), group by strategy, rank by avg Sharpe, and inject a summary of the top 5 and worst 3 prior strategies into the iteration-1 prompt. Also consumes the enriched result_summary fields from swym patch e47c18: sortino_ratio, calmar_ratio, max_drawdown, pnl_return, avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs. Sortino and max_drawdown are appended to summary_line() when present. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:05:39 +02:00
rob thijssen	609d64587b	docs: cross-run learnings plan	2026-03-10 13:04:13 +02:00
rob thijssen	6692bdb490	Prompt: fix method vs kind confusion causing 11/15 validation failures The 12:11:39 run shows the model using {"method":"position_quantity"} for every sell rule despite the existing CRITICAL note. Root cause: a contradictory anti-pattern ("Never use an expression object for quantity") was fighting the correct guidance, and the method/kind distinction wasn't emphatic enough. - Expand the CRITICAL note to explicitly contrast: buy uses SizingMethod ("method"), sell uses Expr ("kind") — they are different object types. - Remove the contradictory "never use an expression object" anti-pattern which conflicted with position_quantity and SizingMethod objects. - Add a final anti-pattern bullet as a second reminder of the same mistake. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:24:57 +02:00
rob thijssen	36689e3fbb	Prompt: fix field+offset kind omission and add interval guidance Two gaps revealed by the 2026-03-10T11:42:49 run: - Iterations 11-15 all failed with "missing field 'kind'" when the model wrote {"field":"volume","offset":-1} without the required "kind":"field". Expand the existing kind-required note with explicit offset examples. - Iteration 10 switched to 15m unprompted and got sharpe=-0.41 from overtrading. Add anti-pattern note: don't change interval when sharpe is negative — fix the signal logic instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:09:18 +02:00
rob thijssen	87d31f8d7e	Use flat result_summary fields from swym patch 8fb410311 BacktestResult::from_response now reads total_positions, winning_positions, losing_positions, win_rate, profit_factor, net_pnl, total_pnl, sharpe_ratio, and total_fees directly from the top-level result_summary object instead of deriving them from backtest_metadata + balance delta. Removes the quote/initial_balance parameters that were only needed for the workaround. Restores the full summary_line format with all metrics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 11:41:53 +02:00
rob thijssen	3892ab37c1	fix: parse actual result_summary structure (backtest_metadata + assets) The API doc described a flat result_summary that doesn't exist yet in the deployed backend. The actual shape is: { backtest_metadata: { position_count }, assets: [...], condition_audit_summary } - total_positions from backtest_metadata.position_count - net_pnl from assets[quote].tear_sheet.balance_end.total - initial_balance - win_rate, profit_factor, sharpe_ratio, total_fees, avg_bars_in_trade remain None until the API adds them from_response() takes quote and initial_balance again to locate the right asset and compute PnL. summary_line() only prints metrics that are actually present. is_promising() falls back to net_pnl>0 + trades when sharpe is unavailable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 10:32:13 +02:00
rob thijssen	85896752f2	fix: ValidationError.path optional, correct position_quantity usage in prompts - ValidationError.path is Option<String> — the API omits it for top-level structural errors. The required String was causing every validate call to fail to deserialize, falling through to submission instead of catching errors. - Log path as "(top-level)" when absent - Prompts: add explicit CRITICAL note that {"method":"position_quantity"} is wrong — position_quantity is an Expr (uses "kind") not a SizingMethod (uses "method"). The new SizingMethod examples caused the model to over-apply "method" to exits universally across the entire run. - Prompts: note that fixed_sum has no multiplier field (additionalProperties) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:45:17 +02:00
rob thijssen	ee260ea4d5	fix: parse flat result_summary structure per updated API doc The API result_summary is a flat object with top-level fields (total_positions, win_rate, profit_factor, net_pnl, sharpe_ratio, etc.) not a nested backtest_metadata/instruments map. This was causing all metrics to parse as None/zero for every completed run. - Rewrite BacktestResult::from_response() to read flat fields directly - Replace parse_ratio_value/parse_decimal_str with a single parse_number() that accepts both JSON numbers and decimal strings - Populate winning_positions, losing_positions, total_fees, avg_bars_in_trade (previously always None) - Simplify from_response signature — exchange/base/quote no longer needed - Add expected_count and coverage_pct to CandleCoverage struct - Update all example sell rules to use position_quantity instead of "0.01" - Note that "9999" is a valid sell-all alias (auto-capped by the API) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:37:55 +02:00
rob thijssen	3f8d4de7fb	feat: add declarative SizingMethod types from upstream schema Upstream added three new quantity sizing objects alongside DecimalString and Expr: - fixed_sum: buy N quote-currency worth at current price - percent_of_balance: buy N% of named asset's free balance - fixed_units: buy exactly N base units (semantic alias for decimal string) Update dsl-schema.json to include the three definitions and expand Action.quantity.oneOf to reference all five valid forms. Update prompts.rs Quantity section to present the declarative methods as the preferred approach — they're cleaner, more readable, and instrument-agnostic compared to raw Expr composition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:33:43 +02:00
rob thijssen	7e1ff51ae0	feat: validate endpoint integration, Expr quantity sizing, apply_func input field fix - Add /api/v1/strategies/validate client to SwymClient; wire into agent loop before submission so all DSL errors are surfaced in one round-trip - Update dsl-schema.json to upstream: quantity is now oneOf[DecimalString, Expr], ExprApplyFunc uses "input" field (renamed from "expr") - Update prompts: document expression-based quantity sizing (fixed-fraction and ATR-based examples), fix apply_func to use "input" not "expr" throughout - Remove unused ValidationError import Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:12:12 +02:00
rob thijssen	5146b3f764	fix: replace negligible 0.001 quantity with meaningful sizing guidance The previous example quantity "0.001" represented <1% of the $10k initial balance for BTC and near-zero exposure for ETH/SOL, making P&L and Sharpe results statistically meaningless. - Update Quantity section with instrument-appropriate reference values (BTC: 0.01 ≈ $800, ETH: 3.0 ≈ $600, SOL: 50.0 ≈ $700) - Replace "0.001" with "0.01" in all four working examples - Explain that 5–10% of $10k initial balance is the sizing target - Explicitly warn against "0.001" as it produces negligible exposure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 07:41:28 +02:00
rob thijssen	759439313e	fix: two Bollinger Band DSL errors from 50-iteration run - bollinger_upper/lower func Exprs must NOT include a "field" parameter; they compute from close internally. Setting "field":"bollinger_upper" causes API rejection: expected one of open/high/low/close/volume. - bollinger Condition "band" only accepts "above_upper" or "below_lower"; "above_lower" and "below_upper" are invalid variants. Both errors appeared repeatedly across the 50-iteration run, causing failed backtest submissions on every Bollinger crossover strategy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 07:39:09 +02:00
rob thijssen	9a7761b452	fix: add hma/ma to unsupported list, clarify quantity exit semantics - Add `hma` (Hull MA) and generic `ma` to unsupported func names — both were used by R1 and rejected by the API - Note that Hull MA can be approximated via apply_func with wma - Add `"all"` to the quantity placeholder blacklist; explain that exit rules must repeat the entry decimal — there is no "close all" concept Observed in run 2026-03-09T20:10:55: 2 iterations failed on hma/ma, 3 iterations skipped by client-side validation on quantity="all". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 20:23:30 +02:00
rob thijssen	8d53d6383d	fix: correct DSL mistakes from observed R1 failures - ADX: clarify it is a FuncName inside {"kind":"func","name":"adx",...}, not a Condition kind — with inline usage example (ADX > 25 filter) - Expr "kind" field: add explicit note that every Expr object requires "kind"; {"field":"close"} without "kind" is rejected by the API - MACD: add Example 4 showing full crossover strategy composed from bin_op(sub, ema12, ema26) and apply_func(ema,9) as signal line All three mistakes were observed across consecutive R1-32B runs and caused repeated API submission failures. Each prompt addition follows the same pattern as the successful bollinger_upper fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 20:11:05 +02:00
rob thijssen	55e41b6795	fix: log R1 thinking, catch repeated DSL errors, add unsupported indicators Three improvements from the 2026-03-09T18:45:04 run analysis: R1 thinking visibility (claude.rs, agent.rs) extract_think_content() returns the raw <think> block content before it is stripped. agent.rs logs it at DEBUG level so 'RUST_LOG=debug' lets you see why the model keeps repeating a mistake — currently the think block is silently discarded after stripping. Prompt: unsupported indicators and bollinger_upper Expr mistake (prompts.rs) - bollinger_upper / bollinger_lower used as {"kind":"bollinger_upper",...} was the dominant failure in iters 9-15. Added explicit correction: use {"kind":"func","name":"bollinger_upper","period":20} in Expr context, never as a standalone kind. - roc, hma, vwap, macd, cci, stoch are NOT in the swym schema. Added a clear "NOT supported" list alongside the supported func names. Repeated API error detection in diagnose_history (agent.rs) If the same "unknown variant `X`" error appears 2+ times in the last 4 iterations, a targeted diagnosis note is emitted naming the bad variant and pointing to the DSL reference. This surfaces in the next iteration prompt so the model gets actionable feedback before it wastes another backtest budget on the same mistake. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:58:50 +02:00
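Commit d76d3b9061 in the list above describes appending each ledger entry with one `write_all` call instead of `writeln!`, so that the data and its newline are not issued as separate writes. A minimal sketch of that pattern, assuming the entry is already serialised to a JSON string; the function name `append_ledger_line` is hypothetical and not taken from the repository.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one already-serialised entry as a single line.
/// The newline is added to the buffer first, so entry and terminator are
/// handed to the OS together rather than in two separate writes.
pub fn append_ledger_line(path: &Path, entry_json: &str) -> io::Result<()> {
    let mut buf = Vec::with_capacity(entry_json.len() + 1);
    buf.extend_from_slice(entry_json.as_bytes());
    buf.push(b'\n');

    let mut file = OpenOptions::new()
        .create(true)
        .append(true) // maps to O_APPEND on Unix
        .open(path)?;
    // `write_all` loops on partial writes; for small buffers on a local
    // filesystem it normally completes in one write() call.
    file.write_all(&buf)
}
```

As the commit message notes, the atomicity argument applies to Linux local filesystems and typical entry sizes; very large entries or network filesystems may still interleave.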