docs: add CLAUDE.md for future Claude Code instances

Add comprehensive guidance document covering architecture, data flows, development commands, DSL schema reference, and common patterns for working with the scout strategy search agent.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
chore: attempt dedupe guidance in prompt
2026-03-12 05:38:28 +02:00 · 2026-03-11 18:15:24 +02:00 · 2026-03-10 18:40:15 +02:00 · 2026-03-10 18:28:54 +02:00 · 2026-03-10 18:13:06 +02:00 · 2026-03-10 14:21:55 +02:00
7 changed files with 881 additions and 20 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,116 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+`scout` is an autonomous strategy search agent for the [swym](https://swym.rs) backtesting platform. It runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter overfitting.
+
+## Architecture
+
+### Core Modules
+
+- **`agent.rs`** - Main orchestration logic. Contains the `run()` function that implements the search loop, strategy validation, and learning feedback. Key types: `IterationRecord`, `LedgerEntry`; key functions: `validate_strategy()`, `diagnose_history()`, `load_prior_summary()`.
+- **`claude.rs`** - Claude API client. Handles model communication, JSON extraction from responses (including stripping `<think>` blocks from R1-family output), and context length detection.
+- **`swym.rs`** - Swym backtesting API client. Wraps all swym API calls: candle coverage, strategy validation, backtest submission, polling, and metrics retrieval.
+- **`prompts.rs`** - System and user prompts for the LLM. Generates the DSL schema context and iteration-specific prompts with prior results.
+- **`config.rs`** - CLI argument parsing and configuration. Defines `Cli` struct with all command-line flags and environment variables.
+
+### Key Data Flows
+
+1. **Strategy Generation**: `agent::run()` → `claude::chat()` → extracts JSON strategy → validates → submits to swym
+2. **Backtest Execution**: `swym::submit_backtest()` → `swym::poll_until_done()` → `BacktestResult::from_response()`
+3. **Learning Loop**: `load_prior_summary()` reads the run ledger (`run_ledger.jsonl` by default) → fetches metrics via `swym::compare_runs()` → formats a compact summary → injects it into the iteration-1 prompt via `prompts::initial_prompt()`
+4. **OOS Validation**: Promising in-sample results trigger re-backtest on held-out data → strategies passing both phases saved to `validated_*.json`
+
+### Important Patterns
+
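+- **Deduplication**: Within a run, each strategy is serialized to JSON and looked up in a `HashMap` (`tested_strategies`) that maps the serialized string to the first iteration that tested it. An identical strategy skips the backtest with a warning, and the iteration is recorded with a `DUPLICATE` validation note telling the model to design something different.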
+- **Validation**: Two stages: client-side (structure, quantity parsing, exit rules) and server-side (DSL schema validation via `/strategies/validate`).
+- **Context Management**: Conversation history is trimmed to keep last 6 messages (3 exchanges) to avoid token limits. Prior results are summarized in the next prompt.
+- **Error Recovery**: Three consecutive failures abort the run. Transient API errors are logged but don't stop it.
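+- **Ledger Persistence**: Each completed backtest appends a `LedgerEntry` to the run ledger (`run_ledger.jsonl` by default) for cross-run learning. Each entry, including its trailing newline, is written with one `write_all()` on a file opened in append mode (`O_APPEND`), which on Linux local filesystems keeps concurrent appends of typical entry size from interleaving.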
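The deduplication and context-trimming patterns above can be sketched with the standard library alone. This is a hedged illustration, not the code in `agent.rs`: `check_duplicate`, `DedupOutcome`, and `trim_conversation` are hypothetical names, and the strategy is assumed to be serialized to a string already (the real code uses `serde_json::to_string`).

```rust
use std::collections::HashMap;

/// Result of the per-iteration duplicate check.
#[derive(Debug, PartialEq)]
enum DedupOutcome {
    /// First time this exact JSON has been seen: run the backtest.
    New,
    /// Identical JSON was already tested in `first_iteration`: skip it.
    Duplicate { first_iteration: u32 },
}

/// Look up the serialized strategy; record it if new, otherwise report the
/// iteration that first tested it (mirrors the `tested_strategies` map).
fn check_duplicate(
    tested: &mut HashMap<String, u32>,
    strategy_json: &str,
    iteration: u32,
) -> DedupOutcome {
    if let Some(&first_iteration) = tested.get(strategy_json) {
        return DedupOutcome::Duplicate { first_iteration };
    }
    tested.insert(strategy_json.to_string(), iteration);
    DedupOutcome::New
}

/// Keep only the most recent `keep` messages (6 = 3 user/assistant exchanges),
/// dropping the oldest ones first.
fn trim_conversation<T>(conversation: &mut Vec<T>, keep: usize) {
    if conversation.len() > keep {
        let excess = conversation.len() - keep;
        conversation.drain(..excess);
    }
}
```

Because serde_json's default `Map` is a `BTreeMap`, serializing a `Value` emits object keys in sorted order, so reordered keys or different whitespace in the model's output still collide unless the crate's `preserve_order` feature is enabled. Numerically equal values written differently, such as `10` and `10.0`, remain distinct keys.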
+
+## Development Commands
+
+```bash
+# Build
+cargo build
+
+# Run with default config
+cargo run
+
+# Run with custom flags
+cargo run -- \
+  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
+  --max-iterations 50 \
+  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC
+
+# Run tests
+cargo test
+
+# Run with debug logging
+RUST_LOG=debug cargo run
+```
+
+## DSL Schema
+
+Strategies are JSON objects with the schema defined in `src/dsl-schema.json` (embedded at compile time via `include_str!`). The DSL is rule-based: each rule pairs a `when` condition with a `then` action, and both entries and exits are expressed as rules. Key concepts:
+
+- **Indicators**: `{"kind":"indicator","name":"...","params":{...}}`
+- **Comparators**: `{"kind":"compare","lhs":"...","op":"...","rhs":"..."}`
+- **Functions**: `{"kind":"func","name":"...",...}` with named parameters, e.g. `{"kind":"func","name":"atr","period":14}`
+
+See `src/dsl-schema.json` for the complete schema and `prompts.rs::system_prompt()` for how it's presented to Claude.
+
+## Model Families
+
+The code supports several model families via the `ModelFamily` enum in `config.rs`; `prompts::system_prompt()` tailors its output instructions per family:
+
+- **Sonnet**: Standard model, no special handling
+- **Opus**: Larger context, higher cost
+- **R1** (`ModelFamily::DeepSeekR1`): Emits thinking blocks (`<think>...</think>`) that must be stripped before JSON extraction
+
+Context length is auto-detected from the server's `/api/v1/models` endpoint (LM Studio) or `/v1/models/{id}` (OpenAI-compatible). Output token budget is set to half the context window.
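The R1 handling and the output-budget rule above can be sketched in plain Rust. This is illustrative only: `strip_think_blocks` and `output_token_budget` are hypothetical names, and the real `claude.rs` may treat edge cases (such as an unterminated `<think>`) differently.

```rust
/// Remove every `<think>...</think>` span (R1-style reasoning) from a model
/// response before JSON extraction. An unterminated `<think>` drops the rest.
fn strip_think_blocks(response: &str) -> String {
    let mut out = String::with_capacity(response.len());
    let mut rest = response;
    while let Some(start) = rest.find("<think>") {
        // Keep everything before the opening tag.
        out.push_str(&rest[..start]);
        match rest[start..].find("</think>") {
            // Resume scanning just after the closing tag.
            Some(end_rel) => rest = &rest[start + end_rel + "</think>".len()..],
            // Unterminated block: nothing usable follows.
            None => return out,
        }
    }
    out.push_str(rest);
    out
}

/// Output token budget: half of the detected context window.
fn output_token_budget(context_length: u32) -> u32 {
    context_length / 2
}
```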
+
+## Output Files
+
+- `strategy_001.json` through `strategy_NNN.json` - Every strategy attempted (full JSON)
+- `validated_001.json` through `validated_NNN.json` - Strategies that passed OOS validation (includes in-sample + OOS metrics)
+- `best_strategy.json` - Strategy with highest average Sharpe across instruments
+- `run_ledger.jsonl` - Persistent record of all backtests for learning across runs (written to `<output_dir>` by default; override with `--ledger-file`)
+
+## Common Tasks
+
+### Adding a new CLI flag
+
+1. Add field to `Cli` struct in `config.rs`
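+2. Add a clap `#[arg(...)]` attribute, e.g. `#[arg(long)]` or `#[arg(long, default_value_t = 2)]`; add `env = "VAR_NAME"` to also read it from an environment variable (requires clap's `env` feature)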
+3. Use the flag in `agent::run()` via `cli.flag_name`
+
+### Extending the DSL
+
+1. Update `src/dsl-schema.json` with new expression kinds
+2. Add validation logic in `validate_strategy()` if needed
+3. Update prompts in `prompts.rs` to guide the model
+
+### Modifying the learning loop
+
+1. Edit `load_prior_summary()` in `agent.rs` to change how prior results are formatted
+2. Adjust `diagnose_history()` to add new diagnostics or change convergence detection
+3. Update `prompts.rs::iteration_prompt()` to incorporate new information
+
+### Adding new validation checks
+
+Add to `validate_strategy()` in `agent.rs`. Returns `(hard_errors, warnings)` where hard errors block submission and warnings are logged but allow the backtest to proceed.
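A hedged sketch of the `(hard_errors, warnings)` contract follows. `StrategySketch`, `RuleSketch`, and `validate_strategy_sketch` are hypothetical simplifications (the real function walks the strategy JSON), and the candle-interval coverage check is an assumed example rather than a check the codebase is known to perform. The exit-rule check and the missing-comment warning echo what this document says validation and prior-run summaries rely on.

```rust
/// Hypothetical, simplified view of one rule (the real code reads JSON).
struct RuleSketch {
    side: &'static str,
    comment: Option<&'static str>,
}

/// Hypothetical, simplified view of a strategy.
struct StrategySketch {
    candle_interval: &'static str,
    rules: Vec<RuleSketch>,
}

/// Returns `(hard_errors, warnings)`: hard errors block submission,
/// warnings are only logged and the backtest still runs.
fn validate_strategy_sketch(
    s: &StrategySketch,
    available_intervals: &[&str],
) -> (Vec<String>, Vec<String>) {
    let mut hard = Vec::new();
    let mut warnings = Vec::new();
    if s.rules.is_empty() {
        hard.push("strategy has no rules".to_string());
    }
    // Simplified exit check: treats any sell as an exit (futures shorts need more care).
    if !s.rules.iter().any(|r| r.side == "sell") {
        hard.push("no exit (sell) rule".to_string());
    }
    // Assumed check: the interval must have candle coverage.
    if !available_intervals.contains(&s.candle_interval) {
        hard.push(format!("candle_interval {} has no candle coverage", s.candle_interval));
    }
    // Prior-run summaries use the first rule's comment as the description.
    if s.rules.first().map_or(true, |r| r.comment.is_none()) {
        warnings.push("first rule has no comment; summaries will show \"(no description)\"".to_string());
    }
    (hard, warnings)
}
```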
+
+## Testing Strategy
+
+The codebase uses `anyhow` for error handling and `tracing` for logging. Key test areas:
+
+- Strategy JSON extraction from various response formats
+- Context length detection from LM Studio/OpenAI endpoints
+- Ledger entry serialization/deserialization
+- Backtest result parsing from swym API responses
+- Deduplication logic
+- Convergence detection in `diagnose_history()`
--- a/docs/plan/cross-run-learning.md
+++ b/docs/plan/cross-run-learning.md
@@ -0,0 +1,133 @@
+# Plan: Cross-run learning via run ledger and compare endpoint
+
+## Context
+
+Scout currently starts from scratch every run — no memory of prior iterations. The upstream
+patch `e47c18` adds:
+1. **Enriched `result_summary`**: sortino_ratio, calmar_ratio, max_drawdown, pnl_return,
+   avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
+2. **Compare endpoint**: `GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,...` returns
+   `RunMetricsSummary` for up to 50 runs in one call
+
+Goal: persist enough state across runs so that iteration 1 of a new run starts informed by
+all previous runs' strategies and outcomes.
+
+## Changes
+
+### 1. Run ledger — persist strategy + run_id per backtest (`src/agent.rs`)
+
+After each successful `run_single_backtest`, append a JSONL entry to `{output_dir}/run_ledger.jsonl`:
+
+```json
+{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
+```
+
+One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is
+duplicated across instrument entries for the same iteration — this keeps the format flat and
+self-contained.
+
+Use `OpenOptions::append(true).create(true)` — no locking needed since scout is single-threaded.
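A standard-library sketch of that append path follows. `append_jsonl_line` is a hypothetical helper that takes an already-serialized entry (the implementation serializes `LedgerEntry` with serde_json first); it builds the full line, newline included, before a single `write_all` on a file opened with `append(true).create(true)`.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Append one pre-serialized JSON object as a line of a JSONL ledger,
/// creating the file on first use.
fn append_jsonl_line(path: &Path, json: &str) -> std::io::Result<()> {
    // Build the whole line first so it goes out in one write_all() call.
    let mut bytes = Vec::with_capacity(json.len() + 1);
    bytes.extend_from_slice(json.as_bytes());
    bytes.push(b'\n');
    let mut file = OpenOptions::new().append(true).create(true).open(path)?;
    file.write_all(&bytes)
}
```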
+
+### 2. Load prior runs on startup (`src/agent.rs`)
+
+At the top of `run()`, before the iteration loop:
+1. Read `run_ledger.jsonl` if it exists (ignore if missing — first run)
+2. Collect all `run_id`s
+3. Call `swym.compare_runs(&run_ids)` (batching in groups of 50)
+4. Join metrics back to strategies from the ledger
+5. Group by strategy: entries with identical strategy JSON (the per-instrument entries of one
+   iteration, plus any exact repeats from other runs) form one group
+6. Rank by average Sharpe across instruments
+7. Build a `prior_results_summary: Option<String>` for the initial prompt
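Steps 4 to 6 (join, group, rank) reduce to a small grouping pass. In this sketch `JoinedEntry` and `rank_by_avg_sharpe` are hypothetical: each ledger entry is assumed to be joined to its Sharpe ratio already and keyed by its serialized strategy JSON. Groups with no Sharpe values get negative infinity so they sort last, matching the implementation in `agent.rs`; that implementation's cap of three entries per strategy is omitted here.

```rust
use std::collections::HashMap;

/// Hypothetical ledger entry joined with its metrics (None if the compare
/// endpoint returned nothing for that run).
struct JoinedEntry {
    strategy_json: String,
    sharpe: Option<f64>,
}

/// Group by identical strategy JSON and rank groups by mean Sharpe, best first.
fn rank_by_avg_sharpe(entries: &[JoinedEntry]) -> Vec<(f64, String)> {
    let mut groups: HashMap<&str, Vec<f64>> = HashMap::new();
    for e in entries {
        let sharpes = groups.entry(e.strategy_json.as_str()).or_default();
        if let Some(s) = e.sharpe {
            sharpes.push(s);
        }
    }
    let mut ranked: Vec<(f64, String)> = groups
        .into_iter()
        .map(|(key, sharpes)| {
            let avg = if sharpes.is_empty() {
                f64::NEG_INFINITY // no metrics: rank last
            } else {
                sharpes.iter().sum::<f64>() / sharpes.len() as f64
            };
            (avg, key.to_string())
        })
        .collect();
    // Descending by average Sharpe.
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0));
    ranked
}
```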
+
+### 3. Compare endpoint client (`src/swym.rs`)
+
+Add `RunMetricsSummary` struct:
+
+```rust
+pub struct RunMetricsSummary {
+    pub id: Uuid,
+    pub status: String,
+    pub candle_interval: Option<String>,
+    pub total_positions: Option<u32>,
+    pub win_rate: Option<f64>,
+    pub profit_factor: Option<f64>,
+    pub net_pnl: Option<f64>,
+    pub sharpe_ratio: Option<f64>,
+    pub sortino_ratio: Option<f64>,
+    pub calmar_ratio: Option<f64>,
+    pub max_drawdown: Option<f64>,
+    pub pnl_return: Option<f64>,
+    pub avg_win: Option<f64>,
+    pub avg_loss: Option<f64>,
+    pub max_win: Option<f64>,
+    pub max_loss: Option<f64>,
+    pub avg_hold_duration_secs: Option<f64>,
+}
+```
+
+Add `SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>`:
+- `GET {base_url}/paper-runs/compare?ids={comma_separated}`
+- Parse JSON array response using `parse_number()` for decimal strings
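The batching mentioned in step 3 of the startup sequence (groups of 50) maps naturally onto `slice::chunks`. `compare_urls` and `COMPARE_BATCH_SIZE` are hypothetical names, run IDs are plain strings rather than `Uuid`s, and no HTTP call is made. UUIDs contain only hex digits and hyphens, so the `ids` parameter needs no percent-encoding.

```rust
/// Upper bound on run IDs per compare call, per the upstream patch description.
const COMPARE_BATCH_SIZE: usize = 50;

/// Build one `GET {base_url}/paper-runs/compare?ids=...` URL per batch.
fn compare_urls(base_url: &str, run_ids: &[String]) -> Vec<String> {
    let base = base_url.trim_end_matches('/');
    run_ids
        .chunks(COMPARE_BATCH_SIZE)
        .map(|batch| format!("{}/paper-runs/compare?ids={}", base, batch.join(",")))
        .collect()
}
```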
+
+### 4. Enrich `BacktestResult` with new fields (`src/swym.rs`)
+
+Add to `BacktestResult`: `sortino_ratio`, `calmar_ratio`, `max_drawdown`, `pnl_return`,
+`avg_win`, `avg_loss`, `max_win`, `max_loss`, `avg_hold_duration_secs`.
+
+Parse all in `from_response()` via existing `parse_number()`.
+
+Update `summary_line()` to include `max_dd={:.1}%` and `sortino={:.2}` when present —
+these two are the most useful additions for the model's reasoning.
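A sketch of the conditional suffixes, assuming (as the prior-summary code in `agent.rs` does when it multiplies by 100) that `max_drawdown` is a fraction. `ResultSketch` is a hypothetical stand-in for `BacktestResult`, and the leading `sharpe=` field is illustrative; the actual `summary_line()` format is not shown in this plan.

```rust
/// Hypothetical subset of `BacktestResult` used by the summary line.
struct ResultSketch {
    instrument: String,
    sharpe_ratio: Option<f64>,
    sortino_ratio: Option<f64>,
    max_drawdown: Option<f64>, // assumed fraction: 0.123 means 12.3%
}

fn summary_line(r: &ResultSketch) -> String {
    let sharpe = r.sharpe_ratio.map_or("n/a".to_string(), |s| format!("{s:.2}"));
    let mut line = format!("{}: sharpe={}", r.instrument, sharpe);
    // Append the new metrics only when the API returned them.
    if let Some(dd) = r.max_drawdown {
        line.push_str(&format!(" max_dd={:.1}%", dd * 100.0));
    }
    if let Some(s) = r.sortino_ratio {
        line.push_str(&format!(" sortino={s:.2}"));
    }
    line
}
```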
+
+### 5. Prior-results-aware initial prompt (`src/prompts.rs`)
+
+Modify `initial_prompt()` to accept `prior_summary: Option<&str>`.
+
+When present, insert before the "Design a trading strategy" instruction:
+
+```
+## Learnings from {N} prior backtests across {M} strategies
+
+{top 5 strategies ranked by avg sharpe, each showing:}
+- Interval, rule count, avg metrics across instruments
+- One-line description of the strategy approach (extracted from rule comments)
+- Full strategy JSON for the top 1-2
+
+{compact table of all prior strategies' avg metrics}
+
+Use these insights to avoid repeating failed approaches and to build on what worked.
+```
+
+Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs,
+show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
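Two details of this selection and size cap are easy to get wrong. With only six or seven strategies, a naive "last three" list overlaps the top five. Byte-slicing a `String` at a fixed offset also panics if the offset lands inside a multi-byte character, which is plausible because descriptions come from model-written rule comments. Below is a hedged sketch of both, using the hypothetical names `top_and_bottom` and `truncate_summary`. The cap is in bytes because the `agent.rs` implementation truncates at roughly 6000 bytes rather than counting tokens.

```rust
use std::ops::Range;

/// Given `len` strategies sorted best-first, return index ranges for the best
/// `top` and the worst `bottom`; the worst range never overlaps the best.
fn top_and_bottom(len: usize, top: usize, bottom: usize) -> (Range<usize>, Range<usize>) {
    let top_end = len.min(top);
    let bottom_start = len.saturating_sub(bottom).max(top_end);
    (0..top_end, bottom_start..len)
}

/// Cap `summary` at `max_bytes`, backing off to a UTF-8 character boundary so
/// the slice cannot panic, and append a marker with the total strategy count.
fn truncate_summary(summary: &str, max_bytes: usize, total_strategies: usize) -> String {
    if summary.len() <= max_bytes {
        return summary.to_string();
    }
    let mut cut = max_bytes;
    while !summary.is_char_boundary(cut) {
        cut -= 1; // index 0 is always a boundary, so this terminates
    }
    format!("{}…\n[truncated: {} total strategies]\n", &summary[..cut], total_strategies)
}
```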
+
+### 6. Ledger entry struct (`src/agent.rs`)
+
+```rust
+#[derive(Serialize, Deserialize)]
+struct LedgerEntry {
+    run_id: Uuid,
+    instrument: String,
+    candle_interval: String,
+    strategy: Value,
+    timestamp: String,
+}
+```
+
+## Files to modify
+
+- `src/swym.rs` — `RunMetricsSummary` struct, `compare_runs()` method, enrich `BacktestResult`
+  with new fields, update `summary_line()`
+- `src/agent.rs` — `LedgerEntry` struct, append-to-ledger after backtest, load-ledger-on-startup,
+  call compare endpoint, build prior summary, pass to initial prompt
+- `src/prompts.rs` — `initial_prompt()` accepts optional prior summary
+
+## Verification
+
+1. `cargo build --release`
+2. Run once → confirm `run_ledger.jsonl` is created with entries
+3. Run again → confirm:
+   - Ledger is loaded, compare endpoint is called
+   - Iteration 1 prompt includes prior results summary (visible at debug log level)
+   - New entries are appended (not overwritten)
+4. Check that enriched metrics (sortino, max_drawdown) appear in summary_line output
--- a/src/agent.rs
+++ b/src/agent.rs
@@ -1,14 +1,26 @@
+use std::io::Write as IoWrite;
 use std::path::Path;
 use std::time::Duration;

 use anyhow::{Context, Result};
+use serde::{Deserialize, Serialize};
 use serde_json::Value;
 use tracing::{debug, error, info, warn};
+use uuid::Uuid;

 use crate::claude::{self, ClaudeClient, Message};
 use crate::config::{Cli, Instrument};
 use crate::prompts;
-use crate::swym::{BacktestResult, SwymClient};
+use crate::swym::{BacktestResult, RunMetricsSummary, SwymClient};
+
+/// Persistent record of a single completed backtest, written to the run ledger.
+#[derive(Debug, Serialize, Deserialize)]
+struct LedgerEntry {
+    run_id: Uuid,
+    instrument: String,
+    candle_interval: String,
+    strategy: Value,
+}

 /// A single iteration's record: strategy + results across instruments.
 #[derive(Debug)]
@@ -190,14 +202,24 @@ pub async fn run(cli: &Cli) -> Result<()> {

    // Load DSL schema for the system prompt
    let schema = include_str!("dsl-schema.json");
-    let system = prompts::system_prompt(schema, claude.family());
+    let has_futures = instruments.iter().any(|i| i.is_futures());
+    let system = prompts::system_prompt(schema, claude.family(), has_futures);
    info!("model family: {}", claude.family().name());

+    // Resolve ledger path: explicit --ledger-file takes precedence, else <output_dir>/run_ledger.jsonl
+    let ledger_path = cli.ledger_file.clone().unwrap_or_else(|| cli.output_dir.join("run_ledger.jsonl"));
+    info!("ledger: {}", ledger_path.display());
+
+    // Load prior runs from ledger and build cross-run context for iteration 1
+    let prior_summary = load_prior_summary(&ledger_path, &swym).await;
+
    // Agent state
    let mut history: Vec<IterationRecord> = Vec::new();
    let mut conversation: Vec<Message> = Vec::new();
    let mut best_strategy: Option<(f64, Value)> = None; // (avg_sharpe, strategy)
    let mut consecutive_failures = 0u32;
+    // Deduplication: track canonical strategy JSON → first iteration it was tested.
+    let mut tested_strategies: std::collections::HashMap<String, u32> = std::collections::HashMap::new();

    let instrument_names: Vec<String> = instruments.iter().map(|i| i.symbol.clone()).collect();

@@ -206,7 +228,7 @@ pub async fn run(cli: &Cli) -> Result<()> {

        // Build the user prompt
        let user_msg = if iteration == 1 {
-            prompts::initial_prompt(&instrument_names, &available_intervals)
+            prompts::initial_prompt(&instrument_names, &available_intervals, prior_summary.as_deref(), has_futures)
        } else {
            let results_text = history
                .iter()
@@ -372,6 +394,27 @@ pub async fn run(cli: &Cli) -> Result<()> {
            }
        }

+        // Deduplication check: skip strategies identical to one already tested this run.
+        let strategy_key = serde_json::to_string(&strategy).unwrap_or_default();
+        if let Some(&first_iter) = tested_strategies.get(&strategy_key) {
+            warn!("duplicate strategy (identical to iteration {first_iter}), skipping backtest");
+            let record = IterationRecord {
+                iteration,
+                strategy: strategy.clone(),
+                results: vec![],
+                validation_notes: vec![format!(
+                    "DUPLICATE: this exact strategy was already tested in iteration {first_iter}. \
+                     You submitted identical JSON. You MUST design a completely different strategy — \
+                     different indicator family, different entry conditions, or different timeframe. \
+                     Do NOT submit the same JSON again."
+                )],
+            };
+            info!("{}", record.summary());
+            history.push(record);
+            continue;
+        }
+        tested_strategies.insert(strategy_key, iteration);
+
        // Run backtests against all instruments (in-sample)
        let mut results: Vec<BacktestResult> = Vec::new();

@@ -397,12 +440,13 @@ pub async fn run(cli: &Cli) -> Result<()> {
                            info!("  condition audit: {}", serde_json::to_string_pretty(audit).unwrap_or_default());
                        }
                    }
+                    append_ledger_entry(&ledger_path, &result, &strategy);
                    results.push(result);
                }
                Err(e) => {
                    warn!("  backtest failed for {}: {e:#}", inst.symbol);
                    results.push(BacktestResult {
-                        run_id: uuid::Uuid::nil(),
+                        run_id: Uuid::nil(),
                        instrument: inst.symbol.clone(),
                        status: "failed".to_string(),
                        total_positions: None,
@@ -413,6 +457,15 @@ pub async fn run(cli: &Cli) -> Result<()> {
                        total_pnl: None,
                        net_pnl: None,
                        sharpe_ratio: None,
+                        sortino_ratio: None,
+                        calmar_ratio: None,
+                        max_drawdown: None,
+                        pnl_return: None,
+                        avg_win: None,
+                        avg_loss: None,
+                        max_win: None,
+                        max_loss: None,
+                        avg_hold_duration_secs: None,
                        total_fees: None,
                        avg_bars_in_trade: None,
                        error_message: Some(e.to_string()),
@@ -550,6 +603,7 @@ async fn run_single_backtest(
            &inst.symbol,
            &inst.base(),
            &inst.quote(),
+            inst.market_kind(),
            strategy,
            starts_at,
            finishes_at,
@@ -573,6 +627,179 @@ async fn run_single_backtest(
    Ok(BacktestResult::from_response(&final_resp, &inst.symbol))
 }

+/// Append a ledger entry for a completed backtest so future runs can learn from it.
+fn append_ledger_entry(ledger: &Path, result: &BacktestResult, strategy: &Value) {
+    // Skip nil run_ids (error placeholders)
+    if result.run_id == Uuid::nil() {
+        return;
+    }
+    let entry = LedgerEntry {
+        run_id: result.run_id,
+        instrument: result.instrument.clone(),
+        candle_interval: strategy["candle_interval"]
+            .as_str()
+            .unwrap_or("?")
+            .to_string(),
+        strategy: strategy.clone(),
+    };
+    // Append the newline inside the serialised bytes so the whole entry goes out in a
+    // single write_all() call (normally one write() syscall). With O_APPEND, a single
+    // write() is atomic on Linux local filesystems, so concurrent instances will not
+    // interleave entries of typical size.
+    let mut bytes = match serde_json::to_vec(&entry) {
+        Ok(b) => b,
+        Err(e) => {
+            warn!("could not serialize ledger entry: {e}");
+            return;
+        }
+    };
+    bytes.push(b'\n');
+    if let Err(e) = std::fs::OpenOptions::new()
+        .append(true)
+        .create(true)
+        .open(ledger)
+        .and_then(|mut f| f.write_all(&bytes))
+    {
+        warn!("could not write ledger entry: {e}");
+    }
+}
+
+/// Load the run ledger, fetch metrics via the compare endpoint, and return a compact
+/// prior-results summary string for the initial prompt.  Returns `None` if the ledger
+/// is absent, empty, or the compare call fails.
+async fn load_prior_summary(ledger: &Path, swym: &SwymClient) -> Option<String> {
+    let path = ledger;
+    let contents = std::fs::read_to_string(&path).ok()?;
+
+    // Parse all ledger entries
+    let entries: Vec<LedgerEntry> = contents
+        .lines()
+        .filter(|l| !l.trim().is_empty())
+        .filter_map(|l| serde_json::from_str(l).ok())
+        .collect();
+    if entries.is_empty() {
+        return None;
+    }
+    info!("loaded {} ledger entries from previous runs", entries.len());
+
+    // Fetch metrics for all run_ids
+    let run_ids: Vec<Uuid> = entries.iter().map(|e| e.run_id).collect();
+    let metrics = match swym.compare_runs(&run_ids).await {
+        Ok(m) => m,
+        Err(e) => {
+            warn!("could not fetch prior run metrics: {e}");
+            return None;
+        }
+    };
+
+    // Build a map from run_id → metrics
+    let metrics_map: std::collections::HashMap<Uuid, &RunMetricsSummary> =
+        metrics.iter().map(|m| (m.id, m)).collect();
+
+    // Group entries by strategy, using the full serialized strategy JSON as the key.
+    let mut strategy_groups: std::collections::HashMap<String, Vec<(&LedgerEntry, Option<&RunMetricsSummary>)>> =
+        std::collections::HashMap::new();
+    // Cap at 3 entries per unique strategy (one per instrument is enough).
+    // Without this, a strategy repeated across many iterations swamps the summary.
+    for entry in &entries {
+        let key = serde_json::to_string(&entry.strategy).unwrap_or_default();
+        let group = strategy_groups.entry(key).or_default();
+        if group.len() < 3 {
+            let m = metrics_map.get(&entry.run_id).copied();
+            group.push((entry, m));
+        }
+    }
+
+    // Compute avg sharpe per strategy group
+    let mut strategies: Vec<(f64, &Value, Vec<(&LedgerEntry, Option<&RunMetricsSummary>)>)> = strategy_groups
+        .into_values()
+        .map(|group| {
+            let sharpes: Vec<f64> = group
+                .iter()
+                .filter_map(|(_, m)| m.and_then(|m| m.sharpe_ratio))
+                .collect();
+            let avg_sharpe = if sharpes.is_empty() {
+                f64::NEG_INFINITY
+            } else {
+                sharpes.iter().sum::<f64>() / sharpes.len() as f64
+            };
+            let strategy = &group[0].0.strategy;
+            (avg_sharpe, strategy, group)
+        })
+        .collect();
+    strategies.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
+
+    let total_strategies = strategies.len();
+    let total_backtests = entries.len();
+
+    // Build summary text: top 5 + bottom 3 (if distinct), truncated at ~6000 bytes below
+    let mut lines = vec![format!(
+        "## Learnings from {} prior backtests across {} strategies\n",
+        total_backtests, total_strategies
+    )];
+    lines.push("### Best strategies (ranked by avg Sharpe):".to_string());
+
+    let show_top = strategies.len().min(5);
+    for (avg_sharpe, strategy, group) in strategies.iter().take(show_top) {
+        let interval = strategy["candle_interval"].as_str().unwrap_or("?");
+        let rule_count = strategy["rules"].as_array().map(|r| r.len()).unwrap_or(0);
+        // Collect per-instrument metrics
+        let inst_lines: Vec<String> = group
+            .iter()
+            .filter_map(|(entry, m)| {
+                let m = (*m)?;
+                Some(format!(
+                    "    {}: trades={} sharpe={:.3} net_pnl={:.2}{}",
+                    entry.instrument,
+                    m.total_positions.unwrap_or(0),
+                    m.sharpe_ratio.unwrap_or(0.0),
+                    m.net_pnl.unwrap_or(0.0),
+                    m.max_drawdown.map(|d| format!(" max_dd={:.1}%", d * 100.0)).unwrap_or_default(),
+                ))
+            })
+            .collect();
+        // Pull the first rule comment as a strategy description
+        let description = strategy["rules"][0]["comment"]
+            .as_str()
+            .unwrap_or("(no description)");
+        lines.push(format!(
+            "\n  [{interval}, {rule_count} rules, avg_sharpe={avg_sharpe:.3}] {description}"
+        ));
+        lines.extend(inst_lines);
+        // Include full JSON only for the top 2
+        let rank = strategies.iter().position(|(_, s, _)| std::ptr::eq(*s, *strategy)).unwrap_or(99);
+        if rank < 2 {
+            lines.push(format!(
+                "  strategy JSON: {}",
+                serde_json::to_string(strategy).unwrap_or_default()
+            ));
+        }
+    }
+
+    // Worst 3 (if we have more than 5)
+    if strategies.len() > 5 {
+        lines.push("\n### Worst strategies (avoid repeating these):".to_string());
+        let worst_start = strategies.len().saturating_sub(3).max(show_top); // never overlap the top-5 list
+        for (avg_sharpe, strategy, _) in strategies.iter().skip(worst_start) {
+            let interval = strategy["candle_interval"].as_str().unwrap_or("?");
+            let description = strategy["rules"][0]["comment"].as_str().unwrap_or("(no description)");
+            lines.push(format!("  [{interval}, avg_sharpe={avg_sharpe:.3}] {description}"));
+        }
+    }
+
+    lines.push(
+        "\nUse these results to avoid repeating failed approaches and build on what worked.\n".to_string(),
+    );
+
+    let summary = lines.join("\n");
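+    // Truncate to ~6000 bytes to stay within the prompt budget. Back off to a UTF-8
+    // character boundary before slicing: rule comments come from the model and may
+    // contain multi-byte characters, and slicing inside one would panic.
+    if summary.len() > 6000 {
+        let mut cut = 5900;
+        while !summary.is_char_boundary(cut) {
+            cut -= 1;
+        }
+        Some(format!("{}…\n[truncated — {} total strategies]\n", &summary[..cut], total_strategies))
+    } else {
+        Some(summary)
+    }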
+}
+
 fn save_validated_strategy(
    output_dir: &Path,
    iteration: u32,
--- a/src/config.rs
+++ b/src/config.rs
@@ -118,6 +118,13 @@ pub struct Cli {
    #[arg(long, default_value = "./results")]
    pub output_dir: PathBuf,

+    /// Path to the run ledger JSONL file used for cross-run learning.
+    /// Defaults to <output_dir>/run_ledger.jsonl when not specified.
+    /// Pass a different path to seed a new run from a specific ledger
+    /// (e.g. a curated export from a previous campaign).
+    #[arg(long)]
+    pub ledger_file: Option<PathBuf>,
+
    /// Poll interval in seconds when waiting for backtest completion.
    #[arg(long, default_value_t = 2)]
    pub poll_interval_secs: u64,
@@ -167,4 +174,22 @@ impl Instrument {
        }
        "usdc".to_string()
    }
+
+    /// Instrument kind for the paper-run config `instrument.kind` field.
+    /// Derived from the exchange identifier (case-insensitive).
+    pub fn market_kind(&self) -> &'static str {
+        let e = self.exchange.to_ascii_lowercase();
+        if e.contains("futures_usd") || e.contains("futures_um") {
+            "futures_um"
+        } else if e.contains("futures_coin") || e.contains("futures_cm") {
+            "futures_cm"
+        } else {
+            "spot"
+        }
+    }
+
+    /// True when this instrument is traded on a futures market.
+    pub fn is_futures(&self) -> bool {
+        self.market_kind() != "spot"
+    }
 }
--- a/src/dsl-schema.json
+++ b/src/dsl-schema.json
@@ -74,6 +74,11 @@
            { "$ref": "#/definitions/SizingFixedUnits" },
            { "$ref": "#/definitions/Expr" }
          ]
+        },
+        "reverse": {
+          "type": "boolean",
+          "default": false,
+          "description": "Flip-through-zero flag (futures only). When true and an opposite position is currently open, the submitted order quantity becomes position_qty + configured_qty, closing the existing position and immediately opening a new one in the opposite direction in a single order. When flat the flag has no effect and configured_qty is used as normal. Omit or set false for standard close-only behaviour."
        }
      }
    },
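The flip-through-zero rule in the `reverse` description amounts to a small quantity calculation. This sketch assumes a signed position (positive long, negative short) and a configured quantity already resolved to base units. `order_quantity` is a hypothetical name, and capping of non-reverse exits to the open size is deliberately left out.

```rust
/// Order quantity for a rule action under the `reverse` flag semantics:
/// with an opposite position open, size = |position| + configured_qty,
/// closing it and opening the other side in one order; otherwise the
/// configured quantity is used unchanged.
fn order_quantity(current_position: f64, side_is_buy: bool, configured_qty: f64, reverse: bool) -> f64 {
    // A buy is "opposite" to a short; a sell is "opposite" to a long.
    let opposite_open = if side_is_buy {
        current_position < 0.0
    } else {
        current_position > 0.0
    };
    if reverse && opposite_open {
        current_position.abs() + configured_qty
    } else {
        configured_qty
    }
}
```

For example, a sell with `reverse` while long 0.5 with a configured 0.2 submits 0.7 and leaves a 0.2 short; when flat, the same rule submits just 0.2.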
--- a/src/prompts.rs
+++ b/src/prompts.rs
@@ -4,7 +4,7 @@ use crate::config::ModelFamily;
 ///
 /// Accepts a `ModelFamily` so each family can receive tailored guidance
 /// while sharing the common DSL schema and strategy evaluation rules.
-pub fn system_prompt(dsl_schema: &str, family: &ModelFamily) -> String {
+pub fn system_prompt(dsl_schema: &str, family: &ModelFamily, has_futures: bool) -> String {
    let output_instructions = match family {
        ModelFamily::DeepSeekR1 => {
            "## Output format\n\n\
@@ -103,6 +103,14 @@ Buy a fixed number of base units (semantic alias for a decimal string):
  "right":{{"kind":"func","name":"atr","period":14}}}}
 ```

+CRITICAL — ATR sizing and balance limits: `N/atr(14)` expresses quantity in BASE asset units.
+For BTC, 4h ATR ≈ $1500–3000, so `1000/atr(14)` ≈ 0.33–0.67 BTC, which at a BTC price near $80k
+is ≈ $27k–53k notional. That is silently rejected on a $10k account (fill returns None, 0 positions
+open, no error shown). The numerator N represents your intended dollar risk per trade. For a $10k
+account keep N ≤ 200: `200/atr(14)` ≈ 0.07–0.13 BTC ≈ $5k–11k notional at that price, so use a
+smaller N when ATR is near the low end of its range.
+Prefer `percent_of_balance` for most sizing. Only reach for ATR-based Expr sizing when you need
+volatility-scaled position risk, and keep the numerator proportional to your risk tolerance.
+
 **4. Exit rules** — use `position_quantity` to close the exact open size:
 ```json
 {{"kind":"position_quantity"}}
@@ -110,14 +118,35 @@ Buy a fixed number of base units (semantic alias for a decimal string):
 Alternatively, `"9999"` works for exits: sell quantities are automatically capped to the open
 position size, so a large fixed number is equivalent to `position_quantity`.

-CRITICAL mistakes to never make:
- `{{"method":"position_quantity"}}` is WRONG — `position_quantity` is an Expr, not a SizingMethod.
-  CORRECT: `{{"kind":"position_quantity"}}`. The `"method"` field belongs ONLY to the three
-  declarative sizing objects (`fixed_sum`, `percent_of_balance`, `fixed_units`).
+CRITICAL — the `"method"` vs `"kind"` distinction:
+- `"method"` belongs ONLY to the three declarative sizing objects: `fixed_sum`, `percent_of_balance`, `fixed_units`.
+- `"kind"` belongs to Expr objects: `position_quantity`, `bin_op`, `func`, `field`, `literal`, etc.
+- `{{"method":"position_quantity"}}` is ALWAYS WRONG. It will be rejected every time.
+  CORRECT: `{{"kind":"position_quantity"}}`.
+- If you used `{{"method":"percent_of_balance",...}}` for the buy, use `{{"kind":"position_quantity"}}` for the sell.
+  These are different object types — buy uses a SizingMethod (`method`), sell uses an Expr (`kind`).
 - `{{"method":"fixed_sum","amount":"100","multiplier":"2.0"}}` is WRONG — `fixed_sum` has no
  `multiplier` field. Only `amount` is accepted alongside `method`.
 - NEVER add extra fields to SizingMethod objects — they use `additionalProperties: false`.

+### Reverse / flip-through-zero (futures only)
+
+Setting `"reverse": true` on a rule action enables a single-order position flip on futures.
+When an opposite position is open, quantity = `position_qty + configured_qty`, which closes
+the existing position and opens a new one in the opposite direction in one order (fees split
+proportionally). When flat the flag has no effect — `configured_qty` is used normally.
+
+This lets you collapse a 4-rule long+short strategy (separate open/close for each leg) into
+2 rules, reducing round-trip fees and keeping logic compact:
+
+```json
+{{"side": "sell", "quantity": {{"method": "percent_of_balance", "percent": "10", "asset": "usdc"}}, "reverse": true}}
+```
+
+Use `reverse` when you always want to be in a position — the signal flips you from long to
+short (or vice versa) rather than first exiting and then re-entering separately. Do NOT use
+`reverse` on spot markets (short selling is not supported there).
+
 ### Multi-timeframe
 Any expression can reference a different timeframe via "timeframe" field.
 Use higher timeframes as trend filters, lower timeframes for entry precision.
@@ -142,6 +171,13 @@ Use higher timeframes as trend filters, lower timeframes for entry precision.
 6. **Composite / hybrid**: Combine families. Trend filter + mean-reversion entry.
   Momentum confirmation + volatility sizing.

+7. **Supertrend consensus flip (futures only)**: Use `any_of` across multiple
+   Supertrend configs (e.g. period=7/mul=1.5, period=10/mul=2.0, period=20/mul=3.0)
+   so that ANY flip triggers a long or short entry. Combine with `"reverse": true`
+   for an always-in-market approach where the opposite signal is the stop-loss.
+   Varying multiplier tightens/loosens the band; varying period controls sensitivity.
+   Risk: choppy markets generate many whipsaws — best on daily or 4h.
+
 ## Risk management (always include)

 Every strategy MUST have:
@@ -149,6 +185,10 @@ Every strategy MUST have:
 - A time-based exit: use bars_since_entry to avoid holding losers indefinitely
 - Reasonable position sizing: prefer ATR-based or percent-of-balance over fixed quantity

+Exception: always-in-market flip strategies (using `"reverse": true`) do not need an
+explicit stop-loss or time exit — the opposite signal acts as the stop. These are
+only valid on futures. See Example 6 and Example 7.
+
 {output_instructions}

 ## Interpreting backtest results
@@ -157,7 +197,11 @@ When I share results from previous iterations, use them to guide your next strat

 - **Zero trades**: The entry conditions are too restrictive or never co-occur.
  Relax thresholds, simplify conditions, or check if the indicator periods make
-  sense for the candle interval.
+  sense for the candle interval. Also check your position sizing — if using an
+  ATR-based Expr quantity (`N/atr(14)`), a large N can produce a notional value
+  exceeding your account balance (e.g. `1000/atr(14)` on BTC ≈ 0.4 BTC ≈ $32k),
+  which is silently rejected by the fill engine. Switch to `percent_of_balance`
+  or reduce N to ≤ 200 for a $10k account.

 - **Many trades but negative PnL**: The entry signal has no edge, or the exit
  logic is poor. Try different indicator combinations, add trend filters, or
@@ -190,6 +234,9 @@ Common mistakes to NEVER make:
 - `"kind": "expr_field"` does NOT exist. Use `{{"kind":"field","field":"close"}}`.
 - Every Expr object MUST have a `"kind"` field. `{{"field":"close"}}` is WRONG — missing `"kind"`.
  CORRECT: `{{"kind":"field","field":"close"}}`. The `"kind"` is never optional.
+  This applies to ALL field access including offset lookups:
+  `{{"field":"volume","offset":-1}}` is WRONG. CORRECT: `{{"kind":"field","field":"volume","offset":-1}}`.
+  `{{"field":"high","offset":-2}}` is WRONG. CORRECT: `{{"kind":"field","field":"high","offset":-2}}`.
 - `rsi`, `adx`, `supertrend` are NOT valid inside `apply_func`. Use only `apply_func`
  with `ApplyFuncName` values: `highest`, `lowest`, `sma`, `ema`, `wma`, `std_dev`, `sum`,
  `bollinger_upper`, `bollinger_lower`.
@@ -473,26 +520,241 @@ CRITICAL: `apply_func` uses `"input"`, not `"expr"`. Writing `"expr":` will be r
 - Don't set RSI thresholds at extreme values (< 10 or > 90) — too rare to fire
 - Don't use very short periods (< 5) on high timeframes — noisy
 - Don't use very long periods (> 100) on low timeframes — too slow to react
+- Don't switch to 15m or shorter intervals when results are poor — higher frequency amplifies
+  fees and noise, making edge harder to find. Prefer 1h or 4h. If Sharpe is negative across
+  intervals, the issue is signal logic, not timeframe — fix the signal before changing interval.
 - Don't create strategies with more than 5-6 conditions — overfitting risk
 - Don't ignore fees — a strategy needs to overcome 0.1% per round trip
-- Always gate buy rules with position state "flat" and sell rules with "long"
-- Never add a short-entry (sell when flat) rule — spot markets are long-only
-- Never use an expression object for `quantity` — it must always be a plain decimal string like `"0.01"`
-- Never use a placeholder string for `quantity` — `"ATR_SIZED"`, `"FULL_BALANCE"`, `"dynamic"`, etc. are all invalid and will be rejected. Use `"0.01"` or similar.
-"##
+- Spot markets are long-only: gate buy (entry) rules with state "flat" and sell (exit) rules with state "long". Never add a short-entry (sell when flat) rule on spot.
+- Futures markets support both directions: long entry = buy when flat; long exit = sell when long; short entry = sell when flat; short exit (cover) = buy when short. Always include a stop-loss and time exit for both long and short legs, except in always-in-market `"reverse": true` flip strategies.
+- Never use a placeholder string for `quantity` — `"ATR_SIZED"`, `"FULL_BALANCE"`, `"dynamic"`, etc. are all invalid and will be rejected.
+- Don't use large ATR-based sizing numerators. `N/atr(14)` gives BASE units; for BTC (ATR ≈ $2000
+  on 4h), `1000/atr(14)` ≈ 0.5 BTC ≈ $40k — silently rejected on a $10k account. Keep N ≤ 200
+  or use `percent_of_balance`. The condition audit may show entry conditions firing while 0 positions
+  open — this is the typical symptom of an oversized ATR quantity.
+- `{{"method":"position_quantity"}}` is WRONG for exit rules — use `{{"kind":"position_quantity"}}` (see Quantity section above).
+{futures_examples}"##,
+        futures_examples = if has_futures { FUTURES_SHORT_EXAMPLES } else { "" },
    )
 }

+/// Short-entry and short-exit strategy examples, injected into the system prompt when
+/// futures instruments are present.
+const FUTURES_SHORT_EXAMPLES: &str = r##"
+
+### Example 5 — Futures short: EMA trend-following short with 2% stop-loss
+
+On futures you can also short. Short entry = `"side": "sell"` when `"state": "flat"`;
+short exit (cover) = `"side": "buy"` when `"state": "short"`. Stop-loss for a short
+is price rising above entry, e.g. entry_price * 1.02. You may run long and short legs
+in the same strategy (4 rules total), or a short-only strategy (2 rules).
+
+```json
+{
+  "type": "rule_based",
+  "candle_interval": "4h",
+  "rules": [
+    {
+      "comment": "Short entry: EMA9 crosses below EMA21 while price is below EMA50 (downtrend)",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "position", "state": "flat"},
+          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "below"},
+          {"kind": "ema_trend", "period": 50, "direction": "below"}
+        ]
+      },
+      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}}
+    },
+    {
+      "comment": "Short exit: EMA9 crosses back above EMA21, OR 2% stop-loss, OR 48-bar time exit",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "position", "state": "short"},
+          {
+            "kind": "any_of",
+            "conditions": [
+              {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "above"},
+              {
+                "kind": "compare",
+                "left": {"kind": "field", "field": "close"},
+                "op": ">",
+                "right": {"kind": "bin_op", "op": "mul", "left": {"kind": "entry_price"}, "right": {"kind": "literal", "value": "1.02"}}
+              },
+              {
+                "kind": "compare",
+                "left": {"kind": "bars_since_entry"},
+                "op": ">=",
+                "right": {"kind": "literal", "value": "48"}
+              }
+            ]
+          }
+        ]
+      },
+      "then": {"side": "buy", "quantity": {"kind": "position_quantity"}}
+    }
+  ]
+}
+```
+
+Key short-specific notes:
+- Stop-loss for short = close > entry_price * (1 + stop_pct), e.g. `* 1.02` for 2% stop
+- Take-profit for short = close < entry_price * (1 - target_pct), e.g. `* 0.97` for 3% target
+- Short exit uses `"side": "buy"` with `{"kind": "position_quantity"}` (same as long exit uses sell)
+- `percent_of_balance` for short entry uses `"usdc"` as the asset (the collateral currency)
+
+### Example 6 — Futures flip-through-zero: 2-rule EMA trend-follower using `reverse`
+
+When you always want to be in a position (long during uptrends, short during downtrends),
+use `"reverse": true` to flip from one side to the other in a single order. Each flip needs
+one order instead of the separate exit and entry orders of a 4-rule approach.
+
+```json
+{
+  "type": "rule_based",
+  "candle_interval": "4h",
+  "rules": [
+    {
+      "comment": "Go long (or flip short→long): EMA9 crosses above EMA21 while above EMA50",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "any_of", "conditions": [
+            {"kind": "position", "state": "flat"},
+            {"kind": "position", "state": "short"}
+          ]},
+          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "above"},
+          {"kind": "ema_trend", "period": 50, "direction": "above"}
+        ]
+      },
+      "then": {"side": "buy", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}, "reverse": true}
+    },
+    {
+      "comment": "Go short (or flip long→short): EMA9 crosses below EMA21 while below EMA50",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "any_of", "conditions": [
+            {"kind": "position", "state": "flat"},
+            {"kind": "position", "state": "long"}
+          ]},
+          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "below"},
+          {"kind": "ema_trend", "period": 50, "direction": "below"}
+        ]
+      },
+      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}, "reverse": true}
+    }
+  ]
+}
+```
+
+Key flip-strategy notes:
+- Gate each rule on `flat OR opposite` (using `any_of`) so it fires both on initial entry and on flip
+- `reverse: true` handles the flip math automatically — no need to size for `position_qty + new_qty`
+- This pattern works best for trend-following where you want continuous market exposure
+- Still add a time-based or ATR stop if you want a safety exit when the trend stalls
+
+### Example 7 — Futures triple-Supertrend consensus flip
+
+Multiple Supertrend instances with different period/multiplier combos act as a tiered
+signal. `any_of` fires on the FIRST flip: the fastest line (7/1.5) reacts quickly, while
+the slowest (20/3.0) flips only on larger trend changes. `reverse: true` makes it always-in-market:
+the opposite signal is the stop-loss. No explicit stop or time exit needed.
+
+Varying parameters to tune:
+- Tighter multipliers (1.0–2.0) → more signals, more whipsaws
+- Looser multipliers (2.5–4.0) → fewer signals, longer holds
+- Try `all_of` instead of `any_of` to require consensus across all three (stronger filter)
+
+```json
+{
+  "type": "rule_based",
+  "candle_interval": "4h",
+  "rules": [
+    {
+      "comment": "LONG (or flip short→long): any Supertrend flips bullish",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "any_of", "conditions": [
+            {"kind": "position", "state": "flat"},
+            {"kind": "position", "state": "short"}
+          ]},
+          {
+            "kind": "any_of",
+            "conditions": [
+              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 7,  "multiplier": "1.5"}},
+              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 10, "multiplier": "2.0"}},
+              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 20, "multiplier": "3.0"}}
+            ]
+          }
+        ]
+      },
+      "then": {"side": "buy", "quantity": {"method": "percent_of_balance", "percent": "5", "asset": "usdc"}, "reverse": true}
+    },
+    {
+      "comment": "SHORT (or flip long→short): any Supertrend flips bearish",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "any_of", "conditions": [
+            {"kind": "position", "state": "flat"},
+            {"kind": "position", "state": "long"}
+          ]},
+          {
+            "kind": "any_of",
+            "conditions": [
+              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 7,  "multiplier": "1.5"}},
+              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 10, "multiplier": "2.0"}},
+              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 20, "multiplier": "3.0"}}
+            ]
+          }
+        ]
+      },
+      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "5", "asset": "usdc"}, "reverse": true}
+    }
+  ]
+}
+```
+
+Key Supertrend-specific notes:
+- `supertrend` ignores `field` — it uses OHLC internally; omit the `field` param
+- `multiplier` controls band width: lower = tighter, more reactive; higher = wider, more stable
+- `any_of` → first flip triggers (responsive); `all_of` → all three must agree (conservative)
+- Gate on position state to prevent re-entries scaling into an existing position"##;
+
 /// Build the user message for the first iteration (no prior results).
-pub fn initial_prompt(instruments: &[String], candle_intervals: &[String]) -> String {
+/// `prior_summary` contains a formatted summary of results from previous runs, if any.
+pub fn initial_prompt(instruments: &[String], candle_intervals: &[String], prior_summary: Option<&str>, has_futures: bool) -> String {
+    let prior_section = match prior_summary {
+        Some(s) => format!("{s}\n\n"),
+        None => String::new(),
+    };
+    let starting_instruction = if prior_summary.is_some() {
+        "Based on the prior results above:\n\
+- A strategy is \"promising\" if avg_sharpe >= 0.5 AND it traded >= 10 times per instrument. \
+If the best prior strategy meets both thresholds, refine it (tighten entry conditions, \
+adjust the exit, or tune the interval).\n\
+- If no prior strategy reaches avg_sharpe >= 0.5, do NOT repeat the same indicator family. \
+Scan the best-strategies list: if they all use the same core indicator (e.g. all use \
+Bollinger Bands, or all use EMA crossovers, or all use RSI threshold), your FIRST strategy \
+MUST use a completely different indicator family — for example: MACD crossover, ATR \
+breakout, volume spike, donchian channel breakout, or stochastic oscillator. Only after \
+that novelty attempt may you refine prior work.\n\
+- Never repeat an approach that produced 0 trades or fewer than 5 trades per instrument."
+    } else {
+        "Start with a multi-timeframe trend-following approach with proper risk management \
+(stop-loss, time exit, and ATR-based position sizing)."
+    };
+    let market_type = if has_futures { "futures" } else { "spot" };
    format!(
-        r#"Design a trading strategy for crypto spot markets.
+        r#"{prior_section}Design a trading strategy for crypto {market_type} markets.

 Available instruments: {}
 Available candle intervals: {}

-Start with a multi-timeframe trend-following approach with proper risk management
-(stop-loss, time exit, and ATR-based position sizing). Use "usdc" as the quote asset.
+{starting_instruction} Use "usdc" as the quote asset.

 Respond with ONLY the strategy JSON."#,
        instruments.join(", "),
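The sizing rules that the prompt text above states in prose can be checked with a small std-only Rust sketch. It assumes three things. First, a flip order submits `position_qty + configured_qty` only when `reverse` is set and an opposite position is open, as the reverse section says. Second, the "fees split proportionally" wording means a split by quantity. Third, `N/atr` yields base units priced at the current close. `Position`, `Side`, and every function name here are hypothetical and are not scout or swym APIs.

```rust
/// Position state as seen by the DSL's `position` condition (quantities in base units).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Position {
    Flat,
    Long(f64),
    Short(f64),
}

/// Order side of an action's `then` clause.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Side {
    Buy,
    Sell,
}

/// Quantity submitted for an action. With `reverse` set and an opposite position open,
/// the order closes that position and opens `configured_qty` the other way, so it is
/// `position_qty + configured_qty`. In every other case (flat, same side, or no
/// `reverse`) this sketch assumes `configured_qty` is submitted unchanged.
pub fn order_qty(position: Position, side: Side, configured_qty: f64, reverse: bool) -> f64 {
    let opposite_qty = match (position, side) {
        (Position::Short(q), Side::Buy) | (Position::Long(q), Side::Sell) => Some(q),
        _ => None,
    };
    match opposite_qty {
        Some(q) if reverse => q + configured_qty,
        _ => configured_qty,
    }
}

/// Assumed proportional fee split for a flip order: (closing leg, opening leg).
pub fn split_fee(fee: f64, position_qty: f64, configured_qty: f64) -> (f64, f64) {
    let total = position_qty + configured_qty;
    (fee * position_qty / total, fee * configured_qty / total)
}

/// Quote-currency notional of an ATR-sized quantity `numerator / atr`, bought at `price`.
pub fn atr_sized_notional(numerator: f64, atr: f64, price: f64) -> f64 {
    (numerator / atr) * price
}

/// Largest ATR-sizing numerator whose notional still fits within `balance`.
pub fn max_atr_numerator(balance: f64, atr: f64, price: f64) -> f64 {
    balance * atr / price
}
```

The prompt's figures of ATR ≈ $2000 and 0.5 BTC ≈ $40k imply a BTC price of about $80,000. At that price, `atr_sized_notional(1000.0, 2000.0, 80_000.0)` comes to $40,000, and `max_atr_numerator(10_000.0, 2000.0, 80_000.0)` comes to 250. The prompt's "N ≤ 200" therefore leaves some headroom on a $10k account.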
--- a/src/swym.rs
+++ b/src/swym.rs
@@ -49,6 +49,37 @@ pub struct CandleCoverage {
    pub coverage_pct: Option<f64>,
 }

+/// One element of the JSON array returned by `GET /api/v1/paper-runs/compare?ids=...`.
+#[derive(Debug, Deserialize)]
+pub struct RunMetricsSummary {
+    pub id: Uuid,
+    pub status: String,
+    pub candle_interval: Option<String>,
+    pub total_positions: Option<u32>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub win_rate: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub profit_factor: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub net_pnl: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub sharpe_ratio: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub sortino_ratio: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub calmar_ratio: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub max_drawdown: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub pnl_return: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub avg_win: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub avg_loss: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub avg_hold_duration_secs: Option<f64>,
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct BacktestResult {
    pub run_id: Uuid,
@@ -62,6 +93,15 @@ pub struct BacktestResult {
    pub total_pnl: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
+    pub sortino_ratio: Option<f64>,
+    pub calmar_ratio: Option<f64>,
+    pub max_drawdown: Option<f64>,
+    pub pnl_return: Option<f64>,
+    pub avg_win: Option<f64>,
+    pub avg_loss: Option<f64>,
+    pub max_win: Option<f64>,
+    pub max_loss: Option<f64>,
+    pub avg_hold_duration_secs: Option<f64>,
    pub total_fees: Option<f64>,
    pub avg_bars_in_trade: Option<f64>,
    pub error_message: Option<String>,
@@ -89,6 +129,15 @@ impl BacktestResult {
        let net_pnl = summary.and_then(|s| parse_number(&s["net_pnl"]));
        let total_pnl = summary.and_then(|s| parse_number(&s["total_pnl"]));
        let sharpe_ratio = summary.and_then(|s| parse_number(&s["sharpe_ratio"]));
+        let sortino_ratio = summary.and_then(|s| parse_number(&s["sortino_ratio"]));
+        let calmar_ratio = summary.and_then(|s| parse_number(&s["calmar_ratio"]));
+        let max_drawdown = summary.and_then(|s| parse_number(&s["max_drawdown"]));
+        let pnl_return = summary.and_then(|s| parse_number(&s["pnl_return"]));
+        let avg_win = summary.and_then(|s| parse_number(&s["avg_win"]));
+        let avg_loss = summary.and_then(|s| parse_number(&s["avg_loss"]));
+        let max_win = summary.and_then(|s| parse_number(&s["max_win"]));
+        let max_loss = summary.and_then(|s| parse_number(&s["max_loss"]));
+        let avg_hold_duration_secs = summary.and_then(|s| parse_number(&s["avg_hold_duration_secs"]));
        let total_fees = summary.and_then(|s| parse_number(&s["total_fees"]));

        Self {
@@ -103,6 +152,15 @@ impl BacktestResult {
            total_pnl,
            net_pnl,
            sharpe_ratio,
+            sortino_ratio,
+            calmar_ratio,
+            max_drawdown,
+            pnl_return,
+            avg_win,
+            avg_loss,
+            max_win,
+            max_loss,
+            avg_hold_duration_secs,
            total_fees,
            avg_bars_in_trade: None,
            error_message: resp.error_message.clone(),
@@ -128,6 +186,12 @@ impl BacktestResult {
            self.net_pnl.unwrap_or(0.0),
            self.sharpe_ratio.unwrap_or(0.0),
        );
+        if let Some(sortino) = self.sortino_ratio {
+            s.push_str(&format!(" sortino={:.2}", sortino));
+        }
+        if let Some(dd) = self.max_drawdown {
+            s.push_str(&format!(" max_dd={:.1}%", dd * 100.0));
+        }
        if self.total_positions.unwrap_or(0) == 0 {
            if let Some(audit) = &self.condition_audit_summary {
                let audit_str = format_audit_summary(audit);
@@ -160,6 +224,15 @@ fn parse_number(v: &Value) -> Option<f64> {
    if f.abs() > 1e20 { None } else { Some(f) }
 }

+/// Serde deserializer for `Option<f64>` that accepts both JSON numbers and decimal strings.
+fn deserialize_opt_number<'de, D>(deserializer: D) -> Result<Option<f64>, D::Error>
+where
+    D: serde::Deserializer<'de>,
+{
+    let v = Value::deserialize(deserializer)?;
+    Ok(parse_number(&v))
+}
+
 /// Render a condition_audit_summary Value into a compact one-line string.
 ///
 /// Handles the primary shape from the swym API:
@@ -295,6 +368,7 @@ impl SwymClient {
        instrument_symbol: &str,
        base_asset: &str,
        quote_asset: &str,
+        market_kind: &str,
        strategy: &Value,
        starts_at: &str,
        finishes_at: &str,
@@ -312,7 +386,7 @@ impl SwymClient {
                    "name_exchange": instrument_symbol,
                    "underlying": { "base": base_asset, "quote": quote_asset },
                    "quote": "underlying_quote",
-                    "kind": "spot"
+                    "kind": market_kind
                },
                "execution": {
                    "mocked_exchange": instrument_exchange,
@@ -386,6 +460,25 @@ impl SwymClient {
        }
    }

+    /// Fetch metrics for multiple completed runs via the compare endpoint.
+    /// Batches requests in groups of 50 (API maximum).
+    pub async fn compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>> {
+        let mut results = Vec::new();
+        for chunk in run_ids.chunks(50) {
+            let ids = chunk.iter().map(|id| id.to_string()).collect::<Vec<_>>().join(",");
+            let url = format!("{}/paper-runs/compare?ids={}", self.base_url, ids);
+            let resp = self.client.get(&url).send().await.context("compare runs request")?;
+            if !resp.status().is_success() {
+                let status = resp.status();
+                let body = resp.text().await.unwrap_or_default();
+                anyhow::bail!("compare runs {status}: {body}");
+            }
+            let mut batch: Vec<RunMetricsSummary> = resp.json().await.context("parse compare response")?;
+            results.append(&mut batch);
+        }
+        Ok(results)
+    }
+
    /// Fetch condition audit summary for a completed run.
    pub async fn condition_audit(&self, run_id: Uuid) -> Result<Value> {
        let url = format!("{}/paper-runs/{}/condition-audit", self.base_url, run_id);
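`compare_runs` splits run ids into groups of 50 before calling the compare endpoint. The URL-building half of that loop can be isolated as a pure function, which makes the batching testable without a server. This is a sketch. It uses `String` ids instead of `uuid::Uuid` to stay dependency-free, and `compare_urls` is a hypothetical helper, not part of `SwymClient`.

```rust
/// Batch size used by `compare_runs` (documented there as the API maximum).
const COMPARE_BATCH: usize = 50;

/// Build one compare URL per batch of at most `COMPARE_BATCH` run ids, in input order.
/// Hypothetical pure helper; `base_url` is assumed to already include the API prefix.
pub fn compare_urls(base_url: &str, run_ids: &[String]) -> Vec<String> {
    run_ids
        .chunks(COMPARE_BATCH)
        .map(|chunk| format!("{}/paper-runs/compare?ids={}", base_url, chunk.join(",")))
        .collect()
}
```

With 120 ids this yields three URLs carrying 50, 50, and 20 ids. An empty slice yields no requests at all, matching the loop in `compare_runs`, which also sends nothing in that case.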
Author	SHA1	Message	Date
rob thijssen	11fe79ed25	docs: add CLAUDE.md for future Claude Code instances Add comprehensive guidance document covering architecture, data flows, development commands, DSL schema reference, and common patterns for working with the scout strategy search agent. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-03-12 05:38:28 +02:00
rob thijssen	fcb9a2f553	chore: attempt dedupe guidance in prompt	2026-03-11 18:15:24 +02:00
rob thijssen	75c95f7935	feat: add triple-Supertrend consensus flip as strategy family 7 Adds awareness of the multi-Supertrend any_of flip pattern (based on the reference strategy at swym/assets/reference/supertrend-triple.json, itself a DSL port of the popular TradingView triple-Supertrend script). - prompts.rs: add strategy family 7 (Supertrend consensus flip) with guidance on any_of vs all_of, period/multiplier tuning, and the always-in-market / reverse-as-stop-loss trade-off - prompts.rs: add risk management exception for always-in-market flip strategies (reverse: true means the opposite signal is the stop) - prompts.rs: add Example 7 — correctly gated 2-rule triple-Supertrend flip with position state guards to prevent unintended scale-ins Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:40:15 +02:00
rob thijssen	6601da21cc	feat: add reverse flag and symmetric short support to DSL Update scout's schema and system prompt to reflect two upstream swym changes from 2026-03-10: - b535207: symmetric short quantity fix — buy-to-cover now correctly uses position_qty (executor was broken; scout's DSL patterns were already correct and will now work as intended) - 6f58949: reverse flag on Action — new optional "reverse": true field that submits position_qty + configured_qty when an opposite position is open, closing it and opening a new one in the opposite direction in a single order (flip-through-zero) Changes: - dsl-schema.json: add "reverse" boolean to Action definition - prompts.rs: add "Reverse / flip-through-zero" capability section and Example 6 (2-rule EMA flip strategy) to FUTURES_SHORT_EXAMPLES Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:28:54 +02:00
rob thijssen	8de3ae5fe1	Add Binance Futures support (long and short) - config.rs: add Instrument::market_kind() mapping exchange name to "spot"/"futures_um"/"futures_cm", and is_futures() helper - swym.rs: submit_backtest() accepts market_kind param; passes it as instrument.kind in the RunConfig instead of hardcoding "spot" - agent.rs: derive has_futures from instruments; pass to both system_prompt() and initial_prompt() - prompts.rs: - system_prompt() accepts has_futures; injects FUTURES_SHORT_EXAMPLES (Example 5: EMA trend-following short with ATR stop) when true - Rewrite position-state anti-patterns to cover both spot (long-only) and futures (long + short) semantics - initial_prompt() accepts has_futures; labels market as "spot" or "futures" and passes flag through to starting instruction context Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:13:06 +02:00
rob thijssen	a435d3a99d	Define concrete 'promising' threshold and enforce indicator diversity in ledger-informed prompt - Replace vague "promising metrics" with avg_sharpe >= 0.5 AND >= 10 trades per instrument - Add indicator-family diversity rule: if all prior strategies share the same core indicator (e.g. all Bollinger Bands), the first strategy of the new run must use a different family - Give explicit examples of alternative families: MACD, ATR breakout, volume spike, donchian channel breakout, stochastic oscillator - Extend the no-repeat ban to strategies with fewer than 5 trades per instrument Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 14:21:55 +02:00
rob thijssen	b476199de8	Fix ledger context being overridden by prescriptive initial prompt The 13:20:03 run showed the ledger context was counterproductive: the initial prompt's "Start with a multi-timeframe trend-following approach" instruction caused the model to ignore the prior summary and repeat EMA50-based strategies that produced 0 trades across all 15 iterations. Two fixes: - When prior_summary is present, replace the prescriptive starting instruction with one that explicitly defers to the ledger: refine the best prior strategy or try a different approach if all prior results were poor. Prevents the fixed instruction from overriding the context. - Cap ledger entries per unique strategy at 3. A strategy repeated across 11 iterations would contribute 33 entries, drowning out other approaches in the prior summary. 3 entries (one per instrument) is sufficient. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:54:35 +02:00
rob thijssen	d76d3b9061	Use write_all for ledger entries to improve concurrent-write safety writeln!(f, ...) makes two syscalls (data + newline) which can interleave between concurrent processes even with O_APPEND. Serialise entry to bytes and append the newline before write_all() so the entire entry lands in a single write() syscall, which O_APPEND makes atomic on Linux local filesystems for typical entry sizes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:12:38 +02:00
rob thijssen	0945c94cc8	Add --ledger-file arg for explicit ledger path control Defaults to <output_dir>/run_ledger.jsonl as before. Pass --ledger-file to read from (and write to) a specific ledger, enabling multiple ledger files to seed different search campaigns or merge results from separate runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:10:22 +02:00
rob thijssen	a0316be798	Add cross-run learning via run ledger and compare endpoint Persist strategy + run_id to results/run_ledger.jsonl after each backtest. On startup, load the ledger, fetch metrics via the new compare endpoint (batched in groups of 50), group by strategy, rank by avg Sharpe, and inject a summary of the top 5 and worst 3 prior strategies into the iteration-1 prompt. Also consumes the enriched result_summary fields from swym patch e47c18: sortino_ratio, calmar_ratio, max_drawdown, pnl_return, avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs. Sortino and max_drawdown are appended to summary_line() when present. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:05:39 +02:00
rob thijssen	609d64587b	docs: cross-run learnings plan	2026-03-10 13:04:13 +02:00
rob thijssen	6692bdb490	Prompt: fix method vs kind confusion causing 11/15 validation failures The 12:11:39 run shows the model using {"method":"position_quantity"} for every sell rule despite the existing CRITICAL note. Root cause: a contradictory anti-pattern ("Never use an expression object for quantity") was fighting the correct guidance, and the method/kind distinction wasn't emphatic enough. - Expand the CRITICAL note to explicitly contrast: buy uses SizingMethod ("method"), sell uses Expr ("kind") — they are different object types. - Remove the contradictory "never use an expression object" anti-pattern which conflicted with position_quantity and SizingMethod objects. - Add a final anti-pattern bullet as a second reminder of the same mistake. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:24:57 +02:00
rob thijssen	36689e3fbb	Prompt: fix field+offset kind omission and add interval guidance Two gaps revealed by the 2026-03-10T11:42:49 run: - Iterations 11-15 all failed with "missing field 'kind'" when the model wrote {"field":"volume","offset":-1} without the required "kind":"field". Expand the existing kind-required note with explicit offset examples. - Iteration 10 switched to 15m unprompted and got sharpe=-0.41 from overtrading. Add anti-pattern note: don't change interval when sharpe is negative — fix the signal logic instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:09:18 +02:00
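The commit messages above describe how prior runs feed the first prompt. Ledger entries are grouped by strategy, with at most 3 per strategy, and ranked by average Sharpe. The summary keeps the top 5 and the worst 3, and strategies are judged against the `avg_sharpe >= 0.5` and 10-trades-per-instrument thresholds. A minimal sketch of that pipeline follows. It assumes that `Entry` already joins a ledger line with its compare-endpoint metrics, that the cap keeps the first 3 entries seen, and that "per instrument" means the minimum trade count across a strategy's entries. All type and function names are hypothetical.

```rust
use std::collections::HashMap;

/// A ledger entry joined with its compare-endpoint metrics (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Entry {
    pub strategy_key: String,
    pub sharpe: f64,
    pub trades: u32,
}

/// Aggregated view of one strategy across its (capped) ledger entries.
#[derive(Debug, Clone, PartialEq)]
pub struct StrategyStats {
    pub key: String,
    pub avg_sharpe: f64,
    pub min_trades: u32,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Promising,
    DoNotRepeat,
    Neutral,
}

const MAX_ENTRIES_PER_STRATEGY: usize = 3;

/// Group entries by strategy (keeping the first 3 seen), average Sharpe, rank best-first.
pub fn rank_strategies(entries: &[Entry]) -> Vec<StrategyStats> {
    let mut groups: HashMap<&str, Vec<&Entry>> = HashMap::new();
    for e in entries {
        let group = groups.entry(e.strategy_key.as_str()).or_default();
        if group.len() < MAX_ENTRIES_PER_STRATEGY {
            group.push(e);
        }
    }
    let mut stats: Vec<StrategyStats> = groups
        .into_iter()
        .map(|(key, group)| StrategyStats {
            key: key.to_string(),
            // Every group holds at least one entry, so the division is safe.
            avg_sharpe: group.iter().map(|e| e.sharpe).sum::<f64>() / group.len() as f64,
            min_trades: group.iter().map(|e| e.trades).min().unwrap_or(0),
        })
        .collect();
    stats.sort_by(|a, b| b.avg_sharpe.total_cmp(&a.avg_sharpe));
    stats
}

/// Thresholds from the initial prompt: promising needs avg_sharpe >= 0.5 and >= 10 trades
/// per instrument; fewer than 5 trades per instrument means "do not repeat".
pub fn classify(s: &StrategyStats) -> Verdict {
    if s.min_trades < 5 {
        Verdict::DoNotRepeat
    } else if s.avg_sharpe >= 0.5 && s.min_trades >= 10 {
        Verdict::Promising
    } else {
        Verdict::Neutral
    }
}

/// Top 5 and worst 3 for the prior-results summary, never listing a strategy twice.
pub fn summary_slices(ranked: &[StrategyStats]) -> (&[StrategyStats], &[StrategyStats]) {
    let top_end = ranked.len().min(5);
    let worst_start = ranked.len().saturating_sub(3).max(top_end);
    (&ranked[..top_end], &ranked[worst_start..])
}
```

`f64::total_cmp` gives a total order, so the sort needs no `unwrap` on `partial_cmp`. The worst-3 slice starts no earlier than the end of the top-5 slice, so a ledger with fewer than 8 strategies never lists the same one in both sections. Whether scout's own summary handles that overlap the same way is an assumption.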