docs: add CLAUDE.md for future Claude Code instances

Add comprehensive guidance document covering architecture, data flows, development commands, DSL schema reference, and common patterns for working with the scout strategy search agent.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
chore: attempt dedupe guidance in prompt
2026-03-12 05:38:28 +02:00 · 2026-03-11 18:15:24 +02:00 · 2026-03-10 18:40:15 +02:00 · 2026-03-10 18:28:54 +02:00 · 2026-03-10 18:13:06 +02:00 · 2026-03-10 14:21:55 +02:00
8 changed files with 1515 additions and 96 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,116 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+`scout` is an autonomous strategy search agent for the [swym](https://swym.rs) backtesting platform. It runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter overfitting.
+
+## Architecture
+
+### Core Modules
+
+- **`agent.rs`** - Main orchestration logic. Contains the `run()` function that implements the search loop, strategy validation, and learning feedback. Key types: `IterationRecord` and `LedgerEntry`. Key functions: `validate_strategy()` and `diagnose_history()`.
+- **`claude.rs`** - Claude API client. Handles model communication, JSON extraction from responses, thinking-block handling for R1-family models, and context length detection from the server's model metadata.
+- **`swym.rs`** - Swym backtesting API client. Wraps all swym API calls: candle coverage, strategy validation, backtest submission, polling, and metrics retrieval.
+- **`prompts.rs`** - System and user prompts for the LLM. Generates the DSL schema context and iteration-specific prompts with prior results.
+- **`config.rs`** - CLI argument parsing and configuration. Defines `Cli` struct with all command-line flags and environment variables.
+
+### Key Data Flows
+
+1. **Strategy Generation**: `agent::run()` → `claude::chat()` → extracts JSON strategy → validates → submits to swym
+2. **Backtest Execution**: `swym::submit_backtest()` → `swym::poll_until_done()` → `BacktestResult::from_response()`
+3. **Learning Loop**: `load_prior_summary()` reads `run_ledger.jsonl` → fetches metrics via `swym::compare_runs()` → formats compact summary → appends to iteration prompt
+4. **OOS Validation**: Promising in-sample results trigger re-backtest on held-out data → strategies passing both phases saved to `validated_*.json`
+
+### Important Patterns
+
+- **Deduplication**: Within a run, strategies are deduplicated by their full JSON serialization, tracked in a HashMap (`tested_strategies`) that maps each serialized strategy to the first iteration that tested it. Identical strategies are skipped with a warning.
+- **Validation**: Two-stage validation—client-side (structure, quantity parsing, exit rules) and server-side (DSL schema validation via `/strategies/validate`).
+- **Context Management**: Conversation history is trimmed to keep last 6 messages (3 exchanges) to avoid token limits. Prior results are summarized in the next prompt.
+- **Error Recovery**: Consecutive failures (3×) trigger abort. Transient API errors are logged but don't stop the run.
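+- **Ledger Persistence**: Each backtest writes a `LedgerEntry` to `run_ledger.jsonl` for cross-run learning. Each entry is appended as one buffer to a file opened in append mode (O_APPEND), so concurrent instances should not interleave typical-size entries on Linux local filesystems.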
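
The context-management pattern above can be sketched as a small helper. This is a minimal illustration rather than the code in `agent.rs`: the `Message` struct and the `trim_history` name are assumptions made for the example, and the real `Message` type lives in `claude.rs`.

```rust
/// Hypothetical stand-in for the `Message` type defined in `claude.rs`.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Keep only the most recent `keep` messages, dropping the oldest first.
/// With `keep = 6` this retains the last 3 user/assistant exchanges.
pub fn trim_history(conversation: &mut Vec<Message>, keep: usize) {
    if conversation.len() > keep {
        let excess = conversation.len() - keep;
        // `drain` removes the front of the vector in place.
        conversation.drain(..excess);
    }
}
```

Calling `trim_history(&mut conversation, 6)` after each exchange bounds the history, while older results still reach the model through the summarized prompt.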
+
+## Development Commands
+
+```bash
+# Build
+cargo build
+
+# Run with default config
+cargo run
+
+# Run with custom flags
+cargo run -- \
+  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
+  --max-iterations 50 \
+  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC
+
+# Run tests
+cargo test
+
+# Run with debug logging
+RUST_LOG=debug cargo run
+```
+
+## DSL Schema
+
+Strategies are JSON objects with the schema defined in `src/dsl-schema.json`. The DSL uses a rule-based structure with `when` (entry conditions) and `then` (exit actions). Key concepts:
+
+- **Indicators**: `{"kind":"indicator","name":"...","params":{...}}`
+- **Comparators**: `{"kind":"compare","lhs":"...","op":"...","rhs":"..."}`
+- **Functions**: `{"kind":"func","name":"...","args":[...]}`
+
+See `src/dsl-schema.json` for the complete schema and `prompts.rs::system_prompt()` for how it's presented to Claude.
+
+## Model Families
+
+The code supports different model families via the `ModelFamily` enum in `config.rs`:
+
+- **Sonnet**: Standard model, no special handling
+- **Opus**: Larger context, higher cost
+- **R1**: Has thinking blocks (`<think>...</think>`) that need to be stripped before JSON extraction
+
+Context length is auto-detected from the server's `/api/v1/models` endpoint (LM Studio) or `/v1/models/{id}` (OpenAI-compatible). Output token budget is set to half the context window.
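
To make the R1 handling and the output budget concrete, here is a minimal sketch. The function names `strip_think_block` and `output_budget` are hypothetical, and the real logic in `claude.rs` may handle more cases (for example, several thinking blocks or an unterminated one).

```rust
/// Remove the first `<think>...</think>` block, if present, and trim whitespace.
/// Assumes at most one well-formed thinking block per response.
pub fn strip_think_block(response: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";
    match (response.find(OPEN), response.find(CLOSE)) {
        (Some(start), Some(end)) if start < end => {
            let mut out = String::with_capacity(response.len());
            out.push_str(&response[..start]);
            out.push_str(&response[end + CLOSE.len()..]);
            out.trim().to_string()
        }
        // No complete block: return the response unchanged apart from trimming.
        _ => response.trim().to_string(),
    }
}

/// Output token budget: half the context window, leaving the rest for input.
pub fn output_budget(context_length: u32) -> u32 {
    context_length / 2
}
```

The remaining text can then be handed to JSON extraction without the reasoning chain getting in the way.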
+
+## Output Files
+
+- `strategy_001.json` through `strategy_NNN.json` - Every strategy attempted (full JSON)
+- `validated_001.json` through `validated_NNN.json` - Strategies that passed OOS validation (includes in-sample + OOS metrics)
+- `best_strategy.json` - Strategy with highest average Sharpe across instruments
+- `run_ledger.jsonl` - Persistent record of all backtests for learning across runs
+
+## Common Tasks
+
+### Adding a new CLI flag
+
+1. Add field to `Cli` struct in `config.rs`
+2. Add clap derive attribute with `#[arg(short, long, env = "VAR_NAME")]`
+3. Use the flag in `agent::run()` via `cli.flag_name`
+
+### Extending the DSL
+
+1. Update `src/dsl-schema.json` with new expression kinds
+2. Add validation logic in `validate_strategy()` if needed
+3. Update prompts in `prompts.rs` to guide the model
+
+### Modifying the learning loop
+
+1. Edit `load_prior_summary()` in `agent.rs` to change how prior results are formatted
+2. Adjust `diagnose_history()` to add new diagnostics or change convergence detection
+3. Update `prompts.rs::iteration_prompt()` to incorporate new information
+
+### Adding new validation checks
+
+Add to `validate_strategy()` in `agent.rs`. Returns `(hard_errors, warnings)` where hard errors block submission and warnings are logged but allow the backtest to proceed.
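
The `(hard_errors, warnings)` contract might look roughly like the sketch below. The `StrategyDraft` struct, the `validate_draft` name, and the specific checks are illustrative assumptions. The real `validate_strategy()` inspects the strategy JSON and may classify these conditions differently.

```rust
/// Hypothetical, simplified view of a strategy used only for this example.
pub struct StrategyDraft {
    pub rule_count: usize,
    pub has_exit_rule: bool,
}

/// Returns `(hard_errors, warnings)`.
/// Hard errors block submission; warnings are logged and the backtest proceeds.
pub fn validate_draft(s: &StrategyDraft) -> (Vec<String>, Vec<String>) {
    let mut hard_errors = Vec::new();
    let mut warnings = Vec::new();

    if s.rule_count == 0 {
        hard_errors.push("strategy has no rules".to_string());
    }
    if !s.has_exit_rule {
        // Illustrative check: treated as blocking in this sketch.
        hard_errors.push("strategy has no exit rule".to_string());
    }
    if s.rule_count > 10 {
        // Illustrative threshold, not taken from the codebase.
        warnings.push(format!("strategy has {} rules; consider simplifying", s.rule_count));
    }

    (hard_errors, warnings)
}
```

A caller would skip the backtest when `hard_errors` is non-empty and otherwise log `warnings` before submitting.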
+
+## Testing Strategy
+
+The codebase uses `anyhow` for error handling and `tracing` for logging. Key test areas:
+
+- Strategy JSON extraction from various response formats
+- Context length detection from LM Studio/OpenAI endpoints
+- Ledger entry serialization/deserialization
+- Backtest result parsing from swym API responses
+- Deduplication logic
+- Convergence detection in `diagnose_history()`
--- a/docs/plan/cross-run-learning.md
+++ b/docs/plan/cross-run-learning.md
@@ -0,0 +1,133 @@
+# Plan: Cross-run learning via run ledger and compare endpoint
+
+## Context
+
+Scout currently starts from scratch every run — no memory of prior iterations. The upstream
+patch `e47c18` adds:
+1. **Enriched `result_summary`**: sortino_ratio, calmar_ratio, max_drawdown, pnl_return,
+   avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
+2. **Compare endpoint**: `GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,...` returns
+   `RunMetricsSummary` for up to 50 runs in one call
+
+Goal: persist enough state across runs so that iteration 1 of a new run starts informed by
+all previous runs' strategies and outcomes.
+
+## Changes
+
+### 1. Run ledger — persist strategy + run_id per backtest (`src/agent.rs`)
+
+After each successful `run_single_backtest`, append a JSONL entry to `{output_dir}/run_ledger.jsonl`:
+
+```json
+{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
+```
+
+One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is
+duplicated across instrument entries for the same iteration — this keeps the format flat and
+self-contained.
+
+Use `OpenOptions::new().append(true).create(true)`. No locking is needed since scout is single-threaded.
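
A minimal sketch of the append step, assuming the entry has already been serialized to a single-line JSON string. The `append_jsonl` name is hypothetical, and only the standard library is used.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one pre-serialized JSON object as a single line to a JSONL file.
/// `json_line` is assumed to contain no embedded newlines.
pub fn append_jsonl(path: &Path, json_line: &str) -> io::Result<()> {
    // Build the full line first so the entry and its newline are written together.
    let mut bytes = Vec::with_capacity(json_line.len() + 1);
    bytes.extend_from_slice(json_line.as_bytes());
    bytes.push(b'\n');

    let mut file = OpenOptions::new()
        .append(true) // never truncate: earlier runs' entries are preserved
        .create(true) // first run: create the ledger if it is missing
        .open(path)?;
    file.write_all(&bytes)
}
```

Because the file is opened in append mode on every call, a second run adds lines after the first run's entries instead of overwriting them.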
+
+### 2. Load prior runs on startup (`src/agent.rs`)
+
+At the top of `run()`, before the iteration loop:
+1. Read `run_ledger.jsonl` if it exists (ignore if missing — first run)
+2. Collect all `run_id`s
+3. Call `swym.compare_runs(&run_ids)` (batching in groups of 50)
+4. Join metrics back to strategies from the ledger
+5. Group by strategy (entries with the same strategy JSON share an iteration)
+6. Rank by average sharpe across instruments
+7. Build a `prior_results_summary: Option<String>` for the initial prompt
+
+### 3. Compare endpoint client (`src/swym.rs`)
+
+Add `RunMetricsSummary` struct:
+
+```rust
+pub struct RunMetricsSummary {
+    pub id: Uuid,
+    pub status: String,
+    pub candle_interval: Option<String>,
+    pub total_positions: Option<u32>,
+    pub win_rate: Option<f64>,
+    pub profit_factor: Option<f64>,
+    pub net_pnl: Option<f64>,
+    pub sharpe_ratio: Option<f64>,
+    pub sortino_ratio: Option<f64>,
+    pub calmar_ratio: Option<f64>,
+    pub max_drawdown: Option<f64>,
+    pub pnl_return: Option<f64>,
+    pub avg_win: Option<f64>,
+    pub avg_loss: Option<f64>,
+    pub max_win: Option<f64>,
+    pub max_loss: Option<f64>,
+    pub avg_hold_duration_secs: Option<f64>,
+}
+```
+
+Add `SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>`:
+- `GET {base_url}/paper-runs/compare?ids={comma_separated}`
+- Parse JSON array response using `parse_number()` for decimal strings
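
The batching mentioned in step 2 could be sketched as follows. This only builds the request URLs. The `compare_urls` name is hypothetical, and run ids are taken as plain strings here so the example needs no external crates.

```rust
/// Maximum number of ids per compare request, per the endpoint description above.
const COMPARE_BATCH_SIZE: usize = 50;

/// Build one compare URL per batch of at most 50 run ids.
pub fn compare_urls(base_url: &str, run_ids: &[String]) -> Vec<String> {
    let base = base_url.trim_end_matches('/');
    run_ids
        .chunks(COMPARE_BATCH_SIZE)
        .map(|batch| format!("{base}/paper-runs/compare?ids={}", batch.join(",")))
        .collect()
}
```

A ledger with 120 entries would produce three requests (50, 50, and 20 ids), whose responses can be concatenated before joining metrics back to strategies.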
+
+### 4. Enrich `BacktestResult` with new fields (`src/swym.rs`)
+
+Add to `BacktestResult`: `sortino_ratio`, `calmar_ratio`, `max_drawdown`, `pnl_return`,
+`avg_win`, `avg_loss`, `max_win`, `max_loss`, `avg_hold_duration_secs`.
+
+Parse all in `from_response()` via existing `parse_number()`.
+
+Update `summary_line()` to include `max_dd={:.1}%` and `sortino={:.2}` when present —
+these two are the most useful additions for the model's reasoning.
+
+### 5. Prior-results-aware initial prompt (`src/prompts.rs`)
+
+Modify `initial_prompt()` to accept `prior_summary: Option<&str>`.
+
+When present, insert before the "Design a trading strategy" instruction:
+
+```
+## Learnings from {N} prior backtests across {M} strategies
+
+{top 5 strategies ranked by avg sharpe, each showing:}
+- Interval, rule count, avg metrics across instruments
+- One-line description of the strategy approach (extracted from rule comments)
+- Full strategy JSON for the top 1-2
+
+{compact table of all prior strategies' avg metrics}
+
+Use these insights to avoid repeating failed approaches and to build on what worked.
+```
+
+Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs,
+show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
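
One detail worth pinning down is what happens when there are only six or seven strategies, where "top 5" and "bottom 3" would overlap. The sketch below, with a hypothetical `summary_ranges` helper, picks index ranges over a best-first sorted list so that no strategy is listed twice.

```rust
use std::ops::Range;

/// Given `n` strategies sorted best-first, return the index ranges to show:
/// the first `top_n`, and up to `worst_n` from the tail that are not already
/// covered by the top range.
pub fn summary_ranges(n: usize, top_n: usize, worst_n: usize) -> (Range<usize>, Range<usize>) {
    let top_end = n.min(top_n);
    // Clamp the start of the worst range so it never reaches back into the top range.
    let worst_start = n.saturating_sub(worst_n).max(top_end);
    (0..top_end, worst_start..n)
}
```

With 20 strategies this yields indices 0 to 4 and 17 to 19. With 6 strategies the worst range shrinks to the single index 5.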
+
+### 6. Ledger entry struct (`src/agent.rs`)
+
+```rust
+#[derive(Serialize, Deserialize)]
+struct LedgerEntry {
+    run_id: Uuid,
+    instrument: String,
+    candle_interval: String,
+    strategy: Value,
+    timestamp: String,
+}
+```
+
+## Files to modify
+
+- `src/swym.rs` — `RunMetricsSummary` struct, `compare_runs()` method, enrich `BacktestResult`
+  with new fields, update `summary_line()`
+- `src/agent.rs` — `LedgerEntry` struct, append-to-ledger after backtest, load-ledger-on-startup,
+  call compare endpoint, build prior summary, pass to initial prompt
+- `src/prompts.rs` — `initial_prompt()` accepts optional prior summary
+
+## Verification
+
+1. `cargo build --release`
+2. Run once → confirm `run_ledger.jsonl` is created with entries
+3. Run again → confirm:
+   - Ledger is loaded, compare endpoint is called
+   - Iteration 1 prompt includes prior results summary (visible at debug log level)
+   - New entries are appended (not overwritten)
+4. Check that enriched metrics (sortino, max_drawdown) appear in summary_line output
--- a/src/agent.rs
+++ b/src/agent.rs
@@ -1,14 +1,26 @@
+use std::io::Write as IoWrite;
 use std::path::Path;
 use std::time::Duration;

 use anyhow::{Context, Result};
+use serde::{Deserialize, Serialize};
 use serde_json::Value;
 use tracing::{debug, error, info, warn};
+use uuid::Uuid;

 use crate::claude::{self, ClaudeClient, Message};
 use crate::config::{Cli, Instrument};
 use crate::prompts;
-use crate::swym::{BacktestResult, SwymClient};
+use crate::swym::{BacktestResult, RunMetricsSummary, SwymClient};
+
+/// Persistent record of a single completed backtest, written to the run ledger.
+#[derive(Debug, Serialize, Deserialize)]
+struct LedgerEntry {
+    run_id: Uuid,
+    instrument: String,
+    candle_interval: String,
+    strategy: Value,
+}

 /// A single iteration's record: strategy + results across instruments.
 #[derive(Debug)]
@@ -132,7 +144,8 @@ pub async fn run(cli: &Cli) -> Result<()> {

    // Init clients
    let swym = SwymClient::new(&cli.swym_url)?;
-    let claude = ClaudeClient::new(&cli.anthropic_key, &cli.anthropic_url, &cli.model);
+    let mut claude = ClaudeClient::new(&cli.anthropic_key, &cli.anthropic_url, &cli.model);
+    claude.apply_server_limits().await;

    // Check candle coverage for all instruments
    info!(
@@ -189,13 +202,24 @@ pub async fn run(cli: &Cli) -> Result<()> {

    // Load DSL schema for the system prompt
    let schema = include_str!("dsl-schema.json");
-    let system = prompts::system_prompt(schema);
+    let has_futures = instruments.iter().any(|i| i.is_futures());
+    let system = prompts::system_prompt(schema, claude.family(), has_futures);
+    info!("model family: {}", claude.family().name());
+
+    // Resolve ledger path: explicit --ledger-file takes precedence, else <output_dir>/run_ledger.jsonl
+    let ledger_path = cli.ledger_file.clone().unwrap_or_else(|| cli.output_dir.join("run_ledger.jsonl"));
+    info!("ledger: {}", ledger_path.display());
+
+    // Load prior runs from ledger and build cross-run context for iteration 1
+    let prior_summary = load_prior_summary(&ledger_path, &swym).await;

    // Agent state
    let mut history: Vec<IterationRecord> = Vec::new();
    let mut conversation: Vec<Message> = Vec::new();
    let mut best_strategy: Option<(f64, Value)> = None; // (avg_sharpe, strategy)
    let mut consecutive_failures = 0u32;
+    // Deduplication: track canonical strategy JSON → first iteration it was tested.
+    let mut tested_strategies: std::collections::HashMap<String, u32> = std::collections::HashMap::new();

    let instrument_names: Vec<String> = instruments.iter().map(|i| i.symbol.clone()).collect();

@@ -204,7 +228,7 @@ pub async fn run(cli: &Cli) -> Result<()> {

        // Build the user prompt
        let user_msg = if iteration == 1 {
-            prompts::initial_prompt(&instrument_names, &available_intervals)
+            prompts::initial_prompt(&instrument_names, &available_intervals, prior_summary.as_deref(), has_futures)
        } else {
            let results_text = history
                .iter()
@@ -263,14 +287,21 @@ pub async fn run(cli: &Cli) -> Result<()> {
            content: response_text.clone(),
        });

+        // Log R1 reasoning chain at debug level so it can be inspected when
+        // the model makes repeated DSL mistakes (run with RUST_LOG=debug).
+        if let Some(thinking) = claude::extract_think_content(&response_text) {
+            debug!("R1 thinking ({} bytes):\n{}", thinking.len(), thinking);
+        }
+
        // Extract strategy JSON
        let strategy = match claude::extract_json(&response_text) {
            Ok(s) => s,
            Err(e) => {
-                warn!("failed to extract strategy JSON: {e}");
+                warn!("failed to extract strategy JSON: {e:#}");
                warn!(
-                    "raw response: {}",
-                    &response_text[..response_text.len().min(500)]
+                    "raw response ({} bytes): {}",
+                    response_text.len(),
+                    response_text.chars().take(800).collect::<String>()
                );
                consecutive_failures += 1;
                if consecutive_failures >= 3 {
@@ -316,7 +347,7 @@ pub async fn run(cli: &Cli) -> Result<()> {
        let strat_path = cli.output_dir.join(format!("strategy_{iteration:03}.json"));
        std::fs::write(&strat_path, serde_json::to_string_pretty(&strategy)?)?;

-        // Hard validation errors: skip the expensive backtest and give immediate feedback.
+        // Hard client-side validation errors: skip without hitting the API.
        if !hard_errors.is_empty() {
            let record = IterationRecord {
                iteration,
@@ -329,6 +360,61 @@ pub async fn run(cli: &Cli) -> Result<()> {
            continue;
        }

+        // Server-side validation: call /strategies/validate to get ALL DSL errors
+        // at once before submitting a backtest. This avoids burning a full backtest
+        // round-trip on a structurally invalid strategy and gives the model a
+        // complete list of errors to fix in one shot.
+        match swym.validate_strategy(&strategy).await {
+            Ok(api_errors) if !api_errors.is_empty() => {
+                for e in &api_errors {
+                    warn!("  DSL error at {}: {}", e.path.as_deref().unwrap_or("(top-level)"), e.message);
+                }
+                let error_notes: Vec<String> = api_errors
+                    .iter()
+                    .map(|e| format!("DSL error at {}: {}", e.path.as_deref().unwrap_or("(top-level)"), e.message))
+                    .collect();
+                validation_notes.extend(error_notes);
+                let record = IterationRecord {
+                    iteration,
+                    strategy: strategy.clone(),
+                    results: vec![],
+                    validation_notes,
+                };
+                info!("{}", record.summary());
+                history.push(record);
+                continue;
+            }
+            Ok(_) => {
+                // Valid — proceed to backtest
+            }
+            Err(e) => {
+                // Network/parse failure from the validate endpoint — log and proceed
+                // anyway so a transient API issue doesn't stall the run.
+                warn!("  strategy validation request failed (proceeding): {e:#}");
+            }
+        }
+
+        // Deduplication check: skip strategies identical to one already tested this run.
+        let strategy_key = serde_json::to_string(&strategy).unwrap_or_default();
+        if let Some(&first_iter) = tested_strategies.get(&strategy_key) {
+            warn!("duplicate strategy (identical to iteration {first_iter}), skipping backtest");
+            let record = IterationRecord {
+                iteration,
+                strategy: strategy.clone(),
+                results: vec![],
+                validation_notes: vec![format!(
+                    "DUPLICATE: this exact strategy was already tested in iteration {first_iter}. \
+                     You submitted identical JSON. You MUST design a completely different strategy — \
+                     different indicator family, different entry conditions, or different timeframe. \
+                     Do NOT submit the same JSON again."
+                )],
+            };
+            info!("{}", record.summary());
+            history.push(record);
+            continue;
+        }
+        tested_strategies.insert(strategy_key, iteration);
+
        // Run backtests against all instruments (in-sample)
        let mut results: Vec<BacktestResult> = Vec::new();

@@ -354,12 +440,13 @@ pub async fn run(cli: &Cli) -> Result<()> {
                            info!("  condition audit: {}", serde_json::to_string_pretty(audit).unwrap_or_default());
                        }
                    }
+                    append_ledger_entry(&ledger_path, &result, &strategy);
                    results.push(result);
                }
                Err(e) => {
                    warn!("  backtest failed for {}: {e:#}", inst.symbol);
                    results.push(BacktestResult {
-                        run_id: uuid::Uuid::nil(),
+                        run_id: Uuid::nil(),
                        instrument: inst.symbol.clone(),
                        status: "failed".to_string(),
                        total_positions: None,
@@ -370,6 +457,15 @@ pub async fn run(cli: &Cli) -> Result<()> {
                        total_pnl: None,
                        net_pnl: None,
                        sharpe_ratio: None,
+                        sortino_ratio: None,
+                        calmar_ratio: None,
+                        max_drawdown: None,
+                        pnl_return: None,
+                        avg_win: None,
+                        avg_loss: None,
+                        max_win: None,
+                        max_loss: None,
+                        avg_hold_duration_secs: None,
                        total_fees: None,
                        avg_bars_in_trade: None,
                        error_message: Some(e.to_string()),
@@ -507,6 +603,7 @@ async fn run_single_backtest(
            &inst.symbol,
            &inst.base(),
            &inst.quote(),
+            inst.market_kind(),
            strategy,
            starts_at,
            finishes_at,
@@ -527,13 +624,180 @@ async fn run_single_backtest(
        .await
        .context("poll")?;

-    Ok(BacktestResult::from_response(
-        &final_resp,
-        &inst.symbol,
-        &inst.exchange,
-        &inst.base(),
-        &inst.quote(),
-    ))
+    Ok(BacktestResult::from_response(&final_resp, &inst.symbol))
+}
+
+/// Append a ledger entry for a completed backtest so future runs can learn from it.
+fn append_ledger_entry(ledger: &Path, result: &BacktestResult, strategy: &Value) {
+    // Skip nil run_ids (error placeholders)
+    if result.run_id == Uuid::nil() {
+        return;
+    }
+    let entry = LedgerEntry {
+        run_id: result.run_id,
+        instrument: result.instrument.clone(),
+        candle_interval: strategy["candle_interval"]
+            .as_str()
+            .unwrap_or("?")
+            .to_string(),
+        strategy: strategy.clone(),
+    };
+    // Append the newline to the serialised bytes so the entry goes out in one
+    // write_all() call. With O_APPEND, a single write() is atomic on Linux local
+    // filesystems, so concurrent instances are safe for typical entry sizes.
+    let mut bytes = match serde_json::to_vec(&entry) {
+        Ok(b) => b,
+        Err(e) => {
+            warn!("could not serialize ledger entry: {e}");
+            return;
+        }
+    };
+    bytes.push(b'\n');
+    if let Err(e) = std::fs::OpenOptions::new()
+        .append(true)
+        .create(true)
+        .open(ledger)
+        .and_then(|mut f| f.write_all(&bytes))
+    {
+        warn!("could not write ledger entry: {e}");
+    }
+}
+
+/// Load the run ledger, fetch metrics via the compare endpoint, and return a compact
+/// prior-results summary string for the initial prompt.  Returns `None` if the ledger
+/// is absent, empty, or the compare call fails.
+async fn load_prior_summary(ledger: &Path, swym: &SwymClient) -> Option<String> {
+    // A missing or unreadable ledger means there is no prior context (first run).
+    let contents = std::fs::read_to_string(ledger).ok()?;
+
+    // Parse all ledger entries
+    let entries: Vec<LedgerEntry> = contents
+        .lines()
+        .filter(|l| !l.trim().is_empty())
+        .filter_map(|l| serde_json::from_str(l).ok())
+        .collect();
+    if entries.is_empty() {
+        return None;
+    }
+    info!("loaded {} ledger entries from previous runs", entries.len());
+
+    // Fetch metrics for all run_ids
+    let run_ids: Vec<Uuid> = entries.iter().map(|e| e.run_id).collect();
+    let metrics = match swym.compare_runs(&run_ids).await {
+        Ok(m) => m,
+        Err(e) => {
+            warn!("could not fetch prior run metrics: {e}");
+            return None;
+        }
+    };
+
+    // Build a map from run_id → metrics
+    let metrics_map: std::collections::HashMap<Uuid, &RunMetricsSummary> =
+        metrics.iter().map(|m| (m.id, m)).collect();
+
+    // Group entries by strategy, using the full serialized strategy JSON as the
+    // grouping key.
+    let mut strategy_groups: std::collections::HashMap<String, Vec<(&LedgerEntry, Option<&RunMetricsSummary>)>> =
+        std::collections::HashMap::new();
+    // Cap at 3 entries per unique strategy (one per instrument is enough).
+    // Without this, a strategy repeated across many iterations swamps the summary.
+    for entry in &entries {
+        let key = serde_json::to_string(&entry.strategy).unwrap_or_default();
+        let group = strategy_groups.entry(key).or_default();
+        if group.len() < 3 {
+            let m = metrics_map.get(&entry.run_id).copied();
+            group.push((entry, m));
+        }
+    }
+
+    // Compute avg sharpe per strategy group
+    let mut strategies: Vec<(f64, &Value, Vec<(&LedgerEntry, Option<&RunMetricsSummary>)>)> = strategy_groups
+        .into_values()
+        .map(|group| {
+            let sharpes: Vec<f64> = group
+                .iter()
+                .filter_map(|(_, m)| m.and_then(|m| m.sharpe_ratio))
+                .collect();
+            let avg_sharpe = if sharpes.is_empty() {
+                f64::NEG_INFINITY
+            } else {
+                sharpes.iter().sum::<f64>() / sharpes.len() as f64
+            };
+            let strategy = &group[0].0.strategy;
+            (avg_sharpe, strategy, group)
+        })
+        .collect();
+    strategies.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
+
+    let total_strategies = strategies.len();
+    let total_backtests = entries.len();
+
+    // Build summary text: top 5 + bottom 3 (if distinct), capped at ~6000 bytes
+    let mut lines = vec![format!(
+        "## Learnings from {} prior backtests across {} strategies\n",
+        total_backtests, total_strategies
+    )];
+    lines.push("### Best strategies (ranked by avg Sharpe):".to_string());
+
+    let show_top = strategies.len().min(5);
+    for (avg_sharpe, strategy, group) in strategies.iter().take(show_top) {
+        let interval = strategy["candle_interval"].as_str().unwrap_or("?");
+        let rule_count = strategy["rules"].as_array().map(|r| r.len()).unwrap_or(0);
+        // Collect per-instrument metrics
+        let inst_lines: Vec<String> = group
+            .iter()
+            .filter_map(|(entry, m)| {
+                let m = (*m)?;
+                Some(format!(
+                    "    {}: trades={} sharpe={:.3} net_pnl={:.2}{}",
+                    entry.instrument,
+                    m.total_positions.unwrap_or(0),
+                    m.sharpe_ratio.unwrap_or(0.0),
+                    m.net_pnl.unwrap_or(0.0),
+                    m.max_drawdown.map(|d| format!(" max_dd={:.1}%", d * 100.0)).unwrap_or_default(),
+                ))
+            })
+            .collect();
+        // Pull the first rule comment as a strategy description
+        let description = strategy["rules"][0]["comment"]
+            .as_str()
+            .unwrap_or("(no description)");
+        lines.push(format!(
+            "\n  [{interval}, {rule_count} rules, avg_sharpe={avg_sharpe:.3}] {description}"
+        ));
+        lines.extend(inst_lines);
+        // Include full JSON only for the top 2
+        let rank = strategies.iter().position(|(_, s, _)| std::ptr::eq(*s, *strategy)).unwrap_or(99);
+        if rank < 2 {
+            lines.push(format!(
+                "  strategy JSON: {}",
+                serde_json::to_string(strategy).unwrap_or_default()
+            ));
+        }
+    }
+
+    // Worst 3 (if we have more than 5)
+    if strategies.len() > 5 {
+        lines.push("\n### Worst strategies (avoid repeating these):".to_string());
+        let worst_start = strategies.len().saturating_sub(3).max(show_top);
+        for (avg_sharpe, strategy, _) in strategies.iter().skip(worst_start) {
+            let interval = strategy["candle_interval"].as_str().unwrap_or("?");
+            let description = strategy["rules"][0]["comment"].as_str().unwrap_or("(no description)");
+            lines.push(format!("  [{interval}, avg_sharpe={avg_sharpe:.3}] {description}"));
+        }
+    }
+
+    lines.push(
+        "\nUse these results to avoid repeating failed approaches and build on what worked.\n".to_string(),
+    );
+
+    let summary = lines.join("\n");
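+    // Truncate to ~6000 bytes to stay within prompt budget, cutting on a char boundary.
+    if summary.len() > 6000 {
+        let cut = (0..=5900).rev().find(|&i| summary.is_char_boundary(i)).unwrap_or(0);
+        return Some(format!("{}…\n[truncated — {} total strategies]\n", &summary[..cut], total_strategies));
+    }
+    Some(summary)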
 }

 fn save_validated_strategy(
@@ -662,6 +926,48 @@ pub fn diagnose_history(history: &[IterationRecord]) -> (String, bool) {
        }
    }

+    // --- Repeated API error detection ---
+    // If the same unknown-variant DSL error has appeared 2+ times across the
+    // last 4 iterations, call it out explicitly so the model knows what to fix.
+    {
+        let recent_errors: Vec<String> = history
+            .iter()
+            .rev()
+            .take(4)
+            .flat_map(|rec| rec.results.iter())
+            .filter_map(|r| r.error_message.as_deref())
+            .filter(|e| e.contains("unknown variant"))
+            .map(|e| {
+                // Extract the variant name: "unknown variant `foo`, expected ..."
+                e.split('`')
+                    .nth(1)
+                    .unwrap_or(e)
+                    .to_string()
+            })
+            .collect();
+
+        if recent_errors.len() >= 2 {
+            // Find the most frequent bad variant
+            let mut counts: std::collections::HashMap<&str, usize> = std::collections::HashMap::new();
+            for v in &recent_errors {
+                *counts.entry(v.as_str()).or_default() += 1;
+            }
+            if let Some((bad_variant, count)) = counts.into_iter().max_by_key(|(_, c)| *c) {
+                if count >= 2 {
+                    notes.push(format!(
+                        "⚠ DSL ERROR (repeated {count}×): the swym API rejected \
+                         `{bad_variant}` as an unknown variant. \
+                         Check the 'Critical: expression kinds' section — \
+                         `{bad_variant}` may be a FuncName (use inside \
+                         {{\"kind\":\"func\",\"name\":\"{bad_variant}\",...}}) \
+                         or it may not be supported at all. \
+                         Use ONLY the documented kinds and func names."
+                    ));
+                }
+            }
+        }
+    }
+
    // --- Zero-trade check ---
    let zero_trade_iters = history
        .iter()
--- a/src/claude.rs
+++ b/src/claude.rs
@@ -2,12 +2,20 @@ use anyhow::{Context, Result};
 use reqwest::Client;
 use serde::{Deserialize, Serialize};
 use serde_json::Value;
+use tracing::{info, warn};
+
+use crate::config::ModelFamily;

 pub struct ClaudeClient {
    client: Client,
    api_key: String,
    api_url: String,
    model: String,
+    family: ModelFamily,
+    /// Effective max output tokens, initialised from the family default and
+    /// optionally updated by `apply_server_limits()` after querying the
+    /// server's model metadata.
+    max_output_tokens: u32,
 }

 #[derive(Serialize)]
@@ -43,19 +51,93 @@ pub struct Usage {

 impl ClaudeClient {
    pub fn new(api_key: &str, api_url: &str, model: &str) -> Self {
+        let family = ModelFamily::detect(model);
+        // R1 thinking can take several minutes; use a generous timeout.
+        let timeout_secs = if family.has_thinking() { 300 } else { 120 };
        let client = Client::builder()
-            .timeout(std::time::Duration::from_secs(120))
+            .timeout(std::time::Duration::from_secs(timeout_secs))
            .build()
            .expect("build http client");
+        let max_output_tokens = family.max_output_tokens();
        Self {
            client,
            api_key: api_key.to_string(),
            api_url: api_url.to_string(),
            model: model.to_string(),
+            family,
+            max_output_tokens,
        }
    }

-    /// Send a conversation to Claude and get the text response.
+    pub fn family(&self) -> &ModelFamily {
+        &self.family
+    }
+
+    /// Query the server for the loaded model's actual context length and
+    /// update `max_output_tokens` accordingly.
+    ///
+    /// Uses half the loaded context window for output, leaving the other
+    /// half for the system prompt and conversation history. Falls back to
+    /// the family default if the server does not expose the information.
+    ///
+    /// Tries two endpoints:
+    /// 1. LM Studio `/api/v1/models` — returns `loaded_instances[].config.context_length`
+    /// 2. OpenAI-compat `/v1/models/{id}` — returns `context_length` if present
+    pub async fn apply_server_limits(&mut self) {
+        match self.query_context_length().await {
+            Some(ctx_len) => {
+                // Reserve half the context for input (system prompt + history).
+                let budget = ctx_len / 2;
+                info!(
+                    "server context_length={ctx_len} → max_output_tokens={budget} \
+                     (was {} from family default)",
+                    self.max_output_tokens,
+                );
+                self.max_output_tokens = budget;
+            }
+            None => {
+                info!(
+                    "could not determine server context_length; \
+                     using family default max_output_tokens={}",
+                    self.max_output_tokens,
+                );
+            }
+        }
+    }
+
+    /// Try to discover the loaded context length for the current model.
+    async fn query_context_length(&self) -> Option<u32> {
+        let base = self.api_url.trim_end_matches('/');
+
+        // --- Strategy 1: LM Studio proprietary /api/v1/models ---
+        let lmstudio_url = format!("{base}/api/v1/models");
+        if let Ok(resp) = self.client.get(&lmstudio_url).send().await {
+            if resp.status().is_success() {
+                if let Ok(json) = resp.json::<Value>().await {
+                    if let Some(ctx) = lmstudio_context_length(&json, &self.model) {
+                        return Some(ctx);
+                    }
+                }
+            }
+        }
+
+        // --- Strategy 2: OpenAI-compat /v1/models/{id} ---
+        let oai_url = format!("{base}/v1/models/{}", self.model);
+        if let Ok(resp) = self.client.get(&oai_url).send().await {
+            if resp.status().is_success() {
+                if let Ok(json) = resp.json::<Value>().await {
+                    if let Some(n) = json["context_length"].as_u64() {
+                        return Some(n as u32);
+                    }
+                }
+            }
+        }
+
+        warn!("could not query context_length from server for model {}", self.model);
+        None
+    }
+
+    /// Send a conversation to the model and get the text response.
    pub async fn chat(
        &self,
        system: &str,
@@ -63,7 +145,7 @@ impl ClaudeClient {
    ) -> Result<(String, Option<Usage>)> {
        let body = MessagesRequest {
            model: self.model.clone(),
-            max_tokens: 4096,
+            max_tokens: self.max_output_tokens,
            system: system.to_string(),
            messages: messages.to_vec(),
        };
@@ -98,9 +180,54 @@ impl ClaudeClient {
    }
 }

-/// Extract a JSON object from Claude's response text.
-/// Looks for the first `{` ... `}` block, handling markdown code fences.
+/// Extract the loaded context_length for a model from the LM Studio
+/// `/api/v1/models` response.
+///
+/// Matches on the `key` field. Some LM Studio variants append a quantization
+/// suffix like `@q4_k_m`, so that suffix is stripped before comparing.
+fn lmstudio_context_length(json: &Value, model_id: &str) -> Option<u32> {
+    let models = json["models"].as_array()?;
+    let model_base = model_id.split('@').next().unwrap_or(model_id);
+
+    for entry in models {
+        let key = entry["key"].as_str().unwrap_or("");
+        let key_base = key.split('@').next().unwrap_or(key);
+
+        if key_base == model_base || key == model_id {
+            // Prefer the actually-loaded context (loaded_instances[0].config.context_length)
+            // over the theoretical max_context_length.
+            let loaded = entry["loaded_instances"]
+                .as_array()
+                .and_then(|a| a.first())
+                .and_then(|inst| inst["config"]["context_length"].as_u64())
+                .map(|n| n as u32);
+            if loaded.is_some() {
+                return loaded;
+            }
+            // Fall back to max_context_length if no loaded instance info
+            if let Some(n) = entry["max_context_length"].as_u64() {
+                return Some(n as u32);
+            }
+        }
+    }
+    None
+}
+
+/// Return the content of the first `<think>` block, if any.
+/// Used for debug logging of R1 reasoning chains.
+pub fn extract_think_content(text: &str) -> Option<String> {
+    let start = text.find("<think>")? + "<think>".len();
+    let end = text[start..].find("</think>").map(|i| start + i)?;
+    Some(text[start..end].trim().to_string())
+}
+
+/// Extract a JSON object from a model response text.
+/// Handles markdown code fences and R1-style `<think>...</think>` blocks.
 pub fn extract_json(text: &str) -> Result<Value> {
+    // Strip R1-style thinking blocks before looking for JSON
+    let text = strip_think_blocks(text);
+    let text = text.as_ref();
+
    // Strip markdown fences if present
    let cleaned = text
        .replace("```json", "")
@@ -137,3 +264,25 @@ pub fn extract_json(text: &str) -> Result<Value> {

    serde_json::from_str(&cleaned[s..e]).context("parse extracted JSON")
 }
+
+/// Remove `<think>...</think>` blocks emitted by R1-family reasoning models.
+/// Handles multiple blocks and unterminated blocks (truncated responses); nested tags are not unwrapped.
+fn strip_think_blocks(text: &str) -> std::borrow::Cow<'_, str> {
+    if !text.contains("<think>") {
+        return std::borrow::Cow::Borrowed(text);
+    }
+    let mut out = String::with_capacity(text.len());
+    let mut rest = text;
+    while let Some(start) = rest.find("<think>") {
+        out.push_str(&rest[..start]);
+        rest = &rest[start + "<think>".len()..];
+        if let Some(end) = rest.find("</think>") {
+            rest = &rest[end + "</think>".len()..];
+        } else {
+            // Unterminated — discard the rest (truncated thinking block)
+            rest = "";
+        }
+    }
+    out.push_str(rest);
+    std::borrow::Cow::Owned(out)
+}
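To make the edge cases of `strip_think_blocks` concrete, here is a standalone sketch that copies the function from the hunk above and wraps it in a small helper. The helper name `strip_and_trim` and the sample inputs mentioned in the note below are assumptions made for illustration, not part of the patch or captured model output.

```rust
use std::borrow::Cow;

/// Copy of `strip_think_blocks` from the hunk above, reproduced so the
/// snippet compiles on its own.
fn strip_think_blocks(text: &str) -> Cow<'_, str> {
    if !text.contains("<think>") {
        // Nothing to strip: hand back the input without allocating.
        return Cow::Borrowed(text);
    }
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("<think>") {
        out.push_str(&rest[..start]);
        rest = &rest[start + "<think>".len()..];
        if let Some(end) = rest.find("</think>") {
            rest = &rest[end + "</think>".len()..];
        } else {
            // Unterminated block: drop everything after the opening tag.
            rest = "";
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}

/// Hypothetical helper (not in the patch): strip thinking, then trim whitespace.
fn strip_and_trim(text: &str) -> String {
    strip_think_blocks(text).trim().to_string()
}
```

Calling `strip_and_trim` on a response with a leading `<think>` block returns only the JSON, and a truncated `<think>` tail is discarded. Because the scan pairs the first opening tag with the first closing tag, an input such as `<think>a<think>b</think>c</think>{}` leaves `c</think>{}` behind.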
--- a/src/config.rs
+++ b/src/config.rs
@@ -2,6 +2,50 @@ use std::path::PathBuf;

 use clap::Parser;

+/// Model family — controls token budgets and prompt style.
+#[derive(Debug, Clone, PartialEq)]
+pub enum ModelFamily {
+    /// DeepSeek-R1 and its distillations: emit `<think>` blocks that count
+    /// against the output-token budget, so we need a much larger max_tokens.
+    DeepSeekR1,
+    /// General instruction-following models (Qwen, Llama, Mistral, …).
+    Generic,
+}
+
+impl ModelFamily {
+    /// Detect family from a model name string (case-insensitive).
+    pub fn detect(model: &str) -> Self {
+        let m = model.to_ascii_lowercase();
+        if m.contains("deepseek-r1") || m.contains("r1-distill") || m.contains("r1_distill") {
+            Self::DeepSeekR1
+        } else {
+            Self::Generic
+        }
+    }
+
+    /// Display name for logging.
+    pub fn name(&self) -> &'static str {
+        match self {
+            Self::DeepSeekR1 => "DeepSeek-R1",
+            Self::Generic => "Generic",
+        }
+    }
+
+    /// Maximum output tokens to request. R1 thinking blocks can be thousands
+    /// of tokens; reserve enough headroom for the JSON after thinking.
+    pub fn max_output_tokens(&self) -> u32 {
+        match self {
+            Self::DeepSeekR1 => 32768,
+            Self::Generic => 8192,
+        }
+    }
+
+    /// Whether this model family emits chain-of-thought before its response.
+    pub fn has_thinking(&self) -> bool {
+        matches!(self, Self::DeepSeekR1)
+    }
+}
+
 /// Autonomous strategy search agent for the swym backtesting platform.
 ///
 /// Runs a loop: ask Claude to generate/refine strategies → submit backtests to swym →
@@ -74,6 +118,13 @@ pub struct Cli {
    #[arg(long, default_value = "./results")]
    pub output_dir: PathBuf,

+    /// Path to the run ledger JSONL file used for cross-run learning.
+    /// Defaults to <output_dir>/run_ledger.jsonl when not specified.
+    /// Pass a different path to seed a new run from a specific ledger
+    /// (e.g. a curated export from a previous campaign).
+    #[arg(long)]
+    pub ledger_file: Option<PathBuf>,
+
    /// Poll interval in seconds when waiting for backtest completion.
    #[arg(long, default_value_t = 2)]
    pub poll_interval_secs: u64,
@@ -123,4 +174,22 @@ impl Instrument {
        }
        "usdc".to_string()
    }
+
+    /// Instrument kind for the paper-run config `instrument.kind` field.
+    /// Derived from the exchange identifier (case-insensitive).
+    pub fn market_kind(&self) -> &'static str {
+        let e = self.exchange.to_ascii_lowercase();
+        if e.contains("futures_usd") || e.contains("futures_um") {
+            "futures_um"
+        } else if e.contains("futures_coin") || e.contains("futures_cm") {
+            "futures_cm"
+        } else {
+            "spot"
+        }
+    }
+
+    /// True when this instrument is traded on a futures market.
+    pub fn is_futures(&self) -> bool {
+        self.market_kind() != "spot"
+    }
 }
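The detection and budgeting rules in this file and in `claude.rs` can be summarised in one small sketch. It reproduces `ModelFamily::detect` and `max_output_tokens` from the hunks above; `effective_budget` and the free function `market_kind` are hypothetical stand-ins for `apply_server_limits` and `Instrument::market_kind`. The model and exchange names used with it are assumptions chosen only to exercise each branch.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ModelFamily {
    DeepSeekR1,
    Generic,
}

impl ModelFamily {
    /// Same substring checks as the patch (case-insensitive).
    fn detect(model: &str) -> Self {
        let m = model.to_ascii_lowercase();
        if m.contains("deepseek-r1") || m.contains("r1-distill") || m.contains("r1_distill") {
            Self::DeepSeekR1
        } else {
            Self::Generic
        }
    }

    /// Family defaults taken from the patch.
    fn max_output_tokens(&self) -> u32 {
        match self {
            Self::DeepSeekR1 => 32768,
            Self::Generic => 8192,
        }
    }
}

/// Hypothetical stand-in for `apply_server_limits`: half the server's
/// context length when it is known, otherwise the family default.
fn effective_budget(family: &ModelFamily, server_ctx: Option<u32>) -> u32 {
    match server_ctx {
        Some(ctx_len) => ctx_len / 2,
        None => family.max_output_tokens(),
    }
}

/// Free-function version of `Instrument::market_kind` for illustration.
fn market_kind(exchange: &str) -> &'static str {
    let e = exchange.to_ascii_lowercase();
    if e.contains("futures_usd") || e.contains("futures_um") {
        "futures_um"
    } else if e.contains("futures_coin") || e.contains("futures_cm") {
        "futures_cm"
    } else {
        "spot"
    }
}
```

Note that the server value replaces the family default unconditionally: an R1 model loaded with a 16384-token context ends up with a budget of 8192, well below the 32768 default.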
--- a/src/dsl-schema.json
+++ b/src/dsl-schema.json
@@ -66,11 +66,53 @@
      "properties": {
        "side": { "type": "string", "enum": ["buy", "sell"] },
        "quantity": {
-          "$ref": "#/definitions/DecimalString",
-          "description": "Per-order size in base asset units, e.g. \"0.001\" for BTC."
+          "description": "Per-order size in base asset units. Fixed decimal string (e.g. \"0.001\"), a declarative SizingMethod object, or a dynamic Expr object. When a method or Expr returns None the order is skipped; negative values are clamped to zero.",
+          "oneOf": [
+            { "$ref": "#/definitions/DecimalString" },
+            { "$ref": "#/definitions/SizingFixedSum" },
+            { "$ref": "#/definitions/SizingPercentOfBalance" },
+            { "$ref": "#/definitions/SizingFixedUnits" },
+            { "$ref": "#/definitions/Expr" }
+          ]
+        },
+        "reverse": {
+          "type": "boolean",
+          "default": false,
+          "description": "Flip-through-zero flag (futures only). When true and an opposite position is currently open, the submitted order quantity becomes position_qty + configured_qty, closing the existing position and immediately opening a new one in the opposite direction in a single order. When flat the flag has no effect and configured_qty is used as normal. Omit or set false for standard close-only behaviour."
        }
      }
    },
+    "SizingFixedSum": {
+      "description": "Buy base asset worth `amount` of quote currency at the current price. qty = amount / current_price.",
+      "type": "object",
+      "required": ["method", "amount"],
+      "additionalProperties": false,
+      "properties": {
+        "method": { "const": "fixed_sum" },
+        "amount": { "$ref": "#/definitions/DecimalString", "description": "Quote-currency amount, e.g. \"500\" means buy $500 worth." }
+      }
+    },
+    "SizingPercentOfBalance": {
+      "description": "Buy base asset worth `percent`% of the named asset's free balance. qty = balance(asset) * percent/100 / current_price.",
+      "type": "object",
+      "required": ["method", "percent", "asset"],
+      "additionalProperties": false,
+      "properties": {
+        "method": { "const": "percent_of_balance" },
+        "percent": { "$ref": "#/definitions/DecimalString", "description": "Percentage, e.g. \"2\" means 2% of the free balance." },
+        "asset": { "type": "string", "description": "Asset name to look up, e.g. \"usdc\". Matched case-insensitively." }
+      }
+    },
+    "SizingFixedUnits": {
+      "description": "Buy exactly `units` of base asset. Semantic alias for a fixed decimal quantity.",
+      "type": "object",
+      "required": ["method", "units"],
+      "additionalProperties": false,
+      "properties": {
+        "method": { "const": "fixed_units" },
+        "units": { "$ref": "#/definitions/DecimalString", "description": "Base asset quantity, e.g. \"0.01\" means 0.01 BTC." }
+      }
+    },
    "Rule": {
      "type": "object",
      "required": ["when", "then"],
@@ -280,7 +322,12 @@
        { "$ref": "#/definitions/ExprBinOp" },
        { "$ref": "#/definitions/ExprApplyFunc" },
        { "$ref": "#/definitions/ExprUnaryOp" },
-        { "$ref": "#/definitions/ExprBarsSince" }
+        { "$ref": "#/definitions/ExprBarsSince" },
+        { "$ref": "#/definitions/ExprEntryPrice" },
+        { "$ref": "#/definitions/ExprPositionQuantity" },
+        { "$ref": "#/definitions/ExprUnrealisedPnl" },
+        { "$ref": "#/definitions/ExprBarsSinceEntry" },
+        { "$ref": "#/definitions/ExprBalance" }
      ]
    },
    "ExprLiteral": {
@@ -417,6 +464,55 @@
          "description": "Maximum bars to look back."
        }
      }
+    },
+    "ExprEntryPrice": {
+      "description": "Volume-weighted average entry price of the current open position. Returns None when flat.",
+      "type": "object",
+      "required": ["kind"],
+      "additionalProperties": false,
+      "properties": {
+        "kind": { "const": "entry_price" }
+      }
+    },
+    "ExprPositionQuantity": {
+      "description": "Absolute quantity of the current open position in base asset units. Returns None when flat.",
+      "type": "object",
+      "required": ["kind"],
+      "additionalProperties": false,
+      "properties": {
+        "kind": { "const": "position_quantity" }
+      }
+    },
+    "ExprUnrealisedPnl": {
+      "description": "Estimated unrealised PnL of the current open position in quote asset. Returns None when flat.",
+      "type": "object",
+      "required": ["kind"],
+      "additionalProperties": false,
+      "properties": {
+        "kind": { "const": "unrealised_pnl" }
+      }
+    },
+    "ExprBarsSinceEntry": {
+      "description": "Number of complete primary-interval bars elapsed since the current position was opened. Computed as floor((now - time_enter) / primary_interval_secs). Returns None when flat.",
+      "type": "object",
+      "required": ["kind"],
+      "additionalProperties": false,
+      "properties": {
+        "kind": { "const": "bars_since_entry" }
+      }
+    },
+    "ExprBalance": {
+      "description": "Free balance of the named asset (matched case-insensitively). Returns None when the asset is not found or balance data is unavailable.",
+      "type": "object",
+      "required": ["kind", "asset"],
+      "additionalProperties": false,
+      "properties": {
+        "kind": { "const": "balance" },
+        "asset": {
+          "type": "string",
+          "description": "Internal asset name, e.g. \"usdt\", \"btc\". Case-insensitive."
+        }
+      }
    }
  }
 }
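The sizing formulas stated in the schema descriptions above can be checked with a few lines of arithmetic. This is a minimal sketch, not the swym implementation: it uses `f64` where the platform presumably uses decimals, the function names are hypothetical, and the order in which clamping and `reverse` are applied is an assumption.

```rust
/// `fixed_sum`: qty = amount / current_price.
fn fixed_sum(amount: f64, price: f64) -> f64 {
    amount / price
}

/// `percent_of_balance`: qty = balance(asset) * percent/100 / current_price.
fn percent_of_balance(balance: f64, percent: f64, price: f64) -> f64 {
    balance * percent / 100.0 / price
}

/// Final order quantity under the rules in the `quantity` and `reverse`
/// descriptions: `None` skips the order, negative values are clamped to
/// zero, and `reverse` adds the size of an open opposite position.
/// `opposite_position` is `None` when flat or when the open position is on
/// the same side as the order.
fn order_quantity(
    configured: Option<f64>,
    opposite_position: Option<f64>,
    reverse: bool,
) -> Option<f64> {
    // A sizing method or Expr that yields None means "skip this order".
    let qty = configured?.max(0.0);
    match (reverse, opposite_position) {
        // Flip through zero: close the open position and open the new one.
        (true, Some(position_qty)) => Some(position_qty + qty),
        // Flat, or reverse not set: use the configured quantity as-is.
        _ => Some(qty),
    }
}
```

With a made-up price of 50,000, `fixed_sum(500.0, 50_000.0)` gives 0.01 base units and `percent_of_balance(10_000.0, 2.0, 50_000.0)` gives 0.004. A reversing order configured for 0.02 against an opposite position of 0.03 submits 0.05.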
--- a/src/prompts.rs
+++ b/src/prompts.rs
@@ -1,9 +1,28 @@
-/// System prompt for the strategy-generation Claude instance.
+use crate::config::ModelFamily;
+
+/// System prompt for the strategy-generation model.
 ///
-/// This is the most important part of the agent — it defines how Claude
-/// thinks about strategy design, what it knows about the DSL, and how
-/// it should interpret backtest results.
-pub fn system_prompt(dsl_schema: &str) -> String {
+/// Accepts a `ModelFamily` so each family can receive tailored guidance
+/// while sharing the common DSL schema and strategy evaluation rules.
+pub fn system_prompt(dsl_schema: &str, family: &ModelFamily, has_futures: bool) -> String {
+    let output_instructions = match family {
+        ModelFamily::DeepSeekR1 => {
+            "## Output format\n\n\
+             Think through your strategy design carefully before committing to it. \
+             After your thinking, output ONLY a bare JSON object — no markdown fences, \
+             no commentary, no explanation. Start with `{` and end with `}`. \
+             Your thinking will be stripped automatically; only the JSON is used."
+        }
+        ModelFamily::Generic => {
+            "## How to respond\n\n\
+             You must respond with ONLY a valid JSON object — the strategy config.\n\
+             No prose, no markdown explanation, no commentary.\n\
+             Just the raw JSON starting with { and ending with }.\n\n\
+             The JSON must be a valid strategy with \"type\": \"rule_based\".\n\
+             Use \"usdc\" (not \"usdt\") as the quote asset for balance expressions."
+        }
+    };
+
    format!(
        r##"You are a quantitative trading strategy researcher. Your task is to design,
 evaluate, and iteratively refine trading strategies expressed in the swym JSON DSL.
@@ -33,6 +52,10 @@ sma, ema, wma, rsi, std_dev, sum, highest, lowest, atr, supertrend, adx,
 bollinger_upper, bollinger_lower — applied to any candle field (open/high/low/close/volume)
 with configurable period and optional offset.

+These are FuncNames used INSIDE `{{"kind":"func","name":"...","period":N}}` expressions.
+`atr`, `adx`, and `supertrend` use OHLC internally and ignore the `field` parameter.
+To use ADX as a trend-strength filter: `{{"kind":"compare","left":{{"kind":"func","name":"adx","period":14}},"op":">","right":{{"kind":"literal","value":"25"}}}}`
+
 ### Composed indicators (apply_func)
 Apply rolling functions to arbitrary expressions: EMA of EMA, Hull MA (WMA of expression),
 VWAP (sum of close*volume / sum of volume), standard deviation of returns, etc.
@@ -51,11 +74,78 @@ bars_since_entry — complete bars elapsed since position was opened
 balance — free balance of a named asset (e.g. "usdt", "usdc")

 ### Quantity
-Action quantity MUST be a fixed decimal string that parses as a floating-point number,
-e.g. `"quantity": "0.001"`.
-NEVER use an expression object for quantity — only plain decimal strings are accepted.
-NEVER use placeholder strings like `"ATR_SIZED"`, `"FULL_BALANCE"`, `"percent_of_balance"`,
-`"dynamic"`, or any non-numeric string — these will be rejected immediately.
+Action quantity accepts four forms — pick the simplest one for your intent:
+
+**1. Declarative sizing methods (preferred — instrument-agnostic, readable):**
+
+Spend a fixed quote amount (e.g. $500 worth of base at current price):
+```json
+{{"method":"fixed_sum","amount":"500"}}
+```
+
+Spend a percentage of free quote balance (e.g. 5% of USDC):
+```json
+{{"method":"percent_of_balance","percent":"5","asset":"usdc"}}
+```
+
+Buy a fixed number of base units (semantic alias for a decimal string):
+```json
+{{"method":"fixed_units","units":"0.01"}}
+```
+
+**2. Plain decimal string** — use only when you have a specific reason:
+`"0.01"` (0.01 BTC, 3.0 ETH, 50.0 SOL — instrument-specific, not portable)
+
+**3. Expr** — for dynamic sizing not covered by the methods above, e.g. ATR-based:
+```json
+{{"kind":"bin_op","op":"div",
+  "left":{{"kind":"literal","value":"200"}},
+  "right":{{"kind":"func","name":"atr","period":14}}}}
+```
+
+CRITICAL — ATR sizing and balance limits: `N/atr(14)` expresses quantity in BASE asset units.
+For BTC, 4h ATR ≈ $1500–3000. So `1000/atr(14)` ≈ 0.4–0.7 BTC ≈ $32k–56k notional —
+silently rejected on a $10k account (fill returns None, 0 positions open, no error shown).
+The numerator N represents your intended dollar risk per trade. For a $10k account keep N ≤ 200.
+`200/atr(14)` ≈ 0.07–0.13 BTC ≈ $5.6k–10k notional — fits within a $10k account.
+Prefer `percent_of_balance` for most sizing. Only reach for ATR-based Expr sizing when you need
+volatility-scaled position risk, and keep the numerator proportional to your risk tolerance.
+
+**4. Exit rules** — use `position_quantity` to close the exact open size:
+```json
+{{"kind":"position_quantity"}}
+```
+Alternatively, `"9999"` works for exits: sell quantities are automatically capped to the open
+position size, so a large fixed number is equivalent to `position_quantity`.
+
+CRITICAL — the `"method"` vs `"kind"` distinction:
+- `"method"` belongs ONLY to the three declarative sizing objects: `fixed_sum`, `percent_of_balance`, `fixed_units`.
+- `"kind"` belongs to Expr objects: `position_quantity`, `bin_op`, `func`, `field`, `literal`, etc.
+- `{{"method":"position_quantity"}}` is ALWAYS WRONG. It will be rejected every time.
+  CORRECT: `{{"kind":"position_quantity"}}`.
+- If you used `{{"method":"percent_of_balance",...}}` for the buy, use `{{"kind":"position_quantity"}}` for the sell.
+  These are different object types — buy uses a SizingMethod (`method`), sell uses an Expr (`kind`).
+- `{{"method":"fixed_sum","amount":"100","multiplier":"2.0"}}` is WRONG — `fixed_sum` has no
+  `multiplier` field. Only `amount` is accepted alongside `method`.
+- NEVER add extra fields to SizingMethod objects — they use `additionalProperties: false`.
+
+### Reverse / flip-through-zero (futures only)
+
+Setting `"reverse": true` on a rule action enables a single-order position flip on futures.
+When an opposite position is open, quantity = `position_qty + configured_qty`, which closes
+the existing position and opens a new one in the opposite direction in one order (fees split
+proportionally). When flat the flag has no effect — `configured_qty` is used normally.
+
+This lets you collapse a 4-rule long+short strategy (separate open/close for each leg) into
+2 rules, reducing round-trip fees and keeping logic compact:
+
+```json
+{{"side": "sell", "quantity": {{"method": "percent_of_balance", "percent": "10", "asset": "usdc"}}, "reverse": true}}
+```
+
+Use `reverse` when you always want to be in a position — the signal flips you from long to
+short (or vice versa) rather than first exiting and then re-entering separately. Do NOT use
+`reverse` on spot markets (short selling is not supported there).

 ### Multi-timeframe
 Any expression can reference a different timeframe via "timeframe" field.
@@ -81,6 +171,13 @@ Use higher timeframes as trend filters, lower timeframes for entry precision.
 6. **Composite / hybrid**: Combine families. Trend filter + mean-reversion entry.
   Momentum confirmation + volatility sizing.

+7. **Supertrend consensus flip (futures only)**: Use `any_of` across multiple
+   Supertrend configs (e.g. period=7/mul=1.5, period=10/mul=2.0, period=20/mul=3.0)
+   so that ANY flip triggers a long or short entry. Combine with `"reverse": true`
+   for an always-in-market approach where the opposite signal is the stop-loss.
+   Varying multiplier tightens/loosens the band; varying period controls sensitivity.
+   Risk: choppy markets generate many whipsaws — best on daily or 4h.
+
 ## Risk management (always include)

 Every strategy MUST have:
@@ -88,14 +185,11 @@ Every strategy MUST have:
 - A time-based exit: use bars_since_entry to avoid holding losers indefinitely
 - Reasonable position sizing: prefer ATR-based or percent-of-balance over fixed quantity

-## How to respond
+Exception: always-in-market flip strategies (using `"reverse": true`) do not need an
+explicit stop-loss or time exit — the opposite signal acts as the stop. These are
+only valid on futures. See Example 6 and Example 7.

-You must respond with ONLY a valid JSON object — the strategy config.
-No prose, no markdown explanation, no commentary.
-Just the raw JSON starting with {{ and ending with }}.
-
-The JSON must be a valid strategy with "type": "rule_based".
-Use "usdc" (not "usdt") as the quote asset for balance expressions.
+{output_instructions}

 ## Interpreting backtest results

@@ -103,7 +197,11 @@ When I share results from previous iterations, use them to guide your next strat

 - **Zero trades**: The entry conditions are too restrictive or never co-occur.
  Relax thresholds, simplify conditions, or check if the indicator periods make
-  sense for the candle interval.
+  sense for the candle interval. Also check your position sizing — if using an
+  ATR-based Expr quantity (`N/atr(14)`), a large N can produce a notional value
+  exceeding your account balance (e.g. `1000/atr(14)` on BTC ≈ 0.4 BTC ≈ $32k),
+  which is silently rejected by the fill engine. Switch to `percent_of_balance`
+  or reduce N to ≤ 200 for a $10k account.

 - **Many trades but negative PnL**: The entry signal has no edge, or the exit
  logic is poor. Try different indicator combinations, add trend filters, or
@@ -134,11 +232,31 @@ Common mistakes to NEVER make:
 - `"kind": "bars_since_entry"` is a valid standalone Expr (no extra fields needed).
  Do NOT put `"bars_since_entry"` as a `"name"` inside `{{"kind":"func",...}}` — that is WRONG.
 - `"kind": "expr_field"` does NOT exist. Use `{{"kind":"field","field":"close"}}`.
+- Every Expr object MUST have a `"kind"` field. `{{"field":"close"}}` is WRONG — missing `"kind"`.
+  CORRECT: `{{"kind":"field","field":"close"}}`. The `"kind"` is never optional.
+  This applies to ALL field access including offset lookups:
+  `{{"field":"volume","offset":-1}}` is WRONG. CORRECT: `{{"kind":"field","field":"volume","offset":-1}}`.
+  `{{"field":"high","offset":-2}}` is WRONG. CORRECT: `{{"kind":"field","field":"high","offset":-2}}`.
 - `rsi`, `adx`, `supertrend` are NOT valid inside `apply_func`. Use only `apply_func`
  with `ApplyFuncName` values: `highest`, `lowest`, `sma`, `ema`, `wma`, `std_dev`, `sum`,
  `bollinger_upper`, `bollinger_lower`.
 - `volume` is a candle FIELD, not a func name. Access it as `{{"kind":"field","field":"volume"}}`.
-  To compute EMA of volume: `{{"kind":"apply_func","name":"ema","period":20,"expr":{{"kind":"field","field":"volume"}}}}`.
+  To compute EMA of volume: `{{"kind":"apply_func","name":"ema","period":20,"input":{{"kind":"field","field":"volume"}}}}`.
+- `bollinger_upper` and `bollinger_lower` are FUNC NAMES, not Expr kinds. To compare close to the upper band:
+  `{{"kind":"compare","left":{{"kind":"field","field":"close"}},"op":">","right":{{"kind":"func","name":"bollinger_upper","period":20}}}}`
+  NEVER write `{{"kind":"bollinger_upper",...}}` — `bollinger_upper` is not an Expr kind.
+  NEVER set `"field":"bollinger_upper"` on a func Expr — `bollinger_upper`/`bollinger_lower` have no `field`
+  parameter; they compute from close internally. Just `{{"kind":"func","name":"bollinger_upper","period":20}}`.
+- The `{{"kind":"bollinger",...}}` Condition (shorthand) only accepts `"band": "above_upper"` or
+  `"band": "below_lower"`. There is NO `above_lower` or `below_upper` — those are invalid and will be
+  rejected. Use `above_upper` (price above the upper band) or `below_lower` (price below the lower band).
+- `adx` is a FUNC NAME, not a Condition kind. To filter for strong trends (ADX > 25):
+  `{{"kind":"compare","left":{{"kind":"func","name":"adx","period":14}},"op":">","right":{{"kind":"literal","value":"25"}}}}`
+  NEVER write `{{"kind":"adx",...}}` — `adx` is not a Condition kind, it is a FuncName used inside `{{"kind":"func",...}}`.
+- `roc` (rate of change), `hma` (Hull MA), `ma` (generic), `vwap`, `macd`, `cci`, `stoch` are NOT supported.
+  Use `sma`, `ema`, `wma`, `rsi`, `atr`, `adx`, `supertrend`, `std_dev`, `sum`, `highest`, `lowest`,
+  `bollinger_upper`, `bollinger_lower` only. There is no generic `ma` — use `sma` or `ema` explicitly.
+  Hull MA can be composed with `apply_func` as WMA(2*WMA(n/2) - WMA(n)), using an outer WMA period of about sqrt(n).

 ## Working examples

@@ -159,7 +277,7 @@ Common mistakes to NEVER make:
          {{"kind": "ema_trend", "period": 50, "direction": "above"}}
        ]
      }},
-      "then": {{"side": "buy", "quantity": "0.001"}}
+      "then": {{"side": "buy", "quantity": "0.01"}}
    }},
    {{
      "comment": "Sell: EMA9 crosses below EMA21, OR 2% stop-loss, OR 72-bar time exit",
@@ -187,7 +305,7 @@ Common mistakes to NEVER make:
          }}
        ]
      }},
-      "then": {{"side": "sell", "quantity": "0.001"}}
+      "then": {{"side": "sell", "quantity": {{"kind": "position_quantity"}}}}
    }}
  ]
 }}
@@ -210,7 +328,7 @@ Common mistakes to NEVER make:
          {{"kind": "bollinger", "period": 20, "band": "below_lower"}}
        ]
      }},
-      "then": {{"side": "buy", "quantity": "0.001"}}
+      "then": {{"side": "buy", "quantity": "0.01"}}
    }},
    {{
      "comment": "Sell: RSI recovers above 55, OR 3% stop-loss, OR 48-bar time exit",
@@ -238,7 +356,7 @@ Common mistakes to NEVER make:
          }}
        ]
      }},
-      "then": {{"side": "sell", "quantity": "0.001"}}
+      "then": {{"side": "sell", "quantity": {{"kind": "position_quantity"}}}}
    }}
  ]
 }}
@@ -265,7 +383,7 @@ Common mistakes to NEVER make:
          }}
        ]
      }},
-      "then": {{"side": "buy", "quantity": "0.001"}}
+      "then": {{"side": "buy", "quantity": "0.01"}}
    }},
    {{
      "comment": "Sell: 2-ATR stop-loss below entry price, OR 48-bar time exit",
@@ -300,38 +418,343 @@ Common mistakes to NEVER make:
          }}
        ]
      }},
-      "then": {{"side": "sell", "quantity": "0.001"}}
+      "then": {{"side": "sell", "quantity": {{"kind": "position_quantity"}}}}
    }}
  ]
 }}
 ```

+### Example 4 — MACD crossover (composed from primitives)
+
+MACD has no native support, but can be composed from `func` and `apply_func`.
+The MACD line is `EMA(12) - EMA(26)`; the signal line is `EMA(9)` of the MACD line.
+
+```json
+{{
+  "type": "rule_based",
+  "candle_interval": "4h",
+  "rules": [
+    {{
+      "comment": "Buy: MACD line crosses above signal line",
+      "when": {{
+        "kind": "all_of",
+        "conditions": [
+          {{"kind": "position", "state": "flat"}},
+          {{
+            "kind": "cross_over",
+            "left": {{
+              "kind": "bin_op", "op": "sub",
+              "left":  {{"kind": "func", "name": "ema", "period": 12}},
+              "right": {{"kind": "func", "name": "ema", "period": 26}}
+            }},
+            "right": {{
+              "kind": "apply_func", "name": "ema", "period": 9,
+              "input": {{
+                "kind": "bin_op", "op": "sub",
+                "left":  {{"kind": "func", "name": "ema", "period": 12}},
+                "right": {{"kind": "func", "name": "ema", "period": 26}}
+              }}
+            }}
+          }}
+        ]
+      }},
+      "then": {{"side": "buy", "quantity": "0.01"}}
+    }},
+    {{
+      "comment": "Sell: MACD crosses below signal, OR 2% stop-loss, OR 72-bar time exit",
+      "when": {{
+        "kind": "all_of",
+        "conditions": [
+          {{"kind": "position", "state": "long"}},
+          {{
+            "kind": "any_of",
+            "conditions": [
+              {{
+                "kind": "cross_under",
+                "left": {{
+                  "kind": "bin_op", "op": "sub",
+                  "left":  {{"kind": "func", "name": "ema", "period": 12}},
+                  "right": {{"kind": "func", "name": "ema", "period": 26}}
+                }},
+                "right": {{
+                  "kind": "apply_func", "name": "ema", "period": 9,
+                  "input": {{
+                    "kind": "bin_op", "op": "sub",
+                    "left":  {{"kind": "func", "name": "ema", "period": 12}},
+                    "right": {{"kind": "func", "name": "ema", "period": 26}}
+                  }}
+                }}
+              }},
+              {{
+                "kind": "compare",
+                "left": {{"kind": "field", "field": "close"}},
+                "op": "<",
+                "right": {{"kind": "bin_op", "op": "mul",
+                           "left": {{"kind": "entry_price"}},
+                           "right": {{"kind": "literal", "value": "0.98"}}}}
+              }},
+              {{
+                "kind": "compare",
+                "left": {{"kind": "bars_since_entry"}},
+                "op": ">=",
+                "right": {{"kind": "literal", "value": "72"}}
+              }}
+            ]
+          }}
+        ]
+      }},
+      "then": {{"side": "sell", "quantity": {{"kind": "position_quantity"}}}}
+    }}
+  ]
+}}
+```
+
+Key pattern: `apply_func` wraps any `Expr` tree using the `"input"` field (NOT `"expr"`).
+This enables EMA-of-expression (signal line), WMA-of-expression (Hull MA), or std_dev-of-returns.
+There is NO native `macd` func name — always compose it as `bin_op(sub, func(ema,12), func(ema,26))` as shown above.
+CRITICAL: `apply_func` uses `"input"`, not `"expr"`. Writing `"expr":` will be rejected by the API.
+
 ## Anti-patterns to avoid

 - Don't use the same indicator for both entry and exit (circular logic)
 - Don't set RSI thresholds at extreme values (< 10 or > 90) — too rare to fire
 - Don't use very short periods (< 5) on high timeframes — noisy
 - Don't use very long periods (> 100) on low timeframes — too slow to react
+- Don't switch to 15m or shorter intervals when results are poor — higher frequency amplifies
+  fees and noise, making edge harder to find. Prefer 1h or 4h. If Sharpe is negative across
+  intervals, the issue is signal logic, not timeframe — fix the signal before changing interval.
 - Don't create strategies with more than 5-6 conditions — overfitting risk
 - Don't ignore fees — a strategy needs to overcome 0.1% per round trip
- Always gate buy rules with position state "flat" and sell rules with "long"
- Never add a short-entry (sell when flat) rule — spot markets are long-only
- Never use an expression object for `quantity` — it must always be a plain decimal string like `"0.001"`
- Never use a placeholder string for `quantity` — `"ATR_SIZED"`, `"FULL_BALANCE"`, `"dynamic"`, etc. are all invalid and will be rejected. Use `"0.001"` or similar.
-"##
+- Spot markets are long-only: gate buy (entry) rules with state "flat" and sell (exit) rules with state "long". Never add a short-entry (sell when flat) rule on spot.
+- Futures markets support both directions: long entry = buy when flat; long exit = sell when long; short entry = sell when flat; short exit (cover) = buy when short. Always include a stop-loss and time exit for both long and short legs.
+- Never use a placeholder string for `quantity` — `"ATR_SIZED"`, `"FULL_BALANCE"`, `"dynamic"`, etc. are all invalid and will be rejected.
+- Don't use large ATR-based sizing numerators. `N/atr(14)` gives BASE units; for BTC (ATR ≈ $2000
+  on 4h), `1000/atr(14)` ≈ 0.5 BTC ≈ $40k — silently rejected on a $10k account. Keep N ≤ 200
+  or use `percent_of_balance`. The condition audit may show entry conditions firing while 0 positions
+  open — this is the typical symptom of an oversized ATR quantity.
+- `{{"method":"position_quantity"}}` is WRONG for exit rules — use `{{"kind":"position_quantity"}}` (see Quantity section above).
+{futures_examples}"##,
+        futures_examples = if has_futures { FUTURES_SHORT_EXAMPLES } else { "" },
    )
 }

+/// Short-entry and short-exit strategy examples, injected into the system prompt when
+/// futures instruments are present.
+const FUTURES_SHORT_EXAMPLES: &str = r##"
+
+### Example 5 — Futures short: EMA trend-following short with 2% stop-loss and time exit
+
+On futures you can also short. Short entry = `"side": "sell"` when `"state": "flat"`;
+short exit (cover) = `"side": "buy"` when `"state": "short"`. Stop-loss for a short
+is price rising above entry, e.g. entry_price * 1.02. You may run long and short legs
+in the same strategy (4 rules total), or a short-only strategy (2 rules).
+
+```json
+{
+  "type": "rule_based",
+  "candle_interval": "4h",
+  "rules": [
+    {
+      "comment": "Short entry: EMA9 crosses below EMA21 while price is below EMA50 (downtrend)",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "position", "state": "flat"},
+          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "below"},
+          {"kind": "ema_trend", "period": 50, "direction": "below"}
+        ]
+      },
+      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}}
+    },
+    {
+      "comment": "Short exit: EMA9 crosses back above EMA21, OR 2% stop-loss, OR 48-bar time exit",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "position", "state": "short"},
+          {
+            "kind": "any_of",
+            "conditions": [
+              {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "above"},
+              {
+                "kind": "compare",
+                "left": {"kind": "field", "field": "close"},
+                "op": ">",
+                "right": {"kind": "bin_op", "op": "mul", "left": {"kind": "entry_price"}, "right": {"kind": "literal", "value": "1.02"}}
+              },
+              {
+                "kind": "compare",
+                "left": {"kind": "bars_since_entry"},
+                "op": ">=",
+                "right": {"kind": "literal", "value": "48"}
+              }
+            ]
+          }
+        ]
+      },
+      "then": {"side": "buy", "quantity": {"kind": "position_quantity"}}
+    }
+  ]
+}
+```
+
+Key short-specific notes:
+- Stop-loss for short = close > entry_price * (1 + stop_pct), e.g. `* 1.02` for 2% stop
+- Take-profit for short = close < entry_price * (1 - target_pct), e.g. `* 0.97` for 3% target
+- Short exit uses `"side": "buy"` with `{"kind": "position_quantity"}` (same as long exit uses sell)
+- `percent_of_balance` for short entry uses `"usdc"` as the asset (the collateral currency)
+
+### Example 6 — Futures flip-through-zero: 2-rule EMA trend-follower using `reverse`
+
+When you always want to be in a position (long during uptrends, short during downtrends),
+use `"reverse": true` to flip from one side to the other in a single order. This uses half
+the round-trip fee count compared to a 4-rule separate-entry/exit approach.
+
+```json
+{
+  "type": "rule_based",
+  "candle_interval": "4h",
+  "rules": [
+    {
+      "comment": "Go long (or flip short→long): EMA9 crosses above EMA21 while above EMA50",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "any_of", "conditions": [
+            {"kind": "position", "state": "flat"},
+            {"kind": "position", "state": "short"}
+          ]},
+          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "above"},
+          {"kind": "ema_trend", "period": 50, "direction": "above"}
+        ]
+      },
+      "then": {"side": "buy", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}, "reverse": true}
+    },
+    {
+      "comment": "Go short (or flip long→short): EMA9 crosses below EMA21 while below EMA50",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "any_of", "conditions": [
+            {"kind": "position", "state": "flat"},
+            {"kind": "position", "state": "long"}
+          ]},
+          {"kind": "ema_crossover", "fast_period": 9, "slow_period": 21, "direction": "below"},
+          {"kind": "ema_trend", "period": 50, "direction": "below"}
+        ]
+      },
+      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "10", "asset": "usdc"}, "reverse": true}
+    }
+  ]
+}
+```
+
+Key flip-strategy notes:
+- Gate each rule on `flat OR opposite` (using `any_of`) so it fires both on initial entry and on flip
+- `reverse: true` handles the flip math automatically — no need to size for `position_qty + new_qty`
+- This pattern works best for trend-following where you want continuous market exposure
+- Still add a time-based or ATR stop if you want a safety exit when the trend stalls
+
+### Example 7 — Futures triple-Supertrend consensus flip
+
+Multiple Supertrend instances with different period/multiplier combos act as a tiered
+signal. `any_of` fires on the FIRST flip — the fastest line (7/1.5) reacts quickly,
+the slowest (20/3.0) confirms strong trends. `reverse: true` makes it always-in-market:
+the opposite signal is the stop-loss. No explicit stop or time exit needed.
+
+Varying parameters to tune:
+- Tighter multipliers (1.0–2.0) → more signals, more whipsaws
+- Looser multipliers (2.5–4.0) → fewer signals, longer holds
+- Try `all_of` instead of `any_of` to require consensus across all three (stronger filter)
+
+```json
+{
+  "type": "rule_based",
+  "candle_interval": "4h",
+  "rules": [
+    {
+      "comment": "LONG (or flip short→long): any Supertrend flips bullish",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "any_of", "conditions": [
+            {"kind": "position", "state": "flat"},
+            {"kind": "position", "state": "short"}
+          ]},
+          {
+            "kind": "any_of",
+            "conditions": [
+              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 7,  "multiplier": "1.5"}},
+              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 10, "multiplier": "2.0"}},
+              {"kind": "cross_over", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 20, "multiplier": "3.0"}}
+            ]
+          }
+        ]
+      },
+      "then": {"side": "buy", "quantity": {"method": "percent_of_balance", "percent": "5", "asset": "usdc"}, "reverse": true}
+    },
+    {
+      "comment": "SHORT (or flip long→short): any Supertrend flips bearish",
+      "when": {
+        "kind": "all_of",
+        "conditions": [
+          {"kind": "any_of", "conditions": [
+            {"kind": "position", "state": "flat"},
+            {"kind": "position", "state": "long"}
+          ]},
+          {
+            "kind": "any_of",
+            "conditions": [
+              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 7,  "multiplier": "1.5"}},
+              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 10, "multiplier": "2.0"}},
+              {"kind": "cross_under", "left": {"kind": "field", "field": "close"}, "right": {"kind": "func", "name": "supertrend", "period": 20, "multiplier": "3.0"}}
+            ]
+          }
+        ]
+      },
+      "then": {"side": "sell", "quantity": {"method": "percent_of_balance", "percent": "5", "asset": "usdc"}, "reverse": true}
+    }
+  ]
+}
+```
+
+Key Supertrend-specific notes:
+- `supertrend` ignores `field` — it uses OHLC internally; omit the `field` param
+- `multiplier` controls band width: lower = tighter, more reactive; higher = wider, more stable
+- `any_of` → first flip triggers (responsive); `all_of` → all three must agree (conservative)
+- Gate on position state to prevent re-entries scaling into an existing position"##;
+
 /// Build the user message for the first iteration (no prior results).
-pub fn initial_prompt(instruments: &[String], candle_intervals: &[String]) -> String {
+/// `prior_summary` contains a formatted summary of results from previous runs, if any.
+/// `has_futures` selects whether the prompt describes the market as "futures" or "spot".
+pub fn initial_prompt(
+    instruments: &[String],
+    candle_intervals: &[String],
+    prior_summary: Option<&str>,
+    has_futures: bool,
+) -> String {
+    let prior_section = match prior_summary {
+        Some(s) => format!("{s}\n\n"),
+        None => String::new(),
+    };
+    let starting_instruction = if prior_summary.is_some() {
+        "Based on the prior results above:\n\
+- A strategy is \"promising\" if avg_sharpe >= 0.5 AND it traded >= 10 times per instrument. \
+If the best prior strategy meets both thresholds, refine it (tighten entry conditions, \
+adjust the exit, or tune the interval).\n\
+- If no prior strategy reaches avg_sharpe >= 0.5, do NOT repeat the same indicator family. \
+Scan the best-strategies list: if they all use the same core indicator (e.g. all use \
+Bollinger Bands, or all use EMA crossovers, or all use RSI threshold), your FIRST strategy \
+MUST use a completely different indicator family — for example: MACD crossover, ATR \
+breakout, volume spike, donchian channel breakout, or stochastic oscillator. Only after \
+that novelty attempt may you refine prior work.\n\
+- Never repeat an approach that produced 0 trades or fewer than 5 trades per instrument."
+    } else {
+        "Start with a multi-timeframe trend-following approach with proper risk management \
+(stop-loss, time exit, and ATR-based position sizing)."
+    };
+    let market_type = if has_futures { "futures" } else { "spot" };
    format!(
-        r#"Design a trading strategy for crypto spot markets.
+        r#"{prior_section}Design a trading strategy for crypto {market_type} markets.

 Available instruments: {}
 Available candle intervals: {}

-Start with a multi-timeframe trend-following approach with proper risk management
-(stop-loss, time exit, and ATR-based position sizing). Use "usdc" as the quote asset.
+{starting_instruction} Use "usdc" as the quote asset.

 Respond with ONLY the strategy JSON."#,
        instruments.join(", "),
--- a/src/swym.rs
+++ b/src/swym.rs
@@ -4,6 +4,21 @@ use serde::{Deserialize, Serialize};
 use serde_json::Value;
 use uuid::Uuid;

+/// Response from `POST /api/v1/strategies/validate`.
+#[derive(Debug, Deserialize)]
+pub struct ValidationResponse {
+    pub valid: bool,
+    #[serde(default)]
+    pub errors: Vec<ValidationError>,
+}
+
+#[derive(Debug, Deserialize, Clone)]
+pub struct ValidationError {
+    /// Dotted JSON path to the offending field. Absent for top-level structural errors.
+    pub path: Option<String>,
+    pub message: String,
+}
+
 /// Client for the swym backtesting API.
 pub struct SwymClient {
    client: Client,
@@ -30,6 +45,39 @@ pub struct CandleCoverage {
    pub first_open: String,
    pub last_close: String,
    pub count: u64,
+    pub expected_count: Option<u64>,
+    pub coverage_pct: Option<f64>,
+}
+
+/// Response from `GET /api/v1/paper-runs/compare?ids=...`.
+#[derive(Debug, Deserialize)]
+pub struct RunMetricsSummary {
+    pub id: Uuid,
+    pub status: String,
+    pub candle_interval: Option<String>,
+    pub total_positions: Option<u32>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub win_rate: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub profit_factor: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub net_pnl: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub sharpe_ratio: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub sortino_ratio: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub calmar_ratio: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub max_drawdown: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub pnl_return: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub avg_win: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub avg_loss: Option<f64>,
+    #[serde(default, deserialize_with = "deserialize_opt_number")]
+    pub avg_hold_duration_secs: Option<f64>,
 }

 #[derive(Debug, Clone, Serialize, Deserialize)]
@@ -45,6 +93,15 @@ pub struct BacktestResult {
    pub total_pnl: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
+    pub sortino_ratio: Option<f64>,
+    pub calmar_ratio: Option<f64>,
+    pub max_drawdown: Option<f64>,
+    pub pnl_return: Option<f64>,
+    pub avg_win: Option<f64>,
+    pub avg_loss: Option<f64>,
+    pub max_win: Option<f64>,
+    pub max_loss: Option<f64>,
+    pub avg_hold_duration_secs: Option<f64>,
    pub total_fees: Option<f64>,
    pub avg_bars_in_trade: Option<f64>,
    pub error_message: Option<String>,
@@ -52,16 +109,10 @@ pub struct BacktestResult {
 }

 impl BacktestResult {
-    /// Parse a backtest response.
-    ///
-    /// `exchange`, `base`, `quote` are needed to derive the instrument key used
-    /// in the `result_summary.instruments` map (e.g. `binancespot-eth_usdc`).
+    /// Parse a backtest response using the flat summary fields added in swym patch 8fb410311.
    pub fn from_response(
        resp: &PaperRunResponse,
        instrument: &str,
-        exchange: &str,
-        base: &str,
-        quote: &str,
    ) -> Self {
        let summary = resp.result_summary.as_ref();
        if let Some(s) = summary {
@@ -70,28 +121,47 @@ impl BacktestResult {
            tracing::debug!("[{instrument}] result_summary: null");
        }

-        // The API key for per-instrument stats: "binance_spot" + "eth" + "usdc" → "binancespot-eth_usdc"
-        let inst_key = format!("{}-{}_{}", exchange.replace('_', ""), base, quote);
-
-        let total_positions = summary.and_then(|s| {
-            s["backtest_metadata"]["position_count"].as_u64().map(|v| v as u32)
-        });
-
-        let inst_stats = summary.and_then(|s| s["instruments"].get(&inst_key));
+        let total_positions = summary.and_then(|s| s["total_positions"].as_u64().map(|v| v as u32));
+        let winning_positions = summary.and_then(|s| s["winning_positions"].as_u64().map(|v| v as u32));
+        let losing_positions = summary.and_then(|s| s["losing_positions"].as_u64().map(|v| v as u32));
+        let win_rate = summary.and_then(|s| parse_number(&s["win_rate"]));
+        let profit_factor = summary.and_then(|s| parse_number(&s["profit_factor"]));
+        let net_pnl = summary.and_then(|s| parse_number(&s["net_pnl"]));
+        let total_pnl = summary.and_then(|s| parse_number(&s["total_pnl"]));
+        let sharpe_ratio = summary.and_then(|s| parse_number(&s["sharpe_ratio"]));
+        let sortino_ratio = summary.and_then(|s| parse_number(&s["sortino_ratio"]));
+        let calmar_ratio = summary.and_then(|s| parse_number(&s["calmar_ratio"]));
+        let max_drawdown = summary.and_then(|s| parse_number(&s["max_drawdown"]));
+        let pnl_return = summary.and_then(|s| parse_number(&s["pnl_return"]));
+        let avg_win = summary.and_then(|s| parse_number(&s["avg_win"]));
+        let avg_loss = summary.and_then(|s| parse_number(&s["avg_loss"]));
+        let max_win = summary.and_then(|s| parse_number(&s["max_win"]));
+        let max_loss = summary.and_then(|s| parse_number(&s["max_loss"]));
+        let avg_hold_duration_secs = summary.and_then(|s| parse_number(&s["avg_hold_duration_secs"]));
+        let total_fees = summary.and_then(|s| parse_number(&s["total_fees"]));

        Self {
            run_id: resp.id,
            instrument: instrument.to_string(),
            status: resp.status.clone(),
            total_positions,
-            winning_positions: None,
-            losing_positions: None,
-            win_rate: inst_stats.and_then(|s| parse_ratio_value(&s["win_rate"])),
-            profit_factor: inst_stats.and_then(|s| parse_ratio_value(&s["profit_factor"])),
-            total_pnl: inst_stats.and_then(|s| parse_decimal_str(&s["pnl"])),
-            net_pnl: inst_stats.and_then(|s| parse_decimal_str(&s["pnl"])),
-            sharpe_ratio: inst_stats.and_then(|s| parse_ratio_value(&s["sharpe_ratio"])),
-            total_fees: None,
+            winning_positions,
+            losing_positions,
+            win_rate,
+            profit_factor,
+            total_pnl,
+            net_pnl,
+            sharpe_ratio,
+            sortino_ratio,
+            calmar_ratio,
+            max_drawdown,
+            pnl_return,
+            avg_win,
+            avg_loss,
+            max_win,
+            max_loss,
+            avg_hold_duration_secs,
+            total_fees,
            avg_bars_in_trade: None,
            error_message: resp.error_message.clone(),
            condition_audit_summary: summary.and_then(|s| s.get("condition_audit_summary").cloned()),
@@ -116,6 +186,12 @@ impl BacktestResult {
            self.net_pnl.unwrap_or(0.0),
            self.sharpe_ratio.unwrap_or(0.0),
        );
+        if let Some(sortino) = self.sortino_ratio {
+            s.push_str(&format!(" sortino={:.2}", sortino));
+        }
+        if let Some(dd) = self.max_drawdown {
+            s.push_str(&format!(" max_dd={:.1}%", dd * 100.0));
+        }
        if self.total_positions.unwrap_or(0) == 0 {
            if let Some(audit) = &self.condition_audit_summary {
                let audit_str = format_audit_summary(audit);
@@ -129,27 +205,32 @@ impl BacktestResult {
    }

    /// Is this result promising enough to warrant out-of-sample validation?
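+    /// Requires a completed run, at least `min_trades` positions, and positive net PnL.
+    /// When a Sharpe ratio is present it must also exceed `min_sharpe`; a missing Sharpe does not block.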
    pub fn is_promising(&self, min_sharpe: f64, min_trades: u32) -> bool {
-        self.status == "complete"
-            && self.sharpe_ratio.unwrap_or(0.0) > min_sharpe
-            && self.total_positions.unwrap_or(0) >= min_trades
-            && self.net_pnl.unwrap_or(0.0) > 0.0
+        if self.status != "complete" { return false; }
+        if self.total_positions.unwrap_or(0) < min_trades { return false; }
+        if self.net_pnl.unwrap_or(0.0) <= 0.0 { return false; }
+        match self.sharpe_ratio {
+            Some(sr) => sr > min_sharpe,
+            None => true, // Sharpe not reported; positive net_pnl plus enough trades is treated as sufficient
+        }
    }
 }

-/// Parse a `{"interval": null, "value": "123.45"}` ratio wrapper.
-/// Returns `None` for null, missing, or sentinel values (Decimal::MAX ≈ 7.9e28).
-fn parse_ratio_value(v: &Value) -> Option<f64> {
-    let s = v.get("value")?.as_str()?;
-    let f: f64 = s.parse().ok()?;
+/// Parse a numeric JSON value — accepts either a plain JSON number or a decimal string.
+/// Returns `None` for null, missing, or sentinel values (>1e20 in magnitude).
+fn parse_number(v: &Value) -> Option<f64> {
+    let f = v.as_f64().or_else(|| v.as_str()?.parse().ok())?;
    if f.abs() > 1e20 { None } else { Some(f) }
 }

-/// Parse a plain decimal string JSON value.
-/// Returns `None` for null, missing, or sentinel values.
-fn parse_decimal_str(v: &Value) -> Option<f64> {
-    let f: f64 = v.as_str()?.parse().ok()?;
-    if f.abs() > 1e20 { None } else { Some(f) }
+/// Serde deserializer for `Option<f64>` that accepts both JSON numbers and decimal strings.
+fn deserialize_opt_number<'de, D>(deserializer: D) -> Result<Option<f64>, D::Error>
+where
+    D: serde::Deserializer<'de>,
+{
+    let v = Value::deserialize(deserializer)?;
+    Ok(parse_number(&v))
 }

 /// Render a condition_audit_summary Value into a compact one-line string.
@@ -254,6 +335,32 @@ impl SwymClient {
        resp.json().await.context("parse candle coverage")
    }

+    /// Validate a strategy against the swym DSL schema.
+    ///
+    /// Calls `POST /api/v1/strategies/validate` and returns a structured list
+    /// of all validation errors. Returns `Ok(vec![])` when the strategy is valid.
+    /// Returns `Err` only on network or parse failures, not on DSL errors.
+    pub async fn validate_strategy(&self, strategy: &Value) -> Result<Vec<ValidationError>> {
+        let url = format!("{}/strategies/validate", self.base_url);
+        let resp = self
+            .client
+            .post(&url)
+            .json(strategy)
+            .send()
+            .await
+            .context("validate strategy request")?;
+
+        if !resp.status().is_success() {
+            let status = resp.status();
+            let body = resp.text().await.unwrap_or_default();
+            anyhow::bail!("validate strategy {status}: {body}");
+        }
+
+        let parsed: ValidationResponse =
+            resp.json().await.context("parse validation response")?;
+        Ok(parsed.errors)
+    }
+
    /// Submit a backtest run.
    pub async fn submit_backtest(
        &self,
@@ -261,6 +368,7 @@ impl SwymClient {
        instrument_symbol: &str,
        base_asset: &str,
        quote_asset: &str,
+        market_kind: &str,
        strategy: &Value,
        starts_at: &str,
        finishes_at: &str,
@@ -278,7 +386,7 @@ impl SwymClient {
                    "name_exchange": instrument_symbol,
                    "underlying": { "base": base_asset, "quote": quote_asset },
                    "quote": "underlying_quote",
-                    "kind": "spot"
+                    "kind": market_kind
                },
                "execution": {
                    "mocked_exchange": instrument_exchange,
@@ -352,6 +460,25 @@ impl SwymClient {
        }
    }

+    /// Fetch metrics for multiple completed runs via the compare endpoint.
+    /// Batches requests in groups of 50 (API maximum).
+    pub async fn compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>> {
+        let mut results = Vec::new();
+        for chunk in run_ids.chunks(50) {
+            let ids = chunk.iter().map(|id| id.to_string()).collect::<Vec<_>>().join(",");
+            let url = format!("{}/paper-runs/compare?ids={}", self.base_url, ids);
+            let resp = self.client.get(&url).send().await.context("compare runs request")?;
+            if !resp.status().is_success() {
+                let status = resp.status();
+                let body = resp.text().await.unwrap_or_default();
+                anyhow::bail!("compare runs {status}: {body}");
+            }
+            let mut batch: Vec<RunMetricsSummary> = resp.json().await.context("parse compare response")?;
+            results.append(&mut batch);
+        }
+        Ok(results)
+    }
+
    /// Fetch condition audit summary for a completed run.
    pub async fn condition_audit(&self, run_id: Uuid) -> Result<Value> {
        let url = format!("{}/paper-runs/{}/condition-audit", self.base_url, run_id);
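Reviewer sketch (not part of the patch): the batching that `compare_runs` performs above can be illustrated without the HTTP client. The helper below is hypothetical; its name, its use of plain `String` IDs instead of `Uuid`, and the `max_ids` parameter are assumptions made so the example depends only on the standard library. It only builds the URLs that each batch would request.

```rust
/// Hypothetical helper: build one compare URL per batch of at most `max_ids` run IDs.
/// Mirrors the `chunks(50)` + comma-join pattern in `compare_runs`, minus the network call.
/// Note: `chunks` panics if `max_ids` is 0, so callers must pass a positive batch size.
fn compare_urls(base_url: &str, run_ids: &[String], max_ids: usize) -> Vec<String> {
    run_ids
        .chunks(max_ids)
        .map(|chunk| format!("{}/paper-runs/compare?ids={}", base_url, chunk.join(",")))
        .collect()
}
```

With 120 IDs and a batch size of 50 this yields three URLs (50, 50, and 20 IDs), and an empty ID list yields no requests at all, which matches the loop in the patch returning an empty `Vec`.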
Author	SHA1	Message	Date
rob thijssen	11fe79ed25	docs: add CLAUDE.md for future Claude Code instances Add comprehensive guidance document covering architecture, data flows, development commands, DSL schema reference, and common patterns for working with the scout strategy search agent. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-03-12 05:38:28 +02:00
rob thijssen	fcb9a2f553	chore: attempt dedupe guidance in prompt	2026-03-11 18:15:24 +02:00
rob thijssen	75c95f7935	feat: add triple-Supertrend consensus flip as strategy family 7 Adds awareness of the multi-Supertrend any_of flip pattern (based on the reference strategy at swym/assets/reference/supertrend-triple.json, itself a DSL port of the popular TradingView triple-Supertrend script). - prompts.rs: add strategy family 7 (Supertrend consensus flip) with guidance on any_of vs all_of, period/multiplier tuning, and the always-in-market / reverse-as-stop-loss trade-off - prompts.rs: add risk management exception for always-in-market flip strategies (reverse: true means the opposite signal is the stop) - prompts.rs: add Example 7 — correctly gated 2-rule triple-Supertrend flip with position state guards to prevent unintended scale-ins Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:40:15 +02:00
rob thijssen	6601da21cc	feat: add reverse flag and symmetric short support to DSL Update scout's schema and system prompt to reflect two upstream swym changes from 2026-03-10: - b535207: symmetric short quantity fix — buy-to-cover now correctly uses position_qty (executor was broken; scout's DSL patterns were already correct and will now work as intended) - 6f58949: reverse flag on Action — new optional "reverse": true field that submits position_qty + configured_qty when an opposite position is open, closing it and opening a new one in the opposite direction in a single order (flip-through-zero) Changes: - dsl-schema.json: add "reverse" boolean to Action definition - prompts.rs: add "Reverse / flip-through-zero" capability section and Example 6 (2-rule EMA flip strategy) to FUTURES_SHORT_EXAMPLES Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:28:54 +02:00
rob thijssen	8de3ae5fe1	Add Binance Futures support (long and short) - config.rs: add Instrument::market_kind() mapping exchange name to "spot"/"futures_um"/"futures_cm", and is_futures() helper - swym.rs: submit_backtest() accepts market_kind param; passes it as instrument.kind in the RunConfig instead of hardcoding "spot" - agent.rs: derive has_futures from instruments; pass to both system_prompt() and initial_prompt() - prompts.rs: - system_prompt() accepts has_futures; injects FUTURES_SHORT_EXAMPLES (Example 5: EMA trend-following short with ATR stop) when true - Rewrite position-state anti-patterns to cover both spot (long-only) and futures (long + short) semantics - initial_prompt() accepts has_futures; labels market as "spot" or "futures" and passes flag through to starting instruction context Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 18:13:06 +02:00
rob thijssen	a435d3a99d	Define concrete 'promising' threshold and enforce indicator diversity in ledger-informed prompt - Replace vague "promising metrics" with avg_sharpe >= 0.5 AND >= 10 trades per instrument - Add indicator-family diversity rule: if all prior strategies share the same core indicator (e.g. all Bollinger Bands), the first strategy of the new run must use a different family - Give explicit examples of alternative families: MACD, ATR breakout, volume spike, donchian channel breakout, stochastic oscillator - Extend the no-repeat ban to strategies with fewer than 5 trades per instrument Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 14:21:55 +02:00
rob thijssen	b476199de8	Fix ledger context being overridden by prescriptive initial prompt The 13:20:03 run showed the ledger context was counterproductive: the initial prompt's "Start with a multi-timeframe trend-following approach" instruction caused the model to ignore the prior summary and repeat EMA50-based strategies that produced 0 trades across all 15 iterations. Two fixes: - When prior_summary is present, replace the prescriptive starting instruction with one that explicitly defers to the ledger: refine the best prior strategy or try a different approach if all prior results were poor. Prevents the fixed instruction from overriding the context. - Cap ledger entries per unique strategy at 3. A strategy repeated across 11 iterations would contribute 33 entries, drowning out other approaches in the prior summary. 3 entries (one per instrument) is sufficient. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:54:35 +02:00
rob thijssen	d76d3b9061	Use write_all for ledger entries to improve concurrent-write safety writeln!(f, ...) makes two syscalls (data + newline) which can interleave between concurrent processes even with O_APPEND. Serialise entry to bytes and append the newline before write_all() so the entire entry lands in a single write() syscall, which O_APPEND makes atomic on Linux local filesystems for typical entry sizes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:12:38 +02:00
rob thijssen	0945c94cc8	Add --ledger-file arg for explicit ledger path control Defaults to <output_dir>/run_ledger.jsonl as before. Pass --ledger-file to read from (and write to) a specific ledger, enabling multiple ledger files to seed different search campaigns or merge results from separate runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:10:22 +02:00
rob thijssen	a0316be798	Add cross-run learning via run ledger and compare endpoint Persist strategy + run_id to results/run_ledger.jsonl after each backtest. On startup, load the ledger, fetch metrics via the new compare endpoint (batched in groups of 50), group by strategy, rank by avg Sharpe, and inject a summary of the top 5 and worst 3 prior strategies into the iteration-1 prompt. Also consumes the enriched result_summary fields from swym patch e47c18: sortino_ratio, calmar_ratio, max_drawdown, pnl_return, avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs. Sortino and max_drawdown are appended to summary_line() when present. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 13:05:39 +02:00
rob thijssen	609d64587b	docs: cross-run learnings plan	2026-03-10 13:04:13 +02:00
rob thijssen	6692bdb490	Prompt: fix method vs kind confusion causing 11/15 validation failures The 12:11:39 run shows the model using {"method":"position_quantity"} for every sell rule despite the existing CRITICAL note. Root cause: a contradictory anti-pattern ("Never use an expression object for quantity") was fighting the correct guidance, and the method/kind distinction wasn't emphatic enough. - Expand the CRITICAL note to explicitly contrast: buy uses SizingMethod ("method"), sell uses Expr ("kind") — they are different object types. - Remove the contradictory "never use an expression object" anti-pattern which conflicted with position_quantity and SizingMethod objects. - Add a final anti-pattern bullet as a second reminder of the same mistake. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:24:57 +02:00
rob thijssen	36689e3fbb	Prompt: fix field+offset kind omission and add interval guidance Two gaps revealed by the 2026-03-10T11:42:49 run: - Iterations 11-15 all failed with "missing field 'kind'" when the model wrote {"field":"volume","offset":-1} without the required "kind":"field". Expand the existing kind-required note with explicit offset examples. - Iteration 10 switched to 15m unprompted and got sharpe=-0.41 from overtrading. Add anti-pattern note: don't change interval when sharpe is negative — fix the signal logic instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:09:18 +02:00
rob thijssen	87d31f8d7e	Use flat result_summary fields from swym patch 8fb410311 BacktestResult::from_response now reads total_positions, winning_positions, losing_positions, win_rate, profit_factor, net_pnl, total_pnl, sharpe_ratio, and total_fees directly from the top-level result_summary object instead of deriving them from backtest_metadata + balance delta. Removes the quote/initial_balance parameters that were only needed for the workaround. Restores the full summary_line format with all metrics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 11:41:53 +02:00
rob thijssen	3892ab37c1	fix: parse actual result_summary structure (backtest_metadata + assets) The API doc described a flat result_summary that doesn't exist yet in the deployed backend. The actual shape is: { backtest_metadata: { position_count }, assets: [...], condition_audit_summary } - total_positions from backtest_metadata.position_count - net_pnl computed as assets[quote].tear_sheet.balance_end.total minus initial_balance - win_rate, profit_factor, sharpe_ratio, total_fees, avg_bars_in_trade remain None until the API adds them. from_response() takes quote and initial_balance again to locate the right asset and compute PnL. summary_line() only prints metrics that are actually present. is_promising() falls back to net_pnl>0 + trades when sharpe is unavailable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 10:32:13 +02:00
rob thijssen	85896752f2	fix: ValidationError.path optional, correct position_quantity usage in prompts - ValidationError.path is Option<String> — the API omits it for top-level structural errors. The required String was causing every validate call to fail to deserialize, falling through to submission instead of catching errors. - Log path as "(top-level)" when absent - Prompts: add explicit CRITICAL note that {"method":"position_quantity"} is wrong — position_quantity is an Expr (uses "kind") not a SizingMethod (uses "method"). The new SizingMethod examples caused the model to over-apply "method" to exits universally across the entire run. - Prompts: note that fixed_sum has no multiplier field (additionalProperties) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:45:17 +02:00
rob thijssen	ee260ea4d5	fix: parse flat result_summary structure per updated API doc The API result_summary is a flat object with top-level fields (total_positions, win_rate, profit_factor, net_pnl, sharpe_ratio, etc.) not a nested backtest_metadata/instruments map. This was causing all metrics to parse as None/zero for every completed run. - Rewrite BacktestResult::from_response() to read flat fields directly - Replace parse_ratio_value/parse_decimal_str with a single parse_number() that accepts both JSON numbers and decimal strings - Populate winning_positions, losing_positions, total_fees, avg_bars_in_trade (previously always None) - Simplify from_response signature — exchange/base/quote no longer needed - Add expected_count and coverage_pct to CandleCoverage struct - Update all example sell rules to use position_quantity instead of "0.01" - Note that "9999" is a valid sell-all alias (auto-capped by the API) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:37:55 +02:00
rob thijssen	3f8d4de7fb	feat: add declarative SizingMethod types from upstream schema Upstream added three new quantity sizing objects alongside DecimalString and Expr: - fixed_sum: buy N quote-currency worth at current price - percent_of_balance: buy N% of named asset's free balance - fixed_units: buy exactly N base units (semantic alias for decimal string) Update dsl-schema.json to include the three definitions and expand Action.quantity.oneOf to reference all five valid forms. Update prompts.rs Quantity section to present the declarative methods as the preferred approach — they're cleaner, more readable, and instrument-agnostic compared to raw Expr composition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:33:43 +02:00
rob thijssen	7e1ff51ae0	feat: validate endpoint integration, Expr quantity sizing, apply_func input field fix - Add /api/v1/strategies/validate client to SwymClient; wire into agent loop before submission so all DSL errors are surfaced in one round-trip - Update dsl-schema.json to upstream: quantity is now oneOf[DecimalString, Expr], ExprApplyFunc uses "input" field (renamed from "expr") - Update prompts: document expression-based quantity sizing (fixed-fraction and ATR-based examples), fix apply_func to use "input" not "expr" throughout - Remove unused ValidationError import Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:12:12 +02:00
rob thijssen	5146b3f764	fix: replace negligible 0.001 quantity with meaningful sizing guidance The previous example quantity "0.001" represented <1% of the $10k initial balance for BTC and near-zero exposure for ETH/SOL, making P&L and Sharpe results statistically meaningless. - Update Quantity section with instrument-appropriate reference values (BTC: 0.01 ≈ $800, ETH: 3.0 ≈ $600, SOL: 50.0 ≈ $700) - Replace "0.001" with "0.01" in all four working examples - Explain that 5–10% of $10k initial balance is the sizing target - Explicitly warn against "0.001" as it produces negligible exposure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 07:41:28 +02:00
rob thijssen	759439313e	fix: two Bollinger Band DSL errors from 50-iteration run - bollinger_upper/lower func Exprs must NOT include a "field" parameter; they compute from close internally. Setting "field":"bollinger_upper" causes API rejection: expected one of open/high/low/close/volume. - bollinger Condition "band" only accepts "above_upper" or "below_lower"; "above_lower" and "below_upper" are invalid variants. Both errors appeared repeatedly across the 50-iteration run, causing failed backtest submissions on every Bollinger crossover strategy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 07:39:09 +02:00
rob thijssen	9a7761b452	fix: add hma/ma to unsupported list, clarify quantity exit semantics - Add `hma` (Hull MA) and generic `ma` to unsupported func names — both were used by R1 and rejected by the API - Note that Hull MA can be approximated via apply_func with wma - Add `"all"` to the quantity placeholder blacklist; explain that exit rules must repeat the entry decimal — there is no "close all" concept Observed in run 2026-03-09T20:10:55: 2 iterations failed on hma/ma, 3 iterations skipped by client-side validation on quantity="all". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 20:23:30 +02:00
rob thijssen	8d53d6383d	fix: correct DSL mistakes from observed R1 failures - ADX: clarify it is a FuncName inside {"kind":"func","name":"adx",...}, not a Condition kind — with inline usage example (ADX > 25 filter) - Expr "kind" field: add explicit note that every Expr object requires "kind"; {"field":"close"} without "kind" is rejected by the API - MACD: add Example 4 showing full crossover strategy composed from bin_op(sub, ema12, ema26) and apply_func(ema,9) as signal line All three mistakes were observed across consecutive R1-32B runs and caused repeated API submission failures. Each prompt addition follows the same pattern as the successful bollinger_upper fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 20:11:05 +02:00
rob thijssen	55e41b6795	fix: log R1 thinking, catch repeated DSL errors, add unsupported indicators Three improvements from the 2026-03-09T18:45:04 run analysis. R1 thinking visibility (claude.rs, agent.rs): extract_think_content() returns the raw <think> block content before it is stripped. agent.rs logs it at DEBUG level so 'RUST_LOG=debug' lets you see why the model keeps repeating a mistake; previously the think block was silently discarded after stripping. Prompt: unsupported indicators and bollinger_upper Expr mistake (prompts.rs): - bollinger_upper / bollinger_lower used as {"kind":"bollinger_upper",...} was the dominant failure in iters 9-15. Added explicit correction: use {"kind":"func","name":"bollinger_upper","period":20} in Expr context, never as a standalone kind. - roc, hma, vwap, macd, cci, stoch are NOT in the swym schema. Added a clear "NOT supported" list alongside the supported func names. Repeated API error detection in diagnose_history (agent.rs): If the same "unknown variant `X`" error appears 2+ times in the last 4 iterations, a targeted diagnosis note is emitted naming the bad variant and pointing to the DSL reference. This surfaces in the next iteration prompt so the model gets actionable feedback before it wastes more of the backtest budget on the same mistake. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:58:50 +02:00
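The repeated-error check described in the commit above could be approximated as below. The function names and the plain `&[&str]` history of error messages are hypothetical. The real `diagnose_history()` works on the agent's own iteration records.

```rust
use std::collections::HashMap;

/// Extract `X` from a message containing "unknown variant `X`".
fn unknown_variant(err: &str) -> Option<&str> {
    let marker = "unknown variant `";
    let start = err.find(marker)? + marker.len();
    let len = err[start..].find('`')?;
    Some(&err[start..start + len])
}

/// Return the first variant rejected at least twice within the last four errors.
fn repeated_unknown_variant(errors: &[&str]) -> Option<String> {
    let window = &errors[errors.len().saturating_sub(4)..];
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &err in window {
        if let Some(variant) = unknown_variant(err) {
            let n = counts.entry(variant).or_insert(0);
            *n += 1;
            if *n >= 2 {
                return Some(variant.to_string());
            }
        }
    }
    None
}

fn main() {
    let history = [
        "ok",
        "unknown variant `hma`, expected one of `sma`, `ema`",
        "timeout",
        "unknown variant `hma`, expected one of `sma`, `ema`",
    ];
    if let Some(bad) = repeated_unknown_variant(&history) {
        println!("diagnosis: `{bad}` keeps being rejected; see the DSL reference");
    }
}
```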
rob thijssen	51e452b607	feat: discover max_output_tokens from server at startup Instead of hardcoding per-family token budgets, ClaudeClient queries the server at startup and sets max_output_tokens = context_length / 2. Two discovery strategies, tried in order: 1. LM Studio /api/v1/models — returns loaded_instances[].config.context_length (the actually-configured context, e.g. 64000) and max_context_length (theoretical max, e.g. 131072). We prefer the loaded value. 2. OpenAI-compat /v1/models/{id} — used as fallback for non-LM Studio backends that expose context_length on the model object. If both fail, the family default is kept (DeepSeekR1=32768, Generic=8192). lmstudio_context_length() matches model IDs with and without quantization suffixes (@q4_k_m etc.) so the --model flag doesn't need to be exact. For the current R1-32B setup: loaded context=64000 → max_output_tokens=32000, giving the thinking pass plenty of room while reserving half for input. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:44:41 +02:00
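The fallback order in the commit above reduces to a small pure function. In this sketch the two `Option` arguments stand in for the results of the LM Studio and OpenAI-compatible lookups. The function name and signature are assumptions, and no HTTP calls are made.

```rust
/// Pick the output-token budget: half of the first discovered context length,
/// or the family default when neither discovery strategy returned a value.
fn discover_max_output_tokens(
    lmstudio_loaded_ctx: Option<u32>, // loaded_instances[].config.context_length
    openai_compat_ctx: Option<u32>,   // context_length from /v1/models/{id}
    family_default: u32,
) -> u32 {
    lmstudio_loaded_ctx
        .or(openai_compat_ctx)
        .map(|ctx| ctx / 2)
        .unwrap_or(family_default)
}

fn main() {
    // The R1-32B setup from the commit message: loaded context of 64000.
    let budget = discover_max_output_tokens(Some(64_000), None, 32_768);
    println!("max_output_tokens = {budget}");
}
```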
rob thijssen	89f7ba66e0	feat: model-family-aware token budgets and prompt style Add ModelFamily enum (config.rs) detected from the model name: - DeepSeekR1: matched on "deepseek-r1", "r1-distill"; R1 thinking blocks consume thousands of output tokens before the JSON; max_output_tokens raised to 32768 and HTTP timeout to 300s; prompt tells the model its <think> output is stripped and only the bare JSON is used - Generic: previous behaviour (8192 tokens, 120s timeout). ClaudeClient stores the detected family and uses it for max_tokens and the request timeout. family() accessor lets the caller (agent.rs) pass it into system_prompt(). prompts::system_prompt() now accepts &ModelFamily and injects a family-specific "output format" section in place of the hardcoded "How to respond" block. New families can be added by extending the enum and the match arms without touching prompt logic elsewhere. Also: log full anyhow cause chain (:#) on JSON extraction failure and show response length alongside the truncated preview, to make future diagnosis easier. Root cause of the 2026-03-09T18:29:22 run failure: R1's thinking tokens counted against max_tokens:8192, leaving only ~500 chars for the actual JSON, which was always truncated mid-object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:39:51 +02:00
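A rough sketch of the family detection described in the commit above. The enum variants, match strings, token budgets and timeouts come from the commit message. The method names and the case-insensitive matching are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelFamily {
    DeepSeekR1,
    Generic,
}

impl ModelFamily {
    /// Detect the family from the model name (case-insensitive here by assumption).
    fn detect(model: &str) -> Self {
        let name = model.to_lowercase();
        if name.contains("deepseek-r1") || name.contains("r1-distill") {
            ModelFamily::DeepSeekR1
        } else {
            ModelFamily::Generic
        }
    }

    /// Default output-token budget per family.
    fn max_output_tokens(self) -> u32 {
        match self {
            ModelFamily::DeepSeekR1 => 32_768,
            ModelFamily::Generic => 8_192,
        }
    }

    /// HTTP request timeout in seconds per family.
    fn timeout_secs(self) -> u64 {
        match self {
            ModelFamily::DeepSeekR1 => 300,
            ModelFamily::Generic => 120,
        }
    }
}

fn main() {
    let family = ModelFamily::detect("deepseek-r1-distill-qwen-32b@q4_k_m");
    println!(
        "{family:?}: max_output_tokens={}, timeout={}s",
        family.max_output_tokens(),
        family.timeout_secs()
    );
}
```

Keeping the per-family values in `match` arms means a new family only needs a new variant and one arm per method, as the commit notes.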
rob thijssen	6f4f864d28	fix: increase max_tokens to 8192 for R1 reasoning overhead R1 models use 500-2000 tokens for <think> blocks before the final response. 4096 was too tight — the model would exhaust the budget mid-thought and never emit the JSON. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:17:48 +02:00
rob thijssen	185cb4586e	fix: strip R1 think blocks before JSON extraction DeepSeek-R1 models emit <think>...</think> before their actual response. The brace-counting extractor would grab the first { inside the thinking block (which contains partial JSON fragments) rather than the final strategy JSON. strip_think_blocks() removes all <think>...</think> sections including unterminated blocks (truncated responses), leaving only the final output for extract_json to process. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:17:06 +02:00
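A minimal sketch of the think-block stripping described in the commit above. The function name matches the commit message, but the body is a plain string-search version written for illustration, not the code in `claude.rs`.

```rust
/// Remove every <think>...</think> section from a model response.
/// An unterminated <think> (truncated response) drops everything after it.
fn strip_think_blocks(input: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        // Keep the text before the opening tag.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            // Skip past the closing tag and continue scanning.
            Some(end) => rest = &after_open[end + CLOSE.len()..],
            // No closing tag: discard the remainder.
            None => {
                rest = "";
                break;
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let response = "<think>maybe {\"kind\": ...</think>{\"name\":\"ema_cross\"}";
    println!("{}", strip_think_blocks(response));
}
```

Stripping first means a brace-counting extractor no longer sees the partial JSON fragments inside the thinking block.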