docs: add CLAUDE.md for future Claude Code instances

Add a comprehensive guidance document covering architecture, data flows, development commands, DSL schema reference, and common patterns for working with the scout strategy search agent.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
chore: attempt dedupe guidance in prompt
2026-03-12 05:38:28 +02:00 · 2026-03-11 18:15:24 +02:00
3 changed files with 156 additions and 1 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,116 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+`scout` is an autonomous strategy search agent for the [swym](https://swym.rs) backtesting platform. It runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter overfitting.
+
+## Architecture
+
+### Core Modules
+
+- **`agent.rs`** - Main orchestration logic. Contains the `run()` function that implements the search loop, strategy validation, and learning feedback. Key types: `IterationRecord`, `LedgerEntry`. Key functions: `validate_strategy()`, `diagnose_history()`.
+- **`claude.rs`** - Claude API client. Handles model communication, JSON extraction from responses (stripping thinking blocks for R1-family models), and context length detection.
+- **`swym.rs`** - Swym backtesting API client. Wraps all swym API calls: candle coverage, strategy validation, backtest submission, polling, and metrics retrieval.
+- **`prompts.rs`** - System and user prompts for the LLM. Generates the DSL schema context and iteration-specific prompts with prior results.
+- **`config.rs`** - CLI argument parsing and configuration. Defines `Cli` struct with all command-line flags and environment variables.
+
+### Key Data Flows
+
+1. **Strategy Generation**: `agent::run()` → `claude::chat()` → extracts JSON strategy → validates → submits to swym
+2. **Backtest Execution**: `swym::submit_backtest()` → `swym::poll_until_done()` → `BacktestResult::from_response()`
+3. **Learning Loop**: `load_prior_summary()` reads `run_ledger.jsonl` → fetches metrics via `swym::compare_runs()` → formats compact summary → appends to iteration prompt
+4. **OOS Validation**: Promising in-sample results trigger re-backtest on held-out data → strategies passing both phases saved to `validated_*.json`
+
+### Important Patterns
+
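+- **Deduplication**: Strategies are deduplicated within a run by their full JSON serialization, stored in a `HashMap` (`tested_strategies`) that maps each serialized strategy to the iteration that first tested it. Identical strategies are skipped with a warning, and a note is recorded so the model is told to try something different.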
+- **Validation**: Two-stage validation—client-side (structure, quantity parsing, exit rules) and server-side (DSL schema validation via `/strategies/validate`).
+- **Context Management**: Conversation history is trimmed to keep last 6 messages (3 exchanges) to avoid token limits. Prior results are summarized in the next prompt.
+- **Error Recovery**: Three consecutive failures abort the run. Transient API errors are logged but don't stop the run.
+- **Ledger Persistence**: Each backtest writes a `LedgerEntry` to `run_ledger.jsonl` for cross-run learning. Uses atomic O_APPEND writes.
+
+## Development Commands
+
+```bash
+# Build
+cargo build
+
+# Run with default config
+cargo run
+
+# Run with custom flags
+cargo run -- \
+  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
+  --max-iterations 50 \
+  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC
+
+# Run tests
+cargo test
+
+# Run with debug logging
+RUST_LOG=debug cargo run
+```
+
+## DSL Schema
+
+Strategies are JSON objects with the schema defined in `src/dsl-schema.json`. The DSL uses a rule-based structure with `when` (entry conditions) and `then` (exit actions). Key concepts:
+
+- **Indicators**: `{"kind":"indicator","name":"...","params":{...}}`
+- **Comparators**: `{"kind":"compare","lhs":"...","op":"...","rhs":"..."}`
+- **Functions**: `{"kind":"func","name":"...","args":[...]}`
+
+See `src/dsl-schema.json` for the complete schema and `prompts.rs::system_prompt()` for how it's presented to Claude.
+
+## Model Families
+
+The code supports different model families via the `ModelFamily` enum in `config.rs`:
+
+- **Sonnet**: Standard model, no special handling
+- **Opus**: Larger context, higher cost
+- **R1**: Has thinking blocks (`<think>...</think>`) that need to be stripped before JSON extraction
+
+Context length is auto-detected from the server's `/api/v1/models` endpoint (LM Studio) or `/v1/models/{id}` (OpenAI-compatible). Output token budget is set to half the context window.
+
+## Output Files
+
+- `strategy_001.json` through `strategy_NNN.json` - Every strategy attempted (full JSON)
+- `validated_001.json` through `validated_NNN.json` - Strategies that passed OOS validation (includes in-sample + OOS metrics)
+- `best_strategy.json` - Strategy with highest average Sharpe across instruments
+- `run_ledger.jsonl` - Persistent record of all backtests for learning across runs
+
+## Common Tasks
+
+### Adding a new CLI flag
+
+1. Add field to `Cli` struct in `config.rs`
+2. Add clap derive attribute with `#[arg(short, long, env = "VAR_NAME")]`
+3. Use the flag in `agent::run()` via `cli.flag_name`
+
+### Extending the DSL
+
+1. Update `src/dsl-schema.json` with new expression kinds
+2. Add validation logic in `validate_strategy()` if needed
+3. Update prompts in `prompts.rs` to guide the model
+
+### Modifying the learning loop
+
+1. Edit `load_prior_summary()` in `agent.rs` to change how prior results are formatted
+2. Adjust `diagnose_history()` to add new diagnostics or change convergence detection
+3. Update `prompts.rs::iteration_prompt()` to incorporate new information
+
+### Adding new validation checks
+
+Add the check to `validate_strategy()` in `agent.rs`. The function returns `(hard_errors, warnings)`: hard errors block submission, while warnings are logged but allow the backtest to proceed.
+
+## Testing Strategy
+
+The codebase uses `anyhow` for error handling and `tracing` for logging. Key test areas:
+
+- Strategy JSON extraction from various response formats
+- Context length detection from LM Studio/OpenAI endpoints
+- Ledger entry serialization/deserialization
+- Backtest result parsing from swym API responses
+- Deduplication logic
+- Convergence detection in `diagnose_history()`
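Note on the R1 handling described under "Model Families" in the new CLAUDE.md: the document says `<think>...</think>` blocks must be stripped before JSON extraction, but this diff does not show how `claude.rs` does it. The following is only a minimal sketch of one way to do that with the standard library. The function name `strip_think_blocks` and the choice to drop everything after an unterminated `<think>` are assumptions for illustration, not the repository's actual behavior.

```rust
/// Hypothetical helper: remove every `<think>...</think>` span from a model
/// response so that only the answer text (e.g. the strategy JSON) remains.
/// Assumption: an unterminated `<think>` drops everything after it.
fn strip_think_blocks(text: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(text.len());
    let mut rest = text;

    while let Some(start) = rest.find(OPEN) {
        // Keep the text that precedes the thinking block.
        out.push_str(&rest[..start]);
        match rest[start..].find(CLOSE) {
            // `end` is relative to `start`; skip past the closing tag.
            Some(end) => rest = &rest[start + end + CLOSE.len()..],
            // No closing tag: treat the remainder as thinking and stop.
            None => return out,
        }
    }

    out.push_str(rest);
    out
}
```

A test for "Strategy JSON extraction from various response formats" could feed this kind of helper a response with zero, one, or several thinking blocks and check that the remaining text still parses as JSON.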
--- a/src/agent.rs
+++ b/src/agent.rs
@@ -218,6 +218,8 @@ pub async fn run(cli: &Cli) -> Result<()> {
    let mut conversation: Vec<Message> = Vec::new();
    let mut best_strategy: Option<(f64, Value)> = None; // (avg_sharpe, strategy)
    let mut consecutive_failures = 0u32;
+    // Deduplication: track canonical strategy JSON → first iteration it was tested.
+    let mut tested_strategies: std::collections::HashMap<String, u32> = std::collections::HashMap::new();

    let instrument_names: Vec<String> = instruments.iter().map(|i| i.symbol.clone()).collect();

@@ -392,6 +394,27 @@ pub async fn run(cli: &Cli) -> Result<()> {
            }
        }

+        // Deduplication check: skip strategies identical to one already tested this run.
+        let strategy_key = serde_json::to_string(&strategy).unwrap_or_default();
+        if let Some(&first_iter) = tested_strategies.get(&strategy_key) {
+            warn!("duplicate strategy (identical to iteration {first_iter}), skipping backtest");
+            let record = IterationRecord {
+                iteration,
+                strategy: strategy.clone(),
+                results: vec![],
+                validation_notes: vec![format!(
+                    "DUPLICATE: this exact strategy was already tested in iteration {first_iter}. \
+                     You submitted identical JSON. You MUST design a completely different strategy — \
+                     different indicator family, different entry conditions, or different timeframe. \
+                     Do NOT submit the same JSON again."
+                )],
+            };
+            info!("{}", record.summary());
+            history.push(record);
+            continue;
+        }
+        tested_strategies.insert(strategy_key, iteration);
+
        // Run backtests against all instruments (in-sample)
        let mut results: Vec<BacktestResult> = Vec::new();

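Note on the deduplication hunk above: the check does a `get` followed by a separate `insert`, which hashes the key twice. A possible alternative, sketched below with only the standard library, is the `HashMap` entry API, which does the lookup and the insert in one step. The `TestedStrategies` type and `check_and_insert` method are hypothetical names introduced for this sketch. The key is assumed to be the already-serialized strategy string (`strategy_key` in the diff).

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical wrapper: maps serialized strategy JSON to the first
/// iteration in which that exact strategy was tested.
#[derive(Default)]
struct TestedStrategies {
    first_seen: HashMap<String, u32>,
}

impl TestedStrategies {
    /// Returns `Some(first_iteration)` if `key` was already recorded.
    /// Otherwise records `key` under `iteration` and returns `None`.
    fn check_and_insert(&mut self, key: String, iteration: u32) -> Option<u32> {
        match self.first_seen.entry(key) {
            Entry::Occupied(seen) => Some(*seen.get()),
            Entry::Vacant(slot) => {
                slot.insert(iteration);
                None
            }
        }
    }
}
```

One caveat on the word "canonical" in the code comment: string keys only match when two strategies serialize byte-for-byte the same. As far as I know, `serde_json` sorts object keys by default because its `Map` is backed by a `BTreeMap`, but with the `preserve_order` feature enabled the same fields in a different order would produce different keys and slip past this check.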
--- a/src/prompts.rs
+++ b/src/prompts.rs
@@ -103,6 +103,14 @@ Buy a fixed number of base units (semantic alias for a decimal string):
  "right":{{"kind":"func","name":"atr","period":14}}}}
 ```

+CRITICAL — ATR sizing and balance limits: `N/atr(14)` expresses quantity in BASE asset units.
+For BTC, 4h ATR ≈ $1500–3000. So `1000/atr(14)` ≈ 0.33–0.67 BTC ≈ $27k–53k notional (at ~$80k/BTC),
+which is silently rejected on a $10k account (fill returns None, 0 positions open, no error shown).
+The numerator N represents your intended dollar risk per trade. For a $10k account keep N ≤ 200.
+`200/atr(14)` ≈ 0.07–0.13 BTC ≈ $5.6k–10k notional — fits within a $10k account.
+Prefer `percent_of_balance` for most sizing. Only reach for ATR-based Expr sizing when you need
+volatility-scaled position risk, and keep the numerator proportional to your risk tolerance.
+
 **4. Exit rules** — use `position_quantity` to close the exact open size:
 ```json
 {{"kind":"position_quantity"}}
@@ -189,7 +197,11 @@ When I share results from previous iterations, use them to guide your next strat

 - **Zero trades**: The entry conditions are too restrictive or never co-occur.
  Relax thresholds, simplify conditions, or check if the indicator periods make
-  sense for the candle interval.
+  sense for the candle interval. Also check your position sizing — if using an
+  ATR-based Expr quantity (`N/atr(14)`), a large N can produce a notional value
+  exceeding your account balance (e.g. `1000/atr(14)` on BTC ≈ 0.4 BTC ≈ $32k),
+  which is silently rejected by the fill engine. Switch to `percent_of_balance`
+  or reduce N to ≤ 200 for a $10k account.

 - **Many trades but negative PnL**: The entry signal has no edge, or the exit
  logic is poor. Try different indicator combinations, add trend filters, or
@@ -516,6 +528,10 @@ CRITICAL: `apply_func` uses `"input"`, not `"expr"`. Writing `"expr":` will be r
 - Spot markets are long-only: gate buy (entry) rules with state "flat" and sell (exit) rules with state "long". Never add a short-entry (sell when flat) rule on spot.
 - Futures markets support both directions: long entry = buy when flat; long exit = sell when long; short entry = sell when flat; short exit (cover) = buy when short. Always include a stop-loss and time exit for both long and short legs.
 - Never use a placeholder string for `quantity` — `"ATR_SIZED"`, `"FULL_BALANCE"`, `"dynamic"`, etc. are all invalid and will be rejected.
+- Don't use large ATR-based sizing numerators. `N/atr(14)` gives BASE units; for BTC (ATR ≈ $2000
+  on 4h), `1000/atr(14)` ≈ 0.5 BTC ≈ $40k — silently rejected on a $10k account. Keep N ≤ 200
+  or use `percent_of_balance`. The condition audit may show entry conditions firing while 0 positions
+  open — this is the typical symptom of an oversized ATR quantity.
 - `{{"method":"position_quantity"}}` is WRONG for exit rules — use `{{"kind":"position_quantity"}}` (see Quantity section above).
 {futures_examples}"##,
        futures_examples = if has_futures { FUTURES_SHORT_EXAMPLES } else { "" },
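Note on the ATR sizing guidance added to `prompts.rs`: the arithmetic behind the "keep N ≤ 200" advice can be checked with a few lines of Rust. This is a sketch under stated assumptions, not code from the repository. The function names are hypothetical, and the BTC price of roughly $80k is inferred from the prompt's own figures (0.07 BTC ≈ $5.6k), not stated anywhere in the diff.

```rust
/// Base-asset quantity implied by an `N/atr(14)` sizing expression.
fn atr_quantity(numerator: f64, atr: f64) -> f64 {
    numerator / atr
}

/// Notional value of a position in quote currency (e.g. USDC).
fn notional(quantity: f64, price: f64) -> f64 {
    quantity * price
}

/// Largest numerator N whose notional still fits within `balance`:
/// N / atr * price <= balance  =>  N <= balance * atr / price.
fn max_numerator(balance: f64, atr: f64, price: f64) -> f64 {
    balance * atr / price
}

/// True if an `N/atr` position can be opened without exceeding `balance`.
fn fits_balance(numerator: f64, atr: f64, price: f64, balance: f64) -> bool {
    notional(atr_quantity(numerator, atr), price) <= balance
}
```

With an ATR of $2000 and the assumed $80k price, `1000/atr` gives 0.5 BTC or $40k notional, and `200/atr` gives 0.1 BTC or $8k, which matches the prompt text. Under the same assumptions the ceiling for a $10k account is N = 250 at ATR $2000 but only N = 187.5 at ATR $1500, so N = 200 can still exceed the balance at the low end of the quoted ATR range.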