# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`scout` is an autonomous strategy search agent for the [swym](https://swym.rs) backtesting platform. It runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter overfitting.

## Architecture

### Core Modules

- **`agent.rs`** - Main orchestration logic. Contains the `run()` function that implements the search loop, strategy validation, and learning feedback. Key types: `IterationRecord`, `LedgerEntry`, `validate_strategy()`, `diagnose_history()`.
- **`claude.rs`** - Claude API client. Handles model communication, JSON extraction from responses (including stripping thinking blocks for R1-family models), and context length detection.
- **`swym.rs`** - Swym backtesting API client. Wraps all swym API calls: candle coverage, strategy validation, backtest submission, polling, and metrics retrieval.
- **`prompts.rs`** - System and user prompts for the LLM. Generates the DSL schema context and iteration-specific prompts with prior results.
- **`config.rs`** - CLI argument parsing and configuration. Defines `Cli` struct with all command-line flags and environment variables.

### Key Data Flows

1. **Strategy Generation**: `agent::run()` → `claude::chat()` → extracts JSON strategy → validates → submits to swym
2. **Backtest Execution**: `swym::submit_backtest()` → `swym::poll_until_done()` → `BacktestResult::from_response()`
3. **Learning Loop**: `load_prior_summary()` reads `run_ledger.jsonl` → fetches metrics via `swym::compare_runs()` → formats compact summary → appends to iteration prompt
4. **OOS Validation**: Promising in-sample results trigger re-backtest on held-out data → strategies passing both phases saved to `validated_*.json`
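
The ledger round trip behind the learning loop can be sketched with the standard library alone. This is a minimal illustration, not the code in `agent.rs`: `append_ledger_line` and `read_ledger_lines` are hypothetical helper names, and each record is treated as an already-serialized JSON string rather than a `LedgerEntry`.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Append one pre-serialized JSON record as a single line (JSONL).
/// Hypothetical helper; the real code serializes a `LedgerEntry`.
pub fn append_ledger_line(path: &Path, json_record: &str) -> io::Result<()> {
    // `append(true)` opens the file in append mode, so every write
    // lands at the current end of the file.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;

    // Build the full line first so it is normally handed to the OS in one write.
    let mut line = String::with_capacity(json_record.len() + 1);
    line.push_str(json_record);
    line.push('\n');
    file.write_all(line.as_bytes())
}

/// Read back every non-empty ledger line. A missing ledger is treated
/// as "no prior runs" (an assumption of this sketch).
pub fn read_ledger_lines(path: &Path) -> io::Result<Vec<String>> {
    let file = match File::open(path) {
        Ok(f) => f,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };
    let mut lines = Vec::new();
    for line in BufReader::new(file).lines() {
        let line = line?;
        if !line.trim().is_empty() {
            lines.push(line);
        }
    }
    Ok(lines)
}
```

Treating a missing file as an empty ledger keeps the first run of a fresh checkout from failing before any backtest has been recorded.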

### Important Patterns

- **Deduplication**: Strategies are deduplicated by full JSON serialization using a HashMap (`tested_strategies`). Identical strategies are skipped with a warning.
- **Validation**: Two-stage validation: client-side (structure, quantity parsing, exit rules) and server-side (DSL schema validation via `/strategies/validate`).
- **Context Management**: Conversation history is trimmed to the last 6 messages (3 exchanges) to stay within token limits. Prior results are summarized in the next prompt.
- **Error Recovery**: Three consecutive failures abort the run. Transient API errors are logged but do not stop it.
- **Ledger Persistence**: Each backtest writes a `LedgerEntry` to `run_ledger.jsonl` for cross-run learning. Uses atomic O_APPEND writes.
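
The deduplication and context-trimming patterns above can be sketched as follows. This is an illustration under assumptions, not the code in `agent.rs`: the `Message` struct, the function names, and the choice to store the first-seen iteration number as the map value are all hypothetical; only the `tested_strategies`-style map keyed by serialized JSON and the 6-message window come from this document.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one chat message in the conversation history.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Keep only the most recent `keep` messages, dropping the oldest first.
pub fn trim_history(history: &mut Vec<Message>, keep: usize) {
    if history.len() > keep {
        let excess = history.len() - keep;
        history.drain(..excess);
    }
}

/// Record a strategy by its full serialized JSON.
/// Returns `Some(iteration)` of the first sighting if it was already tested,
/// or `None` if it is new (in which case it is remembered).
pub fn check_duplicate(
    tested: &mut HashMap<String, u32>,
    strategy_json: &str,
    iteration: u32,
) -> Option<u32> {
    if let Some(&first_seen) = tested.get(strategy_json) {
        return Some(first_seen);
    }
    tested.insert(strategy_json.to_string(), iteration);
    None
}
```

Keying on the serialized string only catches byte-identical strategies, so this approach relies on the serializer producing a stable key order.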

## Development Commands

```bash
# Build
cargo build

# Run with default config
cargo run

# Run with custom flags
cargo run -- \
  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
  --max-iterations 50 \
  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC

# Run tests
cargo test

# Run with debug logging
RUST_LOG=debug cargo run
```

## DSL Schema

Strategies are JSON objects with the schema defined in `src/dsl-schema.json`. The DSL uses a rule-based structure with `when` (entry conditions) and `then` (exit actions). Key concepts:

- **Indicators**: `{"kind":"indicator","name":"...","params":{...}}`
- **Comparators**: `{"kind":"compare","lhs":"...","op":"...","rhs":"..."}`
- **Functions**: `{"kind":"func","name":"...","args":[...]}`

See `src/dsl-schema.json` for the complete schema and `prompts.rs::system_prompt()` for how it's presented to Claude.

## Model Families

The code supports different model families via the `ModelFamily` enum in `config.rs`:

- **Sonnet**: Standard model, no special handling
- **Opus**: Larger context, higher cost
- **R1**: Has thinking blocks (`<think>...</think>`) that need to be stripped before JSON extraction

Context length is auto-detected from the server's `/api/v1/models` endpoint (LM Studio) or `/v1/models/{id}` (OpenAI-compatible). Output token budget is set to half the context window.
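
A minimal sketch of the two rules above: stripping `<think>...</think>` blocks before JSON extraction, and halving the context window to get the output budget. The function names are hypothetical, and dropping everything after an unterminated `<think>` tag is an assumption of this sketch rather than documented behavior.

```rust
/// Remove every `<think>...</think>` span from a model response.
/// Assumption: an unterminated `<think>` discards the rest of the text.
pub fn strip_think_blocks(response: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(response.len());
    let mut rest = response;
    while let Some(start) = rest.find(OPEN) {
        // Keep the text that precedes the thinking block.
        out.push_str(&rest[..start]);
        match rest[start..].find(CLOSE) {
            // `end` is relative to `start`, so skip past the closing tag.
            Some(end) => rest = &rest[start + end + CLOSE.len()..],
            None => {
                rest = "";
                break;
            }
        }
    }
    out.push_str(rest);
    out
}

/// Output token budget: half of the detected context window.
pub fn output_token_budget(context_length: u32) -> u32 {
    context_length / 2
}
```

Stripping the thinking text first matters because it may itself contain braces or partial JSON that would confuse a naive extractor.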

## Output Files

- `strategy_001.json` through `strategy_NNN.json` - Every strategy attempted (full JSON)
- `validated_001.json` through `validated_NNN.json` - Strategies that passed OOS validation (includes in-sample + OOS metrics)
- `best_strategy.json` - Strategy with the highest average Sharpe ratio across instruments
- `run_ledger.jsonl` - Persistent record of all backtests for learning across runs
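
The selection rule behind `best_strategy.json` (highest average Sharpe across instruments) can be sketched as below. The function names are hypothetical, and skipping candidates with no results or with non-finite Sharpe values is an assumption of this sketch, not documented behavior.

```rust
/// Mean Sharpe ratio over all instruments.
/// Returns `None` when there are no values or any value is not finite
/// (an assumption of this sketch).
pub fn average_sharpe(per_instrument: &[f64]) -> Option<f64> {
    if per_instrument.is_empty() || per_instrument.iter().any(|s| !s.is_finite()) {
        return None;
    }
    Some(per_instrument.iter().sum::<f64>() / per_instrument.len() as f64)
}

/// Index of the candidate with the highest average Sharpe, if any qualifies.
/// Each inner vector holds one candidate's per-instrument Sharpe ratios.
pub fn best_candidate(candidates: &[Vec<f64>]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .filter_map(|(i, sharpes)| average_sharpe(sharpes).map(|avg| (i, avg)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Averaging across instruments favors strategies that hold up on every pair over ones that excel on a single instrument.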

## Common Tasks

### Adding a new CLI flag

1. Add field to `Cli` struct in `config.rs`
2. Add clap derive attribute with `#[arg(short, long, env = "VAR_NAME")]`
3. Use the flag in `agent::run()` via `cli.flag_name`

### Extending the DSL

1. Update `src/dsl-schema.json` with new expression kinds
2. Add validation logic in `validate_strategy()` if needed
3. Update prompts in `prompts.rs` to guide the model

### Modifying the learning loop

1. Edit `load_prior_summary()` in `agent.rs` to change how prior results are formatted
2. Adjust `diagnose_history()` to add new diagnostics or change convergence detection
3. Update `prompts.rs::iteration_prompt()` to incorporate new information

### Adding new validation checks

Add new checks to `validate_strategy()` in `agent.rs`. It returns `(hard_errors, warnings)`: hard errors block submission, while warnings are logged but allow the backtest to proceed.
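
The `(hard_errors, warnings)` shape can be illustrated with a simplified sketch. `StrategyDraft` and `validate_draft` are hypothetical; the real `validate_strategy()` works on the strategy JSON. Which checks count as hard errors and which as warnings here is an assumption made for illustration.

```rust
/// Hypothetical, simplified view of a strategy.
pub struct StrategyDraft {
    pub quantity: String,
    pub entry_rules: usize,
    pub exit_rules: usize,
}

/// Returns `(hard_errors, warnings)`.
/// Hard errors should block submission; warnings should only be logged.
pub fn validate_draft(s: &StrategyDraft) -> (Vec<String>, Vec<String>) {
    let mut hard_errors = Vec::new();
    let mut warnings = Vec::new();

    // Quantity parsing: must be a finite, positive number.
    match s.quantity.trim().parse::<f64>() {
        Ok(q) if q.is_finite() && q > 0.0 => {}
        Ok(q) => hard_errors.push(format!("quantity must be positive and finite, got {q}")),
        Err(_) => hard_errors.push(format!("quantity {:?} is not a number", s.quantity)),
    }

    // Structure: a strategy with no entry rules can never trade.
    if s.entry_rules == 0 {
        hard_errors.push("strategy has no entry rules".to_string());
    }

    // Exit rules: treated as a warning in this sketch (assumption).
    if s.exit_rules == 0 {
        warnings.push("strategy has no exit rules; positions may never close".to_string());
    }

    (hard_errors, warnings)
}
```

A caller would skip submission when `hard_errors` is non-empty and otherwise log each warning before submitting the backtest.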

## Testing Strategy

The codebase uses `anyhow` for error handling and `tracing` for logging. Key test areas:

- Strategy JSON extraction from various response formats
- Context length detection from LM Studio/OpenAI endpoints
- Ledger entry serialization/deserialization
- Backtest result parsing from swym API responses
- Deduplication logic
- Convergence detection in `diagnose_history()`