diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..ae32e00
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,116 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+`scout` is an autonomous strategy search agent for the [swym](https://swym.rs) backtesting platform. It runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically re-validated on out-of-sample data to filter out overfit results.
+
+## Architecture
+
+### Core Modules
+
+- **`agent.rs`** - Main orchestration logic. Contains the `run()` function that implements the search loop, strategy validation, and learning feedback. Key types: `IterationRecord`, `LedgerEntry`. Key functions: `validate_strategy()`, `diagnose_history()`.
+- **`claude.rs`** - Claude API client. Handles model communication, JSON extraction from responses (including stripping thinking blocks from R1-family output), and context length detection.
+- **`swym.rs`** - Swym backtesting API client. Wraps all swym API calls: candle coverage, strategy validation, backtest submission, polling, and metrics retrieval.
+- **`prompts.rs`** - System and user prompts for the LLM. Generates the DSL schema context and iteration-specific prompts that include prior results.
+- **`config.rs`** - CLI argument parsing and configuration. Defines the `Cli` struct with all command-line flags and environment variables.
+
+### Key Data Flows
+
+1. **Strategy Generation**: `agent::run()` → `claude::chat()` → extracts JSON strategy → validates → submits to swym
+2. **Backtest Execution**: `swym::submit_backtest()` → `swym::poll_until_done()` → `BacktestResult::from_response()`
+3. **Learning Loop**: `load_prior_summary()` reads `run_ledger.jsonl` → fetches metrics via `swym::compare_runs()` → formats a compact summary → appends it to the iteration prompt
+4. **OOS Validation**: Promising in-sample results trigger a re-backtest on held-out data → strategies passing both phases are saved to `validated_*.json`
+
+### Important Patterns
+
+- **Deduplication**: Strategies are deduplicated by their full JSON serialization using a HashMap (`tested_strategies`). Identical strategies are skipped with a warning.
+- **Validation**: Runs in two stages: client-side (structure, quantity parsing, exit rules), then server-side (DSL schema validation via `/strategies/validate`).
+- **Context Management**: Conversation history is trimmed to the last 6 messages (3 exchanges) to stay within token limits. Prior results are carried forward as a summary in the next prompt.
+- **Error Recovery**: Three consecutive failures abort the run. Transient API errors are logged but don't stop the run.
+- **Ledger Persistence**: Each backtest appends a `LedgerEntry` to `run_ledger.jsonl` for cross-run learning, using atomic `O_APPEND` writes.
+
+## Development Commands
+
+```bash
+# Build
+cargo build
+
+# Run with default config
+cargo run
+
+# Run with custom flags
+cargo run -- \
+  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
+  --max-iterations 50 \
+  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC
+
+# Run tests
+cargo test
+
+# Run with debug logging
+RUST_LOG=debug cargo run
+```
+
+## DSL Schema
+
+Strategies are JSON objects conforming to the schema in `src/dsl-schema.json`. The DSL uses a rule-based structure with `when` (entry conditions) and `then` (exit actions). Key concepts:
+
+- **Indicators**: `{"kind":"indicator","name":"...","params":{...}}`
+- **Comparators**: `{"kind":"compare","lhs":"...","op":"...","rhs":"..."}`
+- **Functions**: `{"kind":"func","name":"...","args":[...]}`
+
+See `src/dsl-schema.json` for the complete schema and `prompts.rs::system_prompt()` for how it's presented to Claude.
+
+## Model Families
+
+The code supports several model families via the `ModelFamily` enum in `config.rs`:
+
+- **Sonnet**: Standard model, no special handling
+- **Opus**: Larger context, higher cost
+- **R1**: Emits thinking blocks (`...`) that must be stripped before JSON extraction
+
+Context length is auto-detected from the server's `/api/v1/models` endpoint (LM Studio) or `/v1/models/{id}` (OpenAI-compatible). The output token budget is set to half the context window.
+
+## Output Files
+
+- `strategy_001.json` through `strategy_NNN.json` - Every strategy attempted (full JSON)
+- `validated_001.json` through `validated_NNN.json` - Strategies that passed OOS validation (includes in-sample and OOS metrics)
+- `best_strategy.json` - Strategy with the highest average Sharpe ratio across instruments
+- `run_ledger.jsonl` - Persistent record of all backtests, used for learning across runs
+
+## Common Tasks
+
+### Adding a new CLI flag
+
+1. Add a field to the `Cli` struct in `config.rs`
+2. Annotate it with a clap derive attribute, e.g. `#[arg(short, long, env = "VAR_NAME")]`
+3. Read the flag in `agent::run()` via `cli.flag_name`
+
+### Extending the DSL
+
+1. Update `src/dsl-schema.json` with the new expression kinds
+2. Add validation logic to `validate_strategy()` if needed
+3. Update the prompts in `prompts.rs` to guide the model
+
+### Modifying the learning loop
+
+1. Edit `load_prior_summary()` in `agent.rs` to change how prior results are formatted
+2. Adjust `diagnose_history()` to add new diagnostics or change convergence detection
+3. Update `prompts.rs::iteration_prompt()` to incorporate the new information
+
+### Adding new validation checks
+
+Add checks to `validate_strategy()` in `agent.rs`. It returns `(hard_errors, warnings)`: hard errors block submission, while warnings are logged and the backtest still proceeds.
+
+## Testing Strategy
+
+The codebase uses `anyhow` for error handling and `tracing` for logging. Key test areas:
+
+- Strategy JSON extraction from various response formats
+- Context length detection from LM Studio/OpenAI endpoints
+- Ledger entry serialization/deserialization
+- Backtest result parsing from swym API responses
+- Deduplication logic
+- Convergence detection in `diagnose_history()`
\ No newline at end of file
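
The Deduplication, Context Management and Ledger Persistence patterns described in the CLAUDE.md above can be illustrated with a short, std-only Rust sketch. It is an illustration under stated assumptions, not scout's actual code: `Message`, `Deduplicator`, `trim_history` and `append_ledger_line` are hypothetical names, and strategies and ledger entries are assumed to arrive already serialized to JSON strings (for example by `serde_json`). Only the `tested_strategies` map name and the 6-message limit come from the file itself.

```rust
use std::collections::HashMap;
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical chat message type; scout's real message type may differ.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// History limit from CLAUDE.md: 6 messages, i.e. 3 user/assistant exchanges.
const MAX_HISTORY: usize = 6;

/// Drops the oldest messages so that at most `MAX_HISTORY` remain.
fn trim_history(history: &mut Vec<Message>) {
    if history.len() > MAX_HISTORY {
        let excess = history.len() - MAX_HISTORY;
        history.drain(..excess);
    }
}

/// Hypothetical deduplicator keyed by a strategy's full JSON serialization,
/// which is assumed to be produced upstream (e.g. with serde_json).
struct Deduplicator {
    /// Serialized strategy -> iteration in which it was first tested.
    tested_strategies: HashMap<String, usize>,
}

impl Deduplicator {
    fn new() -> Self {
        Self {
            tested_strategies: HashMap::new(),
        }
    }

    /// Returns `Some(first_iteration)` if this exact JSON was already tested;
    /// otherwise records it under `iteration` and returns `None`.
    fn check(&mut self, strategy_json: &str, iteration: usize) -> Option<usize> {
        if let Some(&first) = self.tested_strategies.get(strategy_json) {
            return Some(first);
        }
        self.tested_strategies
            .insert(strategy_json.to_string(), iteration);
        None
    }
}

/// Hypothetical helper: appends one pre-serialized JSON line to a JSONL ledger.
/// `append(true)` opens the file with O_APPEND on Unix, so each write is
/// positioned at the current end of the file.
fn append_ledger_line(path: &Path, json_line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Build the whole line, newline included, so the entry is handed to
    // a single write_all call instead of being split across two.
    let mut line = String::with_capacity(json_line.len() + 1);
    line.push_str(json_line);
    line.push('\n');
    file.write_all(line.as_bytes())
}
```

Keying the map on the serialized string means only byte-identical serializations count as duplicates, so two logically equal strategies would both be tested unless the serializer always emits fields in the same order. Trimming with `drain(..excess)` keeps the newest messages in their original order.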