Add comprehensive guidance document covering architecture, data flows, development commands, DSL schema reference, and common patterns for working with the scout strategy search agent. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`scout` is an autonomous strategy search agent for the [swym](https://swym.rs) backtesting platform. It runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter overfitting.

## Architecture

### Core Modules

- **`agent.rs`** - Main orchestration logic. Contains the `run()` function that implements the search loop, strategy validation, and learning feedback. Key types: `IterationRecord`, `LedgerEntry`. Key functions: `validate_strategy()`, `diagnose_history()`.
- **`claude.rs`** - Claude API client. Handles model communication, JSON extraction from responses (including responses from R1-family models that contain thinking blocks), and context length detection.
- **`swym.rs`** - Swym backtesting API client. Wraps all swym API calls: candle coverage, strategy validation, backtest submission, polling, and metrics retrieval.
- **`prompts.rs`** - System and user prompts for the LLM. Generates the DSL schema context and iteration-specific prompts with prior results.
- **`config.rs`** - CLI argument parsing and configuration. Defines the `Cli` struct with all command-line flags and environment variables.

### Key Data Flows

1. **Strategy Generation**: `agent::run()` → `claude::chat()` → extracts JSON strategy → validates → submits to swym
2. **Backtest Execution**: `swym::submit_backtest()` → `swym::poll_until_done()` → `BacktestResult::from_response()`
3. **Learning Loop**: `load_prior_summary()` reads `run_ledger.jsonl` → fetches metrics via `swym::compare_runs()` → formats a compact summary → appends it to the iteration prompt
4. **OOS Validation**: Promising in-sample results trigger a re-backtest on held-out data → strategies passing both phases are saved to `validated_*.json`
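
The submit-then-poll step in the backtest execution flow can be pictured as a bounded polling loop. The sketch below is only an illustration and is not scout's `swym::poll_until_done()`: the `RunStatus` enum, the `poll_until` helper, its parameters, and the attempt limit are all hypothetical, and the status lookup is passed in as a closure so no HTTP client is assumed.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical run states; the real swym API may report different ones.
#[derive(Debug, Clone, PartialEq)]
enum RunStatus {
    Pending,
    Done,
    Failed(String),
}

/// Call `fetch_status` until the run finishes, fails, or `max_attempts`
/// polls have been made. On success, returns how many polls it took.
fn poll_until<F>(mut fetch_status: F, max_attempts: u32, delay: Duration) -> Result<u32, String>
where
    F: FnMut() -> RunStatus,
{
    for attempt in 1..=max_attempts {
        match fetch_status() {
            RunStatus::Done => return Ok(attempt),
            RunStatus::Failed(reason) => return Err(format!("backtest failed: {}", reason)),
            // Still running: wait before asking again.
            RunStatus::Pending => sleep(delay),
        }
    }
    Err(format!("gave up after {} polls", max_attempts))
}
```

Bounding the number of polls keeps a stuck backtest from hanging the whole search loop; the caller can treat the `Err` as one failed iteration.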

### Important Patterns

- **Deduplication**: Strategies are deduplicated by their full JSON serialization using a HashMap (`tested_strategies`). Identical strategies are skipped with a warning.
- **Validation**: Validation has two stages: client-side (structure, quantity parsing, exit rules) and server-side (DSL schema validation via `/strategies/validate`).
- **Context Management**: Conversation history is trimmed to the last 6 messages (3 exchanges) to stay within token limits. Prior results are summarized in the next prompt.
- **Error Recovery**: Three consecutive failures abort the run. Transient API errors are logged but do not stop the run.
- **Ledger Persistence**: Each backtest writes a `LedgerEntry` to `run_ledger.jsonl` for cross-run learning. Uses atomic O_APPEND writes.
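
The deduplication and context-management patterns above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in `agent.rs`: the `Dedup` wrapper, the `check` and `trim_history` names, and the choice to store the first-seen iteration as the map value are all hypothetical. Only the `tested_strategies` map name and the 6-message limit come from the description above.

```rust
use std::collections::HashMap;

/// Tracks strategies already tested, keyed by their full JSON text.
/// The value is the iteration at which the strategy was first seen.
struct Dedup {
    tested_strategies: HashMap<String, u32>,
}

impl Dedup {
    fn new() -> Self {
        Dedup { tested_strategies: HashMap::new() }
    }

    /// Returns `None` if the strategy is new (and records it), or
    /// `Some(first_iteration)` if an identical serialization was seen before.
    fn check(&mut self, strategy_json: &str, iteration: u32) -> Option<u32> {
        if let Some(&first) = self.tested_strategies.get(strategy_json) {
            return Some(first);
        }
        self.tested_strategies.insert(strategy_json.to_string(), iteration);
        None
    }
}

/// Keep only the most recent `keep` messages, dropping the oldest first.
fn trim_history(history: &mut Vec<String>, keep: usize) {
    if history.len() > keep {
        let excess = history.len() - keep;
        history.drain(..excess);
    }
}
```

Because the key is the serialized text, two strategies that differ only in whitespace or key order count as distinct unless the serialization is canonical. Calling `trim_history(&mut history, 6)` after each exchange keeps the last three request/response pairs.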

## Development Commands

```bash
# Build
cargo build

# Run with default config
cargo run

# Run with custom flags
cargo run -- \
  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
  --max-iterations 50 \
  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC

# Run tests
cargo test

# Run with debug logging
RUST_LOG=debug cargo run
```

## DSL Schema

Strategies are JSON objects with the schema defined in `src/dsl-schema.json`. The DSL uses a rule-based structure with `when` (entry conditions) and `then` (exit actions). Key concepts:

- **Indicators**: `{"kind":"indicator","name":"...","params":{...}}`
- **Comparators**: `{"kind":"compare","lhs":"...","op":"...","rhs":"..."}`
- **Functions**: `{"kind":"func","name":"...","args":[...]}`

See `src/dsl-schema.json` for the complete schema and `prompts.rs::system_prompt()` for how it's presented to Claude.

## Model Families

The code supports different model families via the `ModelFamily` enum in `config.rs`:

- **Sonnet**: Standard model, no special handling
- **Opus**: Larger context, higher cost
- **R1**: Has thinking blocks (`<think>...</think>`) that need to be stripped before JSON extraction

Context length is auto-detected from the server's `/api/v1/models` endpoint (LM Studio) or `/v1/models/{id}` (OpenAI-compatible). The output token budget is set to half the context window.
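
To make the R1 handling and the token budget concrete, here is a stdlib-only sketch of stripping thinking blocks, slicing out the first JSON object, and halving the context window. It is an assumption about how such logic could look, not the code in `claude.rs`; the function names `strip_think_blocks`, `extract_json_object`, and `output_token_budget` are hypothetical.

```rust
/// Remove every `<think>...</think>` block from a model response.
/// An unterminated `<think>` drops everything after it.
fn strip_think_blocks(response: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";
    let mut out = String::with_capacity(response.len());
    let mut rest = response;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        match rest[start..].find(CLOSE) {
            // Resume scanning just past the closing tag.
            Some(end) => rest = &rest[start + end + CLOSE.len()..],
            // Unterminated block: discard the remainder.
            None => rest = "",
        }
    }
    out.push_str(rest);
    out
}

/// Slice out the first balanced `{...}` object, ignoring braces that
/// appear inside JSON strings (including escaped quotes).
fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (offset, ch) in text[start..].char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None
}

/// Output budget as described above: half of the detected context window.
fn output_token_budget(context_window: u32) -> u32 {
    context_window / 2
}
```

Stripping the thinking block first matters because the model may draft partial JSON while reasoning; extracting before stripping could pick up that draft instead of the final answer.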

## Output Files

- `strategy_001.json` through `strategy_NNN.json` - Every strategy attempted (full JSON)
- `validated_001.json` through `validated_NNN.json` - Strategies that passed OOS validation (includes in-sample + OOS metrics)
- `best_strategy.json` - Strategy with highest average Sharpe across instruments
- `run_ledger.jsonl` - Persistent record of all backtests for learning across runs
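
As an illustration of how an append-only JSONL file like `run_ledger.jsonl` can be written and read back, the sketch below uses only the standard library. The helper names `append_ledger_line` and `read_ledger_lines` are hypothetical, and the entry is passed in as an already-serialized JSON string, so nothing is assumed about the real `LedgerEntry` fields.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one already-serialized JSON object as a single line.
/// `append(true)` opens the file in append mode (O_APPEND on Unix).
fn append_ledger_line(path: &Path, json_line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Build the full line first so the entry and its newline are
    // handed to the OS together.
    let mut line = String::with_capacity(json_line.len() + 1);
    line.push_str(json_line);
    line.push('\n');
    file.write_all(line.as_bytes())
}

/// Read back all non-empty lines, e.g. to build a prior-results summary.
fn read_ledger_lines(path: &Path) -> io::Result<Vec<String>> {
    let text = std::fs::read_to_string(path)?;
    Ok(text
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(str::to_string)
        .collect())
}
```

One JSON object per line means a later run can read the ledger without parsing the file as a single document.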

## Common Tasks

### Adding a new CLI flag

1. Add a field to the `Cli` struct in `config.rs`
2. Annotate it with a clap attribute such as `#[arg(short, long, env = "VAR_NAME")]` (the `env` key requires clap's `env` feature)
3. Use the flag in `agent::run()` via `cli.flag_name`

### Extending the DSL

1. Update `src/dsl-schema.json` with new expression kinds
2. Add validation logic in `validate_strategy()` if needed
3. Update prompts in `prompts.rs` to guide the model

### Modifying the learning loop

1. Edit `load_prior_summary()` in `agent.rs` to change how prior results are formatted
2. Adjust `diagnose_history()` to add new diagnostics or change convergence detection
3. Update `prompts.rs::iteration_prompt()` to incorporate new information

### Adding new validation checks

Add new checks to `validate_strategy()` in `agent.rs`. The function returns `(hard_errors, warnings)`: hard errors block submission, while warnings are logged but allow the backtest to proceed.
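
The `(hard_errors, warnings)` return shape can be sketched as follows. This is a simplified illustration, not the real `validate_strategy()`: the `StrategyDraft` struct, its fields, the `validate_draft` and `should_submit` names, and the choice of which checks are hard errors versus warnings are all assumptions made for the example.

```rust
/// Hypothetical, heavily simplified stand-in for a parsed strategy.
struct StrategyDraft {
    rule_count: usize,
    has_exit_rule: bool,
    quantity: String,
}

/// Returns `(hard_errors, warnings)`.
fn validate_draft(s: &StrategyDraft) -> (Vec<String>, Vec<String>) {
    let mut hard_errors = Vec::new();
    let mut warnings = Vec::new();

    // Structural check: a strategy without rules cannot be backtested.
    if s.rule_count == 0 {
        hard_errors.push("strategy has no rules".to_string());
    }
    // Quantity must parse as a positive, finite number.
    match s.quantity.parse::<f64>() {
        Ok(q) if q.is_finite() && q > 0.0 => {}
        Ok(_) => hard_errors.push(format!("quantity must be positive: {}", s.quantity)),
        Err(_) => hard_errors.push(format!("quantity is not a number: {}", s.quantity)),
    }
    // Treated as a warning here (an assumption): the backtest still runs.
    if !s.has_exit_rule {
        warnings.push("no exit rule; positions may never close".to_string());
    }
    (hard_errors, warnings)
}

/// Only hard errors block submission; warnings are merely logged.
fn should_submit(hard_errors: &[String]) -> bool {
    hard_errors.is_empty()
}
```

A new check then amounts to one more `if` that pushes onto whichever list matches its severity.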

## Testing Strategy

The codebase uses `anyhow` for error handling and `tracing` for logging. Key test areas:

- Strategy JSON extraction from various response formats
- Context length detection from LM Studio/OpenAI endpoints
- Ledger entry serialization/deserialization
- Backtest result parsing from swym API responses
- Deduplication logic
- Convergence detection in `diagnose_history()`