Compare commits: 75c95f7935 ... main (2 commits: 11fe79ed25, fcb9a2f553)
CLAUDE.md (new file, +116 lines)

@@ -0,0 +1,116 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`scout` is an autonomous strategy search agent for the [swym](https://swym.rs) backtesting platform. It runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter out overfitting.
## Architecture

### Core Modules

- **`agent.rs`** - Main orchestration logic. Contains the `run()` function that implements the search loop, strategy validation, and learning feedback. Key items: `IterationRecord`, `LedgerEntry`, `validate_strategy()`, `diagnose_history()`.
- **`claude.rs`** - Claude API client. Handles model communication, JSON extraction from responses, and context-length detection for R1-family models with thinking blocks.
- **`swym.rs`** - swym backtesting API client. Wraps all swym API calls: candle coverage, strategy validation, backtest submission, polling, and metrics retrieval.
- **`prompts.rs`** - System and user prompts for the LLM. Generates the DSL schema context and iteration-specific prompts that include prior results.
- **`config.rs`** - CLI argument parsing and configuration. Defines the `Cli` struct with all command-line flags and environment variables.
### Key Data Flows

1. **Strategy Generation**: `agent::run()` → `claude::chat()` → extracts JSON strategy → validates → submits to swym
2. **Backtest Execution**: `swym::submit_backtest()` → `swym::poll_until_done()` → `BacktestResult::from_response()`
3. **Learning Loop**: `load_prior_summary()` reads `run_ledger.jsonl` → fetches metrics via `swym::compare_runs()` → formats compact summary → appends to iteration prompt
4. **OOS Validation**: Promising in-sample results trigger re-backtest on held-out data → strategies passing both phases saved to `validated_*.json`
### Important Patterns

- **Deduplication**: Strategies are deduplicated by full JSON serialization using a `HashMap` (`tested_strategies`). Identical strategies are skipped with a warning.
- **Validation**: Two-stage validation: client-side (structure, quantity parsing, exit rules) and server-side (DSL schema validation via `/strategies/validate`).
- **Context Management**: Conversation history is trimmed to keep the last 6 messages (3 exchanges) to avoid token limits. Prior results are summarized in the next prompt.
- **Error Recovery**: Three consecutive failures trigger an abort. Transient API errors are logged but don't stop the run.
- **Ledger Persistence**: Each backtest writes a `LedgerEntry` to `run_ledger.jsonl` for cross-run learning, using atomic `O_APPEND` writes.
## Development Commands

```bash
# Build
cargo build

# Run with default config
cargo run

# Run with custom flags
cargo run -- \
  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
  --max-iterations 50 \
  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC

# Run tests
cargo test

# Run with debug logging
RUST_LOG=debug cargo run
```
## DSL Schema

Strategies are JSON objects with the schema defined in `src/dsl-schema.json`. The DSL uses a rule-based structure with `when` (entry conditions) and `then` (exit actions). Key concepts:

- **Indicators**: `{"kind":"indicator","name":"...","params":{...}}`
- **Comparators**: `{"kind":"compare","lhs":"...","op":"...","rhs":"..."}`
- **Functions**: `{"kind":"func","name":"...","args":[...]}`

See `src/dsl-schema.json` for the complete schema and `prompts.rs::system_prompt()` for how it's presented to Claude.
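Purely as an illustration of how these shapes compose (the indicator name, params, and operator here are hypothetical, not taken from `src/dsl-schema.json`), an entry condition might look like:

```json
{
  "kind": "compare",
  "lhs": {"kind": "indicator", "name": "rsi", "params": {"period": 14}},
  "op": "<",
  "rhs": 30
}
```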
## Model Families

The code supports different model families via the `ModelFamily` enum in `config.rs`:

- **Sonnet**: Standard model, no special handling
- **Opus**: Larger context, higher cost
- **R1**: Has thinking blocks (`<think>...</think>`) that need to be stripped before JSON extraction

Context length is auto-detected from the server's `/api/v1/models` endpoint (LM Studio) or `/v1/models/{id}` (OpenAI-compatible). The output token budget is set to half the context window.
## Output Files

- `strategy_001.json` through `strategy_NNN.json` - every strategy attempted (full JSON)
- `validated_001.json` through `validated_NNN.json` - strategies that passed OOS validation (includes in-sample + OOS metrics)
- `best_strategy.json` - the strategy with the highest average Sharpe across instruments
- `run_ledger.jsonl` - persistent record of all backtests for learning across runs
## Common Tasks

### Adding a new CLI flag

1. Add field to `Cli` struct in `config.rs`
2. Add clap derive attribute with `#[arg(short, long, env = "VAR_NAME")]`
3. Use the flag in `agent::run()` via `cli.flag_name`
### Extending the DSL

1. Update `src/dsl-schema.json` with new expression kinds
2. Add validation logic in `validate_strategy()` if needed
3. Update prompts in `prompts.rs` to guide the model
### Modifying the learning loop

1. Edit `load_prior_summary()` in `agent.rs` to change how prior results are formatted
2. Adjust `diagnose_history()` to add new diagnostics or change convergence detection
3. Update `prompts.rs::iteration_prompt()` to incorporate new information
### Adding new validation checks

Add them to `validate_strategy()` in `agent.rs`. It returns `(hard_errors, warnings)`: hard errors block submission, while warnings are logged but allow the backtest to proceed.
## Testing Strategy

The codebase uses `anyhow` for error handling and `tracing` for logging. Key test areas:

- Strategy JSON extraction from various response formats
- Context-length detection from LM Studio/OpenAI endpoints
- Ledger entry serialization/deserialization
- Backtest result parsing from swym API responses
- Deduplication logic
- Convergence detection in `diagnose_history()`
src/agent.rs (+23 lines)

@@ -218,6 +218,8 @@ pub async fn run(cli: &Cli) -> Result<()> {
```rust
let mut conversation: Vec<Message> = Vec::new();
let mut best_strategy: Option<(f64, Value)> = None; // (avg_sharpe, strategy)
let mut consecutive_failures = 0u32;
// Deduplication: track canonical strategy JSON → first iteration it was tested.
let mut tested_strategies: std::collections::HashMap<String, u32> = std::collections::HashMap::new();

let instrument_names: Vec<String> = instruments.iter().map(|i| i.symbol.clone()).collect();
```

@@ -392,6 +394,27 @@ pub async fn run(cli: &Cli) -> Result<()> {

```rust
    }
}

// Deduplication check: skip strategies identical to one already tested this run.
let strategy_key = serde_json::to_string(&strategy).unwrap_or_default();
if let Some(&first_iter) = tested_strategies.get(&strategy_key) {
    warn!("duplicate strategy (identical to iteration {first_iter}), skipping backtest");
    let record = IterationRecord {
        iteration,
        strategy: strategy.clone(),
        results: vec![],
        validation_notes: vec![format!(
            "DUPLICATE: this exact strategy was already tested in iteration {first_iter}. \
             You submitted identical JSON. You MUST design a completely different strategy — \
             different indicator family, different entry conditions, or different timeframe. \
             Do NOT submit the same JSON again."
        )],
    };
    info!("{}", record.summary());
    history.push(record);
    continue;
}
tested_strategies.insert(strategy_key, iteration);

// Run backtests against all instruments (in-sample)
let mut results: Vec<BacktestResult> = Vec::new();
```
@@ -103,6 +103,14 @@ Buy a fixed number of base units (semantic alias for a decimal string):

```
    "right":{{"kind":"func","name":"atr","period":14}}}}
```

CRITICAL — ATR sizing and balance limits: `N/atr(14)` expresses quantity in BASE asset units.
For BTC, 4h ATR ≈ $1500–3000. So `1000/atr(14)` ≈ 0.4–0.7 BTC ≈ $32k–56k notional —
silently rejected on a $10k account (fill returns None, 0 positions open, no error shown).
The numerator N represents your intended dollar risk per trade. For a $10k account keep N ≤ 200.
`200/atr(14)` ≈ 0.07–0.13 BTC ≈ $5.6k–10k notional — fits within a $10k account.
Prefer `percent_of_balance` for most sizing. Only reach for ATR-based Expr sizing when you need
volatility-scaled position risk, and keep the numerator proportional to your risk tolerance.

**4. Exit rules** — use `position_quantity` to close the exact open size:
```json
{{"kind":"position_quantity"}}
```
@@ -189,7 +197,11 @@ When I share results from previous iterations, use them to guide your next strat

- **Zero trades**: The entry conditions are too restrictive or never co-occur.
  Relax thresholds, simplify conditions, or check if the indicator periods make
  sense for the candle interval. Also check your position sizing — if using an
  ATR-based Expr quantity (`N/atr(14)`), a large N can produce a notional value
  exceeding your account balance (e.g. `1000/atr(14)` on BTC ≈ 0.4 BTC ≈ $32k),
  which is silently rejected by the fill engine. Switch to `percent_of_balance`
  or reduce N to ≤ 200 for a $10k account.

- **Many trades but negative PnL**: The entry signal has no edge, or the exit
  logic is poor. Try different indicator combinations, add trend filters, or
@@ -516,6 +528,10 @@ CRITICAL: `apply_func` uses `"input"`, not `"expr"`. Writing `"expr":` will be r

- Spot markets are long-only: gate buy (entry) rules with state "flat" and sell (exit) rules with state "long". Never add a short-entry (sell when flat) rule on spot.
- Futures markets support both directions: long entry = buy when flat; long exit = sell when long; short entry = sell when flat; short exit (cover) = buy when short. Always include a stop-loss and time exit for both long and short legs.
- Never use a placeholder string for `quantity` — `"ATR_SIZED"`, `"FULL_BALANCE"`, `"dynamic"`, etc. are all invalid and will be rejected.
- Don't use large ATR-based sizing numerators. `N/atr(14)` gives BASE units; for BTC (ATR ≈ $2000
  on 4h), `1000/atr(14)` ≈ 0.5 BTC ≈ $40k — silently rejected on a $10k account. Keep N ≤ 200
  or use `percent_of_balance`. The condition audit may show entry conditions firing while 0 positions
  open — this is the typical symptom of an oversized ATR quantity.
- `{{"method":"position_quantity"}}` is WRONG for exit rules — use `{{"kind":"position_quantity"}}` (see Quantity section above).
{futures_examples}"##,
futures_examples = if has_futures { FUTURES_SHORT_EXAMPLES } else { "" },