diff --git a/docs/plan/cross-run-learning.md b/docs/plan/cross-run-learning.md
new file mode 100644
index 0000000..2f3709c
--- /dev/null
+++ b/docs/plan/cross-run-learning.md
@@ -0,0 +1,133 @@
+# Plan: Cross-run learning via run ledger and compare endpoint
+
+## Context
+
+Scout currently starts from scratch every run — no memory of prior iterations. The upstream
+patch `e47c18` adds:
+1. **Enriched `result_summary`**: sortino_ratio, calmar_ratio, max_drawdown, pnl_return,
+   avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
+2. **Compare endpoint**: `GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,...` returns
+   `RunMetricsSummary` for up to 50 runs in one call
+
+Goal: persist enough state across runs so that iteration 1 of a new run starts informed by
+all previous runs' strategies and outcomes.
+
+## Changes
+
+### 1. Run ledger — persist strategy + run_id per backtest (`src/agent.rs`)
+
+After each successful `run_single_backtest`, append a JSONL entry to `{output_dir}/run_ledger.jsonl`:
+
+```json
+{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
+```
+
+One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is
+duplicated across instrument entries for the same iteration — this keeps the format flat and
+self-contained.
+
+Use `OpenOptions::append(true).create(true)` — no locking needed since scout is single-threaded.
+
+### 2. Load prior runs on startup (`src/agent.rs`)
+
+At the top of `run()`, before the iteration loop:
+1. Read `run_ledger.jsonl` if it exists (ignore if missing — first run)
+2. Collect all `run_id`s
+3. Call `swym.compare_runs(&run_ids)` (batching in groups of 50)
+4. Join metrics back to strategies from the ledger
+5. Group by strategy (entries with the same strategy JSON share an iteration)
+6. Rank by average sharpe across instruments
+7. Build a `prior_results_summary: Option` for the initial prompt
+
+### 3. Compare endpoint client (`src/swym.rs`)
+
+Add `RunMetricsSummary` struct:
+
+```rust
+pub struct RunMetricsSummary {
+    pub id: Uuid,
+    pub status: String,
+    pub candle_interval: Option,
+    pub total_positions: Option,
+    pub win_rate: Option,
+    pub profit_factor: Option,
+    pub net_pnl: Option,
+    pub sharpe_ratio: Option,
+    pub sortino_ratio: Option,
+    pub calmar_ratio: Option,
+    pub max_drawdown: Option,
+    pub pnl_return: Option,
+    pub avg_win: Option,
+    pub avg_loss: Option,
+    pub max_win: Option,
+    pub max_loss: Option,
+    pub avg_hold_duration_secs: Option,
+}
+```
+
+Add `SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>`:
+- `GET {base_url}/paper-runs/compare?ids={comma_separated}`
+- Parse JSON array response using `parse_number()` for decimal strings
+
+### 4. Enrich `BacktestResult` with new fields (`src/swym.rs`)
+
+Add to `BacktestResult`: `sortino_ratio`, `calmar_ratio`, `max_drawdown`, `pnl_return`,
+`avg_win`, `avg_loss`, `max_win`, `max_loss`, `avg_hold_duration_secs`.
+
+Parse all in `from_response()` via existing `parse_number()`.
+
+Update `summary_line()` to include `max_dd={:.1}%` and `sortino={:.2}` when present —
+these two are the most useful additions for the model's reasoning.
+
+### 5. Prior-results-aware initial prompt (`src/prompts.rs`)
+
+Modify `initial_prompt()` to accept `prior_summary: Option<&str>`.
+
+When present, insert before the "Design a trading strategy" instruction:
+
+```
+## Learnings from {N} prior backtests across {M} strategies
+
+{top 5 strategies ranked by avg sharpe, each showing:}
+- Interval, rule count, avg metrics across instruments
+- One-line description of the strategy approach (extracted from rule comments)
+- Full strategy JSON for the top 1-2
+
+{compact table of all prior strategies' avg metrics}
+
+Use these insights to avoid repeating failed approaches and to build on what worked.
+```
+
+Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs,
+show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
+
+### 6. Ledger entry struct (`src/agent.rs`)
+
+```rust
+#[derive(Serialize, Deserialize)]
+struct LedgerEntry {
+    run_id: Uuid,
+    instrument: String,
+    candle_interval: String,
+    strategy: Value,
+    timestamp: String,
+}
+```
+
+## Files to modify
+
+- `src/swym.rs` — `RunMetricsSummary` struct, `compare_runs()` method, enrich `BacktestResult`
+  with new fields, update `summary_line()`
+- `src/agent.rs` — `LedgerEntry` struct, append-to-ledger after backtest, load-ledger-on-startup,
+  call compare endpoint, build prior summary, pass to initial prompt
+- `src/prompts.rs` — `initial_prompt()` accepts optional prior summary
+
+## Verification
+
+1. `cargo build --release`
+2. Run once → confirm `run_ledger.jsonl` is created with entries
+3. Run again → confirm:
+   - Ledger is loaded, compare endpoint is called
+   - Iteration 1 prompt includes prior results summary (visible at debug log level)
+   - New entries are appended (not overwritten)
+4. Check that enriched metrics (sortino, max_drawdown) appear in summary_line output
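
Note on section 2 of the plan: the batching (step 3), grouping (step 5), and ranking (step 6) could look roughly like the std-only sketch below. This is not code from the patch. `COMPARE_BATCH_SIZE`, `compare_queries`, and `rank_by_avg_sharpe` are hypothetical names, run IDs are plain strings rather than `Uuid`, and two assumptions are baked in: the serialized strategy JSON is byte-identical across the instrument entries of one iteration (so it can serve as the grouping key), and runs with no Sharpe value are left out of the average.

```rust
use std::collections::HashMap;

/// Per-call ID limit of the compare endpoint, as stated in the plan.
const COMPARE_BATCH_SIZE: usize = 50;

/// Builds one `ids=...` query string per batch of at most
/// `COMPARE_BATCH_SIZE` run IDs (hypothetical helper).
fn compare_queries(run_ids: &[String]) -> Vec<String> {
    run_ids
        .chunks(COMPARE_BATCH_SIZE)
        .map(|batch| format!("ids={}", batch.join(",")))
        .collect()
}

/// Groups `(strategy_json, sharpe)` rows by strategy and returns
/// `(strategy_json, average_sharpe)` pairs, best average first.
/// Rows without a Sharpe value are skipped (an assumption of this sketch).
fn rank_by_avg_sharpe(rows: &[(String, Option<f64>)]) -> Vec<(String, f64)> {
    // Running (sum, count) per distinct strategy string.
    let mut sums: HashMap<&str, (f64, usize)> = HashMap::new();
    for (strategy, sharpe) in rows {
        if let Some(value) = sharpe {
            let entry = sums.entry(strategy.as_str()).or_insert((0.0, 0));
            entry.0 += *value;
            entry.1 += 1;
        }
    }

    let mut ranked: Vec<(String, f64)> = sums
        .into_iter()
        .map(|(strategy, (sum, count))| (strategy.to_string(), sum / count as f64))
        .collect();

    // Descending by average Sharpe; `total_cmp` gives a total order on f64.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

Keying on the raw strategy string keeps the sketch dependency-free. If the ledger is written with `serde_json`, grouping on a canonical re-serialization of the `strategy` value would be more robust against key-order differences.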