docs: cross-run learnings plan
# Plan: Cross-run learning via run ledger and compare endpoint

## Context

Scout currently starts from scratch every run — no memory of prior iterations. The upstream
patch `e47c18` adds:
1. **Enriched `result_summary`**: sortino_ratio, calmar_ratio, max_drawdown, pnl_return,
   avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
2. **Compare endpoint**: `GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,...` returns
   `RunMetricsSummary` for up to 50 runs in one call

Goal: persist enough state across runs so that iteration 1 of a new run starts informed by
all previous runs' strategies and outcomes.

## Changes

### 1. Run ledger — persist strategy + run_id per backtest (`src/agent.rs`)

After each successful `run_single_backtest`, append a JSONL entry to `{output_dir}/run_ledger.jsonl`:

```json
{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
```

One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is
duplicated across instrument entries for the same iteration — this keeps the format flat and
self-contained.

Use `OpenOptions::append(true).create(true)` — no locking needed since scout is single-threaded.
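
A minimal sketch of the append step, assuming a helper that receives the already-serialized entry (in the real code this would come from `serde_json::to_string(&LedgerEntry)`); the function name and signature are illustrative:

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

// Hypothetical helper: append one pre-serialized JSONL entry to the run ledger.
// `create(true)` makes the file on the first run; `append(true)` preserves
// prior entries on later runs. No locking: scout is single-threaded.
fn append_ledger_line(output_dir: &Path, entry_json: &str) -> std::io::Result<()> {
    let path = output_dir.join("run_ledger.jsonl");
    let mut file = OpenOptions::new().append(true).create(true).open(path)?;
    writeln!(file, "{entry_json}")
}
```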

### 2. Load prior runs on startup (`src/agent.rs`)

At the top of `run()`, before the iteration loop:
1. Read `run_ledger.jsonl` if it exists (ignore if missing — first run)
2. Collect all `run_id`s
3. Call `swym.compare_runs(&run_ids)` (batching in groups of 50)
4. Join metrics back to strategies from the ledger
5. Group by strategy (entries with the same strategy JSON share an iteration)
6. Rank by average sharpe across instruments
7. Build a `prior_results_summary: Option<String>` for the initial prompt
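
Steps 5-6 can be sketched as follows; the `(strategy_json, sharpe)` pairs stand in for ledger entries already joined with their metrics, and the function name is illustrative:

```rust
use std::collections::HashMap;

// Group per-instrument sharpe values by their strategy JSON (entries from the
// same iteration share it), then rank strategies by average sharpe, best first.
fn rank_by_avg_sharpe(joined: &[(String, f64)]) -> Vec<(String, f64)> {
    let mut groups: HashMap<String, Vec<f64>> = HashMap::new();
    for (strategy_json, sharpe) in joined {
        groups.entry(strategy_json.clone()).or_default().push(*sharpe);
    }
    let mut ranked: Vec<(String, f64)> = groups
        .into_iter()
        .map(|(strategy, sharpes)| {
            let avg = sharpes.iter().sum::<f64>() / sharpes.len() as f64;
            (strategy, avg)
        })
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}
```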

### 3. Compare endpoint client (`src/swym.rs`)

Add `RunMetricsSummary` struct:

```rust
pub struct RunMetricsSummary {
    pub id: Uuid,
    pub status: String,
    pub candle_interval: Option<String>,
    pub total_positions: Option<u32>,
    pub win_rate: Option<f64>,
    pub profit_factor: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
    pub sortino_ratio: Option<f64>,
    pub calmar_ratio: Option<f64>,
    pub max_drawdown: Option<f64>,
    pub pnl_return: Option<f64>,
    pub avg_win: Option<f64>,
    pub avg_loss: Option<f64>,
    pub max_win: Option<f64>,
    pub max_loss: Option<f64>,
    pub avg_hold_duration_secs: Option<f64>,
}
```

Add `SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>`:
- `GET {base_url}/paper-runs/compare?ids={comma_separated}`
- Parse JSON array response using `parse_number()` for decimal strings
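
The batching half can be sketched without the HTTP layer; ids are plain strings here (the real client formats `Uuid`s), and the function name is illustrative:

```rust
// Build one compare-endpoint URL per batch of at most 50 run ids,
// the endpoint's documented per-call limit.
fn compare_urls(base_url: &str, run_ids: &[String]) -> Vec<String> {
    run_ids
        .chunks(50)
        .map(|batch| format!("{}/paper-runs/compare?ids={}", base_url, batch.join(",")))
        .collect()
}
```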

### 4. Enrich `BacktestResult` with new fields (`src/swym.rs`)

Add to `BacktestResult`: `sortino_ratio`, `calmar_ratio`, `max_drawdown`, `pnl_return`,
`avg_win`, `avg_loss`, `max_win`, `max_loss`, `avg_hold_duration_secs`.

Parse all in `from_response()` via existing `parse_number()`.

Update `summary_line()` to include `max_dd={:.1}%` and `sortino={:.2}` when present —
these two are the most useful additions for the model's reasoning.
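
The formatting rule above can be sketched like this; the helper name is illustrative, and it assumes `max_drawdown` is already expressed in percent:

```rust
// Append the two most decision-relevant enriched metrics only when present.
fn enriched_suffix(max_drawdown: Option<f64>, sortino_ratio: Option<f64>) -> String {
    let mut parts = Vec::new();
    if let Some(dd) = max_drawdown {
        parts.push(format!("max_dd={dd:.1}%"));
    }
    if let Some(s) = sortino_ratio {
        parts.push(format!("sortino={s:.2}"));
    }
    parts.join(" ")
}
```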

### 5. Prior-results-aware initial prompt (`src/prompts.rs`)

Modify `initial_prompt()` to accept `prior_summary: Option<&str>`.

When present, insert before the "Design a trading strategy" instruction:

```
## Learnings from {N} prior backtests across {M} strategies

{top 5 strategies ranked by avg sharpe, each showing:}
- Interval, rule count, avg metrics across instruments
- One-line description of the strategy approach (extracted from rule comments)
- Full strategy JSON for the top 1-2

{compact table of all prior strategies' avg metrics}

Use these insights to avoid repeating failed approaches and to build on what worked.
```

Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs,
show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
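
A sketch of that selection, assuming `ranked` holds strategy summary lines sorted best-first; the function name and the 8-entry cutoff are illustrative:

```rust
// Keep the top 5 and bottom 3 strategy summaries, returning the total count
// so the prompt header can still say how many prior strategies exist.
fn select_for_prompt(ranked: &[String]) -> (Vec<String>, usize) {
    let total = ranked.len();
    if total <= 8 {
        return (ranked.to_vec(), total);
    }
    let mut shown = ranked[..5].to_vec();
    shown.extend_from_slice(&ranked[total - 3..]);
    (shown, total)
}
```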

### 6. Ledger entry struct (`src/agent.rs`)

```rust
#[derive(Serialize, Deserialize)]
struct LedgerEntry {
    run_id: Uuid,
    instrument: String,
    candle_interval: String,
    strategy: Value,
    timestamp: String,
}
```

## Files to modify

- `src/swym.rs` — `RunMetricsSummary` struct, `compare_runs()` method, enrich `BacktestResult`
  with new fields, update `summary_line()`
- `src/agent.rs` — `LedgerEntry` struct, append-to-ledger after backtest, load-ledger-on-startup,
  call compare endpoint, build prior summary, pass to initial prompt
- `src/prompts.rs` — `initial_prompt()` accepts optional prior summary

## Verification

1. `cargo build --release`
2. Run once → confirm `run_ledger.jsonl` is created with entries
3. Run again → confirm:
   - Ledger is loaded, compare endpoint is called
   - Iteration 1 prompt includes prior results summary (visible at debug log level)
   - New entries are appended (not overwritten)
4. Check that enriched metrics (sortino, max_drawdown) appear in `summary_line` output