scout/docs/plan/cross-run-learning.md
# Plan: Cross-run learning via run ledger and compare endpoint
## Context
Scout currently starts from scratch every run — no memory of prior iterations. The upstream
patch `e47c18` adds:
1. **Enriched `result_summary`**: sortino_ratio, calmar_ratio, max_drawdown, pnl_return,
avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
2. **Compare endpoint**: `GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,...` returns
`RunMetricsSummary` for up to 50 runs in one call
Goal: persist enough state across runs so that iteration 1 of a new run starts informed by
all previous runs' strategies and outcomes.
## Changes
### 1. Run ledger — persist strategy + run_id per backtest (`src/agent.rs`)
After each successful `run_single_backtest`, append a JSONL entry to `{output_dir}/run_ledger.jsonl`:
```json
{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
```
One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is
duplicated across instrument entries for the same iteration — this keeps the format flat and
self-contained.
Use `OpenOptions::append(true).create(true)` — no locking needed since scout is single-threaded.
### 2. Load prior runs on startup (`src/agent.rs`)
At the top of `run()`, before the iteration loop:
1. Read `run_ledger.jsonl` if it exists (ignore if missing — first run)
2. Collect all `run_id`s
3. Call `swym.compare_runs(&run_ids)` (batching in groups of 50)
4. Join metrics back to strategies from the ledger
5. Group by strategy (entries with the same strategy JSON share an iteration)
6. Rank by average sharpe across instruments
7. Build a `prior_results_summary: Option<String>` for the initial prompt
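Steps 2–3 can be sketched with a small batching helper (`batch_ids` is a hypothetical name; the 50-run cap comes from the compare endpoint):

```rust
// Hypothetical helper: split run IDs into comma-separated query values,
// at most `batch_size` IDs each (the compare endpoint caps at 50 per call).
fn batch_ids(run_ids: &[String], batch_size: usize) -> Vec<String> {
    run_ids
        .chunks(batch_size)
        .map(|chunk| chunk.join(","))
        .collect()
}
```

Each returned string becomes one `ids=` query value, so e.g. 120 prior runs yield three compare calls.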
### 3. Compare endpoint client (`src/swym.rs`)
Add `RunMetricsSummary` struct:
```rust
pub struct RunMetricsSummary {
    pub id: Uuid,
    pub status: String,
    pub candle_interval: Option<String>,
    pub total_positions: Option<u32>,
    pub win_rate: Option<f64>,
    pub profit_factor: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
    pub sortino_ratio: Option<f64>,
    pub calmar_ratio: Option<f64>,
    pub max_drawdown: Option<f64>,
    pub pnl_return: Option<f64>,
    pub avg_win: Option<f64>,
    pub avg_loss: Option<f64>,
    pub max_win: Option<f64>,
    pub max_loss: Option<f64>,
    pub avg_hold_duration_secs: Option<f64>,
}
```
Add `SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>`:
- `GET {base_url}/paper-runs/compare?ids={comma_separated}`
- Parse JSON array response using `parse_number()` for decimal strings
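URL construction for one batch can be sketched as follows (a hedged sketch, assuming `base_url` already includes the `/api/v1` prefix as in the GET line above):

```rust
// Sketch: build the compare-endpoint URL for one batch of run IDs.
// `base_url` is assumed to already carry the `/api/v1` prefix.
fn compare_url(base_url: &str, run_ids: &[&str]) -> String {
    format!("{}/paper-runs/compare?ids={}", base_url, run_ids.join(","))
}
```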
### 4. Enrich `BacktestResult` with new fields (`src/swym.rs`)
Add to `BacktestResult`: `sortino_ratio`, `calmar_ratio`, `max_drawdown`, `pnl_return`,
`avg_win`, `avg_loss`, `max_win`, `max_loss`, `avg_hold_duration_secs`.
Parse all in `from_response()` via existing `parse_number()`.
Update `summary_line()` to include `max_dd={:.1}%` and `sortino={:.2}` when present —
these two are the most useful additions for the model's reasoning.
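The conditional formatting could look like this (a sketch with a hypothetical `extend_summary` helper; it formats `max_drawdown` as-is, which assumes the API already returns a percentage rather than a fraction):

```rust
// Sketch: extend a summary line with the new optional metrics.
// Fields absent from older runs simply don't appear in the output.
// `max_drawdown` is formatted raw; adjust if the API returns a fraction.
fn extend_summary(mut line: String, max_drawdown: Option<f64>, sortino: Option<f64>) -> String {
    if let Some(dd) = max_drawdown {
        line.push_str(&format!(" max_dd={:.1}%", dd));
    }
    if let Some(s) = sortino {
        line.push_str(&format!(" sortino={:.2}", s));
    }
    line
}
```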
### 5. Prior-results-aware initial prompt (`src/prompts.rs`)
Modify `initial_prompt()` to accept `prior_summary: Option<&str>`.
When present, insert before the "Design a trading strategy" instruction:
```
## Learnings from {N} prior backtests across {M} strategies
{top 5 strategies ranked by avg sharpe, each showing:}
- Interval, rule count, avg metrics across instruments
- One-line description of the strategy approach (extracted from rule comments)
- Full strategy JSON for the top 1-2
{compact table of all prior strategies' avg metrics}
Use these insights to avoid repeating failed approaches and to build on what worked.
```
Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs,
show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
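The ~2000-token budget could be enforced with a rough character-based cap (the ~4-chars-per-token ratio is a heuristic assumption, not the model's real tokenizer; `cap_prior_summary` is a hypothetical helper):

```rust
// Rough sketch: cap the prior-results summary at a character budget
// derived from a ~4-chars-per-token heuristic. Truncates on a line
// boundary so a table row or strategy entry is never cut mid-line.
fn cap_prior_summary(summary: &str, max_tokens: usize) -> String {
    let budget = max_tokens * 4;
    if summary.len() <= budget {
        return summary.to_string();
    }
    let mut out = String::new();
    for line in summary.lines() {
        if out.len() + line.len() + 1 > budget {
            break;
        }
        out.push_str(line);
        out.push('\n');
    }
    out.push_str("[prior context truncated]");
    out
}
```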
### 6. Ledger entry struct (`src/agent.rs`)
```rust
#[derive(Serialize, Deserialize)]
struct LedgerEntry {
    run_id: Uuid,
    instrument: String,
    candle_interval: String,
    strategy: Value,
    timestamp: String,
}
```
## Files to modify
- `src/swym.rs`: `RunMetricsSummary` struct, `compare_runs()` method, enrich `BacktestResult`
with new fields, update `summary_line()`
- `src/agent.rs`: `LedgerEntry` struct, append-to-ledger after backtest, load-ledger-on-startup,
call compare endpoint, build prior summary, pass to initial prompt
- `src/prompts.rs`: `initial_prompt()` accepts optional prior summary
## Verification
1. `cargo build --release`
2. Run once → confirm `run_ledger.jsonl` is created with entries
3. Run again → confirm:
- Ledger is loaded, compare endpoint is called
- Iteration 1 prompt includes prior results summary (visible at debug log level)
- New entries are appended (not overwritten)
4. Check that enriched metrics (sortino, max_drawdown) appear in summary_line output