scout/docs/plan/cross-run-learning.md

Plan: Cross-run learning via run ledger and compare endpoint

Context

Scout currently starts from scratch every run — no memory of prior iterations. The upstream patch e47c18 adds:

  1. Enriched result_summary: sortino_ratio, calmar_ratio, max_drawdown, pnl_return, avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
  2. Compare endpoint: GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,... returns RunMetricsSummary for up to 50 runs in one call

Goal: persist enough state across runs so that iteration 1 of a new run starts informed by all previous runs' strategies and outcomes.

Changes

1. Run ledger — persist strategy + run_id per backtest (src/agent.rs)

After each successful run_single_backtest, append a JSONL entry to {output_dir}/run_ledger.jsonl:

{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}

One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is duplicated across instrument entries for the same iteration — this keeps the format flat and self-contained.

Use OpenOptions::append(true).create(true) — no locking needed since scout is single-threaded.
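The append path can be sketched with std only — a minimal helper, assuming the caller has already serialized the ledger entry to a single JSON line (the function name is illustrative, not from the codebase):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

// Append one JSONL line to the run ledger. `create(true)` makes the
// first run work when the file does not exist yet; `append(true)`
// preserves prior entries. Scout is single-threaded, so no locking.
fn append_ledger_line(path: &Path, json_line: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new().append(true).create(true).open(path)?;
    writeln!(file, "{}", json_line)
}
```

Because every write reopens the file in append mode, a crashed run leaves at most one truncated trailing line, which the loader can skip.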

2. Load prior runs on startup (src/agent.rs)

At the top of run(), before the iteration loop:

  1. Read run_ledger.jsonl if it exists (ignore if missing — first run)
  2. Collect all run_ids
  3. Call swym.compare_runs(&run_ids) (batching in groups of 50)
  4. Join metrics back to strategies from the ledger
  5. Group by strategy (entries with the same strategy JSON share an iteration)
  6. Rank by average sharpe across instruments
  7. Build a prior_results_summary: Option<String> for the initial prompt
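Steps 5 and 6 can be sketched as a pure function over the joined records. This is a sketch under assumptions: `JoinedRun` is a hypothetical intermediate type (strategy JSON flattened to a string key for grouping, plus the sharpe fetched from the compare endpoint), not something in the codebase:

```rust
use std::collections::HashMap;

// Hypothetical joined record: one per instrument-backtest, after
// matching ledger entries to compare-endpoint metrics by run_id.
struct JoinedRun {
    strategy_key: String, // canonical strategy JSON, used as group key
    sharpe: Option<f64>,  // sharpe_ratio from RunMetricsSummary
}

// Group runs by identical strategy JSON (same iteration) and rank by
// average sharpe across instruments; runs with no sharpe are skipped.
fn rank_strategies(runs: &[JoinedRun]) -> Vec<(String, f64)> {
    let mut groups: HashMap<&str, Vec<f64>> = HashMap::new();
    for r in runs {
        if let Some(s) = r.sharpe {
            groups.entry(r.strategy_key.as_str()).or_default().push(s);
        }
    }
    let mut ranked: Vec<(String, f64)> = groups
        .into_iter()
        .map(|(k, v)| (k.to_string(), v.iter().sum::<f64>() / v.len() as f64))
        .collect();
    // Descending by average sharpe.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}
```

Grouping on the raw strategy JSON string is only safe if the ledger writes the same serialization for every instrument in an iteration, which the duplication scheme in change 1 guarantees.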

3. Compare endpoint client (src/swym.rs)

Add RunMetricsSummary struct:

pub struct RunMetricsSummary {
    pub id: Uuid,
    pub status: String,
    pub candle_interval: Option<String>,
    pub total_positions: Option<u32>,
    pub win_rate: Option<f64>,
    pub profit_factor: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
    pub sortino_ratio: Option<f64>,
    pub calmar_ratio: Option<f64>,
    pub max_drawdown: Option<f64>,
    pub pnl_return: Option<f64>,
    pub avg_win: Option<f64>,
    pub avg_loss: Option<f64>,
    pub max_win: Option<f64>,
    pub max_loss: Option<f64>,
    pub avg_hold_duration_secs: Option<f64>,
}

Add SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>:

  • GET {base_url}/paper-runs/compare?ids={comma_separated}
  • Parse JSON array response using parse_number() for decimal strings

4. Enrich BacktestResult with new fields (src/swym.rs)

Add to BacktestResult: sortino_ratio, calmar_ratio, max_drawdown, pnl_return, avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs.

Parse all in from_response() via existing parse_number().

Update summary_line() to include max_dd={:.1}% and sortino={:.2} when present — these two are the most useful additions for the model's reasoning.
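The conditional formatting can be sketched as a free function (the real change would happen inside summary_line(); this assumes max_drawdown arrives already expressed in percent — adjust if the backend returns a fraction):

```rust
// Append max_dd and sortino to an existing summary string only when
// the backend returned them, using the format specs from the plan.
fn extend_summary(mut line: String, max_drawdown: Option<f64>, sortino: Option<f64>) -> String {
    if let Some(dd) = max_drawdown {
        line.push_str(&format!(" max_dd={:.1}%", dd));
    }
    if let Some(s) = sortino {
        line.push_str(&format!(" sortino={:.2}", s));
    }
    line
}
```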

5. Prior-results-aware initial prompt (src/prompts.rs)

Modify initial_prompt() to accept prior_summary: Option<&str>.

When present, insert before the "Design a trading strategy" instruction:

## Learnings from {N} prior backtests across {M} strategies

{top 5 strategies ranked by avg sharpe, each showing:}
- Interval, rule count, avg metrics across instruments
- One-line description of the strategy approach (extracted from rule comments)
- Full strategy JSON for the top 1-2

{compact table of all prior strategies' avg metrics}

Use these insights to avoid repeating failed approaches and to build on what worked.

Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs, show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
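The top-5 + bottom-3 selection over strategies already sorted by descending average sharpe can be sketched as (function name is an assumption):

```rust
// Keep the 5 best and 3 worst entries from a ranked list; below the
// 8-entry threshold there is nothing to trim, so return everything.
fn select_for_prompt<T: Clone>(ranked: &[T]) -> Vec<T> {
    if ranked.len() <= 8 {
        return ranked.to_vec();
    }
    let mut out = ranked[..5].to_vec();
    out.extend_from_slice(&ranked[ranked.len() - 3..]);
    out
}
```

The token budget itself would be enforced separately, e.g. by truncating each strategy's rendered block before assembly.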

6. Ledger entry struct (src/agent.rs)

#[derive(Serialize, Deserialize)]
struct LedgerEntry {
    run_id: Uuid,
    instrument: String,
    candle_interval: String,
    strategy: Value,
    timestamp: String,
}

Files to modify

  • src/swym.rs: RunMetricsSummary struct, compare_runs() method, enrich BacktestResult with new fields, update summary_line()
  • src/agent.rs: LedgerEntry struct, append-to-ledger after backtest, load-ledger-on-startup, call compare endpoint, build prior summary, pass to initial prompt
  • src/prompts.rs: initial_prompt() accepts optional prior summary

Verification

  1. cargo build --release
  2. Run once → confirm run_ledger.jsonl is created with entries
  3. Run again → confirm:
    • Ledger is loaded, compare endpoint is called
    • Iteration 1 prompt includes prior results summary (visible at debug log level)
    • New entries are appended (not overwritten)
  4. Check that enriched metrics (sortino, max_drawdown) appear in summary_line output