Plan: Cross-run learning via run ledger and compare endpoint
Context
Scout currently starts from scratch every run — no memory of prior iterations. The upstream
patch e47c18 adds:
- Enriched result_summary: sortino_ratio, calmar_ratio, max_drawdown, pnl_return, avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
- Compare endpoint: GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,... returns RunMetricsSummary for up to 50 runs in one call
Goal: persist enough state across runs so that iteration 1 of a new run starts informed by all previous runs' strategies and outcomes.
Changes
1. Run ledger — persist strategy + run_id per backtest (src/agent.rs)
After each successful run_single_backtest, append a JSONL entry to {output_dir}/run_ledger.jsonl:
{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is duplicated across instrument entries for the same iteration — this keeps the format flat and self-contained.
Use OpenOptions::append(true).create(true) — no locking needed since scout is single-threaded.
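The append path can be sketched with std only — a minimal helper that takes an already-serialized JSON line (the helper name is illustrative, not from the codebase):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

// Append one pre-serialized JSON line to the run ledger. append+create is
// sufficient here because scout is single-threaded, so no lock is needed.
fn append_ledger_line(path: &Path, json_line: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new().append(true).create(true).open(path)?;
    writeln!(file, "{json_line}")
}
```

In the real code the line would come from serde_json::to_string(&LedgerEntry { .. }) rather than a raw string.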
2. Load prior runs on startup (src/agent.rs)
At the top of run(), before the iteration loop:
- Read run_ledger.jsonl if it exists (ignore if missing — first run)
- Collect all run_ids
- Call swym.compare_runs(&run_ids) (batching in groups of 50)
- Join metrics back to strategies from the ledger
- Group by strategy (entries with the same strategy JSON share an iteration)
- Rank by average sharpe across instruments
- Build a prior_results_summary: Option<String> for the initial prompt
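The group-and-rank step above can be sketched as a pure function, keyed on the strategy JSON string (function and types here are illustrative, assuming sharpe has already been joined per entry):

```rust
use std::collections::HashMap;

// Group per-instrument sharpe values by strategy key (the serialized strategy
// JSON) and rank strategies by average sharpe, best first.
fn rank_by_avg_sharpe(entries: &[(String, f64)]) -> Vec<(String, f64)> {
    let mut groups: HashMap<String, Vec<f64>> = HashMap::new();
    for (key, sharpe) in entries {
        groups.entry(key.clone()).or_default().push(*sharpe);
    }
    let mut ranked: Vec<(String, f64)> = groups
        .into_iter()
        .map(|(key, sharpes)| {
            let avg = sharpes.iter().sum::<f64>() / sharpes.len() as f64;
            (key, avg)
        })
        .collect();
    // Descending by average sharpe; partial_cmp is safe for non-NaN metrics.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}
```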
3. Compare endpoint client (src/swym.rs)
Add RunMetricsSummary struct:
pub struct RunMetricsSummary {
pub id: Uuid,
pub status: String,
pub candle_interval: Option<String>,
pub total_positions: Option<u32>,
pub win_rate: Option<f64>,
pub profit_factor: Option<f64>,
pub net_pnl: Option<f64>,
pub sharpe_ratio: Option<f64>,
pub sortino_ratio: Option<f64>,
pub calmar_ratio: Option<f64>,
pub max_drawdown: Option<f64>,
pub pnl_return: Option<f64>,
pub avg_win: Option<f64>,
pub avg_loss: Option<f64>,
pub max_win: Option<f64>,
pub max_loss: Option<f64>,
pub avg_hold_duration_secs: Option<f64>,
}
Add SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>:
- GET {base_url}/paper-runs/compare?ids={comma_separated}
- Parse JSON array response using parse_number() for decimal strings
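The client-side batching plus URL construction can be sketched as below; compare_runs itself would issue one HTTP GET per batch (HTTP client and response parsing omitted, and the helper names are illustrative):

```rust
// Build the compare URL for one batch of run id strings.
fn compare_url(base_url: &str, ids: &[String]) -> String {
    format!("{base_url}/paper-runs/compare?ids={}", ids.join(","))
}

// The endpoint accepts up to 50 ids per call, so chunk before building URLs.
fn compare_batches(base_url: &str, all_ids: &[String]) -> Vec<String> {
    all_ids
        .chunks(50)
        .map(|batch| compare_url(base_url, batch))
        .collect()
}
```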
4. Enrich BacktestResult with new fields (src/swym.rs)
Add to BacktestResult: sortino_ratio, calmar_ratio, max_drawdown, pnl_return,
avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs.
Parse all in from_response() via existing parse_number().
Update summary_line() to include max_dd={:.1}% and sortino={:.2} when present —
these two are the most useful additions for the model's reasoning.
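Since the enriched fields are Options, summary_line() should append each fragment only when the value is present. A hypothetical free-function sketch (assuming max_drawdown is already expressed in percent):

```rust
// Append the two most decision-relevant enriched metrics to an existing
// summary line, skipping any that are absent.
fn append_enriched(mut line: String, max_drawdown: Option<f64>, sortino_ratio: Option<f64>) -> String {
    if let Some(dd) = max_drawdown {
        // assumption: max_drawdown is a percentage, not a fraction
        line.push_str(&format!(" max_dd={dd:.1}%"));
    }
    if let Some(sortino) = sortino_ratio {
        line.push_str(&format!(" sortino={sortino:.2}"));
    }
    line
}
```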
5. Prior-results-aware initial prompt (src/prompts.rs)
Modify initial_prompt() to accept prior_summary: Option<&str>.
When present, insert before the "Design a trading strategy" instruction:
## Learnings from {N} prior backtests across {M} strategies
{top 5 strategies ranked by avg sharpe, each showing:}
- Interval, rule count, avg metrics across instruments
- One-line description of the strategy approach (extracted from rule comments)
- Full strategy JSON for the top 1-2
{compact table of all prior strategies' avg metrics}
Use these insights to avoid repeating failed approaches and to build on what worked.
Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs, show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
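The signature change itself is small; a minimal sketch (the section content and truncation logic described above would be built by the caller, and base_instructions is an illustrative parameter name):

```rust
// When a prior summary exists, it is inserted ahead of the base instructions;
// otherwise the prompt is unchanged from today's behavior.
fn initial_prompt(base_instructions: &str, prior_summary: Option<&str>) -> String {
    match prior_summary {
        Some(summary) => format!("{summary}\n\n{base_instructions}"),
        None => base_instructions.to_string(),
    }
}
```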
6. Ledger entry struct (src/agent.rs)
#[derive(Serialize, Deserialize)]
struct LedgerEntry {
run_id: Uuid,
instrument: String,
candle_interval: String,
strategy: Value,
timestamp: String,
}
Files to modify
- src/swym.rs — RunMetricsSummary struct, compare_runs() method, enrich BacktestResult with new fields, update summary_line()
- src/agent.rs — LedgerEntry struct, append-to-ledger after backtest, load-ledger-on-startup, call compare endpoint, build prior summary, pass to initial prompt
- src/prompts.rs — initial_prompt() accepts optional prior summary
Verification
- cargo build --release
- Run once → confirm run_ledger.jsonl is created with entries
- Run again → confirm:
  - Ledger is loaded, compare endpoint is called
  - Iteration 1 prompt includes prior results summary (visible at debug log level)
  - New entries are appended (not overwritten)
- Check that enriched metrics (sortino, max_drawdown) appear in summary_line output