# Plan: Cross-run learning via run ledger and compare endpoint

## Context

Scout currently starts from scratch every run — no memory of prior iterations. The upstream patch `e47c18` adds:

1. **Enriched `result_summary`**: sortino_ratio, calmar_ratio, max_drawdown, pnl_return, avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
2. **Compare endpoint**: `GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,...` returns `RunMetricsSummary` for up to 50 runs in one call

Goal: persist enough state across runs so that iteration 1 of a new run starts informed by all previous runs' strategies and outcomes.

## Changes

### 1. Run ledger — persist strategy + run_id per backtest (`src/agent.rs`)

After each successful `run_single_backtest`, append a JSONL entry to `{output_dir}/run_ledger.jsonl`:

```json
{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
```

One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is duplicated across instrument entries for the same iteration — this keeps the format flat and self-contained.

Use `OpenOptions::append(true).create(true)` — no locking is needed since scout is single-threaded.
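The append itself can be sketched with std only. `append_ledger_line` is a hypothetical helper; in the real implementation the line would come from `serde_json::to_string(&LedgerEntry)`, but it is a plain `&str` here to keep the sketch dependency-free:

```rust
use std::fs::OpenOptions;
use std::io::Write;

// Sketch: append one pre-serialized JSON line to the run ledger.
fn append_ledger_line(ledger_path: &str, json_line: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .append(true) // never truncate prior runs' entries
        .create(true) // first run: create the file
        .open(ledger_path)?;
    writeln!(file, "{json_line}")
}
```

Because the file is opened in append mode on every call, a crash mid-run loses at most the line being written, never earlier entries.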
### 2. Load prior runs on startup (`src/agent.rs`)

At the top of `run()`, before the iteration loop:

1. Read `run_ledger.jsonl` if it exists (ignore if missing — first run)
2. Collect all `run_id`s
3. Call `swym.compare_runs(&run_ids)`, batching in groups of 50
4. Join metrics back to the strategies recorded in the ledger
5. Group by strategy (entries with the same strategy JSON share an iteration)
6. Rank by average sharpe across instruments
7. Build a `prior_results_summary: Option<String>` for the initial prompt

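Steps 5 and 6 can be sketched with std only. `rank_strategies` is a hypothetical helper that takes (strategy JSON, sharpe) pairs — one per instrument-backtest — and ranks strategies by their average sharpe:

```rust
use std::collections::HashMap;

// Sketch: group per-instrument sharpe values by the strategy JSON they came
// from, then rank strategies by average sharpe, best first.
// Assumes no NaN sharpe values (missing metrics are filtered out upstream).
fn rank_strategies(entries: &[(String, f64)]) -> Vec<(String, f64)> {
    let mut acc: HashMap<String, (f64, u32)> = HashMap::new();
    for (strategy_json, sharpe) in entries {
        let e = acc.entry(strategy_json.clone()).or_insert((0.0, 0));
        e.0 += sharpe;
        e.1 += 1;
    }
    let mut ranked: Vec<(String, f64)> = acc
        .into_iter()
        .map(|(strategy, (sum, n))| (strategy, sum / n as f64))
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}
```

Grouping on the serialized strategy JSON works because section 1 duplicates the identical JSON across an iteration's instrument entries.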
### 3. Compare endpoint client (`src/swym.rs`)

Add a `RunMetricsSummary` struct:

```rust
pub struct RunMetricsSummary {
    pub id: Uuid,
    pub status: String,
    pub candle_interval: Option<String>,
    pub total_positions: Option<u32>,
    pub win_rate: Option<f64>,
    pub profit_factor: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
    pub sortino_ratio: Option<f64>,
    pub calmar_ratio: Option<f64>,
    pub max_drawdown: Option<f64>,
    pub pnl_return: Option<f64>,
    pub avg_win: Option<f64>,
    pub avg_loss: Option<f64>,
    pub max_win: Option<f64>,
    pub max_loss: Option<f64>,
    pub avg_hold_duration_secs: Option<f64>,
}
```

Add `SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>`:

- `GET {base_url}/paper-runs/compare?ids={comma_separated}`
- Parse the JSON array response, using `parse_number()` for decimal strings

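The request URL for one batch can be built as below. This is a sketch: ids are `String`s here to keep it dependency-free, whereas the real method takes `&[Uuid]`, and the HTTP call itself is omitted:

```rust
// Sketch: URL for one compare-endpoint batch. The endpoint accepts up to 50
// ids per call, so callers iterate with run_ids.chunks(50).
fn compare_url(base_url: &str, run_ids: &[String]) -> String {
    format!("{base_url}/paper-runs/compare?ids={}", run_ids.join(","))
}
```

`compare_runs` would then issue one `GET` per `chunks(50)` batch and concatenate the parsed `RunMetricsSummary` vectors.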
### 4. Enrich `BacktestResult` with new fields (`src/swym.rs`)

Add to `BacktestResult`: `sortino_ratio`, `calmar_ratio`, `max_drawdown`, `pnl_return`, `avg_win`, `avg_loss`, `max_win`, `max_loss`, `avg_hold_duration_secs`.

Parse all of them in `from_response()` via the existing `parse_number()`.

Update `summary_line()` to include `max_dd={:.1}%` and `sortino={:.2}` when present — these two are the most useful additions for the model's reasoning.

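The two additions to `summary_line()` might look like this. A sketch only: `enriched_suffix` is a hypothetical helper, and it assumes `max_drawdown` is already expressed in percent (adjust if the API returns a fraction):

```rust
// Sketch: suffix appended to summary_line() when the enriched metrics are
// present; absent metrics contribute nothing rather than printing "None".
fn enriched_suffix(max_drawdown: Option<f64>, sortino_ratio: Option<f64>) -> String {
    let mut out = String::new();
    if let Some(dd) = max_drawdown {
        out.push_str(&format!(" max_dd={dd:.1}%"));
    }
    if let Some(s) = sortino_ratio {
        out.push_str(&format!(" sortino={s:.2}"));
    }
    out
}
```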
### 5. Prior-results-aware initial prompt (`src/prompts.rs`)

Modify `initial_prompt()` to accept `prior_summary: Option<&str>`.

When present, insert before the "Design a trading strategy" instruction:

```
## Learnings from {N} prior backtests across {M} strategies

{top 5 strategies ranked by avg sharpe, each showing:}
- Interval, rule count, avg metrics across instruments
- One-line description of the strategy approach (extracted from rule comments)
- Full strategy JSON for the top 1-2

{compact table of all prior strategies' avg metrics}

Use these insights to avoid repeating failed approaches and to build on what worked.
```

Limit the prior context to roughly 2000 tokens to avoid crowding the prompt. If there are many prior runs, show only the top 5 plus the bottom 3 (worst performers to avoid), plus a count of total runs.

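The ~2000-token cap can be approximated without a tokenizer. A sketch under the assumption of roughly 4 bytes per token (`cap_prior_summary` is a hypothetical helper), truncating at a line boundary so no strategy entry is cut mid-line:

```rust
// Sketch: cap the prior-results block at roughly `max_tokens`, assuming an
// average of 4 bytes per token. Whole lines only; a line that would cross the
// budget is dropped along with everything after it.
fn cap_prior_summary(summary: &str, max_tokens: usize) -> String {
    let max_bytes = max_tokens * 4;
    if summary.len() <= max_bytes {
        return summary.to_string();
    }
    let mut out = String::new();
    for line in summary.lines() {
        if out.len() + line.len() + 1 > max_bytes {
            break;
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}
```

Ordering the summary top-5 first means truncation degrades gracefully: the weakest entries are the ones dropped.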
### 6. Ledger entry struct (`src/agent.rs`)

```rust
#[derive(Serialize, Deserialize)]
struct LedgerEntry {
    run_id: Uuid,
    instrument: String,
    candle_interval: String,
    strategy: Value,
    timestamp: String,
}
```


## Files to modify

- `src/swym.rs` — `RunMetricsSummary` struct, `compare_runs()` method, enrich `BacktestResult` with new fields, update `summary_line()`
- `src/agent.rs` — `LedgerEntry` struct, append-to-ledger after each backtest, load-ledger-on-startup, call the compare endpoint, build the prior summary, pass it to the initial prompt
- `src/prompts.rs` — `initial_prompt()` accepts an optional prior summary


## Verification

1. `cargo build --release`
2. Run once → confirm `run_ledger.jsonl` is created with entries
3. Run again → confirm:
   - The ledger is loaded and the compare endpoint is called
   - The iteration 1 prompt includes the prior results summary (visible at debug log level)
   - New entries are appended, not overwritten
4. Check that the enriched metrics (sortino, max_drawdown) appear in `summary_line()` output