docs: cross-run learnings plan
# Plan: Cross-run learning via run ledger and compare endpoint

## Context

Scout currently starts from scratch every run — no memory of prior iterations. The upstream
patch `e47c18` adds:
1. **Enriched `result_summary`**: sortino_ratio, calmar_ratio, max_drawdown, pnl_return,
   avg_win, avg_loss, max_win, max_loss, avg_hold_duration_secs
2. **Compare endpoint**: `GET /api/v1/paper-runs/compare?ids=uuid1,uuid2,...` returns
   `RunMetricsSummary` for up to 50 runs in one call

Goal: persist enough state across runs so that iteration 1 of a new run starts informed by
all previous runs' strategies and outcomes.

## Changes

### 1. Run ledger — persist strategy + run_id per backtest (`src/agent.rs`)

After each successful `run_single_backtest`, append a JSONL entry to `{output_dir}/run_ledger.jsonl`:

```json
{"run_id":"uuid","instrument":"BTCUSDC","candle_interval":"4h","strategy":{...},"timestamp":"2026-03-10T12:38:15Z"}
```

One line per instrument-backtest (3 per iteration for 3 instruments). The strategy JSON is
duplicated across instrument entries for the same iteration — this keeps the format flat and
self-contained.

Use `OpenOptions::append(true).create(true)` — no locking needed since scout is single-threaded.
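
A minimal sketch of the append step, assuming a helper that receives the already-serialized entry (in the real code this would come from `serde_json::to_string(&LedgerEntry)`); the function name and signature are illustrative:

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

// Hypothetical helper: append one pre-serialized JSONL entry to the run ledger.
// `create(true)` makes the file on the first run; `append(true)` preserves
// prior entries on later runs. No locking: scout is single-threaded.
fn append_ledger_line(output_dir: &Path, entry_json: &str) -> std::io::Result<()> {
    let path = output_dir.join("run_ledger.jsonl");
    let mut file = OpenOptions::new().append(true).create(true).open(path)?;
    writeln!(file, "{entry_json}")
}
```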

### 2. Load prior runs on startup (`src/agent.rs`)

At the top of `run()`, before the iteration loop:
1. Read `run_ledger.jsonl` if it exists (ignore if missing — first run)
2. Collect all `run_id`s
3. Call `swym.compare_runs(&run_ids)` (batching in groups of 50)
4. Join metrics back to strategies from the ledger
5. Group by strategy (entries with the same strategy JSON share an iteration)
6. Rank by average sharpe across instruments
7. Build a `prior_results_summary: Option<String>` for the initial prompt
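
Steps 5-6 can be sketched as follows; the `(strategy_json, sharpe)` pairs stand in for ledger entries already joined with their metrics, and the function name is illustrative:

```rust
use std::collections::HashMap;

// Group per-instrument sharpe values by their strategy JSON (entries from the
// same iteration share it), then rank strategies by average sharpe, best first.
fn rank_by_avg_sharpe(joined: &[(String, f64)]) -> Vec<(String, f64)> {
    let mut groups: HashMap<String, Vec<f64>> = HashMap::new();
    for (strategy_json, sharpe) in joined {
        groups.entry(strategy_json.clone()).or_default().push(*sharpe);
    }
    let mut ranked: Vec<(String, f64)> = groups
        .into_iter()
        .map(|(strategy, sharpes)| {
            let avg = sharpes.iter().sum::<f64>() / sharpes.len() as f64;
            (strategy, avg)
        })
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}
```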

### 3. Compare endpoint client (`src/swym.rs`)

Add `RunMetricsSummary` struct:

```rust
pub struct RunMetricsSummary {
    pub id: Uuid,
    pub status: String,
    pub candle_interval: Option<String>,
    pub total_positions: Option<u32>,
    pub win_rate: Option<f64>,
    pub profit_factor: Option<f64>,
    pub net_pnl: Option<f64>,
    pub sharpe_ratio: Option<f64>,
    pub sortino_ratio: Option<f64>,
    pub calmar_ratio: Option<f64>,
    pub max_drawdown: Option<f64>,
    pub pnl_return: Option<f64>,
    pub avg_win: Option<f64>,
    pub avg_loss: Option<f64>,
    pub max_win: Option<f64>,
    pub max_loss: Option<f64>,
    pub avg_hold_duration_secs: Option<f64>,
}
```

Add `SwymClient::compare_runs(&self, run_ids: &[Uuid]) -> Result<Vec<RunMetricsSummary>>`:
- `GET {base_url}/paper-runs/compare?ids={comma_separated}`
- Parse JSON array response using `parse_number()` for decimal strings
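
The batching half can be sketched without the HTTP layer; ids are plain strings here (the real client formats `Uuid`s), and the function name is illustrative:

```rust
// Build one compare-endpoint URL per batch of at most 50 run ids,
// the endpoint's documented per-call limit.
fn compare_urls(base_url: &str, run_ids: &[String]) -> Vec<String> {
    run_ids
        .chunks(50)
        .map(|batch| format!("{}/paper-runs/compare?ids={}", base_url, batch.join(",")))
        .collect()
}
```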

### 4. Enrich `BacktestResult` with new fields (`src/swym.rs`)

Add to `BacktestResult`: `sortino_ratio`, `calmar_ratio`, `max_drawdown`, `pnl_return`,
`avg_win`, `avg_loss`, `max_win`, `max_loss`, `avg_hold_duration_secs`.

Parse all in `from_response()` via existing `parse_number()`.

Update `summary_line()` to include `max_dd={:.1}%` and `sortino={:.2}` when present —
these two are the most useful additions for the model's reasoning.
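
The formatting rule above can be sketched like this; the helper name is illustrative, and it assumes `max_drawdown` is already expressed in percent:

```rust
// Append the two most decision-relevant enriched metrics only when present.
fn enriched_suffix(max_drawdown: Option<f64>, sortino_ratio: Option<f64>) -> String {
    let mut parts = Vec::new();
    if let Some(dd) = max_drawdown {
        parts.push(format!("max_dd={dd:.1}%"));
    }
    if let Some(s) = sortino_ratio {
        parts.push(format!("sortino={s:.2}"));
    }
    parts.join(" ")
}
```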

### 5. Prior-results-aware initial prompt (`src/prompts.rs`)

Modify `initial_prompt()` to accept `prior_summary: Option<&str>`.

When present, insert before the "Design a trading strategy" instruction:

```
## Learnings from {N} prior backtests across {M} strategies

{top 5 strategies ranked by avg sharpe, each showing:}
- Interval, rule count, avg metrics across instruments
- One-line description of the strategy approach (extracted from rule comments)
- Full strategy JSON for the top 1-2

{compact table of all prior strategies' avg metrics}

Use these insights to avoid repeating failed approaches and to build on what worked.
```

Limit to ~2000 tokens of prior context to avoid crowding the prompt. If many prior runs,
show only the top 5 + bottom 3 (worst performers to avoid), plus a count of total runs.
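
A sketch of that selection, assuming `ranked` holds strategy summary lines sorted best-first; the function name and the 8-entry cutoff are illustrative:

```rust
// Keep the top 5 and bottom 3 strategy summaries, returning the total count
// so the prompt header can still say how many prior strategies exist.
fn select_for_prompt(ranked: &[String]) -> (Vec<String>, usize) {
    let total = ranked.len();
    if total <= 8 {
        return (ranked.to_vec(), total);
    }
    let mut shown = ranked[..5].to_vec();
    shown.extend_from_slice(&ranked[total - 3..]);
    (shown, total)
}
```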

### 6. Ledger entry struct (`src/agent.rs`)

```rust
#[derive(Serialize, Deserialize)]
struct LedgerEntry {
    run_id: Uuid,
    instrument: String,
    candle_interval: String,
    strategy: Value,
    timestamp: String,
}
```

## Files to modify

- `src/swym.rs` — `RunMetricsSummary` struct, `compare_runs()` method, enrich `BacktestResult`
  with new fields, update `summary_line()`
- `src/agent.rs` — `LedgerEntry` struct, append-to-ledger after backtest, load-ledger-on-startup,
  call compare endpoint, build prior summary, pass to initial prompt
- `src/prompts.rs` — `initial_prompt()` accepts optional prior summary

## Verification

1. `cargo build --release`
2. Run once → confirm `run_ledger.jsonl` is created with entries
3. Run again → confirm:
   - Ledger is loaded, compare endpoint is called
   - Iteration 1 prompt includes prior results summary (visible at debug log level)
   - New entries are appended (not overwritten)
4. Check that enriched metrics (sortino, max_drawdown) appear in `summary_line` output