Three improvements from the 2026-03-09T18:45:04 run analysis:
**R1 thinking visibility (claude.rs, agent.rs)**
extract_think_content() returns the raw <think> block content before it
is stripped. agent.rs logs it at DEBUG level, so running with
RUST_LOG=debug shows why the model keeps repeating a mistake. Previously
the think block was silently discarded after stripping.
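A minimal sketch of the extraction step, using only the standard library. The function name comes from the commit message; the signature, the whitespace trimming, the single-block assumption, and the `strip_think_block` helper are illustrative assumptions, not the code in claude.rs.

```rust
const OPEN: &str = "<think>";
const CLOSE: &str = "</think>";

/// Return the text inside the first <think>...</think> block, if any.
/// Assumption: at most one think block per response; the body is trimmed.
pub fn extract_think_content(response: &str) -> Option<&str> {
    let start = response.find(OPEN)? + OPEN.len();
    // Look for the closing tag only after the opening tag.
    let end = start + response[start..].find(CLOSE)?;
    Some(response[start..end].trim())
}

/// Hypothetical companion: remove the first think block and return the rest.
pub fn strip_think_block(response: &str) -> String {
    let Some(open_at) = response.find(OPEN) else {
        return response.to_string();
    };
    let body_start = open_at + OPEN.len();
    let Some(close_offset) = response[body_start..].find(CLOSE) else {
        // Unterminated block: leave the response untouched.
        return response.to_string();
    };
    let after = body_start + close_offset + CLOSE.len();
    let mut out = String::with_capacity(response.len());
    out.push_str(&response[..open_at]);
    out.push_str(&response[after..]);
    out.trim().to_string()
}
```

A caller would run the extraction first, log the result at DEBUG level, and only then strip the block before parsing the strategy JSON.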
**Prompt: unsupported indicators and bollinger_upper Expr mistake (prompts.rs)**
- bollinger_upper / bollinger_lower used as {"kind":"bollinger_upper",...}
was the dominant failure in iters 9-15. Added explicit correction:
use {"kind":"func","name":"bollinger_upper","period":20} in Expr context,
never as a standalone kind.
- roc, hma, vwap, macd, cci, stoch are NOT in the swym schema. Added a
clear "NOT supported" list alongside the supported func names.
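As a rough illustration, the two prompt additions could be rendered from a constant list like this. The indicator names and both JSON shapes are taken from the notes above; the helper name `indicator_guidance` and the exact wording of the generated text are assumptions, not the contents of prompts.rs.

```rust
/// Indicator names the notes above list as absent from the swym schema.
pub const UNSUPPORTED_INDICATORS: [&str; 6] = ["roc", "hma", "vwap", "macd", "cci", "stoch"];

/// Hypothetical helper: build the prompt section that warns the model off
/// unsupported indicators and the standalone bollinger kind.
pub fn indicator_guidance() -> String {
    let mut text = String::new();
    text.push_str("NOT supported (do not use): ");
    text.push_str(&UNSUPPORTED_INDICATORS.join(", "));
    text.push('\n');
    // Correct form in Expr context, followed by the form that fails.
    text.push_str("In Expr context write ");
    text.push_str(r#"{"kind":"func","name":"bollinger_upper","period":20}"#);
    text.push_str(r#", never {"kind":"bollinger_upper",...}."#);
    text.push('\n');
    text
}
```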
**Repeated API error detection in diagnose_history (agent.rs)**
If the same "unknown variant `X`" error appears 2+ times in the last 4
iterations, a targeted diagnosis note is emitted naming the bad variant
and pointing to the DSL reference. This surfaces in the next iteration
prompt so the model gets actionable feedback before it wastes more of
the backtest budget on the same mistake.
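The detection rule described above (same unknown variant, 2+ times, last 4 iterations) can be sketched as follows. The input shape (one optional API error string per iteration), the function names, and the wording of the note are assumptions; the real diagnose_history in agent.rs may be structured differently.

```rust
use std::collections::HashMap;

/// Pull `X` out of an error message containing "unknown variant `X`".
fn unknown_variant(error: &str) -> Option<&str> {
    const MARKER: &str = "unknown variant `";
    let start = error.find(MARKER)? + MARKER.len();
    let len = error[start..].find('`')?;
    Some(&error[start..start + len])
}

/// Hypothetical sketch: `errors` holds one entry per iteration, oldest first,
/// with `None` for iterations that produced no API error.
pub fn diagnose_repeated_variant(errors: &[Option<String>]) -> Option<String> {
    // Only the last 4 iterations are considered.
    let window_start = errors.len().saturating_sub(4);
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for error in errors[window_start..].iter().flatten() {
        if let Some(variant) = unknown_variant(error) {
            *counts.entry(variant).or_insert(0) += 1;
        }
    }
    counts
        .into_iter()
        .filter(|(_, count)| *count >= 2)
        // Break ties on the variant name so the result is deterministic.
        .max_by_key(|(variant, count)| (*count, *variant))
        .map(|(variant, count)| {
            format!(
                "`{variant}` was rejected as an unknown variant {count} times in the last 4 iterations; check the DSL reference for valid kinds."
            )
        })
}
```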
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
scout
Autonomous strategy search agent for the swym backtesting platform.
Runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter overfitting.
Quick start
export ANTHROPIC_API_KEY="sk-ant-..."
cargo run -- \
--swym-url https://dev.swym.hanzalova.internal/api/v1 \
--max-iterations 50 \
--instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC,binance_spot:SOLUSDC \
--backtest-from 2025-01-01T00:00:00Z \
--backtest-to 2025-10-01T00:00:00Z \
--oos-from 2025-10-01T00:00:00Z \
--oos-to 2026-03-01T00:00:00Z
How it works
- Coverage check: verifies candle data exists for all instruments and finds common available intervals.
- Strategy generation: sends the DSL schema and prior results to Claude, which produces a new strategy JSON each iteration.
- In-sample backtest: submits the strategy against all instruments for the training period and evaluates Sharpe ratio, profit factor, win rate, and net PnL.
- Out-of-sample validation: if any instrument shows a Sharpe ratio above the threshold (--min-sharpe) with enough trades (--min-trades), the strategy is re-tested on held-out data. Only strategies that pass both phases are saved as "validated".
- Learning loop: all results (including failures) are fed back to Claude so it can learn from what works and what doesn't. The conversation is trimmed to avoid context exhaustion, while the full results history is passed as structured text.
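The control flow of these steps can be sketched with the external calls injected as closures. Everything here is illustrative: the type and function names are hypothetical, metrics are reduced to Sharpe and trade count, and a single set of metrics stands in for the per-instrument results the real agent evaluates.

```rust
/// Simplified backtest metrics (the real agent also tracks profit factor,
/// win rate, and net PnL).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Metrics {
    pub sharpe: f64,
    pub trades: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct IterationResult {
    pub strategy: String,
    pub in_sample: Metrics,
    pub validated: bool,
}

/// Hypothetical search loop: generate, backtest in-sample, validate
/// out-of-sample when promising, and feed the full history back in.
pub fn run_search(
    max_iterations: usize,
    min_sharpe: f64,
    min_trades: u32,
    mut generate: impl FnMut(&[IterationResult]) -> String,
    mut backtest_in_sample: impl FnMut(&str) -> Metrics,
    mut validate_oos: impl FnMut(&str, Metrics) -> bool,
) -> Vec<IterationResult> {
    let mut history: Vec<IterationResult> = Vec::with_capacity(max_iterations);
    for _ in 0..max_iterations {
        // Prior results, including failures, inform the next strategy.
        let strategy = generate(history.as_slice());
        let in_sample = backtest_in_sample(&strategy);
        let promising = in_sample.sharpe > min_sharpe && in_sample.trades >= min_trades;
        // Out-of-sample validation runs only for promising strategies.
        let validated = promising && validate_oos(&strategy, in_sample);
        history.push(IterationResult { strategy, in_sample, validated });
    }
    history
}
```

Injecting the Claude and swym calls as closures keeps the loop testable without network access; the coverage check would run once before the loop starts.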
Configuration
All options are available as CLI flags; those with an entry in the Env column can also be set through an environment variable:
| Flag | Env | Default | Description |
|---|---|---|---|
| --swym-url | SWYM_API_URL | https://dev.swym.hanzalova.internal/api/v1 | Swym API base URL |
| --anthropic-key | ANTHROPIC_API_KEY | required | Anthropic API key |
| --model | CLAUDE_MODEL | claude-sonnet-4-20250514 | Claude model |
| --max-iterations | | 50 | Maximum search iterations |
| --min-sharpe | | 1.0 | Minimum Sharpe for "promising" |
| --min-trades | | 10 | Minimum trades for significance |
| --instruments | | BTC,ETH,SOL vs USDC | Comma-separated exchange:SYMBOL |
| --backtest-from | | 2025-01-01 | In-sample start |
| --backtest-to | | 2025-10-01 | In-sample end |
| --oos-from | | 2025-10-01 | Out-of-sample start |
| --oos-to | | 2026-03-01 | Out-of-sample end |
| --initial-balance | | 10000 | Starting USDC balance |
| --fees-percent | | 0.001 | Fee per trade (0.1%) |
| --output-dir | | ./scout-results | Where to save strategies and reports |
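For options that have both a flag and an environment variable, resolution could look like the sketch below. The precedence (flag, then environment variable, then default) is an assumption, as are the helper names; the default values are copied from the table above.

```rust
/// Defaults copied from the configuration table.
pub const DEFAULT_SWYM_URL: &str = "https://dev.swym.hanzalova.internal/api/v1";
pub const DEFAULT_MODEL: &str = "claude-sonnet-4-20250514";

/// Resolve one option. Assumed precedence: CLI flag, then env var, then default.
pub fn resolve(flag: Option<&str>, env: Option<&str>, default: &str) -> String {
    flag.or(env).unwrap_or(default).to_string()
}

/// Required options, such as the Anthropic API key, have no default.
pub fn resolve_required(
    flag: Option<&str>,
    env: Option<&str>,
    name: &str,
) -> Result<String, String> {
    flag.or(env)
        .map(str::to_string)
        .ok_or_else(|| format!("missing required option {name}"))
}
```

A caller would pass the environment value as `std::env::var("SWYM_API_URL").ok().as_deref()`, which keeps these helpers free of global state.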
Output
scout-results/
├── strategy_001.json # Every strategy attempted
├── strategy_002.json
├── ...
├── validated_017.json # Strategies that passed OOS validation
├── validated_031.json # (includes in-sample + OOS metrics)
└── best_strategy.json # Highest avg Sharpe across instruments
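The selection rule for best_strategy.json (highest average Sharpe across instruments) can be sketched as below. The input shape, a list of strategy names paired with per-instrument Sharpe ratios, and the function names are assumptions made for illustration.

```rust
/// Mean Sharpe across instruments; `None` when there are no results.
pub fn average_sharpe(per_instrument: &[f64]) -> Option<f64> {
    if per_instrument.is_empty() {
        return None;
    }
    Some(per_instrument.iter().sum::<f64>() / per_instrument.len() as f64)
}

/// Hypothetical selection: the candidate with the highest average Sharpe.
/// Candidates with no results or a NaN average are skipped.
pub fn best_strategy(candidates: &[(String, Vec<f64>)]) -> Option<&str> {
    candidates
        .iter()
        .filter_map(|(name, sharpes)| average_sharpe(sharpes).map(|avg| (name.as_str(), avg)))
        .filter(|(_, avg)| !avg.is_nan())
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
}
```

Averaging rewards strategies that hold up on every instrument: a high Sharpe on one instrument combined with a loss on another ranks below a consistent performer.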
Tips
- Start with Sonnet (claude-sonnet-4-20250514) for cost efficiency during exploration. Switch to Opus for refinement of promising strategies.
- 50 iterations is a reasonable starting point. The agent typically finds interesting patterns within 20-30 iterations if they exist.
- Watch the logs: the per-iteration summaries show what the agent is learning in real time.
- Adjust dates to match your actual candle coverage. The agent checks coverage at startup and will fail fast if data is missing.
- The OOS validation threshold is intentionally relaxed (70% of in-sample Sharpe, half the trade count) because out-of-sample degradation is expected. Strategies that maintain edge through this filter are genuinely interesting.
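One way to read the relaxed threshold is as a scaling of the in-sample thresholds, sketched below. This interpretation is an assumption: the 70% could instead apply to each strategy's own in-sample Sharpe. The type and method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Thresholds {
    pub min_sharpe: f64,
    pub min_trades: u32,
}

impl Thresholds {
    /// Assumed relaxation: 70% of the Sharpe threshold, half the trade count.
    pub fn relaxed_for_oos(self) -> Thresholds {
        Thresholds {
            min_sharpe: self.min_sharpe * 0.7,
            min_trades: self.min_trades / 2,
        }
    }

    /// A result passes when Sharpe exceeds the threshold and the trade
    /// count reaches the minimum.
    pub fn passes(&self, sharpe: f64, trades: u32) -> bool {
        sharpe > self.min_sharpe && trades >= self.min_trades
    }
}
```

With the default --min-sharpe of 1.0 and --min-trades of 10, this reading gives an out-of-sample bar of roughly 0.7 Sharpe and 5 trades.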