swym/scout

Author	SHA1	Message	Date
rob thijssen	ee260ea4d5	fix: parse flat result_summary structure per updated API doc The API result_summary is a flat object with top-level fields (total_positions, win_rate, profit_factor, net_pnl, sharpe_ratio, etc.) not a nested backtest_metadata/instruments map. This was causing all metrics to parse as None/zero for every completed run. - Rewrite BacktestResult::from_response() to read flat fields directly - Replace parse_ratio_value/parse_decimal_str with a single parse_number() that accepts both JSON numbers and decimal strings - Populate winning_positions, losing_positions, total_fees, avg_bars_in_trade (previously always None) - Simplify from_response signature — exchange/base/quote no longer needed - Add expected_count and coverage_pct to CandleCoverage struct - Update all example sell rules to use position_quantity instead of "0.01" - Note that "9999" is a valid sell-all alias (auto-capped by the API) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:37:55 +02:00
rob thijssen	3f8d4de7fb	feat: add declarative SizingMethod types from upstream schema Upstream added three new quantity sizing objects alongside DecimalString and Expr: - fixed_sum: buy N quote-currency worth at current price - percent_of_balance: buy N% of named asset's free balance - fixed_units: buy exactly N base units (semantic alias for decimal string) Update dsl-schema.json to include the three definitions and expand Action.quantity.oneOf to reference all five valid forms. Update prompts.rs Quantity section to present the declarative methods as the preferred approach — they're cleaner, more readable, and instrument-agnostic compared to raw Expr composition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:33:43 +02:00
rob thijssen	7e1ff51ae0	feat: validate endpoint integration, Expr quantity sizing, apply_func input field fix - Add /api/v1/strategies/validate client to SwymClient; wire into agent loop before submission so all DSL errors are surfaced in one round-trip - Update dsl-schema.json to upstream: quantity is now oneOf[DecimalString, Expr], ExprApplyFunc uses "input" field (renamed from "expr") - Update prompts: document expression-based quantity sizing (fixed-fraction and ATR-based examples), fix apply_func to use "input" not "expr" throughout - Remove unused ValidationError import Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 09:12:12 +02:00
rob thijssen	5146b3f764	fix: replace negligible 0.001 quantity with meaningful sizing guidance The previous example quantity "0.001" represented <1% of the $10k initial balance for BTC and near-zero exposure for ETH/SOL, making P&L and Sharpe results statistically meaningless. - Update Quantity section with instrument-appropriate reference values (BTC: 0.01 ≈ $800, ETH: 3.0 ≈ $600, SOL: 50.0 ≈ $700) - Replace "0.001" with "0.01" in all four working examples - Explain that 5–10% of $10k initial balance is the sizing target - Explicitly warn against "0.001" as it produces negligible exposure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 07:41:28 +02:00
rob thijssen	759439313e	fix: two Bollinger Band DSL errors from 50-iteration run - bollinger_upper/lower func Exprs must NOT include a "field" parameter; they compute from close internally. Setting "field":"bollinger_upper" causes API rejection: expected one of open/high/low/close/volume. - bollinger Condition "band" only accepts "above_upper" or "below_lower"; "above_lower" and "below_upper" are invalid variants. Both errors appeared repeatedly across the 50-iteration run, causing failed backtest submissions on every Bollinger crossover strategy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 07:39:09 +02:00
rob thijssen	9a7761b452	fix: add hma/ma to unsupported list, clarify quantity exit semantics - Add `hma` (Hull MA) and generic `ma` to unsupported func names — both were used by R1 and rejected by the API - Note that Hull MA can be approximated via apply_func with wma - Add `"all"` to the quantity placeholder blacklist; explain that exit rules must repeat the entry decimal — there is no "close all" concept Observed in run 2026-03-09T20:10:55: 2 iterations failed on hma/ma, 3 iterations skipped by client-side validation on quantity="all". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 20:23:30 +02:00
rob thijssen	8d53d6383d	fix: correct DSL mistakes from observed R1 failures - ADX: clarify it is a FuncName inside {"kind":"func","name":"adx",...}, not a Condition kind — with inline usage example (ADX > 25 filter) - Expr "kind" field: add explicit note that every Expr object requires "kind"; {"field":"close"} without "kind" is rejected by the API - MACD: add Example 4 showing full crossover strategy composed from bin_op(sub, ema12, ema26) and apply_func(ema,9) as signal line All three mistakes were observed across consecutive R1-32B runs and caused repeated API submission failures. Each prompt addition follows the same pattern as the successful bollinger_upper fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 20:11:05 +02:00
rob thijssen	55e41b6795	fix: log R1 thinking, catch repeated DSL errors, add unsupported indicators Three improvements from the 2026-03-09T18:45:04 run analysis: R1 thinking visibility (claude.rs, agent.rs) extract_think_content() returns the raw <think> block content before it is stripped. agent.rs logs it at DEBUG level so 'RUST_LOG=debug' lets you see why the model keeps repeating a mistake — currently the think block is silently discarded after stripping. Prompt: unsupported indicators and bollinger_upper Expr mistake (prompts.rs) - bollinger_upper / bollinger_lower used as {"kind":"bollinger_upper",...} was the dominant failure in iters 9-15. Added explicit correction: use {"kind":"func","name":"bollinger_upper","period":20} in Expr context, never as a standalone kind. - roc, hma, vwap, macd, cci, stoch are NOT in the swym schema. Added a clear "NOT supported" list alongside the supported func names. Repeated API error detection in diagnose_history (agent.rs) If the same "unknown variant `X`" error appears 2+ times in the last 4 iterations, a targeted diagnosis note is emitted naming the bad variant and pointing to the DSL reference. This surfaces in the next iteration prompt so the model gets actionable feedback before it wastes another backtest budget on the same mistake. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:58:50 +02:00
rob thijssen	51e452b607	feat: discover max_output_tokens from server at startup Instead of hardcoding per-family token budgets, ClaudeClient queries the server at startup and sets max_output_tokens = context_length / 2. Two discovery strategies, tried in order: 1. LM Studio /api/v1/models — returns loaded_instances[].config.context_length (the actually-configured context, e.g. 64000) and max_context_length (theoretical max, e.g. 131072). We prefer the loaded value. 2. OpenAI-compat /v1/models/{id} — used as fallback for non-LM Studio backends that expose context_length on the model object. If both fail, the family default is kept (DeepSeekR1=32768, Generic=8192). lmstudio_context_length() matches model IDs with and without quantization suffixes (@q4_k_m etc.) so the --model flag doesn't need to be exact. For the current R1-32B setup: loaded context=64000 → max_output_tokens=32000, giving the thinking pass plenty of room while reserving half for input. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:44:41 +02:00
rob thijssen	89f7ba66e0	feat: model-family-aware token budgets and prompt style Add ModelFamily enum (config.rs) detected from the model name: - DeepSeekR1: matched on "deepseek-r1", "r1-distill" — R1 thinking blocks consume thousands of output tokens before the JSON; max_output_tokens raised to 32768 and HTTP timeout to 300s; prompt tells the model its <think> output is stripped and only the bare JSON is used - Generic: previous behaviour (8192 tokens, 120s timeout) ClaudeClient stores the detected family and uses it for max_tokens and the request timeout. family() accessor lets the caller (agent.rs) pass it into system_prompt(). prompts::system_prompt() now accepts &ModelFamily and injects a family-specific "output format" section in place of the hardcoded "How to respond" block. New families can be added by extending the enum and the match arms without touching prompt logic elsewhere. Also: log full anyhow cause chain (:#) on JSON extraction failure and show response length alongside the truncated preview, to make future diagnosis easier. Root cause of the 2026-03-09T18:29:22 run failure: R1's thinking tokens counted against max_tokens:8192, leaving only ~500 chars for the actual JSON, which was always truncated mid-object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:39:51 +02:00
rob thijssen	6f4f864d28	fix: increase max_tokens to 8192 for R1 reasoning overhead R1 models use 500-2000 tokens for <think> blocks before the final response. 4096 was too tight — the model would exhaust the budget mid-thought and never emit the JSON. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:17:48 +02:00
rob thijssen	185cb4586e	fix: strip R1 think blocks before JSON extraction DeepSeek-R1 models emit <think>...</think> before their actual response. The brace-counting extractor would grab the first { inside the thinking block (which contains partial JSON fragments) rather than the final strategy JSON. strip_think_blocks() removes all <think>...</think> sections including unterminated blocks (truncated responses), leaving only the final output for extract_json to process. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 18:17:06 +02:00
rob thijssen	b947f48b01	feat: client-side validation, cycling detection, quantity prompt fix - validate_strategy(): hard error if quantity is not a parseable decimal (catches "ATR_SIZED" etc. before sending to swym API); soft warning if a sell rule has no entry_price stop-loss or no bars_since_entry time exit - Hard validation errors skip the backtest and feed errors back to the LLM via IterationRecord.validation_notes included in summary() - json_contains_kind(): recursive helper to search strategy JSON tree - diagnose_history(): add cycling detection — triggers is_converged when any avg_sharpe value appears 3+ times in history (not just last 3 streak), catching the alternating RSI<30 / RSI<25 pattern seen in the latest run - prompts: clarify that quantity must parse as a float; list invalid placeholder strings ("ATR_SIZED", "FULL_BALANCE", "dynamic", etc.) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 17:56:59 +02:00
rob thijssen	e27aabae34	feat(agent): improve LLM feedback loop and convergence detection Three related improvements to help the model learn and explore effectively: Strategy JSON in history: include the compact strategy JSON in each IterationRecord::summary() so the LLM knows exactly what was tested in every past iteration, not just the outcome metrics. Without this the model had no record of what it tried once conversation history was trimmed. Rule comment in audit: include rule_comment from the condition audit in the formatted audit string so the LLM can correlate hit-rate data with the rule's stated purpose. Convergence detection and anti-anchoring: diagnose_history() now returns (String, bool) where the bool signals that the last 3 iterations had avg_sharpe spread < 0.03 (model stuck in local optimum). When converged: - Emit a ⚠ CONVERGENCE DETECTED note listing untried candle intervals - Suppress best_so_far JSON to break the anchoring effect that was causing the model to produce near-identical strategies for 13+ iterations - Targeted "try a different approach" instruction Also add volume-as-field clarification to the DSL mistakes section in the system prompt, fixing the "unknown variant `volume`" submit error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 14:38:07 +02:00
rob thijssen	fb1145acae	fix(swym): parse result_summary from actual API response structure The swym API response structure differs from what the code previously assumed. Fix all field extraction to match the real shape: - total_positions: backtest_metadata.position_count (not top-level) - sharpe_ratio, win_rate, profit_factor: instruments.{key}.{field}.value wrapped decimal strings (not plain floats); treat Decimal::MAX sentinel (~7.9e28) as None - net_pnl: instruments.{key}.pnl (plain decimal string) - instrument key derived as "{exchange_no_underscores}-{base}_{quote}" Also fix coverage-based backtest_from clamping: after the coverage check, compute the effective backtest start as the max first_open across all instruments × common intervals, so strategies never fail with "requested range outside available data". Log per-interval date ranges for each instrument at startup. Additionally: - Compact format_audit_summary to handle {"rules":[...],"total_bars":N} structure with per-condition true_count/evaluated breakdown - Drop avg_bars from summary_line (field absent from API) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 14:22:29 +02:00
rob thijssen	c7a2d65539	fix(prompts): forbid dynamic quantity expressions, require plain decimal string The model was generating Expr objects for quantity (e.g. ATR-based sizing), causing consistent QuantitySpec deserialization failures. Replace the "prefer dynamic sizing" hint with an explicit rule: quantity must always be a fixed decimal string like "0.001". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 13:11:40 +02:00
rob thijssen	292c101859	docs(prompts): add DSL expression kind reference and three working examples Shows correct usage of rsi/bollinger/ema_trend condition shortcuts, entry_price and bars_since_entry ExprKind values, and func/cross_over/bin_op expressions. Also calls out common model mistakes (rsi as ExprKind, bars_since_entry as FuncName, expr_field) and adds a note that spot strategies are long-only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 13:09:01 +02:00
rob thijssen	fc9b7e094a	feat(agent): add strategy quality introspection Log full strategy JSON at debug level, show full anyhow cause chain on submit failures, surface condition_audit_summary for 0-trade results in both logs and the summary fed back to the AI each iteration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 12:58:49 +02:00
rob thijssen	deb28f6714	chore: local defaults	2026-03-09 12:24:30 +02:00
rob thijssen	b7aa458e40	feat(claude): add configurable API base URL via --anthropic-url Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 10:28:44 +02:00
rob thijssen	934566879e	chore: init	2026-03-09 10:15:33 +02:00
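
Commit 185cb4586e describes stripping DeepSeek-R1 <think>...</think> sections, including unterminated ones from truncated responses, before JSON extraction. The log shows only the name strip_think_blocks(), so the signature and body below are assumptions: a minimal sketch of how such a helper might look, not the repository's actual code.

```rust
/// Hypothetical sketch of a think-block stripper (the signature is assumed;
/// only the function name appears in the commit log).
/// Removes every `<think>...</think>` section. If an opening tag has no
/// closing tag (a truncated response), everything after it is dropped.
fn strip_think_blocks(input: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(input.len());
    let mut rest = input;

    while let Some(start) = rest.find(OPEN) {
        // Keep the text that precedes the thinking block.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            // Terminated block: continue scanning after the closing tag.
            Some(end) => rest = &after_open[end + CLOSE.len()..],
            // Unterminated block: discard the remainder of the response.
            None => {
                rest = "";
                break;
            }
        }
    }

    out.push_str(rest);
    out.trim().to_string()
}
```

With a helper like this, a brace-counting extractor only ever sees the final output, so partial JSON fragments inside the thinking block can no longer be mistaken for the strategy JSON.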

21 Commits
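
Commits 89f7ba66e0 and 51e452b607 together describe a token-budget rule: detect a model family from the model name, use context_length / 2 when the server reports a context length, and otherwise fall back to the family default. The sketch below restates that rule under stated assumptions. The enum name ModelFamily and the numeric defaults come from the commit messages; the method names, the free function, and the case-insensitive matching are hypothetical.

```rust
use std::time::Duration;

/// Model families named in the commit log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelFamily {
    DeepSeekR1,
    Generic,
}

impl ModelFamily {
    /// Hypothetical detector. The commit says the family is matched on
    /// "deepseek-r1" and "r1-distill"; lowercasing first is an assumption.
    fn detect(model: &str) -> Self {
        let name = model.to_ascii_lowercase();
        if name.contains("deepseek-r1") || name.contains("r1-distill") {
            ModelFamily::DeepSeekR1
        } else {
            ModelFamily::Generic
        }
    }

    /// Family defaults quoted in the commit messages.
    fn default_max_output_tokens(self) -> u32 {
        match self {
            ModelFamily::DeepSeekR1 => 32_768,
            ModelFamily::Generic => 8_192,
        }
    }

    /// HTTP timeouts quoted in the commit messages (300s and 120s).
    fn request_timeout(self) -> Duration {
        match self {
            ModelFamily::DeepSeekR1 => Duration::from_secs(300),
            ModelFamily::Generic => Duration::from_secs(120),
        }
    }
}

/// Hypothetical helper: half of the discovered context length if the server
/// reported one, otherwise the family default.
fn max_output_tokens(family: ModelFamily, discovered_context: Option<u32>) -> u32 {
    match discovered_context {
        Some(context_length) => context_length / 2,
        None => family.default_max_output_tokens(),
    }
}
```

For the R1-32B setup mentioned in the log, a loaded context of 64000 yields 32000 output tokens under this rule, which leaves the other half of the context for input.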