feat: gate candle backtests on 95% coverage with actionable diagnostics

Rejects backtest submissions where the requested date range has fewer than 95% of the expected candles, rather than silently queuing a run against sparse data. The 400 error includes actual vs expected counts, coverage percentage, and an ingestion status hint derived from the per-interval candle cursor (caught up / lagging / never ingested).

Also enriches GET /api/v1/market-candles/coverage with expected_count and coverage_pct fields so callers can pre-check readiness before submitting a backtest.

Documents the full incomplete-data workflow in docs/api.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 09:30:00 +02:00
parent ad6b38cb4e
commit 3d41574fab
3 changed files with 170 additions and 13 deletions
--- a/docs/api.md
+++ b/docs/api.md
@@ -47,10 +47,12 @@ The standard iteration loop for developing a profitable strategy:

 ```
 1. Create an ingest config          → historical trade data flows in via ingest-binance
-2. Backfill candles                 → aggregate trades into OHLCV bars at desired intervals
-3. Check data coverage              → confirm the date range you want to backtest is available
+2. Backfill candles                 → POST /api/v1/market-candles/backfill per interval
+3. Check data coverage              → GET /api/v1/market-candles/coverage/{exchange}/{symbol}
+                                       Verify coverage_pct ≥ 95% for your target date range
 4. Author a strategy                → POST /api/v1/strategies  (optional, but enables grouping)
 5. Submit a backtest                → POST /api/v1/paper-runs (mode: "backtest")
+                                       400 with coverage details if data is incomplete
 6. Poll until complete              → GET /api/v1/paper-runs/{id}
 7. Analyse result_summary           → trade stats, Sharpe ratio, win rate, etc.
 8. Download positions               → GET /api/v1/paper-runs/{id}/positions  (equity curve)
@@ -58,6 +60,30 @@ The standard iteration loop for developing a profitable strategy:
 10. Revise the strategy, repeat
 ```

+### Handling incomplete data
+
+The backtest submission endpoint enforces a **95% candle coverage** requirement. If fewer than 95%
+of the expected candles are present for the requested date range, the request is rejected with a
+`400 Bad Request` response explaining the shortfall and what to do:
+
+```json
+{
+  "error": "insufficient 1h candle data for BTCUSDT on binance_spot: 4380 of 8760 expected candles available (50.0% coverage, minimum 95%). Candle ingestion last reached 2025-07-01; it may still be catching up. Retry later or trigger a backfill via POST /api/v1/market-candles/backfill."
+}
+```
+
+The error includes an ingestion status hint derived from the per-interval cursor:
+
+| Hint | Meaning | Action |
+|---|---|---|
+| "Candle ingestion appears up to date" | Cursor is current; data is genuinely sparse | Run `POST /api/v1/market-candles/backfill` for the gap period |
+| "Candle ingestion last reached {date}" | Cursor lags behind; worker is catching up | Wait and retry, or run a targeted backfill |
+| "No candle ingestion cursor found" | Interval has never been ingested by the worker | Run `POST /api/v1/market-candles/backfill` to populate via Binance REST API |
+
+Pre-check coverage before submitting a backtest using `GET /api/v1/market-candles/coverage/{exchange}/{symbol}`.
+The response now includes `expected_count` and `coverage_pct` fields so you can verify readiness
+without incurring a failed backtest submission.
+
 ---

 ## Data Preparation
@@ -216,19 +242,31 @@ Check which candle intervals are available and their date ranges.
    "interval": "1h",
    "first_open": "2025-01-01T00:00:00Z",
    "last_close": "2026-01-01T00:00:00Z",
-    "count": 8760
+    "count": 8755,
+    "expected_count": 8760,
+    "coverage_pct": 99.94
  },
  {
    "interval": "4h",
    "first_open": "2025-01-01T00:00:00Z",
    "last_close": "2026-01-01T00:00:00Z",
-    "count": 2190
+    "count": 1800,
+    "expected_count": 2190,
+    "coverage_pct": 82.19
  }
 ]
 ```

-Use this before submitting a backtest to confirm data is available for your chosen interval and
-date range.
+| Field | Description |
+|---|---|
+| `count` | Actual candle rows stored in the database |
+| `expected_count` | Expected rows based on interval duration across the available range |
+| `coverage_pct` | `count / expected_count × 100`, capped at 100. Values below 95 indicate gaps. |
+
+Use this before submitting a backtest to confirm data is complete for your chosen interval and
+date range. The backtest endpoint requires ≥ 95% coverage of the specific
+`[starts_at, finishes_at]` window, whereas `coverage_pct` here spans the full available range,
+so a sub-range can pass when the overall figure is lower, or fail when it is higher.

 ---

@@ -520,10 +558,21 @@ Submit a new paper run (backtest or live).
 **Validation rules:**
 - `finishes_at` must be after `starts_at`
 - For `"backtest"`: `starts_at` must be in the past; data must exist for the instrument and interval; the range must fall within available data
+- For `"backtest"` with candles: the requested range must have **≥ 95% candle coverage** (actual count vs expected count derived from interval duration). Returns 400 with a diagnostic message and ingestion status hint if below threshold.
 - For `"live"`: `finishes_at` must be in the future; `candle_interval` must not be set
- For `RuleBased` strategies: all timeframes referenced by expressions must have available candle data
+- For `RuleBased` strategies: all timeframes referenced by expressions must have available candle data with ≥ 95% coverage
 - Raw-tick backtests are rejected if the date range contains more than 500,000,000 trades

+**Insufficient coverage response (400):**
+
+```json
+{
+  "error": "insufficient 1h candle data for BTCUSDT on binance_spot: 4380 of 8760 expected candles available (50.0% coverage, minimum 95%). Candle ingestion last reached 2025-07-01; it may still be catching up. Retry later or trigger a backfill via POST /api/v1/market-candles/backfill."
+}
+```
+
+See [Handling incomplete data](#handling-incomplete-data) for the interpretation guide.
+
 **Response (201):** `PaperRunResponse`

 ---
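
A client that wants to act on the 400 rejection programmatically could match on the three hint prefixes listed in the "Handling incomplete data" table. The following is a minimal sketch, assuming the hint wording stays as documented. `NextStep` and `classify_coverage_error` are hypothetical names and not part of the API, and matching on a human-readable message is brittle by nature.

```rust
/// What a client might do next after a coverage rejection (hypothetical type).
#[derive(Debug, PartialEq)]
enum NextStep {
    /// Cursor is current, so the data itself is sparse: backfill the gap period.
    BackfillGap,
    /// Cursor lags behind: wait and retry, or run a targeted backfill.
    WaitOrBackfill,
    /// Interval has never been ingested: run a backfill.
    BackfillInterval,
    /// Not a coverage rejection, or a hint this sketch does not recognise.
    Unknown,
}

/// Hypothetical helper: map the `error` string of a 400 response to a next
/// step by looking for the hint prefixes documented in docs/api.md.
fn classify_coverage_error(error: &str) -> NextStep {
    if !error.starts_with("insufficient ") {
        return NextStep::Unknown;
    }
    if error.contains("Candle ingestion appears up to date") {
        NextStep::BackfillGap
    } else if error.contains("Candle ingestion last reached ") {
        NextStep::WaitOrBackfill
    } else if error.contains("No candle ingestion cursor found") {
        NextStep::BackfillInterval
    } else {
        NextStep::Unknown
    }
}

fn main() {
    let error = "insufficient 1h candle data for BTCUSDT on binance_spot: \
                 4380 of 8760 expected candles available (50.0% coverage, minimum 95%). \
                 Candle ingestion last reached 2025-07-01; it may still be catching up.";
    println!("{:?}", classify_coverage_error(error));
}
```

If clients are expected to branch on the hint, a structured field in the error body would be more robust than string matching; that would be a separate API change.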
--- a/services/api/src/handlers/market_candles.rs
+++ b/services/api/src/handlers/market_candles.rs
@@ -127,7 +127,12 @@ pub struct CoverageEntry {
    pub interval: String,
    pub first_open: DateTime<Utc>,
    pub last_close: DateTime<Utc>,
+    /// Actual number of candles stored in the database for this interval.
    pub count: i64,
+    /// Expected number of candles for the available range (derived from interval duration).
+    pub expected_count: i64,
+    /// Coverage as a percentage (0–100). Values below 95 indicate gaps in the data.
+    pub coverage_pct: f64,
 }

 pub async fn get_candle_coverage(
@@ -144,11 +149,24 @@ pub async fn get_candle_coverage(

    let entries = rows
        .into_iter()
-        .map(|(interval, first_open, last_close, count)| CoverageEntry {
-            interval,
-            first_open,
-            last_close,
-            count,
+        .map(|(interval, first_open, last_close, count)| {
+            let range_secs = (last_close - first_open).num_seconds().max(0) as u64;
+            let interval_secs =
+                swym_dal::models::strategy_config::parse_interval_secs(&interval).unwrap_or(1);
+            let expected_count = (range_secs / interval_secs) as i64;
+            let coverage_pct = if expected_count > 0 {
+                (count as f64 / expected_count as f64 * 100.0).min(100.0)
+            } else {
+                100.0
+            };
+            CoverageEntry {
+                interval,
+                first_open,
+                last_close,
+                count,
+                expected_count,
+                coverage_pct,
+            }
        })
        .collect();

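
The arithmetic added to `get_candle_coverage` can be pulled out into a pure function, which makes it easy to check against the figures in the docs example. This is a sketch only: `coverage_stats` is a hypothetical name, and the zero-interval guard is an addition of the sketch that the handler does not have.

```rust
/// Hypothetical pure helper mirroring the closure in `get_candle_coverage`.
/// Returns `(expected_count, coverage_pct)`.
fn coverage_stats(count: i64, range_secs: u64, interval_secs: u64) -> (i64, f64) {
    // Assumption of this sketch: treat a zero-length interval as "nothing
    // expected" instead of dividing by zero.
    let expected_count = if interval_secs == 0 {
        0
    } else {
        (range_secs / interval_secs) as i64 // integer division, rounds down
    };
    let coverage_pct = if expected_count > 0 {
        // Capped at 100 so surplus rows never report more than full coverage.
        (count as f64 / expected_count as f64 * 100.0).min(100.0)
    } else {
        100.0 // nothing expected: reported as fully covered, as in the handler
    };
    (expected_count, coverage_pct)
}

fn main() {
    const YEAR_SECS: u64 = 365 * 24 * 60 * 60;
    let (expected, pct) = coverage_stats(8755, YEAR_SECS, 3_600);
    println!("1h: expected={expected} coverage={pct:.2}%");
    let (expected, pct) = coverage_stats(1800, YEAR_SECS, 4 * 3_600);
    println!("4h: expected={expected} coverage={pct:.2}%");
}
```

With the one-year range from the docs example this yields 8760 expected candles at 99.94% for `1h` and 2190 at 82.19% for `4h`.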
--- a/services/api/src/handlers/paper_runs.rs
+++ b/services/api/src/handlers/paper_runs.rs
@@ -14,7 +14,7 @@ use swym_dal::models::paper_run::{PaperRunRow, PaperRunStatus};
 use swym_dal::models::paper_run_position::PaperRunPositionRow;
 use swym_dal::models::strategy_config::{StrategyConfig, collect_timeframes};
 use swym_dal::models::condition_audit::ConditionAuditRow;
-use swym_dal::repo::{condition_audit, instrument, market_event, paper_run, paper_run_position, strategy};
+use swym_dal::repo::{condition_audit, ingest_config, instrument, market_event, paper_run, paper_run_position, strategy};
 use swym_dal::strategy_hash::{compute_strategy_hash, normalize_strategy};

 // -- Request / Response types --
@@ -224,6 +224,16 @@ pub async fn create_paper_run(
                    )));
                }

+                validate_candle_completeness(
+                    &state.pool,
+                    instrument.id,
+                    &format!("{name_exchange} on {exchange_name}"),
+                    interval,
+                    req.starts_at,
+                    req.finishes_at,
+                )
+                .await?;
+
                // For rule-based strategies, also validate every additional timeframe
                // referenced by expressions in the rule tree.
                if let StrategyConfig::RuleBased(ref params) = run_config.strategy {
@@ -267,6 +277,16 @@ pub async fn create_paper_run(
                                data_end = tf_range.1,
                            )));
                        }
+
+                        validate_candle_completeness(
+                            &state.pool,
+                            instrument.id,
+                            &format!("{name_exchange} on {exchange_name}"),
+                            tf,
+                            req.starts_at,
+                            req.finishes_at,
+                        )
+                        .await?;
                    }
                }
            } else {
@@ -589,3 +609,73 @@ pub async fn list_paper_run_candles(
        candles,
    }))
 }
+
+// ---------------------------------------------------------------------------
+// Candle completeness validation
+// ---------------------------------------------------------------------------
+
+const MIN_CANDLE_COVERAGE: f64 = 0.95;
+
+/// Validate that candle coverage for `[from, to)` meets the minimum threshold.
+///
+/// Computes the expected candle count from the interval duration and compares
+/// it to the actual count in the database. Returns `Err(BadRequest)` with a
+/// diagnostic message (including an ingestion status hint) when coverage is
+/// below [`MIN_CANDLE_COVERAGE`].
+async fn validate_candle_completeness(
+    pool: &sqlx::PgPool,
+    instrument_id: i32,
+    instrument_label: &str,
+    interval: &str,
+    from: DateTime<Utc>,
+    to: DateTime<Utc>,
+) -> Result<(), ApiError> {
+    use swym_dal::models::strategy_config::parse_interval_secs;
+
+    let interval_secs = parse_interval_secs(interval)
+        .expect("interval already validated before this call");
+    let range_secs = (to - from).num_seconds().max(0) as u64;
+    let expected = (range_secs / interval_secs) as i64;
+
+    if expected == 0 {
+        return Ok(());
+    }
+
+    let actual = market_event::count_candles(pool, instrument_id, interval, from, to).await?;
+    let coverage = actual as f64 / expected as f64;
+
+    if coverage >= MIN_CANDLE_COVERAGE {
+        return Ok(());
+    }
+
+    // Build an ingestion status hint from the candle cursor.
+    let cursor = ingest_config::get_candle_cursor(pool, instrument_id, interval).await?;
+    let ingestion_hint = match cursor {
+        Some(date) => {
+            let yesterday = (Utc::now() - chrono::Duration::days(1)).date_naive();
+            if date >= yesterday {
+                "Candle ingestion appears up to date; the data may be genuinely sparse \
+                 for this period."
+                    .to_string()
+            } else {
+                format!(
+                    "Candle ingestion last reached {date}; it may still be catching up. \
+                     Retry later or trigger a backfill via POST /api/v1/market-candles/backfill."
+                )
+            }
+        }
+        None => {
+            "No candle ingestion cursor found for this interval. \
+             Trigger a backfill via POST /api/v1/market-candles/backfill."
+                .to_string()
+        }
+    };
+
+    Err(ApiError::BadRequest(format!(
+        "insufficient {interval} candle data for {instrument_label}: \
+         {actual} of {expected} expected candles available \
+         ({pct:.1}% coverage, minimum {min:.0}%). {ingestion_hint}",
+        pct = coverage * 100.0,
+        min = MIN_CANDLE_COVERAGE * 100.0,
+    )))
+}
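
The threshold decision in `validate_candle_completeness` can also be exercised without a database. The sketch below assumes the caller already has the actual and expected counts; `check_coverage` is a hypothetical name, it returns a plain `String` instead of `ApiError`, and it omits the ingestion hint because that needs the cursor lookup.

```rust
const MIN_CANDLE_COVERAGE: f64 = 0.95;

/// Hypothetical pure counterpart of `validate_candle_completeness`: the caller
/// supplies the counts, so no pool or cursor is involved.
fn check_coverage(interval: &str, label: &str, actual: i64, expected: i64) -> Result<(), String> {
    // A range shorter than one interval expects no candles and is accepted.
    if expected == 0 {
        return Ok(());
    }
    let coverage = actual as f64 / expected as f64;
    if coverage >= MIN_CANDLE_COVERAGE {
        return Ok(());
    }
    Err(format!(
        "insufficient {interval} candle data for {label}: \
         {actual} of {expected} expected candles available \
         ({pct:.1}% coverage, minimum {min:.0}%).",
        pct = coverage * 100.0,
        min = MIN_CANDLE_COVERAGE * 100.0,
    ))
}

fn main() {
    // Half of a year's hourly candles present: rejected.
    println!("{:?}", check_coverage("1h", "BTCUSDT on binance_spot", 4380, 8760));
    // Exactly 95% present: accepted.
    println!("{:?}", check_coverage("1h", "BTCUSDT on binance_spot", 8322, 8760));
}
```

The comparison is inclusive, so a range with exactly 95% of its expected candles passes the gate.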