feat: gate candle backtests on 95% coverage with actionable diagnostics

Rejects backtest submissions where the requested date range has fewer
than 95% of the expected candles, rather than silently queuing a run
against sparse data. The 400 error includes actual vs expected counts,
coverage percentage, and an ingestion status hint derived from the
per-interval candle cursor (caught up / lagging / never ingested).

Also enriches GET /api/v1/market-candles/coverage with expected_count
and coverage_pct fields so callers can pre-check readiness before
submitting a backtest. Documents the full incomplete-data workflow in
docs/api.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 09:30:00 +02:00
parent ad6b38cb4e
commit 3d41574fab
3 changed files with 170 additions and 13 deletions


@@ -47,10 +47,12 @@ The standard iteration loop for developing a profitable strategy:
```
1. Create an ingest config → historical trade data flows in via ingest-binance
2. Backfill candles → aggregate trades into OHLCV bars at desired intervals
3. Check data coverage → confirm the date range you want to backtest is available
2. Backfill candles → POST /api/v1/market-candles/backfill per interval
3. Check data coverage → GET /api/v1/market-candles/coverage/{exchange}/{symbol}
Verify coverage_pct ≥ 95% for your target date range
4. Author a strategy → POST /api/v1/strategies (optional, but enables grouping)
5. Submit a backtest → POST /api/v1/paper-runs (mode: "backtest")
400 with coverage details if data is incomplete
6. Poll until complete → GET /api/v1/paper-runs/{id}
7. Analyse result_summary → trade stats, Sharpe ratio, win rate, etc.
8. Download positions → GET /api/v1/paper-runs/{id}/positions (equity curve)
@@ -58,6 +60,30 @@ The standard iteration loop for developing a profitable strategy:
10. Revise the strategy, repeat
```
### Handling incomplete data
The backtest submission endpoint enforces a **95% candle coverage** requirement. If fewer than 95%
of the expected candles are present for the requested date range, the request is rejected with a
`400 Bad Request` response explaining the shortfall and what to do:
```json
{
"error": "insufficient 1h candle data for BTCUSDT on binance_spot: 4380 of 8760 expected candles available (50.0% coverage, minimum 95%). Candle ingestion last reached 2025-07-01; it may still be catching up. Retry later or trigger a backfill via POST /api/v1/market-candles/backfill."
}
```
The error includes an ingestion status hint derived from the per-interval cursor:
| Hint | Meaning | Action |
|---|---|---|
| "Candle ingestion appears up to date" | Cursor is current; data is genuinely sparse | Run `POST /api/v1/market-candles/backfill` for the gap period |
| "Candle ingestion last reached {date}" | Cursor lags behind; worker is catching up | Wait and retry, or run a targeted backfill |
| "No candle ingestion cursor found" | Interval has never been ingested by the worker | Run `POST /api/v1/market-candles/backfill` to populate via Binance REST API |
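A client can turn the hint table above into a decision by matching on the hint text. The following is a minimal sketch, not part of this commit: the `NextStep` enum and `next_step` function are hypothetical names, and it assumes the hint wording stays exactly as documented here. Matching on message text is brittle, so treat it as a convenience only.

```rust
/// Suggested follow-up after a 400 "insufficient candle data" response.
/// Hypothetical client-side type; not defined by the API.
#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    /// Cursor is current, so the gap will not fill on its own.
    BackfillGap,
    /// Cursor lags behind; the worker may still be catching up.
    WaitOrBackfill,
    /// The interval has never been ingested.
    BackfillFromScratch,
    /// The message carried no recognised hint.
    Unknown,
}

/// Map the `error` string of the 400 response to a suggested action
/// by looking for the three documented hint prefixes.
fn next_step(error_message: &str) -> NextStep {
    if error_message.contains("Candle ingestion appears up to date") {
        NextStep::BackfillGap
    } else if error_message.contains("Candle ingestion last reached") {
        NextStep::WaitOrBackfill
    } else if error_message.contains("No candle ingestion cursor found") {
        NextStep::BackfillFromScratch
    } else {
        NextStep::Unknown
    }
}
```

For the example response shown earlier, `next_step` would return `NextStep::WaitOrBackfill`.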
Pre-check coverage before submitting a backtest using `GET /api/v1/market-candles/coverage/{exchange}/{symbol}`.
The response now includes `expected_count` and `coverage_pct` fields so you can verify readiness
without incurring a failed backtest submission.
---
## Data Preparation
@@ -216,19 +242,31 @@ Check which candle intervals are available and their date ranges.
"interval": "1h",
"first_open": "2025-01-01T00:00:00Z",
"last_close": "2026-01-01T00:00:00Z",
"count": 8760
"count": 8755,
"expected_count": 8760,
"coverage_pct": 99.94
},
{
"interval": "4h",
"first_open": "2025-01-01T00:00:00Z",
"last_close": "2026-01-01T00:00:00Z",
"count": 2190
"count": 1800,
"expected_count": 2190,
"coverage_pct": 82.19
}
]
```
Use this before submitting a backtest to confirm data is available for your chosen interval and
date range.
| Field | Description |
|---|---|
| `count` | Actual candle rows stored in the database |
| `expected_count` | Expected rows based on interval duration across the available range |
| `coverage_pct` | `count / expected_count × 100`, capped at 100. Values below 95 indicate gaps. |
Use this before submitting a backtest to confirm data is complete for your chosen interval and
date range. The backtest endpoint requires `coverage_pct ≥ 95` for the specific
`[starts_at, finishes_at)` window, whereas `coverage_pct` here is computed over the full available
range. A sub-range may therefore be complete even if the overall coverage is lower, and an overall
value of 95 or more does not guarantee that every sub-range passes.
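The difference between full-range and per-window coverage can be illustrated with a small sketch. It is not the server's implementation: `window_coverage_pct` and `sample_opens` are hypothetical helpers, and candle open times are assumed to be Unix seconds. The window is treated as half-open and the result is capped at 100, mirroring the description of `coverage_pct` above.

```rust
/// Coverage percentage for the half-open window `[from, to)`, capped at 100.
/// `opens` holds candle open times as Unix seconds (an assumption of this sketch).
fn window_coverage_pct(opens: &[u64], from: u64, to: u64, interval_secs: u64) -> f64 {
    // Expected candles = window length / interval length (integer division).
    // A zero interval yields zero expected candles instead of panicking.
    let expected = to
        .saturating_sub(from)
        .checked_div(interval_secs)
        .unwrap_or(0);
    if expected == 0 {
        // Nothing is expected, so nothing is missing.
        return 100.0;
    }
    let actual = opens.iter().filter(|&&t| t >= from && t < to).count();
    (actual as f64 / expected as f64 * 100.0).min(100.0)
}

/// Hourly candles for hours 0..100 with hours 50..80 missing.
fn sample_opens() -> Vec<u64> {
    (0..100u64)
        .filter(|h| !(50..80).contains(h))
        .map(|h| h * 3600)
        .collect()
}
```

With `sample_opens()`, the full 100-hour range has about 70% coverage and would fail the 95% gate, while the window covering only the first 50 hours has 100% coverage and would pass.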
---
@@ -520,10 +558,21 @@ Submit a new paper run (backtest or live).
**Validation rules:**
- `finishes_at` must be after `starts_at`
- For `"backtest"`: `starts_at` must be in the past; data must exist for the instrument and interval; the range must fall within available data
- For `"backtest"` with candles: the requested range must have **≥ 95% candle coverage** (actual count vs expected count derived from interval duration). Returns 400 with a diagnostic message and ingestion status hint if below threshold.
- For `"live"`: `finishes_at` must be in the future; `candle_interval` must not be set
- For `RuleBased` strategies: all timeframes referenced by expressions must have available candle data
- For `RuleBased` strategies: all timeframes referenced by expressions must have available candle data with ≥ 95% coverage
- Raw-tick backtests are rejected if the date range contains more than 500,000,000 trades
**Insufficient coverage response (400):**
```json
{
"error": "insufficient 1h candle data for BTCUSDT on binance_spot: 4380 of 8760 expected candles available (50.0% coverage, minimum 95%). Candle ingestion last reached 2025-07-01; it may still be catching up. Retry later or trigger a backfill via POST /api/v1/market-candles/backfill."
}
```
See [Handling incomplete data](#handling-incomplete-data) for the interpretation guide.
**Response (201):** `PaperRunResponse`
---


@@ -127,7 +127,12 @@ pub struct CoverageEntry {
pub interval: String,
pub first_open: DateTime<Utc>,
pub last_close: DateTime<Utc>,
/// Actual number of candles stored in the database for this interval.
pub count: i64,
/// Expected number of candles for the available range (derived from interval duration).
pub expected_count: i64,
/// Coverage as a percentage (0–100). Values below 95 indicate gaps in the data.
pub coverage_pct: f64,
}
pub async fn get_candle_coverage(
@@ -144,11 +149,24 @@ pub async fn get_candle_coverage(
let entries = rows
.into_iter()
.map(|(interval, first_open, last_close, count)| CoverageEntry {
interval,
first_open,
last_close,
count,
.map(|(interval, first_open, last_close, count)| {
let range_secs = (last_close - first_open).num_seconds().max(0) as u64;
let interval_secs =
swym_dal::models::strategy_config::parse_interval_secs(&interval).unwrap_or(1);
let expected_count = (range_secs / interval_secs) as i64;
let coverage_pct = if expected_count > 0 {
(count as f64 / expected_count as f64 * 100.0).min(100.0)
} else {
100.0
};
CoverageEntry {
interval,
first_open,
last_close,
count,
expected_count,
coverage_pct,
}
})
.collect();


@@ -14,7 +14,7 @@ use swym_dal::models::paper_run::{PaperRunRow, PaperRunStatus};
use swym_dal::models::paper_run_position::PaperRunPositionRow;
use swym_dal::models::strategy_config::{StrategyConfig, collect_timeframes};
use swym_dal::models::condition_audit::ConditionAuditRow;
use swym_dal::repo::{condition_audit, instrument, market_event, paper_run, paper_run_position, strategy};
use swym_dal::repo::{condition_audit, ingest_config, instrument, market_event, paper_run, paper_run_position, strategy};
use swym_dal::strategy_hash::{compute_strategy_hash, normalize_strategy};
// -- Request / Response types --
@@ -224,6 +224,16 @@ pub async fn create_paper_run(
)));
}
validate_candle_completeness(
&state.pool,
instrument.id,
&format!("{name_exchange} on {exchange_name}"),
interval,
req.starts_at,
req.finishes_at,
)
.await?;
// For rule-based strategies, also validate every additional timeframe
// referenced by expressions in the rule tree.
if let StrategyConfig::RuleBased(ref params) = run_config.strategy {
@@ -267,6 +277,16 @@ pub async fn create_paper_run(
data_end = tf_range.1,
)));
}
validate_candle_completeness(
&state.pool,
instrument.id,
&format!("{name_exchange} on {exchange_name}"),
tf,
req.starts_at,
req.finishes_at,
)
.await?;
}
}
} else {
@@ -589,3 +609,73 @@ pub async fn list_paper_run_candles(
candles,
}))
}
// ---------------------------------------------------------------------------
// Candle completeness validation
// ---------------------------------------------------------------------------
const MIN_CANDLE_COVERAGE: f64 = 0.95;
/// Validate that candle coverage for `[from, to)` meets the minimum threshold.
///
/// Computes the expected candle count from the interval duration and compares
/// it to the actual count in the database. Returns `Err(BadRequest)` with a
/// diagnostic message (including an ingestion status hint) when coverage is
/// below [`MIN_CANDLE_COVERAGE`].
async fn validate_candle_completeness(
pool: &sqlx::PgPool,
instrument_id: i32,
instrument_label: &str,
interval: &str,
from: DateTime<Utc>,
to: DateTime<Utc>,
) -> Result<(), ApiError> {
use swym_dal::models::strategy_config::parse_interval_secs;
let interval_secs = parse_interval_secs(interval)
.expect("interval already validated before this call");
let range_secs = (to - from).num_seconds().max(0) as u64;
let expected = (range_secs / interval_secs) as i64;
if expected == 0 {
return Ok(());
}
let actual = market_event::count_candles(pool, instrument_id, interval, from, to).await?;
let coverage = actual as f64 / expected as f64;
if coverage >= MIN_CANDLE_COVERAGE {
return Ok(());
}
// Build an ingestion status hint from the candle cursor.
let cursor = ingest_config::get_candle_cursor(pool, instrument_id, interval).await?;
let ingestion_hint = match cursor {
Some(date) => {
let yesterday = (Utc::now() - chrono::Duration::days(1)).date_naive();
if date >= yesterday {
"Candle ingestion appears up to date; the data may be genuinely sparse \
for this period."
.to_string()
} else {
format!(
"Candle ingestion last reached {date}; it may still be catching up. \
Retry later or trigger a backfill via POST /api/v1/market-candles/backfill."
)
}
}
None => {
"No candle ingestion cursor found for this interval. \
Trigger a backfill via POST /api/v1/market-candles/backfill."
.to_string()
}
};
Err(ApiError::BadRequest(format!(
"insufficient {interval} candle data for {instrument_label}: \
{actual} of {expected} expected candles available \
({pct:.1}% coverage, minimum {min:.0}%). {ingestion_hint}",
pct = coverage * 100.0,
min = MIN_CANDLE_COVERAGE * 100.0,
)))
}
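The arithmetic behind the gate can be checked in isolation. The sketch below is a standalone approximation using only the standard library: `interval_secs` is a hypothetical stand-in for `parse_interval_secs` (whose real signature and supported units are not shown in this diff), and `check_coverage` takes the actual count as an argument instead of querying the database.

```rust
const MIN_COVERAGE: f64 = 0.95;

/// Hypothetical stand-in for `parse_interval_secs`.
/// Supports only "<n>m", "<n>h" and "<n>d"; the real parser may differ.
fn interval_secs(interval: &str) -> Option<u64> {
    let unit = interval.chars().last()?;
    let digits = &interval[..interval.len() - unit.len_utf8()];
    let n: u64 = digits.parse().ok()?;
    let unit_secs = match unit {
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        _ => return None,
    };
    // Reject overflow and zero-length intervals.
    n.checked_mul(unit_secs).filter(|&s| s > 0)
}

/// Compare `actual` against the candle count expected for `range_secs`.
fn check_coverage(interval: &str, range_secs: u64, actual: u64) -> Result<(), String> {
    let step = interval_secs(interval)
        .ok_or_else(|| format!("unknown interval {interval}"))?;
    let expected = range_secs / step;
    if expected == 0 {
        // Window shorter than one candle: nothing to validate.
        return Ok(());
    }
    let coverage = actual as f64 / expected as f64;
    if coverage >= MIN_COVERAGE {
        return Ok(());
    }
    Err(format!(
        "{actual} of {expected} expected candles available \
         ({pct:.1}% coverage, minimum {min:.0}%)",
        pct = coverage * 100.0,
        min = MIN_COVERAGE * 100.0,
    ))
}
```

For a 365-day window of `1h` candles, `range_secs` is 31,536,000 and the expected count is 8760, so 4380 stored candles give the 50.0% figure shown in the documented error, while 8755 candles pass.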