bench: reproducible benchmark harness + published numbers #22
There are zero benchmarks in the tree. The 7/10 rating rests on unproven performance; this issue either proves it or tells us exactly what to fix.
Why: the single biggest credibility artifact. "Near-frontier AI for mortals" needs a table an outsider can verify.
Deliverables:
doc/benchmarks.md; headline numbers surfaced in README
Dependencies: #21 (token-level metrics) for engine-truth measurement.
PR #32 lands the harness (script/bench.py) and the first published numbers (doc/benchmarks.md, 2026-06-12): 1.7B@3060 81 tok/s, 8B@4090 62 tok/s, 27B@2×5090 Q6K TP=2 at a steady 35 tok/s with flat decode 128→4k. The 4k-prefill TTFT of 7.1 s is recorded as #23's before-number.
Remaining for full closure: the llama.cpp / Ollama comparison columns (same checkpoints, same hosts; the harness accepts any OpenAI-compatible base URL via --label, so adding them is an install-and-run exercise), and cold-load timing (visible per-deploy in the journal + deploy validation; tracked under #1).
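For readers comparing columns, here is a minimal sketch of how TTFT and steady decode tok/s can be derived from token arrival timestamps on a streaming response. It is not taken from script/bench.py; the names `StreamTiming` and `summarize_stream` are hypothetical, and it assumes one timestamp per generated token from a monotonic clock.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamTiming:
    ttft_s: float            # time from request start to first token
    decode_tok_per_s: float  # tokens per second after the first token


def summarize_stream(request_start: float, token_times: Sequence[float]) -> StreamTiming:
    """Summarize one streamed completion (hypothetical helper, not the harness API).

    request_start and token_times are seconds from the same monotonic clock,
    e.g. time.perf_counter(), with one entry per generated token.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    # Decode rate excludes the first token, whose latency is dominated by prefill.
    decode_span = token_times[-1] - token_times[0]
    intervals = len(token_times) - 1
    rate = intervals / decode_span if intervals > 0 and decode_span > 0 else 0.0
    return StreamTiming(ttft_s=ttft, decode_tok_per_s=rate)


if __name__ == "__main__":
    # Synthetic timestamps: first token after 0.5 s, then one token every 0.5 s.
    timing = summarize_stream(10.0, [10.5, 11.0, 11.5, 12.0, 12.5])
    print(timing)
```

Separating TTFT from the decode rate keeps a long prefill from dragging down the reported tok/s, which matters when comparing engines at 4k context.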