feat(worker): capture commits on non-default branches and forks

Each ingestion path had a gap that let work on non-default branches
slip through: /search/commits silently excludes forks, the per-repo
REST commit scan only walked the default branch, and the user events
feed drops anything older than 90 days. Catch those commits by
enumerating the branches of each repo and scanning every one, with a
per-branch state cursor so a brand-new branch isn't cut off by the
default branch's cursor. Pre-filter branches with a GraphQL
HEAD-author check so large upstream forks such as azure-docs don't
trigger hundreds of wasted REST calls. Treat GitHub's HTTP 500 on
author-filtered empty branches as "no commits" rather than a server
error, and add fork:true to the search query.
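
A minimal sketch of the per-branch cursor idea, assuming each cursor is the ISO-8601 timestamp of the newest ingested commit and is passed as the REST `since` parameter. `BranchCursors`, `since_for` and `advance` are hypothetical names, not the worker's actual API. The property that matters is that a branch with no stored cursor gets `None` (a full scan) instead of inheriting the default branch's position.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical cursor store keyed by (repo full name, branch name).
/// Each value is the ISO-8601 UTC timestamp of the newest commit already
/// ingested on that branch, used as the REST `since` parameter.
#[derive(Default)]
pub struct BranchCursors {
    cursors: HashMap<(String, String), String>,
}

impl BranchCursors {
    /// Returns the `since` value for one branch. A branch never seen
    /// before yields `None` (scan from the beginning) rather than the
    /// default branch's cursor, which may be newer than the branch's
    /// first commit.
    pub fn since_for(&self, repo: &str, branch: &str) -> Option<&str> {
        self.cursors
            .get(&(repo.to_string(), branch.to_string()))
            .map(String::as_str)
    }

    /// Moves one branch's cursor forward after a successful scan.
    /// Timestamps in one fixed "YYYY-MM-DDTHH:MM:SSZ" format compare
    /// correctly as strings, so an older value never overwrites a newer one.
    pub fn advance(&mut self, repo: &str, branch: &str, newest: &str) {
        match self.cursors.entry((repo.to_string(), branch.to_string())) {
            Entry::Occupied(mut e) => {
                if newest > e.get().as_str() {
                    e.insert(newest.to_string());
                }
            }
            Entry::Vacant(e) => {
                e.insert(newest.to_string());
            }
        }
    }
}
```

Keying on (repo, branch) rather than on the repo alone is what lets a branch created after the last default-branch scan still be read from its first commit.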
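
The HEAD-author pre-filter could look roughly like this, assuming the check keeps a branch only when its tip commit is authored by the tracked user. The query uses real fields of GitHub's GraphQL schema (`Repository.refs` with `refPrefix`, `Ref.target`, `Commit.author.user.login`), but the exact query the worker sends, and the `BranchHead` and `branches_worth_scanning` names, are assumptions. Because one GraphQL page returns up to 100 branch tips, a fork with hundreds of untouched upstream branches costs a few paginated queries instead of one REST call per branch.

```rust
/// Hypothetical query listing branch tips and their author's login.
/// Paginate with `after: endCursor` while `hasNextPage` is true.
pub const BRANCH_HEADS_QUERY: &str = r#"
query($owner: String!, $name: String!, $after: String) {
  repository(owner: $owner, name: $name) {
    refs(refPrefix: "refs/heads/", first: 100, after: $after) {
      pageInfo { hasNextPage endCursor }
      nodes {
        name
        target { ... on Commit { author { user { login } } } }
      }
    }
  }
}"#;

/// One branch tip as parsed from the response (hypothetical type).
pub struct BranchHead {
    pub name: String,
    /// `None` when the tip author's email is not linked to a GitHub account.
    pub author_login: Option<String>,
}

/// Keeps only the branches whose tip commit was authored by `login`.
/// GitHub logins are case-insensitive, so compare them that way.
pub fn branches_worth_scanning<'a>(heads: &'a [BranchHead], login: &str) -> Vec<&'a str> {
    heads
        .iter()
        .filter(|h| {
            h.author_login
                .as_deref()
                .map_or(false, |a| a.eq_ignore_ascii_case(login))
        })
        .map(|h| h.name.as_str())
        .collect()
}
```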
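
A sketch of the status handling, assuming the branch scan calls GitHub's "List commits" endpoint with an `author` filter. `Outcome` and `classify_status` are hypothetical names. Only a 500 on an author-filtered request is downgraded to "no commits"; other 5xx responses, and a 500 without the filter, still surface as errors so a genuine outage is not silently read as an empty branch.

```rust
/// What the scanner should do with one "List commits" response (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// 2xx: parse the JSON body as a page of commits.
    ParseBody,
    /// Treat the branch as having no matching commits.
    NoCommits,
    /// Surface as an error; retry and backoff are left to the caller.
    Error(u16),
}

/// Classifies the HTTP status of `GET /repos/{owner}/{repo}/commits`.
/// `author_filtered` is true when the request carried `author=<login>`.
pub fn classify_status(status: u16, author_filtered: bool) -> Outcome {
    match status {
        200..=299 => Outcome::ParseBody,
        // The behaviour this commit works around: GitHub answers 500 when
        // an author-filtered listing on a branch has nothing to return.
        500 if author_filtered => Outcome::NoCommits,
        other => Outcome::Error(other),
    }
}
```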

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:04:58 +03:00
parent 9a8c0955b5
commit 818a535903
4 changed files with 286 additions and 27 deletions
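
The lockfile diff adds the `percent-encoding` crate, and the commit message does not say what it is for. One plausible use, assumed here, is escaping branch names such as `feature/x` before they go into the `sha` query parameter of the per-branch REST scan. The sketch below hand-rolls the encoding with std only so it runs without dependencies; `encode_component` and `commits_url` are hypothetical helpers. `sha`, `author`, `since` and `per_page` are documented query parameters of GitHub's "List commits" endpoint.

```rust
/// Percent-encodes every byte outside the RFC 3986 "unreserved" set
/// (ALPHA / DIGIT / "-" / "." / "_" / "~"). A std-only stand-in for
/// what the `percent-encoding` crate would provide.
pub fn encode_component(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Builds the commit-list URL for one branch (hypothetical helper).
/// `since` is the branch's own cursor, or `None` for a full scan.
pub fn commits_url(owner: &str, repo: &str, branch: &str, author: &str, since: Option<&str>) -> String {
    let mut url = format!(
        "https://api.github.com/repos/{}/{}/commits?sha={}&author={}&per_page=100",
        encode_component(owner),
        encode_component(repo),
        encode_component(branch),
        encode_component(author),
    );
    if let Some(ts) = since {
        url.push_str("&since=");
        url.push_str(&encode_component(ts));
    }
    url
}
```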

Cargo.lock (generated)

@@ -1307,6 +1307,7 @@ dependencies = [
  "chrono",
  "moments-core",
  "moments-entities",
+ "percent-encoding",
  "reqwest",
  "serde",
  "serde_json",