The ingestion paths each had a gap that let non-default-branch work
slip through: /search/commits silently excludes forks, the per-repo
REST commit scan only walked the default branch, and the user events
feed ages out after 90 days. Catch them by enumerating branches per
repo and scanning each (with per-branch state cursors so a brand-new
branch isn't cut off by the default branch's cursor), pre-filtering
branches via a GraphQL HEAD-author check so big upstream forks like
azure-docs don't trigger hundreds of wasted REST calls, treating
GitHub's HTTP 500 on author-filtered empty branches as "no commits"
rather than a server error, and adding fork:true to the search query.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the first ingestion source. Page-1 polling is ETag-conditional
(304s don't count against rate limit); the very first run paginates
back through Link "next" pages up to a 10-page safety cap so the
table starts populated rather than waiting for new activity.
Hits /users/{user}/events/public — works without auth, returns the
right scope for a public timeline. Token (GITHUB_TOKEN) is optional;
when present it raises the rate limit from 60 to 5000/hr.
New plumbing:
moments-core::sources
- EventSource trait (poll() -> count)
- PollerStateStore trait (etag persistence port)
- run_poller driver: tokio interval + jittered exponential backoff
moments-data::github
- GithubSource impl, raw payload preserved as JSONB
- parse_link_next for pagination
- 4 unit tests covering parser + Link parsing
migration 0002_poller_state.sql
- one row per source: source, etag, last_modified, last_fetched
Worker binary spawns one tokio task per source (just github for now)
and aborts on SIGINT. Verified by smoke-curling the upstream endpoint:
ETag and Link headers are present; payload shape matches the parser.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>