Commit Graph

7 Commits

Author SHA1 Message Date
a71b4e6b84 feat(github): per-repo commit enumeration for full history backfill
Adds a new github-repo EventSource that enumerates all repos via
/user/repos and walks each repo's /commits?author= endpoint, which
has no 1000-result cap unlike the Search API. Events use the same
github-commit:{sha} ID scheme as github_search for dedup. Per-repo
poller state enables full backfill on first run, page-1-only on
subsequent polls. Weekly poll interval by default.

Closes #1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 14:59:26 +03:00
88fbbba60b feat(hg): revset-based author query, group discovery, one-shot ingest script
Rewrites the hg worker to use json-log?rev=author() which matches the
changeset author (not the pusher), capturing commits landed by sheriffs.
Repos are discovered within configured groups plus individually listed
repos. The worker skips entirely after the first successful backfill.

Adds script/hg-ingest.sh for offline ingestion via local hg clones —
clones one repo at a time, caches extracted changesets to .tsv, inserts
via psql, and sets poller_state when done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 13:58:21 +03:00
7919a2d9ab feat(worker): add hg-edge and bugzilla pollers
Wires two historical sources for completeness with the 2019 timeline:

- hg-edge.mozilla.org: scans json-pushes for a configured set of
  build/* repos and matches changeset author client-side, since the
  pushlog `user=` filter targets the pusher (sheriffs/reviewers in
  this case) rather than the author. Daily poll cadence — mozilla
  retired hg, no new events expected.
- bugzilla.mozilla.org: queries /rest/bug?creator=<email>. Without
  an api key the unauthenticated endpoint only returns public bugs,
  which is what the public timeline wants anyway.

Reshape renders "<author> committed <short_node> in <repo>" for hg
and "filed bug #<id> in <product>" for bugzilla, both linking back
to the canonical upstream URL via a stamped `_host` payload field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:55:41 +03:00
f750e8de47 feat(worker): add gitea activity feed poller
Hits /api/v1/users/{user}/activities/feeds?only-performed-by=true
on the configured gitea host (default git.lair.cafe). Page-1 polling
on a 10-min cadence; first run paginates back through up to 20
pages (1000 items) to seed history.

Gitea has no ETag support on this endpoint, so each tick is a fresh
fetch — relying on idempotent upsert by `gitea:<id>` for dedup.

Reshape covers the gitea op_type set:
  commit_repo  → "pushed N commits to repo:branch" + commits body,
                  parsing the JSON-encoded `content` field
  push_tag     → "tagged X in repo"
  create_repo  → "created repo"
  rename/transfer/delete_branch/delete_tag/star/fork — straightforward
  create/close/reopen_issue        → "{verb} issue #N in repo: title"
  create/close/reopen_pull_request → "{verb} pull request #N"
  merge_pull_request               → GitMerge icon
  comment_issue, comment_pull      → markdown body from comment.body
  approve/reject_pull_request, publish_release
  fallback for anything else (mirror_sync_*, future op_types)

Issue / PR / release events use gitea's pipe-separated
`<index>|<title>` content field; pushes have JSON-encoded content.

Host stamping: parse_gitea_event injects `_host` into each row's
payload so the reshape layer can construct web URLs without a
config dependency. Multi-host gitea would still work as long as
each source instance has its own host configured.

Worker config:
  GITEA_HOST                  default git.lair.cafe
  GITEA_USER                  default grenade
  GITEA_TOKEN                 optional (raises rate limit; required
                                for private repo activity to surface)
  GITEA_POLL_INTERVAL_SECS    default 600

Tests: +2 in moments-data (commit_repo parses, private flag
captured), +4 in moments-core (commit_repo with body, create_issue
pipe-content, merge icon swap, fallback) — 27 total green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:41:55 +03:00
e4052c4c9a feat(worker): add github search api source for historical backfill
The Events API is hard-capped at 90 days (15 events for grenade
right now). The Search API has its own 1000-result-per-query cap
but reaches the start of the user's GitHub history — for grenade,
430 issues/PRs going back to 2012-08-08.

  GET /search/issues?q=author:<user>&sort=created&order=desc

Polled hourly by default but defaults to 24h interval since this is
backfill, not a live feed. After the first run most upserts are
no-ops. Stored as Source::Github with action "Issue" or "PullRequest"
(distinguished by the .pull_request field on the search item),
keyed `github-issue:<owner>/<repo>#<n>`.

/search/commits is deliberately not used: GitHub matches the same
commit across every fork that contains it, so 275k of grenade's
"commits" are mostly duplicated fork hits in repos he never authored
to. If commit history becomes valuable we should enumerate his repos
and walk per-repo /commits?author= instead.

Visibility: search/issues items don't carry .private, so we lookup
/repos/{full_name} once per unique repo encountered (cached for the
duration of the poll). Failure to resolve is treated as private —
better to under-expose than over-expose on the public timeline.

Reshape: presentation/github.rs gains an Issue/PullRequest path that
extracts from the search item shape (html_url, number, title, state,
.pull_request.merged_at) rather than the events-API wrapper. Merged
PRs use the GitMerge icon, mirroring the events-API path.

Worker now spawns two tokio tasks (events + search), aborts both
on SIGINT. New env: SEARCH_POLL_INTERVAL_SECS (default 86400).

Tests: +2 in moments-data (URL parsing), +2 in moments-core
(search Issue + merged-PR reshape) — 14 total green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:49:06 +03:00
45ceec2ec7 feat(worker): add github events poller
Adds the first ingestion source. Page-1 polling is ETag-conditional
(304s don't count against rate limit); the very first run paginates
back through Link "next" pages up to a 10-page safety cap so the
table starts populated rather than waiting for new activity.

Hits /users/{user}/events/public — works without auth, returns the
right scope for a public timeline. Token (GITHUB_TOKEN) is optional;
when present it raises the rate limit from 60 to 5000/hr.

New plumbing:

  moments-core::sources
    - EventSource trait (poll() -> count)
    - PollerStateStore trait (etag persistence port)
    - run_poller driver: tokio interval + jittered exponential backoff

  moments-data::github
    - GithubSource impl, raw payload preserved as JSONB
    - parse_link_next for pagination
    - 4 unit tests covering parser + Link parsing

  migration 0002_poller_state.sql
    - one row per source: source, etag, last_modified, last_fetched

Worker binary spawns one tokio task per source (just github for now)
and aborts on SIGINT. Verified by smoke-curling the upstream endpoint:
ETag and Link headers are present; payload shape matches the parser.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:59:15 +03:00
6775309043 chore: scaffold moments workspace
Cargo workspace with five crates per architecture conventions:

- moments-entities: Source enum, Event, EventQuery, SourceSummary
- moments-core:     EventReader / EventWriter ports
- moments-data:     PgStore (sqlx postgres adapter) + 0001_init.sql
- moments-api:      axum binary; /v1/{healthz,events,sources}
- moments-worker:   skeleton; pollers land in step 2

Sources committed-to for ingestion: github, gitea, hg, bugzilla.
Workstation events explicitly retired (not deferred).

Build + clippy clean. sqlx queries use the runtime API for now;
will switch to compile-time-checked macros + .sqlx offline cache
once magrathea has the moments_{ro,rw} roles and database created.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:47:06 +03:00