Complements the existing avg-by-weekday chart with its orthogonal
partner: which hour of the day the user typically commits. The API
buckets events by EXTRACT(hour FROM occurred_at AT TIME ZONE $tz) so
the chart matches the clock the user sees rather than UTC; the UI
passes the browser's resolved IANA timezone. Renders as 24 mini-bars
below the weekday chart with labels every 4 hours and per-bar
tooltips showing the average events/day at that hour.
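A minimal sketch of the bucketing, written in Rust for illustration only; the real bucketing happens in SQL as described above. The function name and the fixed `utc_offset_secs` parameter are assumptions: a fixed offset stands in for a proper IANA zone lookup and therefore ignores DST.

```rust
/// Average events per day for each local hour of the day (index 0..=23).
///
/// `utc_offset_secs` is a fixed offset used here as a stand-in for a
/// real IANA zone lookup, so it ignores DST transitions.
fn hourly_averages(timestamps: &[i64], utc_offset_secs: i64, days_in_window: u32) -> [f64; 24] {
    let mut counts = [0u32; 24];
    for &ts in timestamps {
        let local = ts + utc_offset_secs;
        // rem_euclid keeps the remainder in 0..86_400 even for negative input.
        let hour = (local.rem_euclid(86_400) / 3_600) as usize;
        counts[hour] += 1;
    }
    // Guard against a zero-day window to avoid dividing by zero.
    let days = f64::from(days_in_window.max(1));
    let mut averages = [0.0f64; 24];
    for (hour, &count) in counts.iter().enumerate() {
        averages[hour] = f64::from(count) / days;
    }
    averages
}
```

An event at 23:00 UTC viewed from a UTC+2 clock lands in the 01:00 bucket, which is the shift the chart is meant to show.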
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gitea writes one Action row per interested user-context. A push to an
org repo by user U produces two rows — one with user_id=U, one with
user_id=org — differing only in `id` and `user_id`. Polling both the
user feed and org feeds (which we do, and need to, since neither alone
catches every cross-namespace event) surfaced both rows; the
`gitea:{action_row_id}` id gave them distinct ids, so the upsert dedup
never fired and ~38% of events on org-repo project pages rendered
twice. Switch to a content-derived id keyed on (op_type, act_user_id,
repo_id, ref_name, comment_id, created) so the two rows collide on
upsert, and add a migration that re-keys existing rows to the same
formula while collapsing the duplicates already in the table.
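To illustrate why the content-derived id makes the two rows collide, here is a small sketch. The struct layout, the `gitea:`-prefixed colon-joined format, and the function names are assumptions for illustration; only the field list comes from the description above.

```rust
#[allow(dead_code)]
#[derive(Clone)]
struct ActionRow {
    id: u64,      // differs between the user row and the org row
    user_id: u64, // differs between the user row and the org row
    op_type: String,
    act_user_id: u64,
    repo_id: u64,
    ref_name: String,
    comment_id: u64,
    created: i64,
}

/// Old scheme: one id per Action row, so duplicates never collide.
fn row_id(row: &ActionRow) -> String {
    format!("gitea:{}", row.id)
}

/// New scheme (hypothetical format): built only from fields that are
/// identical across the per-user-context copies of the same event.
fn content_id(row: &ActionRow) -> String {
    format!(
        "gitea:{}:{}:{}:{}:{}:{}",
        row.op_type, row.act_user_id, row.repo_id, row.ref_name, row.comment_id, row.created
    )
}
```

Because `id` and `user_id` are left out of the key, the user-feed copy and the org-feed copy of one push produce the same string and the upsert keeps a single row.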
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ingestion paths each had a gap that let non-default-branch work
slip through: /search/commits silently excludes forks, the per-repo
REST commit scan only walked the default branch, and the user events
feed ages out after 90 days. Catch them by:
- enumerating branches per repo and scanning each, with per-branch
  state cursors so a brand-new branch isn't cut off by the default
  branch's cursor
- pre-filtering branches via a GraphQL HEAD-author check so big
  upstream forks like azure-docs don't trigger hundreds of wasted
  REST calls
- treating GitHub's HTTP 500 on author-filtered empty branches as
  "no commits" rather than a server error
- adding fork:true to the search query
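The HTTP 500 handling can be sketched as a small status classifier. The enum and function names are hypothetical, and the sketch assumes the caller knows whether the request carried an author filter; it cannot tell on its own that the branch is empty.

```rust
#[derive(Debug, PartialEq)]
enum ScanOutcome {
    Commits,          // 200: parse the body as usual
    NoCommits,        // treated as an empty result, not a failure
    ServerError(u16), // retry or back off
    Other(u16),       // left to the caller (404, 403, ...)
}

fn classify(status: u16, author_filtered: bool) -> ScanOutcome {
    match status {
        200 => ScanOutcome::Commits,
        // Observed behaviour described above: an author-filtered scan of a
        // branch with no matching commits answers 500 instead of [].
        500 if author_filtered => ScanOutcome::NoCommits,
        500..=599 => ScanOutcome::ServerError(status),
        other => ScanOutcome::Other(other),
    }
}
```

Only an exact 500 on an author-filtered request is downgraded; other 5xx responses still count as server errors so genuine outages keep triggering backoff.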
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aggregate graph endpoints (daily counts, language daily counts, source
summaries, OG image) now include private repository activity. These
endpoints only expose numeric counts — no commit messages, repo names,
or other metadata — so private details remain hidden. The activity
timeline continues to serve only public events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each commit was counted once per language in the repo regardless of
that language's share, so Shell (present in many repos as small
deploy scripts) appeared larger than Rust. Now weights each commit
by the language's byte proportion in the repo (e.g. a commit to a
95% Rust / 5% Shell repo contributes 0.95 to Rust, 0.05 to Shell).
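The weighting can be sketched as follows; the function name and the map-based input are assumptions, not the query the API actually runs.

```rust
use std::collections::HashMap;

/// Fraction of one commit attributed to each language, proportional to
/// that language's byte share in the repo.
fn language_weights(bytes_by_language: &HashMap<String, u64>) -> HashMap<String, f64> {
    let total: u64 = bytes_by_language.values().sum();
    if total == 0 {
        // No language data: attribute nothing rather than divide by zero.
        return HashMap::new();
    }
    bytes_by_language
        .iter()
        .map(|(language, &bytes)| (language.clone(), bytes as f64 / total as f64))
        .collect()
}
```

The weights for a repo always sum to 1, so each commit still counts exactly once across all languages.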
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full-stack feature showing programming languages by commit activity
as a stream graph on the dashboard.
Backend:
- migration: repo_languages table (source, repo, language, bytes, color)
- worker: fetch language breakdowns via GitHub GraphQL (batched,
  20 repos/request) and Gitea REST API during poll cycles
- API: GET /v1/languages/daily (daily commit counts per language),
  GET /v1/languages/repos (all stored repo language data)
- fix timezone bug in daily_counts and language_daily_counts: the
  PostgreSQL server timezone (Europe/Sofia, UTC+3) shifted day
  boundaries, miscounting events near midnight. Now uses explicit
  UTC boundaries in generate_series JOINs.
- use per-source CASE for repo name extraction in language query
  to match gitea payload structure (repo.full_name vs repo.name)
- Gitea languages use GitHub colors via COALESCE fallback
Frontend:
- LanguageStreamGraph component: pure SVG stream graph, weekly
  buckets, centered baseline, top 8 languages + Other, GitHub
  canonical language colors, legend with color dots
- DashPage/ProjectPage: fetch repo languages once via new endpoint
  instead of per-repo forge proxy calls (eliminates 200+ GitHub
  API calls and 403 rate limit errors)
- removed fetchLanguages forge proxy wrapper (dead code)
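The centered baseline mentioned for the stream graph can be sketched like this. It is written in Rust for illustration rather than as the component's own code, and the function name is hypothetical: for one weekly bucket, the layers are stacked starting at minus half their total so the silhouette is symmetric around zero.

```rust
/// For one time bucket, return the (lower, upper) edge of each layer,
/// stacked around a zero centerline.
fn centered_stack(values: &[f64]) -> Vec<(f64, f64)> {
    let total: f64 = values.iter().sum();
    // Start half the total below the centerline.
    let mut y = -total / 2.0;
    values
        .iter()
        .map(|&value| {
            let band = (y, y + value);
            y += value;
            band
        })
        .collect()
}
```

Each bucket is centered independently, which is what gives a stream graph its flowing outline compared with a zero-based stacked area chart.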
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The REST /user/repos endpoint only returns repos where the user is
owner, collaborator, or org member. Repos contributed to via PRs
(e.g. polkadot-js/api, zed-industries/zed) were never discovered
and their commits were missing from moments.
Now supplements /user/repos with a GraphQL
repositoriesContributedTo query, which returns all repos the user
has committed to, opened issues/PRs on, or reviewed — with cursor-
based pagination and no result cap.
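The cursor loop can be sketched generically. `Page`, `fetch_all`, and the closure-based transport are assumptions for illustration; in the GraphQL response the cursor would come from the connection's `pageInfo`.

```rust
struct Page {
    items: Vec<String>,          // e.g. "owner/repo" names
    next_cursor: Option<String>, // None once the last page is reached
}

/// Follow cursors until the server reports no further page.
fn fetch_all<F>(mut fetch_page: F) -> Result<Vec<String>, String>
where
    F: FnMut(Option<&str>) -> Result<Page, String>,
{
    let mut all = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        // The first request carries no cursor; later ones resume after it.
        let page = fetch_page(cursor.as_deref())?;
        all.extend(page.items);
        match page.next_cursor {
            Some(next) => cursor = Some(next),
            None => return Ok(all),
        }
    }
}
```

Passing the fetch as a closure keeps the loop testable without a network call; any page error aborts the whole enumeration.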
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After initial backfill, scan_repo was fetching only page 1 (100 most
recent commits) per repo. If more than 100 commits landed between
7-day polls, older ones in that window were permanently missed.
Now stores the newest commit date in poller_state.last_modified and
passes it as &since= on subsequent polls, with full pagination, so
only genuinely new commits are fetched but none are skipped.
On first poll after deploy, last_modified is NULL so no since filter
is applied — triggering a full re-backfill that catches any
previously missed commits.
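A rough sketch of the incremental request, assuming hypothetical helper names and ISO 8601 UTC timestamps (which sort correctly as plain strings). When no cursor is stored, the `since` parameter is simply omitted, which is the full re-backfill case.

```rust
/// Build the per-repo commits URL; `since` is omitted when no cursor is stored.
fn commits_url(owner: &str, repo: &str, author: &str, since: Option<&str>, page: u32) -> String {
    let mut url = format!(
        "https://api.github.com/repos/{owner}/{repo}/commits?author={author}&per_page=100&page={page}"
    );
    if let Some(timestamp) = since {
        url.push_str("&since=");
        url.push_str(timestamp);
    }
    url
}

/// Newest commit date of a scan, to be stored as the next cursor.
/// Relies on all dates being ISO 8601 in UTC so string order is time order.
fn newest_commit_date<'a>(dates: &[&'a str]) -> Option<&'a str> {
    dates.iter().copied().max()
}
```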
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The events query's COALESCE for github source was missing _repo,
so per-repo commit events from github_repo had no repo match and
project pages showed 0 activities.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /repos/{owner}/{repo}/commits endpoint doesn't include repo info
in its response. Without _repo in the payload, these commits were
invisible to the projects query. Add _repo to parse_commit and include
it in the COALESCE chain for github source repo extraction.
After deploy, reset github-repo poller state to re-ingest with _repo:
DELETE FROM poller_state WHERE source LIKE 'github-repo%';
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add /v1/activity/daily endpoint returning per-day event counts via
generate_series + LEFT JOIN. Frontend renders an SVG contribution
graph with circles colored by quantile-based thresholds. Clicking a
day navigates to /activity/YYYY-MM-DD showing that day's events.
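One way to derive quantile-based thresholds is sketched below, in Rust for illustration. The choice of quartiles over non-zero days and the four resulting levels are assumptions; the frontend may cut the distribution differently.

```rust
/// Quartile cut points over days that have at least one event.
fn quantile_thresholds(counts: &[u32]) -> [u32; 3] {
    let mut active: Vec<u32> = counts.iter().copied().filter(|&c| c > 0).collect();
    if active.is_empty() {
        return [0; 3];
    }
    active.sort_unstable();
    // Nearest-rank style lookup into the sorted non-zero counts.
    let at = |q: f64| active[((active.len() - 1) as f64 * q).round() as usize];
    [at(0.25), at(0.5), at(0.75)]
}

/// Colour level: 0 for an empty day, 1..=4 for increasingly busy days.
fn level(count: u32, thresholds: &[u32; 3]) -> u8 {
    if count == 0 {
        return 0;
    }
    1 + thresholds.iter().filter(|&&t| count > t).count() as u8
}
```

Using quantiles rather than fixed cut points keeps the colour scale meaningful whether a typical day has two events or two hundred.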
New /activity/:timespan route parses single dates (YYYY-MM-DD) and
ranges (YYYY-MM-DD..YYYY-MM-DD) from the URL to initialize the
activity timeline filter.
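The two accepted URL shapes can be parsed along these lines. This is a Rust sketch with hypothetical names, not the route's own code, and it checks only the YYYY-MM-DD shape, not calendar validity.

```rust
#[derive(Debug, PartialEq)]
enum Timespan {
    Day(String),
    Range(String, String),
}

/// Shape check only: four digits, dash, two digits, dash, two digits.
fn is_date(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| {
            if i == 4 || i == 7 {
                *c == b'-'
            } else {
                c.is_ascii_digit()
            }
        })
}

fn parse_timespan(s: &str) -> Option<Timespan> {
    match s.split_once("..") {
        Some((from, to)) if is_date(from) && is_date(to) => {
            Some(Timespan::Range(from.to_string(), to.to_string()))
        }
        Some(_) => None,
        None if is_date(s) => Some(Timespan::Day(s.to_string())),
        None => None,
    }
}
```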
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use CASE/source instead of COALESCE for repo name extraction — Gitea's
repo.name is the short name while full_name includes the owner prefix.
Fix Gitea README fetch to use /contents/README.md with base64 decoding
instead of the nonexistent /readme endpoint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add repo filter param to /v1/events (SQL COALESCE across payload
shapes per source). New /project/:source/* route renders a filtered
activity timeline for a single repo. Dashboard cards link to the
drill-down page.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restructure routes: / and /dash show a project overview dashboard,
/activity hosts the existing timeline, /cv remains. Shared Layout
component provides consistent nav header and footer across all routes.
New /v1/projects endpoint aggregates per-repo activity stats (commits,
issues, PRs, date range) from existing event data via SQL. Dashboard
ranks projects by weighted recency + volume score and renders a card
grid.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a new github-repo EventSource that enumerates all repos via
/user/repos and walks each repo's /commits?author= endpoint, which
has no 1000-result cap unlike the Search API. Events use the same
github-commit:{sha} ID scheme as github_search for dedup. Per-repo
poller state enables full backfill on first run, page-1-only on
subsequent polls. Weekly poll interval by default.
Closes #1
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrites the hg worker to use json-log?rev=author() which matches the
changeset author (not the pusher), capturing commits landed by sheriffs.
Repos are discovered within configured groups plus individually listed
repos. The worker skips entirely after the first successful backfill.
Adds script/hg-ingest.sh for offline ingestion via local hg clones —
clones one repo at a time, caches extracted changesets to .tsv, inserts
via psql, and sets poller_state when done.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The user activity feed only returns events from the user's own namespace.
This adds org discovery via /api/v1/user/orgs and polls each org's
activity feed, filtering for events by the configured user. Per-org
poller state keys enable independent backfill. Org feed errors are
non-fatal to avoid disrupting the user feed poll.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires two historical sources for completeness with the 2019 timeline:
- hg-edge.mozilla.org: scans json-pushes for a configured set of
  build/* repos and matches changeset author client-side, since the
  pushlog `user=` filter targets the pusher (sheriffs/reviewers in
  this case) rather than the author. Daily poll cadence, since
  Mozilla retired hg and no new events are expected.
- bugzilla.mozilla.org: queries /rest/bug?creator=<email>. Without
  an API key the unauthenticated endpoint only returns public bugs,
  which is what the public timeline wants anyway.
Reshape renders "<author> committed <short_node> in <repo>" for hg
and "filed bug #<id> in <product>" for bugzilla, both linking back
to the canonical upstream URL via a stamped `_host` payload field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hits /api/v1/users/{user}/activities/feeds?only-performed-by=true
on the configured gitea host (default git.lair.cafe). Page-1 polling
on a 10-min cadence; first run paginates back through up to 20
pages (1000 items) to seed history.
Gitea has no ETag support on this endpoint, so each tick is a fresh
fetch — relying on idempotent upsert by `gitea:<id>` for dedup.
Reshape covers the gitea op_type set:
  commit_repo → "pushed N commits to repo:branch" + commits body,
    parsing the JSON-encoded `content` field
  push_tag → "tagged X in repo"
  create_repo → "created repo"
  rename/transfer/delete_branch/delete_tag/star/fork: straightforward
  create/close/reopen_issue → "{verb} issue #N in repo: title"
  create/close/reopen_pull_request → "{verb} pull request #N"
  merge_pull_request → GitMerge icon
  comment_issue, comment_pull → markdown body from comment.body
  approve/reject_pull_request, publish_release
  fallback for anything else (mirror_sync_*, future op_types)
Issue / PR / release events use gitea's pipe-separated
`<index>|<title>` content field; pushes have JSON-encoded content.
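Splitting the pipe-separated content could look like the sketch below; the function name is hypothetical. Splitting on the first pipe only means a title that itself contains `|` survives intact.

```rust
/// Split gitea's `<index>|<title>` content into its two parts.
/// Returns None when the separator is missing or the index is not a number.
fn parse_index_title(content: &str) -> Option<(u64, &str)> {
    let (index, title) = content.split_once('|')?;
    let index = index.trim().parse::<u64>().ok()?;
    Some((index, title))
}
```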
Host stamping: parse_gitea_event injects `_host` into each row's
payload so the reshape layer can construct web URLs without a
config dependency. Multi-host gitea would still work as long as
each source instance has its own host configured.
Worker config:
  GITEA_HOST                default git.lair.cafe
  GITEA_USER                default grenade
  GITEA_TOKEN               optional (raises rate limit; required
                            for private repo activity to surface)
  GITEA_POLL_INTERVAL_SECS  default 600
Tests: +2 in moments-data (commit_repo parses, private flag
captured), +4 in moments-core (commit_repo with body, create_issue
pipe-content, merge icon swap, fallback) — 27 total green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walk back the earlier decision to skip /search/commits. The fork
inflation that worried me isn't misattribution — those commits
really were authored by the user; they just persist in forks after
the original repo went away. Skipping them dropped legitimate
historical work from the timeline.
The duplicate-SHA-across-forks issue is a pure dedup concern:
* keyed `github-commit:<sha>` (SHA only, globally unique by Git's
  content addressing; same commit in two forks lands in one row);
* within a single page, dedup by id before INSERT (postgres ON
  CONFLICT errors when the conflict target appears twice in one
  statement);
* across pages and runs, last-write-wins via upsert. The repo
  association may flip between forks but the commit content is
  identical.
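The within-page dedup step might be sketched as follows. `Row` and `dedup_page` are hypothetical names, and letting the last occurrence win is an assumption chosen to mirror the upsert's last-write-wins behaviour across pages.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: String,   // e.g. "github-commit:<sha>"
    repo: String, // may differ between forks carrying the same commit
}

/// Keep one row per id within a page. The last occurrence wins, and rows
/// stay in first-seen order, so one INSERT never hits the same id twice.
fn dedup_page(rows: Vec<Row>) -> Vec<Row> {
    let mut position: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        match position.get(&row.id).copied() {
            Some(i) => out[i] = row,
            None => {
                position.insert(row.id.clone(), out.len());
                out.push(row);
            }
        }
    }
    out
}
```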
Visibility is read inline from `repository.private` on the search
item, no extra lookup needed. Also opportunistically populates the
shared visibility cache so the issue loop in the same poll skips
/repos/{full_name} GETs for any repo it already saw via commits.
Reshape: presentation/github.rs gains a Commit path — short SHA
linked, repo linked, first line of the commit message as subtitle.
GitCommit icon.
Tests: +3 in github_search (parse uses sha as id, marks private,
rejects non-github URL), +1 in presentation (commit reshape uses
short sha + first message line) — 18 total green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Events API is hard-capped at 90 days (15 events for grenade
right now). The Search API has its own 1000-result-per-query cap
but reaches the start of the user's GitHub history — for grenade,
430 issues/PRs going back to 2012-08-08.
GET /search/issues?q=author:<user>&sort=created&order=desc
Polled on a 24h interval by default, since this is backfill, not a
live feed. After the first run most upserts are
no-ops. Stored as Source::Github with action "Issue" or "PullRequest"
(distinguished by the .pull_request field on the search item),
keyed `github-issue:<owner>/<repo>#<n>`.
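Deriving the key from a search item's `html_url` could look like this sketch. The function name is hypothetical, and the sketch assumes github.com URLs of the form `/owner/repo/issues/N` or `/owner/repo/pull/N`.

```rust
/// Build `github-issue:<owner>/<repo>#<n>` from an issue or PR web URL.
fn issue_key(html_url: &str) -> Option<String> {
    let path = html_url.strip_prefix("https://github.com/")?;
    let mut parts = path.split('/');
    let owner = parts.next()?;
    let repo = parts.next()?;
    let kind = parts.next()?;
    let number = parts.next()?.parse::<u64>().ok()?;
    let known_kind = kind == "issues" || kind == "pull";
    if owner.is_empty() || repo.is_empty() || !known_kind {
        return None;
    }
    Some(format!("github-issue:{owner}/{repo}#{number}"))
}
```

Issues and pull requests share one number sequence per repo, so a single key prefix cannot collide between the two.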
/search/commits is deliberately not used: GitHub matches the same
commit across every fork that contains it, so grenade's 275k
"commits" are mostly duplicated fork hits in repos he never committed
to. If commit history becomes valuable we should enumerate his repos
and walk per-repo /commits?author= instead.
Visibility: search/issues items don't carry .private, so we look up
/repos/{full_name} once per unique repo encountered (cached for the
duration of the poll). Failure to resolve is treated as private —
better to under-expose than over-expose on the public timeline.
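The fail-closed lookup can be sketched with a per-poll cache. The type and method names are hypothetical, the lookup is passed in as a closure in place of the real /repos/{full_name} request, and caching failed lookups for the rest of the poll is an assumption.

```rust
use std::collections::HashMap;

struct VisibilityCache {
    private_by_repo: HashMap<String, bool>,
}

impl VisibilityCache {
    fn new() -> Self {
        Self { private_by_repo: HashMap::new() }
    }

    /// Returns true when the repo is private or its visibility is unknown.
    fn is_private<F>(&mut self, full_name: &str, lookup: F) -> bool
    where
        F: FnOnce(&str) -> Result<bool, String>,
    {
        if let Some(&known) = self.private_by_repo.get(full_name) {
            return known;
        }
        // Fail closed: an unresolved repo is treated as private.
        let private = lookup(full_name).unwrap_or(true);
        self.private_by_repo.insert(full_name.to_string(), private);
        private
    }
}
```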
Reshape: presentation/github.rs gains an Issue/PullRequest path that
extracts from the search item shape (html_url, number, title, state,
.pull_request.merged_at) rather than the events-API wrapper. Merged
PRs use the GitMerge icon, mirroring the events-API path.
Worker now spawns two tokio tasks (events + search), aborts both
on SIGINT. New env: SEARCH_POLL_INTERVAL_SECS (default 86400).
Tests: +2 in moments-data (URL parsing), +2 in moments-core
(search Issue + merged-PR reshape) — 14 total green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The DB now stores everything GitHub will give us, but the API still
returns only public events (for now).
Endpoint switch in the github poller: when GITHUB_TOKEN is set we
hit /users/{u}/events (public + private), otherwise fall back to
/users/{u}/events/public. Either way each event's top-level `public`
boolean is captured into a new column.
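The endpoint switch and the default for a missing `public` field can be sketched as below. The function names are hypothetical, and treating an empty token as absent is an assumption.

```rust
/// Pick the events endpoint based on whether a token is configured.
fn events_url(user: &str, token: Option<&str>) -> String {
    match token {
        // Authenticated: the feed includes the user's private events.
        Some(t) if !t.is_empty() => format!("https://api.github.com/users/{user}/events"),
        // Unauthenticated: public events only.
        _ => format!("https://api.github.com/users/{user}/events/public"),
    }
}

/// An event without a `public` field is stored as public, matching the
/// column default.
fn is_public(public_field: Option<bool>) -> bool {
    public_field.unwrap_or(true)
}
```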
Schema:
  migration 0003_event_public.sql adds events.public BOOLEAN NOT NULL
  DEFAULT true, plus an index on (public, occurred_at DESC).
Wire:
  Event gains a `public: bool` field.
  EventQuery gains `include_private: bool` (default false).
  list_events and source_summaries gate on it.
  moments-api pins include_private = false at every call site;
  threading it as a query param is a future-auth concern, not now.
The default-true on the column keeps existing rows correct: the 11
events already in the DB came from /events/public and are genuinely
public.
After this change, clear poller_state so the next worker run does a
fresh backfill via /events:
DELETE FROM poller_state WHERE source = 'github';
Tests: +2 in github poller (private flag captured, default-public
on missing field) — 10 total green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the first ingestion source. Page-1 polling is ETag-conditional
(304s don't count against rate limit); the very first run paginates
back through Link "next" pages up to a 10-page safety cap so the
table starts populated rather than waiting for new activity.
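Extracting the "next" page from a Link header might look like the sketch below. It is an illustration rather than the crate's actual parse_link_next, and it assumes the URLs contain no commas.

```rust
/// Return the URL tagged rel="next" in an RFC 8288 style Link header.
fn parse_link_next(header: &str) -> Option<String> {
    header.split(',').find_map(|part| {
        // Each part looks like: <https://...>; rel="next"
        let (target, params) = part.split_once(';')?;
        let is_next = params.split(';').any(|p| p.trim() == "rel=\"next\"");
        if !is_next {
            return None;
        }
        let url = target.trim().strip_prefix('<')?.strip_suffix('>')?;
        Some(url.to_string())
    })
}
```

The backfill loop would follow this URL until it returns None or the page cap is reached.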
Hits /users/{user}/events/public — works without auth, returns the
right scope for a public timeline. Token (GITHUB_TOKEN) is optional;
when present it raises the rate limit from 60 to 5000/hr.
New plumbing:
  moments-core::sources
    - EventSource trait (poll() -> count)
    - PollerStateStore trait (etag persistence port)
    - run_poller driver: tokio interval + jittered exponential backoff
  moments-data::github
    - GithubSource impl, raw payload preserved as JSONB
    - parse_link_next for pagination
    - 4 unit tests covering parser + Link parsing
  migration 0002_poller_state.sql
    - one row per source: source, etag, last_modified, last_fetched
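For the jittered exponential backoff in run_poller, one possible shape is sketched here. The "equal jitter" split, the parameter names, and passing the random fraction in from the caller are all assumptions; the driver's actual formula is not given above.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based).
/// `jitter` is a caller-supplied random fraction in 0.0..=1.0.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    // Double the base per attempt, saturating instead of overflowing,
    // then clamp to the cap.
    let exponential = base.saturating_mul(2u32.saturating_pow(attempt)).min(cap);
    // Equal jitter: half the delay is fixed, the other half is randomised.
    let half = exponential / 2;
    half + half.mul_f64(jitter.clamp(0.0, 1.0))
}
```

Taking the jitter as an argument keeps the function deterministic and testable; a caller would feed it a fresh random value on each failure so that several pollers do not retry in lockstep.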
Worker binary spawns one tokio task per source (just github for now)
and aborts on SIGINT. Verified by smoke-curling the upstream endpoint:
ETag and Link headers are present; payload shape matches the parser.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>