Full-stack feature showing programming languages by commit activity
as a stream graph on the dashboard.
Backend:
- migration: repo_languages table (source, repo, language, bytes, color)
- worker: fetch language breakdowns via the GitHub GraphQL API (batched,
20 repos per request) and the Gitea REST API during poll cycles
- API: GET /v1/languages/daily (daily commit counts per language),
GET /v1/languages/repos (all stored repo language data)
- fix timezone bug in daily_counts and language_daily_counts: the
PostgreSQL server timezone (Europe/Sofia, UTC+2, or UTC+3 in summer) shifted day
boundaries, miscounting events near midnight. Now uses explicit
UTC boundaries in generate_series JOINs.
- use a per-source CASE for repo name extraction in the language query
to match the Gitea payload structure (Gitea's repo.full_name vs GitHub's repo.name)
- Gitea languages fall back to GitHub's colors via COALESCE, since
Gitea's languages API returns byte counts only
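To make the midnight miscount concrete, here is a small Rust model of day bucketing. It is not the SQL fix itself, just the arithmetic behind it: the same Unix timestamp is assigned a day index with UTC boundaries and with boundaries at local midnight in a fixed-offset zone. `utc_day` and `local_day` are hypothetical helpers, the timestamp is illustrative, and a fixed +3h offset stands in for Europe/Sofia in summer.

```rust
/// Seconds in one calendar day (Unix time ignores leap seconds).
const DAY: i64 = 86_400;

/// Hypothetical helper: days since 1970-01-01, with day boundaries at UTC midnight.
fn utc_day(ts: i64) -> i64 {
    // div_euclid rounds toward negative infinity, so pre-1970 timestamps bucket correctly.
    ts.div_euclid(DAY)
}

/// Hypothetical helper: the same bucketing, but with boundaries at local
/// midnight in a zone `offset_secs` ahead of UTC, which is effectively
/// what a server-local timezone does to day boundaries.
fn local_day(ts: i64, offset_secs: i64) -> i64 {
    (ts + offset_secs).div_euclid(DAY)
}

fn main() {
    // Illustrative event at 2024-06-01T22:30:00Z.
    let ts: i64 = 1_717_281_000;
    // Europe/Sofia observes UTC+3 in summer.
    let sofia_summer = 3 * 3600;
    // UTC puts the event on day 19875 (2024-06-01); Sofia-local time puts it on 19876.
    println!("utc_day = {}, local_day = {}", utc_day(ts), local_day(ts, sofia_summer));
}
```

An event 90 minutes before UTC midnight lands on the next day under the server's summer offset, which is the kind of near-midnight miscount that explicit UTC boundaries avoid.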
Frontend:
- LanguageStreamGraph component: pure SVG stream graph, weekly
buckets, centered baseline, top 8 languages + Other, GitHub
canonical language colors, legend with color dots
- DashPage/ProjectPage: fetch repo languages once via the new endpoint
instead of per-repo forge proxy calls (eliminates 200+ GitHub
API calls and the resulting 403 rate-limit errors)
- removed fetchLanguages forge proxy wrapper (dead code)
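The stream graph layout above comes down to two steps: fold everything past the top N languages into "Other", then stack the layers around a centered baseline so each week's stack spans [-total/2, +total/2]. The component itself is frontend code; this is a Rust sketch of the math only. `Series`, `top_n_with_other` and `centered_layers` are hypothetical names, and every series is assumed to have one value per weekly bucket.

```rust
/// One named series: commit counts per weekly bucket (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Series {
    name: String,
    values: Vec<f64>,
}

/// Keep the `n` series with the largest totals and fold the rest into "Other".
fn top_n_with_other(mut series: Vec<Series>, n: usize) -> Vec<Series> {
    let total = |s: &Series| -> f64 { s.values.iter().sum() };
    // Sort descending by total commits; total_cmp gives a total order on f64.
    series.sort_by(|a, b| total(b).total_cmp(&total(a)));
    if series.len() <= n {
        return series;
    }
    let rest = series.split_off(n);
    let mut other = vec![0.0; rest[0].values.len()];
    for s in &rest {
        for (acc, v) in other.iter_mut().zip(&s.values) {
            *acc += *v;
        }
    }
    series.push(Series { name: "Other".to_string(), values: other });
    series
}

/// For each series, the (lower, upper) y-bounds per week, stacked upward from
/// a centered baseline of -total/2 so the stream is symmetric around zero.
fn centered_layers(series: &[Series]) -> Vec<Vec<(f64, f64)>> {
    let weeks = series.first().map_or(0, |s| s.values.len());
    let mut layers = vec![Vec::with_capacity(weeks); series.len()];
    for t in 0..weeks {
        let week_total: f64 = series.iter().map(|s| s.values[t]).sum();
        let mut y = -week_total / 2.0;
        for (i, s) in series.iter().enumerate() {
            let y1 = y + s.values[t];
            layers[i].push((y, y1));
            y = y1;
        }
    }
    layers
}

fn main() {
    let weekly = vec![
        Series { name: "Rust".into(), values: vec![5.0, 7.0] },
        Series { name: "TypeScript".into(), values: vec![3.0, 1.0] },
        Series { name: "SQL".into(), values: vec![1.0, 0.0] },
    ];
    let top = top_n_with_other(weekly, 8);
    for (s, layer) in top.iter().zip(centered_layers(&top)) {
        println!("{}: {:?}", s.name, layer);
    }
}
```

Each (lower, upper) pair maps directly onto the y-coordinates of an SVG area path for that week.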
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The DB now stores everything GitHub will give us; the API still only
returns public events (for now).
Endpoint switch in the github poller: when GITHUB_TOKEN is set we
hit /users/{u}/events (public + private, when the token belongs to
that user), otherwise we fall back to /users/{u}/events/public. Either
way each event's top-level `public` boolean is captured into a new column.
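A minimal Rust sketch of the endpoint switch and the `public` default, using hypothetical helpers `events_url` and `event_is_public`. Per GitHub's REST docs, /users/{u}/events includes private events only when the request is authenticated as that same user; with any other token it returns public events only.

```rust
/// Hypothetical helper: choose the GitHub events endpoint for `user`.
/// With a token the poller uses /users/{user}/events, which also returns
/// private events when the token belongs to that same user; without one
/// it uses the public-only endpoint.
fn events_url(user: &str, token: Option<&str>) -> String {
    match token {
        Some(_) => format!("https://api.github.com/users/{user}/events"),
        None => format!("https://api.github.com/users/{user}/events/public"),
    }
}

/// Hypothetical helper: the value stored in events.public. A missing
/// top-level `public` field is treated as public, mirroring the
/// column's DEFAULT true.
fn event_is_public(field: Option<bool>) -> bool {
    field.unwrap_or(true)
}

fn main() {
    // GITHUB_TOKEN is the variable named in this commit; it is optional.
    let token = std::env::var("GITHUB_TOKEN").ok();
    println!("{}", events_url("octocat", token.as_deref()));
    println!("missing field -> public = {}", event_is_public(None));
}
```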
Schema:
migration 0003_event_public.sql adds events.public BOOLEAN NOT NULL
DEFAULT true, plus an index on (public, occurred_at DESC).
Wire:
Event gains a `public: bool` field.
EventQuery gains `include_private: bool` (default false).
list_events and source_summaries gate on it.
moments-api pins include_private = false at every call site;
threading it through as a query param is a concern for future auth work, not this change.
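The include_private gate can be modeled in memory as follows. This is only a sketch: the real list_events presumably applies the condition in SQL, where the (public, occurred_at DESC) index would serve it. The `id` field is a placeholder for the rest of the row.

```rust
/// Placeholder for a stored event; only `public` comes from this commit,
/// `id` stands in for the rest of the row.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    id: u64,
    public: bool,
}

/// Query options. Deriving Default gives include_private = false, so
/// callers that don't opt in never see private rows.
#[derive(Debug, Default, Clone, Copy)]
struct EventQuery {
    include_private: bool,
}

/// In-memory model of the gate: private events pass only when requested.
fn list_events(all: &[Event], q: EventQuery) -> Vec<Event> {
    all.iter()
        .filter(|e| e.public || q.include_private)
        .cloned()
        .collect()
}

fn main() {
    let rows = vec![Event { id: 1, public: true }, Event { id: 2, public: false }];
    // What moments-api does today: include_private pinned to false.
    let public_only = list_events(&rows, EventQuery::default());
    println!("{:?}", public_only);
}
```

Making `false` the derived default means any new call site is public-only unless it opts in explicitly.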
The default-true on the column keeps existing rows correct: the 11
events already in the DB came from /events/public and are genuinely
public.
After this change, clear poller_state so the next worker run does a
fresh backfill via /events:
DELETE FROM poller_state WHERE source = 'github';
Tests: +2 in github poller (private flag captured, default-public
on missing field); 10 total, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the first ingestion source. Page-1 polling is ETag-conditional
(304s on authenticated requests don't count against the rate limit); the very first run paginates
back through Link "next" pages up to a 10-page safety cap so the
table starts populated rather than waiting for new activity.
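A std-only Rust sketch of the two request-side pieces: sending the stored ETag as If-None-Match, and following Link rel="next" up to the 10-page cap. `fetch` is a stand-in closure for the HTTP call. `Page`, `if_none_match` and `backfill` are hypothetical names. This `parse_link_next` is an illustrative version, not the crate's: it splits on commas, which is enough for GitHub's Link headers but is not a general RFC 8288 parser.

```rust
/// One fetched page of events plus the Link rel="next" URL, if any.
struct Page {
    items: Vec<String>,
    next: Option<String>,
}

/// Safety cap on the first-run backfill.
const MAX_BACKFILL_PAGES: usize = 10;

/// Header for a conditional request: send the stored ETag as If-None-Match
/// so an unchanged page 1 comes back as 304 Not Modified.
fn if_none_match(stored_etag: Option<&str>) -> Option<(&'static str, String)> {
    stored_etag.map(|etag| ("If-None-Match", etag.to_string()))
}

/// Simplified Link parser: splits on ',' and returns the URL tagged rel="next".
fn parse_link_next(header: &str) -> Option<String> {
    header.split(',').find_map(|part| {
        let mut pieces = part.split(';');
        let url = pieces.next()?.trim();
        let is_next = pieces.any(|p| p.trim() == r#"rel="next""#);
        if is_next && url.starts_with('<') && url.ends_with('>') {
            Some(url[1..url.len() - 1].to_string())
        } else {
            None
        }
    })
}

/// Follow rel="next" from `first_url` until there is no next page or the
/// cap is reached. `fetch` stands in for the real HTTP call.
fn backfill<F: FnMut(&str) -> Page>(first_url: &str, mut fetch: F) -> Vec<String> {
    let mut out = Vec::new();
    let mut next = Some(first_url.to_string());
    for _ in 0..MAX_BACKFILL_PAGES {
        let Some(url) = next.take() else { break };
        let page = fetch(&url);
        out.extend(page.items);
        next = page.next;
    }
    out
}

fn main() {
    let link = r#"<https://api.github.com/user/1/events?page=2>; rel="next", <https://api.github.com/user/1/events?page=10>; rel="last""#;
    println!("{:?}", parse_link_next(link));
    println!("{:?}", if_none_match(Some("\"abc\"")));
}
```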
Hits /users/{user}/events/public, which works without auth and returns the
right scope for a public timeline. Token (GITHUB_TOKEN) is optional;
when present it raises the rate limit from 60 to 5000/hr.
New plumbing:
moments-core::sources
- EventSource trait (poll() -> count)
- PollerStateStore trait (etag persistence port)
- run_poller driver: tokio interval + jittered exponential backoff
moments-data::github
- GithubSource impl, raw payload preserved as JSONB
- parse_link_next for pagination
- 4 unit tests covering parser + Link parsing
migration 0002_poller_state.sql
- one row per source: source, etag, last_modified, last_fetched
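The exact backoff formula in run_poller isn't given here. One common choice is "full jitter" exponential backoff, sketched below in Rust as a swapped-in illustration rather than the crate's code. `backoff_delay` is a hypothetical name. The jitter fraction is passed in because std has no RNG, which also keeps the function deterministic and testable.

```rust
use std::time::Duration;

/// Hypothetical helper: "full jitter" exponential backoff.
/// The upper bound doubles per attempt (base * 2^attempt), capped at `cap`;
/// the returned delay is `jitter` times that bound, where `jitter` is a
/// uniform sample in [0, 1) supplied by the caller's RNG.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    // Saturating ops keep large attempt counts from overflowing.
    let ceiling = base.saturating_mul(2u32.saturating_pow(attempt)).min(cap);
    // Clamp so an out-of-range sample can't make mul_f64 panic on a negative value.
    ceiling.mul_f64(jitter.clamp(0.0, 1.0))
}

fn main() {
    let (base, cap) = (Duration::from_secs(1), Duration::from_secs(60));
    for attempt in 0..8 {
        // A fixed 0.5 stands in for a random sample here.
        println!("attempt {attempt}: {:?}", backoff_delay(attempt, base, cap, 0.5));
    }
}
```

Full jitter spreads retries across the whole [0, ceiling) window, so pollers that fail at the same moment don't all retry in lockstep.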
Worker binary spawns one tokio task per source (just github for now)
and aborts on SIGINT. Verified by smoke-curling the upstream endpoint:
ETag and Link headers are present; payload shape matches the parser.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>