Compare commits

4 Commits

Author SHA1 Message Date
c66aaeb268 feat: discover contributed repos via GitHub GraphQL API
The REST /user/repos endpoint only returns repos where the user is
owner, collaborator, or org member. Repos contributed to via PRs
(e.g. polkadot-js/api, zed-industries/zed) were never discovered
and their commits were missing from moments.

Now supplements /user/repos with a GraphQL
repositoriesContributedTo query, which returns repos the user has
committed to or opened issues/PRs on (the query requests the COMMIT,
PULL_REQUEST, and ISSUE contribution types), using cursor-based
pagination with no fixed result cap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 05:38:57 +03:00
2a20b47a29 fix: resolve clippy redundant_closure warning in moments-api
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 05:04:18 +03:00
f77a8ab48f fix: use since cursor in github-repo polls to prevent missed commits
After initial backfill, scan_repo was fetching only page 1 (100 most
recent commits) per repo. If more than 100 commits landed between
7-day polls, older ones in that window were permanently missed.

Now stores the newest commit date in poller_state.last_modified and
passes it as &since= on subsequent polls, with full pagination, so
only genuinely new commits are fetched but none are skipped.

On first poll after deploy, last_modified is NULL so no since filter
is applied — triggering a full re-backfill that catches any
previously missed commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 05:03:41 +03:00
1679153c43 docs: add CLAUDE.md and ignore .zed/
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 04:43:00 +03:00
4 changed files with 214 additions and 14 deletions
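
The `since` cursor described in f77a8ab48f can be sketched roughly as follows. This is a minimal, std-only illustration and not the code from the commit: timestamps are plain Unix seconds or pre-formatted strings, and `newest_seen`, `encode_query_value`, and `commits_url` are hypothetical helper names. The encoding helper is included because an RFC 3339 timestamp written with a `+00:00` offset contains a `+`, which many servers decode as a space when it appears unescaped in a query string.

```rust
/// Returns the newest timestamp among `prior` and `seen`.
/// `Option` orders `None` below every `Some`, so `max` also covers the
/// first poll, when no cursor has been stored yet.
fn newest_seen(prior: Option<u64>, seen: &[u64]) -> Option<u64> {
    seen.iter().fold(prior, |acc, &ts| acc.max(Some(ts)))
}

/// Percent-encodes every byte outside the RFC 3986 "unreserved" set so the
/// value can be placed after `&since=` in a URL.
fn encode_query_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds the commits URL, appending `since` only when a cursor exists.
/// The fixed `per_page=100` is an assumption made for this sketch.
fn commits_url(repo: &str, author: &str, page: u32, since: Option<&str>) -> String {
    let mut url = format!(
        "https://api.github.com/repos/{repo}/commits?author={author}&per_page=100&page={page}"
    );
    if let Some(ts) = since {
        url.push_str("&since=");
        url.push_str(&encode_query_value(ts));
    }
    url
}
```

With no stored cursor, `commits_url` produces the same URL as a full backfill; once a cursor exists, the same paginated loop only walks commits at or after it.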

1
.gitignore vendored
@@ -2,6 +2,7 @@
 **/*.rs.bk
 .env
 .env.local
+.zed/
 # frontend
 /ui/node_modules

78
CLAUDE.md Normal file
@@ -0,0 +1,78 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
**moments** is a personal activity timeline and portfolio site. It ingests developer activity from multiple forges (GitHub, Gitea, Mercurial, Bugzilla), stores raw JSON payloads in PostgreSQL, and serves a React frontend showing contribution graphs, a ranked project dashboard, and a filterable activity timeline.
## Architecture
Hexagonal (ports & adapters) Rust backend with a React/TypeScript frontend.
### Crate Dependency Graph
```
moments-entities — pure types/DTOs, no DB or HTTP deps
^
moments-core — port traits (EventReader, EventWriter, EventSource, PollerStateStore)
+ presentation reshape + poller loop
^
moments-data — sole adapter: PgStore implements all core traits
+ EventSource impls (github, gitea, hg, bugzilla)
+ SQL migrations
^
moments-api — axum HTTP API binary (read-only, connects as moments_ro)
moments-worker — ingestion daemon binary (runs migrations, connects as moments_rw)
```
### Key Design Decisions
- **Raw payload storage**: upstream JSON is stored verbatim in `events.payload` (JSONB). The `reshape()` function in `moments-core/src/presentation.rs` transforms payloads into `TimelineItem` at request time — no re-ingestion needed to change presentation.
- **Public/private gate**: `events.public` boolean controls API visibility. Only `public = true` rows are served.
- **Wire types are hand-maintained**: `ui/src/api/client.ts` mirrors Rust entity types manually.
- **Migrations**: run automatically on worker startup via `sqlx::migrate!`. The API binary never runs migrations.
### Frontend
React 19 + Vite 6 (SWC) + TypeScript + Bootstrap 5. State/data via `@tanstack/react-query`. Package manager is **pnpm**.
Routes: `/` (dashboard), `/activity` (timeline), `/project/:source/*` (project detail), `/cv` (resume).
## Build & Dev Commands
### Rust
```sh
cargo build --workspace # build all crates
cargo build --workspace --release # release build
cargo clippy --workspace # lint
cargo fmt --check # format check
cargo test --workspace # run tests
# Run binaries (need DATABASE_URL)
DATABASE_URL=postgres://localhost/moments cargo run -p moments-api
DATABASE_URL=postgres://localhost/moments cargo run -p moments-worker
```
### Frontend
```sh
cd ui
pnpm install # install deps
pnpm dev # dev server on :5173 (proxies /api/* to localhost:8080)
pnpm lint # tsc --noEmit type-check
pnpm build # production build (tsc -b && vite build)
```
## Database
PostgreSQL with three migrations in `crates/moments-data/migrations/`. Two roles: `moments_rw` (worker, full access) and `moments_ro` (API, SELECT-only).
## API Endpoints
All under `/v1/`: `healthz`, `events`, `sources`, `projects`, `activity/daily`, `forge/{source}/*`, `og/contributions.png`.
## Deployment
Production uses `./script/deploy.sh`. Services run under systemd with hardened units. Secrets resolved from `pass` store via template substitution. Nginx reverse-proxies `/api/` to the API host.

@@ -170,7 +170,7 @@ async fn og_contributions(
         .iter()
         .filter_map(|s| s.earliest)
         .min()
-        .unwrap_or_else(|| Utc::now())
+        .unwrap_or_else(Utc::now)
         .date_naive();
     let today = Utc::now().date_naive();
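
For readers unfamiliar with the lint addressed in 2a20b47a29, the two forms can be compared in isolation. This is a small sketch with a hypothetical `fallback` function standing in for `Utc::now`; it only illustrates that a closure which does nothing but call a zero-argument function can be replaced by the function path.

```rust
/// Stand-in for a zero-argument constructor such as a clock read.
fn fallback() -> u32 {
    7
}

/// Closure form: wraps the call in `|| fallback()`, which clippy's
/// `redundant_closure` lint flags (allowed here for the comparison).
#[allow(clippy::redundant_closure)]
fn with_closure(value: Option<u32>) -> u32 {
    value.unwrap_or_else(|| fallback())
}

/// Path form: passes the function item directly; the behaviour is the same.
fn with_path(value: Option<u32>) -> u32 {
    value.unwrap_or_else(fallback)
}
```

Both forms call the fallback lazily, only when the `Option` is `None`.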

@@ -1,16 +1,20 @@
 //! Per-repo commit enumeration for full GitHub history.
 //!
-//! The Search API caps at 1000 results; this source enumerates all repos
-//! the user can access via `/user/repos` and walks each repo's commit
-//! history via `/repos/{owner}/{repo}/commits?author={user}` — no cap.
+//! Discovers repos via two sources:
+//! 1. REST `/user/repos` — repos where the user is owner, collaborator,
+//!    or org member.
+//! 2. GraphQL `repositoriesContributedTo` — repos the user has committed
+//!    to or opened issues/PRs on, even without collaborator status.
+//!    No fixed result cap (cursor-paginated).
+//!
+//! Then walks each repo's commit history via
+//! `/repos/{owner}/{repo}/commits?author={user}` with a `since` cursor
+//! to avoid re-fetching known commits.
 //!
 //! Events use `github-commit:{sha}` as their ID, matching the scheme in
 //! `github_search`, so duplicates are resolved via idempotent upsert.
-//!
-//! Per-repo poller state keys (`github-repo:{owner}/{repo}`) track which
-//! repos have been fully backfilled. First run paginates the full history;
-//! subsequent runs fetch only page 1.
+use std::collections::HashSet;
 use std::sync::Arc;
 use async_trait::async_trait;
@@ -114,22 +118,132 @@ impl GithubRepoSource {
                 break;
             }
         }
+        // Supplement with repos from GraphQL repositoriesContributedTo.
+        // This catches repos where the user contributed via PRs but isn't
+        // an owner, collaborator, or org member — no result cap.
+        let mut known: HashSet<String> = repos.iter().map(|r| r.full_name.clone()).collect();
+        let contributed = self.discover_contributed_repos().await;
+        match contributed {
+            Ok(extra) => {
+                for r in extra {
+                    if known.insert(r.full_name.clone()) {
+                        repos.push(r);
+                    }
+                }
+            }
+            Err(e) => {
+                warn!(error = %e, "GraphQL contributed-repos discovery failed; continuing with known repos");
+            }
+        }
         Ok(repos)
     }
-    /// Fetch commits for a single repo, paginating fully on first run.
+    /// Discover repos the user has contributed to via GraphQL.
+    /// Uses cursor-based pagination with no result cap.
+    async fn discover_contributed_repos(&self) -> Result<Vec<Repo>, SourceError> {
+        let token = match &self.config.token {
+            Some(t) => t,
+            None => return Ok(vec![]),
+        };
+        let mut repos = Vec::new();
+        let mut cursor: Option<String> = None;
+        loop {
+            let after = match &cursor {
+                Some(c) => format!(", after: \"{}\"", c),
+                None => String::new(),
+            };
+            let query = format!(
+                r#"{{ user(login: "{}") {{ repositoriesContributedTo(first: 100, contributionTypes: [COMMIT, PULL_REQUEST, ISSUE]{}) {{ pageInfo {{ hasNextPage endCursor }} nodes {{ nameWithOwner isPrivate }} }} }} }}"#,
+                self.config.user, after
+            );
+            let body = serde_json::json!({ "query": query });
+            let resp = self
+                .client
+                .post("https://api.github.com/graphql")
+                .header(header::AUTHORIZATION, format!("Bearer {token}"))
+                .header(header::USER_AGENT, USER_AGENT)
+                .header(header::CONTENT_TYPE, "application/json")
+                .json(&body)
+                .send()
+                .await
+                .map_err(|e| SourceError::Http(e.to_string()))?;
+            if !resp.status().is_success() {
+                return Err(SourceError::Http(format!(
+                    "{} POST graphql",
+                    resp.status()
+                )));
+            }
+            let data: Value = resp
+                .json()
+                .await
+                .map_err(|e| SourceError::Parse(e.to_string()))?;
+            // Check for GraphQL-level errors
+            if let Some(errors) = data.get("errors").and_then(Value::as_array) {
+                if let Some(msg) = errors.first().and_then(|e| e.get("message")).and_then(Value::as_str) {
+                    return Err(SourceError::Http(format!("GraphQL error: {msg}")));
+                }
+            }
+            let contributed = &data["data"]["user"]["repositoriesContributedTo"];
+            let nodes = contributed["nodes"].as_array();
+            if let Some(nodes) = nodes {
+                for node in nodes {
+                    let full_name = node
+                        .get("nameWithOwner")
+                        .and_then(Value::as_str);
+                    let private = node
+                        .get("isPrivate")
+                        .and_then(Value::as_bool)
+                        .unwrap_or(false);
+                    if let Some(name) = full_name {
+                        repos.push(Repo {
+                            full_name: name.to_string(),
+                            private,
+                        });
+                    }
+                }
+            }
+            let has_next = contributed["pageInfo"]["hasNextPage"]
+                .as_bool()
+                .unwrap_or(false);
+            if !has_next {
+                break;
+            }
+            // Stop if the server reports another page but gives no cursor;
+            // retrying with `None` would restart from the first page.
+            match contributed["pageInfo"]["endCursor"].as_str() {
+                Some(c) => cursor = Some(c.to_string()),
+                None => break,
+            }
+        }
+        debug!(repos = repos.len(), "discovered contributed repos via GraphQL");
+        Ok(repos)
+    }
+    /// Fetch commits for a single repo, paginating fully on first run
+    /// and using `since` on subsequent runs to catch everything new.
     async fn scan_repo(&self, repo: &Repo) -> Result<usize, SourceError> {
         let state_key = format!("github-repo:{}", repo.full_name);
         let prior = self.state.load(&state_key).await?;
-        let first_run = prior.is_none();
-        let max_pages = if first_run { MAX_BACKFILL_PAGES } else { 1 };
+        let since = prior.as_ref().and_then(|s| s.last_modified);
         let mut total = 0usize;
-        for page in 1..=max_pages {
-            let url = format!(
+        let mut newest: Option<DateTime<Utc>> = since;
+        for page in 1..=MAX_BACKFILL_PAGES {
+            let mut url = format!(
                 "https://api.github.com/repos/{}/commits?author={}&per_page={}&page={}",
                 repo.full_name, self.config.user, self.config.per_page, page
             );
+            if let Some(since_dt) = since {
+                // Format with a `Z` suffix: the `+00:00` form contains a `+`,
+                // which is read as a space in an unescaped query string.
+                url.push_str(&format!(
+                    "&since={}",
+                    since_dt.to_rfc3339_opts(chrono::SecondsFormat::Secs, true)
+                ));
+            }
             let req = self.apply_headers(self.client.get(&url));
             let resp = req
                 .send()
@@ -165,6 +279,13 @@ impl GithubRepoSource {
                 .iter()
                 .filter_map(|item| parse_commit(item, repo))
                 .collect();
+            for ev in &events {
+                newest = Some(match newest {
+                    Some(n) if ev.occurred_at > n => ev.occurred_at,
+                    Some(n) => n,
+                    None => ev.occurred_at,
+                });
+            }
             total += self.writer.upsert_events(&events).await?;
             if items.len() < self.config.per_page as usize {
@@ -172,7 +293,7 @@ impl GithubRepoSource {
             }
         }
-        self.state.touch(&state_key).await?;
+        self.state.save(&state_key, None, newest).await?;
         Ok(total)
     }
 }
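
The discovery flow in this diff (page through a cursor-based listing, then merge the results into the already-known repos without duplicates) can be sketched independently of HTTP. The block below is a std-only illustration under stated assumptions: `Page`, `collect_all`, and `merge_unique` are hypothetical names, and the `fetch` closure stands in for whatever performs the real GraphQL request.

```rust
use std::collections::HashSet;

/// One page of a cursor-paginated listing (hypothetical shape, loosely
/// modelled on a GraphQL connection's `nodes` plus `pageInfo`).
struct Page {
    names: Vec<String>,
    has_next: bool,
    end_cursor: Option<String>,
}

/// Drains every page from `fetch`, which receives the cursor to resume from.
/// Stops when there is no next page, and also when the server claims more
/// pages but supplies no cursor, since re-sending `None` would restart from
/// the first page and never terminate.
fn collect_all<F>(mut fetch: F) -> Vec<String>
where
    F: FnMut(Option<&str>) -> Page,
{
    let mut all = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let page = fetch(cursor.as_deref());
        all.extend(page.names);
        match (page.has_next, page.end_cursor) {
            (true, Some(next)) => cursor = Some(next),
            _ => break,
        }
    }
    all
}

/// Appends only the names not already present, preserving first-seen order.
fn merge_unique(known: &mut Vec<String>, extra: Vec<String>) {
    let mut seen: HashSet<String> = known.iter().cloned().collect();
    for name in extra {
        if seen.insert(name.clone()) {
            known.push(name);
        }
    }
}
```

Keeping the page loop separate from the transport makes the termination conditions easy to exercise with a stub closure, which is harder to do when the loop and the HTTP call live in one function.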