feat(worker): add github search api source for historical backfill
The Events API is hard-capped at 90 days (15 events for grenade
right now). The Search API has its own 1000-result-per-query cap
but reaches the start of the user's GitHub history — for grenade,
430 issues/PRs going back to 2012-08-08.
GET /search/issues?q=author:<user>&sort=created&order=desc
Polled hourly by default but defaults to 24h interval since this is
backfill, not a live feed. After the first run most upserts are
no-ops. Stored as Source::Github with action "Issue" or "PullRequest"
(distinguished by the .pull_request field on the search item),
keyed `github-issue:<owner>/<repo>#<n>`.
/search/commits is deliberately not used: GitHub matches the same
commit across every fork that contains it, so 275k of grenade's
"commits" are mostly duplicated fork hits in repos he never authored
to. If commit history becomes valuable we should enumerate his repos
and walk per-repo /commits?author= instead.
Visibility: search/issues items don't carry .private, so we lookup
/repos/{full_name} once per unique repo encountered (cached for the
duration of the poll). Failure to resolve is treated as private —
better to under-expose than over-expose on the public timeline.
Reshape: presentation/github.rs gains an Issue/PullRequest path that
extracts from the search item shape (html_url, number, title, state,
.pull_request.merged_at) rather than the events-API wrapper. Merged
PRs use the GitMerge icon, mirroring the events-API path.
Worker now spawns two tokio tasks (events + search), aborts both
on SIGINT. New env: SEARCH_POLL_INTERVAL_SECS (default 86400).
Tests: +2 in moments-data (URL parsing), +2 in moments-core
(search Issue + merged-PR reshape) — 14 total green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -5,6 +5,7 @@ use moments_core::{EventSource, run_poller};
|
||||
use moments_data::{
|
||||
PgStore,
|
||||
github::{GithubConfig, GithubSource},
|
||||
github_search::{GithubSearchConfig, GithubSearchSource},
|
||||
};
|
||||
use reqwest::Client;
|
||||
use tracing::info;
|
||||
@@ -22,9 +23,14 @@ struct Args {
|
||||
#[arg(long, env = "GITHUB_TOKEN")]
|
||||
github_token: Option<String>,
|
||||
|
||||
/// Seconds between poll attempts per source.
|
||||
/// Seconds between Events-API polls (live feed, last 90 days).
|
||||
#[arg(long, env = "POLL_INTERVAL_SECS", default_value = "600")]
|
||||
interval_secs: u64,
|
||||
|
||||
/// Seconds between Search-API polls (historical issue/PR backfill).
|
||||
/// Defaults to 24h — this is a backfill, not a live feed.
|
||||
#[arg(long, env = "SEARCH_POLL_INTERVAL_SECS", default_value = "86400")]
|
||||
search_interval_secs: u64,
|
||||
}
|
||||
|
||||
#[tokio::main]
|
||||
@@ -50,18 +56,35 @@ async fn main() -> anyhow::Result<()> {
|
||||
},
|
||||
)) as Arc<dyn EventSource>;
|
||||
|
||||
let github_search = Arc::new(GithubSearchSource::new(
|
||||
http.clone(),
|
||||
store.clone(),
|
||||
store.clone(),
|
||||
GithubSearchConfig {
|
||||
user: args.github_user.clone(),
|
||||
token: args.github_token.clone(),
|
||||
..Default::default()
|
||||
},
|
||||
)) as Arc<dyn EventSource>;
|
||||
|
||||
info!(
|
||||
github_user = args.github_user,
|
||||
interval_secs = args.interval_secs,
|
||||
search_interval_secs = args.search_interval_secs,
|
||||
"worker started"
|
||||
);
|
||||
|
||||
let interval = Duration::from_secs(args.interval_secs);
|
||||
let search_interval = Duration::from_secs(args.search_interval_secs);
|
||||
|
||||
let github_task = tokio::spawn(async move { run_poller(github, interval).await });
|
||||
let github_search_task =
|
||||
tokio::spawn(async move { run_poller(github_search, search_interval).await });
|
||||
|
||||
tokio::signal::ctrl_c().await?;
|
||||
info!("shutdown signal received");
|
||||
github_task.abort();
|
||||
github_search_task.abort();
|
||||
Ok(())
|
||||
}
|
||||
|
||||
|
||||
Reference in New Issue
Block a user