agents-md-radar
Live leaderboard of agent-instruction files harvested from popular GitHub repos. CLAUDE.md · AGENTS.md · .cursorrules · .windsurfrules · .clinerules · .aiderrules · GEMINI.md · .github/copilot-instructions.md
In May 2026 the open-source ecosystem is flooded with AI coding-agent instruction files — every popular repo has its own CLAUDE.md or AGENTS.md. There is no single page that says: "here are the best examples right now, from the most popular repos, refreshed hourly."
agents-md-radar is that page.
What it does
- Crawls GitHub via the GitHub Search API (rotating 6 query buckets every hour).
- For every discovered repo, fetches the raw contents of 8 known instruction-file names from raw.githubusercontent.com (refreshed every 30 minutes, oldest-checked-first).
- Extracts per-file patterns: line count, section structure, mentioned models (Opus / Sonnet / GPT / Gemini), presence of test / security / build sections, MCP mentions, imperative density.
- Serves a dark, dependency-free dashboard with a leaderboard sorted by repo stars.
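
The pattern-extraction step can be illustrated with a small pure function. This is a minimal sketch, not the project's actual extractor: the function name `extractPatterns`, the returned field names, the regexes, and the definition of "imperative density" (share of non-empty lines that open with a directive word) are all assumptions made for illustration.

```javascript
// Hypothetical sketch: names, regexes, and metric definitions are assumptions,
// not the project's real extractor.
function extractPatterns(content) {
  const lines = content.split(/\r?\n/);

  // Section structure: collect Markdown ATX headings ("# Title" .. "###### Title").
  const headings = lines
    .filter((line) => /^#{1,6}\s+\S/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, "").trim());

  // Model families named in the README's pattern list.
  const models = ["Opus", "Sonnet", "GPT", "Gemini"].filter((name) =>
    new RegExp(`\\b${name}\\b`, "i").test(content)
  );

  const hasSection = (word) =>
    headings.some((h) => new RegExp(`\\b${word}`, "i").test(h));

  // Imperative density (assumed definition): fraction of non-empty lines
  // that start with a directive word, optionally after a list bullet.
  const nonEmpty = lines.filter((line) => line.trim() !== "");
  const imperative = nonEmpty.filter((line) =>
    /^\s*(?:[-*]\s+)?(?:always|never|do not|don't|must|use|run|avoid)\b/i.test(line)
  );

  return {
    lineCount: lines.length,
    headings,
    models,
    hasTestSection: hasSection("test"),
    hasSecuritySection: hasSection("security"),
    hasBuildSection: hasSection("build"),
    mentionsMcp: /\bMCP\b/.test(content),
    imperativeDensity: nonEmpty.length ? imperative.length / nonEmpty.length : 0,
  };
}
```

Keeping the extractor free of I/O means it can be re-run whenever a file's content changes without touching the network.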
Data sources (no mock / seed data)
| Source | Endpoint | Auth | Cadence |
|--------|----------|------|---------|
| GitHub Search — repos | https://api.github.com/search/repositories?... | Optional GITHUB_TOKEN | every 60 min |
| GitHub Search — code | https://api.github.com/search/code?q=filename:... | Optional GITHUB_TOKEN | every 60 min |
| GitHub repo metadata | https://api.github.com/repos/{owner}/{repo} | Optional GITHUB_TOKEN | on first sighting of each repo |
| GitHub raw file | https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file} | None | every 30 min |
Without GITHUB_TOKEN, API requests are unauthenticated and limited to 60 req/hr, which is adequate for the default cadence. Setting GITHUB_TOKEN raises the limit to 5,000 req/hr.
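
As a rough illustration of how the optional token and the raw-file URL pattern from the table might be handled, here is a minimal sketch. The helper names `githubApiHeaders` and `rawFileUrl` and the User-Agent value are hypothetical; only the URL shape and the GITHUB_TOKEN variable come from this README.

```javascript
// Sketch only: helper names and the User-Agent string are assumptions.
function githubApiHeaders(token) {
  const headers = {
    Accept: "application/vnd.github+json",
    "User-Agent": "agents-md-radar",
  };
  // Attach credentials only when a token is configured; otherwise the
  // request goes out unauthenticated and counts against the lower limit.
  if (token) headers.Authorization = `Bearer ${token}`;
  return headers;
}

// Encode each path segment on its own so that slashes inside a value
// (e.g. ".github/copilot-instructions.md") survive as path separators.
function encodePath(value) {
  return value.split("/").map(encodeURIComponent).join("/");
}

function rawFileUrl(owner, repo, branch, file) {
  return (
    "https://raw.githubusercontent.com/" +
    [owner, repo, branch, file].map(encodePath).join("/")
  );
}
```

A caller would typically pass `process.env.GITHUB_TOKEN` to `githubApiHeaders`; raw-file fetches need no headers from it, since that source is listed with no auth.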
Tech stack
- Node.js 20+, Express 4
- better-sqlite3 (WAL mode) for storage
- node-cron for the scheduler
- helmet, compression
- Vanilla JS SPA (no framework, no build step)
Quick start
npm install
node server.js
# open http://localhost:4811/agents-md-radar/
On first boot the dashboard is empty for ~5 s; the in-process crawler then issues a real GitHub search and a real raw-file fetch round and populates the DB. After ~2 minutes you should see a populated leaderboard.
Endpoints
All public. No authentication anywhere.
| Method | Path | Description |
|--------|------|-------------|
| GET | /agents-md-radar/ | SPA dashboard |
| GET | /agents-md-radar/health | Liveness + counts + last-crawl timestamps |
| GET | /agents-md-radar/api/files | Leaderboard. Query: filename, min_stars, q, limit, offset |
| GET | /agents-md-radar/api/files/:id | Full content + extracted patterns |
| GET | /agents-md-radar/api/repos | Repos with at least one rule file |
| GET | /agents-md-radar/api/stats | Aggregate counts + pattern positivity rates |
| GET | /agents-md-radar/api/crawls | Recent crawl runs |
| POST | /agents-md-radar/api/crawl | Manually trigger one search + refresh tick |
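
For client code, a leaderboard query URL can be assembled from the parameters listed for /agents-md-radar/api/files. The sketch below assumes the default local address from Quick start; the function name `leaderboardUrl` is hypothetical, and the meaning of each parameter is inferred from its name rather than documented here.

```javascript
// Sketch under assumptions: base URL is the Quick start default, and only the
// query keys named in the endpoint table are forwarded.
function leaderboardUrl(params = {}, base = "http://localhost:4811") {
  const url = new URL("/agents-md-radar/api/files", base);
  for (const key of ["filename", "min_stars", "q", "limit", "offset"]) {
    // Skip unset values so the server-side defaults apply.
    if (params[key] !== undefined && params[key] !== null) {
      url.searchParams.set(key, String(params[key]));
    }
  }
  return url.toString();
}
```

The result can be passed straight to `fetch`. The response shape is not specified in this README, so inspect the JSON before relying on particular fields.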
Schema highlights
- repos: one row per GitHub repo we've ever seen, plus current star/fork counts (refreshed when re-seen).
- rule_files: one row per (repo, filename). Stores capped content (200 KB), sha256, line/word counts, http_status. Status 404 rows tell us "we checked, file isn't there" and we skip them for 7 days.
- patterns: one row per (rule_file, key). Re-computed every time content changes.
- crawl_runs: every search/file-refresh tick, with timing, status, rate-limit remaining.
The database runs in SQLite WAL mode, which applies to every table.
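
The 7-day skip for 404 rows and the oldest-checked-first refresh order can be sketched as a selection over rule_files rows. This is an illustration under assumptions: `http_status` appears in the schema notes, but the `last_checked_at` column (epoch milliseconds), the function name `nextToRefresh`, and the batch-size parameter are hypothetical.

```javascript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical selection logic; last_checked_at (epoch ms) is an assumed column.
function nextToRefresh(ruleFiles, now, batchSize) {
  return ruleFiles
    .filter((row) => {
      if (row.last_checked_at == null) return true; // never checked yet
      const age = now - row.last_checked_at;
      // A recent 404 means "we checked, file isn't there": skip for 7 days.
      return !(row.http_status === 404 && age < SEVEN_DAYS_MS);
    })
    // Oldest-checked-first; never-checked rows sort to the front.
    .sort((a, b) => (a.last_checked_at ?? 0) - (b.last_checked_at ?? 0))
    .slice(0, batchSize);
}
```

In a real deployment the same rule would more likely be a SQL query with an ORDER BY and LIMIT; the in-memory version just makes the rule easy to read and test.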
License
MIT.
---
Built by Cowork (Claude Opus 4.7) as a daily R&D experiment for the holyai.me showcase.