agents-md-radar

Live leaderboard of agent-instruction files harvested from popular GitHub repos. CLAUDE.md · AGENTS.md · .cursorrules · .windsurfrules · .clinerules · .aiderrules · GEMINI.md · .github/copilot-instructions.md

In May 2026 the open-source ecosystem is flooded with AI coding-agent instruction files — every popular repo has its own CLAUDE.md or AGENTS.md. There is no single page that says: "here are the best examples right now, from the most popular repos, refreshed hourly."

agents-md-radar is that page.

What it does

Crawls GitHub via the GitHub Search API (rotating 6 query buckets every hour).
For every discovered repo, fetches the raw contents of 8 known instruction-file names from raw.githubusercontent.com (refreshed every 30 minutes, oldest-checked-first).
Extracts per-file patterns: line count, section structure, mentioned models (Opus / Sonnet / GPT / Gemini), presence of test / security / build sections, MCP mentions, imperative density.
Serves a dark, dependency-free dashboard with a leaderboard sorted by repo stars.

Data sources (no mock / seed data)

| Source | Endpoint | Auth | Cadence |
|--------|----------|------|---------|
| GitHub Search — repos | https://api.github.com/search/repositories?... | Optional GITHUB_TOKEN | every 60 min |
| GitHub Search — code | https://api.github.com/search/code?q=filename:... | Optional GITHUB_TOKEN | every 60 min |
| GitHub repo metadata | https://api.github.com/repos/{owner}/{repo} | Optional GITHUB_TOKEN | on first sighting of each repo |
| GitHub raw file | https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file} | None | every 30 min |

Without GITHUB_TOKEN the API limit is 60 req/hr unauthenticated — adequate for the default cadence. Setting GITHUB_TOKEN raises it to 5,000 req/hr.

Tech stack

Node.js 20+, Express 4
better-sqlite3 (WAL mode) for storage
node-cron for the scheduler
helmet, compression
Vanilla JS SPA — no framework, no build step

Quick start

npm install
node server.js
# open http://localhost:4811/agents-md-radar/

On first boot the dashboard is empty for ~5 s; the in-process crawler then issues a real GitHub search and a real raw-file fetch round and populates the DB. After ~2 minutes you should see a populated leaderboard.

Endpoints

All public. No authentication anywhere.

| Method | Path | Description |
|--------|------|-------------|
| GET | /agents-md-radar/ | SPA dashboard |
| GET | /agents-md-radar/health | Liveness + counts + last-crawl timestamps |
| GET | /agents-md-radar/api/files | Leaderboard. Query: filename, min_stars, q, limit, offset |
| GET | /agents-md-radar/api/files/:id | Full content + extracted patterns |
| GET | /agents-md-radar/api/repos | Repos with at least one rule file |
| GET | /agents-md-radar/api/stats | Aggregate counts + pattern positivity rates |
| GET | /agents-md-radar/api/crawls | Recent crawl runs |
| POST | /agents-md-radar/api/crawl | Manually trigger one search + refresh tick |

Schema highlights

repos — one row per GitHub repo we've ever seen, plus current star/fork counts (refreshed when re-seen).
rule_files — one row per (repo, filename). Stores capped content (200 KB), sha256, line/word counts, http_status. Status 404 rows tell us "we checked, file isn't there" and we skip them for 7 days.
patterns — one row per (rule_file, key). Re-computed every time content changes.
crawl_runs — every search/file-refresh tick, with timing, status, rate-limit remaining.

All tables use SQLite WAL mode.

License

MIT.

---

Built by Cowork (Claude Opus 4.7) as a daily R&D experiment for the holyai.me showcase.