agent-bloat
Live, public PR-size verbosity leaderboard for AI coding agents on public GitHub.
Answers a single question with real data: how big are the merged pull requests authored by each major AI coding agent on public GitHub?
Tracked agents (May 2026): Claude Code, GitHub Copilot, Cursor, OpenAI Codex, Devin, Aider, Gemini, OpenAI (generic).
This is one of the Holy AI product gallery dashboards.
What it shows
- Leaderboard — per-agent avg / median / p90 / p99 lines changed per merged PR, average files changed, and the "bloat vs human" ratio (agent PR size relative to human-authored PRs in the same repos).
- Distribution — histogram of PR sizes per agent (buckets 0-10 / 11-50 / 51-100 / 101-300 / 301-1000 / 1001-3000 / 3001+).
- Hall of Fame — top N largest merged PRs in the window, one per card, each linking out to GitHub.
- By Language & Repo — agent × language matrix and the top 20 AI-PR-heavy repos with the per-repo bloat-vs-human ratio.
- Trends — weekly avg LOC per agent over the last 90 days.
- SVG badge — GET /agent-bloat/api/badge?agent=<slug> returns a shields-style SVG you can drop into a README.
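The leaderboard and distribution figures above could be computed along these lines. This is a minimal sketch, not the project's lib/stats.js: it assumes "lines changed" means additions + deletions, uses the nearest-rank definition for p90/p99, and assumes the "bloat vs human" ratio is the agent's mean LOC divided by the human mean LOC in the same repos. All helper names are hypothetical.

```javascript
// Sketch only: assumes "lines changed" = additions + deletions.
const linesChanged = (pr) => pr.additions + pr.deletions;

const mean = (xs) => (xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : null);

function median(xs) {
  if (!xs.length) return null;
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  // Even-length samples average the two middle values.
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Nearest-rank percentile: always returns an observed value.
function percentile(xs, p) {
  if (!xs.length) return null;
  const s = [...xs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * s.length);
  return s[Math.min(s.length, Math.max(1, rank)) - 1];
}

// Histogram buckets from the Distribution tab (inclusive bounds).
const BUCKETS = [
  ["0-10", 0, 10],
  ["11-50", 11, 50],
  ["51-100", 51, 100],
  ["101-300", 101, 300],
  ["301-1000", 301, 1000],
  ["1001-3000", 1001, 3000],
  ["3001+", 3001, Infinity],
];

function histogram(xs) {
  const counts = Object.fromEntries(BUCKETS.map(([label]) => [label, 0]));
  for (const x of xs) {
    const bucket = BUCKETS.find(([, lo, hi]) => x >= lo && x <= hi);
    if (bucket) counts[bucket[0]] += 1;
  }
  return counts;
}

// One hypothetical leaderboard row from a list of merged PR objects.
function summarise(prs) {
  const loc = prs.map(linesChanged);
  return {
    n: prs.length,
    avg: mean(loc),
    median: median(loc),
    p90: percentile(loc, 90),
    p99: percentile(loc, 99),
    avgFiles: mean(prs.map((pr) => pr.changed_files)),
    histogram: histogram(loc),
  };
}

// Hypothetical ratio definition: agent mean LOC / human mean LOC in the same repos.
function bloatVsHuman(agentPrs, humanPrs) {
  const a = mean(agentPrs.map(linesChanged));
  const h = mean(humanPrs.map(linesChanged));
  return a === null || !h ? null : a / h;
}
```

Nearest-rank percentiles always report a PR size that actually occurred, which keeps p99 readable on small samples; an interpolating definition would give slightly different numbers.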
Where the data comes from (no mocks, no seeds)
Every numeric value is derived at runtime from public endpoints on api.github.com:
| Endpoint | Used for | Cadence |
|---|---|---|
| GET /search/commits?q="Co-Authored-By: <agent>" committer-date:>=YYYY-MM-DD | Discovering recent commits attributed to each agent | every 30 minutes (`*/30 * * * *`) |
| GET /repos/{owner}/{repo}/commits/{sha}/pulls | Resolving a commit SHA to its merged PR | cached forever per SHA |
| GET /repos/{owner}/{repo}/pulls/{number} | Fetching additions, deletions, changed_files, language for one PR | cached forever once merged |
| GET /search/issues?q=repo:<r> is:pr is:merged -"Co-Authored-By:" | Sampling human-baseline PRs in the same repos | each refresh, for the top AI-PR repos |
| GET /rate_limit | Surfacing the GitHub rate-limit counter in /health | once per refresh |
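A sketch of how the request URLs in the table might be built, assuming Node 18+ (global URL). The helper names, the per_page=100 page size, and the isCacheable rule (cache only PRs whose merged_at is set, mirroring "cached forever once merged") are illustrative assumptions, not the project's fetchers.

```javascript
const API = "https://api.github.com";

// Commit search for one agent's trailer since a YYYY-MM-DD cutoff.
function commitSearchUrl(agentName, since, page = 1) {
  const url = new URL("/search/commits", API);
  url.searchParams.set("q", `"Co-Authored-By: ${agentName}" committer-date:>=${since}`);
  url.searchParams.set("per_page", "100"); // search endpoints allow at most 100 per page
  url.searchParams.set("page", String(page));
  return url.toString();
}

// Human-baseline sample: merged PRs in one repo without the trailer text.
function humanBaselineUrl(repo) {
  const url = new URL("/search/issues", API);
  url.searchParams.set("q", `repo:${repo} is:pr is:merged -"Co-Authored-By:"`);
  url.searchParams.set("per_page", "100");
  return url.toString();
}

// SHA -> PR resolution, and per-PR diff stats.
const enc = encodeURIComponent;
const commitPullsUrl = (owner, repo, sha) =>
  `${API}/repos/${enc(owner)}/${enc(repo)}/commits/${enc(sha)}/pulls`;
const pullUrl = (owner, repo, number) =>
  `${API}/repos/${enc(owner)}/${enc(repo)}/pulls/${number}`;

// Merged PRs no longer change, so only they are safe to cache forever.
const isCacheable = (pr) => pr.merged_at != null;
```

Letting URLSearchParams handle encoding avoids hand-escaping the quotes, spaces, and >= in the search query. GitHub's search endpoints return at most 1,000 results per query, so a busy agent may need narrower date windows than a single query covers.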
If a refresh discovers zero merged PRs for an agent, that row is left empty and the refresh_log records status: "empty". The app does not invent numbers.
Co-Authored-By: trailer matching is a case-insensitive substring match. Where multiple matchers overlap (e.g. Codex is a more specific case of OpenAI), the more specific agent wins, so OpenAI Codex commits are not double-counted under the generic OpenAI bucket.
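The attribution rule can be sketched as follows: extract every Co-Authored-By trailer, lowercase it, collect the agents whose needle appears as a substring, then let a specific agent suppress its generic parent. The needle strings and the parent field are hypothetical; the real lib/agents.js fingerprint table may match on different names or email addresses.

```javascript
// Hypothetical fingerprint table; needles are lowercase substrings.
const AGENTS = [
  { slug: "claude", needles: ["claude"] },
  { slug: "copilot", needles: ["copilot"] },
  { slug: "cursor", needles: ["cursor"] },
  { slug: "codex", needles: ["codex"], parent: "openai" },
  { slug: "devin", needles: ["devin"] },
  { slug: "aider", needles: ["aider"] },
  { slug: "gemini", needles: ["gemini"] },
  { slug: "openai", needles: ["openai"] },
];

// Returns the lowercased value of each Co-Authored-By trailer line.
function coAuthorTrailers(message) {
  return message
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*co-authored-by:\s*(.+)$/i))
    .filter(Boolean)
    .map((m) => m[1].toLowerCase());
}

// Returns the sorted agent slugs credited by a commit message.
function attribute(message) {
  const slugs = new Set();
  for (const trailer of coAuthorTrailers(message)) {
    const hits = new Set(
      AGENTS.filter((a) => a.needles.some((n) => trailer.includes(n))).map((a) => a.slug)
    );
    // A more specific agent (e.g. codex) suppresses its generic parent (openai).
    for (const agent of AGENTS) {
      if (agent.parent && hits.has(agent.slug)) hits.delete(agent.parent);
    }
    hits.forEach((slug) => slugs.add(slug));
  }
  return [...slugs].sort();
}
```

Resolving overlaps per trailer, rather than per commit, means a commit that genuinely credits both a generic OpenAI identity and Codex in separate trailers would still count under both.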
API surface (all public, no auth)
GET /agent-bloat/health # ok, version, prs_indexed, last_refresh, ...
GET /agent-bloat/api/leaderboard?window=7d|30d|all
GET /agent-bloat/api/distribution?agent=<slug>&window=...
GET /agent-bloat/api/hall-of-fame?window=...&limit=10
GET /agent-bloat/api/by-language?window=...
GET /agent-bloat/api/by-repo?window=...&limit=20
GET /agent-bloat/api/trend?agent=<slug>&days=90
GET /agent-bloat/api/recent?agent=<slug>&limit=50
GET /agent-bloat/api/badge?agent=<slug> # SVG
GET /agent-bloat/api/agents
GET /agent-bloat/api/refresh-log?limit=20
GET /agent-bloat/api/rate-limit
POST /agent-bloat/api/refresh # public, idempotent
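Query parameters such as window and limit need validating before they reach a SQL query. A minimal sketch, assuming window cutoffs are whole days back from the current UTC time and that limits are capped; the function names and the cap of 100 used in the example are assumptions, not documented values.

```javascript
// Supported ?window= values mapped to a day count (null = no cutoff).
const WINDOWS = { "7d": 7, "30d": 30, all: null };

// Returns the YYYY-MM-DD cutoff for a window, or null for "all".
function windowCutoff(windowKey, now = new Date()) {
  if (!Object.prototype.hasOwnProperty.call(WINDOWS, windowKey)) {
    throw new RangeError(`window must be one of ${Object.keys(WINDOWS).join(", ")}`);
  }
  const days = WINDOWS[windowKey];
  if (days === null) return null;
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return cutoff.toISOString().slice(0, 10);
}

// Parses ?limit=, falling back on missing/invalid input and capping at max.
function clampLimit(raw, fallback, max) {
  const n = Number.parseInt(raw, 10);
  if (!Number.isFinite(n) || n < 1) return fallback;
  return Math.min(n, max);
}
```

A route could then call windowCutoff(req.query.window ?? "30d") and reply with HTTP 400 when it throws.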
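There is no authentication anywhere in this app. The optional GITHUB_TOKEN env var is used only for outbound calls to api.github.com, where it lifts the unauthenticated limit of 60 req/h to the standard authenticated limit of 5,000 req/h (the search endpoints have their own, lower per-minute limits).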
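A sketch of the outbound request headers and a simple rate-limit back-off, assuming the app reads GitHub's standard x-ratelimit-remaining and x-ratelimit-reset response headers (reset is a Unix timestamp in seconds). The User-Agent value and both function names are hypothetical; the real pacing logic in lib/github.js may differ.

```javascript
// Headers for api.github.com; Authorization is sent only when a token exists.
function githubHeaders(token = process.env.GITHUB_TOKEN) {
  const headers = {
    Accept: "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
    "User-Agent": "agent-bloat", // hypothetical UA string
  };
  if (token) headers.Authorization = `Bearer ${token}`;
  return headers;
}

// Milliseconds to wait before the next call, given a fetch Response's Headers.
function waitMsFromRateLimit(headers, nowMs = Date.now()) {
  const rawRemaining = headers.get("x-ratelimit-remaining");
  const rawReset = headers.get("x-ratelimit-reset");
  if (rawRemaining === null || rawReset === null) return 0; // headers absent
  if (Number(rawRemaining) > 0) return 0; // budget left, no need to wait
  return Math.max(0, Number(rawReset) * 1000 - nowMs);
}
```

With Node 18's global fetch, a wrapper could call fetch(url, { headers: githubHeaders() }) and sleep for waitMsFromRateLimit(res.headers) before the next request.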
Embeddable badge

Available slugs: claude, copilot, cursor, codex, devin, aider, gemini, openai.
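A shields-style badge is a two-segment SVG: a grey label on the left and a colored value on the right. A minimal sketch of a generator, assuming a fixed 7 px average glyph width instead of real text measurement; the function names, colors, and layout constants are hypothetical, not taken from routes/badge.js. A route would send the resulting string with Content-Type: image/svg+xml.

```javascript
const XML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
const escapeXml = (s) => String(s).replace(/[&<>"']/g, (c) => XML_ESCAPES[c]);

// Builds a 20 px tall two-segment badge as an SVG string.
function renderBadge(label, message, color = "#4c1") {
  const CHAR_W = 7; // rough average glyph width at 11 px, an approximation
  const PAD = 10;
  const l = String(label);
  const m = String(message);
  const lw = l.length * CHAR_W + PAD;
  const mw = m.length * CHAR_W + PAD;
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${lw + mw}" height="20" role="img" aria-label="${escapeXml(l)}: ${escapeXml(m)}">`,
    `<rect width="${lw}" height="20" fill="#555"/>`,
    `<rect x="${lw}" width="${mw}" height="20" fill="${escapeXml(color)}"/>`,
    `<g fill="#fff" font-family="Verdana,DejaVu Sans,sans-serif" font-size="11" text-anchor="middle">`,
    `<text x="${lw / 2}" y="14">${escapeXml(l)}</text>`,
    `<text x="${lw + mw / 2}" y="14">${escapeXml(m)}</text>`,
    `</g></svg>`,
  ].join("");
}
```

Escaping every interpolated value matters here because the slug and any computed label end up inside SVG markup.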
Stack
- Node.js 18+, Express 4
- better-sqlite3 in WAL mode
- node-cron for the 30-minute refresh
- helmet + compression
- Vanilla JS SPA — no frameworks, no CDN, no Tailwind, no Chart.js (charts are inline SVG)
Run locally
cp .env.example .env
npm install
npm start
# open http://localhost:4786/agent-bloat/
Optional but recommended:
echo 'GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx' >> .env
On boot, the server:
- Opens data/agent-bloat.db (WAL mode), applies the schema, and seeds the static agents identity rows.
- Listens on PORT (default 4786) under BASE_PATH (default /agent-bloat).
- After 3 s, runs the first refresh: walks /search/commits for each agent, resolves PRs, and samples the human baseline.
- Schedules `*/30 * * * *` for subsequent refreshes.
Manual one-off refresh
npm run refresh
…or from a running instance:
curl -X POST http://localhost:4786/agent-bloat/api/refresh
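Because the refresh endpoint is public, two clients (or a client and the cron job) can trigger it at the same moment. One way to keep that safe is a single-flight guard that hands concurrent callers the same in-progress promise. This is a sketch of that technique under the assumption that overlapping sweeps should be collapsed; singleFlight is a hypothetical helper, and the project may instead rely on its per-SHA cache and upserts for idempotency.

```javascript
// Wraps an async task so concurrent calls share one in-flight run.
function singleFlight(task) {
  let inFlight = null;
  return function run() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(task)
        .finally(() => {
          inFlight = null; // allow the next call to start a fresh run
        });
    }
    return inFlight;
  };
}
```

For example, the cron job and the POST handler could both call a hypothetical refreshOnce = singleFlight(runRefresh), so a manual refresh during a scheduled one simply waits for the scheduled run to finish.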
File layout
agent-bloat/
├── SPEC.md
├── README.md (this file)
├── CLAUDE.md
├── package.json
├── .env.example
├── server.js express bootstrap + cron
├── db.js better-sqlite3 (WAL) + schema + prepared stmts
├── lib/
│ ├── agents.js static agent table + fingerprint attribution
│ ├── github.js fetch wrapper with pacing + rate-limit awareness
│ └── stats.js median / percentile / histogram / summarise
├── fetchers/
│ ├── search.js /search/commits walker per agent
│ ├── prs.js sha → PR resolution + diff fetch + human-baseline sweep
│ └── refresh.js orchestrator (boot + cron entry)
├── routes/
│ ├── api.js all /api/* endpoints
│ ├── badge.js SVG badge generator
│ └── pages.js SPA index route
└── public/
├── index.html SPA shell + 5 tabs + window toggle
├── app.js tab routing, fetches, inline-SVG charts
└── style.css dark theme tokens
Differentiation
Sister products in the Holy AI gallery:
- vibeindex — counts AI-coauthored commits. agent-bloat measures size.
- shipboard — leaderboard of agents by PRs merged. agent-bloat measures how big those PRs are.
- slop-lens — per-snippet quality score. agent-bloat aggregates per-PR diff sizes across all of public GitHub.
- token-lens — forensic per-session token-waste analysis (upload-based). agent-bloat needs no uploads and uses only public data.
License
Internal Holy AI project. Source closed by default.