agent-bloat

Live, public PR-size verbosity leaderboard for AI coding agents on public GitHub.

Answers a single question with real data: how big are the merged pull requests authored by each major AI coding agent on public GitHub?

Tracked agents (May 2026): Claude Code, GitHub Copilot, Cursor, OpenAI Codex, Devin, Aider, Gemini, OpenAI (generic).

This is one of the Holy AI product gallery dashboards.

What it shows

Leaderboard — per-agent avg / median / p90 / p99 lines changed per merged PR, avg files changed, and the "bloat vs human" ratio measured against human-authored PRs in the same repos.
Distribution — histogram of PR sizes per agent (buckets 0-10 / 11-50 / 51-100 / 101-300 / 301-1000 / 1001-3000 / 3001+).
Hall of Fame — top N largest merged PRs in the window, one per card, each with a link out to GitHub.
By Language & Repo — agent × language matrix and the top 20 AI-PR-heavy repos with the per-repo bloat-vs-human ratio.
Trends — weekly avg LOC per agent over the last 90 days.
SVG badge — GET /agent-bloat/api/badge?agent=<slug> returns a shields-style SVG you can drop into a README.
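
The leaderboard and distribution figures can be sketched as follows. This is a minimal illustration, not the project's lib/stats.js: the function names, the nearest-rank percentile method, and the input shape (an array of lines-changed totals, one per merged PR) are assumptions. Only the bucket edges come from the list above.

```javascript
// Hypothetical helpers; names and the nearest-rank method are assumptions.
const BUCKETS = [
  { label: "0-10", max: 10 },
  { label: "11-50", max: 50 },
  { label: "51-100", max: 100 },
  { label: "101-300", max: 300 },
  { label: "301-1000", max: 1000 },
  { label: "1001-3000", max: 3000 },
  { label: "3001+", max: Infinity },
];

// Nearest-rank percentile over an ascending array:
// the smallest value with at least p% of the samples at or below it.
function percentile(sorted, p) {
  if (sorted.length === 0) return null;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Summary stats for one agent. Returns null when there are no PRs,
// so the caller can leave the row empty instead of inventing numbers.
function summarise(sizes) {
  if (sizes.length === 0) return null;
  const sorted = [...sizes].sort((a, b) => a - b);
  const count = sorted.length;
  const avg = sorted.reduce((acc, n) => acc + n, 0) / count;
  const mid = Math.floor(count / 2);
  const median =
    count % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    count,
    avg,
    median,
    p90: percentile(sorted, 90),
    p99: percentile(sorted, 99),
  };
}

// Counts PRs per size bucket. Assumes sizes are non-negative numbers.
function histogram(sizes) {
  const counts = Object.fromEntries(BUCKETS.map((b) => [b.label, 0]));
  for (const n of sizes) {
    const bucket = BUCKETS.find((b) => n <= b.max);
    counts[bucket.label] += 1;
  }
  return counts;
}
```

With few samples, p90 and p99 both collapse onto the largest PR, which is worth keeping in mind when reading rows for agents with a small PR count.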

Where the data comes from (no mocks, no seeds)

Every numeric value is derived at runtime from public endpoints on api.github.com:

| Endpoint | Used for | Cadence |
|---|---|---|
| GET /search/commits?q="Co-Authored-By: <agent>" committer-date:>=YYYY-MM-DD | Discovering recent commits attributed to each agent | every 30 minutes (`*/30 * * * *`) |
| GET /repos/{owner}/{repo}/commits/{sha}/pulls | Resolving a commit SHA to its merged PR | cached forever per SHA |
| GET /repos/{owner}/{repo}/pulls/{number} | Fetching additions, deletions, changed_files, language for one PR | cached forever once merged |
| GET /search/issues?q=repo:<r> is:pr is:merged -"Co-Authored-By:" | Sampling human-baseline PRs in the same repos | top AI-PR repos each refresh |
| GET /rate_limit | Surfacing the GitHub rate-limit counter in /health | once per refresh |
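
As an illustration of the discovery step in the first table row, the commit-search URL could be assembled like this. The function name, the since parameter, and the per_page value are assumptions made for the sketch; the query shape follows the table.

```javascript
// Hypothetical helper: builds the /search/commits URL for one agent trailer.
// `since` is a Date; only its UTC calendar day is used.
function commitSearchUrl(trailerName, since, page = 1) {
  const day = since.toISOString().slice(0, 10); // YYYY-MM-DD
  const q = `"Co-Authored-By: ${trailerName}" committer-date:>=${day}`;
  // URLSearchParams handles the percent-encoding of quotes, colons, and ">=".
  const params = new URLSearchParams({
    q,
    per_page: "100", // assumed page size
    page: String(page),
  });
  return `https://api.github.com/search/commits?${params}`;
}
```

Letting URLSearchParams do the encoding avoids hand-escaping the quoted trailer phrase.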

If a refresh discovers zero merged PRs for an agent, that row is left empty and the refresh_log records status: "empty". The app does not invent numbers.

Matching on the Co-Authored-By: trailer is a case-insensitive substring match. Where multiple matchers overlap (e.g. Codex is a more specific case of OpenAI), the more specific agent wins, so OpenAI Codex commits are not double-counted under the generic OpenAI bucket.
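
The precedence rule can be sketched as an ordered matcher list in which more specific patterns come first. The needles and the function name below are illustrative assumptions, not the contents of lib/agents.js; only the slugs come from this README.

```javascript
// Hypothetical matcher table, ordered most specific first so that
// "codex" is tested before the generic "openai" bucket.
const MATCHERS = [
  { slug: "codex", needle: "codex" },
  { slug: "claude", needle: "claude" },
  { slug: "copilot", needle: "copilot" },
  { slug: "openai", needle: "openai" },
];

// Returns the slug of the most specific agent named in any
// Co-Authored-By trailer of the commit message, or null if none match.
function attribute(commitMessage) {
  const trailers = commitMessage
    .split(/\r?\n/)
    .map((line) => line.trim().toLowerCase()) // case-insensitive comparison
    .filter((line) => line.startsWith("co-authored-by:"));
  const hit = MATCHERS.find((m) => trailers.some((t) => t.includes(m.needle)));
  return hit ? hit.slug : null;
}
```

Because each commit resolves to at most one slug, a trailer such as "OpenAI Codex" lands in the codex bucket only.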

API surface (all public, no auth)

GET  /agent-bloat/health                          # ok, version, prs_indexed, last_refresh, ...
GET  /agent-bloat/api/leaderboard?window=7d|30d|all
GET  /agent-bloat/api/distribution?agent=<slug>&window=...
GET  /agent-bloat/api/hall-of-fame?window=...&limit=10
GET  /agent-bloat/api/by-language?window=...
GET  /agent-bloat/api/by-repo?window=...&limit=20
GET  /agent-bloat/api/trend?agent=<slug>&days=90
GET  /agent-bloat/api/recent?agent=<slug>&limit=50
GET  /agent-bloat/api/badge?agent=<slug>           # SVG
GET  /agent-bloat/api/agents
GET  /agent-bloat/api/refresh-log?limit=20
GET  /agent-bloat/api/rate-limit
POST /agent-bloat/api/refresh                     # public, idempotent

There is no authentication anywhere in this app. The optional GITHUB_TOKEN env var is used outbound to api.github.com only — it lifts the unauthenticated 60 req/h limit up to the standard 5,000 req/h.
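
A minimal sketch of how the optional token might be attached to outbound requests only. The helper name and the User-Agent value are assumptions; the Accept media type and the Bearer authorization scheme are the ones GitHub's REST API accepts.

```javascript
// Hypothetical helper: headers for outbound calls to api.github.com.
// Nothing here is applied to inbound requests, which stay unauthenticated.
function githubHeaders(env = process.env) {
  const headers = {
    Accept: "application/vnd.github+json",
    "User-Agent": "agent-bloat", // assumed value
  };
  // The token is optional; without it, requests fall under the
  // unauthenticated rate limit.
  if (env.GITHUB_TOKEN) {
    headers.Authorization = `Bearer ${env.GITHUB_TOKEN}`;
  }
  return headers;
}
```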

Embeddable badge

![claude bloat](https://holyai.me/agent-bloat/api/badge?agent=claude)

Available slugs: claude, copilot, cursor, codex, devin, aider, gemini, openai.
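
To generate the embed line for any tracked agent, something like the following would do. The function name and the origin parameter are hypothetical; the slugs and the URL pattern are taken from the lines above.

```javascript
const SLUGS = [
  "claude", "copilot", "cursor", "codex",
  "devin", "aider", "gemini", "openai",
];

// Hypothetical helper: returns the README image line for one agent badge.
function badgeMarkdown(slug, origin = "https://holyai.me") {
  if (!SLUGS.includes(slug)) {
    throw new Error(`Unknown agent slug: ${slug}`);
  }
  const url = `${origin}/agent-bloat/api/badge?agent=${encodeURIComponent(slug)}`;
  return `![${slug} bloat](${url})`;
}
```

For a local instance, pass the local origin instead, for example badgeMarkdown("codex", "http://localhost:4786").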

Stack

Node.js 18+, Express 4
better-sqlite3 in WAL mode
node-cron for the 30-minute refresh
helmet + compression
Vanilla JS SPA — no frameworks, no CDN, no Tailwind, no Chart.js (charts are inline SVG)

Run locally

cp .env.example .env
npm install
npm start
# open http://localhost:4786/agent-bloat/

Optional but recommended:

echo 'GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx' >> .env

The boot sequence runs these steps in order:

Open data/agent-bloat.db (WAL mode), apply schema, seed the static agents identity rows.
Listen on PORT (default 4786) under BASE_PATH (default /agent-bloat).
After 3 s, run the first refresh: walk /search/commits for each agent, resolve PRs, sample the human baseline.
Schedule subsequent refreshes with the cron expression `*/30 * * * *` (every 30 minutes).
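
The ordering of these steps can be sketched with injected dependencies, so the sequence is visible without a real database or scheduler. Everything here is hypothetical: the boot function, the shape of the deps object, and the five-field cron string used for "every 30 minutes". It is not the project's server.js.

```javascript
// Hypothetical sketch of the boot order. The deps object stands in for
// better-sqlite3, Express, a timer, and node-cron.
function boot(deps, env = {}) {
  const port = Number(env.PORT) || 4786; // default port from this README
  const basePath = env.BASE_PATH || "/agent-bloat"; // default base path

  deps.openDb("data/agent-bloat.db"); // open (WAL), apply schema, seed agents
  deps.listen(port, basePath); // start serving before any data exists
  deps.setTimer(deps.refresh, 3000); // first refresh after 3 s
  deps.schedule("*/30 * * * *", deps.refresh); // assumed cron string
  return { port, basePath };
}
```

In a real process, setTimer would be setTimeout and schedule would be the scheduling call of node-cron. Listening before the first refresh means the health endpoint can respond while the initial crawl is still running.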

Manual one-off refresh

npm run refresh

…or from a running instance:

curl -X POST http://localhost:4786/agent-bloat/api/refresh

File layout

agent-bloat/
├── SPEC.md
├── README.md          (this file)
├── CLAUDE.md
├── package.json
├── .env.example
├── server.js          express bootstrap + cron
├── db.js              better-sqlite3 (WAL) + schema + prepared stmts
├── lib/
│   ├── agents.js      static agent table + fingerprint attribution
│   ├── github.js      fetch wrapper with pacing + rate-limit awareness
│   └── stats.js       median / percentile / histogram / summarise
├── fetchers/
│   ├── search.js      /search/commits walker per agent
│   ├── prs.js         sha → PR resolution + diff fetch + human-baseline sweep
│   └── refresh.js     orchestrator (boot + cron entry)
├── routes/
│   ├── api.js         all /api/* endpoints
│   ├── badge.js       SVG badge generator
│   └── pages.js       SPA index route
└── public/
    ├── index.html     SPA shell + 5 tabs + window toggle
    ├── app.js         tab routing, fetches, inline-SVG charts
    └── style.css      dark theme tokens

Differentiation

Sister products in the Holy AI gallery:

vibeindex — counts AI-coauthored commits. agent-bloat measures size.
shipboard — leaderboard of agents by PRs merged. agent-bloat measures how big each one is.
slop-lens — per-snippet quality score. agent-bloat is per-PR diff-size aggregates across all of GitHub.
token-lens — forensic per-session token-waste (upload-based). agent-bloat is upload-less, public.

License

Internal Holy AI project. Source closed by default.