spec-rot
A public leaderboard scoring open-source repos on how AI-implementable their issues are.
The bottleneck isn't the AI — it's the spec. Every Cursor/Claude PR that stalls on "what exactly do you mean?" is a vague-issue tax. spec-rot runs each repo's open issues through a Claude rubric (six dimensions, 0–10) and ranks repos by how readable they are to an autonomous coding agent.
What it does
For each tracked repo:
- Pull the 25 most-recently-updated open issues via the GitHub REST API (PRs filtered out).
- Score each issue through Claude Haiku 4.5 against six rubric dimensions:
  - acceptance_criteria: explicit pass/fail conditions?
  - ambiguity: precise vs. hand-wavy?
  - reproduction: repro steps for bugs / trigger condition for features?
  - defined_terms: names files, APIs, identifiers?
  - scope_clarity: bounded, single change?
  - testable_outcome: verifiable by automation?
- Aggregate to a repo-level mean 0–10 score plus per-dimension averages.
- Surface top/bottom 3 issues, an SVG share card, and a per-repo audit endpoint.
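The aggregation step can be sketched as follows. This is a minimal illustration, not the project's actual code: the issue shape (`score`, `dimensions`, `score_status`) mirrors the fields listed under Endpoints, while `aggregateRepo` and `mean` are hypothetical names. It assumes pending and errored issues are left out of the averages.

```javascript
// The six rubric dimensions named in the README.
const DIMENSIONS = [
  "acceptance_criteria",
  "ambiguity",
  "reproduction",
  "defined_terms",
  "scope_clarity",
  "testable_outcome",
];

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Hypothetical helper: roll per-issue scores up to a repo-level summary.
function aggregateRepo(issues) {
  // Assumption: only issues that are neither pending nor errored count.
  const scored = issues.filter(
    (i) => i.score_status !== "pending" && i.score_status !== "error"
  );
  if (scored.length === 0) {
    return { score: null, issues_scored: 0, dimension_averages: {} };
  }
  const dimension_averages = {};
  for (const d of DIMENSIONS) {
    dimension_averages[d] = mean(scored.map((i) => i.dimensions[d]));
  }
  return {
    score: mean(scored.map((i) => i.score)),
    issues_scored: scored.length,
    dimension_averages,
  };
}
```

Returning `null` for a repo with no scored issues keeps "not scored yet" distinct from a genuine score of 0.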
A daily cron at 03:00 UTC refreshes the 20 seed repos. Only changed-content issues (by sha256(title + body)) get re-scored, so re-runs are cheap. The public audit endpoint runs the same pipeline on any user-submitted GitHub URL.
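A minimal sketch of the changed-content check, assuming the hash is a hex SHA-256 over the plain concatenation of title and body. The README only says sha256(title + body), so the null handling and the helper names `contentHash` and `needsRescore` are assumptions.

```javascript
const { createHash } = require("node:crypto");

// Hex SHA-256 of title + body. GitHub returns a null body for issues
// without a description, so both fields default to an empty string.
function contentHash(issue) {
  return createHash("sha256")
    .update(`${issue.title ?? ""}${issue.body ?? ""}`, "utf8")
    .digest("hex");
}

// Hypothetical helper: a fetched issue needs re-scoring only when its
// hash differs from the one stored alongside the previous score.
function needsRescore(fetched, storedHash) {
  return contentHash(fetched) !== storedHash;
}
```

Because labels, comments, and timestamps are not part of the hash, an issue that was merely commented on keeps its existing score.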
Data sources
| Source | URL | Refresh |
|---|---|---|
| GitHub Issues REST | https://api.github.com/repos/{owner}/{name}/issues?state=open&sort=updated&per_page=25 | Seed repos: daily 03:00 UTC. Audits: on submit. |
| GitHub Repo REST (stars, description) | https://api.github.com/repos/{owner}/{name} | Same as issues. |
| Claude Haiku 4.5 | https://api.anthropic.com/v1/messages (model claude-haiku-4-5-20251001) — or https://openrouter.ai/api/v1/chat/completions (model anthropic/claude-haiku-4.5) | Per-issue, on demand for any issue still in score_status='pending'. |
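As an illustration of the issues fetch, here is a hedged sketch using Node's built-in `fetch`. The function names and the User-Agent string are hypothetical. The filter relies on the documented GitHub behaviour that pull requests returned by the issues endpoint carry a `pull_request` key.

```javascript
// Build the issues URL from the table above.
function issuesUrl(owner, name) {
  const repo = `${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
  return `https://api.github.com/repos/${repo}/issues?state=open&sort=updated&per_page=25`;
}

// The issues endpoint also lists pull requests; drop anything with a
// `pull_request` key so only real issues remain.
function dropPullRequests(items) {
  return items.filter((item) => !("pull_request" in item));
}

// Hypothetical fetch wrapper; sends the token only when one is configured.
async function fetchOpenIssues(owner, name, token = process.env.GITHUB_TOKEN) {
  const headers = {
    Accept: "application/vnd.github+json",
    "User-Agent": "spec-rot-example",
  };
  if (token) headers.Authorization = `Bearer ${token}`;
  const res = await fetch(issuesUrl(owner, name), { headers });
  if (!res.ok) throw new Error(`GitHub responded ${res.status}`);
  return dropPullRequests(await res.json());
}
```

Since filtering happens after the page is fetched, a repo with many recently updated PRs may yield fewer than 25 issues from a single request.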
GitHub auth: set GITHUB_TOKEN (PAT, public-repo read scope) for 5,000 req/h. The unauthenticated fallback is 60 req/h, which is enough for the seed set on a first run but too tight for the cron to rely on.
Score-provider auth: set ANTHROPIC_API_KEY (direct) or OPENROUTER_API_KEY (platform default, same model). If both are set, Anthropic direct wins. Without either, issues are fetched but stay pending until a key is configured.
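The provider precedence described above could look roughly like this. `pickProvider` and its return shape are hypothetical; the URLs and model IDs are the ones listed under Data sources.

```javascript
// Hypothetical helper: choose the scoring provider from environment variables.
// Anthropic direct wins when both keys are present; null means no key is set,
// in which case issues would be fetched but left pending.
function pickProvider(env = process.env) {
  if (env.ANTHROPIC_API_KEY) {
    return {
      name: "anthropic",
      url: "https://api.anthropic.com/v1/messages",
      model: "claude-haiku-4-5-20251001",
    };
  }
  if (env.OPENROUTER_API_KEY) {
    return {
      name: "openrouter",
      url: "https://openrouter.ai/api/v1/chat/completions",
      model: "anthropic/claude-haiku-4.5",
    };
  }
  return null;
}
```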
No mock, no synthetic data. If both APIs are unreachable the leaderboard simply shows whatever is in the SQLite DB with a "stale" badge.
Endpoints
All under /spec-rot:
| Method | Path | What it returns |
|---|---|---|
| GET | /health | {ok:true} (HTTP 200, no auth). |
| GET | / | SPA shell. |
| GET | /api/leaderboard | Array of {full_name, score, issues_scored, stars, last_fetched_at, stale, dimensions, last_error}, sorted by score desc. |
| GET | /api/repo/:owner/:name | {full_name, description, stars, score, dimension_averages, issues_scored, issues_total, errors, last_fetched_at, best:[3], worst:[3]}. |
| GET | /api/repo/:owner/:name/issues | Full issue list with scores. Query ?sort=score_asc\|score_desc. |
| GET | /api/issue/:owner/:name/:number | {title, body, html_url, labels, score, dimensions, rationale, score_status, scored_at}. |
| POST | /api/audit | Body {url}. Validates github.com/{owner}/{name}, rate-limits 1/IP/10min, returns {id, repo_full_name} and triggers async fetch+score. |
| GET | /api/audit/:id | {status, progress:{fetched, scored, total}, repo_full_name, error}. Status flow: queued → fetching → scoring → done (or error). |
| GET | /api/stats | {repos, repos_scored, issues_scored, issues_total, issues_error, last_refresh_at, total_calls_today}. |
| GET | /card/:owner/:name.svg | 1200×630 SVG, image/svg+xml, 1h cache. |
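For the POST /api/audit validation step, a sketch of one way to check a submitted URL against the github.com/{owner}/{name} shape. The function name, the HTTPS-only rule, the `.git` suffix handling, and the allowed character set are all assumptions; the README does not specify them.

```javascript
// Hypothetical validator: returns { owner, name, full_name } or null.
function parseGithubRepoUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
  // Assumption: only https://github.com is accepted (no www, no http).
  if (url.protocol !== "https:" || url.hostname !== "github.com") return null;

  // Exactly two path segments: /{owner}/{name}, trailing slash tolerated.
  const parts = url.pathname.split("/").filter(Boolean);
  if (parts.length !== 2) return null;

  const [owner, rawName] = parts;
  const name = rawName.replace(/\.git$/, "");
  const allowed = /^[A-Za-z0-9_.-]+$/;
  if (!allowed.test(owner) || !allowed.test(name)) return null;

  return { owner, name, full_name: `${owner}/${name}` };
}
```

Rejecting deeper paths such as issue or PR links keeps the audit scoped to whole repositories, matching the {id, repo_full_name} response.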
Run locally
Requires Node ≥ 22.
```bash
npm install
cp .env.example .env
# edit .env: at minimum set GITHUB_TOKEN to avoid the 60 req/h limit,
# and ONE of (ANTHROPIC_API_KEY, OPENROUTER_API_KEY) to enable scoring

# optional one-shot warmup before booting (fetches and scores all 20 seed repos)
node warmup.js

# boot
PORT=4807 node server.js
# → http://localhost:4807/spec-rot/
```
On first boot, if no scored issues exist yet, the server kicks off a background warmup automatically. Set SPEC_ROT_NO_INITIAL_REFRESH=1 to skip.
Operational notes
- DB: SQLite (WAL). One file: ./spec-rot.db (or $DB_PATH). Schema in db.js. Foreign keys on; cascades on repo delete.
- Cost: 20 repos × 25 issues × ~$0.001 per Haiku call ≈ $0.50 per full refresh. Incremental runs touch only changed issues.
- Rate limit: /api/audit is gated to 1 submission per IP per 10 minutes via an in-memory token bucket. Public, no API keys, no accounts.
- Failures: GitHub failure → repo marked with last_error and skipped; UI shows a stale pill after 36h. LLM failure → 1 retry after 2s for 429/5xx; hard fail marks the issue score_status='error' and excludes it from the aggregate.
- Scope: the score is on the spec, not the code. No PR/commit scoring, no fix suggestions, no historical charts, no auth, no private repos. See SPEC.md §6.
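The LLM failure policy in the notes above (one retry after 2s on 429/5xx, then a hard fail) can be sketched like this. `callModel` is a hypothetical async function that returns a fetch-style Response, and the return shape is an assumption. Network-level exceptions are not handled here.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry only on rate limiting (429) and server errors (5xx).
const isRetryable = (status) => status === 429 || (status >= 500 && status <= 599);

// Hypothetical wrapper: at most one retry, after `delayMs` (2s by default).
async function scoreWithRetry(callModel, { delayMs = 2000 } = {}) {
  let res = await callModel();
  if (!res.ok && isRetryable(res.status)) {
    await sleep(delayMs);
    res = await callModel();
  }
  if (!res.ok) {
    // Hard fail: the caller would mark the issue score_status='error'.
    return { ok: false, score_status: "error", error: `HTTP ${res.status}` };
  }
  return { ok: true, payload: await res.json() };
}
```

A 4xx other than 429 fails immediately, since retrying a malformed or unauthorized request would only spend another call.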
Deviation from SPEC
SPEC.md §3 specifies calling the Anthropic Messages API directly. The platform vault standard is OpenRouter, so this build supports both (Anthropic if ANTHROPIC_API_KEY is set, else OpenRouter via OPENROUTER_API_KEY) and uses the same claude-haiku-4-5 family in both paths. The deploy manifest declares OPENROUTER_API_KEY to match the platform.