bug-bot-bench
Live leaderboard of CVEs discovered by AI vulnerability hunters. Big Sleep, OpenAI Daybreak, ZeroPath, XBOW, AISLE, Team Atlantis, Naptime, PentestGPT, Metasploit-AI, Auto-Bounty — who's actually finding the bugs?
A public dashboard that scores every CVE in the National Vulnerability Database that is publicly credited to an AI-driven vulnerability-discovery system. Every CVE shown was fetched live from public sources — no mocks, no seed data, no presets.
Live: https://holyai.me/bug-bot-bench/
Stack: Node.js 20 · Express · better-sqlite3 (WAL) · node-cron · helmet · compression
Port: 4794 · BASE_PATH: /bug-bot-bench · Auth: none — every endpoint is public.
Why now
- 2026-05-10: OpenAI launches Daybreak and GPT-5.5-Cyber for AI-powered vulnerability detection and patch validation.
- 2026-05-11: First confirmed AI-built zero-day exploit (CVE-2026-42897) used in the wild.
- 2024-2026: Google's Big Sleep has 20+ public CVE credits across SQLite, GStreamer, libxml2, and more.
- Early 2026: AISLE discloses 12/12 zero-days in OpenSSL using AI logic analysis.
- 2025-2026: XBOW becomes a top-100 ranked HackerOne hunter — fully autonomous.
The question every security desk now asks: "Of the CVEs published this month, how many were AI-found, and by which system?" Nobody answers it publicly. bug-bot-bench does.
What it shows
- Hunter leaderboard — sortable table of every AI bug hunter we track, with CVE count, severity breakdown (Crit / High / Med / Low), and the latest find.
- Live Feed — most recently detected attributions across all hunters, with severity pills, hunter chips, and time-ago timestamps.
- Hall of Fame — highest-CVSS AI-attributed CVEs in the window, card-based, click-out to the NVD detail page.
- Trends — 90-day cumulative line chart per hunter, hand-rolled SVG (no chart library).
- Methodology — full list of upstream sources, refresh cadences, scoring weights, and the detector patterns for every hunter. Plus the last-fetch table so you can see the pipeline running.
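The Trends chart needs very little code once daily counts are available. The sketch below shows one way to build a cumulative series and a hand-rolled SVG polyline from it. The helper names (`cumulative`, `polylinePoints`) and the 300x100 viewport are assumptions for illustration, not the code in public/app.js.

```javascript
// Hypothetical helper: turn per-day attribution counts into a running total.
function cumulative(dailyCounts) {
  let total = 0;
  return dailyCounts.map((n) => (total += n));
}

// Map a cumulative series onto a width x height viewport and return the
// value for an SVG <polyline points="..."> attribute.
function polylinePoints(series, width, height) {
  const max = Math.max(1, ...series); // guards against empty or all-zero data
  const stepX = series.length > 1 ? width / (series.length - 1) : 0;
  return series
    .map((v, i) => {
      const x = i * stepX;
      const y = height - (v / max) * height; // SVG y axis grows downward
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(' ');
}

// Four days of made-up counts for one hunter.
const series = cumulative([0, 2, 0, 1]);
const svg =
  '<svg viewBox="0 0 300 100" xmlns="http://www.w3.org/2000/svg">' +
  `<polyline fill="none" stroke="currentColor" points="${polylinePoints(series, 300, 100)}"/>` +
  '</svg>';
```

One polyline per hunter, each with its own stroke color, would give the multi-line chart described above without a chart library.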
Tracked AI bug hunters (May 2026)
| Slug | Name | Operator |
|---|---|---|
| big-sleep | Big Sleep | Google DeepMind + Project Zero |
| daybreak | Daybreak (GPT-5.5-Cyber) | OpenAI |
| zeropath | ZeroPath | ZeroPath, Inc. |
| xbow | XBOW | XBOW Engineering |
| aisle | AISLE | AISLE Security |
| atlantis | Team Atlantis | DARPA AIxCC finalist |
| pentestgpt | PentestGPT | open-source autopentest |
| naptime | Naptime | Google Project Zero (Big Sleep predecessor) |
| metasploit-ai | Metasploit AI | Rapid7 |
| auto-bounty | Auto-Bounty | autonomous bug-bounty collective |
The roster is editable via data/hunters.json and reloaded on boot. It is detector configuration — the keyword + domain patterns we look for in each CVE record — not CVE data. Every CVE and every severity in the database is fetched live from NVD.
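As a rough illustration, a roster entry might look like the object below. The `slug`, `name`, and operator columns and the `keyword_patterns` / `domain_patterns` field names come from this README; the `operator` key, the pattern values, and the `validateRoster` helper are hypothetical. A boot-time check along these lines would catch a malformed edit before the first sweep.

```javascript
// Hypothetical roster entry; the pattern values are made up for illustration.
const exampleRoster = [
  {
    slug: 'example-hunter',
    name: 'Example Hunter',
    operator: 'Example Labs',
    keyword_patterns: ['example hunter'],
    domain_patterns: ['example.com/security'],
  },
];

// Return a list of human-readable problems; an empty list means the roster looks usable.
function validateRoster(roster) {
  const problems = [];
  const seen = new Set();
  roster.forEach((h, i) => {
    if (typeof h.slug !== 'string' || !/^[a-z0-9-]+$/.test(h.slug)) {
      problems.push(`entry ${i}: slug must be lowercase letters, digits, or hyphens`);
    } else if (seen.has(h.slug)) {
      problems.push(`entry ${i}: duplicate slug "${h.slug}"`);
    } else {
      seen.add(h.slug);
    }
    for (const field of ['keyword_patterns', 'domain_patterns']) {
      const v = h[field];
      if (!Array.isArray(v) || v.some((p) => typeof p !== 'string' || p.trim() === '')) {
        problems.push(`entry ${i}: ${field} must be an array of non-empty strings`);
      }
    }
  });
  return problems;
}
```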
Data sources (live, public — no mocks, no seeds)
| Source | URL | Refresh | Auth |
|---|---|---|---|
| NVD CVE API 2.0 | https://services.nvd.nist.gov/rest/json/cves/2.0 | every 30 min (per-hunter keywordSearch), every 2 h (lastModStartDate window) | optional NVD_API_KEY (raises the limit from 5 to 50 requests per 30 s) |
| GitHub Security Advisories | https://api.github.com/advisories | every 4 h | optional GITHUB_TOKEN (raises the limit from 60 to 5,000 requests per hour) |
| OSV.dev | https://api.osv.dev/v1/vulns/{id} | on-demand per CVE | none |
The first NVD sweep fires ~8 s after boot. CVEs that match zero hunter patterns are not persisted — bug-bot-bench is a focused mirror of AI-attributed CVEs, not a general NVD clone.
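For orientation, the two NVD request shapes can be sketched as follows. `keywordSearch`, `lastModStartDate`, `lastModEndDate`, and the `apiKey` request header are documented NVD API 2.0 names; the function names, the 48-hour argument, and the fixed-spacing throttle are assumptions and may differ from what fetchers/nvd.js does.

```javascript
const NVD_BASE = 'https://services.nvd.nist.gov/rest/json/cves/2.0';

// Keyword sweep: one request per hunter keyword.
function keywordSearchUrl(keyword) {
  const url = new URL(NVD_BASE);
  url.searchParams.set('keywordSearch', keyword);
  return url.toString();
}

// Recent-window sweep. NVD expects lastModStartDate and lastModEndDate together.
function recentWindowUrl(now, hoursBack) {
  const url = new URL(NVD_BASE);
  const start = new Date(now.getTime() - hoursBack * 3600 * 1000);
  url.searchParams.set('lastModStartDate', start.toISOString());
  url.searchParams.set('lastModEndDate', now.toISOString());
  return url.toString();
}

// Simplest possible throttle (an assumption): space requests evenly so that
// no rolling 30 s window exceeds 5 requests (no key) or 50 requests (with key).
function minDelayMs(hasApiKey) {
  const requestsPerWindow = hasApiKey ? 50 : 5;
  return Math.ceil(30000 / requestsPerWindow);
}

// NVD reads the key from an `apiKey` request header.
function requestHeaders(apiKey) {
  return apiKey ? { apiKey } : {};
}
```

With these helpers a sweep reduces to `fetch(keywordSearchUrl(kw), { headers: requestHeaders(process.env.NVD_API_KEY) })`, followed by a wait of `minDelayMs(...)` before the next keyword.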
Attribution scoring
For each fetched CVE we compute a separate score for every hunter on the roster by summing the weights of the signals that match that hunter:
| Signal | Weight |
|---|---|
| Reference URL hostname/path matches a hunter's domain_patterns | +1.0 |
| English description contains a hunter's keyword_patterns (substring, case-insensitive) | +0.6 |
| NVD cve.credits[].value matches a hunter's keyword_patterns | +1.2 |
A CVE is admitted to the leaderboard when any single hunter scores ≥ 0.6. Multiple hunters can match a single CVE — each attribution is stored separately.
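The scoring rule above can be sketched in a few lines. The weights and the 0.6 threshold are taken from the table; everything else is an assumption made for illustration: the flattened `cve` shape (`referenceUrls`, `description`, `credits`), plain substring matching for domain patterns, and counting each signal at most once per hunter. lib/attribute.js may differ on all three.

```javascript
const WEIGHTS = { domain: 1.0, keyword: 0.6, credit: 1.2 };
const THRESHOLD = 0.6;

// Case-insensitive substring test against a list of patterns.
function includesAny(text, patterns) {
  const haystack = text.toLowerCase();
  return patterns.some((p) => haystack.includes(p.toLowerCase()));
}

// Score one hunter against one CVE. Each signal contributes at most once (assumption).
function scoreHunter(cve, hunter) {
  let score = 0;
  if (cve.referenceUrls.some((u) => includesAny(u, hunter.domain_patterns))) score += WEIGHTS.domain;
  if (includesAny(cve.description, hunter.keyword_patterns)) score += WEIGHTS.keyword;
  if (cve.credits.some((c) => includesAny(c, hunter.keyword_patterns))) score += WEIGHTS.credit;
  return score;
}

// One attribution per hunter that clears the threshold.
// An empty result means the CVE is not persisted.
function attribute(cve, roster) {
  return roster
    .map((h) => ({ slug: h.slug, score: scoreHunter(cve, h) }))
    .filter((a) => a.score >= THRESHOLD);
}
```

Because the description weight equals the threshold, a single keyword hit in the description is enough to admit a CVE; a credit or reference match only raises the score.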
API
All endpoints are served under /bug-bot-bench and require no authentication.
| Method | Path | Description |
|---|---|---|
| GET | /bug-bot-bench/health | Liveness + DB row counts + last fetch timestamp |
| GET | /bug-bot-bench/api/leaderboard?window=7d\|30d\|90d\|all | Hunters ranked by attributed CVE count in window |
| GET | /bug-bot-bench/api/hunter/:slug?limit=25 | Single hunter detail + last N attributed CVEs |
| GET | /bug-bot-bench/api/cves?hunter=&severity=&q=&limit=&offset= | Paged search across attributed CVEs |
| GET | /bug-bot-bench/api/cve/:id | Full CVE record with refs and attributions |
| GET | /bug-bot-bench/api/hall-of-fame?window=&limit= | Top CVEs by CVSS in window |
| GET | /bug-bot-bench/api/trend?days=90 | Per-hunter daily attribution counts (line chart) |
| GET | /bug-bot-bench/api/live-feed?limit=20 | Most recent detections across all hunters |
| GET | /bug-bot-bench/api/methodology | Machine-readable methodology + last-fetch log |
| GET | /bug-bot-bench/api/badge?hunter=<slug> | Shields-style SVG badge for embedding |
| POST | /bug-bot-bench/api/refresh?source=keyword-sweep\|recent-window\|ghsa-sync\|daily-snapshot | Manual trigger (60 s cooldown per source) |
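Since the refresh endpoint is public, the per-source cooldown is what keeps it from being hammered. A minimal in-memory sketch of such a gate follows; the source names and the 60 s figure come from the table, while `createCooldownGate`, its return shape, and the injectable clock are hypothetical.

```javascript
const COOLDOWN_MS = 60_000;
const SOURCES = new Set(['keyword-sweep', 'recent-window', 'ghsa-sync', 'daily-snapshot']);

// Returns a function that answers "may this source be triggered right now?".
// The clock is injectable so the gate can be tested without waiting.
function createCooldownGate(now = Date.now) {
  const lastRun = new Map(); // source -> timestamp of the last accepted trigger
  return function tryAcquire(source) {
    if (!SOURCES.has(source)) return { ok: false, reason: 'unknown-source' };
    const t = now();
    const prev = lastRun.get(source);
    if (prev !== undefined && t - prev < COOLDOWN_MS) {
      return { ok: false, reason: 'cooldown', retryAfterMs: COOLDOWN_MS - (t - prev) };
    }
    lastRun.set(source, t);
    return { ok: true };
  };
}
```

In an Express handler, a rejected call would typically map to an error status such as 429 with the remaining wait; whether the real route does so is not stated here.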
Local run
```sh
cp .env.example .env   # optional: keys are recommended but not required
npm install
node server.js
```
Then open http://localhost:4794/bug-bot-bench/.
The first keyword-sweep cron fires ~8 s after boot. The dashboard is live immediately — empty states show until the first hunter match lands.
Cron schedule
| Cron expression | Job | Purpose |
|---|---|---|
| `*/30 * * * *` | keyword-sweep | NVD keywordSearch for every hunter keyword, round-robin |
| `0 */2 * * *` | recent-window | NVD lastModStartDate=now-48h sweep, attribute all matches |
| `15 */4 * * *` | ghsa-sync | Pull last 100 GitHub advisories, cross-match CVE IDs |
| `5 0 * * *` | daily-snapshot | Roll up cve_attribution into daily_snapshot for the trend chart |
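Registration of these jobs might look like the sketch below. It assumes the four jobs use standard five-field cron expressions matching the cadences in the data-sources table (every 30 min, every 2 h, every 4 h, daily). The `schedule` function is injected so the snippet runs without dependencies; with node-cron it would be `require('node-cron').schedule`. The handler map is hypothetical.

```javascript
// Job table: five-field cron expression + job name.
const JOBS = [
  { expr: '*/30 * * * *', name: 'keyword-sweep' },
  { expr: '0 */2 * * *', name: 'recent-window' },
  { expr: '15 */4 * * *', name: 'ghsa-sync' },
  { expr: '5 0 * * *', name: 'daily-snapshot' },
];

// Register every job with the given scheduler. Each handler is wrapped so
// that one failing run is logged and does not take the process down.
function registerJobs(schedule, handlers) {
  for (const { expr, name } of JOBS) {
    schedule(expr, async () => {
      try {
        await handlers[name]();
      } catch (err) {
        console.error(`[cron] ${name} failed:`, err);
      }
    });
  }
}
```

Keeping the schedule as data also lets a methodology endpoint report the same cadences the scheduler actually uses.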
Non-goals
- No PoC exploit listings — credit and severity only.
- No "GPT-4 vs Claude" framing — we score hunter systems, not raw LLMs.
- No subscriptions or email digests in MVP.
Files
```
bug-bot-bench/
  server.js            Express bootstrap + cron registration
  db.js                better-sqlite3 schema + prepared statements
  package.json
  .env.example
  data/hunters.json    Hunter detector configuration
  fetchers/
    nvd.js             NVD CVE API 2.0 client (rate-limit aware)
    ghsa.js            GitHub Security Advisories client
    osv.js             OSV.dev per-CVE enrichment
  lib/
    attribute.js       Detection / scoring logic
    severity.js        CVSS picker (v3.1 > v3.0 > v2)
    cron.js            Job orchestration
    badge.js           Shields-style SVG renderer
  routes/
    api.js             /api/* endpoints
    health.js          /health
  public/
    index.html
    app.js             Vanilla JS SPA
    style.css          Dark theme, hand-rolled CSS
```
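The CVSS preference order noted for lib/severity.js (v3.1 > v3.0 > v2) can be sketched against the `metrics` object of an NVD API 2.0 record. The key names `cvssMetricV31`, `cvssMetricV30`, and `cvssMetricV2` are NVD's; taking the first entry of each array and the `pickSeverity` name are assumptions, so the real module may choose differently (for example, preferring the Primary source).

```javascript
const PREFERENCE = ['cvssMetricV31', 'cvssMetricV30', 'cvssMetricV2'];

// Return { version, score, severity } from the most preferred CVSS version
// present, or null when the record has no usable CVSS data.
function pickSeverity(metrics = {}) {
  for (const key of PREFERENCE) {
    const entry = (metrics[key] || [])[0];
    if (!entry || !entry.cvssData) continue;
    const { baseScore, version } = entry.cvssData;
    // v3.x carries baseSeverity inside cvssData; v2 carries it on the metric entry.
    const severity = entry.cvssData.baseSeverity || entry.baseSeverity || null;
    return { version, score: baseScore, severity };
  }
  return null;
}
```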
License
MIT.