bug-bot-bench

Live leaderboard of CVEs discovered by AI vulnerability hunters. Big Sleep, OpenAI Daybreak, ZeroPath, XBOW, AISLE, Team Atlantis, Naptime, PentestGPT, Metasploit-AI, Auto-Bounty — who's actually finding the bugs?

A public dashboard that scores every CVE in the National Vulnerability Database that is publicly credited to an AI-driven vulnerability-discovery system. Every CVE shown was fetched live from public sources — no mocks, no seed data, no presets.

Live: https://holyai.me/bug-bot-bench/
Stack: Node.js 20 · Express · better-sqlite3 (WAL) · node-cron · helmet · compression
Port: 4794 · BASE_PATH: /bug-bot-bench · Auth: none — every endpoint is public.

Why now

2026-05-10: OpenAI launches Daybreak and GPT-5.5-Cyber for AI-powered vulnerability detection and patch validation.
2026-05-11: First confirmed AI-built zero-day exploit (CVE-2026-42897) used in the wild.
2024-2026: Google's Big Sleep has 20+ public CVE credits across SQLite, GStreamer, libxml2, and more.
Early 2026: AISLE discloses 12/12 zero-days in OpenSSL using AI logic analysis.
2025-2026: XBOW becomes a top-100 ranked HackerOne hunter — fully autonomous.

The question every security desk now asks: "Of the CVEs published this month, how many were AI-found, and by which system?" Nobody answers it publicly. bug-bot-bench does.

What it shows

Hunter leaderboard — sortable table of every AI bug hunter we track, with CVE count, severity breakdown (Crit / High / Med / Low), and the latest find.
Live Feed — most recently detected attributions across all hunters, with severity pills, hunter chips, and time-ago timestamps.
Hall of Fame — highest-CVSS AI-attributed CVEs in the window, card-based, click-out to the NVD detail page.
Trends — 90-day cumulative line chart per hunter, hand-rolled SVG (no chart library).
Methodology — full list of upstream sources, refresh cadences, scoring weights, and the detector patterns for every hunter. Plus the last-fetch table so you can see the pipeline running.
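
The cumulative series behind the Trends chart can be sketched in a few lines. This is an illustrative sketch only: `cumulative` and `polylinePoints` are hypothetical names, and the scaling (a shared maximum, evenly spaced x positions) is an assumption rather than what public/app.js necessarily does.

```javascript
// Turn daily attribution counts into a running total.
function cumulative(daily) {
  let total = 0;
  return daily.map((n) => (total += n));
}

// Build the `points` attribute for an SVG <polyline> from daily counts.
// Assumed layout: x spreads evenly over `width`, y is flipped so larger
// totals sit higher in the chart.
function polylinePoints(daily, width = 300, height = 100) {
  const series = cumulative(daily);
  const max = Math.max(1, ...series); // avoid dividing by zero on empty data
  const stepX = series.length > 1 ? width / (series.length - 1) : 0;
  return series
    .map((v, i) => `${(i * stepX).toFixed(1)},${(height - (v * height) / max).toFixed(1)}`)
    .join(' ');
}

const points = polylinePoints([1, 0, 2], 100, 30);
console.log(`<polyline fill="none" stroke="currentColor" points="${points}" />`);
```

Because the series is cumulative, the line never goes down, so a flat stretch means a hunter had no new attributions on those days.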

Tracked AI bug hunters (May 2026)

| Slug | Name | Operator |
|---|---|---|
| big-sleep | Big Sleep | Google DeepMind + Project Zero |
| daybreak | Daybreak (GPT-5.5-Cyber) | OpenAI |
| zeropath | ZeroPath | ZeroPath, Inc. |
| xbow | XBOW | XBOW Engineering |
| aisle | AISLE | AISLE Security |
| atlantis | Team Atlantis | DARPA AIxCC finalist |
| pentestgpt | PentestGPT | open-source autopentest |
| naptime | Naptime | Google Project Zero (Big Sleep predecessor) |
| metasploit-ai | Metasploit AI | Rapid7 |
| auto-bounty | Auto-Bounty | autonomous bug-bounty collective |

The roster is editable via data/hunters.json and reloaded on boot. It is detector configuration — the keyword + domain patterns we look for in each CVE record — not CVE data. Every CVE and every severity in the database is fetched live from NVD.
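
Since the roster is hand-edited, a small check at boot can catch malformed entries early. The sketch below assumes data/hunters.json is a JSON array whose entries carry `slug`, `keyword_patterns`, and `domain_patterns`; the array shape and the `validateRoster` helper are assumptions for illustration, not the project's actual loader.

```javascript
// Hypothetical validator for a parsed hunters.json roster.
// Assumed entry shape: { slug, name, operator, keyword_patterns, domain_patterns }.
function validateRoster(roster) {
  if (!Array.isArray(roster)) throw new TypeError('roster must be an array');
  const seen = new Set();
  for (const hunter of roster) {
    if (typeof hunter.slug !== 'string' || hunter.slug === '') {
      throw new TypeError('hunter entry is missing a slug');
    }
    if (seen.has(hunter.slug)) throw new Error(`duplicate slug: ${hunter.slug}`);
    seen.add(hunter.slug);
    for (const field of ['keyword_patterns', 'domain_patterns']) {
      const list = hunter[field];
      if (!Array.isArray(list) || !list.every((p) => typeof p === 'string' && p !== '')) {
        throw new TypeError(`${hunter.slug}: ${field} must be an array of non-empty strings`);
      }
    }
  }
  return roster;
}

// Illustrative entry; the patterns are placeholders, not real detector config.
const roster = validateRoster([
  {
    slug: 'example-hunter',
    name: 'Example Hunter',
    operator: 'Example Org',
    keyword_patterns: ['example hunter'],
    domain_patterns: ['hunter.example'],
  },
]);
console.log(`${roster.length} hunter(s) loaded`);
```

Rejecting empty patterns matters because an empty string is a substring of every description and would match every CVE.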

Data sources (live, public — no mocks, no seeds)

| Source | URL | Refresh | Auth |
|---|---|---|---|
| NVD CVE API 2.0 | https://services.nvd.nist.gov/rest/json/cves/2.0 | every 30 min (per-hunter keywordSearch), every 2 h (lastModStartDate window) | optional NVD_API_KEY (raises the limit from 5 to 50 requests per 30 s) |
| GitHub Security Advisories | https://api.github.com/advisories | every 4 h | optional GITHUB_TOKEN (raises the limit from 60 to 5000 requests per hour) |
| OSV.dev | https://api.osv.dev/v1/vulns/{id} | on-demand per CVE | none |

The first NVD sweep fires ~8 s after boot. CVEs that match zero hunter patterns are not persisted — bug-bot-bench is a focused mirror of AI-attributed CVEs, not a general NVD clone.
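
The NVD limits in the table imply a minimum gap between requests. A minimal sketch of evenly spaced pacing follows; `nvdIntervalMs` and `paced` are hypothetical helpers, and fetchers/nvd.js may handle rate limits differently.

```javascript
// NVD allows 5 requests per 30 s without a key and 50 per 30 s with one.
const WINDOW_MS = 30000;

// Gap between requests that spreads the allowance evenly over the window.
function nvdIntervalMs(apiKey) {
  const limit = apiKey ? 50 : 5;
  return Math.ceil(WINDOW_MS / limit);
}

// Run async tasks one at a time, sleeping `intervalMs` after each task
// finishes before starting the next. `sleep` is injectable for testing.
async function paced(tasks, intervalMs, sleep = (ms) => new Promise((r) => setTimeout(r, ms))) {
  const results = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await sleep(intervalMs);
    results.push(await tasks[i]());
  }
  return results;
}

console.log(nvdIntervalMs(process.env.NVD_API_KEY), 'ms between NVD requests');
```

Sleeping after each response, rather than between request starts, is slightly slower but keeps the request rate under the limit even when responses are slow.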

Attribution scoring

For each fetched CVE we compute one score per hunter by summing the weights of the signals that match that hunter:

| Signal | Weight |
|---|---|
| Reference URL hostname/path matches a hunter's domain_patterns | +1.0 |
| English description contains a hunter's keyword_patterns (substring, case-insensitive) | +0.6 |
| NVD cve.credits[].value matches a hunter's keyword_patterns | +1.2 |

A CVE is admitted to the leaderboard when any single hunter scores ≥ 0.6. Because the smallest weight is 0.6, one matching signal is enough. Multiple hunters can match a single CVE; each attribution is stored separately.
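
The scoring rules above can be sketched as follows. The weights and threshold come from the table; everything else is an assumption for illustration: the flattened `cve` shape (`description`, `references`, `credits`), the helper names, and the choice to count each signal at most once per hunter. lib/attribute.js may differ on all of these.

```javascript
// Weights and threshold as listed in the table above.
const WEIGHTS = { domain: 1.0, description: 0.6, credit: 1.2 };
const ADMIT_THRESHOLD = 0.6;

// Case-insensitive substring test against a list of patterns.
function containsAny(text, patterns) {
  const haystack = String(text || '').toLowerCase();
  return (patterns || []).some((p) => p && haystack.includes(String(p).toLowerCase()));
}

// True when any reference URL's hostname + path contains a domain pattern.
function referenceMatches(references, domainPatterns) {
  return (references || []).some((ref) => {
    try {
      const url = new URL(ref);
      return containsAny(url.hostname + url.pathname, domainPatterns);
    } catch {
      return false; // skip malformed URLs
    }
  });
}

// Assumption: each signal contributes its weight at most once per hunter.
function scoreHunter(cve, hunter) {
  let score = 0;
  if (referenceMatches(cve.references, hunter.domain_patterns)) score += WEIGHTS.domain;
  if (containsAny(cve.description, hunter.keyword_patterns)) score += WEIGHTS.description;
  if ((cve.credits || []).some((c) => containsAny(c, hunter.keyword_patterns))) {
    score += WEIGHTS.credit;
  }
  return score;
}

// One attribution per hunter that reaches the threshold.
function attribute(cve, hunters) {
  return hunters
    .map((hunter) => ({ slug: hunter.slug, score: scoreHunter(cve, hunter) }))
    .filter((a) => a.score >= ADMIT_THRESHOLD);
}

// Placeholder hunter and CVE, not real data.
const hunters = [
  { slug: 'example-hunter', keyword_patterns: ['Example Hunter'], domain_patterns: ['hunter.example'] },
];
const cve = {
  description: 'Heap overflow in libfoo.',
  references: ['https://hunter.example/advisories/1'],
  credits: ['Example Hunter team'],
};
console.log(attribute(cve, hunters));
```

Returning one entry per hunter mirrors the statement that each attribution is stored separately.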

API

All endpoints under /bug-bot-bench. No authentication.

| Method | Path | Description |
|---|---|---|
| GET | /bug-bot-bench/health | Liveness + DB row counts + last fetch timestamp |
| GET | /bug-bot-bench/api/leaderboard?window=7d\|30d\|90d\|all | Hunters ranked by attributed CVE count in window |
| GET | /bug-bot-bench/api/hunter/:slug?limit=25 | Single hunter detail + last N attributed CVEs |
| GET | /bug-bot-bench/api/cves?hunter=&severity=&q=&limit=&offset= | Paged search across attributed CVEs |
| GET | /bug-bot-bench/api/cve/:id | Full CVE record with refs and attributions |
| GET | /bug-bot-bench/api/hall-of-fame?window=&limit= | Top CVEs by CVSS in window |
| GET | /bug-bot-bench/api/trend?days=90 | Per-hunter daily attribution counts (line chart) |
| GET | /bug-bot-bench/api/live-feed?limit=20 | Most recent detections across all hunters |
| GET | /bug-bot-bench/api/methodology | Machine-readable methodology + last-fetch log |
| GET | /bug-bot-bench/api/badge?hunter=<slug> | Shields-style SVG badge for embedding |
| POST | /bug-bot-bench/api/refresh?source=keyword-sweep\|recent-window\|ghsa-sync\|daily-snapshot | Manual trigger (60 s cooldown per source) |
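
As a usage sketch, the paged search endpoint can be called from Node.js 18+ with the built-in `fetch`. The base URL matches the local run address; `buildCvesUrl` and `searchCves` are hypothetical helpers, and the response shape is not specified here, so the example simply returns the parsed JSON.

```javascript
const BASE = 'http://localhost:4794/bug-bot-bench';

// Build a /api/cves URL, skipping parameters that are unset or empty.
function buildCvesUrl(params = {}, base = BASE) {
  const url = new URL(`${base}/api/cves`);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null && value !== '') {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Fetch one page of attributed CVEs; throws on a non-2xx response.
async function searchCves(params) {
  const res = await fetch(buildCvesUrl(params));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// Example call (requires a running server):
// searchCves({ hunter: 'xbow', q: 'sqlite', limit: 10, offset: 0 }).then(console.log);
```

Paging works by raising `offset` in steps of `limit` until a page comes back empty, assuming the endpoint follows the usual limit/offset convention.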

Local run

cp .env.example .env   # optional — keys are recommended but not required
npm install
node server.js

Then open http://localhost:4794/bug-bot-bench/.

The first keyword-sweep cron fires ~8 s after boot. The dashboard is live immediately — empty states show until the first hunter match lands.

Cron schedule

| Cron expression | Job | Purpose |
|---|---|---|
| `*/30 * * * *` | keyword-sweep | NVD keywordSearch for every hunter keyword, round-robin |
| `0 */2 * * *` | recent-window | NVD lastModStartDate=now-48h sweep, attribute all matches |
| `15 */4 * * *` | ghsa-sync | Pull last 100 GitHub advisories, cross-match CVE IDs |
| `5 0 * * *` | daily-snapshot | Roll up cve_attribution into daily_snapshot for the trend chart |
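
The recent-window job's 48-hour query can be sketched as a URL builder. The NVD API expects `lastModStartDate` and `lastModEndDate` to be supplied together, so the sketch sets both; `recentWindowUrl` is a hypothetical name, and the exact parameters used by fetchers/nvd.js are not shown in this README.

```javascript
const NVD_URL = 'https://services.nvd.nist.gov/rest/json/cves/2.0';

// Build an NVD query for CVEs modified in the last `hours` hours.
// `now` is injectable so the window can be reproduced in tests.
function recentWindowUrl(now = new Date(), hours = 48) {
  const start = new Date(now.getTime() - hours * 60 * 60 * 1000);
  const url = new URL(NVD_URL);
  url.searchParams.set('lastModStartDate', start.toISOString());
  url.searchParams.set('lastModEndDate', now.toISOString());
  return url.toString();
}

console.log(recentWindowUrl());
```

A 48-hour window on a 2-hour schedule means consecutive runs overlap heavily, which gives a missed or failed run many chances to be covered by a later one.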

Non-goals

No PoC exploit listings — credit and severity only.
No "GPT-4 vs Claude" framing — we score hunter systems, not raw LLMs.
No subscriptions or email digests in MVP.

Files

bug-bot-bench/
  server.js               Express bootstrap + cron registration
  db.js                   better-sqlite3 schema + prepared statements
  package.json
  .env.example
  data/hunters.json       Hunter detector configuration
  fetchers/
    nvd.js                NVD CVE API 2.0 client (rate-limit aware)
    ghsa.js               GitHub Security Advisories client
    osv.js                OSV.dev per-CVE enrichment
  lib/
    attribute.js          Detection / scoring logic
    severity.js           CVSS picker (v3.1 > v3.0 > v2)
    cron.js               Job orchestration
    badge.js              Shields-style SVG renderer
  routes/
    api.js                /api/* endpoints
    health.js             /health
  public/
    index.html
    app.js                Vanilla JS SPA
    style.css             Dark theme, hand-rolled CSS
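
The CVSS preference order noted for lib/severity.js (v3.1 > v3.0 > v2) can be sketched against the NVD API 2.0 `metrics` object. The function name and the preference for entries typed `Primary` are assumptions for illustration; the actual picker may choose differently.

```javascript
// Preference order from the file list: v3.1, then v3.0, then v2.
const METRIC_KEYS = ['cvssMetricV31', 'cvssMetricV30', 'cvssMetricV2'];

// Pick one CVSS record from an NVD API 2.0 `cve` object, or null if none exists.
function pickCvss(cve) {
  const metrics = (cve && cve.metrics) || {};
  for (const key of METRIC_KEYS) {
    const entries = metrics[key];
    if (!Array.isArray(entries) || entries.length === 0) continue;
    // Assumption: prefer the entry marked Primary, else take the first one.
    const entry = entries.find((m) => m.type === 'Primary') || entries[0];
    const data = entry.cvssData || {};
    return {
      version: data.version,
      score: data.baseScore,
      // v2 entries carry baseSeverity on the metric entry, not inside cvssData.
      severity: data.baseSeverity || entry.baseSeverity || null,
    };
  }
  return null;
}

// Placeholder record, not real CVE data.
const sample = {
  metrics: {
    cvssMetricV2: [{ type: 'Primary', baseSeverity: 'HIGH', cvssData: { version: '2.0', baseScore: 7.5 } }],
    cvssMetricV31: [{ type: 'Primary', cvssData: { version: '3.1', baseScore: 9.8, baseSeverity: 'CRITICAL' } }],
  },
};
console.log(pickCvss(sample));
```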

License

MIT.