vuln-arena
The arena that ranks the agents that find the bugs.
A live, public head-to-head momentum + capability matrix for AI-powered autonomous vulnerability finders: Big Sleep (Google), Aardvark (OpenAI), Daybreak (OpenAI), MDASH (Microsoft), RAMPART (Microsoft, open-source), Clarity (Microsoft, open-source), CodeMender (Google DeepMind), Claude Code Security Review (Anthropic), PyRIT (Microsoft Azure Red Team), garak (NVIDIA), promptfoo, Dioptra (NIST), Agent Governance Toolkit (Microsoft), Mindfort, and friends.
- URL: https://holyai.me/vuln-arena/
- Stack: Node.js 20, Express 4, better-sqlite3 (WAL), node-cron, helmet, compression, express-rate-limit
- License: MIT (source), CC0 (data — it's all already public)
- Auth: none. Every endpoint is public. No /admin. No login.
What problem is this solving?
May 2026 is the month every major AI lab shipped an in-house AI agent for offensive security, and CISOs are being asked "which one should we evaluate first?" with no comparison table to answer it. The vendor landing pages are puff pieces. The awesome-lists are static markdown. vuln-arena fills the gap with live data (GitHub vitals, NVD CVE-credit counts, HN buzz, and vendor blog cadence) refreshed every 6 to 24 hours depending on the source.
Data sources (real, public, no mocks)
Every numeric field shown to the user is fetched live from a public API. No hardcoded star counts. No seed CVEs. No Math.random() jitter. If a source fails, the field is left null and the source is marked degraded.
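A minimal sketch of the null-on-failure rule, assuming each source call is wrapped in a small helper. fetchOrNull and the shape of the health object are hypothetical names for illustration, not the project's actual code; the fetchFn callback is assumed to throw on a non-2xx response.

```javascript
// Hypothetical helper: run one source fetch; on any failure, record the
// source as degraded and return null instead of a made-up value.
async function fetchOrNull(sourceName, health, fetchFn) {
  try {
    const value = await fetchFn();
    health[sourceName] = { status: "ok", checkedAt: new Date().toISOString() };
    return value;
  } catch (err) {
    health[sourceName] = {
      status: "degraded",
      error: String(err && err.message ? err.message : err),
      checkedAt: new Date().toISOString(),
    };
    return null; // the field stays null; no fallback number is substituted
  }
}
```

Because the failure path returns null rather than a cached or estimated number, the UI can show an explicit gap while the source-health record says why.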
| Source | URL pattern | Refresh |
|---|---|---|
| GitHub REST repo | https://api.github.com/repos/{owner}/{repo} | every 6h |
| GitHub latest release | https://api.github.com/repos/{owner}/{repo}/releases/latest | every 6h |
| GitHub commits | https://api.github.com/repos/{owner}/{repo}/commits?per_page=1 | every 6h |
| GitHub contributors | https://api.github.com/repos/{owner}/{repo}/contributors?per_page=100 | every 24h |
| NVD CVE 2.0 API | https://services.nvd.nist.gov/rest/json/cves/2.0?keywordSearch={kw} | every 12h |
| HN Algolia search | https://hn.algolia.com/api/v1/search?query={kw}&tags=story | every 12h |
| Vendor blog index pages (HTML) | OpenAI Safety, Microsoft Security Blog (AI/ML tag), Google blog, Anthropic news, NVIDIA developer blog, NIST AI news, DeepMind blog, promptfoo blog, Mindfort blog | every 24h |
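The table maps onto a few URL builders and cron expressions. A minimal sketch, assuming keywords are percent-encoded with encodeURIComponent; githubUrls, nvdUrl, hnUrl, and REFRESH_CRON are hypothetical names, and the exact minute each job fires is an assumption.

```javascript
// Hypothetical URL builders mirroring the source table.
const GITHUB_API = "https://api.github.com";

function githubUrls(owner, repo) {
  const base = `${GITHUB_API}/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
  return {
    repo: base,
    latestRelease: `${base}/releases/latest`,
    lastCommit: `${base}/commits?per_page=1`,
    contributors: `${base}/contributors?per_page=100`,
  };
}

function nvdUrl(keyword) {
  return `https://services.nvd.nist.gov/rest/json/cves/2.0?keywordSearch=${encodeURIComponent(keyword)}`;
}

function hnUrl(keyword) {
  return `https://hn.algolia.com/api/v1/search?query=${encodeURIComponent(keyword)}&tags=story`;
}

// Assumed cron expressions for the refresh column (minute 0 of every
// 6th / 12th / 24th hour), in the five-field syntax node-cron accepts.
const REFRESH_CRON = {
  github: "0 */6 * * *",
  github_contributors: "0 0 * * *",
  nvd: "0 */12 * * *",
  hn: "0 */12 * * *",
  blogs: "0 0 * * *",
};
```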
Each agent has its own keyword list (in manifest.js). For example, Big Sleep's keywords are ["Big Sleep", "Project Naptime", "Google Project Zero AI"] — these are passed to NVD keywordSearch and HN Algolia query, and they're checked against blog link titles.
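A minimal sketch of the keyword check, assuming a case-insensitive substring match (consistent with "pure substring keyword match" in the non-goals). matchesAgent is a hypothetical name, not the project's actual function.

```javascript
// Hypothetical matcher: case-insensitive substring test of an agent's
// search_keywords against a title (blog link text, HN story title, etc.).
function matchesAgent(title, keywords) {
  const haystack = String(title).toLowerCase();
  return keywords.some((kw) => haystack.includes(kw.toLowerCase()));
}

const bigSleepKeywords = ["Big Sleep", "Project Naptime", "Google Project Zero AI"];
```

Substring matching is cheap and transparent, but it also means a title such as "Big Sleeper hits" counts as a Big Sleep mention; that false-positive risk is the trade-off for keeping ML out of the loop.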
Endpoints
All under BASE_PATH=/vuln-arena.
| Method | Path | Description |
|---|---|---|
| GET | /vuln-arena/ | SPA |
| GET | /vuln-arena/health | {ok: true, ts}; auth-free smoke check |
| GET | /vuln-arena/api/stats | totals + source health |
| GET | /vuln-arena/api/agents?q=&class=&vendor=&oss=&sort=&limit=&offset= | list of agents |
| GET | /vuln-arena/api/agents/:slug | full detail incl. CVE credits + announcements |
| GET | /vuln-arena/api/agents/:slug/cves | CVE credits for one agent |
| GET | /vuln-arena/api/agents/:slug/announcements | HN + blog mentions |
| GET | /vuln-arena/api/agents/:slug/timeline | unified time-ordered events |
| GET | /vuln-arena/api/agents/:slug/snapshots | daily snapshots (stars / CVE count / mentions) |
| GET | /vuln-arena/api/agents/:slug/badge.svg | README badge |
| GET | /vuln-arena/api/capabilities | capability-axis breakdown |
| GET | /vuln-arena/api/vendors | vendor counts |
| GET | /vuln-arena/api/feed.json | JSON Feed v1.1 |
| GET | /vuln-arena/api/rss.xml | RSS 2.0 |
| POST | /vuln-arena/api/refresh | manual rediscovery trigger (rate-limited to 1/min per IP) |
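A client-side sketch for the list endpoint, assuming Node 20's built-in fetch. agentsUrl and listAgents are hypothetical helpers; the accepted values for class, oss, and sort, and the shape of the JSON response, are not documented here, so the sketch passes them through untouched.

```javascript
// Hypothetical client for GET /vuln-arena/api/agents.
const BASE_URL = "https://holyai.me/vuln-arena";

function agentsUrl(params = {}) {
  const url = new URL(`${BASE_URL}/api/agents`);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset filters so the server applies its own defaults.
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// fetchImpl defaults to the global fetch; it is injectable for testing.
async function listAgents(params = {}, fetchImpl = fetch) {
  const res = await fetchImpl(agentsUrl(params));
  if (!res.ok) throw new Error(`agents request failed: HTTP ${res.status}`);
  return res.json(); // response shape is not assumed
}
```

For example, listAgents({ class: "patch_agent", limit: 10 }) requests /vuln-arena/api/agents?class=patch_agent&limit=10.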
Capability axes (9)
open_source, finds_vulns, patches, red_team, fuzz_harness, prompt_injection_scan, runtime_governance, ci_integratable, public_credits.
Plus a categorical class ∈ {zero_day_finder, patch_agent, red_team_harness, runtime_guard, prompt_injection_scanner, hybrid}.
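The nine axes and the class set lend themselves to a small schema check. A minimal sketch, assuming each manifest entry carries a capabilities object keyed by axis name and a class string; validateCapabilities is a hypothetical helper, not part of the documented code.

```javascript
const CAPABILITY_AXES = [
  "open_source", "finds_vulns", "patches", "red_team", "fuzz_harness",
  "prompt_injection_scan", "runtime_governance", "ci_integratable", "public_credits",
];

const AGENT_CLASSES = new Set([
  "zero_day_finder", "patch_agent", "red_team_harness",
  "runtime_guard", "prompt_injection_scanner", "hybrid",
]);

// Hypothetical check: returns a list of problems (empty if the entry is valid).
function validateCapabilities(agent) {
  const problems = [];
  if (!AGENT_CLASSES.has(agent.class)) problems.push(`unknown class: ${agent.class}`);
  const caps = agent.capabilities || {};
  // Every axis must be present as a real boolean, not a truthy string.
  for (const axis of CAPABILITY_AXES) {
    if (typeof caps[axis] !== "boolean") problems.push(`capability ${axis} must be boolean`);
  }
  // Reject axes outside the fixed set of nine.
  for (const key of Object.keys(caps)) {
    if (!CAPABILITY_AXES.includes(key)) problems.push(`unknown capability: ${key}`);
  }
  return problems;
}
```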
Run locally
```shell
cp .env.example .env
# (optional but recommended) drop a GitHub PAT into GITHUB_TOKEN
npm install
npm start
# → http://localhost:4874/vuln-arena/
```
On a cold database (with BOOTSTRAP_ON_START=1, the default), the server runs a 60-second one-shot bootstrap fetch before serving requests; cron fills in the rest.
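One way to read the cold-start rule, assuming 60 seconds is an upper bound on the bootstrap rather than a fixed delay: race the bootstrap against a timer, then start listening either way. startServer, withTimeout, and the callback names are hypothetical.

```javascript
// Resolve with the promise's value, or with onTimeout if ms elapses first.
function withTimeout(promise, ms, onTimeout) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(onTimeout), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical startup sequence: bootstrap only on a cold DB with the flag on,
// never wait past the budget, and always start listening afterwards.
async function startServer({ isColdDb, bootstrapOnStart, bootstrap, listen, budgetMs = 60_000 }) {
  let bootstrapResult = "skipped";
  if (isColdDb && bootstrapOnStart) {
    // A failed bootstrap must not block startup; cron retries later.
    const run = bootstrap().then(() => "done", () => "failed");
    bootstrapResult = await withTimeout(run, budgetMs, "timed_out");
  }
  listen();
  return bootstrapResult;
}
```

The sketch does not cancel a bootstrap that overruns its budget; it keeps running in the background while the server begins serving.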
Environment variables
| Var | Default | Purpose |
|---|---|---|
| PORT | 4874 | listen port |
| NODE_ENV | production | runtime mode |
| BASE_PATH | /vuln-arena | URL mount prefix |
| GITHUB_TOKEN | (unset) | optional PAT; raises the GitHub rate limit from 60 to 5,000 req/h |
| NVD_API_KEY | (unset) | optional NVD key; raises the NVD rate limit from 5 to 50 req per rolling 30 s |
| BOOTSTRAP_ON_START | 1 | run cold-start fetches |
| DB_PATH | ./data/vuln-arena.db | sqlite location |
| BUILD_SHA | dev | shown in footer |
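A minimal sketch of reading these variables with the defaults from the table, plus the request headers the two optional keys would feed. loadConfig, githubHeaders, and nvdHeaders are hypothetical, and treating any BOOTSTRAP_ON_START value other than "0" as enabled is an assumption. GitHub accepts a PAT as a bearer token and requires a User-Agent header; the NVD API 2.0 reads its key from an apiKey request header.

```javascript
// Hypothetical config loader applying the documented defaults.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT || "4874", 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    nodeEnv: env.NODE_ENV || "production",
    basePath: env.BASE_PATH || "/vuln-arena",
    githubToken: env.GITHUB_TOKEN || null,
    nvdApiKey: env.NVD_API_KEY || null,
    bootstrapOnStart: (env.BOOTSTRAP_ON_START ?? "1") !== "0", // assumed parsing
    dbPath: env.DB_PATH || "./data/vuln-arena.db",
    buildSha: env.BUILD_SHA || "dev",
  };
}

// GitHub: PAT as a bearer token; a User-Agent header is mandatory.
function githubHeaders(config) {
  const headers = { Accept: "application/vnd.github+json", "User-Agent": "vuln-arena" };
  if (config.githubToken) headers.Authorization = `Bearer ${config.githubToken}`;
  return headers;
}

// NVD: key sent in the apiKey header; omitted entirely when unset.
function nvdHeaders(config) {
  return config.nvdApiKey ? { apiKey: config.nvdApiKey } : {};
}
```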
Adding a new agent
- Append an entry to AGENTS in manifest.js, filling in slug, display_name, vendor, homepage, github (or null), class, the 9 capabilities booleans, search_keywords, blog_url, and tagline.
- Restart the server. On boot it syncs the manifest into the DB, and the next cron tick hydrates live vitals.
- Optionally, hit POST /vuln-arena/api/refresh to refresh immediately.
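For reference, an entry might look like the sketch below, assuming manifest.js holds plain objects and that github is an "owner/repo" string. Every value is a placeholder for a hypothetical agent, and missingFields is an illustrative helper, not part of the project.

```javascript
// Hypothetical AGENTS entry; all values are placeholders.
const exampleAgent = {
  slug: "example-scanner",
  display_name: "Example Scanner",
  vendor: "Example Org",
  homepage: "https://example.com/scanner",
  github: "example-org/example-scanner", // or null for closed-source agents (format assumed)
  class: "prompt_injection_scanner",
  capabilities: {
    open_source: true,
    finds_vulns: false,
    patches: false,
    red_team: true,
    fuzz_harness: false,
    prompt_injection_scan: true,
    runtime_governance: false,
    ci_integratable: true,
    public_credits: false,
  },
  search_keywords: ["Example Scanner"],
  blog_url: "https://example.com/blog",
  tagline: "Placeholder tagline.",
};

const REQUIRED_FIELDS = [
  "slug", "display_name", "vendor", "homepage", "github", "class",
  "capabilities", "search_keywords", "blog_url", "tagline",
];

// Names of required fields absent from the entry (github may be present as null).
function missingFields(agent) {
  return REQUIRED_FIELDS.filter((f) => !(f in agent));
}
```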
Honest non-goals
- No login / no accounts / no submissions.
- No email digest (RSS only — sufficient).
- No prediction or "winner" scoring beyond raw counts (readers infer).
- No paywalled / authenticated scraping.
- No ML — pure substring keyword match.
Acknowledgements
Source list inspired by recent disclosures from Microsoft Security, OpenAI, Google DeepMind, Google Project Zero, Anthropic, NVIDIA, NIST, and promptfoo. All data linked back to its primary source — vuln-arena is a directory, not a re-publisher.