
Vuln-Arena

Live momentum + capability matrix for AI-powered autonomous vulnerability-finder agents.

Tags: ai, ai-security, agents, vulnerabilities, leaderboard, cve, red-teaming, open-source

vuln-arena

The arena that ranks the agents that find the bugs.

A live, public head-to-head momentum + capability matrix for AI-powered autonomous vulnerability finders: Big Sleep (Google), Aardvark (OpenAI), Daybreak (OpenAI), MDASH (Microsoft), RAMPART (Microsoft, open-source), Clarity (Microsoft, open-source), CodeMender (Google DeepMind), Claude Code Security Review (Anthropic), PyRIT (Microsoft Azure Red Team), garak (NVIDIA), promptfoo, Dioptra (NIST), Agent Governance Toolkit (Microsoft), Mindfort, and friends.

What problem is this solving?

May 2026 is the month every major AI lab shipped an in-house AI offensive-security agent, and CISOs are being asked "which one should we evaluate first?" with no comparison table to answer it. The vendor landing pages are puff pieces, and the awesome-lists are static markdown. vuln-arena fills the gap with live data (GitHub vitals, NVD CVE-credit counts, Hacker News buzz, and vendor blog cadence), refreshed every 6 to 24 hours depending on the source.

Data sources (real, public, no mocks)

Every numeric field shown to the user is fetched live from a public API. No hardcoded star counts. No seed CVEs. No Math.random() jitter. If a source fails, the field is left null and the source is marked degraded.
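The null-on-failure rule can be sketched as a thin wrapper around `fetch`. This is a minimal sketch, not vuln-arena's actual code: the name `fetchSourceJson`, the `health` map, and the `"ok"` / `"degraded"` status strings are hypothetical, and it assumes Node 18+ for the global `fetch`.

```javascript
// Hypothetical helper: fetch JSON from a public source. On any failure it
// returns null (never a placeholder value) and marks the source degraded.
async function fetchSourceJson(sourceName, url, health, fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(url, { headers: { Accept: "application/json" } });
    if (!res.ok) {
      // Non-2xx responses (rate limits, 404s, outages) count as failures.
      throw new Error(`HTTP ${res.status}`);
    }
    const data = await res.json();
    health[sourceName] = "ok";
    return data;
  } catch {
    // Leave the field empty rather than inventing a value.
    health[sourceName] = "degraded";
    return null;
  }
}
```

Injecting `fetchImpl` keeps the wrapper testable without network access; callers store the returned value as-is, so a failed source shows up as `null` in the UI instead of a stale or made-up number.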

| Source | URL pattern | Refresh |
|---|---|---|
| GitHub REST repo | https://api.github.com/repos/{owner}/{repo} | every 6h |
| GitHub latest release | https://api.github.com/repos/{owner}/{repo}/releases/latest | every 6h |
| GitHub commits | https://api.github.com/repos/{owner}/{repo}/commits?per_page=1 | every 6h |
| GitHub contributors | https://api.github.com/repos/{owner}/{repo}/contributors?per_page=100 | every 24h |
| NVD CVE 2.0 API | https://services.nvd.nist.gov/rest/json/cves/2.0?keywordSearch={kw} | every 12h |
| HN Algolia search | https://hn.algolia.com/api/v1/search?query={kw}&tags=story | every 12h |
| Vendor blog index pages (HTML) | OpenAI Safety, Microsoft Security Blog (AI/ML tag), Google blog, Anthropic news, NVIDIA developer blog, NIST AI news, DeepMind blog, promptfoo blog, Mindfort blog | every 24h |
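One way to honor the Refresh column is a single frequent scheduler tick that asks which sources are due. This sketch assumes the last successful fetch time per source is stored as a millisecond timestamp; the source keys and the `isStale` / `dueSources` names are hypothetical.

```javascript
// Refresh intervals from the data-sources table, in hours.
// The keys are illustrative labels, not vuln-arena's internal names.
const REFRESH_HOURS = {
  github_repo: 6,
  github_release: 6,
  github_commits: 6,
  github_contributors: 24,
  nvd_cves: 12,
  hn_search: 12,
  vendor_blogs: 24,
};

const HOUR_MS = 60 * 60 * 1000;

// True when a source was never fetched or its last fetch is at least
// one refresh interval old.
function isStale(source, lastFetchedMs, nowMs = Date.now()) {
  const hours = REFRESH_HOURS[source];
  if (hours === undefined) throw new Error(`unknown source: ${source}`);
  if (lastFetchedMs == null) return true;
  return nowMs - lastFetchedMs >= hours * HOUR_MS;
}

// Sources that should be fetched on this tick.
function dueSources(lastFetched, nowMs = Date.now()) {
  return Object.keys(REFRESH_HOURS).filter((s) => isStale(s, lastFetched[s], nowMs));
}
```

With a tick every hour or so, each source is refetched within one tick of its interval, and a source that failed (and so kept its old timestamp) is retried on the next tick.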

Each agent has its own keyword list in manifest.js. For example, Big Sleep's keywords are ["Big Sleep", "Project Naptime", "Google Project Zero AI"]. Each keyword is passed as NVD's keywordSearch parameter and as the HN Algolia query, and is checked against blog link titles.
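A sketch of how one keyword list could feed all three sources, using Big Sleep's keywords. The helper names are hypothetical, and the case-insensitive substring match for blog titles is an assumption; vuln-arena may match differently.

```javascript
// Search endpoints from the data-sources table.
const NVD_CVES = "https://services.nvd.nist.gov/rest/json/cves/2.0";
const HN_SEARCH = "https://hn.algolia.com/api/v1/search";

// One NVD URL and one HN Algolia URL per keyword.
// encodeURIComponent turns "Big Sleep" into "Big%20Sleep".
function searchUrls(keyword) {
  const kw = encodeURIComponent(keyword);
  return {
    nvd: `${NVD_CVES}?keywordSearch=${kw}`,
    hn: `${HN_SEARCH}?query=${kw}&tags=story`,
  };
}

// Case-insensitive check of a blog link title against an agent's keywords.
function titleMatches(title, keywords) {
  const t = title.toLowerCase();
  return keywords.some((kw) => t.includes(kw.toLowerCase()));
}
```

For counting, the NVD CVE 2.0 response carries a `totalResults` field and HN Algolia responses carry `nbHits`, so a per-keyword count can be read without paging through every result.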

Endpoints

All endpoints are mounted under BASE_PATH (default /vuln-arena).

Capability axes (9)

open_source, finds_vulns, patches, red_team, fuzz_harness, prompt_injection_scan, runtime_governance, ci_integratable, public_credits.

Plus a categorical class, one of {zero_day_finder, patch_agent, red_team_harness, runtime_guard, prompt_injection_scanner, hybrid}.
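The nine axes and the class lend themselves to a simple validity check on each manifest entry. This sketch assumes the capability booleans are flat, top-level fields named exactly after the axes; `validateCapabilities` and `coverage` are hypothetical helpers.

```javascript
const CAPABILITY_AXES = [
  "open_source", "finds_vulns", "patches", "red_team", "fuzz_harness",
  "prompt_injection_scan", "runtime_governance", "ci_integratable", "public_credits",
];

const AGENT_CLASSES = new Set([
  "zero_day_finder", "patch_agent", "red_team_harness",
  "runtime_guard", "prompt_injection_scanner", "hybrid",
]);

// Returns a list of problems; an empty list means the entry is well-formed.
function validateCapabilities(agent) {
  const problems = [];
  for (const axis of CAPABILITY_AXES) {
    if (typeof agent[axis] !== "boolean") problems.push(`${axis} must be a boolean`);
  }
  if (!AGENT_CLASSES.has(agent.class)) problems.push(`unknown class: ${agent.class}`);
  return problems;
}

// Number of axes an agent covers (0 to 9); one possible sort key for matrix rows.
function coverage(agent) {
  return CAPABILITY_AXES.filter((axis) => agent[axis] === true).length;
}
```

Requiring strict booleans (rather than truthy values) keeps a typo like `patches: "yes"` from silently rendering as a filled cell.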

Run locally

```sh
cp .env.example .env
# (optional but recommended) drop a GitHub PAT into GITHUB_TOKEN
npm install
npm start
# → http://localhost:4874/vuln-arena/
```

On a cold (empty) database, the server runs a 60-second one-shot bootstrap fetch (controlled by BOOTSTRAP_ON_START) before it starts serving requests; cron fills in the rest.
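A sketch of that cold-start gate, assuming the 60 seconds acts as a time budget after which the server starts anyway and cron picks up whatever the bootstrap did not finish. `maybeBootstrap`, `withDeadline`, `isColdDb`, and `runFetches` are hypothetical names; only BOOTSTRAP_ON_START comes from vuln-arena's configuration.

```javascript
// Race a promise against a timer; the timer is always cleared afterwards.
// Resolves to "done", "failed", or "timeout" and never rejects.
function withDeadline(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  const work = promise.then(() => "done", () => "failed");
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical gate: fetch once if the DB is empty and bootstrap is enabled,
// then let the caller start listening whatever the outcome.
async function maybeBootstrap({ isColdDb, enabled, runFetches, budgetMs = 60_000 }) {
  if (!enabled || !isColdDb) return "skipped";
  return withDeadline(runFetches(), budgetMs);
}
```

On `"timeout"` the in-flight fetches keep running in the background; that is harmless here because any field they do not fill stays null until the next cron tick.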

Environment variables

| Var | Default | Purpose |
|---|---|---|
| PORT | 4874 | listen port |
| NODE_ENV | production | runtime mode |
| BASE_PATH | /vuln-arena | URL mount prefix |
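| GITHUB_TOKEN | (unset) | optional personal access token (PAT); raises the GitHub API rate limit from 60 to 5,000 requests/hour |
| NVD_API_KEY | (unset) | optional NVD API key; raises the NVD rate limit from 5 to 50 requests per 30 s |
| BOOTSTRAP_ON_START | 1 | run the one-shot bootstrap fetch on a cold database |
| DB_PATH | ./data/vuln-arena.db | SQLite database file path |
| BUILD_SHA | dev | build identifier shown in the page footer |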
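The table maps directly onto a small config loader, which is also a convenient place to turn the rate limits into request spacing. This is a sketch: `loadConfig`, `nvdMinIntervalMs`, and `githubMinIntervalMs` are hypothetical names, and treating `BOOTSTRAP_ON_START=0` as "off" is an assumption (the table only documents the default of 1).

```javascript
// Defaults mirror the environment-variable table.
function loadConfig(env = process.env) {
  const githubToken = env.GITHUB_TOKEN || null;
  const nvdApiKey = env.NVD_API_KEY || null;
  return {
    port: Number(env.PORT || 4874),
    nodeEnv: env.NODE_ENV || "production",
    basePath: env.BASE_PATH || "/vuln-arena",
    githubToken,
    nvdApiKey,
    // Assumption: any value other than "0" keeps the bootstrap on.
    bootstrapOnStart: (env.BOOTSTRAP_ON_START ?? "1") !== "0",
    dbPath: env.DB_PATH || "./data/vuln-arena.db",
    buildSha: env.BUILD_SHA || "dev",
    // NVD: 5 or 50 requests per 30 s -> 30000 / 5 = 6000 ms or 30000 / 50 = 600 ms apart.
    nvdMinIntervalMs: 30_000 / (nvdApiKey ? 50 : 5),
    // GitHub: 60 or 5000 requests per hour -> 60000 ms or 720 ms apart.
    githubMinIntervalMs: 3_600_000 / (githubToken ? 5000 : 60),
  };
}
```

The spacing numbers show why the tokens matter: without an NVD key, a 20-keyword sweep needs at least 20 × 6 s = 2 minutes of pacing; with one, about 12 s.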

Adding a new agent

  1. Append an entry to AGENTS in manifest.js. Fill in slug, display_name, vendor, homepage, github (or null), class, the nine capability booleans, search_keywords, blog_url, and tagline.
  2. Restart the server. On the next boot the manifest is synced into the DB, and the next cron tick hydrates the live vitals.
  3. Optionally hit POST /api/refresh to refresh immediately.
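A hypothetical AGENTS entry for step 1. Every value below is a placeholder, not a real agent; the flat layout of the capability booleans and the `owner/repo` form of `github` (mirroring the `{owner}/{repo}` URL pattern) are assumptions about manifest.js.

```javascript
// Placeholder entry: all values are illustrative, not a real agent.
const exampleAgent = {
  slug: "example-agent",
  display_name: "Example Agent",
  vendor: "Example Corp",
  homepage: "https://example.com/agent",
  github: "example-org/example-agent", // or null for closed-source agents
  class: "hybrid",
  // The nine capability booleans (assumed to be flat top-level fields).
  open_source: true,
  finds_vulns: true,
  patches: false,
  red_team: false,
  fuzz_harness: true,
  prompt_injection_scan: false,
  runtime_governance: false,
  ci_integratable: true,
  public_credits: false,
  // Used verbatim for NVD keywordSearch, the HN Algolia query, and blog-title matching.
  search_keywords: ["Example Agent", "ExampleCorp vulnerability agent"],
  blog_url: "https://example.com/blog",
  tagline: "Placeholder tagline for illustration.",
};
```

Because every endpoint lives under BASE_PATH, step 3 against a default local install is a POST to http://localhost:4874/vuln-arena/api/refresh.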

Honest non-goals

Acknowledgements

Source list inspired by recent disclosures from Microsoft Security, OpenAI, Google DeepMind, Google Project Zero, Anthropic, NVIDIA, NIST, and promptfoo. All data is linked back to its primary source; vuln-arena is a directory, not a re-publisher.