ctf-ai-solvable
Score CTF challenges by how easily a frontier Claude agent one-shots them — so organizers can design AI-resistant puzzles and recruiters know which scoreboards still measure humans.
Submit a CTF challenge (statement + optional file attachments) and the server runs Claude in a loop inside Anthropic's server-side `code_execution_20250522` sandbox, looks for a flag, and emits an AI-Trivial Score (0–100, higher = easier for AI) plus a public share card you can drop into Slack, Discord, or X.
The board also ingests real upcoming CTFTime events and an index of real CTF writeups harvested from the public `ctfs/write-ups-*` GitHub repos, so you can click "Run AI on this" on any historical challenge.
---
What it does
- `POST /api/submit` accepts a challenge (title + statement + optional `.txt`, `.py`, `.pcap`, `.png`, `.bin`, etc.; ≤ 5 files, ≤ 2 MB each), creates a `challenge` row, and enqueues a `run`.
- A single-concurrency, in-process agent worker drives Claude for up to 12 turns / 6 min of wall clock, using Anthropic's server-side Python sandbox tool (`code_execution_20250522`). Files are uploaded once via the Anthropic Files API and cached.
- Flag detection: the regex `/(?:flag|FLAG|ctf|CTF|HTB|picoCTF)\{[^}\n]{1,200}\}/` is run against every text and tool-output block.
- Score formula (`agent/score.js`): solved → `100 − 8·log₂(tokens/1k) − 3·(turns−1) − wall_ms/30000`, clamped to `[1, 100]`. Unsolved → `2·turns`, clamped to `[0, 25]`. Error → `null`.
- Board, permalink HTML, and an SVG OG share card (1200×630), all server-rendered with no external services.
- CTFTime events and writeups are ingested daily via `node-cron`, and lazily on first boot if their tables are empty.
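The flag check and score formula from the list above can be written out in code. The sketch below is a hypothetical reconstruction of `agent/flag.js` and `agent/score.js` based only on this README. The function names `findFlag` and `aiTrivialScore` are assumptions, as is the `{ solved, error, tokens, turns, wallMs }` input shape. It also does not round the result.

```javascript
// Hypothetical sketch of agent/flag.js + agent/score.js, rebuilt from the README.
// Function names and the run shape { solved, error, tokens, turns, wallMs }
// are assumptions, not the project's actual exports.

// Flag-shaped substring: a known prefix, then 1–200 chars inside braces (no "}" or newline).
const FLAG_RE = /(?:flag|FLAG|ctf|CTF|HTB|picoCTF)\{[^}\n]{1,200}\}/;

// Scan text and tool-output blocks in order; return the first flag found, or null.
function findFlag(blocks) {
  for (const text of blocks) {
    const m = FLAG_RE.exec(text);
    if (m) return m[0];
  }
  return null;
}

const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

// AI-Trivial Score: higher means easier for the agent; null means the run errored.
function aiTrivialScore({ solved, error, tokens, turns, wallMs }) {
  if (error) return null;
  if (!solved) return clamp(2 * turns, 0, 25);
  const raw =
    100 -
    8 * Math.log2(tokens / 1000) - // token cost, logarithmic in thousands of tokens
    3 * (turns - 1) -              // each extra turn costs 3 points
    wallMs / 30000;                // 1 point per 30 s of wall clock
  return clamp(raw, 1, 100);
}
```

For example, a solve that used 8k tokens over 3 turns in 60 s scores 100 − 24 − 6 − 2 = 68. The README does not say whether the real scorer rounds this value.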
Data sources (all real, fetched at runtime)
| Source | URL | Refresh |
|---|---|---|
| CTFTime events API | `https://ctftime.org/api/v1/events/?limit=50&start=<now>&finish=<now+120d>` | daily 06:00 UTC + on first boot |
| ctfs/write-ups GitHub trees | `https://api.github.com/repos/ctfs/write-ups-{2024,2017,2016,2015}/git/trees/master?recursive=1`; tries each repo in that order until one succeeds | daily 06:15 UTC + on first boot |
| Writeup README contents | `https://raw.githubusercontent.com/<repo>/master/<path>` | on demand, cached forever in `writeup_contents` |
| Anthropic Messages API | https://api.anthropic.com/v1/messages with tools:[{type:"code_execution_20250522"}] and betas:[code-execution-2025-05-22, files-api-2025-04-14] | every submit |
| Anthropic Files API | https://api.anthropic.com/v1/files | every submit that has attachments (file_ids cached on the challenge row) |
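The two Anthropic rows in this table correspond to a request roughly like the one below. This is a raw-HTTP sketch that only builds the request; the project itself may use the SDK's `betas` option instead. The helper name `buildSolveRequest`, the `max_tokens` default, and the model id are placeholders. The `container_upload` content block follows Anthropic's documentation for mounting an uploaded file into the code-execution container. Treat the exact field set as an assumption to check against the current API docs.

```javascript
// Hypothetical helper: builds, but does not send, one Messages API call that
// enables the server-side code-execution sandbox and mounts uploaded files.
const ANTHROPIC_BETAS = ["code-execution-2025-05-22", "files-api-2025-04-14"];

function buildSolveRequest({ apiKey, model, statement, fileIds = [], maxTokens = 4096 }) {
  // The challenge statement goes first; each cached Files API id is mounted
  // into the sandbox container so the model's Python code can open it.
  const content = [{ type: "text", text: statement }];
  for (const fileId of fileIds) {
    content.push({ type: "container_upload", file_id: fileId });
  }

  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": ANTHROPIC_BETAS.join(","),
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model, // placeholder: whichever opus/sonnet id the submitter picked
        max_tokens: maxTokens,
        tools: [{ type: "code_execution_20250522", name: "code_execution" }],
        messages: [{ role: "user", content }],
      }),
    },
  };
}
```

On Node 18+, which ships a global `fetch`, sending it is `await fetch(req.url, req.init)`.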
`ANTHROPIC_API_KEY` is the only secret needed. No GitHub auth is used: the anonymous rate limit (60 requests/hour) is plenty for one daily tree fetch plus occasional README fetches.
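The repo fallback for the writeups source can be sketched as a loop over the candidate repos. It returns the first tree GitHub serves and uses the same anonymous access. The names `githubJson` and `fetchFirstTree` are assumptions, as is the README.md-only filter. The fetcher is injectable so the logic can be exercised offline.

```javascript
// Hypothetical sketch of scrapers/writeups.js: try each repo's recursive tree in
// order and index the README.md blobs from the first one that responds.
const WRITEUP_REPOS = [
  "ctfs/write-ups-2024",
  "ctfs/write-ups-2017",
  "ctfs/write-ups-2016",
  "ctfs/write-ups-2015",
];

// Anonymous GitHub REST call; GitHub requires a User-Agent header on API requests.
async function githubJson(url) {
  const res = await fetch(url, {
    headers: { "User-Agent": "ctf-ai-solvable", Accept: "application/vnd.github+json" },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}

async function fetchFirstTree(fetchJson = githubJson, repos = WRITEUP_REPOS) {
  const failures = [];
  for (const repo of repos) {
    const url = `https://api.github.com/repos/${repo}/git/trees/master?recursive=1`;
    try {
      const { tree } = await fetchJson(url);
      // Assumption: only README.md files are treated as writeups.
      const readmes = tree.filter(
        (entry) => entry.type === "blob" && /(^|\/)README\.md$/i.test(entry.path)
      );
      return { repo, readmes };
    } catch (err) {
      failures.push(`${repo}: ${err.message}`); // e.g. 403 when rate-limited
    }
  }
  // The caller should keep its existing rows when every repo fails.
  throw new Error(`no writeup repo reachable:\n${failures.join("\n")}`);
}
```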
HTTP API (all mounted at /ctf-ai-solvable)
| Method | Path | Purpose |
|---|---|---|
| GET | /health | {ok:true} |
| GET | /api/board?limit=&offset=&status=&category=&sort= | Joined challenges + best run, newest first (or by score) |
| GET | /api/board/stats | Counters: total challenges, solved %, avg score, avg wall, last-fetch timestamps |
| POST | /api/submit | multipart/form-data: title, statement, category?, model? (opus/sonnet), files[]?. Returns {run_id, challenge_id, slug} |
| GET | /api/challenges/:slug | {challenge, runs: [...]} |
| GET | /api/runs/:id | Poll target — status, score, tokens, turns, wall_ms, flag, band |
| GET | /api/runs/:id/transcript | Full structured per-turn transcript |
| GET | /api/events?limit= | CTFTime upcoming events (sorted by start time) |
| GET | /api/writeups?q=&category=&limit= | Writeups index, LIKE-search on name/CTF/path |
| POST | /api/writeups/:id/prefill | Lazily fetch the writeup's README from raw.githubusercontent.com, cut everything from the "Solution" header onward, and return the text plus a suggested title/category |
| GET | /c/:slug | HTML permalink with og:* and twitter:card meta tags for link-unfurl previews |
| GET | /c/:slug/card.svg | 1200×630 SVG share card, server-rendered, no external fonts |
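After `POST /api/submit` returns a `run_id`, a client polls `GET /api/runs/:id` until the run finishes. The sketch below polls every 2 s, like the bundled SPA. The helper name `waitForRun` is hypothetical. It also assumes that any status other than `queued` or `running` is final, because the README names no final status besides `error`. The default base URL matches the local-run port.

```javascript
// Hypothetical polling helper for GET /api/runs/:id. fetchImpl is injectable
// for testing; it defaults to the global fetch available in Node 18+.
async function waitForRun(
  runId,
  {
    base = "http://localhost:4798/ctf-ai-solvable",
    fetchImpl = fetch,
    intervalMs = 2000,
    timeoutMs = 10 * 60 * 1000, // agent time is capped at 6 min; raise this if the queue is long
  } = {}
) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const res = await fetchImpl(`${base}/api/runs/${encodeURIComponent(runId)}`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const run = await res.json();
    // Assumption: anything other than queued/running is a final state.
    if (run.status !== "queued" && run.status !== "running") return run;
    if (Date.now() >= deadline) throw new Error(`run ${runId} still ${run.status}`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```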
Run locally
```sh
npm install
ANTHROPIC_API_KEY=sk-ant-... PORT=4798 node server.js
# open http://localhost:4798/ctf-ai-solvable/
```
The server starts listening immediately. On first boot it pulls CTFTime events and the writeups index in the background, so the Events and Writeups tabs fill in without waiting for an agent run.
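The "populate only if empty" behaviour can be sketched as follows. `countRows` and the ingest functions are hypothetical stand-ins for a `SELECT COUNT(*)` through better-sqlite3 and for the `scrapers/` modules. The ingests are started without being awaited, so the server can accept requests right away.

```javascript
// Hypothetical first-boot check: start an ingest only for tables that are empty,
// without awaiting it, so app.listen() is not delayed by network fetches.
function bootstrapIfEmpty({ countRows, ingests, log = console.error }) {
  const started = [];
  for (const [table, ingest] of Object.entries(ingests)) {
    if (countRows(table) > 0) continue; // already populated by an earlier boot or cron run
    started.push(table);
    Promise.resolve()
      .then(ingest)
      .catch((err) => log(`bootstrap ingest for ${table} failed: ${err.message}`));
  }
  return started; // tables whose ingest was kicked off
}
```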
Architecture
```
server.js        express app + cron + bootstrap
config.js        env, constants, score-band helpers
db.js            better-sqlite3 (WAL), prepared statements
routes/          health · board · challenges · runs · events · writeups
scrapers/        ctftime · writeups (GitHub tree walker)
agent/           runner (Claude loop) · score · flag · files (upload helper)
views/share.js   HTML permalink + SVG OG card renderers
public/          vanilla-JS SPA (index.html, app.js, style.css)
data/            SQLite DB + uploaded attachments (gitignored)
```
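The runner's budget can be sketched as a simple loop: at most 12 turns and 6 minutes of wall clock, stopping as soon as a flag shows up. `callModel` is a hypothetical stand-in for one Messages API round trip that also manages the conversation history. Its return shape `{ texts, usage, done }` is an assumption, and so is counting input plus output tokens toward the score. The budget is checked only between turns, so a slow call can overshoot; the real runner may also abort in-flight requests.

```javascript
// Hypothetical turn loop for agent/runner.js with the README's budgets.
const MAX_TURNS = 12;
const MAX_WALL_MS = 6 * 60 * 1000;
const FLAG_RE = /(?:flag|FLAG|ctf|CTF|HTB|picoCTF)\{[^}\n]{1,200}\}/;

async function runAgent(callModel, { now = Date.now, maxTurns = MAX_TURNS, maxWallMs = MAX_WALL_MS } = {}) {
  const start = now();
  let tokens = 0;
  let turns = 0;

  while (turns < maxTurns && now() - start < maxWallMs) {
    turns += 1;
    // One round trip: text and tool-output strings plus token usage for this turn.
    const { texts, usage, done } = await callModel(turns);
    tokens += usage.input_tokens + usage.output_tokens;

    const match = texts.map((t) => FLAG_RE.exec(t)).find(Boolean);
    if (match) {
      return { solved: true, flag: match[0], turns, tokens, wallMs: now() - start };
    }
    if (done) break; // the model stopped without producing a flag
  }
  return { solved: false, flag: null, turns, tokens, wallMs: now() - start };
}
```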
- SQLite tables (see `db.js`): `challenges`, `runs`, `events`, `writeups`, `writeup_contents`, `meta`.
- The agent queue is an in-process FIFO with `MAX_CONCURRENT=1` to keep API spend predictable.
- `/api/submit` is rate-limited per IP: 5 submits per hour, tracked in memory.
- The frontend polls `/api/runs/:id` every 2 s while a run is queued or running.
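Both in-memory controls in this list fit in a few lines. In the sketch below, `RunQueue` and `makeRateLimiter` are hypothetical names. The limiter uses a sliding one-hour window per IP, which is an assumption, since the README only says "5 per hour, in memory". Because the counters live in process memory, a restart resets them.

```javascript
// Hypothetical in-process FIFO: jobs start in arrival order, never more than
// maxConcurrent at a time (the README uses MAX_CONCURRENT = 1).
class RunQueue {
  constructor(worker, maxConcurrent = 1) {
    this.worker = worker; // async (job) => void
    this.maxConcurrent = maxConcurrent;
    this.pending = [];
    this.active = 0;
  }

  enqueue(job) {
    this.pending.push(job);
    this._drain();
  }

  _drain() {
    while (this.active < this.maxConcurrent && this.pending.length > 0) {
      const job = this.pending.shift();
      this.active += 1;
      Promise.resolve()
        .then(() => this.worker(job))
        .catch((err) => console.error(`run failed: ${err.message}`))
        .finally(() => {
          this.active -= 1;
          this._drain(); // start the next queued job, if any
        });
    }
  }
}

// Hypothetical per-IP limiter: at most `limit` accepted hits in any sliding `windowMs`.
function makeRateLimiter({ limit = 5, windowMs = 60 * 60 * 1000, now = Date.now } = {}) {
  const hits = new Map(); // ip -> timestamps of accepted submits still inside the window
  return function allow(ip) {
    const t = now();
    const recent = (hits.get(ip) || []).filter((ts) => t - ts < windowMs);
    const ok = recent.length < limit;
    if (ok) recent.push(t);
    hits.set(ip, recent);
    return ok;
  };
}
```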
Out of scope
No auth, no payments, no multi-agent, no local bash exec (every Python call goes through Anthropic's sandbox), no live-infra exploitation, no WebSockets, no tests, no Docker. See SPEC.md §6.
Honest limitations
- Submits require a working `ANTHROPIC_API_KEY` with the `code-execution-2025-05-22` and `files-api-2025-04-14` betas enabled on the account. Without it, runs finalize as `status='error'` and the error message is surfaced in the UI.
- If a challenge requires connecting to a live service (e.g. `nc challenge.host 1337`), the agent will fail, by design: the sandbox has no outbound network access.
- Writeup-prefilled statements are trimmed heuristically to remove the solution; for cleaner scoring, paste the original challenge prompt manually.
- The GitHub anonymous rate limit (60/hr/IP) applies to the daily tree fetch and README pulls; when rate-limited, the last-good rows are retained.
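One way to approximate the heuristic trim used by `/api/writeups/:id/prefill` is to cut the README at the first Markdown heading that names a solution. The heading pattern and both helper names (`trimSolution`, `suggestTitle`) are assumptions for illustration. This approach misses writeups that put the solution under some other heading, which is one reason to paste the original prompt by hand.

```javascript
// Hypothetical prefill heuristic: keep everything before the first
// "# Solution"-style heading (any level, case-insensitive).
const SOLUTION_HEADING = /^#{1,6}\s*solutions?\b.*$/im;

function trimSolution(readme) {
  const m = SOLUTION_HEADING.exec(readme);
  const kept = m ? readme.slice(0, m.index) : readme;
  return kept.trim();
}

// Hypothetical title guess: the first level-1 heading, if there is one.
function suggestTitle(readme) {
  const m = /^#\s+(.+?)\s*$/m.exec(readme);
  return m ? m[1] : null;
}
```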