ctf-ai-solvable

Score CTF challenges by how easily a frontier Claude agent one-shots them — so organizers can design AI-resistant puzzles and recruiters know which scoreboards still measure humans.

Submit a CTF challenge (statement plus optional file attachments). The server runs Claude in a loop with the server-side code_execution_20250522 sandbox tool, looks for a flag, and emits an AI-Trivial Score (0–100, higher = easier for AI) plus a public share card you can drop into Slack/Discord/X.

The board also ingests real upcoming CTFTime events and a real index of CTF writeups harvested from the public ctfs/write-ups-* GitHub repos, so you can run the AI on any historical challenge in one click ("Run AI on this").

---

What it does

POST /api/submit — accept a challenge (title + statement + optional .txt .py .pcap .png .bin etc., ≤ 5 files, ≤ 2 MB each), create a challenge row, enqueue a run.
A single-concurrency, in-process agent worker drives Claude for up to 12 turns or 6 minutes of wall-clock time, using Anthropic's server-side Python sandbox tool (code_execution_20250522). Attached files are uploaded once via the Anthropic Files API and the resulting file_ids are cached.
Flag detection: regex /(?:flag|FLAG|ctf|CTF|HTB|picoCTF)\{[^}\n]{1,200}\}/ against every text + tool-output block.
Score formula (agent/score.js): solved → 100 − 8·log₂(tokens/1k) − 3·(turns−1) − wall_ms/30000, clamped to [1,100]. Unsolved → 2·turns, clamped to [0,25]. Error → null.
Board, permalink HTML, and an SVG OG share card (1200×630) — all server-rendered, no external services.
CTFTime + writeups ingest daily via node-cron and lazily on first boot if their tables are empty.
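
The flag regex and score formula above can be sketched in a few lines. This is an illustrative reading of the description, not the code in agent/score.js or agent/flag.js: the function names, the argument shape, the status labels, and the absence of rounding are all assumptions.

```javascript
// Regex quoted from the flag-detection rule above.
const FLAG_RE = /(?:flag|FLAG|ctf|CTF|HTB|picoCTF)\{[^}\n]{1,200}\}/;

// Return the first flag-shaped substring in a text block, or null.
// (Hypothetical helper name.)
function findFlag(text) {
  const match = FLAG_RE.exec(text);
  return match ? match[0] : null;
}

const clamp = (value, lo, hi) => Math.min(hi, Math.max(lo, value));

// status: 'solved' | 'unsolved' | 'error' (assumed labels).
// tokens: total tokens used; turns: agent turns; wallMs: wall-clock milliseconds.
function aiTrivialScore({ status, tokens, turns, wallMs }) {
  if (status === 'error') return null;
  if (status === 'solved') {
    const raw =
      100 - 8 * Math.log2(tokens / 1000) - 3 * (turns - 1) - wallMs / 30000;
    return clamp(raw, 1, 100);
  }
  // Unsolved: a small amount of credit for effort, capped at 25.
  return clamp(2 * turns, 0, 25);
}

// Worked example: 4k tokens, 3 turns, 60 s of wall clock.
// 100 - 8*log2(4) - 3*2 - 60000/30000 = 100 - 16 - 6 - 2 = 76
aiTrivialScore({ status: 'solved', tokens: 4000, turns: 3, wallMs: 60000 });
```

Under this reading, a solve that uses fewer than 1k tokens has a negative log term and is clamped to 100.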

Data sources (all real, fetched at runtime)

| Source | URL | Refresh |
|---|---|---|
| CTFTime events API | https://ctftime.org/api/v1/events/?limit=50&start=<now>&finish=<now+120d> | daily 06:00 UTC + on first boot |
| ctfs/write-ups GitHub trees | https://api.github.com/repos/ctfs/write-ups-{2024,2017,2016,2015}/git/trees/master?recursive=1 — falls back through the list until one succeeds | daily 06:15 UTC + on first boot |
| Writeup README contents | https://raw.githubusercontent.com/<repo>/master/<path> | on demand, cached forever in writeup_contents |
| Anthropic Messages API | https://api.anthropic.com/v1/messages with tools:[{type:"code_execution_20250522"}] and betas:[code-execution-2025-05-22, files-api-2025-04-14] | every submit |
| Anthropic Files API | https://api.anthropic.com/v1/files | every submit that has attachments (file_ids cached on the challenge row) |

ANTHROPIC_API_KEY is the only secret needed. No GitHub auth — anonymous rate limit (60/hr) is plenty for one daily tree fetch + occasional README fetches.
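
As a rough sketch of the two scraper requests in the table above, the snippet below builds the CTFTime window URL and walks the write-ups repo list until one tree loads. The function names are hypothetical, the start/finish values are assumed to be Unix timestamps in seconds, and fetchImpl is an injectable stand-in for the global fetch so the fallback can be exercised offline.

```javascript
// Build the CTFTime events URL for a window starting at `now`.
// Assumption: start/finish are Unix timestamps in seconds.
function ctftimeEventsUrl(now = new Date(), days = 120, limit = 50) {
  const start = Math.floor(now.getTime() / 1000);
  const finish = start + days * 24 * 60 * 60;
  const url = new URL('https://ctftime.org/api/v1/events/');
  url.searchParams.set('limit', String(limit));
  url.searchParams.set('start', String(start));
  url.searchParams.set('finish', String(finish));
  return url.toString();
}

// Repo suffixes in the order listed in the table.
const WRITEUP_YEARS = [2024, 2017, 2016, 2015];

// Try each ctfs/write-ups-<year> repo in order and return the first tree
// that loads, or null if every request fails.
async function fetchFirstWriteupTree(fetchImpl = fetch) {
  for (const year of WRITEUP_YEARS) {
    const repo = `ctfs/write-ups-${year}`;
    const url = `https://api.github.com/repos/${repo}/git/trees/master?recursive=1`;
    try {
      const res = await fetchImpl(url);
      if (!res.ok) continue; // 404, rate limit, etc.: try the next repo
      const body = await res.json();
      return { repo, tree: body.tree };
    } catch {
      // Network error: fall through to the next repo.
    }
  }
  return null;
}
```

Returning null rather than throwing lets the caller keep the last-good rows when every request fails.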

HTTP API (all mounted at `/ctf-ai-solvable`)

| Method | Path | Purpose |
|---|---|---|
| GET | /health | {ok:true} |
| GET | /api/board?limit=&offset=&status=&category=&sort= | Joined challenges + best run, newest first (or by score) |
| GET | /api/board/stats | Counters: total challenges, solved %, avg score, avg wall, last-fetch timestamps |
| POST | /api/submit | multipart/form-data: title, statement, category?, model? (opus/sonnet), files[]?. Returns {run_id, challenge_id, slug} |
| GET | /api/challenges/:slug | {challenge, runs: [...]} |
| GET | /api/runs/:id | Poll target — status, score, tokens, turns, wall_ms, flag, band |
| GET | /api/runs/:id/transcript | Full structured per-turn transcript |
| GET | /api/events?limit= | CTFTime upcoming events (sorted by start time) |
| GET | /api/writeups?q=&category=&limit= | Writeups index, LIKE-search on name/CTF/path |
| POST | /api/writeups/:id/prefill | Lazily fetch the writeup's README from raw.githubusercontent, trim past the "Solution" header, return text + suggested title/category |
| GET | /c/:slug | HTML permalink with og:* and twitter:card meta tags for link-unfurl previews |
| GET | /c/:slug/card.svg | 1200×630 SVG share card, server-rendered, no external fonts |
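
A client can drive the submit-then-poll flow from the table above roughly as follows. This is a sketch rather than the bundled frontend code: BASE assumes the local setup on port 4798, the helper names are made up for illustration, and any status other than queued or running is treated as final.

```javascript
// Assumed local base URL; adjust for your deployment.
const BASE = 'http://localhost:4798/ctf-ai-solvable';
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll GET /api/runs/:id until the run leaves the queued/running states.
async function waitForRun(runId, { fetchImpl = fetch, intervalMs = 2000, maxPolls = 300 } = {}) {
  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchImpl(`${BASE}/api/runs/${runId}`);
    if (!res.ok) throw new Error(`GET /api/runs/${runId} returned ${res.status}`);
    const run = await res.json();
    if (run.status !== 'queued' && run.status !== 'running') return run;
    await sleep(intervalMs);
  }
  throw new Error(`run ${runId} did not finish after ${maxPolls} polls`);
}

// Submit a text-only challenge (no attachments), then wait for the result.
async function submitAndWait(title, statement, opts = {}) {
  const fetchImpl = opts.fetchImpl ?? fetch;
  const form = new FormData(); // sent as multipart/form-data
  form.set('title', title);
  form.set('statement', statement);
  const res = await fetchImpl(`${BASE}/api/submit`, { method: 'POST', body: form });
  if (!res.ok) throw new Error(`POST /api/submit returned ${res.status}`);
  const { run_id } = await res.json();
  return waitForRun(run_id, opts);
}
```

The maxPolls cap is a guard added for this sketch so a stuck run cannot keep a client polling forever.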

Run locally

npm install
ANTHROPIC_API_KEY=sk-ant-... PORT=4798 node server.js
# open http://localhost:4798/ctf-ai-solvable/

The server boots immediately; on first boot it lazily pulls CTFTime + writeups so the Events and Writeups tabs are populated before any agent run.

Architecture

server.js           express app + cron + bootstrap
config.js           env, constants, score-band helpers
db.js               better-sqlite3 (WAL), prepared statements
routes/             health · board · challenges · runs · events · writeups
scrapers/           ctftime · writeups (GitHub tree walker)
agent/              runner (Claude loop) · score · flag · files (upload helper)
views/share.js      HTML permalink + SVG OG card renderers
public/             vanilla-JS SPA (index.html, app.js, style.css)
data/               SQLite DB + uploaded attachments (gitignored)

SQLite tables (see db.js): challenges, runs, events, writeups, writeup_contents, meta.
Agent queue is in-process FIFO with MAX_CONCURRENT=1 to keep API spend predictable.
/api/submit per-IP rate limit: 5 per hour (in-memory).
Frontend polls /api/runs/:id every 2 s while a run is queued/running.
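
The queue and rate limit described above could look something like the sketch below. It assumes a sliding one-hour window for the limiter (the server may use a fixed window instead), and createQueue and createRateLimiter are hypothetical names, not the project's own.

```javascript
// In-process FIFO job queue: at most MAX_CONCURRENT jobs run at once.
const MAX_CONCURRENT = 1;

function createQueue(worker) {
  const pending = [];
  let active = 0;

  function drain() {
    while (active < MAX_CONCURRENT && pending.length > 0) {
      const job = pending.shift(); // oldest job first
      active++;
      Promise.resolve()
        .then(() => worker(job))
        .catch((err) => console.error('job failed:', err))
        .finally(() => {
          active--;
          drain(); // start the next job, if any
        });
    }
  }

  return {
    enqueue(job) {
      pending.push(job);
      drain();
    },
    get size() {
      return pending.length;
    },
  };
}

// In-memory limiter: allow `limit` hits per `windowMs` for each key (e.g. client IP).
function createRateLimiter(limit = 5, windowMs = 60 * 60 * 1000) {
  const hits = new Map(); // key -> timestamps (ms) of recent accepted hits
  return function allow(key, now = Date.now()) {
    const recent = (hits.get(key) ?? []).filter((t) => now - t < windowMs);
    const ok = recent.length < limit;
    if (ok) recent.push(now);
    hits.set(key, recent);
    return ok;
  };
}
```

Because both structures live in process memory, a restart clears the pending jobs and the per-IP counters.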

Out of scope

No auth, no payments, no multi-agent, no local bash exec (every Python call goes through Anthropic's sandbox), no live-infra exploitation, no WebSockets, no tests, no Docker. See SPEC.md §6.

Honest limitations

Submits require a working ANTHROPIC_API_KEY with the code-execution-2025-05-22 and files-api-2025-04-14 betas enabled on the account. Without a working key, runs finalize as status='error' with the message surfaced in the UI.
If a challenge requires connecting to a live service (e.g. nc challenge.host 1337), the agent will fail. This is by design.
Writeup-prefilled statements are heuristically trimmed to remove the solution; manually paste the original challenge prompt for cleaner scoring.
GitHub anonymous rate limit (60/hr/IP) applies to the daily tree fetch and README pulls; on rate-limit, last-good rows are retained.
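
To make the writeup-trimming caveat concrete, here is one way such a heuristic might work: cut the README at the first markdown heading that begins with "Solution". The regex and the trimSolution name are assumptions for illustration, and the server's actual rule may differ.

```javascript
// Match a markdown heading (# through ######) whose text starts with "Solution".
const SOLUTION_HEADING = /^#{1,6}[ \t]*solution\b.*$/im;

// Keep everything before the first "Solution" heading; if there is none,
// return the README unchanged (minus trailing whitespace).
function trimSolution(readme) {
  const match = SOLUTION_HEADING.exec(readme);
  const kept = match ? readme.slice(0, match.index) : readme;
  return kept.trimEnd();
}

const sample = [
  '# Baby RSA',
  '',
  'Decrypt the message.',
  '',
  '## Solution',
  '',
  'Factor n and recover d.',
].join('\n');

console.log(trimSolution(sample)); // prints the title and prompt only
```

A heuristic like this misses writeups that reveal the answer in the introduction or label the section differently, which is why pasting the original prompt gives cleaner scoring.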