agent-ready
Live agent-readiness score for any public website.
agent-ready measures how friendly a public website is to AI agents (LLM crawlers, autonomous agents, AI search products) by inspecting six well-known signals in one live probe, with each signal fetched over plain HTTP. No mocks, no API keys for the core flow, no auth.
Think PageSpeed Insights but for AI-agent readiness.
What it measures
Every signal is a single public HTTP GET against the candidate domain. A failed fetch contributes 0 points and the failure mode is recorded in the per-check log.
| Signal | Max | URL we fetch | What it scores |
|---|---:|---|---|
| llms.txt | 25 | https://<domain>/llms.txt | 200 OK, text-ish, body > 100 bytes. +5 if it parses as the proposed Markdown structure (H1 + sections). |
| llms-full.txt | 10 | https://<domain>/llms-full.txt | 200 OK, text-ish, body > 500 bytes. |
| AI bot policy in robots.txt | 25 | https://<domain>/robots.txt | 3 points per distinct AI bot directive (GPTBot, ClaudeBot, anthropic-ai, OAI-SearchBot, Google-Extended, PerplexityBot, ByteSpider, CCBot, Applebot-Extended, Amazonbot, FacebookBot, Meta-ExternalAgent, cohere-ai, Diffbot, Omgilibot, YouBot, DuckAssistBot, Claude-Web, ChatGPT-User, PerplexityUser). Capped at 25. 5 points if robots.txt exists but has no AI-bot directives. |
| /.well-known/ai-agent.json | 15 | https://<domain>/.well-known/ai-agent.json | 200 OK + parses as JSON + has at least one of name, description, actions, endpoints, agents, tools, capabilities (Aiia spec). |
| /ai.txt | 10 | https://<domain>/ai.txt (and fallback https://<domain>/.well-known/ai.txt) | 200 OK, body ≥ 10 bytes. |
| Sitemap declared | 15 | https://<domain>/sitemap.xml (plus parses Sitemap: directives in robots.txt) | 15 if /sitemap.xml is a valid <urlset> or <sitemapindex>. 9 if only declared via robots.txt. |
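
As an illustration of the robots.txt row, here is a minimal sketch of how that signal could be scored. It is not the code in lib/score.js or lib/parse-robots.js. The function name `scoreRobots` is hypothetical, and it assumes that a "directive" means a `User-agent:` line naming one of the listed bots, matched case-insensitively.

```javascript
// Hypothetical sketch of the "AI bot policy in robots.txt" signal.
// Bot names are copied from the table above; the parsing is deliberately simple.
const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'OAI-SearchBot', 'Google-Extended',
  'PerplexityBot', 'ByteSpider', 'CCBot', 'Applebot-Extended', 'Amazonbot',
  'FacebookBot', 'Meta-ExternalAgent', 'cohere-ai', 'Diffbot', 'Omgilibot',
  'YouBot', 'DuckAssistBot', 'Claude-Web', 'ChatGPT-User', 'PerplexityUser',
];

function scoreRobots(body) {
  // A failed fetch is represented here as null/undefined and earns 0 points.
  if (body == null) return 0;

  const known = new Set(AI_BOTS.map((bot) => bot.toLowerCase()));
  const seen = new Set();

  for (const line of body.split(/\r?\n/)) {
    // Assumption: only "User-agent: <name>" lines count; trailing comments are dropped.
    const match = /^\s*user-agent\s*:\s*([^#]*)/i.exec(line);
    if (!match) continue;
    const agent = match[1].trim().toLowerCase();
    if (known.has(agent)) seen.add(agent);
  }

  // robots.txt exists but names no AI bots: flat 5 points.
  if (seen.size === 0) return 5;

  // 3 points per distinct bot, capped at the signal maximum of 25.
  return Math.min(25, seen.size * 3);
}
```

With these rules, nine or more distinct bots reach the cap, since 9 × 3 = 27 is clamped to 25.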
Total possible: 100. Letter grades: A (≥85), B (≥70), C (≥50), D (≥30), F (<30).
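
The totals and grade thresholds above can be sketched as follows. This is an illustrative version rather than the contents of lib/score.js; the key names in `SIGNAL_MAX` and the function names are assumptions.

```javascript
// Hypothetical signal keys; the maxima are taken from the table above and sum to 100.
const SIGNAL_MAX = {
  llmsTxt: 25,
  llmsFullTxt: 10,
  robotsAiPolicy: 25,
  aiAgentJson: 15,
  aiTxt: 10,
  sitemap: 15,
};

// Sum per-signal scores, clamping each to [0, max]. Missing signals count as 0.
function totalScore(signals) {
  let total = 0;
  for (const [name, max] of Object.entries(SIGNAL_MAX)) {
    const raw = Number(signals[name]) || 0;
    total += Math.min(max, Math.max(0, raw));
  }
  return total;
}

// Map a 0-100 score to the letter grades listed above.
function gradeFor(score) {
  if (score >= 85) return 'A';
  if (score >= 70) return 'B';
  if (score >= 50) return 'C';
  if (score >= 30) return 'D';
  return 'F';
}
```

For example, a site with a full-marks llms.txt (25) and a sitemap declared only via robots.txt (9) totals 34, which is a D.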
Refresh cadence
- Tracked domains (~150 curated public sites in data/seed-domains.json plus user-submitted): refreshed every 6 hours via node-cron. See crons/refresh.js.
- Housekeeping (prune old check rows to last 50 per domain, run VACUUM): every 24 hours at 03:30. See crons/housekeeping.js.
- Live single-domain checks (POST /api/check): rate-limited to 10/min and 60/hour per IP.
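
One way to enforce the two per-IP limits on live checks is a pair of token buckets per client. The sketch below is an assumption about how that could look, not the code in lib/rate-limit.js; `TokenBucket`, `createLimiter`, and the injectable `now` argument are hypothetical names.

```javascript
// Hypothetical token bucket: holds up to `capacity` tokens and refills
// continuously at `capacity` tokens per `windowMs`.
class TokenBucket {
  constructor(capacity, windowMs, now) {
    this.capacity = capacity;
    this.windowMs = windowMs;
    this.tokens = capacity;
    this.updatedAt = now;
  }

  refill(now) {
    const elapsed = Math.max(0, now - this.updatedAt);
    const gained = (elapsed * this.capacity) / this.windowMs;
    this.tokens = Math.min(this.capacity, this.tokens + gained);
    this.updatedAt = now;
  }
}

// Default limits mirror the text: 10 per minute and 60 per hour.
function createLimiter(limits = [[10, 60_000], [60, 3_600_000]]) {
  const byIp = new Map();

  // Returns true if the request is allowed, false if it should be rejected.
  return function allow(ip, now = Date.now()) {
    let buckets = byIp.get(ip);
    if (!buckets) {
      buckets = limits.map(([cap, win]) => new TokenBucket(cap, win, now));
      byIp.set(ip, buckets);
    }
    buckets.forEach((bucket) => bucket.refill(now));

    // A request must fit in every bucket; spend nothing if any bucket is empty.
    if (buckets.some((bucket) => bucket.tokens < 1)) return false;
    buckets.forEach((bucket) => { bucket.tokens -= 1; });
    return true;
  };
}
```

Token buckets allow short bursts up to the bucket capacity, so this approximates the stated limits rather than enforcing strict fixed windows. A long-running server would also need to evict idle entries from the map.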
API
Base path is /agent-ready. All endpoints return JSON { ok, data?, error? }.
| Method | Path | Notes |
|---|---|---|
| GET | /agent-ready/health | No auth required; must return 200. |
| GET | /agent-ready/api/sites?sort=score\|recent&q=&limit=&offset= | Tracked domains list. |
| GET | /agent-ready/api/site/:domain | Latest score + 30 most recent checks with full signal breakdown. |
| GET | /agent-ready/api/stats | Index-wide totals, signal coverage %, grade spread. |
| GET | /agent-ready/api/movers?days=7 | Domains whose score changed since previous check. |
| GET | /agent-ready/api/categories | Average score per category. |
| POST | /agent-ready/api/check | Body { "domain": "example.com" }. Runs all 6 probes live, persists, returns full signal JSON. |
There is no auth, no admin, no API key required to call any endpoint.
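
A minimal client sketch for the live-check endpoint, using the global fetch available in Node.js 18+. The helper names `buildCheckRequest` and `checkDomain` are hypothetical. The base URL assumes the local default on port 4748, and the error handling assumes the `error` field of the response envelope can be rendered as a string.

```javascript
// Assumed base URL: the local default; change it for a deployed instance.
const BASE_URL = 'http://localhost:4748/agent-ready';

// Build the request for POST /api/check without sending it.
function buildCheckRequest(domain, baseUrl = BASE_URL) {
  return {
    url: `${baseUrl}/api/check`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ domain }),
    },
  };
}

// Send the request and unwrap the { ok, data?, error? } envelope.
async function checkDomain(domain, baseUrl = BASE_URL) {
  const { url, init } = buildCheckRequest(domain, baseUrl);
  const res = await fetch(url, init);
  const envelope = await res.json();
  if (!envelope.ok) {
    throw new Error(String(envelope.error ?? `HTTP ${res.status}`));
  }
  return envelope.data;
}
```

Calling `checkDomain('example.com')` against a running server would resolve to the `data` payload of the response. Because this endpoint is rate-limited per IP, callers should expect occasional rejections under load.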
Running locally
npm install
cp .env.example .env
npm start
Then open <http://localhost:4748/agent-ready/>.
The first launch seeds the tracked-domain table from data/seed-domains.json and kicks off a single background refresh (~3 minutes for ~150 domains, batched 8-parallel). The server listens immediately — /health does not block on the refresh.
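
The "batched 8-parallel" refresh can be pictured with a small helper like the one below. This is a sketch under assumptions, not the code in crons/refresh.js; `runInBatches` and its `worker` callback are hypothetical names.

```javascript
// Process `items` in consecutive batches of `batchSize`, running each batch
// in parallel and waiting for it to settle before starting the next one.
async function runInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // allSettled keeps one failing domain from aborting the whole refresh.
    // The async wrapper also turns synchronous throws into rejections.
    const settled = await Promise.allSettled(
      batch.map(async (item) => worker(item))
    );
    results.push(...settled);
  }
  return results;
}
```

With roughly 150 domains and a batch size of 8, this runs about 19 batches, so total refresh time is driven by the slowest domain in each batch.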
No-mock guarantee
This product complies with the cowork R&D mock-data ban. Specifically:
- The domains table starts with no scores. The latest_score column is NULL until the first refresh has actually fetched the domain.
- There is zero Math.random(), no preset values, no fallback fake scores anywhere in the codebase.
- The seed list (data/seed-domains.json) is input (the universe of domains we track), not data. The data is what each domain returns at /llms.txt, /robots.txt, etc., on every refresh.
- If a fetch fails, the signal score is 0 and signals_json.notes records the failure (e.g. "timeout", "HTTP 404").
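
The failure handling in the last bullet can be sketched as a fetch wrapped in an AbortController timeout. This is an illustration, not the code in lib/check.js. The names `probe` and `toSignal`, the 8-second default timeout, the "fetch failed" note, and the shape of the returned objects are all assumptions; `fetchImpl` is injectable only so the sketch can be exercised without a network.

```javascript
// Fetch one URL with a timeout. Never throws: failures come back as a note.
async function probe(url, { timeoutMs = 8000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) return { ok: false, note: `HTTP ${res.status}` };
    // Reading the body is still covered by the same abort signal.
    return { ok: true, status: res.status, body: await res.text() };
  } catch (err) {
    // fetch rejects with an AbortError when the controller fires.
    const note = err && err.name === 'AbortError' ? 'timeout' : 'fetch failed';
    return { ok: false, note };
  } finally {
    clearTimeout(timer);
  }
}

// Turn a probe result into a signal entry: real score on success,
// 0 plus the recorded failure note otherwise.
function toSignal(result, scoreBody) {
  if (!result.ok) return { score: 0, note: result.note };
  return { score: scoreBody(result.body), note: null };
}
```

Because every failure path yields a score of 0 and a note, there is no branch in which a placeholder score could be substituted for a real fetch.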
Stack
- Node.js 18+ (uses global fetch with AbortController)
- Express 4
- better-sqlite3 (WAL mode)
- node-cron
- helmet, compression
- Vanilla-JS SPA (no framework, no build step)
Layout
agent-ready/
server.js Express bootstrap, BASE_PATH mount, cron registration
db.js better-sqlite3 + schema + prepared statements
routes/api.js All /api/* handlers
lib/check.js Single-domain probe — 6 parallel fetches with timeout
lib/score.js Signal weights + grade helper + AI bot list
lib/parse-robots.js robots.txt parser (extracts AI-bot directives + Sitemap)
lib/seed.js First-boot seed of the tracked-domain table
lib/rate-limit.js Per-IP token bucket for POST /api/check
crons/refresh.js 6h refresh of all tracked domains, batched 8-parallel
crons/housekeeping.js 24h prune + VACUUM
data/seed-domains.json ~150 curated input domains (config, not data)
public/index.html SPA shell
public/app.js Vanilla-JS UI (leaderboard, stats, movers, live check)
public/style.css Dark theme
Why this exists
llms.txt has reached ~10% adoption across 300k crawled domains. WAB (Web Agent Bridge), Chrome's WebMCP, and the Aiia ai-agent.json spec all launched between January and May 2026. There is no public tracker of which top sites have adopted these standards. agent-ready ships that tracker, plus a single-domain checker so any site owner can audit their own property in seconds.
License
MIT.