agent-ready

Live agent-readiness score for any public website.

agent-ready measures how friendly a public website is to AI agents (LLM crawlers, autonomous agents, AI search products) by probing six well-known signals over live HTTP. No mocks, no API keys for the core flow, no auth.

Think PageSpeed Insights but for AI-agent readiness.

What it measures

Each signal is checked with a public HTTP GET against the candidate domain; /ai.txt has one fallback URL, and the sitemap check also reads robots.txt. A failed fetch contributes 0 points, and the failure mode is recorded in the per-check log.

| Signal | Max | URL we fetch | What it scores |
|---|---:|---|---|
| llms.txt | 25 | https://<domain>/llms.txt | 200 OK, text-ish, body > 100 bytes. +5 if it parses as the proposed Markdown structure (H1 + sections). |
| llms-full.txt | 10 | https://<domain>/llms-full.txt | 200 OK, text-ish, body > 500 bytes. |
| AI bot policy in robots.txt | 25 | https://<domain>/robots.txt | 3 points per distinct AI bot directive (GPTBot, ClaudeBot, anthropic-ai, OAI-SearchBot, Google-Extended, PerplexityBot, ByteSpider, CCBot, Applebot-Extended, Amazonbot, FacebookBot, Meta-ExternalAgent, cohere-ai, Diffbot, Omgilibot, YouBot, DuckAssistBot, Claude-Web, ChatGPT-User, PerplexityUser). Capped at 25. 5 points if robots.txt exists but has no AI-bot directives. |
| /.well-known/ai-agent.json | 15 | https://<domain>/.well-known/ai-agent.json | 200 OK + parses as JSON + has at least one of name, description, actions, endpoints, agents, tools, capabilities (Aiia spec). |
| /ai.txt | 10 | https://<domain>/ai.txt (and fallback https://<domain>/.well-known/ai.txt) | 200 OK, body ≥ 10 bytes. |
| Sitemap declared | 15 | https://<domain>/sitemap.xml (plus parses Sitemap: directives in robots.txt) | 15 if /sitemap.xml is a valid <urlset> or <sitemapindex>. 9 if only declared via robots.txt. |

Total possible: 100. Letter grades: A (≥85), B (≥70), C (≥50), D (≥30), F (<30).
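
The robots.txt rule and the grade thresholds above can be sketched in a few lines. This is a minimal illustration, not the contents of lib/score.js or lib/parse-robots.js: the function names (`distinctAiBots`, `scoreRobots`, `gradeFor`) are hypothetical, and it assumes a "directive" means a `User-agent:` line naming one of the listed bots, matched case-insensitively.

```javascript
// Bot names copied from the signal table above.
const AI_BOTS = [
  "GPTBot", "ClaudeBot", "anthropic-ai", "OAI-SearchBot", "Google-Extended",
  "PerplexityBot", "ByteSpider", "CCBot", "Applebot-Extended", "Amazonbot",
  "FacebookBot", "Meta-ExternalAgent", "cohere-ai", "Diffbot", "Omgilibot",
  "YouBot", "DuckAssistBot", "Claude-Web", "ChatGPT-User", "PerplexityUser",
];

// Collect the distinct known AI bots named in User-agent lines.
// Assumption: one User-agent line per bot counts as one directive.
function distinctAiBots(robotsTxt) {
  const known = new Map(AI_BOTS.map((bot) => [bot.toLowerCase(), bot]));
  const found = new Set();
  for (const line of robotsTxt.split(/\r?\n/)) {
    const match = /^\s*user-agent\s*:\s*([^#]*)/i.exec(line);
    if (!match) continue;
    const bot = known.get(match[1].trim().toLowerCase());
    if (bot) found.add(bot);
  }
  return found;
}

// robotsTxt is null when the fetch failed, which scores 0.
function scoreRobots(robotsTxt) {
  if (robotsTxt == null) return 0;
  const bots = distinctAiBots(robotsTxt).size;
  // 5 points for a robots.txt with no AI-bot directives,
  // otherwise 3 per distinct bot, capped at 25.
  return bots === 0 ? 5 : Math.min(25, bots * 3);
}

// Map a 0-100 total to the letter grades listed above.
function gradeFor(total) {
  if (total >= 85) return "A";
  if (total >= 70) return "B";
  if (total >= 50) return "C";
  if (total >= 30) return "D";
  return "F";
}
```

Read literally, the table gives a file that names exactly one AI bot 3 points, fewer than the 5 awarded to a file with none; the real scorer may treat that case differently.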

Refresh cadence

Tracked domains (~150 curated public sites in data/seed-domains.json plus user-submitted): refreshed every 6 hours via node-cron. See crons/refresh.js.
Housekeeping (prune check rows to the most recent 50 per domain, then run VACUUM): daily at 03:30. See crons/housekeeping.js.
Live single-domain checks (POST /api/check): rate-limited to 10/min and 60/hour per IP.
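
One way to enforce both limits at once is to keep two token buckets per IP and admit a request only when both have a token. The sketch below is an assumption about the approach, not the contents of lib/rate-limit.js; `TokenBucket`, `LIMITS`, and `allowRequest` are hypothetical names, and the clock is passed in so the behaviour is easy to test.

```javascript
// A bucket that refills continuously: `capacity` tokens every `refillMs`.
class TokenBucket {
  constructor(capacity, refillMs, now) {
    this.capacity = capacity;
    this.refillMs = refillMs;
    this.tokens = capacity;
    this.updatedAt = now;
  }

  refill(now) {
    const elapsed = now - this.updatedAt;
    const gained = (elapsed / this.refillMs) * this.capacity;
    this.tokens = Math.min(this.capacity, this.tokens + gained);
    this.updatedAt = now;
  }
}

// The two limits stated above: 10 per minute and 60 per hour.
const LIMITS = [
  { capacity: 10, refillMs: 60_000 },
  { capacity: 60, refillMs: 3_600_000 },
];

const bucketsByIp = new Map();

// Returns true and spends one token from each bucket if the request is allowed.
function allowRequest(ip, now = Date.now()) {
  let buckets = bucketsByIp.get(ip);
  if (!buckets) {
    buckets = LIMITS.map((l) => new TokenBucket(l.capacity, l.refillMs, now));
    bucketsByIp.set(ip, buckets);
  }
  buckets.forEach((bucket) => bucket.refill(now));
  if (buckets.some((bucket) => bucket.tokens < 1)) return false;
  buckets.forEach((bucket) => { bucket.tokens -= 1; });
  return true;
}
```

A token bucket permits short bursts up to its capacity, so it approximates rather than exactly enforces a fixed-window count. This sketch also never evicts idle IPs from the map, which a long-running server would need to do.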

API

Base path is /agent-ready. All endpoints return JSON { ok, data?, error? }.

| Method | Path | Notes |
|---|---|---|
| GET | /agent-ready/health | Unauthenticated; must return 200. |
| GET | /agent-ready/api/sites?sort=score\|recent&q=&limit=&offset= | Tracked domains list. |
| GET | /agent-ready/api/site/:domain | Latest score + 30 most recent checks with full signal breakdown. |
| GET | /agent-ready/api/stats | Index-wide totals, signal coverage %, grade spread. |
| GET | /agent-ready/api/movers?days=7 | Domains whose score changed since previous check. |
| GET | /agent-ready/api/categories | Average score per category. |
| POST | /agent-ready/api/check | Body { "domain": "example.com" }. Runs all 6 probes live, persists, returns full signal JSON. |

There is no auth layer, no admin interface, and no API key requirement for any endpoint.
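
As an illustration of the response envelope, here is a small client for the live check endpoint. It is a sketch under assumptions: `checkDomain` and `fetchImpl` are hypothetical names, the base URL is the local default from "Running locally", and the shape of `data` is whatever the server returns.

```javascript
// Assumed base URL: the local development default.
const BASE_URL = "http://localhost:4748/agent-ready";

// POST a domain to /api/check and unwrap the { ok, data?, error? } envelope.
// fetchImpl is injectable so the function can be exercised without a server.
async function checkDomain(domain, { baseUrl = BASE_URL, fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${baseUrl}/api/check`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ domain }),
  });
  const body = await res.json();
  if (!body.ok) {
    throw new Error(`check failed (HTTP ${res.status}): ${body.error ?? "unknown error"}`);
  }
  return body.data;
}
```

Calling `checkDomain("example.com")` against a running instance resolves with the signal JSON for that domain, or rejects with the server's `error` message.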

Running locally

npm install
cp .env.example .env
npm start

Then open <http://localhost:4748/agent-ready/>.

The first launch seeds the tracked-domain table from data/seed-domains.json and kicks off a single background refresh (~3 minutes for ~150 domains, batched 8-parallel). The server listens immediately — /health does not block on the refresh.
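
The "batched 8-parallel" refresh can be pictured as slicing the domain list into groups of eight and awaiting each group before starting the next. This is a hypothetical sketch rather than the code in crons/refresh.js; `refreshInBatches` and `checkFn` are illustrative names.

```javascript
// Run checkFn over all domains, at most `batchSize` at a time.
// Promise.allSettled keeps one failing domain from aborting its batch.
async function refreshInBatches(domains, checkFn, batchSize = 8) {
  const results = [];
  for (let i = 0; i < domains.length; i += batchSize) {
    const batch = domains.slice(i, i + batchSize);
    const settled = await Promise.allSettled(batch.map((domain) => checkFn(domain)));
    settled.forEach((outcome, j) => {
      results.push({ domain: batch[j], outcome });
    });
  }
  return results;
}
```

With roughly 150 domains and a batch size of 8, this runs 19 batches in sequence. Starting the refresh without awaiting it from the server bootstrap is what lets the listener come up immediately.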

No-mock guarantee

This product complies with the cowork R&D mock-data ban. Specifically:

The domains table starts with no scores. The latest_score column is NULL until the first refresh has actually fetched the domain.
The codebase contains no Math.random() calls, no preset values, and no fallback fake scores.
The seed list (data/seed-domains.json) is input (the universe of domains we track), not data. The data is what each domain returns at /llms.txt, /robots.txt, etc., on every refresh.
If a fetch fails, the signal score is 0 and signals_json.notes records the failure (e.g. "timeout", "HTTP 404").
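
The failure handling described above might look like the following probe helper, built on the global fetch and AbortController that Node.js 18+ provides. It is a sketch, not the code in lib/check.js: `probe`, `fetchImpl`, and the 8-second default timeout are assumptions made for illustration.

```javascript
// Fetch one signal URL. On any failure, return a note instead of a body
// so the caller can score the signal 0 and record why.
async function probe(url, { timeoutMs = 8000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (res.status !== 200) {
      return { ok: false, note: `HTTP ${res.status}` };
    }
    // The timer is still armed here, so a stalled body also times out.
    return { ok: true, body: await res.text() };
  } catch (err) {
    const note = err.name === "AbortError" ? "timeout" : String(err.message || err);
    return { ok: false, note };
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would score the signal only when `ok` is true and otherwise store 0 together with `note`, so no path produces a score without a real response.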

Stack

Node.js 18+ (uses global fetch with AbortController)
Express 4
better-sqlite3 (WAL mode)
node-cron
helmet, compression
Vanilla-JS SPA (no framework, no build step)

Layout

agent-ready/
  server.js                Express bootstrap, BASE_PATH mount, cron registration
  db.js                    better-sqlite3 + schema + prepared statements
  routes/api.js            All /api/* handlers
  lib/check.js             Single-domain probe — 6 parallel fetches with timeout
  lib/score.js             Signal weights + grade helper + AI bot list
  lib/parse-robots.js      robots.txt parser (extracts AI-bot directives + Sitemap)
  lib/seed.js              First-boot seed of the tracked-domain table
  lib/rate-limit.js        Per-IP token bucket for POST /api/check
  crons/refresh.js         6h refresh of all tracked domains, batched 8-parallel
  crons/housekeeping.js    24h prune + VACUUM
  data/seed-domains.json   ~150 curated input domains (config, not data)
  public/index.html        SPA shell
  public/app.js            Vanilla-JS UI (leaderboard, stats, movers, live check)
  public/style.css         Dark theme

Why this exists

llms.txt has reached ~10% adoption across 300k crawled domains. WAB (Web Agent Bridge), Chrome's WebMCP, and the Aiia ai-agent.json spec all launched between January and May 2026. There is no public tracker of which top sites have adopted these standards. agent-ready ships that tracker, plus a single-domain checker so any site owner can audit their own property in seconds.

License

MIT.