llmstxt-radar

Live adoption tracker, validator, and diff watcher for the /llms.txt standard. Who ships a real llms.txt? Who lets it rot? Who quietly took it down?

Live at: <https://holyai.me/llmstxt-radar/>
JSON feed: <https://holyai.me/llmstxt-radar/feed.json>
RSS: <https://holyai.me/llmstxt-radar/feed.xml>
Health: <https://holyai.me/llmstxt-radar/health>

What

The /llms.txt proposal (Jeremy Howard / Answer.AI, Sep 2024)
is the closest thing the web has to a "robots.txt for LLMs". By May 2026
hundreds of dev-tool, framework, docs, and SaaS sites publish one — but there
is no public, machine-checked registry of who ships one, how good it is, and
when it changes.

llmstxt-radar watches a curated seed of ~100 domains. Every six hours it
fetches each domain's /llms.txt and probes /llms-full.txt, parses the
/llms.txt against the official spec, scores it from 0 to 100, and publishes:

- a leaderboard sortable by spec score, freshness, or link count
- a changes feed (JSON Feed + RSS) of every detected add / update /
  remove / restore event
- a per-domain detail view with the parsed snapshot, scoring breakdown,
  and snapshot diff history
- a paste-to-validate widget that scores your own llms.txt in real time
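
One way to read the four event types in the changes feed is as transitions between two consecutive fetches of the same domain. The sketch below is a guess at those semantics, not the tracker's actual code: the function name classifyChange and the snapshot fields reachable, bodyHash, and everReachable are all hypothetical.

```javascript
// Hypothetical snapshot shape (an assumption, not the project's schema):
//   { reachable: boolean, bodyHash: string | null, everReachable: boolean }
// `prev` is null when the domain has never been fetched before.
function classifyChange(prev, curr) {
  const wasUp = Boolean(prev && prev.reachable);
  const isUp = Boolean(curr.reachable);

  if (!wasUp && isUp) {
    // File present now but not on the previous fetch:
    // "restore" if it existed at some earlier point, otherwise "add".
    return prev && prev.everReachable ? "restore" : "add";
  }
  if (wasUp && !isUp) return "remove";
  if (wasUp && isUp && prev.bodyHash !== curr.bodyHash) return "update";
  return null; // no change worth publishing
}
```

Returning null for the unchanged case keeps the feed limited to real transitions, so a domain that stays down (or stays identical) produces no repeated events.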

There is no login. Every endpoint is a public read.

Why this matters

- Dev-tool docs teams can paste their llms.txt into the validator and
  see exactly which spec rules they fail.
- AI / IDE-agent builders can subscribe to the JSON feed and detect new
  /llms.txt entrants the same day they ship.
- AI docs / SEO writers can track adoption velocity.
- Casual web devs can browse well-crafted examples to copy.
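
For the feed-subscription use case, a consumer might poll feed.json and keep only items newer than its last check. This is a minimal sketch assuming each item carries the optional JSON Feed 1.1 field date_published; the helper names itemsSince and pollFeed are illustrative, and the exact item contents the radar emits are not documented here.

```javascript
// Return feed items published strictly after `since` (a Date), newest first.
// Items without a date_published field are skipped.
function itemsSince(feed, since) {
  return (feed.items ?? [])
    .filter((item) => item.date_published && new Date(item.date_published) > since)
    .sort((a, b) => new Date(b.date_published) - new Date(a.date_published));
}

// Poll the public feed and print anything new.
// Uses the global fetch available in Node 18+.
async function pollFeed(since) {
  const res = await fetch("https://holyai.me/llmstxt-radar/feed.json");
  if (!res.ok) throw new Error(`feed request failed: HTTP ${res.status}`);
  for (const item of itemsSince(await res.json(), since)) {
    console.log(item.date_published, item.title, item.url);
  }
}
```

Calling pollFeed(new Date(Date.now() - 24 * 60 * 60 * 1000)) would list the last day's events; a long-running consumer would persist the newest timestamp it has seen and pass that instead.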

Stack

- Node.js 18+ (ESM)
- Express 4 + helmet + compression + cors
- better-sqlite3 (WAL mode)
- node-cron for scheduled fetches
- Vanilla JS SPA, dark theme, no build step

No build step. No bundler. No TypeScript.

Data sources (all real, all public, no auth)

| Source | What we use it for | Refresh |
|---|---|---|
| Each domain's /llms.txt | The snapshot itself | every 6 h |
| Each domain's /llms-full.txt | Reachability check only (body not stored) | every 6 h |
| <https://raw.githubusercontent.com/SecretiveShell/Awesome-llms-txt/master/README.md> | Bootstrap seed of additional domains | weekly |
| <https://raw.githubusercontent.com/krish-adi/llmstxt-site/master/README.md> | Bootstrap seed of additional domains | weekly |
| seed.json (in this repo) | Hand-curated initial list of dev-tool / framework / SaaS docs | manual |

No mock data anywhere. Every score is computed from a real HTTP fetch
of the listed domain's /llms.txt. If a fetch fails, we store the real HTTP
status and never fabricate content. Math.random() is used only for cron
jitter, to avoid a thundering herd of simultaneous fetches.
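
The jitter idea can be sketched as a small random delay in front of each scheduled task. The helper names jitterMs and runWithJitter and the 30-second default are assumptions made for illustration; the project's real jitter window is not stated.

```javascript
// Return a delay in [0, maxMs) milliseconds. `rand` is injectable so the
// function can be tested deterministically; it defaults to Math.random.
function jitterMs(maxMs, rand = Math.random) {
  return Math.floor(rand() * maxMs);
}

// Run `task` after a random delay so that many scheduled jobs do not all
// hit the network in the same instant. Resolves with the task's result.
function runWithJitter(task, maxMs = 30_000) {
  const delay = jitterMs(maxMs);
  return new Promise((resolve, reject) => {
    setTimeout(() => Promise.resolve().then(task).then(resolve, reject), delay);
  });
}
```

Keeping the random source as a parameter means the randomness stays confined to scheduling and never touches stored data or scores.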

Scoring rubric (0–100)

| Rule | Points |
|---|---|
| File reachable (HTTP 200) | 20 |
| Valid markdown (parses cleanly) | 10 |
| Has H1 title | 10 |
| Has blockquote summary | 10 |
| Has at least one H2 section | 5 |
| Has at least 5 valid links | 10 |
| All links are absolute URLs | 5 |
| All links have a ": description" suffix | 5 |
| File size between 256 B and 64 KB | 5 |
| Companion /llms-full.txt reachable | 10 |
| Served as text/plain or text/markdown | 5 |
| No raw HTML in the body | 5 |
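
The rubric maps naturally onto a table of (rule, points, pass) entries. The sketch below is an approximation, not the project's scorer: the function name scoreLlmsTxt, the rule identifiers, and the regex heuristics are assumptions, and "valid markdown" is stood in for by a simple balanced-code-fence check rather than a real markdown parse.

```javascript
const LINK_RE = /\[([^\]]+)\]\(([^)\s]+)\)/g; // markdown links: [text](url)
const HTML_TAG_RE = /<\/?[a-z][a-z0-9-]*(\s[^>]*)?\/?>/i; // skips <https://...> autolinks
const FENCE = "`".repeat(3);

function scoreLlmsTxt({ status, contentType = "", body = "", fullReachable = false }) {
  const reachable = status === 200;
  const text = reachable ? body : ""; // an unreachable file earns no body points
  const lines = text.split(/\r?\n/);
  const links = [...text.matchAll(LINK_RE)].map((m) => m[2]);
  const linkLines = lines.filter((line) => /\[[^\]]+\]\([^)\s]+\)/.test(line));
  const bytes = Buffer.byteLength(text, "utf8");
  const fenceCount = lines.filter((line) => line.trimStart().startsWith(FENCE)).length;

  // [rule id, points, passed]; the points column mirrors the rubric table.
  const rules = [
    ["reachable", 20, reachable],
    ["valid-markdown", 10, text.trim() !== "" && fenceCount % 2 === 0],
    ["h1-title", 10, /^# \S/m.test(text)],
    ["blockquote-summary", 10, /^> \S/m.test(text)],
    ["h2-section", 5, /^## \S/m.test(text)],
    ["five-links", 10, links.length >= 5],
    ["absolute-links", 5, links.length > 0 && links.every((u) => /^https?:\/\//i.test(u))],
    ["link-descriptions", 5, linkLines.length > 0 && linkLines.every((l) => /\)\s*:\s+\S/.test(l))],
    ["size-in-range", 5, bytes >= 256 && bytes <= 64 * 1024],
    ["llms-full-reachable", 10, fullReachable],
    ["content-type", 5, reachable && /^text\/(plain|markdown)\b/i.test(contentType)],
    ["no-raw-html", 5, reachable && !HTML_TAG_RE.test(text)],
  ];

  const score = rules.reduce((sum, [, points, pass]) => sum + (pass ? points : 0), 0);
  const violations = rules.filter(([, , pass]) => !pass).map(([id]) => id);
  return { score, violations };
}
```

Returning the failed rule ids alongside the score is what lets a validator tell a docs team which rules they fail, rather than only how many points they lost.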

Public endpoints

All routes are served under BASE_PATH (default /llmstxt-radar).

GET  /                          → SPA
GET  /health                    → liveness JSON (200)
GET  /api/stats                 → counts + averages
GET  /api/leaderboard           → ?sort=score|freshness|links (default score)
GET  /api/changes               → ?days=30&limit=200
GET  /api/domains               → ?q=&limit=&offset=
GET  /api/domain/:domain        → full detail incl. last 10 snapshots
GET  /api/domain/:domain/raw    → cached body, text/markdown
GET  /api/domain/:domain/diff   → ?from=:sid&to=:sid (defaults to last two)
POST /api/validate              → { content } → score + violations (30/h/IP)
GET  /feed.json                 → JSON Feed 1.1 of changes
GET  /feed.xml                  → RSS 2.0 of changes

CORS allows any origin (*) on every GET endpoint and on the validate POST.
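
A thin client for these endpoints could look like the sketch below. It assumes the public instance at https://holyai.me/llmstxt-radar, that POST /api/validate takes a JSON body of the form { content }, and that responses are JSON; the helper names apiUrl, getJson, and validate are illustrative, and response shapes are not documented here.

```javascript
const BASE = "https://holyai.me/llmstxt-radar"; // public instance listed at the top

// Build an endpoint URL, skipping parameters whose value is undefined.
function apiUrl(path, params = {}) {
  const url = new URL(BASE + path);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// GET helper that parses JSON and surfaces non-2xx responses as errors.
async function getJson(path, params) {
  const res = await fetch(apiUrl(path, params));
  if (!res.ok) throw new Error(`${path} failed: HTTP ${res.status}`);
  return res.json();
}

// POST a candidate llms.txt body to the validator (limited to 30 requests/h/IP).
// Sending JSON is an assumption based on the { content } shape in the route list.
async function validate(content) {
  const res = await fetch(apiUrl("/api/validate"), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content }),
  });
  if (!res.ok) throw new Error(`validate failed: HTTP ${res.status}`);
  return res.json();
}
```

For example, getJson("/api/leaderboard", { sort: "freshness" }) or getJson("/api/changes", { days: 30, limit: 200 }) would request the corresponding routes. The base path is concatenated rather than resolved with new URL(path, base), because a leading slash in the path would otherwise drop the /llmstxt-radar prefix.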

Cron schedule

| Schedule | What |
|---|---|
| 0 */6 * * * | Fetch every active domain's /llms.txt and probe /llms-full.txt |
| 15 3 * * 0 (Sun 03:15 UTC) | Reload seed.json + pull external seed lists |
| boot + 3 s | One-shot fetch pass on first start |
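
To make the two recurring schedules concrete, the sketch below computes their next firing times by hand. It assumes they are the standard five-field cron expressions 0 */6 * * * and 15 3 * * 0 and that both are evaluated in UTC, as the "Sun 03:15 UTC" note suggests; the function names are illustrative and this is not how node-cron itself is invoked.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Next firing of "0 */6 * * *" in UTC: the next 00:00, 06:00, 12:00 or 18:00
// strictly after `now`. Works because the Unix epoch starts at 00:00 UTC.
function nextFetchUtc(now) {
  const sixHours = 6 * HOUR_MS;
  return new Date((Math.floor(now.getTime() / sixHours) + 1) * sixHours);
}

// Next firing of "15 3 * * 0" in UTC: the next Sunday 03:15 strictly after `now`.
function nextSeedReloadUtc(now) {
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), 3, 15)
  );
  // getUTCDay() is 0 for Sunday; move forward to the coming Sunday if needed.
  next.setUTCDate(next.getUTCDate() + ((7 - next.getUTCDay()) % 7));
  // If that moment has already passed (or is exactly now), use the following week.
  if (next <= now) next.setUTCDate(next.getUTCDate() + 7);
  return next;
}
```

For a Wednesday such as 2026-05-06 at 10:30 UTC, the next fetch pass would fall at 12:00 UTC the same day and the next seed reload on Sunday 2026-05-10 at 03:15 UTC.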

Running locally

cp .env.example .env
npm install
npm start
# → llmstxt-radar listening on :4826 → /llmstxt-radar/

better-sqlite3 needs native bindings. On macOS and on Linux arm64, npm install
fetches a prebuilt binary automatically. On the RNDLAB cowork sandbox the
prebuilt fetch is blocked. That is fine: production deploys to arm64, where
prebuilts are available.

License

MIT.