glyph-trap
Live public scanner that finds invisible Unicode payloads (Tag chars, zero-width separators, bidi-override re-orderings, homoglyph swaps, HTML comments hiding prompt-like keywords) inside AI-agent instruction files on GitHub —AGENTS.md,CLAUDE.md,GEMINI.md,.cursorrules,.windsurfrules,.clinerules,SKILL.md,*.mdc,.claude/settings.json, and more.
Tagline: Your AGENTS.md can lie to your eyes. glyph-trap catches the lie.
Live URL (after deploy): https://holyai.me/glyph-trap/
Port: 4907
Base path: /glyph-trap
Auth: none on any endpoint.
Why this exists (May 2026)
- AGENTS.md is the universal standard. OpenAI, Google, Cursor,
- Sourcegraph, Factory and Anthropic agreed on it as the open spec for AI
- coding-agent instructions. VS Code injects its contents into every chat
- request, by default, treating it as instructions.
- The attack is real and disclosed. Prompt Security's May 2026 post
- "When Your Repo Starts Talking" demonstrated a malicious
AGENTS.md - quietly tells the agent to email internal data out during an ordinary
- coding session. Embrace The Red showed Unicode Tag characters (U+E0000–U+E007F)
- are invisible to humans but read by LLMs as plain ASCII — the "ASCII
- Smuggler" attack.
- **No existing public scanner is dedicated to invisible-Unicode in
- instruction files across the full agent-format zoo.** This is it.
What it scans
| Filename | Notes |
|---|---|
| AGENTS.md, agents.md | the OpenAI-led open standard |
| CLAUDE.md, claude.md | Claude Code |
| GEMINI.md, gemini.md | Gemini CLI |
| SKILL.md, skill.md, .skill | Claude Skills, Cowork skills |
| .cursorrules, .cursor/rules/, .mdc | Cursor rules |
| .windsurfrules, .windsurf/rules/ | Windsurf rules |
| .clinerules, .clinerules/ | Cline rules |
| .aiderules | Aider rules |
| .claude/settings.json, .claude/settings.local.json | Claude Code settings |
| .claude/commands/.md, .claude/CLAUDE.md | Claude Code project config |
| .codex/AGENTS.md | OpenAI Codex CLI |
| .openhands/microagents/*.md | OpenHands microagents |
What it detects
Eight character classes, each with a weight:
| Category | Weight | Notes |
|---|---:|---|
| tag_chars | 12 | U+E0000–U+E007F. Invisible "ASCII Smuggler" payloads. |
| bidi_override | 10 | U+202A–U+202E, U+2066–U+2069 (Trojan Source family). |
| html_comment_with_prompt | 8 | <!-- … keyword … --> hiding prompts. |
| homoglyph_latin | 6 | Cyrillic/Greek glyphs that match ASCII letters. |
| noncharacter | 6 | U+FDD0–U+FDEF and U+xxFFFE/FFFF. |
| zero_width | 4 | U+200B, U+200C, U+200D, U+2060, U+FEFF. |
| invisible_format | 3 | U+00AD, U+034F, U+061C, Hangul fillers, … |
| private_use | 2 | Private-use blocks outside the Tag block. |
Score is the sum of hit weights, capped at 100. Category bands: critical >= 80,high >= 60, moderate >= 30, low >= 1, else clean.
Real data sources
| Source | URL | Frequency |
|---|---|---|
| GitHub Code Search | https://api.github.com/search/code?q=filename:… | every 6 h, one query per filename |
| Raw bytes | https://raw.githubusercontent.com/{owner}/{repo}/{sha}/{path} | per file during crawl + repo scan |
| Repo tree | https://api.github.com/repos/{owner}/{repo}/git/trees/{sha}?recursive=1 | per repo scan |
| Rate limit | https://api.github.com/rate_limit | surfaced in /api/stats |
No mock data. No seed arrays. No Math.random() jitter. If a crawl returns
zero new files this hour, the counter stays where it is.
API
GET /glyph-trap/health → { ok, files_tracked, last_crawl }
POST /glyph-trap/api/scan body: { text, filename? }
GET /glyph-trap/api/scan/:id
POST /glyph-trap/api/scan/repo body: { owner, repo, ref? }
GET /glyph-trap/api/scan/repo/:owner/:repo
GET /glyph-trap/api/registry?category=&limit=&offset=
GET /glyph-trap/api/file/:hash
GET /glyph-trap/api/stats
GET /glyph-trap/badge/scan/:id.svg
GET /glyph-trap/badge/repo/:owner/:repo.svg
All endpoints are public. CORS is on /api/. JSON unless .svg.
Run locally
npm install
node server.js
# open http://localhost:4907/glyph-trap/
Optional: set GITHUB_TOKEN to lift the unauthenticated rate limit.
Stack
Node.js 20+ • Express 4 • better-sqlite3 (WAL) • node-cron • helmet •compression • Vanilla JS SPA.
No build step. No bundler.
License
UNLICENSED — for the holyai.me showcase. Not a redistributable package.