grep-tax — how much does your repo cost an AI agent?

How grep-tax works

Every scan runs the same five navigation questions a developer (or an AI coding agent) would ask of an unfamiliar repo: where is auth handled, where are env vars read, where are HTTP routes defined, where is the database initialized, and where is error handling and logging configured.

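For each question we run two strategies and count tokens with gpt-tokenizer using the cl100k_base encoding (the GPT-4-era OpenAI encoding; tools backed by other models tokenize differently, so treat the counts as a comparable proxy rather than an exact bill):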

Naive: grep the highest-IDF keyword from the question across every file, sort the files by hit count, and read each full file in that order until 95% of ground-truth chunks are covered. This approximates what an LLM with a grep tool does.
Smart: chunk every file into 50-line windows (5-line overlap), BM25-rank the chunks against the question, and read chunks in rank order until 95% coverage. Same coverage target, ~10–50× fewer tokens.
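
As a rough illustration of the Smart strategy, here is a minimal Python sketch. It is not the grep-tax implementation. The window size, overlap, and 95% threshold come from the description above; everything else is an assumption made for the example: the function names, the BM25 parameters (k1=1.5, b=0.75), the use of (path, start line) pairs as chunk ids, and a plain word split standing in for the real tokenizer.

```python
import math
import re
from collections import Counter

WINDOW = 50    # lines per chunk (from the description above)
OVERLAP = 5    # lines shared by consecutive chunks (from the description above)
TARGET = 0.95  # coverage threshold (from the description above)


def tokenize(text):
    # Assumption: a lowercase word split stands in for the real tokenizer.
    return re.findall(r"[a-z0-9_]+", text.lower())


def chunk_lines(lines, window=WINDOW, overlap=OVERLAP):
    """Yield (start_index, chunk) windows that overlap by `overlap` lines."""
    step = window - overlap
    start = 0
    while True:
        yield start, lines[start:start + window]
        if start + window >= len(lines):
            break
        start += step


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the tokenized query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def read_until_covered(ranked_chunks, relevant, target=TARGET):
    """Read chunks in rank order until `target` of the relevant ids are seen.

    Returns (token cost, number of chunks read).
    """
    cost, seen = 0, set()
    for count, (chunk_id, text) in enumerate(ranked_chunks, start=1):
        cost += len(tokenize(text))
        if chunk_id in relevant:
            seen.add(chunk_id)
        if relevant and len(seen) / len(relevant) >= target:
            return cost, count
    return cost, len(ranked_chunks)


def smart_cost(files, question, relevant):
    """files maps path -> text; relevant is a set of (path, start_line) ids."""
    chunks = [((path, start), "\n".join(body))
              for path, text in files.items()
              for start, body in chunk_lines(text.splitlines())]
    scores = bm25_scores(tokenize(question), [tokenize(t) for _, t in chunks])
    ranked = [c for _, c in sorted(zip(scores, chunks), key=lambda p: -p[0])]
    return read_until_covered(ranked, relevant)
```

The Naive strategy could reuse `read_until_covered` by passing whole files, sorted by keyword hit count, in place of ranked chunks. The token gap between the two then comes from reading full files rather than 50-line windows.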

Ground truth is deterministic: each question has a regex set; any file containing a match is considered relevant. The methodology section on every scorecard lists exactly which regexes were used so the grade is reproducible.
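
A small sketch of how regex-based ground truth could be computed, assuming the repo has already been loaded into a path-to-text mapping. The pattern list below is hypothetical and only illustrates the "where are env vars read" question; the real patterns are the ones listed on each scorecard.

```python
import re

# Hypothetical regex set for the "where are env vars read" question.
ENV_VAR_PATTERNS = [r"process\.env\.\w+", r"os\.environ", r"os\.getenv\("]


def relevant_files(files, patterns):
    """Return the sorted paths whose text matches at least one pattern."""
    compiled = [re.compile(p) for p in patterns]
    return sorted(path for path, text in files.items()
                  if any(rx.search(text) for rx in compiled))
```

Because the result depends only on the file contents and the pattern list, rerunning it on the same commit yields the same relevant set, which is what makes the grade reproducible.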

Out of scope: private repos, custom queries, real embeddings, language-specific parsing. The point is to give maintainers a fast, comparable number — not a perfect benchmark.
