rfs-matcher
Paste your 2-3 sentence startup description and get back the YC Summer 2026
Requests-for-Startups bullets you match, plus the already-funded YC companies
in your exact lane to study before you write the application.
The YC RFS page is a wall of prose. The YC company directory is ~6000 rows
deep. rfs-matcher does the matching for you in one shot: embed your pitch,
embed every RFS bullet and every YC company, cosine-rank, surface the top 3 RFS
bullets and top 5 funded peers with a one-line "why this matches" rationale.
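As a rough illustration of the cosine-rank step, here is a minimal sketch. The names `cosineSim` and `topK` match the helpers listed for `lib/cosine.js`, but the signatures, the `{id, embedding}` row shape, and the toy 3-d vectors are assumptions made for this example, not the actual implementation.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSim(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every candidate against the query and keep the k best.
// Assumed row shape: { id, embedding }.
function topK(query, candidates, k) {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSim(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-d vectors stand in for the real 1536-d embeddings.
const pitch = [1, 0, 1];
const rows = [
  { id: "a", embedding: [1, 0, 1] },
  { id: "b", embedding: [0, 1, 0] },
  { id: "c", embedding: [1, 1, 1] },
];
const ranked = topK(pitch, rows, 2); // "a" first, then "c"
```

In the service the same ranking would run twice per pitch: once over the RFS bullets with `k = 3` and once over the companies with `k = 5`.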
No accounts. No auth. No tracking. The dataset is all public.
Run locally
```sh
npm install
cp .env.example .env   # optional: fill in OPENAI_API_KEY / ANTHROPIC_API_KEY
PORT=4819 node server.js
open http://localhost:4819/rfs-matcher/
```
First boot scrapes the YC RFS page (~30 KB) and the yc-oss companies JSON
(~10 MB, ~6000 rows) into ./data/rfs-matcher.db. Subsequent boots are
instant. Without API keys the service still works — see Modes below.
Endpoints
All mounted under /rfs-matcher. No auth, all public.
| Method | Path | What it does |
|---|---|---|
| GET | /rfs-matcher/ | The SPA |
| GET | /rfs-matcher/health | {ok:true, rfs_count, company_count, embeddings_mode} |
| POST | /rfs-matcher/api/match | Body {pitch} → top-3 RFS bullets + top-5 YC companies. Persisted, returns a short pid and share_url. |
| GET | /rfs-matcher/api/match/:pid | Re-fetch a prior match by id. |
| GET | /rfs-matcher/share/:pid | Server-rendered HTML with OG tags — for Twitter/LinkedIn link previews. |
| GET | /rfs-matcher/api/rfs | Every RFS bullet currently indexed. |
| GET | /rfs-matcher/api/companies?batch=&limit=&offset= | Paginated company rows. |
| GET | /rfs-matcher/api/stats | Corpus counts, batch breakdown, last-scrape timestamps, current modes. |
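
A client call to the match endpoint might look like the sketch below. `buildMatchRequest` and `matchPitch` are hypothetical helper names, and trimming the pitch before the length check is an assumption; only the path, the `{pitch}` body, the 20-2000 character limit, the 429 / `Retry-After` behaviour, and the `pid` / `share_url` response fields come from this README.

```javascript
// Build the fetch() arguments for POST /rfs-matcher/api/match.
function buildMatchRequest(baseUrl, pitch) {
  const text = pitch.trim();
  // Pitches must be 20-2000 characters; checking client-side avoids
  // spending a rate-limited request on a guaranteed rejection.
  if (text.length < 20 || text.length > 2000) {
    throw new RangeError("pitch must be 20-2000 characters");
  }
  return {
    url: new URL("/rfs-matcher/api/match", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ pitch: text }),
    },
  };
}

// Send the pitch and return the parsed JSON response.
async function matchPitch(baseUrl, pitch) {
  const { url, init } = buildMatchRequest(baseUrl, pitch);
  const res = await fetch(url, init); // global fetch, Node 18+
  if (res.status === 429) {
    throw new Error(`rate limited; Retry-After: ${res.headers.get("Retry-After")}`);
  }
  if (!res.ok) throw new Error(`match failed: HTTP ${res.status}`);
  return res.json(); // includes pid and share_url
}
```

Usage against a local instance: `const result = await matchPitch("http://localhost:4819", myPitch);`, after which `result.share_url` can be posted directly.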
Rate limit: 10 matches/hour per IP (token bucket, in-process). Over-limit → 429
with a Retry-After header. No API key required.
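
For readers unfamiliar with the technique, a per-key in-process token bucket can be sketched as follows. The class name, method names, and the continuous-refill policy are assumptions for illustration; the actual limiter may refill differently.

```javascript
// In-process token bucket: `capacity` tokens per key, refilled continuously
// so that an empty bucket is full again after `windowMs`.
class TokenBucket {
  constructor(capacity, windowMs) {
    this.capacity = capacity;
    this.windowMs = windowMs;
    this.buckets = new Map(); // key (e.g. client IP) -> { tokens, updatedAt }
  }

  // Returns { allowed: true } or { allowed: false, retryAfterSec }.
  take(key, now = Date.now()) {
    const b = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: now };
    const refill = ((now - b.updatedAt) * this.capacity) / this.windowMs;
    b.tokens = Math.min(this.capacity, b.tokens + refill);
    b.updatedAt = now;
    this.buckets.set(key, b);
    if (b.tokens >= 1) {
      b.tokens -= 1;
      return { allowed: true };
    }
    // Time until one whole token is available again.
    const waitMs = ((1 - b.tokens) * this.windowMs) / this.capacity;
    return { allowed: false, retryAfterSec: Math.ceil(waitMs / 1000) };
  }
}

const limiter = new TokenBucket(10, 60 * 60 * 1000); // 10 matches per hour

// Express-style middleware: over-limit requests get 429 plus Retry-After.
function rateLimit(req, res, next) {
  const result = limiter.take(req.ip);
  if (result.allowed) return next();
  res.set("Retry-After", String(result.retryAfterSec));
  res.status(429).json({ error: "rate limit exceeded" });
}
```

Because the buckets live in a `Map` inside the process, the counts reset on restart and are not shared across multiple instances.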
Data sources
| Source | URL | Refresh |
|---|---|---|
| YC Requests for Startups | https://www.ycombinator.com/rfs | weekly, `0 5 * * 1` (Mondays 05:00) |
| YC Companies directory | https://yc-oss.github.io/api/companies/all.json (community mirror of the YC company directory, refreshed daily from ycombinator.com/companies) | weekly, `0 6 * * 1` (Mondays 06:00) |
| OpenAI Embeddings (text-embedding-3-small, 1536-d) | https://api.openai.com/v1/embeddings | per user pitch + once per row at bootstrap / delta |
| Anthropic Messages (claude-haiku-4-5) | https://api.anthropic.com/v1/messages | 1 call per /api/match request |
Refreshes are weekly cron jobs registered by lib/cron.js on boot. A snapshot
is kept in SQLite so the service stays up even if upstream is briefly broken;
see /api/stats for the last successful scrape time and scrape_log for
diagnostics.
Modes (honest about degradation)
| Condition | What happens |
|---|---|
| OPENAI_API_KEY set | RFS + companies + your pitch are embedded with text-embedding-3-small. Ranking is cosine similarity over Float32 1536-d vectors. |
| OPENAI_API_KEY missing | Embeddings fall back to a lexical scorer: a blend of overlap-coefficient and Jaccard over a stopword-filtered token set. embeddings_mode in /api/stats reads jaccard-fallback. Real comparison, lower quality. |
| ANTHROPIC_API_KEY set | Each result gets a one-line rationale tying a word from your pitch to a word from the target. |
| ANTHROPIC_API_KEY missing | Results ship with empty rationale strings and rationale_mode: "no-key". The UI shows the cards without the rationale line. |
Nothing is mocked. If a column would be fake, it's omitted.
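
To make the lexical fallback concrete, here is a minimal sketch of an overlap-coefficient / Jaccard blend over stopword-filtered token sets. The helper names mirror those listed for `lib/cosine.js`, but the tiny stopword list, the tokenizer regex, and the equal 0.5 weighting are all assumptions; the README does not state the real values.

```javascript
// Illustrative stopword list; the real one is likely longer.
const STOPWORDS = new Set(["a", "an", "and", "for", "in", "of", "that", "the", "to", "with"]);

// Lowercase, split on non-alphanumerics, drop stopwords, dedupe.
function tokenize(text) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words.filter((w) => !STOPWORDS.has(w)));
}

function intersectionSize(a, b) {
  let n = 0;
  for (const t of a) if (b.has(t)) n++;
  return n;
}

// Jaccard index: |A ∩ B| / |A ∪ B|
function jaccard(a, b) {
  const inter = intersectionSize(a, b);
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Overlap coefficient: |A ∩ B| / min(|A|, |B|)
function overlap(a, b) {
  const smaller = Math.min(a.size, b.size);
  return smaller === 0 ? 0 : intersectionSize(a, b) / smaller;
}

// Weighted blend of the two scores; weight = 0.5 is an assumed default.
function lexicalScore(textA, textB, weight = 0.5) {
  const a = tokenize(textA);
  const b = tokenize(textB);
  return weight * overlap(a, b) + (1 - weight) * jaccard(a, b);
}
```

The overlap coefficient rewards a short pitch whose terms are fully contained in a long company description, while Jaccard penalises the size mismatch; blending the two tempers both effects.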
Architecture
- `server.js`: Express bootstrap, helmet/compression, mounts everything under `/rfs-matcher`, kicks the first-boot bootstrap.
- `db.js`: better-sqlite3 (WAL), schema migration, all prepared statements, Float32 ↔ BLOB helpers.
- `routes/`: `pages.js` (root + health), `data.js` (rfs/companies/stats), `match.js` (POST match, GET match, share view).
- `scrapers/rfs.js`: `fetch` + cheerio parse of `ycombinator.com/rfs` into `{id, title, body}` rows.
- `scrapers/companies.js`: `fetch` of `yc-oss.github.io/api/companies/all.json`, normalised into `yc_companies` rows.
- `lib/embeddings.js`: OpenAI client, batched at 96, exponential-backoff retry, per-IP token bucket.
- `lib/cosine.js`: `cosineSim`, `topK`, `tokenize`, `jaccard`, `overlap` for the fallback path.
- `lib/rationale.js`: Anthropic call, JSON-mode prompt, lenient JSON-extraction parse.
- `lib/cron.js`: first-boot bootstrap + delta-embedding + the two weekly cron jobs.
- `lib/shareCard.js`: server-rendered share page with OG tags.
- `public/`: vanilla-JS SPA (`index.html`, `app.js`, `style.css`).
Total: ~1700 LOC across 13 source files (plus the spec and frontmatter).
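
The batching and exponential-backoff retry noted for `lib/embeddings.js` could be structured roughly like this. `chunk`, `withRetry`, the retry count, and the 500 ms base delay are hypothetical choices for the sketch; only the batch size of 96 comes from this README.

```javascript
// Split rows into batches of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async call with exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
// `sleep` is injectable so the delay schedule can be tested without waiting.
async function withRetry(fn, { retries = 4, baseMs = 500, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the error
      await sleep(baseMs * 2 ** attempt);
    }
  }
}

// Embed all texts batch by batch; `embedBatch` is a caller-supplied function
// (hypothetical) that sends one batch to the embeddings API.
async function embedAll(texts, embedBatch, batchSize = 96) {
  const vectors = [];
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await withRetry(() => embedBatch(batch))));
  }
  return vectors;
}
```

With ~6000 company rows and a batch size of 96, the first-boot embedding pass comes to 63 API calls rather than one per row.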
SQLite schema
4 tables in ./data/rfs-matcher.db:
- `rfs_bullets`: `{id, batch_label, title, body, source_url, embedding BLOB, embedding_model, scraped_at}`
- `yc_companies`: `{slug, name, batch, industry, one_liner, long_desc, url, status, embedding BLOB, embedding_model, scraped_at}`
- `matches`: `{pid, pitch, top_rfs JSON, top_companies JSON, embeddings_mode, rationale_mode, created_at, client_ip_hash}`
- `scrape_log`: `{source, status, row_count, duration_ms, error, ran_at}`
WAL mode. Embeddings stored as Float32 little-endian BLOBs. ~6000 × 1536 × 4 bytes ≈ 37 MB once embedded.
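
A minimal sketch of the Float32 ↔ BLOB round trip, using Node's `Buffer` with explicit little-endian reads and writes. The function names `toBlob` and `fromBlob` are hypothetical; the helpers in `db.js` may be named and implemented differently.

```javascript
// Encode a vector as a little-endian Float32 BLOB (4 bytes per component).
function toBlob(vec) {
  const buf = Buffer.alloc(vec.length * 4);
  for (let i = 0; i < vec.length; i++) buf.writeFloatLE(vec[i], i * 4);
  return buf;
}

// Decode a BLOB back into a Float32Array.
function fromBlob(buf) {
  if (buf.length % 4 !== 0) throw new Error("BLOB length is not a multiple of 4");
  const out = new Float32Array(buf.length / 4);
  for (let i = 0; i < out.length; i++) out[i] = buf.readFloatLE(i * 4);
  return out;
}

// Storage arithmetic from the paragraph above.
const bytesPerVector = 1536 * 4;            // 6144 bytes per embedding
const corpusBytes = 6000 * bytesPerVector;  // 36,864,000 bytes, i.e. about 37 MB
```

Values are rounded to single precision on write, so a round trip returns the nearest Float32 rather than the original double; for cosine ranking that loss is negligible.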
Limits / out-of-scope
- No payments, no accounts. A paid "$19 full report" may come later; v1 is fully free.
- Single language (English).
- Only YC. Other accelerators are intentionally not supported in v1.
- Pitch must be 20–2000 characters.
- 10 matches per IP per hour.
- Every submission creates a new `pid`; matches are immutable and there is no edit.