
RFS Matcher

Match your startup to YC Summer 2026 RFS plus the funded YC companies in your lane


rfs-matcher

Paste your 2-3 sentence startup description, get back the YC Summer 2026
Requests-for-Startups bullets you match plus the already-funded YC companies
in your exact lane to study before you write the application.

The YC RFS page is a wall of prose. The YC company directory is ~6000 rows
deep. rfs-matcher does the matching for you in one shot: embed your pitch,
compare it against precomputed embeddings of every RFS bullet and every YC
company, rank by cosine similarity, and surface the top 3 RFS bullets and top 5
funded peers, each with a one-line "why this matches" rationale.
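A minimal sketch of the ranking step described above, assuming each RFS bullet and company row already carries a precomputed Float32Array embedding. The row shape `{ id, vec }` and the names `cosine`, `topK`, and `rankMatches` are hypothetical, not the service's actual code.

```javascript
// Hypothetical sketch: rows look like { id, vec } where vec is a Float32Array
// embedding computed ahead of time (e.g. at bootstrap).
function cosine(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every row against the pitch vector and keep the k highest scores.
function topK(pitchVec, rows, k) {
  return rows
    .map((row) => ({ id: row.id, score: cosine(pitchVec, row.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Top 3 RFS bullets and top 5 companies, as the README describes.
function rankMatches(pitchVec, rfsRows, companyRows) {
  return {
    rfs: topK(pitchVec, rfsRows, 3),
    companies: topK(pitchVec, companyRows, 5),
  };
}
```

With a corpus of only ~6000 rows, a full linear scan per request is likely cheap enough; an approximate-nearest-neighbour index would mainly pay off at much larger scale.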

No accounts. No auth. No tracking. The dataset is all public.

Run locally

npm install
cp .env.example .env   # optional — fill in OPENAI_API_KEY / ANTHROPIC_API_KEY
PORT=4819 node server.js
open http://localhost:4819/rfs-matcher/

First boot scrapes the YC RFS page (~30 KB) and the yc-oss companies JSON
(~10 MB, ~6000 rows) into ./data/rfs-matcher.db. Subsequent boots reuse that
database and skip the initial scrape, so they start almost immediately.
Without API keys the service still works; see Modes below.

Endpoints

All mounted under /rfs-matcher. No auth, all public.

| Method | Path | What it does |
|---|---|---|
| GET | /rfs-matcher/ | The SPA |
| GET | /rfs-matcher/health | {ok:true, rfs_count, company_count, embeddings_mode} |
| POST | /rfs-matcher/api/match | Body {pitch} → top-3 RFS bullets + top-5 YC companies. Persisted, returns a short pid and share_url. |
| GET | /rfs-matcher/api/match/:pid | Re-fetch a prior match by id. |
| GET | /rfs-matcher/share/:pid | Server-rendered HTML with OG tags — for Twitter/LinkedIn link previews. |
| GET | /rfs-matcher/api/rfs | Every RFS bullet currently indexed. |
| GET | /rfs-matcher/api/companies?batch=&limit=&offset= | Paginated company rows. |
| GET | /rfs-matcher/api/stats | Corpus counts, batch breakdown, last-scrape timestamps, current modes. |
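Calling the match endpoint from a script might look like the sketch below. It assumes Node 18+ (global `fetch`) and that the JSON response carries the documented `pid` and `share_url` fields; the helper name `requestMatch` and its return shape are hypothetical. An over-limit 429 is returned with its Retry-After value instead of being thrown.

```javascript
// Hypothetical client helper. fetchImpl defaults to the global fetch
// (Node 18+) but can be swapped for a stub in tests.
async function requestMatch(baseUrl, pitch, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/rfs-matcher/api/match`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ pitch }),
  });
  if (res.status === 429) {
    // Over the per-IP limit. Assumes Retry-After is given in seconds;
    // an HTTP-date value would parse as NaN and be reported as null.
    const raw = res.headers.get("retry-after");
    const secs = raw === null ? NaN : Number(raw);
    return { ok: false, retryAfterSeconds: Number.isFinite(secs) ? secs : null };
  }
  if (!res.ok) throw new Error(`match failed: HTTP ${res.status}`);
  // Expected to include pid and share_url, per the endpoint table.
  return { ok: true, body: await res.json() };
}
```

For example: `const r = await requestMatch("http://localhost:4819", pitch); if (r.ok) console.log(r.body.share_url);`.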

Rate limit: 10 matches/hour per IP (token bucket, in-process). Over-limit → 429
with a Retry-After header. No API key required.
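The rate limit can be sketched as a per-IP token bucket with capacity 10 that refills continuously at 10 tokens per hour. This is an illustration under those assumptions, not the service's implementation; the class name and the injectable clock are hypothetical.

```javascript
// Hypothetical in-process token bucket keyed by IP. State lives in a Map,
// so it resets on restart and is not shared between processes.
class TokenBucketLimiter {
  constructor({ capacity = 10, windowMs = 3_600_000, now = () => Date.now() } = {}) {
    this.capacity = capacity;
    this.msPerToken = windowMs / capacity; // 360000 ms (6 min) for 10/hour
    this.now = now;
    this.buckets = new Map(); // ip -> { tokens, updatedAt }
  }

  // Returns { allowed: true } or { allowed: false, retryAfterSeconds }.
  take(ip) {
    const t = this.now();
    const b = this.buckets.get(ip) ?? { tokens: this.capacity, updatedAt: t };
    // Refill for the time elapsed since the last call, capped at capacity.
    b.tokens = Math.min(this.capacity, b.tokens + (t - b.updatedAt) / this.msPerToken);
    b.updatedAt = t;
    this.buckets.set(ip, b);
    if (b.tokens >= 1) {
      b.tokens -= 1;
      return { allowed: true };
    }
    // Time until one whole token is available, rounded up for Retry-After.
    const waitMs = (1 - b.tokens) * this.msPerToken;
    return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
  }
}
```

A request handler would call `take()` with the client IP and, when `allowed` is false, answer 429 with `Retry-After` set to `retryAfterSeconds`. Continuous refill means a client that spends all 10 tokens gets one back every 6 minutes rather than waiting for a fixed hour-long window to reset.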

Data sources

| Source | URL | Refresh |
|---|---|---|
| YC Requests for Startups | https://www.ycombinator.com/rfs | weekly, 0 5 * * 1 (Mondays 05:00) |
| YC Companies directory | https://yc-oss.github.io/api/companies/all.json (community mirror of the YC company directory, refreshed daily from ycombinator.com/companies) | weekly, 0 6 * * 1 (Mondays 06:00) |
| OpenAI Embeddings (text-embedding-3-small, 1536-d) | https://api.openai.com/v1/embeddings | per user pitch + once per row at bootstrap / delta |
| Anthropic Messages (claude-haiku-4-5) | https://api.anthropic.com/v1/messages | 1 call per /api/match request |
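The Mondays 05:00 schedule corresponds to the standard five-field cron expression `0 5 * * 1` (minute, hour, day of month, month, day of week). The matcher below is a hypothetical illustration of that reading only; which scheduler lib/cron.js actually uses is not stated, and real schedulers support much more syntax.

```javascript
// Hypothetical matcher for five-field cron expressions that use only "*"
// and plain integers. It does not handle ranges, lists, or steps, and it
// ANDs day-of-month with day-of-week instead of applying cron's OR rule
// for when both are restricted.
function cronMatches(expr, date) {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 fields, got ${fields.length}`);
  const actual = [
    date.getMinutes(),
    date.getHours(),
    date.getDate(),
    date.getMonth() + 1, // cron months are 1-12, JS months are 0-11
    date.getDay(), // 0 = Sunday, 1 = Monday
  ];
  return fields.every((f, i) => f === "*" || Number(f) === actual[i]);
}
```

Usage: `cronMatches("0 5 * * 1", new Date(2024, 0, 1, 5, 0))` is true because 1 January 2024 was a Monday; a three-field string such as `0 5 1` is rejected.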

Refreshes are weekly cron jobs registered by lib/cron.js on boot. A snapshot
is kept in SQLite so the service stays up even if upstream is briefly broken;
see /api/stats for the last successful scrape time and the scrape_log table
for diagnostics.
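The keep-the-last-snapshot behavior can be sketched as a fetch-then-swap refresh: stored rows are only replaced after a scrape succeeds, and every attempt is logged. `fetchRows` and the `store` interface (`replaceSnapshot`, `log`) are hypothetical stand-ins for the real scraper and SQLite tables, and treating an empty scrape as a failure is an assumption added here.

```javascript
// Hypothetical refresh job. The previous snapshot is left untouched unless
// the new scrape succeeds and returns at least one row.
async function refreshSource(source, fetchRows, store) {
  const startedAt = new Date().toISOString();
  try {
    const rows = await fetchRows();
    if (!Array.isArray(rows) || rows.length === 0) {
      // Assumption: an empty result means the upstream page broke.
      throw new Error("upstream returned no rows");
    }
    store.replaceSnapshot(source, rows); // ideally a single SQLite transaction
    store.log({ source, startedAt, ok: true, rows: rows.length });
    return { ok: true, rows: rows.length };
  } catch (err) {
    // Keep serving the old snapshot; just record what went wrong.
    const message = err instanceof Error ? err.message : String(err);
    store.log({ source, startedAt, ok: false, error: message });
    return { ok: false, error: message };
  }
}
```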

Modes (honest about degradation)

| Condition | What happens |
|---|---|
| OPENAI_API_KEY set | RFS + companies + your pitch are embedded with text-embedding-3-small. Ranking is cosine similarity over Float32 1536-d vectors. |
| OPENAI_API_KEY missing | Ranking falls back to a lexical scorer: a blend of the overlap coefficient and Jaccard similarity over stopword-filtered token sets. embeddings_mode in /api/stats reads jaccard-fallback. It is a real comparison, just lower quality. |
| ANTHROPIC_API_KEY set | Each result gets a one-line rationale tying a word from your pitch to a word from the target. |
| ANTHROPIC_API_KEY missing | Results ship with empty rationale strings and rationale_mode: "no-key". The UI shows the cards without the rationale line. |
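The lexical fallback can be sketched as follows. Over the two token sets, the overlap coefficient is |A∩B| / min(|A|, |B|) and Jaccard is |A∩B| / |A∪B|; the stopword list, the tokenizer, and the 50/50 blend weight are assumptions, since the service's exact values are not documented.

```javascript
// Hypothetical stopword list; the real one is not documented.
const STOPWORDS = new Set([
  "a", "an", "and", "the", "for", "of", "to", "in", "on",
  "with", "we", "our", "is", "are", "that", "this",
]);

// Lowercase, split on non-alphanumerics, drop stopwords, dedupe.
function tokenSet(text) {
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(tokens.filter((t) => !STOPWORDS.has(t)));
}

// Blend of overlap coefficient and Jaccard; weight = 0.5 is an assumption.
function lexicalScore(a, b, weight = 0.5) {
  const A = tokenSet(a);
  const B = tokenSet(b);
  if (A.size === 0 || B.size === 0) return 0;
  let inter = 0;
  for (const t of A) if (B.has(t)) inter++;
  const union = A.size + B.size - inter;
  const overlap = inter / Math.min(A.size, B.size);
  const jaccard = inter / union;
  return weight * overlap + (1 - weight) * jaccard;
}
```

The overlap coefficient rewards a short pitch whose tokens all appear in a longer description, while Jaccard penalizes the size mismatch; blending them is one way to balance the two.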

Nothing is mocked. If a column would be fake, it's omitted.
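The rationale step and its no-key degradation might look like this sketch: with a key it builds a single Anthropic Messages API request covering every result (matching "1 call per /api/match request"), and without one it makes no call and returns empty rationale strings with mode "no-key". The prompt wording, the `max_tokens` value, the `"anthropic"` mode label, and the function name are assumptions; the endpoint and model id come from the Data sources table, and the headers follow Anthropic's public API.

```javascript
// Hypothetical request builder; it does not send anything itself.
function buildRationaleRequest(apiKey, pitch, targets) {
  if (!apiKey) {
    // Documented degradation: no call, empty rationale strings.
    return { mode: "no-key", request: null, rationales: targets.map(() => "") };
  }
  const list = targets.map((t, i) => `${i + 1}. ${t}`).join("\n");
  const prompt =
    `Pitch: ${pitch}\n\n` +
    "For each numbered target, write one line tying a word from the pitch " +
    `to a word from the target.\n${list}`;
  return {
    mode: "anthropic", // hypothetical label for the keyed mode
    rationales: null, // filled in from the model's reply
    request: {
      url: "https://api.anthropic.com/v1/messages",
      init: {
        method: "POST",
        headers: {
          "x-api-key": apiKey,
          "anthropic-version": "2023-06-01",
          "content-type": "application/json",
        },
        body: JSON.stringify({
          model: "claude-haiku-4-5",
          max_tokens: 400, // assumption: enough for ~8 one-line rationales
          messages: [{ role: "user", content: prompt }],
        }),
      },
    },
  };
}
```

Sending it is then `fetch(request.url, request.init)`; the reply's text is in the `content` array of the response JSON, which the caller would split back into one line per result.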

Architecture

Total: ~1700 LOC across 13 source files (plus the spec and frontmatter).

SQLite schema

4 tables in ./data/rfs-matcher.db:

The database runs in WAL mode. Embeddings are stored as Float32 little-endian BLOBs: ~6000 × 1536 × 4 bytes ≈ 37 MB once embedded.
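Encoding and decoding one embedding as a little-endian Float32 byte buffer, plus the storage estimate given for the embeddings, can be sketched as follows. The function names are hypothetical, and how the Buffer is bound to a BLOB column depends on the SQLite driver, which is not specified.

```javascript
// Hypothetical helpers. writeFloatLE/readFloatLE make the byte order
// explicit, independent of the host CPU's endianness.
function encodeEmbedding(vec) {
  const buf = Buffer.alloc(vec.length * 4);
  for (let i = 0; i < vec.length; i++) buf.writeFloatLE(vec[i], i * 4);
  return buf;
}

function decodeEmbedding(buf) {
  if (buf.length % 4 !== 0) throw new Error("BLOB length is not a multiple of 4");
  const vec = new Float32Array(buf.length / 4);
  for (let i = 0; i < vec.length; i++) vec[i] = buf.readFloatLE(i * 4);
  return vec;
}

// rows × dimensions × 4 bytes per float32.
function embeddingBytes(rows, dims = 1536) {
  return rows * dims * 4;
}
```

`embeddingBytes(6000)` gives 36,864,000 bytes, about 37 MB (roughly 35 MiB).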

Limits / out-of-scope