
frontier-s1

Live S-1 extractor for AI and frontier-tech IPOs — read the prospectus in 30 seconds


A live extractor for AI / frontier-tech S-1 prospectuses on SEC EDGAR. The poller
watches for new S-1, S-1/A, F-1, and F-1/A registration statements, downloads the
prospectus HTML, and rips out the bits a VC associate or fintwit operator actually
reads on day one: revenue lines, gross margin, AI-related risk factors, named
competitors, lead underwriters, and use of proceeds. Each filing is rendered as a
one-screen comparison card; up to four can be put side by side.

No mock data. Every number comes from the actual prospectus on EDGAR.

What it does

Data sources (all public, no API keys)

| Source | URL pattern | Refresh interval |
|---|---|---|
| EDGAR full-text search | https://efts.sec.gov/LATEST/search-index?q=&dateRange=custom&startdt=…&enddt=…&forms=S-1,S-1/A,F-1,F-1/A | every 15 min |
| EDGAR per-company submissions | https://data.sec.gov/submissions/CIK{cik10}.json | every 60 min (re-checked on 15-min ticks) |
| EDGAR filing index | https://www.sec.gov/Archives/edgar/data/{cik}/{accNoDash}/index.json | on demand, cached forever |
| EDGAR primary doc HTML | https://www.sec.gov/Archives/edgar/data/{cik}/{accNoDash}/{primary_doc} | on demand, cached on disk |
| EDGAR company-tickers map | https://www.sec.gov/files/company_tickers.json | every 24 h (and at first boot) |
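
The placeholders in the URL patterns above can be filled in roughly as follows. This is a minimal sketch, not the project's actual code: the function names are hypothetical, and it assumes {cik10} means the CIK zero-padded to 10 digits, {cik} means the unpadded CIK, and {accNoDash} means the accession number with its dashes removed.

```javascript
// Sketch only: helper names are hypothetical, not taken from the frontier-s1 source.

// Assumption: {cik10} is the CIK zero-padded to 10 digits.
function submissionsUrl(cik) {
  const cik10 = String(Number(cik)).padStart(10, "0");
  return `https://data.sec.gov/submissions/CIK${cik10}.json`;
}

// Assumption: {accNoDash} is the accession number with dashes stripped,
// and {cik} is the CIK without leading zeros.
function filingIndexUrl(cik, accession) {
  const accNoDash = accession.replace(/-/g, "");
  return `https://www.sec.gov/Archives/edgar/data/${Number(cik)}/${accNoDash}/index.json`;
}

function primaryDocUrl(cik, accession, primaryDoc) {
  const accNoDash = accession.replace(/-/g, "");
  return `https://www.sec.gov/Archives/edgar/data/${Number(cik)}/${accNoDash}/${primaryDoc}`;
}

// Made-up CIK and accession number, for illustration only.
console.log(submissionsUrl("1234567"));
console.log(filingIndexUrl("1234567", "0001234567-25-000001"));
```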

All requests carry User-Agent: frontier-s1 ([email protected]) per the SEC fair-access
policy, and the in-process limiter caps SEC traffic at 8 req/s. After three consecutive
poll failures, /health returns degraded:true.
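
One way the limiter and the degraded flag could work is sketched below. The class and function names are hypothetical, the limiter is a simple fixed-spacing scheme (one request per 125 ms) rather than whatever edgar-client.js actually uses, and the reset-on-success behaviour is an assumption drawn from the word "consecutive". It relies on the global fetch available in Node 18+.

```javascript
// Sketch only: SpacingLimiter, PollHealth and secFetch are hypothetical names.

// Spaces requests at least 1000 / maxPerSecond ms apart (125 ms for 8 req/s).
class SpacingLimiter {
  constructor(maxPerSecond, now = Date.now) {
    this.minGapMs = 1000 / maxPerSecond;
    this.now = now;
    this.nextSlot = 0;
  }

  // Reserves the next free slot and returns how long the caller should wait (ms).
  reserve() {
    const t = this.now();
    const slot = Math.max(t, this.nextSlot);
    this.nextSlot = slot + this.minGapMs;
    return slot - t;
  }
}

// Counts consecutive poll failures; any success resets the counter (assumption).
class PollHealth {
  constructor(threshold = 3) {
    this.threshold = threshold;
    this.consecutiveFailures = 0;
  }

  record(ok) {
    this.consecutiveFailures = ok ? 0 : this.consecutiveFailures + 1;
  }

  get degraded() {
    return this.consecutiveFailures >= this.threshold;
  }
}

// Waits for a limiter slot, then sends the request with the given User-Agent.
async function secFetch(url, limiter, userAgent) {
  const waitMs = limiter.reserve();
  if (waitMs > 0) await new Promise((resolve) => setTimeout(resolve, waitMs));
  return fetch(url, { headers: { "User-Agent": userAgent } });
}
```

A single SpacingLimiter(8) shared by every scraper keeps the whole process under the cap, and the /health handler would only need to read PollHealth's degraded getter.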

HTTP endpoints

All public, no auth. Everything is mounted under /frontier-s1.

| Method | Path | Returns |
|---|---|---|
| GET | /frontier-s1/health | {ok, filings, frontierFilings, lastPollAt, lastPollStatus, degraded} |
| GET | /frontier-s1/api/filings?limit=&category=&since= | List of frontier filings (newest first) |
| GET | /frontier-s1/api/filings/:accession | Full extracted detail for one filing |
| GET | /frontier-s1/api/feed | Last 20 frontier filings (lightweight) |
| GET | /frontier-s1/api/compare?ids=acc1,acc2,… | Up to 4 filings, side-by-side payload |
| GET | /frontier-s1/api/watchlist | Curated watchlist + last-seen-filing per company |
| GET | /frontier-s1/api/status | Poll log (last 50 ticks), per-source health |
| POST | /frontier-s1/api/refresh | Trigger immediate poll cycle (idempotent) |
| GET | /frontier-s1/ | SPA feed |
| GET | /frontier-s1/filing/:accession | SPA filing detail |
| GET | /frontier-s1/compare?ids=… | SPA compare view |
| GET | /frontier-s1/card/:accession | Standalone 1200×630 share-card HTML |
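
The ids parameter of the compare endpoint is a comma-separated list capped at four filings. A hedged sketch of how such a parameter might be validated follows. parseCompareIds and MAX_COMPARE are hypothetical names, and rejecting an over-long list is an assumption; the real route may truncate instead.

```javascript
// Sketch only: parseCompareIds and MAX_COMPARE are hypothetical names.
const MAX_COMPARE = 4;

// Splits "acc1,acc2,…" into a de-duplicated list and enforces the 4-filing cap.
function parseCompareIds(idsParam) {
  const ids = String(idsParam || "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  const unique = [...new Set(ids)];

  if (unique.length === 0) {
    return { ok: false, error: "ids is required" };
  }
  if (unique.length > MAX_COMPARE) {
    // Assumption: over-limit requests are rejected rather than truncated.
    return { ok: false, error: `at most ${MAX_COMPARE} filings can be compared` };
  }
  return { ok: true, ids: unique };
}

// Made-up accession numbers, for illustration only.
console.log(parseCompareIds("0001234567-25-000001, 0007654321-25-000002"));
```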

Frontier-tag rule

A filing is tagged is_frontier=1 if any of the following hold:

  1. Its CIK is on the curated watchlist (scrapers/watchlist.js).
  2. Its SIC code is in the curated frontier set: 7372, 7370, 3674, 3812, 3669, 3845, 8731.
  3. Its prospectus body mentions ≥3 of these keywords: artificial intelligence, foundation model, large language model, machine learning, spacecraft, launch vehicle, satellite constellation, semiconductor, accelerator, inference, autonomous, generative ai, neural network.

Non-frontier filings are still stored (with is_frontier=0) but excluded from the UI feed.
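
The three-part rule above can be sketched as a small function. This is illustrative only and not the contents of extractors/classifier.js: the function names are hypothetical, and it assumes keywords are matched as case-insensitive substrings with each keyword counted at most once.

```javascript
// Sketch only: function names and matching strategy are assumptions.
const FRONTIER_SIC = new Set(["7372", "7370", "3674", "3812", "3669", "3845", "8731"]);

const KEYWORDS = [
  "artificial intelligence", "foundation model", "large language model",
  "machine learning", "spacecraft", "launch vehicle", "satellite constellation",
  "semiconductor", "accelerator", "inference", "autonomous", "generative ai",
  "neural network",
];

// Number of distinct keywords that appear anywhere in the prospectus body.
function countKeywordHits(body) {
  const text = body.toLowerCase();
  return KEYWORDS.filter((kw) => text.includes(kw)).length;
}

// Returns 1 if any of the three conditions holds, else 0 (mirrors is_frontier).
function isFrontier({ cik, sic, body }, watchlistCiks) {
  if (watchlistCiks.has(String(cik))) return 1;   // rule 1: curated watchlist
  if (FRONTIER_SIC.has(String(sic))) return 1;    // rule 2: frontier SIC code
  return countKeywordHits(body || "") >= 3 ? 1 : 0; // rule 3: at least 3 keywords
}

// Made-up filing, for illustration only.
const watchlist = new Set(["42"]);
console.log(isFrontier({ cik: "7", sic: "1000", body: "We build machine learning tools." }, watchlist));
```

Plain substring matching is crude: "autonomous" would also match inside a longer word, so a real classifier may prefer word-boundary regexes.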

Local run

npm install
PORT=4836 node server.js

Then open http://localhost:4836/frontier-s1/.

Scope cuts (intentional)

This is the 4-hour MVP. It does not do:

File layout

frontier-s1/
  server.js              Express bootstrap, cron, mounts /frontier-s1
  db.js                  better-sqlite3 schema + prepared statements
  config.js              base path, port, watchlist + keyword sets
  scrapers/
    edgar-client.js      rate-limited fetch wrapper with SEC User-Agent
    fulltext-search.js   EDGAR full-text search → recent S-1 / F-1 hits
    company-submissions.js  per-CIK submissions JSON
    filing-index.js      resolve accession → primary doc filename
    primary-doc.js       download + cache prospectus HTML
    company-tickers.js   CIK ↔ ticker ↔ name map (24-h refresh)
    watchlist.js         curated CIK list + categories
  extractors/
    html-utils.js        cheerio helpers (section finder, money parser)
    classifier.js        frontier-tag decision + AI-mention counter
    revenue.js           income-statement table regex
    risk-factors.js      Risk-Factors split + keyword filter
    competitors.js       Competition section + proper-noun phrases
    underwriters.js      Cover/Underwriting bookrunner extraction
    use-of-proceeds.js   Use of Proceeds section extraction
    ticker-exchange.js   Proposed ticker symbol + exchange
  jobs/
    poll.js              cron tick: search → submissions → fetch → extract
    extract-queue.js     in-process extract queue
  routes/
    api.js               JSON endpoints
    pages.js             SPA shell routes
  public/
    index.html           SPA shell
    card.html            share-card shell
    app.js               vanilla SPA: feed / detail / compare / status
    card.js              share-card renderer
    style.css            dark theme
    favicon.svg

Honest notes