
License Pulse

Open-weight license posture tracker — permissive → restrictive drift, every six hours

Tags: ai · huggingface · licenses · open-source · dev-tools · compliance

license-pulse

Live dashboard tracking the license posture of the top open-weight AI models and datasets on Hugging Face. Detects permissive→restrictive shifts and "open-washing" at the next six-hourly snapshot after they happen.

Why it exists: Hugging Face has no native surface that answers "is the top of the open-weight catalog still permissive?" or "did this model just change its license?", and license rug-pulls are accelerating in 2026 (MiniMax M2.7 flipping to non-commercial, Meta tightening Llama community terms, Mistral splitting into research-only variants). license-pulse closes that gap with a public, cron-snapshotted, append-only event log.

What it does

Every six hours the server fetches:

  1. The top 500 most-downloaded models on Hugging Face: https://huggingface.co/api/models?sort=downloads&direction=-1&limit=500
  2. The top 200 most-downloaded datasets: https://huggingface.co/api/datasets?sort=downloads&direction=-1&limit=200
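A minimal sketch of the fetch step, assuming Node 18+ (for the global fetch) and that each listing entry carries id, downloads, and a tags array. The helper names listingUrl, licenseTagOf, and fetchTop are hypothetical; this is not the code in lib/hf.js.

```javascript
// Hypothetical sketch of the six-hourly fetch; not the contents of lib/hf.js.
const HF_API = 'https://huggingface.co/api';

// Build the listing URL quoted above, e.g. kind = 'models', limit = 500.
function listingUrl(kind, limit) {
  const params = new URLSearchParams({ sort: 'downloads', direction: '-1', limit: String(limit) });
  return `${HF_API}/${kind}?${params}`;
}

// Extract the license id from an entry's tags: "license:apache-2.0" -> "apache-2.0".
// Returns null when the repo carries no license:<id> tag.
function licenseTagOf(entry) {
  const tag = (entry.tags || []).find((t) => t.startsWith('license:'));
  return tag ? tag.slice('license:'.length) : null;
}

// Fetch one listing and keep only the fields the snapshot needs.
async function fetchTop(kind, limit) {
  const res = await fetch(listingUrl(kind, limit));
  if (!res.ok) throw new Error(`HF ${kind} listing failed: ${res.status}`);
  const entries = await res.json();
  return entries.map((e) => ({ id: e.id, downloads: e.downloads ?? 0, license: licenseTagOf(e) }));
}
```

Calling fetchTop('models', 500) and fetchTop('datasets', 200) would cover the two listings above.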

Each entry's license:<id> tag is normalised into one of seven classes (permissive, weak_copyleft, strong_copyleft, responsible_ai, noncommercial, custom, unspecified) and snapshotted into SQLite. When a repo appears for the first time, or its license tag or class differs from the previous snapshot, an immutable row is appended to drift_events recording the direction (tighter / looser / lateral / first_seen).
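The drift rule can be sketched as a pure function over the previous and current snapshot rows for one repo. The restrictiveness ranking below is an assumption made for illustration (the source does not say how tighter and looser are ordered); custom and unspecified are left unranked, so moves into or out of them come out as lateral. The function name and event field names are hypothetical, not the drift_events schema.

```javascript
// Assumed restrictiveness order (lower = more permissive). custom and unspecified
// are deliberately unranked, so any move into or out of them is reported as lateral.
const RANK = new Map([
  ['permissive', 0],
  ['weak_copyleft', 1],
  ['strong_copyleft', 2],
  ['responsible_ai', 3],
  ['noncommercial', 4],
]);

// prev and curr are { license, cls } rows from consecutive snapshots, or null.
// Returns an event object to append, or null when nothing should be recorded.
function driftEvent(repoId, prev, curr, ts = new Date().toISOString()) {
  if (!curr) return null; // repo absent from this snapshot: no new license observation
  const event = {
    repoId,
    fromLicense: prev ? prev.license : null,
    toLicense: curr.license,
    fromClass: prev ? prev.cls : null,
    toClass: curr.cls,
    ts,
  };
  if (!prev) return { ...event, direction: 'first_seen' };
  if (prev.license === curr.license && prev.cls === curr.cls) return null; // unchanged

  const from = RANK.get(prev.cls);
  const to = RANK.get(curr.cls);
  let direction = 'lateral'; // same class with a new tag, or an unranked class involved
  if (from !== undefined && to !== undefined && from !== to) {
    direction = to > from ? 'tighter' : 'looser';
  }
  return { ...event, direction };
}
```

Under this sketch a tag swap inside one class (mit → apache-2.0) still produces a lateral row, since either a tag change or a class change is enough to append an event.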

The SPA renders five tabs:

Every endpoint is public, and every endpoint except POST /api/refresh is read-only. There is no auth: the underlying data is public, and viewing it shouldn't require credentials.

Stack

No bundler. No frontend framework. No Tailwind. The SPA is three files in public/.

Endpoints

All routes are prefixed with BASE_PATH=/license-pulse.

| Method | Path | Description |
|--------|--------------------------------------------------|------------------------------------------------------------|
| GET | /license-pulse/ | SPA shell |
| GET | /license-pulse/health | {status:"ok",ts} — no auth, used by orchestrator |
| GET | /license-pulse/api/distribution?scope=top500 | License share across top 500 / top 100 / recent 30 |
| GET | /license-pulse/api/drift?limit=50&since=ISO | Drift event feed |
| GET | /license-pulse/api/orgs?limit=40&min_models=3 | Org openness leaderboard |
| GET | /license-pulse/api/orgs/:org | Org detail (model list + 30d sparkline) |
| GET | /license-pulse/api/datasets?scope=top200 | Dataset distribution + drift |
| GET | /license-pulse/api/lookup?id=<org/repo> | Live lookup with HF passthrough + alternatives |
| GET | /license-pulse/api/stats | Counts + last-run summary |
| POST | /license-pulse/api/refresh | Manual snapshot trigger (no auth) |
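A small client sketch for the JSON endpoints, assuming Node 18+ or a browser supplies fetch and URL. apiUrl and getJson are hypothetical helpers, not part of the project; only the BASE_PATH prefix and the query parameters come from the table above.

```javascript
// Hypothetical client helpers; BASE_PATH matches the documented route prefix.
const BASE_PATH = '/license-pulse';

// Build e.g. http://localhost:4739/license-pulse/api/drift?limit=50, skipping unset params.
function apiUrl(origin, route, params = {}) {
  const url = new URL(`${BASE_PATH}/api/${route}`, origin);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// GET a JSON endpoint and fail loudly on non-2xx responses.
async function getJson(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`GET ${url} failed: ${res.status}`);
  return res.json();
}
```

For example, getJson(apiUrl('http://localhost:4739', 'drift', { limit: 50, since: '2026-05-01T00:00:00Z' })) would request the drift feed from a local instance; the org detail route fits the same helper as apiUrl(origin, 'orgs/' + org).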

Data sources

Public, unauthenticated, no API key required:

Cron cadence:

Running locally

cp .env.example .env       # default port 4739
npm install                # better-sqlite3 builds against your local Node
npm start
# open http://localhost:4739/license-pulse/

Right after boot, the dashboard shows "no data yet — first cron run pending" until the boot-kick run (the first snapshot, started at startup) finishes, typically within 5–15 s depending on network.
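One way to get the boot-kick plus six-hour cadence, sketched with a plain setInterval and an overlap guard. This assumes nothing about how server.js actually schedules runs (a cron library would work equally well); createSnapshotScheduler and its methods are hypothetical. hasData() returning false corresponds to the state in which the pending message is shown.

```javascript
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

// Hypothetical scheduler: one boot-kick run at start(), then one run per interval.
// Overlapping runs are skipped rather than queued.
function createSnapshotScheduler(snapshotFn, intervalMs = SIX_HOURS_MS) {
  let running = false;
  let lastRunAt = null; // ISO time of the last successful run; null until the first finishes
  let timer = null;

  async function runOnce() {
    if (running) return false; // a snapshot is already in flight
    running = true;
    try {
      await snapshotFn();
      lastRunAt = new Date().toISOString();
      return true;
    } finally {
      running = false; // reset even if snapshotFn throws
    }
  }

  const kick = () => runOnce().catch((err) => console.error('snapshot failed:', err));

  return {
    start() {
      if (timer) return; // already started
      kick(); // boot-kick: don't wait six hours for the first data
      timer = setInterval(kick, intervalMs);
      timer.unref(); // Node-specific: don't keep the process alive just for this timer
    },
    stop() {
      clearInterval(timer);
      timer = null;
    },
    runOnce,
    hasData: () => lastRunAt !== null,
    lastRunAt: () => lastRunAt,
  };
}
```

The guard matters because a slow Hugging Face response or a manual refresh arriving mid-run could otherwise start a second snapshot while the first is still writing.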

Real-data discipline

License classification taxonomy

Implemented in lib/classify.js. Buckets:

commercial_ok:
- permissive, weak_copyleft, strong_copyleft → 1
- noncommercial → 0
- everything else → -1 ("read the terms before shipping")
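A sketch of the normalisation plus the commercial_ok mapping. The license-id table is an illustrative subset of Hugging Face license identifiers with class assignments chosen for this example, and treating unrecognised ids as custom is likewise an assumption; only commercialOk follows the documented rule exactly.

```javascript
// Illustrative subset only: Hugging Face license ids mapped to the seven classes.
// The real table lives in lib/classify.js and may differ.
const CLASS_BY_LICENSE = new Map([
  ['mit', 'permissive'],
  ['apache-2.0', 'permissive'],
  ['bsd-3-clause', 'permissive'],
  ['cc-by-4.0', 'permissive'],
  ['lgpl-3.0', 'weak_copyleft'],
  ['mpl-2.0', 'weak_copyleft'],
  ['gpl-3.0', 'strong_copyleft'],
  ['agpl-3.0', 'strong_copyleft'],
  ['openrail', 'responsible_ai'],
  ['creativeml-openrail-m', 'responsible_ai'],
  ['bigscience-openrail-m', 'responsible_ai'],
  ['cc-by-nc-4.0', 'noncommercial'],
  ['cc-by-nc-sa-4.0', 'noncommercial'],
  ['llama3', 'custom'],
  ['gemma', 'custom'],
  ['other', 'custom'],
]);

// Missing tag or HF's "unknown" id -> unspecified; unrecognised ids -> custom (assumption).
function classify(licenseId) {
  const id = (licenseId || '').trim().toLowerCase();
  if (!id || id === 'unknown') return 'unspecified';
  return CLASS_BY_LICENSE.get(id) ?? 'custom';
}

// commercial_ok as documented: 1 = commercial use OK, 0 = not allowed,
// -1 = read the terms before shipping (responsible_ai, custom, unspecified).
function commercialOk(cls) {
  switch (cls) {
    case 'permissive':
    case 'weak_copyleft':
    case 'strong_copyleft':
      return 1;
    case 'noncommercial':
      return 0;
    default:
      return -1;
  }
}
```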

File layout

license-pulse/
├── server.js
├── db.js
├── package.json
├── .env.example
├── .gitignore
├── CLAUDE.md
├── README.md            ← you are here
├── SPEC.md
├── lib/
│   ├── classify.js
│   ├── hf.js
│   ├── drift.js
│   └── aggregates.js
├── routes/
│   ├── index.js
│   ├── distribution.js
│   ├── drift.js
│   ├── orgs.js
│   ├── datasets.js
│   ├── lookup.js
│   └── stats.js
└── public/
    ├── index.html
    ├── app.js
    └── style.css

Roadmap (post-MVP)

Author: Cowork (Claude Opus 4.7) · 2026-05-08.