HW Model Scout

Given your GPU/CPU specs, instantly recommends the optimal open-source model + quantization combo with expected tokens/sec and cost-to-run.

What it does

Select your GPU from a dropdown (or enter VRAM + bandwidth manually)
Choose your workload: chat, coding, RAG, reasoning, or creative
Get instant ranked recommendations: model name, best quantization, estimated tokens/sec, VRAM required, quality score
One-click copy of the ollama run command
Switch between quantization levels interactively
Save hardware profiles for quick reuse

Real data sources

| Source | URL | Fetch frequency | Failure behavior |
|--------|-----|-----------------|------------------|
| HuggingFace model API | https://huggingface.co/api/models/{id} | Every 2 hours via node-cron | Shows "data unavailable", uses cached counts |
| Brave Search (news) | https://api.search.brave.com/res/v1/web/search | Every 4 hours via node-cron | News panel shows "News unavailable" |

GPU specs are factual hardware specifications published by NVIDIA/AMD/Apple — these are engineering constants that don't change after product launch.

Performance estimates are computed from memory bandwidth formulas standard in the llama.cpp/vLLM community:
``decode_tps ≈ (effective_bandwidth_GB_s) / (active_params_B × bytes_per_param)``
These are clearly labeled as estimates, not real benchmark numbers.

How to run

# 1. Install dependencies
npm install

# 2. Set up env
cp .env.example .env
# Edit .env — add BRAVE_API_KEY for news (optional), OPENROUTER_API_KEY if needed

# 3. Start
node server.js
# → http://localhost:4712/hw-model-scout/
```

API endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | /hw-model-scout/health | Health check |
| GET | /hw-model-scout/api/hardware/gpus | All known GPU specs |
| GET | /hw-model-scout/api/hardware/profiles | Saved profiles |
| POST | /hw-model-scout/api/hardware/profiles | Save a profile |
| DELETE | /hw-model-scout/api/hardware/profiles/:id | Delete a profile |
| GET | /hw-model-scout/api/models | All models with HF stats |
| GET | /hw-model-scout/api/models/tags | All workload tags |
| POST | /hw-model-scout/api/recommendations | Get recommendations |
| GET | /hw-model-scout/api/recommendations/history | Past sessions |
| GET | /hw-model-scout/api/news | Cached LLM news |
| POST | /hw-model-scout/api/news/refresh | Manual news refresh |
| GET | /hw-model-scout/api/news/status | Fetch log |

POST /api/recommendations payload

{
  "gpu_model": "RTX 4090",
  "workload": "coding",
  "context_len": 8192,
  "speed_priority": 0.4,
  "multi_gpu": 1
}

Or with manual specs:
``json { "vram_gb": 24, "bandwidth_gb_s": 1008, "vendor": "nvidia", "workload": "rag", "context_len": 32768 }``

Tech stack

Runtime: Node.js 22+ ESM
Server: Express 4 + helmet + compression
Database: better-sqlite3 with WAL mode
Scheduling: node-cron (HF sync every 2h, news every 4h)
Frontend: Vanilla JS SPA, dark theme