HW Model Scout
Given your GPU/CPU specs, instantly recommends the optimal open-source model + quantization combo with expected tokens/sec and cost-to-run.
What it does
- Select your GPU from a dropdown (or enter VRAM + bandwidth manually)
- Choose your workload: chat, coding, RAG, reasoning, or creative
- Get instant ranked recommendations: model name, best quantization, estimated tokens/sec, VRAM required, quality score
- One-click copy of the
ollama runcommand - Switch between quantization levels interactively
- Save hardware profiles for quick reuse
Real data sources
| Source | URL | Fetch frequency | Failure behavior |
|--------|-----|-----------------|------------------|
| HuggingFace model API | https://huggingface.co/api/models/{id} | Every 2 hours via node-cron | Shows "data unavailable", uses cached counts |
| Brave Search (news) | https://api.search.brave.com/res/v1/web/search | Every 4 hours via node-cron | News panel shows "News unavailable" |
GPU specs are factual hardware specifications published by NVIDIA/AMD/Apple — these are engineering constants that don't change after product launch.
Performance estimates are computed from memory bandwidth formulas standard in the llama.cpp/vLLM community:
````
decode_tps ≈ (effective_bandwidth_GB_s) / (active_params_B × bytes_per_param)
These are clearly labeled as estimates, not real benchmark numbers.
How to run
# 1. Install dependencies
npm install
# 2. Set up env
cp .env.example .env
# Edit .env — add BRAVE_API_KEY for news (optional), OPENROUTER_API_KEY if needed
# 3. Start
node server.js
# → http://localhost:4712/hw-model-scout/
```
API endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | /hw-model-scout/health | Health check |
| GET | /hw-model-scout/api/hardware/gpus | All known GPU specs |
| GET | /hw-model-scout/api/hardware/profiles | Saved profiles |
| POST | /hw-model-scout/api/hardware/profiles | Save a profile |
| DELETE | /hw-model-scout/api/hardware/profiles/:id | Delete a profile |
| GET | /hw-model-scout/api/models | All models with HF stats |
| GET | /hw-model-scout/api/models/tags | All workload tags |
| POST | /hw-model-scout/api/recommendations | Get recommendations |
| GET | /hw-model-scout/api/recommendations/history | Past sessions |
| GET | /hw-model-scout/api/news | Cached LLM news |
| POST | /hw-model-scout/api/news/refresh | Manual news refresh |
| GET | /hw-model-scout/api/news/status | Fetch log |
POST /api/recommendations payload
{
"gpu_model": "RTX 4090",
"workload": "coding",
"context_len": 8192,
"speed_priority": 0.4,
"multi_gpu": 1
}
Or with manual specs:
``json``
{
"vram_gb": 24,
"bandwidth_gb_s": 1008,
"vendor": "nvidia",
"workload": "rag",
"context_len": 32768
}
Tech stack
- Runtime: Node.js 22+ ESM
- Server: Express 4 + helmet + compression
- Database: better-sqlite3 with WAL mode
- Scheduling: node-cron (HF sync every 2h, news every 4h)
- Frontend: Vanilla JS SPA, dark theme