← back to gallery

HW Model Scout

Find the best open-source LLM for your GPU in seconds

dev-toolsllmgpuinferencequantizationlocal-ai
Open product ↗

HW Model Scout

Given your GPU/CPU specs, instantly recommends the optimal open-source model + quantization combo with expected tokens/sec and cost-to-run.

What it does

Real data sources

| Source | URL | Fetch frequency | Failure behavior |
|--------|-----|-----------------|------------------|
| HuggingFace model API | https://huggingface.co/api/models/{id} | Every 2 hours via node-cron | Shows "data unavailable", uses cached counts |
| Brave Search (news) | https://api.search.brave.com/res/v1/web/search | Every 4 hours via node-cron | News panel shows "News unavailable" |

GPU specs are factual hardware specifications published by NVIDIA/AMD/Apple — these are engineering constants that don't change after product launch.

Performance estimates are computed from memory bandwidth formulas standard in the llama.cpp/vLLM community:
``
decode_tps ≈ (effective_bandwidth_GB_s) / (active_params_B × bytes_per_param)
``
These are clearly labeled as estimates, not real benchmark numbers.

How to run

# 1. Install dependencies
npm install

# 2. Set up env
cp .env.example .env
# Edit .env — add BRAVE_API_KEY for news (optional), OPENROUTER_API_KEY if needed

# 3. Start
node server.js
# → http://localhost:4712/hw-model-scout/
```

API endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | /hw-model-scout/health | Health check |
| GET | /hw-model-scout/api/hardware/gpus | All known GPU specs |
| GET | /hw-model-scout/api/hardware/profiles | Saved profiles |
| POST | /hw-model-scout/api/hardware/profiles | Save a profile |
| DELETE | /hw-model-scout/api/hardware/profiles/:id | Delete a profile |
| GET | /hw-model-scout/api/models | All models with HF stats |
| GET | /hw-model-scout/api/models/tags | All workload tags |
| POST | /hw-model-scout/api/recommendations | Get recommendations |
| GET | /hw-model-scout/api/recommendations/history | Past sessions |
| GET | /hw-model-scout/api/news | Cached LLM news |
| POST | /hw-model-scout/api/news/refresh | Manual news refresh |
| GET | /hw-model-scout/api/news/status | Fetch log |

POST /api/recommendations payload

{
  "gpu_model": "RTX 4090",
  "workload": "coding",
  "context_len": 8192,
  "speed_priority": 0.4,
  "multi_gpu": 1
}

Or with manual specs:
``json
{
"vram_gb": 24,
"bandwidth_gb_s": 1008,
"vendor": "nvidia",
"workload": "rag",
"context_len": 32768
}
``

Tech stack