llama-cpp

llama.cpp local GGUF inference + HF Hub model discovery.
aashutosh396/mindpalace · ★ 0 · AI & Automation · score 78

Install: claude install-skill aashutosh396/mindpalace

# llama.cpp + GGUF

Use this skill for local GGUF inference, quant selection, or Hugging Face repo discovery for llama.cpp.

## When to use

- Run local models on CPU, Apple Silicon, CUDA, ROCm, or Intel GPUs
- Find the right GGUF for a specific Hugging Face repo
- Build a `llama-server` or `llama-cli` command from the Hub
- Search the Hub for models that already support llama.cpp
- Enumerate the available `.gguf` files and their sizes for a repo
- Decide between Q4/Q5/Q6/IQ variants based on the user's RAM or VRAM

## Model discovery workflow

Prefer URL workflows before reaching for `hf`, Python, or custom scripts.

1. Search for candidate repos on the Hub:
   - Base: `https://huggingface.co/models?apps=llama.cpp&sort=trending`
   - Add `search=<term>` for a model family
   - Add `num_parameters=min:0,max:24B` or similar when the user has size constraints
2. Open the repo with the llama.cpp local-app view:
   - `https://huggingface.co/<repo>?local-app=llama.cpp`
3. Treat the local-app snippet as the source of truth when it is visible:
   - copy the exact `llama-server` or `llama-cli` command
   - report the recommended quant exactly as HF shows it
4. Read the same `?local-app=llama.cpp` URL as page text or HTML and extract the section under `Hardware compatibility`:
   - prefer its exact quant labels and sizes over generic tables
   - keep repo-specific labels such as `UD-Q4_K_M` or `IQ4_NL_XL`
   - if that section is not visible in the fetched page source, say so and fall back to the repo's file tree
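Steps 1 and 2 of the discovery workflow come down to building two kinds of URL. The following is a minimal sketch that uses Python's standard `urllib.parse` to assemble them. The function names `search_url` and `local_app_url` and the `max_params` argument are hypothetical. Only the query parameters named in the workflow (`apps`, `sort`, `search`, `num_parameters`, `local-app`) are used.

```python
from typing import Optional
from urllib.parse import quote, urlencode

HUB = "https://huggingface.co"

def search_url(term: Optional[str] = None, max_params: Optional[str] = None) -> str:
    """Build a Hub search URL for repos that already support llama.cpp (step 1)."""
    query = {"apps": "llama.cpp", "sort": "trending"}
    if term:
        query["search"] = term  # narrow results to one model family
    if max_params:
        # Size constraint, e.g. max_params="24B" -> num_parameters=min:0,max:24B
        query["num_parameters"] = f"min:0,max:{max_params}"
    # Leave ':' and ',' unescaped so the URL stays readable.
    return f"{HUB}/models?" + urlencode(query, safe=":,", quote_via=quote)

def local_app_url(repo: str) -> str:
    """Build the llama.cpp local-app view URL for a repo id like 'owner/name' (step 2)."""
    return f"{HUB}/{repo}?local-app=llama.cpp"
```

For example, `search_url("qwen", "24B")` returns `https://huggingface.co/models?apps=llama.cpp&sort=trending&search=qwen&num_parameters=min:0,max:24B`. That is the step 1 base URL with both optional parameters appended.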
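Step 4 asks for the exact quant labels under `Hardware compatibility`, and it must report when that section is missing. The sketch below does this with a crude tag strip and a regex, assuming the fetched page contains the literal heading text. The label pattern, the `window` cutoff, and the function names are all hypothetical. If the section is rendered client-side, it may simply not appear in the fetched source.

```python
import html
import re
from typing import Optional

# Hypothetical pattern for llama.cpp quant labels such as Q8_0, Q4_K_M,
# IQ4_NL_XL, UD-Q4_K_M, or BF16. Extend it for other naming schemes.
QUANT_RE = re.compile(r"\b(?:UD-)?(?:I?Q\d+(?:_[A-Z0-9]+)*|BF16|F16|F32)\b")

def page_to_text(page: str) -> str:
    """Crudely strip HTML tags and unescape entities so HTML and plain text are searched alike."""
    return html.unescape(re.sub(r"<[^>]+>", " ", page))

def hardware_compat_labels(page: str, window: int = 2000) -> Optional[list[str]]:
    """Return quant labels listed after 'Hardware compatibility', or None if it is absent."""
    text = page_to_text(page)
    start = text.find("Hardware compatibility")
    if start == -1:
        return None  # not visible: say so and fall back to the repo's file tree
    labels: list[str] = []
    # `window` is an arbitrary cutoff that limits leakage from later page sections.
    for label in QUANT_RE.findall(text[start:start + window]):
        if label not in labels:  # keep first-seen order, drop duplicates
            labels.append(label)
    return labels
```

Repo-specific labels such as `UD-Q4_K_M` are matched whole rather than truncated to `Q4_K_M`, which is what step 4 requires.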
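When the `Hardware compatibility` section is unavailable, the workflow falls back to the repo's file tree, which is also how you enumerate `.gguf` files and their sizes. The sketch below assumes the Hub's JSON tree endpoint at `/api/models/<repo>/tree/<revision>`, its `recursive=true` option, and its `type`/`path`/`size` fields. Treat all three as assumptions and check them against the Hub API documentation. The helper names are hypothetical. The filtering step is kept separate from the network call so that it can run offline.

```python
import json
from urllib.request import urlopen

def tree_api_url(repo: str, revision: str = "main") -> str:
    # Assumed Hub endpoint and query option for listing every file in a repo.
    return f"https://huggingface.co/api/models/{repo}/tree/{revision}?recursive=true"

def gguf_files(entries: list[dict]) -> list[tuple[str, int]]:
    """Keep .gguf files from a tree listing as (path, size in bytes), smallest first."""
    files = [
        (e["path"], int(e.get("size", 0)))
        for e in entries
        if e.get("type") == "file" and e.get("path", "").endswith(".gguf")
    ]
    return sorted(files, key=lambda f: f[1])

def fetch_gguf_files(repo: str) -> list[tuple[str, int]]:
    """Fetch the tree listing over HTTP and filter it (requires network access)."""
    with urlopen(tree_api_url(repo)) as resp:
        return gguf_files(json.load(resp))
```

Split quants appear as several `.gguf` parts, so add their sizes together before comparing them against the user's memory.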
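To decide between Q4/Q5/Q6/IQ variants for a given amount of RAM or VRAM, one simple heuristic is to pick the largest quant whose file still fits after reserving some headroom. The sketch below implements that rule. The rule itself, the caller-supplied `headroom_gb` reserve, and the name `pick_quant` are hypothetical. This is not llama.cpp's own memory estimate, and the real overhead grows with context length.

```python
from typing import Optional

def pick_quant(options: dict[str, float], memory_gb: float, headroom_gb: float) -> Optional[str]:
    """Pick the largest quant whose file fits in memory_gb minus headroom_gb.

    options maps quant labels, exactly as HF shows them, to file sizes in GB.
    headroom_gb is a caller-chosen reserve for KV cache and runtime overhead.
    """
    budget = memory_gb - headroom_gb
    fitting = {label: size for label, size in options.items() if size <= budget}
    if not fitting:
        return None  # nothing fits: consider a smaller model
    return max(fitting, key=fitting.get)
```

Feed it the labels and sizes from the `Hardware compatibility` section where possible, so the answer uses the repo's own quant names.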
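Step 3 says to copy the local-app command verbatim whenever it is visible. If it is not visible, a command can still be assembled from the repo id and a chosen quant. The sketch below assumes llama.cpp's `-hf <user>/<model>[:quant]` download option. Confirm that option with `llama-server --help` on your build. The helper name `llama_cmd` is hypothetical.

```python
import shlex
from typing import Optional

def llama_cmd(repo: str, quant: Optional[str] = None, binary: str = "llama-server") -> str:
    """Assemble a llama-server or llama-cli command that pulls a GGUF from the Hub via -hf."""
    target = f"{repo}:{quant}" if quant else repo  # quant tag appended as repo:QUANT
    return shlex.join([binary, "-hf", target])  # quotes arguments only when needed
```

For example, `llama_cmd("owner/name-GGUF", "Q4_K_M")` returns `llama-server -hf owner/name-GGUF:Q4_K_M`. Passing `binary="llama-cli"` produces the CLI variant instead.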