local-llm-tool
SolidLocal LLM execution tool for text generation and chat through Ollama or vLLM endpoints. Use when: running on-prem inference, calling a local GPU model, or summarizing with a self-hosted LLM.
Install
Quality Score: 90/100
Skill Content
Details
- Author
- xuiltul
- Repository
- xuiltul/animaworks
- Created
- 3 months ago
- Last Updated
- today
- Language
- Python
- License
- Apache-2.0
Integrates with
Similar Skills
Semantically similar based on skill content — not just same category
local-llm
Hand off bulk transformation work to a local LLM via Ollama
local-llm-expert
Master local LLM inference, model selection, VRAM optimization, and local deployment using Ollama, llama.cpp, vLLM, and LM Studio. Expert in quantization formats (GGUF, EXL2) and local AI privacy.
llama-cpp
Secondary local LLM inference engine via llama.cpp. This skill should be used when running GGUF models directly, loading LoRA adapters for Kothar, benchmarking inference speed, or serving models via llama-server. Includes dedicated Qwen 3.5 serve scripts (9B dense with F16 option, 35B MoE) with asymmetric KV cache and thinking mode. Complements Ollama (which remains primary for RLAMA and general use).