
OBLITERATUS: abliterate LLM refusals (diff-in-means).
aashutosh396/mindpalace · ★ 0 · AI & Automation · score 78

Install: claude install-skill aashutosh396/mindpalace

# OBLITERATUS Skill

## What's inside

9 CLI methods, 28 analysis modules, 116 model presets across 5 compute tiers, tournament evaluation, and telemetry-driven recommendations.

Remove refusal behaviors (guardrails) from open-weight LLMs without retraining or fine-tuning. Uses mechanistic interpretability techniques (including diff-in-means, SVD, whitened SVD, LEACE concept erasure, SAE decomposition, Bayesian kernel projection, and more) to identify and surgically excise refusal directions from model weights while preserving reasoning capabilities.

**License warning:** OBLITERATUS is AGPL-3.0. NEVER import it as a Python library inside an MIT/Apache-licensed project. Always invoke via CLI (`obliteratus` command) or subprocess to keep your project's license clean.

## Video Guide

Walkthrough of OBLITERATUS used by an agent to abliterate Gemma: https://www.youtube.com/watch?v=8fG9BrNTeHs ("OBLITERATUS: An AI Agent Removed Gemma 4's Safety Guardrails")

Useful when the user wants a visual overview of the end-to-end workflow before running it themselves.

## When to Use This Skill

Trigger when the user:

- Wants to "uncensor" or "abliterate" an LLM
- Asks about removing refusal/guardrails from a model
- Wants to create an uncensored version of Llama, Qwen, Mistral, etc.
- Mentions "refusal removal", "abliteration", "weight projection"
- Wants to analyze how a model's refusal mechanism works
- References OBLITERATUS, abliterator, or refusal directions

## Step 1: Installatio