detecting-ai-model-prompt-injection-attacks
FeaturedDetects prompt injection attacks targeting LLM-based applications using a multi-layered defense combining regex pattern matching for known attack signatures, heuristic scoring for structural anomalies, and transformer-based classification with DeBERTa models. The detector analyzes user inputs before they reach the LLM, flagging direct injections (system prompt overrides, role-play escapes, instruction hijacking) and indirect injections (encoded payloads, multi-language obfuscation, delimiter-based escapes). Based on the OWASP LLM Top 10 (LLM01:2025 Prompt Injection) and Simon Willison's prompt injection taxonomy. Activates for requests involving prompt injection detection, LLM input sanitization, AI security scanning, or prompt attack classification.
Install
Quality Score: 99/100
Skill Content
Details
- Author
- mukul975
- Repository
- mukul975/Anthropic-Cybersecurity-Skills
- Created
- 3 months ago
- Last Updated
- today
- Language
- Python
- License
- Apache-2.0
Integrates with
Similar Skills
Semantically similar based on skill content — not just same category
prompt-injection-detector
Prompt injection detection and prevention for secure LLM applications
adversarial-prompt-testing
Test LLM applications for prompt injection, jailbreak, data exfiltration, and indirect injection attacks — attack taxonomy, test harness design, automated red-team probes, defense patterns, and evaluation rubrics. Use when asked about "prompt injection", "jailbreak", "LLM red team", "adversarial prompts", "indirect injection", "exfiltration via LLM", "test AI security", "LLM attack surface", "OWASP LLM Top 10", "system prompt leak", "prompt leaking", or "AI safety testing". Do NOT use for: traditional app security — see red-team-check or security-review. Do NOT use for: model alignment — focus is on app layer.
prompt-injection-test
Run an OWASP LLM01 injection corpus against the system prompt + tool surface and report which payloads succeeded
ai-security
Use when assessing AI/ML systems for prompt injection, jailbreak vulnerabilities, model inversion risk, data poisoning exposure, or agent tool abuse. Covers MITRE ATLAS technique mapping, injection signature detection, and adversarial robustness scoring.
prompt-guard
Meta's 86M prompt injection and jailbreak detector. Filters malicious prompts and third-party data for LLM apps. 99%+ TPR, <1% FPR. Fast (<2ms GPU). Multilingual (8 languages). Deploy with HuggingFace or batch processing for RAG security.