detecting-ai-model-prompt-injection-attacks

Detects prompt injection attacks targeting LLM-based applications using a multi-layered defense that combines regex pattern matching for known attack signatures, heuristic scoring for structural anomalies, and transformer-based classification with DeBERTa models. The detector analyzes user inputs before they reach the LLM, flagging direct injections (system prompt overrides, role-play escapes, instruction hijacking) and indirect injections (encoded payloads, multi-language obfuscation, delimiter-based escapes). The approach is based on the OWASP LLM Top 10 (LLM01:2025 Prompt Injection) and Simon Willison's prompt injection taxonomy. The skill activates for requests involving prompt injection detection, LLM input sanitization, AI security scanning, or prompt attack classification.
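Here is a minimal sketch of how the first two layers might work. It scans an input with a few regex signatures and computes a toy structural-anomaly score. Before matching, it NFKC-normalizes the text, strips zero-width characters, and decodes base64-looking runs, so encoded payloads get checked too. The pattern set, weights, the 0.5 threshold, and every function name here are illustrative assumptions, not the skill's actual implementation. The DeBERTa layer is left out because it needs `torch` and a model download.

```python
import base64
import binascii
import re
import unicodedata
from dataclasses import dataclass, field

# Layer 1: illustrative attack signatures (NOT the skill's real pattern set).
SIGNATURES = {
    "instruction_override": re.compile(
        r"\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above|earlier)\b"
        r".{0,20}\b(instructions?|prompts?|rules?)\b",
        re.IGNORECASE | re.DOTALL,
    ),
    "system_prompt_probe": re.compile(
        r"\b(reveal|print|show|repeat)\b.{0,30}\bsystem\s+prompt\b", re.IGNORECASE | re.DOTALL
    ),
    "role_play_escape": re.compile(r"\b(you are now|pretend (to be|you are))\b", re.IGNORECASE),
    "delimiter_escape": re.compile(r"</?system>|\[/?INST\]|<\|im_(start|end)\|>", re.IGNORECASE),
}

# Maps zero-width code points to None so str.translate deletes them.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
ROLE_HEADER = re.compile(r"^\s*(system|assistant|user)\s*:", re.IGNORECASE | re.MULTILINE)


def normalize(text: str) -> str:
    """Fold compatibility characters (NFKC) and strip zero-width characters."""
    return unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)


def decoded_variants(text: str) -> list[str]:
    """Return UTF-8 strings hidden in base64-looking runs; undecodable runs are skipped."""
    out = []
    for run in BASE64_RUN.findall(text):
        try:
            out.append(base64.b64decode(run, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue
    return out


def heuristic_score(text: str) -> float:
    """Layer 2: toy structural-anomaly score in [0, 1]; the weights are arbitrary."""
    score = 0.0
    if any(ord(ch) in ZERO_WIDTH for ch in text):
        score += 0.4  # invisible characters are a common obfuscation trick
    if BASE64_RUN.search(text):
        score += 0.3  # long base64-like runs may hide an encoded payload
    if ROLE_HEADER.search(text):
        score += 0.3  # fake "system:" / "assistant:" turns embedded in user input
    return min(score, 1.0)


@dataclass
class ScanResult:
    matches: list[str] = field(default_factory=list)
    score: float = 0.0

    @property
    def flagged(self) -> bool:
        return bool(self.matches) or self.score >= 0.5


def scan(text: str) -> ScanResult:
    result = ScanResult(score=heuristic_score(text))
    clean = normalize(text)
    # Apply the signatures to the normalized text and to any decoded base64 segments.
    for candidate in [clean, *decoded_variants(clean)]:
        for name, pattern in SIGNATURES.items():
            if name not in result.matches and pattern.search(candidate):
                result.matches.append(name)
    return result
```

Normalizing before matching matters. Without it, the zero-width character in `ig\u200bnore` would let the input slip past the `instruction_override` pattern.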

AI & Automation · 13,115 stars · 1,533 forks · Updated today · Apache-2.0

Quality Score: 99/100

Skill Content

# Detecting AI Model Prompt Injection Attacks

## When to Use

- Scanning user inputs to LLM-powered applications before they are forwarded to the model
- Building an input validation layer for chatbots, AI agents, or retrieval-augmented generation (RAG) pipelines
- Monitoring logs of LLM interactions to retrospectively identify prompt injection attempts
- Evaluating the effectiveness of existing prompt injection defenses through red-team testing
- Classifying prompt injection payloads during security incident investigations involving AI systems

**Do not use** this as the sole defense against prompt injection. Always combine it with output validation, privilege separation, and least-privilege tool access. It is not suitable for detecting jailbreaks that do not involve injecting adversarial instructions.

## Prerequisites

- Python 3.10+ with pip for installing detection dependencies
- The `transformers` and `torch` libraries for running the DeBERTa-based classifier model
- The `protectai/deberta-v3-base-prompt-injection-v2` model from Hugging Face (downloaded on first run, approximately 700 MB)
- Network access to the Hugging Face Hub for the initial model download (offline mode is supported after the first download)
- Sample prompt injection payloads for testing (the script includes a built-in test suite)

## Workflow

### Step 1: Install Detection Dependencies

Install the required Python packages for all three detection layers:

```bash
pip install transformers torch sentencepiece prot...
```
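The following is a hedged sketch of how the classifier layer might be merged with the regex and heuristic layers. It is not the skill's own script, whose code is not shown here. It wraps the `protectai/deberta-v3-base-prompt-injection-v2` model named in the prerequisites in a `transformers` text-classification pipeline. It assumes the `SAFE`/`INJECTION` label names from that model's card. `combine`, `Verdict`, `load_deberta_classifier`, and both thresholds are hypothetical. The classifier is passed in as a plain callable so the decision logic can run without `torch`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

MODEL_ID = "protectai/deberta-v3-base-prompt-injection-v2"  # model named in the prerequisites

# A classifier maps one input string to (label, score). Keeping it abstract lets the
# decision logic be tested without torch or a model download.
Classifier = Callable[[str], tuple[str, float]]


def load_deberta_classifier(device: int = -1) -> Classifier:
    """Wrap a Hugging Face text-classification pipeline (device=-1 runs on CPU)."""
    from transformers import pipeline  # lazy import: only needed when the real model is used

    pipe = pipeline(
        "text-classification",
        model=MODEL_ID,
        truncation=True,  # truncate inputs longer than max_length tokens
        max_length=512,
        device=device,
    )

    def classify(text: str) -> tuple[str, float]:
        top = pipe(text)[0]  # highest-scoring label as a {"label": ..., "score": ...} dict
        return top["label"], float(top["score"])

    return classify


@dataclass(frozen=True)
class Verdict:
    flagged: bool
    reasons: tuple[str, ...]


def combine(
    text: str,
    classify: Classifier,
    regex_hits: Sequence[str],
    heuristic_score: float,
    model_threshold: float = 0.9,      # hypothetical cut-off
    heuristic_threshold: float = 0.5,  # hypothetical cut-off
) -> Verdict:
    """Flag the input if any of the three layers fires, recording which ones did."""
    reasons = [f"signature:{hit}" for hit in regex_hits]
    if heuristic_score >= heuristic_threshold:
        reasons.append(f"heuristic:{heuristic_score:.2f}")
    label, score = classify(text)
    if label == "INJECTION" and score >= model_threshold:
        reasons.append(f"model:{score:.2f}")
    return Verdict(flagged=bool(reasons), reasons=tuple(reasons))
```

Running the classifier even when a signature has already fired keeps the recorded reasons complete for log review. A latency-sensitive gate could return early instead. `load_deberta_classifier()` would normally be called once at startup, since the first call downloads roughly 700 MB.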

Details

Author: mukul975
Repository: mukul975/Anthropic-Cybersecurity-Skills
Created: 3 months ago
Last Updated: today
Language: Python
License: Apache-2.0

Integrates with

Hugging Face · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

prompt-injection-detector

Prompt injection detection and prevention for secure LLM applications

1,160 Updated today

a5c-ai

AI & Automation Listed

adversarial-prompt-testing

Test LLM applications for prompt injection, jailbreak, data exfiltration, and indirect injection attacks — attack taxonomy, test harness design, automated red-team probes, defense patterns, and evaluation rubrics. Use when asked about "prompt injection", "jailbreak", "LLM red team", "adversarial prompts", "indirect injection", "exfiltration via LLM", "test AI security", "LLM attack surface", "OWASP LLM Top 10", "system prompt leak", "prompt leaking", or "AI safety testing". Do NOT use for: traditional app security — see red-team-check or security-review. Do NOT use for: model alignment — focus is on app layer.

3 Updated today

phamlongh230-lgtm

AI & Automation Listed

prompt-injection-test

Run an OWASP LLM01 injection corpus against the system prompt + tool surface and report which payloads succeeded

2 Updated today

bakw00ds

AI & Automation Solid

ai-security

Use when assessing AI/ML systems for prompt injection, jailbreak vulnerabilities, model inversion risk, data poisoning exposure, or agent tool abuse. Covers MITRE ATLAS technique mapping, injection signature detection, and adversarial robustness scoring.

16,782 Updated 3 days ago

alirezarezvani

AI & Automation Solid

prompt-guard

Meta's 86M-parameter prompt injection and jailbreak detector. Filters malicious prompts and third-party data for LLM apps. 99%+ TPR, <1% FPR. Fast (<2ms GPU). Multilingual (8 languages). Deploy with Hugging Face or batch processing for RAG security.

9,182 Updated 1 month ago

Orchestra-Research