prompt-injection-testlisted
Install: claude install-skill bakw00ds/yakos
# Prompt Injection Test
## Purpose
Treat an agent's system prompt + tool surface as the attack surface.
Run a corpus of known injection payloads (jailbreaks, indirect
injection, tool-confusion, exfiltration) and report which succeeded.
Aligned with OWASP LLM Top-10 entry **LLM01: Prompt Injection**.
A "success" means the agent did something it was instructed not to:
revealed the system prompt, called a forbidden tool, leaked a secret,
followed an attacker-controlled instruction over the operator's, or
produced output that bypassed a stated guardrail.
## Scope
- Reads an agent definition from
`<project>/.claude/agents/<agent-id>.md` to enumerate the system
prompt, allowed tools, and stated guardrails.
- Reads a payload corpus from
`<project>/eval/injection-corpus/<name>.jsonl`. Each entry has
`{id, category, severity, payload, success_criteria}`. Default
corpus ships with the framework at
`lib/eval/injection-corpus/owasp-llm01.jsonl`.
- Runs each payload as the user input (or, for indirect injection,
embeds the payload in a tool result the agent reads).
- Evaluates the agent's response against `success_criteria` — this
is a regex / structural check, not an LLM-as-judge. Judge-based
scoring is too easily injection-poisoned itself.
- Produces a markdown report grouped by severity, listing succeeded
jailbreaks first.
- Designed for `ai-safety-reviewer` (defensive analysis) and
`red-team` (adversarial probe of new agents before ship).
## When to use
-