nw-agent-testing

Solid

5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance

Testing & QA 526 stars 55 forks Updated 1 week ago MIT


Quality Score: 92/100

Stars 20%
Recency 20%
Frontmatter 20%
Documentation 15% (100)
Issue Health 10%
License 10% (100)
Description 5% (100)

Skill Content

# Agent Testing Framework

## 5-Layer Testing Approach

### Layer 1: Output Quality (Unit-Level)

Validate agent produces correct, well-structured outputs for typical inputs.

**Test**:
- Agent follows workflow phases
- Outputs match expected format/structure
- Domain-specific rules correctly applied
- Token efficiency within bounds

**How**: Manual invocation with representative inputs. Check against acceptance criteria in agent description.

### Layer 2: Integration / Handoff Validation

Validate correct input/output between agents in workflows.

**Test**:
- Input parsing handles upstream format
- Output format matches downstream expectations
- Error signals propagate correctly
- Subagent mode activation works (skip greet, execute autonomously)

**How**: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).

### Layer 3: Adversarial Output Validation

Challenge validity of agent outputs rather than accepting at face value.

**Test**:
- Source verification (cited sources real and accurate?)
- Bias detection (favors one approach without evidence?)
- Edge case coverage
- Completeness (required sections present?)

**How**: Peer review by `-reviewer` agent using structured critique dimensions.

### Layer 4: Adversarial Verification (Peer Review)

Independent review to catch biases and blind spots in agent design.

**Test**:
- Definition follows validation checklist?
- Redundant Claude default instructions?
- Over/under-specified?
- Could simpler ag...
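
The skill describes the Layer 1 and Layer 2 checks as manual and end-to-end steps. As a rough illustration only, the sketch below shows how some of those checks (expected structure, token bounds, handoff format) might be automated. It is not taken from the nWave repository: the section names, the token budget, the word-count token approximation, and the handoff keys are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass, field

# Hypothetical values: the skill does not specify required sections,
# a token budget, or a handoff schema.
REQUIRED_SECTIONS = ("Summary", "Findings", "Sources")
MAX_TOKENS = 200
HANDOFF_KEYS = ("status", "artifact")


@dataclass
class CheckResult:
    passed: bool
    failures: list = field(default_factory=list)


def check_output(text, required=REQUIRED_SECTIONS, max_tokens=MAX_TOKENS):
    """Layer 1 style check: required markdown headings and a rough size bound."""
    failures = []
    # Collect the text of every markdown heading (levels 1-6).
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^#{1,6}[ \t]+(.+)$", text, re.MULTILINE)
    }
    for section in required:
        if section not in headings:
            failures.append(f"missing section: {section}")
    # Whitespace-split word count is only a crude stand-in for real tokens.
    approx_tokens = len(text.split())
    if approx_tokens > max_tokens:
        failures.append(f"token budget exceeded: {approx_tokens} > {max_tokens}")
    return CheckResult(passed=not failures, failures=failures)


def check_handoff(upstream, required_keys=HANDOFF_KEYS):
    """Layer 2 style check: upstream output carries the keys downstream expects."""
    failures = [f"missing key: {key}" for key in required_keys if key not in upstream]
    return CheckResult(passed=not failures, failures=failures)


if __name__ == "__main__":
    sample = "# Summary\nAll good.\n\n## Findings\nNone.\n"
    print(check_output(sample))
    print(check_handoff({"status": "ok"}))
```

Returning a list of failures instead of raising on the first one lets a reviewer see every violated criterion in a single run, which is closer to the checklist style the layers use.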

Details

Author: nWave-ai
Repository: nWave-ai/nWave
Created: 3 months ago
Last Updated: 1 week ago
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

Testing & QA Solid

nw-hexagonal-testing

5-layer agent output validation, I/O contract specification, vertical slice development, and test doubles policy with per-layer examples

526 Updated 1 week ago

nWave-ai

AI & Automation Listed

agent-evaluation

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on re...

5 Updated today

rootcastleco

AI & Automation Solid

agent-evaluation

Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

27,705 Updated today

davila7

AI & Automation Listed

agent-evaluation

36 Updated today

cleodin

AI & Automation Listed

agent-evaluation

335 Updated today

aiskillstore