nw-agent-testing

Solid

5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance

Testing & QA | 526 stars | 55 forks | Updated 1 week ago | MIT license

Quality Score: 92/100

Stars (20%): 91
Recency (20%): 90
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Agent Testing Framework

## 5-Layer Testing Approach

### Layer 1: Output Quality (Unit-Level)

Validate that the agent produces correct, well-structured outputs for typical inputs.

**Test**:
- Agent follows its workflow phases
- Outputs match the expected format/structure
- Domain-specific rules are applied correctly
- Token efficiency stays within bounds

**How**: Invoke the agent manually with representative inputs and check the results against the acceptance criteria in the agent description.

### Layer 2: Integration / Handoff Validation

Validate correct input/output between agents in workflows.

**Test**:
- Input parsing handles the upstream format
- Output format matches downstream expectations
- Error signals propagate correctly
- Subagent mode activation works (skip greet, execute autonomously)

**How**: Run the workflow end to end through the full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).

### Layer 3: Adversarial Output Validation

Challenge the validity of agent outputs rather than accepting them at face value.

**Test**:
- Source verification (are cited sources real and accurate?)
- Bias detection (does the output favor one approach without evidence?)
- Edge case coverage
- Completeness (are required sections present?)

**How**: Peer review by a `-reviewer` agent using structured critique dimensions.

### Layer 4: Adversarial Verification (Peer Review)

Independent review to catch biases and blind spots in agent design.

**Test**:
- Does the definition follow the validation checklist?
- Does it contain redundant Claude default instructions?
- Is it over- or under-specified?
- Could simpler ag...
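Parts of Layers 1 to 3 can be automated alongside the manual invocation and reviewer steps described above. The following is a minimal sketch under stated assumptions, not part of nWave: it assumes agent output is markdown with `#` headings, that the caller reports token counts, that the handoff payload is a plain dict, and that an error status should carry an `error_detail` key. It also uses a URL allowlist as a crude stand-in for real source verification. `AgentOutput`, `check_structure`, `check_handoff`, and `check_citations` are hypothetical names.

```python
# Hypothetical sketch: automated helpers for Layers 1-3.
# None of these names come from nWave; they only illustrate the checks.
import re
from dataclasses import dataclass, field

URL_PATTERN = re.compile(r"https?://\S+")


@dataclass
class AgentOutput:
    """Assumed shape of one agent invocation's result."""
    text: str                                    # markdown produced by the agent
    token_count: int                             # tokens used, reported by the caller
    handoff: dict = field(default_factory=dict)  # payload for the next agent


def check_structure(output: AgentOutput, required_sections: list[str],
                    max_tokens: int) -> list[str]:
    """Layer 1: required headings are present and token usage is within budget."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in output.text.splitlines()
        if line.startswith("#")
    }
    problems = [f"missing section: {s}" for s in required_sections
                if s.lower() not in headings]
    if output.token_count > max_tokens:
        problems.append(
            f"token budget exceeded: {output.token_count} > {max_tokens}")
    return problems


def check_handoff(output: AgentOutput,
                  downstream_schema: dict[str, type]) -> list[str]:
    """Layer 2: payload carries the keys and types the downstream agent parses."""
    problems = []
    for key, expected in downstream_schema.items():
        if key not in output.handoff:
            problems.append(f"handoff missing key: {key}")
        elif not isinstance(output.handoff[key], expected):
            problems.append(f"handoff key {key} is not {expected.__name__}")
    # Assumed convention: an error status must carry detail for the next agent.
    if output.handoff.get("status") == "error" and not output.handoff.get("error_detail"):
        problems.append("error status without error_detail")
    return problems


def check_citations(output: AgentOutput, known_sources: set[str]) -> list[str]:
    """Layer 3 (partial): flag cited URLs that are not in a vetted allowlist.

    An allowlist only shows a source was vetted earlier; it cannot confirm
    the agent used it accurately, so reviewer critique is still required.
    """
    cited = (url.rstrip(".,;)") for url in URL_PATTERN.findall(output.text))
    return [f"unverified source: {url}" for url in cited
            if url not in known_sources]


if __name__ == "__main__":
    doc = AgentOutput(
        text="# Summary\nOverview of the design.\n# Risks\nSee https://example.com/spec.",
        token_count=1200,
        handoff={"status": "ok", "artifact_path": "docs/design.md"},
    )
    issues = (
        check_structure(doc, ["Summary", "Risks", "Decisions"], max_tokens=4000)
        + check_handoff(doc, {"status": str, "artifact_path": str})
        + check_citations(doc, known_sources={"https://example.com/spec"})
    )
    print(issues)  # ['missing section: Decisions']
```

Each helper returns a list of readable problems instead of raising, so findings from several layers can be concatenated into one report. Bias detection in Layer 3 and the design critique in Layer 4 still depend on reviewer judgment that a script like this does not replace.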

Details

Author: nWave-ai
Repository: nWave-ai/nWave
Created: 3 months ago
Last Updated: 1 week ago
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content, not just the same category

nw-hexagonal-testing (Testing & QA, Solid)
5-layer agent output validation, I/O contract specification, vertical slice development, and test doubles policy with per-layer examples
526 stars | Updated 1 week ago | nWave-ai

agent-evaluation (AI & Automation, Listed)
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on re...
5 stars | Updated today | rootcastleco

agent-evaluation (AI & Automation, Solid)
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
27,705 stars | Updated today | davila7

agent-evaluation (AI & Automation, Listed)
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
36 stars | Updated today | cleodin

agent-evaluation (AI & Automation, Listed)
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
335 stars | Updated today | aiskillstore