testing-guidelisted
Install: claude install-skill akaszubski/autonomous-dev
# Testing Guide
What to test, how to test it, and what NOT to test — for a plugin made of prompt files, Python glue, and configuration.
## Philosophy: GenAI-First Testing
Traditional unit tests work for deterministic logic. But most bugs in this project are **drift** — docs diverge from code, agents contradict commands, component counts go stale. GenAI congruence tests catch these. Unit tests don't.
**Decision rule**: Can you write `assert x == y` and it won't break next week? → Unit test. Otherwise → GenAI test or structural test.
---
## Three Test Patterns
### 1. Judge Pattern (single artifact evaluation)
An LLM evaluates one artifact against criteria. Use for: doc completeness, security posture, architectural intent.
```python
pytestmark = [pytest.mark.genai]
def test_agents_documented_in_claude_md(self, genai):
agents_on_disk = list_agents()
claude_md = Path("CLAUDE.md").read_text()
result = genai.judge(
question="Does CLAUDE.md document all active agents?",
context=f"Agents on disk: {agents_on_disk}\nCLAUDE.md:\n{claude_md[:3000]}",
criteria="All active agents should be referenced. Score by coverage %."
)
assert result["score"] >= 5, f"Gap: {result['reasoning']}"
```
### 2. Congruence Pattern (two-source cross-reference)
The most valuable pattern. An LLM checks two files that should agree. Use for: command↔agent alignment, FORBIDDEN lists, config↔reality.
```python
def test_implement_and_implementer_share_forbidden