testing-guide

GenAI-first testing with structural assertions, congruence validation, and tier-based test structure. Use when writing tests, setting up test infrastructure, or validating coverage. TRIGGER when: test, pytest, coverage, TDD, test patterns, congruence, validation. DO NOT TRIGGER when: production code implementation, documentation, config-only changes.

Install: claude install-skill akaszubski/autonomous-dev

# Testing Guide

What to test, how to test it, and what NOT to test, for a plugin made of prompt files, Python glue, and configuration.

## Philosophy: GenAI-First Testing

Traditional unit tests work for deterministic logic. But most bugs in this project are **drift**: docs diverge from code, agents contradict commands, component counts go stale. GenAI congruence tests catch these. Unit tests don't.

**Decision rule**: Can you write `assert x == y` and it won't break next week? → Unit test. Otherwise → GenAI test or structural test.

---

## Three Test Patterns

### 1. Judge Pattern (single artifact evaluation)

An LLM evaluates one artifact against criteria. Use for: doc completeness, security posture, architectural intent.

```python
from pathlib import Path

import pytest

pytestmark = [pytest.mark.genai]

# `self` implies this method lives in a test class (not shown here).
def test_agents_documented_in_claude_md(self, genai):
    agents_on_disk = list_agents()
    claude_md = Path("CLAUDE.md").read_text()
    result = genai.judge(
        question="Does CLAUDE.md document all active agents?",
        context=f"Agents on disk: {agents_on_disk}\nCLAUDE.md:\n{claude_md[:3000]}",
        criteria="All active agents should be referenced. Score by coverage %."
    )
    assert result["score"] >= 5, f"Gap: {result['reasoning']}"
```

### 2. Congruence Pattern (two-source cross-reference)

The most valuable pattern. An LLM checks two files that should agree. Use for: command↔agent alignment, FORBIDDEN lists, config↔reality.

```python
def test_implement_and_implementer_share_forbidden