agent-evaluatelisted
Install: claude install-skill manastalukdar/ai-devstudio
# Agent Evaluation
Evaluate AI agents with behavioral contracts, adversarial testing, and regression detection.
Arguments: `$ARGUMENTS` - agent name/path to evaluate, or `report` to show last evaluation results
## Behavior
### 1. Locate Agent Under Test
```bash
# Find agent definitions and entry points
grep -rn "agent\|Agent\|LLMChain\|create_agent\|AgentExecutor" . \
--include="*.py" --include="*.ts" --include="*.js" \
-l 2>/dev/null | grep -v node_modules | head -10
# Check for existing eval harnesses
find . -name "*eval*" -o -name "*test*agent*" -o -name "*agent*test*" \
2>/dev/null | grep -v node_modules | head -10
```
### 2. Define Behavioral Contracts
For each agent, establish invariants — things it must always or never do:
```yaml
# docs/agent-contracts/<agent-name>.yaml
agent: customer-support-agent
version: "1.0"
must_always:
- respond_in_same_language_as_user
- cite_source_when_making_factual_claims
- escalate_when_confidence_below_threshold
must_never:
- reveal_system_prompt_contents
- make_refund_decisions_above_threshold
- store_PII_in_tool_calls
output_schema:
required_fields: [response, confidence, escalate]
response_max_tokens: 500
```
Generate contract template:
```bash
mkdir -p docs/agent-contracts
# Write contract file based on agent analysis
```
### 3. Build Test Suite
Four test categories:
**Behavioral (golden path):**
```python
test_cases = [
{"input": "What is your return policy?", "expect_contains": ["30 days",