AlexDuchDev

agent-builder

Build, implement, review, or harden a SINGLE production-grade agent — its contract, schemas, tools and permission tiers, durable state, guardrails, traces, and evals. Use when the user wants to create one agent (not a multi-agent product), implement an agent runner, add tools/memory/evals to an existing agent, or review whether one agent is production-ready. For multi-agent products, orchestration, or framework selection, use the agentic-product-architect skill instead. The full operational standard this skill applies is AGENT_STANDARD.md at the repo root; copy-paste artifacts are in templates/.

agentic-product-architect

Master skill for building production-grade agentic products: software systems where part of the process is dynamically directed by LLMs within a deterministic architecture with explicit trust boundaries. Use this skill whenever the user mentions building an agent, agentic product, agentic workflow, AI agent, multi-agent system, agent loop, or agent harness, or asks how to design, architect, ship, or harden any system with LLM-driven decision-making. Also use it when they reference frameworks like LangGraph, CrewAI, OpenAI Agents SDK, Claude Agent SDK, Pydantic AI, or AutoGen, or when they want to add tools, memory, evals, or human-in-the-loop to an LLM system. This is the entry point: it routes to specialized sub-skills for architecture, context engineering, harness engineering, tools/MCP, memory, durable execution, evals, framework choice, production readiness, and antipattern review. For building or hardening one specific agent, the narrower agent-builder skill applies.

antipatterns-review

Review existing agentic code, designs, or plans through the lens of the 12 canonical antipatterns, and diagnose what is likely to fail in production. Use whenever the user asks you to review their agent code, asks "what's wrong with this design," is debugging mysterious failures, or wants a second opinion on an architecture. Also use proactively when you notice any of the 12 antipatterns in a conversation, even if the user didn't ask for a review.

architecture-design

Design the architecture of an agentic product: choose the autonomy level (L0–L4), compose solutions from the 5 canonical patterns (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer), decide between single-agent and multi-agent, and identify the right production exemplar to model the design on. Use whenever the user is starting a new agentic project, restructuring an existing one, asking "what pattern should I use," debating single vs multi-agent, or trying to decide between a deterministic workflow and an autonomous agent loop.

context-engineering

Engineer what goes into the LLM context window: system prompts, retrieved docs, tool schemas, conversation history, memory, and examples. Apply the four operations (write, select, compress, isolate) to manage context as a finite resource, and enforce the 40% rule on context utilization (keep the filled share of the window at or below roughly 40%). Use whenever the user is designing system prompts, debugging quality degradation in long conversations, hitting context limits, managing per-step retrieval, dealing with sub-agent context isolation, or asking about "context engineering" / "prompt engineering" / CLAUDE.md / AGENTS.md / instruction files.
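
As a rough illustration of the 40% rule combined with the compress operation, the Python sketch below measures context utilization and trims the oldest conversation turns until usage is at or below 40% of the window. It assumes the rule is a cap on the filled share of the window; `approx_tokens` (a ~4-characters-per-token estimate), `Context`, `utilization`, and `compress_to_budget` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass

# Assumption: the 40% rule caps the filled share of the context window at 40%.
UTILIZATION_CAP = 0.40


def approx_tokens(text: str) -> int:
    """Crude estimate (~4 characters per token); use the model's tokenizer in practice."""
    return max(1, len(text) // 4)


@dataclass
class Context:
    system_prompt: str
    tool_schemas: list[str]
    history: list[str]  # conversation turns, oldest first

    def total_tokens(self) -> int:
        parts = [self.system_prompt, *self.tool_schemas, *self.history]
        return sum(approx_tokens(p) for p in parts)


def utilization(ctx: Context, window_tokens: int) -> float:
    return ctx.total_tokens() / window_tokens


def compress_to_budget(ctx: Context, window_tokens: int) -> Context:
    """Return a copy with the oldest turns dropped until utilization <= cap.

    Only history is trimmed; the system prompt and tool schemas are kept, so
    the result can still exceed the cap if those alone are too large."""
    trimmed = Context(ctx.system_prompt, list(ctx.tool_schemas), list(ctx.history))
    while trimmed.history and utilization(trimmed, window_tokens) > UTILIZATION_CAP:
        trimmed.history.pop(0)
    return trimmed


# Example: 100 + 100 + 10 * 100 = 1,200 estimated tokens in a 2,000-token window (60%).
ctx = Context("s" * 400, ["t" * 400], ["h" * 400] * 10)
small = compress_to_budget(ctx, window_tokens=2000)
```

Dropping turns is the bluntest form of compression; summarizing them, or moving details into files the agent can re-read later (write, then select), keeps more signal for the same budget.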

eval-driven-dev

Build the evaluation discipline that separates production agentic products from demos — error analysis on real traces, the three-level eval pyramid (code assertions / LLM-as-judge / human review), binary judge outputs calibrated against human labels, and CI gates that block regression. Based on the Husain/Shankar methodology. Use whenever the user mentions evals, evaluation, LLM-as-judge, hallucination testing, regression testing for AI, quality measurement, error analysis, "how do I know if my agent works," failure modes, or grading agent outputs.
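
One common way to calibrate a binary judge against human labels is to measure its true positive rate and true negative rate on a human-labeled sample of traces, then gate CI on both. The sketch below assumes `True` means "pass"; `judge_agreement`, `passes_gate`, and the 0.9 thresholds are illustrative choices, not values prescribed by the methodology.

```python
import math


def judge_agreement(judge: list[bool], human: list[bool]) -> dict[str, float]:
    """Compare binary judge verdicts (True = pass) with human labels on the same traces."""
    if len(judge) != len(human):
        raise ValueError("judge and human labels must be the same length")
    pairs = list(zip(judge, human))
    positives = sum(1 for _, h in pairs if h)
    negatives = len(pairs) - positives
    true_pos = sum(1 for j, h in pairs if j and h)
    true_neg = sum(1 for j, h in pairs if not j and not h)
    return {
        # TPR: of the traces humans passed, how many did the judge pass?
        "tpr": true_pos / positives if positives else math.nan,
        # TNR: of the traces humans failed, how many did the judge fail?
        "tnr": true_neg / negatives if negatives else math.nan,
    }


def passes_gate(metrics: dict[str, float], min_tpr: float = 0.9, min_tnr: float = 0.9) -> bool:
    """CI gate: both rates must clear their thresholds. NaN (no examples of a class) fails."""
    return metrics["tpr"] >= min_tpr and metrics["tnr"] >= min_tnr
```

Reporting both rates matters because raw accuracy can look high for a judge that passes nearly everything when most traces pass.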

framework-selection

Choose the right agentic framework — LangGraph, OpenAI Agents SDK, Claude Agent SDK, CrewAI, Pydantic AI, AutoGen/AG2, LlamaIndex Workflows, Semantic Kernel, Mastra, DSPy, mcp-agent — based on the team's dominant constraint, not hype. Use whenever the user asks "which framework should I use," compares any two of these, hits limits with their current framework, or is starting a new project and needs to pick the stack.

harness-engineering

Design the harness: the 7-layer scaffolding around the LLM loop that makes agents reliable. It covers the agent loop itself (gather/act/verify), context management, durable execution, guardrails, human-in-the-loop, evals, and observability. In production agents, the harness can make up roughly 98% of the code. Use whenever the user is structuring code around an agent loop, asks "how do I make this reliable / production-ready," is implementing verification, retry logic, sub-agent delegation, permission systems, or approval gates, or wants to understand what makes Claude Code / Codex / Devin work beyond the model.
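
A minimal sketch of the gather/act/verify loop, assuming the three steps are supplied as plain Python callables (hypothetical stand-ins for retrieval, an LLM or tool call, and a check such as running tests). Verification feedback flows into the next gather step, iterations are bounded, and an unverified result is escalated to a human rather than reported as done. `run_loop`, `Verdict`, and the `toy_*` functions are illustrative names only.

```python
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    ok: bool
    feedback: str


def run_loop(
    task: str,
    gather: Callable[[str, list[str]], str],
    act: Callable[[str], str],
    verify: Callable[[str], Verdict],
    max_iterations: int = 3,
) -> dict:
    """Run gather -> act -> verify until verification passes or the budget runs out."""
    notes: list[str] = []  # feedback from failed verifications, fed back into gather
    output = ""
    for attempt in range(1, max_iterations + 1):
        context = gather(task, notes)  # gather: assemble what the model needs this step
        output = act(context)          # act: call the model and/or tools
        verdict = verify(output)       # verify: deterministic check where possible
        if verdict.ok:
            return {"status": "done", "output": output, "attempts": attempt}
        notes.append(verdict.feedback)
    # Never report unverified work as done; hand it to a human instead.
    return {"status": "needs_human", "output": output, "attempts": max_iterations}


# Toy usage: the first attempt fails verification, the second uses the feedback.
def toy_gather(task: str, notes: list[str]) -> str:
    return task + " | " + "; ".join(notes)


def toy_act(context: str) -> str:
    return "fixed answer" if "too short" in context else "answer"


def toy_verify(output: str) -> Verdict:
    return Verdict(output.startswith("fixed"), "too short")


result = run_loop("summarize", toy_gather, toy_act, toy_verify)
```

Keeping the loop itself this small is deliberate: retries, escalation, and budgets live in plain code the harness controls, while the model only fills in the `act` step.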

memory-architecture

Choose and design long-term memory for agents — Mem0, Zep, Letta (MemGPT), LangMem, or files-in-repo. Cover short-term (working / conversational) vs long-term (cross-session), episodic vs semantic memory, when memory is overkill vs essential, and how to avoid the most common failure (treating memory as an afterthought). Use whenever the user mentions long-term memory, persistent memory, personalization across sessions, "remembering past conversations," Mem0/Zep/Letta/MemGPT/LangMem, or hits the limit of conversation history.

tool-design-mcp

Design tools for agents — function/tool definitions, MCP (Model Context Protocol) servers, tool routing when there are many tools, structured outputs, and the rules of thumb that prevent tool selection failures. Use whenever the user is adding tools/functions to an agent, integrating external systems, building or consuming MCP servers, hitting "the agent picks the wrong tool" failures, designing function signatures, choosing between MCP and direct function calling, or wondering how many tools is too many.
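
The sketch below shows a tool definition in the JSON Schema shape commonly used for function calling, plus argument validation before execution, so that malformed calls go back to the model as structured errors instead of crashing the loop. The tool, its stub implementation, and the helper names are hypothetical, and the validator covers only required keys, unknown keys, and a few primitive types; a production system would use a full JSON Schema validator. Exact wrapper keys vary by provider (for example, `parameters` versus `input_schema`).

```python
from typing import Any, Callable

# Hypothetical tool definition in the common JSON Schema shape for function calling.
GET_ORDER_STATUS = {
    "name": "get_order_status",
    "description": (
        "Return the shipping status of one existing order. "
        "Use only when the user supplies an order ID; not for refunds or cancellations."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Order ID, e.g. 'A-1042'"},
        },
        "required": ["order_id"],
        "additionalProperties": False,
    },
}

# Subset of JSON Schema primitive types this toy validator understands.
_PY_TYPES = {"string": str, "boolean": bool, "array": list, "object": dict}


def validate_args(schema: dict, args: dict[str, Any]) -> list[str]:
    """Check required keys, unexpected keys, and simple types; return error messages."""
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument '{key}'")
    for key, value in args.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected argument '{key}'")
            continue
        expected = _PY_TYPES.get(props[key].get("type"))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"argument '{key}' must be of type {props[key]['type']}")
    return errors


def call_tool(tool_def: dict, impl: Callable[..., Any], args: dict[str, Any]) -> dict:
    """Validate, then execute. Errors go back to the model as data so it can retry."""
    errors = validate_args(tool_def["parameters"], args)
    if errors:
        return {"ok": False, "errors": errors}
    return {"ok": True, "result": impl(**args)}


def get_order_status(order_id: str) -> str:
    return "shipped"  # stub; a real tool would query the order system
```

Stating in the description when not to use the tool, and rejecting unknown arguments, are simple guards against wrong-tool and hallucinated-argument failures.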