durable-agent-workflows

Design and build durable, fault-tolerant AI agent workflows using Temporal, Inngest, or event-driven patterns
mouadja02/skills · ★ 3 · AI & Automation · score 66
Install: claude install-skill mouadja02/skills
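This skill is built around two ideas: persist each completed step so that a crash resumes from the last checkpoint instead of starting over, and retry transient failures such as LLM rate limits with exponential backoff. Both can be sketched without any framework. The TypeScript below is a hypothetical, in-memory illustration, not Temporal or Inngest code. `CheckpointStore`, `MemoryStore`, `runStep`, and `researchPipeline` are names invented for this example. The `Map` stands in for the database or event history a real engine would persist to.

```typescript
// Minimal sketch of checkpointed, retrying step execution (no framework).
// Assumption: CheckpointStore, MemoryStore, runStep and researchPipeline are
// hypothetical names for illustration; they are not Temporal or Inngest APIs.

// Persists results of completed steps. Production code would use a database
// or a workflow engine's event history; an in-memory Map stands in here.
interface CheckpointStore {
  has(key: string): boolean;
  get(key: string): unknown;
  set(key: string, value: unknown): void;
}

class MemoryStore implements CheckpointStore {
  private data = new Map<string, unknown>();
  has(key: string): boolean { return this.data.has(key); }
  get(key: string): unknown { return this.data.get(key); }
  set(key: string, value: unknown): void { this.data.set(key, value); }
}

const realSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs one named step. If a checkpoint exists, returns it without re-running
// fn; otherwise retries fn with exponential backoff and checkpoints success.
async function runStep<T>(
  store: CheckpointStore,
  runId: string,
  name: string,
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<T> {
  const key = `${runId}:${name}`;
  if (store.has(key)) return store.get(key) as T; // resume: skip finished work
  for (let attempt = 1; ; attempt++) {
    try {
      const result = await fn();
      store.set(key, result); // checkpoint only after the step succeeds
      return result;
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up; earlier checkpoints survive
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 100, 200, 400 ms, ...
    }
  }
}

// Example three-step agent pipeline; `log` records which steps actually ran.
// Passing crashOnReview = true simulates the process dying mid-run.
async function researchPipeline(
  store: CheckpointStore,
  runId: string,
  log: string[],
  crashOnReview: boolean,
): Promise<string> {
  const notes = await runStep(store, runId, "research", async () => {
    log.push("research");
    return "notes";
  });
  const draft = await runStep(store, runId, "draft", async () => {
    log.push("draft");
    return `${notes} -> draft`;
  });
  return runStep(store, runId, "review", async () => {
    log.push("review");
    if (crashOnReview) throw new Error("simulated crash");
    return `${draft} -> reviewed`;
  }, 1); // single attempt so the simulated crash propagates immediately
}
```

Suppose `researchPipeline` runs once with `crashOnReview = true`, then again with the same store and `runId`. Only the `review` step executes a second time, because `research` and `draft` are served from their checkpoints. Workflow engines such as Temporal provide this replay and retry behavior natively, without a hand-rolled store.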
# Durable Agent Workflows

**Tier:** POWERFUL

**Category:** AI Agents

**Domain:** Workflow Orchestration / Agent Infrastructure / Reliability Engineering

## Overview

Production AI agents fail constantly: LLM rate limits, timeouts, network errors, context overflows. This skill covers building agent workflows that are **durable** (survive crashes), **observable** (you can see what's happening), and **recoverable** (resume from any checkpoint). It bridges the gap between prototype agents and production infrastructure.

## When to Use

- Agent pipelines that run for minutes or hours and must not lose state
- Multi-step LLM workflows that need automatic retry with backoff
- Human-in-the-loop approval gates in autonomous agent pipelines
- Agent orchestration that must survive process restarts and deployments
- Long-running research or analysis agents that checkpoint progress
- Multi-agent systems that need coordination and state isolation
- Any agent system moving from prototype to production reliability

## Core Concepts

### The Durability Problem

```
❌ Naive Agent (dies on failure):
Step 1 ✓ → Step 2 ✓ → Step 3 ✓ → Step 4 💥 → ALL LOST

✅ Durable Agent (resumes from checkpoint):
Step 1 ✓ → Step 2 ✓ → Step 3 ✓ → Step 4 💥 [restart] → Step 4 ✓ → Step 5 ✓ → Done ✓
```

### Architecture Patterns

#### Pattern 1: Temporal Workflow (Recommended for Production)

```typescript
// workflow.ts: deterministic orchestration
import { proxyActivities, sleep } from '@temporalio/workflow';
i