computer-use-agents

The fundamental architecture of computer use agents: observe screen, reason about next action, execute action, repeat. This loop integrates vision models with action execution through an iterative pipeline.
foolhardy45/portfolio · ★ 0 · AI & Automation · score 64
Install: claude install-skill foolhardy45/portfolio
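
As a rough illustration of the observe, reason, act, repeat loop summarized above, here is a minimal, dependency-free sketch. It is not the skill's own implementation: `Action`, `run_loop`, and the `observe`/`decide`/`execute` callables are hypothetical names chosen for this example, and a fake in-memory "screen" stands in for real screenshots and a vision model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical action record; field names are illustrative only."""
    kind: str          # e.g. "click", "type", or "done"
    payload: str = ""


def run_loop(
    observe: Callable[[], str],
    decide: Callable[[str], Action],
    execute: Callable[[Action], None],
    max_steps: int = 50,
) -> int:
    """Run observe -> decide -> execute until `decide` returns a "done" action.

    Returns the number of actions executed. `max_steps` caps the loop so a
    confused model cannot run forever.
    """
    executed = 0
    for _ in range(max_steps):
        state = observe()        # PERCEPTION: read the current screen state
        action = decide(state)   # REASONING: pick the next action
        if action.kind == "done":
            break
        execute(action)          # ACTION: apply it
        executed += 1
        # FEEDBACK: the next iteration re-observes the result
    return executed


# Fake environment: a "screen" that only counts button clicks.
screen = {"clicks": 0}


def observe() -> str:
    return f"clicks={screen['clicks']}"


def decide(state: str) -> Action:
    # Stand-in for a vision-language model: stop once three clicks are seen.
    return Action("done") if state == "clicks=3" else Action("click", "button")


def execute(action: Action) -> None:
    screen["clicks"] += 1


steps = run_loop(observe, decide, execute)
print(steps, screen)
```

In a real agent, `observe` would return a screenshot, `decide` would call a vision-language model, and `execute` would drive the mouse and keyboard. Passing them in as callables keeps the loop itself testable without a display.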
# Computer Use Agents

## Patterns

### Perception-Reasoning-Action Loop

The fundamental architecture of computer use agents: observe screen, reason about next action, execute action, repeat. This loop integrates vision models with action execution through an iterative pipeline.

Key components:

1. PERCEPTION: Screenshot captures current screen state
2. REASONING: Vision-language model analyzes and plans
3. ACTION: Execute mouse/keyboard operations
4. FEEDBACK: Observe result, continue or correct

Critical insight: Vision agents are completely still during "thinking" phase (1-5 seconds), creating a detectable pause pattern.

**When to use**:

- Building any computer use agent from scratch
- Integrating vision models with desktop control
- Understanding agent behavior patterns

```python
from anthropic import Anthropic
from PIL import Image
import base64
import pyautogui
import time


class ComputerUseAgent:
    """
    Perception-Reasoning-Action loop implementation.
    Based on Anthropic Computer Use patterns.
    """

    def __init__(self, client: Anthropic, model: str = "claude-sonnet-4-20250514"):
        self.client = client
        self.model = model
        self.max_steps = 50  # Prevent runaway loops
        self.action_delay = 0.5  # Seconds between actions

    def capture_screenshot(self) -> str:
        """Capture screen and return base64 encoded image."""
        screenshot = pyautogui.screenshot()
        # Resiz