computer-use-agents

Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.
aiskillstore/marketplace · ★ 350 · AI & Automation · score 83

Install: claude install-skill aiskillstore/marketplace

# Computer Use Agents

## Patterns

### Perception-Reasoning-Action Loop

The fundamental architecture of computer use agents: observe screen, reason about next action, execute action, repeat. This loop integrates vision models with action execution through an iterative pipeline.

Key components:

1. PERCEPTION: Screenshot captures current screen state
2. REASONING: Vision-language model analyzes and plans
3. ACTION: Execute mouse/keyboard operations
4. FEEDBACK: Observe result, continue or correct

Critical insight: Vision agents are completely still during the "thinking" phase (1-5 seconds), creating a detectable pause pattern.

**When to use**:

- Building any computer use agent from scratch
- Integrating vision models with desktop control
- Understanding agent behavior patterns

```python
from anthropic import Anthropic
from PIL import Image
import base64
import pyautogui
import time


class ComputerUseAgent:
    """
    Perception-Reasoning-Action loop implementation.
    Based on Anthropic Computer Use patterns.
    """

    def __init__(self, client: Anthropic, model: str = "claude-sonnet-4-20250514"):
        self.client = client
        self.model = model
        self.max_steps = 50  # Prevent runaway loops
        self.action_delay = 0.5  # Seconds between actions

    def capture_screenshot(self) -> str:
        """Capture screen and return base64 encoded image."""
        screenshot = pyautogui.screenshot()
        # Resize for token efficiency (1280x800 is good bal