computer-control

Solid

Automate desktop GUI workflows via Claude computer use API with screenshot capture and mouse/keyboard control.

AI & Automation 308 stars 27 forks Updated today MIT

Install

View on GitHub

Quality Score: 96/100

Stars 20%
83
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# Computer Control Skill Use Claude's Computer Use API to see and control desktop environments through screenshots and mouse/keyboard actions. ## When To Use - Automating GUI-based workflows that lack CLI alternatives - Testing web applications through visual interaction - Filling forms, navigating menus, or interacting with desktop apps - Building automation pipelines that need visual verification ## When NOT To Use - Tasks achievable through CLI or API (no GUI needed) - Browser automation better served by Playwright or CDP ## Architecture The computer use system has three layers: 1. **Display Toolkit** (`phantom.display`) - executes OS-level actions via xdotool/scrot on the real or virtual display 2. **Agent Loop** (`phantom.loop`) - manages the conversation cycle between Claude API and the display toolkit 3. **CLI** (`phantom.cli`) - command-line interface for running tasks or checking environment readiness ``` User Task | v Agent Loop <----> Claude API (beta) | | v v Display Toolkit tool_use responses | (click, type, screenshot) v OS Commands (xdotool, scrot) | v Display (X11 / Xvfb / WSLg) ``` ## Quick Start ### Check environment ```bash cd plugins/phantom uv run python -m phantom.cli --check ``` ### Run a task ```bash export ANTHROPIC_API_KEY="sk-ant-..." uv run python -m phantom.cli "Open Firefox and search for Claude AI" ``` ### Use in Python ```python from p...

Details

Author
athola
Repository
athola/claude-night-market
Created
6 months ago
Last Updated
today
Language
Python
License
MIT

Integrates with

Similar Skills

Semantically similar based on skill content — not just same category