
computer-use

Computer use and GUI automation patterns — when to use GUI automation vs shell/MCP/browser tools, visual validation techniques, native app testing, and guardrails for visual regression workflows

Stars: 12
Source: markus41/claude
Updated: 2026-05-11
Slug: markus41--claude--computer-use

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/markus41/claude/HEAD/plugins/claude-code-expert/archive/v7.6.0/skills/computer-use/SKILL.md -o .claude/skills/computer-use.md

Drops the SKILL.md into .claude/skills/computer-use.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Computer Use & GUI Automation

Computer use lets Claude interact with GUIs: click buttons, fill forms, take screenshots, and navigate native apps. It is powerful, but every action costs a screenshot and a vision-model inference, so it is slow and expensive. Use it only when no more precise tool exists.

Tool Selection Priority

Before reaching for computer use, check whether one of these alternatives covers the task:

| Task | Prefer This | Over Computer Use |
|---|---|---|
| API endpoint testing | Bash + curl | Clicking through UI |
| Database inspection | MCP postgres/sqlite | Navigating admin UI |
| File operations | Read/Write/Edit | Drag-and-drop UI |
| Web scraping | Firecrawl MCP | Screenshot + parse |
| Browser automation | Playwright MCP | Computer use click |
| CI status | GitHub API / gh CLI | Browser navigation |
| Log inspection | Bash + grep | Terminal screenshot |

Rule: If you can express the task as a shell command or API call, do that. Computer use is the fallback for GUI-only workflows.
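
The rule above can be sketched as a lookup with computer use as the default fallback. This is an illustration only: the task keys and the `choose_tool` helper are hypothetical labels that mirror the table, not identifiers from any real API.

```python
# Hypothetical lookup mirroring the tool-selection table.
# Keys are illustrative task labels; values are the preferred tools.
PREFERRED_TOOL = {
    "api_endpoint_testing": "Bash + curl",
    "database_inspection": "MCP postgres/sqlite",
    "file_operations": "Read/Write/Edit",
    "web_scraping": "Firecrawl MCP",
    "browser_automation": "Playwright MCP",
    "ci_status": "GitHub API / gh CLI",
    "log_inspection": "Bash + grep",
}


def choose_tool(task: str) -> str:
    """Return the preferred tool for a task; fall back to computer use."""
    return PREFERRED_TOOL.get(task, "computer use")
```

Anything not covered by a structured tool, such as a native desktop app, falls through to `"computer use"`.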


When Computer Use Is the Right Choice

1. Native App Validation

Testing a desktop app that has no API or CLI.

# Example: Validate Electron app UI after a build
Take a screenshot of the app after launch.
Click the "New Project" button.
Verify the dialog opens with the correct fields.
Fill in project name: "Test Project 2026"
Click Create and verify the project appears in the list.

2. Visual Regression Checks

Detecting layout regressions that unit tests can't catch.

# Workflow:
1. Take baseline screenshot of the current UI state
2. Apply the change
3. Take comparison screenshot
4. Highlight pixel differences > 1%
5. Human reviews diff
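
Step 4 could be sketched as below, assuming that "pixel differences > 1%" means more than 1% of pixels changed between the two screenshots. The images are plain nested lists of RGB tuples to keep the sketch dependency-free; in practice the pixels would come from an image library. The names `changed_fraction` and `needs_review` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def changed_fraction(baseline: Image, candidate: Image) -> float:
    """Fraction of pixels whose RGB value differs between two same-size images."""
    same_shape = len(baseline) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(baseline, candidate)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in baseline)
    if total == 0:
        raise ValueError("images must not be empty")
    changed = sum(
        1
        for row_a, row_b in zip(baseline, candidate)
        for px_a, px_b in zip(row_a, row_b)
        if px_a != px_b
    )
    return changed / total


def needs_review(baseline: Image, candidate: Image, threshold: float = 0.01) -> bool:
    """True when more than `threshold` of the pixels changed (assumed 1% rule)."""
    return changed_fraction(baseline, candidate) > threshold
```

An exact-equality comparison is the strictest choice; anti-aliasing or font rendering noise may call for a per-channel tolerance instead.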

3. GUI-Only Admin Tools

Admin panels, legacy enterprise software, and embedded UIs with no API.

# Example: Generate a report from a legacy admin panel
Navigate to: http://admin.internal/reports
Click: "Export" → "CSV" → "Last 30 days"
Wait for download
Move file to: /tmp/report-{date}.csv
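
The last step is a plain file operation, so it does not need GUI automation. A minimal sketch, assuming `{date}` means an ISO date (YYYY-MM-DD) and that the browser has already written the download to a known path; `archive_report` is a hypothetical helper.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_report(
    downloaded: Path,
    dest_dir: Path = Path("/tmp"),
    today: Optional[date] = None,
) -> Path:
    """Move a downloaded CSV to dest_dir/report-{date}.csv and return the new path."""
    if not downloaded.is_file():
        raise FileNotFoundError(f"download not found: {downloaded}")
    if downloaded.stat().st_size == 0:
        # Structured check: an empty export usually means the GUI flow failed.
        raise ValueError(f"download is empty: {downloaded}")
    day = (today or date.today()).isoformat()  # assumed ISO format for {date}
    target = dest_dir / f"report-{day}.csv"
    shutil.move(str(downloaded), str(target))
    return target
```

Checking that the file exists and is non-empty gives a structured signal that the "Export" clicks actually produced output.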

4. Local Simulator Flows

Mobile simulator or desktop app testing that requires visual interaction.

# Example: iOS simulator validation
Launch: xcrun simctl launch booted com.example.MyApp
Take screenshot
Verify: "Welcome" text is visible in the header
Tap: "Get Started" button (coordinates or element description)
Verify: onboarding screen loads
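
The launch and screenshot steps can be driven from a script, leaving only the tap and the visual checks to computer use. A sketch, assuming macOS with Xcode command-line tools installed and a booted simulator; the helper names are hypothetical, and `xcrun simctl io <device> screenshot <path>` is used for the capture.

```python
import subprocess
from typing import List


def launch_cmd(bundle_id: str, device: str = "booted") -> List[str]:
    """Build the simctl command that launches an app on the simulator."""
    return ["xcrun", "simctl", "launch", device, bundle_id]


def screenshot_cmd(path: str, device: str = "booted") -> List[str]:
    """Build the simctl command that saves a simulator screenshot to `path`."""
    return ["xcrun", "simctl", "io", device, "screenshot", path]


def run(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Usage on a machine with a booted simulator would be `run(launch_cmd("com.example.MyApp"))` followed by `run(screenshot_cmd("/tmp/welcome.png"))`; the output path here is an arbitrary example.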

Result Verification

Computer use output is inherently visual and unstructured. Always verify results with a structured check after GUI actions:

Verification Pattern

After each GUI action:
1. Take a screenshot
2. Verify the expected visual state (specific text, element position, color)
3. If verification fails: log "FAIL: {what was expected vs. what was seen}"
4. If unsure: take another screenshot from a wider viewport

At the end:
- List each action and its verification result
- Count: {N} actions taken, {M} verified OK, {K} failed
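
One way to keep this bookkeeping is a small log object that records each action and produces the final count. This is a sketch; `StepResult` and `VerificationLog` are hypothetical names, and the expected and seen states are reduced to strings for simplicity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepResult:
    action: str
    expected: str
    seen: str

    @property
    def ok(self) -> bool:
        # A step is verified when the observed state equals the expected one.
        return self.expected == self.seen


@dataclass
class VerificationLog:
    steps: List[StepResult] = field(default_factory=list)

    def record(self, action: str, expected: str, seen: str) -> StepResult:
        """Store one action and log a FAIL line if verification did not match."""
        step = StepResult(action, expected, seen)
        self.steps.append(step)
        if not step.ok:
            print(f"FAIL: expected {expected!r}, saw {seen!r}")
        return step

    def summary(self) -> str:
        """Final report line: actions taken, verified OK, failed."""
        verified = sum(1 for s in self.steps if s.ok)
        failed = len(self.steps) - verified
        return f"{len(self.steps)} actions taken, {verified} verified OK, {failed} failed"
```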

Confidence Levels

| Confidence | Verification | Action |
|---|---|---|
| HIGH | Text matches exactly / element found by ID | Proceed |
| MEDIUM | Visual match but element found by position | Log and proceed |
| LOW | Can't find element / ambiguous screenshot | Stop, report to human |
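
The confidence table could be encoded as follows. This is a sketch under simplifying assumptions: the three boolean inputs are a hypothetical reduction of what a screenshot check reports, and `classify` and `NEXT_STEP` are illustrative names.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Action to take for each confidence level, as in the table.
NEXT_STEP = {
    Confidence.HIGH: "proceed",
    Confidence.MEDIUM: "log and proceed",
    Confidence.LOW: "stop and report to human",
}


def classify(element_found: bool, exact_match: bool, ambiguous: bool = False) -> Confidence:
    """Map a verification outcome to a confidence level.

    exact_match: text matched exactly or the element was located by ID.
    ambiguous:   the screenshot could not be interpreted reliably.
    """
    if ambiguous or not element_found:
        return Confidence.LOW
    if exact_match:
        return Confidence.HIGH
    # Element was located, but only by visual position.
    return Confidence.MEDIUM
```

Ambiguity is checked first so that an unclear screenshot never upgrades to HIGH, even if an element appears to match.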

Safety Guardrails

Computer use can trigger irreversible actions (deleting files, sending emails, submitting forms). Apply these guardrails:

Never Without Confirmation

  • Form submissions in production environments
  • Delete or "Archive" actions
  • Payment or billing interactions
  • Sending emails or messages
  • Anything involving real user data

Screenshot Audit Trail

Keep screenshots of:

  • State before any action
  • State after each major action
  • Final state

Dry-Run First

For complex GUI flows, describe the steps and ask for confirmation before executing:

Before I click "Submit", here's what will happen:
- Form data: {summary}
- This action cannot be undone
- Proceeding? (yes/no)
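
The confirmation gate can be sketched as a wrapper that refuses to run a risky action unless the answer is an explicit "yes". The category labels are hypothetical stand-ins for the "Never Without Confirmation" list, and `ask` defaults to the built-in `input` so it can be swapped out when a human is not at the terminal.

```python
from typing import Callable

# Hypothetical labels for the actions that must never run unconfirmed.
RISKY_CATEGORIES = {
    "production_form_submit",
    "delete",
    "archive",
    "payment",
    "send_message",
    "real_user_data",
}


def dry_run_prompt(button: str, summary: str) -> str:
    """Render the dry-run message shown before a risky click."""
    return (
        f'Before I click "{button}", here\'s what will happen:\n'
        f"- Form data: {summary}\n"
        "- This action cannot be undone\n"
        "- Proceeding? (yes/no)"
    )


def confirm_then_run(
    category: str,
    button: str,
    summary: str,
    action: Callable[[], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Run `action`, asking first if the category is risky. Returns True if it ran."""
    if category in RISKY_CATEGORIES:
        answer = ask(dry_run_prompt(button, summary) + " ")
        if answer.strip().lower() != "yes":
            return False  # anything other than an explicit "yes" aborts
    action()
    return True
```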

Computer Use vs. Playwright MCP

For web UIs, Playwright MCP is almost always better than computer use:

| Aspect | Playwright MCP | Computer Use |
|---|---|---|
| Reliability | High (DOM-based) | Medium (pixel-based) |
| Speed | Fast | Slow (screenshot per action) |
| Testability | Scriptable, repeatable | Hard to reproduce exactly |
| Cost | Low | High (vision model per screenshot) |
| Works on | Web browsers | Any visual surface |

Use Playwright MCP for: Web app testing, scraping, form automation on websites.

Use Computer Use for: Native desktop apps, embedded UIs, legacy apps with no API.


Cost Awareness

Computer use is expensive:

  • Each screenshot = vision model inference (high token cost)
  • A 10-step GUI flow = 10+ vision inferences
  • Compare: a 10-step shell script = near-zero cost

Estimate before using: If a GUI flow has N steps, expect N × (screenshot tokens + generation tokens). For flows > 20 steps, consider whether a shell/API approach exists.
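
The estimate is simple arithmetic and can be sketched directly. The per-step token figures passed in are placeholders the caller must supply; this document does not state actual values, and the function names are hypothetical.

```python
def estimate_tokens(steps: int, screenshot_tokens: int, generation_tokens: int) -> int:
    """Expected cost of a GUI flow: N x (screenshot tokens + generation tokens)."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return steps * (screenshot_tokens + generation_tokens)


def should_reconsider(steps: int, limit: int = 20) -> bool:
    """True for flows longer than `limit` steps, where a shell/API route is worth a look."""
    return steps > limit
```

For example, with placeholder figures of 1,500 tokens per screenshot and 200 per generation, a 10-step flow comes to `estimate_tokens(10, 1500, 200)`, which is 17,000 tokens.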


Claude Desktop Requirement

Computer use requires the Claude Desktop app (not the CLI or Web). The Desktop app has the screen capture and input simulation capabilities that the CLI lacks.

CLI:     ❌ Computer use not available
Web:     ❌ Computer use not available
Desktop: ✅ Computer use available