Docker Computer Use Agent Starter

OpenAI's computer use doesn't work reliably with Playwright screenshots alone, because those screenshots don't include native UI such as select options, alerts, etc. To use computer use reliably, the agent needs to see the entire screen.

This repo contains a Docker image that runs a full desktop environment with VNC access and Chrome CDP enabled. It also contains a simple Python API that is designed to work with OpenAI's Computer Use Agent tool call function types.

Use this as a starting point for your own computer use agent, where you can use Computer Use to control the entire Ubuntu desktop, and Playwright to control the browser programmatically.

Features

Zero dependencies
Ubuntu 24.04 base with XFCE desktop environment
Chrome browser with CDP enabled
VNC server for remote viewing
Python API for browser automation
Support for multiple concurrent instances

Quick Start

Build the Docker image:

docker build -t cua-env .

Use the Python API in computer.py:

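from playwright.sync_api import sync_playwright

from computer import DockerComputer  # assumes computer.py from this repo is on the import path

with DockerComputer() as computer, sync_playwright() as playwright:
    # computer automation commands
    computer.type("Hello World")
    computer.click(100, 100)
    computer.screenshot()  # Returns base64 PNG

    # Control the browser with Playwright
    browser = playwright.chromium.connect_over_cdp(
        computer.chrome_cdp_endpoint_url
    )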

Ports

VNC: 5900 + id (default = 0)
Chrome CDP: 9222 + id (default = 0)

For multiple instances, you can pass an id argument to the DockerComputer constructor. This increments the port numbers by that amount.

from playwright.sync_api import sync_playwright

from computer import DockerComputer  # assumes computer.py from this repo is on the import path

with DockerComputer(id=1) as computer, sync_playwright() as playwright:
    # VNC: 5901
    # Chrome CDP: 9223

    # computer automation commands
    computer.type("Hello World")
    computer.click(100, 100)
    computer.screenshot()  # Returns base64 PNG

    # Control the browser with Playwright
    browser = playwright.chromium.connect_over_cdp(
        computer.chrome_cdp_endpoint_url
    )


with DockerComputer(id=2) as computer, sync_playwright() as playwright:
    # VNC: 5902
    # Chrome CDP: 9224

    # computer automation commands
    computer.type("Hello World")
    computer.click(100, 100)
    computer.screenshot()  # Returns base64 PNG

    # Control the browser with Playwright
    browser = playwright.chromium.connect_over_cdp(
        computer.chrome_cdp_endpoint_url
    )
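
The port scheme above can be computed ahead of time, for example to print the VNC address for each instance before connecting a viewer. This is a minimal sketch that assumes only the base ports listed in this section; `ports_for` is a hypothetical helper and is not part of computer.py.

```python
VNC_BASE_PORT = 5900  # base VNC port from the Ports section
CDP_BASE_PORT = 9222  # base Chrome CDP port from the Ports section


def ports_for(instance_id: int = 0) -> tuple[int, int]:
    """Return (vnc_port, cdp_port) for a given DockerComputer id."""
    if instance_id < 0:
        raise ValueError("instance_id must be non-negative")
    return VNC_BASE_PORT + instance_id, CDP_BASE_PORT + instance_id


if __name__ == "__main__":
    for instance_id in range(3):
        vnc_port, cdp_port = ports_for(instance_id)
        print(f"id={instance_id}: VNC {vnc_port}, Chrome CDP {cdp_port}")
```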

Available Methods

Methods are compatible with the OpenAI Computer Use Agent tool call function types.

click(x, y, button="left") - Click at coordinates
double_click(x, y) - Double click at coordinates
type(text) - Type text
move(x, y) - Move mouse to coordinates
scroll(x, y, scroll_x, scroll_y) - Scroll at coordinates
keypress(keys) - Press keyboard keys
drag(path) - Drag mouse along path
screenshot() - Take screenshot (returns base64 PNG)
wait(ms=1000) - Wait specified milliseconds
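
Because the method names and parameters mirror the tool call action types, an action can be forwarded to the matching method by name. The sketch below assumes each action arrives as a dict with a `type` key plus keyword arguments (for example `{"type": "click", "x": 100, "y": 100}`); that shape, `handle_action`, and `RecordingComputer` are illustrative assumptions and not part of computer.py.

```python
from typing import Any


def handle_action(computer: Any, action: dict) -> Any:
    """Call the method named by action["type"], passing the other keys as kwargs."""
    args = dict(action)  # copy so the caller's dict is left untouched
    name = args.pop("type")
    method = getattr(computer, name, None)
    # Reject private names and anything that is not a callable method.
    if name.startswith("_") or not callable(method):
        raise ValueError(f"Unsupported action type: {name!r}")
    return method(**args)


class RecordingComputer:
    """Stand-in for DockerComputer that records calls instead of acting."""

    def __init__(self) -> None:
        self.calls = []

    def click(self, x, y, button="left"):
        self.calls.append(("click", x, y, button))

    def type(self, text):
        self.calls.append(("type", text))


fake = RecordingComputer()
handle_action(fake, {"type": "click", "x": 100, "y": 100})
handle_action(fake, {"type": "type", "text": "Hello World"})
print(fake.calls)
```

With a real container, the same call would presumably be `handle_action(computer, action)` inside the `with DockerComputer() as computer:` block. Looking methods up by name keeps the dispatcher short, at the cost of trusting the action names, which is why unknown or private names are rejected.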

Resource Limits

Memory: 2GB per container
Display: 1280x720
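
To check that a screenshot matches the display size, the dimensions can be read from the PNG header without an imaging library. This is a minimal sketch, assuming `screenshot()` returns a plain base64 string with no `data:` prefix; `png_size` is a hypothetical helper and is not part of computer.py.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(b64_png: str) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a base64 PNG."""
    # The first 24 bytes hold: 8-byte signature, 4-byte chunk length,
    # 4-byte chunk type ("IHDR"), 4-byte width, 4-byte height.
    # 32 base64 characters decode to exactly 24 bytes.
    header = base64.b64decode(b64_png[:32])
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

Inside a `with DockerComputer() as computer:` block, `png_size(computer.screenshot())` would be expected to return `(1280, 720)` given the display size above.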

Acknowledgements

OpenAI's own CUA Sample App was used as a starting point for the docker/computer.py file.
