Sevan Hayrapet

AI security

2nd February, 2024

*https://cca-cyber.com/*


Introduction

Computer control agents (CCAs) are systems that autonomously operate computers by interacting with graphical user interfaces (GUIs) much like a human would. They interpret natural language instructions and translate them into a sequence of actions - such as clicking, typing, or touch gestures - to execute tasks on devices. Leveraging advances in deep learning, reinforcement learning, and foundation models like large language models and vision-language models, CCAs are rapidly evolving into versatile tools for automating complex workflows, similar to Anthropic’s Computer Use and OpenAI’s Operator.

CCAs enable AI systems to integrate with existing software in your environment, bridging the gap to practical real-world applications by translating natural language instructions into desktop based tasks.

While the benefits of CCAs are compelling, their integration into everyday computer tasks also raises critical cyber security concerns. This report will explore some of the technical risks associated with deploying such autonomous agents - specifically Anthropic’s Computer Use - through a comprehensive red teaming exercise that proposes a taxonomy for indirect prompt injection attacks, demonstrates real-world attack scenarios, and analyses broader risks.

Reference: Sager, P. J., et al. (2025). AI Agents for Computer Use: A Review. arXiv:2501.16150.

Adversaries can introduce indirect prompt injections in external content such as websites and emails, which can manipulate and derail the computer controlled agent. This can result in a range of adversarial impacts such as data exfiltration and system compromise.

Adversaries can introduce indirect prompt injections in external content such as websites and emails, which can manipulate and derail the computer controlled agent. This can result in a range of adversarial impacts such as data exfiltration and system compromise.

Computer Use - CCA use case

This report will focus on Anthropic's ‘Computer Use’. Their tool, currently a beta feature, enables AI systems to interact with computers via the same graphical user interface (GUI) that humans use eliminating the need for specialised APIs or custom integrations. The system works by processing screenshots of the computer screen, calculating cursor movements based on pixel measurements, and executing actions through virtual keyboard and mouse inputs. This "flipbook" mode, where the AI takes sequential screenshots to understand and respond to the computer's state, allows it to work with any standard software. However, it also introduces complex cyber security challenges.

Reference: Developing a computer use model \ Anthropic

graph LR
    A[Start] --> B[Initializse API]
    B --> C[User Prompt]
    C --> D[Claude Evaluates]
    D --> E{Tool Needed?}
    E -->|Yes| F[Tool Request]
    F --> G[Execute Tool]
    G --> H[Return Results]
    H --> D
    E -->|No| I[Text Response]
    I --> J{More Tasks?}
    J -->|Yes| C
    J -->|No| K[End]

During its evaluation, Claude determines whether the action or information it is processing might be unsafe - for instance, if it encounters a malicious call to action - and applies safeguards.

Attack taxonomy