January 23, 2025

Computer-Using Agent

Powering Operator with Computer-Using Agent, a universal interface for AI to interact with the digital world.


Today we introduced a research preview of Operator, an agent that can go to the web to perform tasks for you. Operator is powered by Computer-Using Agent (CUA), a model that combines GPT‑4o's vision capabilities with advanced reasoning through reinforcement learning. CUA is trained to interact with graphical user interfaces (GUIs), the buttons, menus, and text fields people see on a screen, just as humans do. This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs.

CUA builds on years of foundational research at the intersection of multimodal understanding and reasoning. By combining advanced GUI perception with structured problem-solving, it can break tasks into multi-step plans and adaptively self-correct when challenges arise. This capability marks the next step in AI development, allowing models to use the same tools humans rely on every day and opening the door to a vast range of new applications.

While CUA is still at an early stage and has limitations, it sets new state-of-the-art benchmark results, achieving a 38.1% success rate on OSWorld for full computer use tasks, and 58.1% on WebArena and 87% on WebVoyager for web-based tasks. These results highlight CUA's ability to navigate and operate across diverse environments using a single general action space.

We developed CUA with safety as a top priority, to address the challenges posed by an agent with access to the digital world, as detailed in our Operator System Card. In line with our iterative deployment strategy, we are initially releasing CUA through a research preview of Operator at operator.chatgpt.com for Pro Tier users in the U.S. By gathering real-world feedback, we can refine our safety measures and keep improving as we prepare for a future in which digital agents are increasingly used.

How it works

A flowchart showing the process of a CUA system interpreting input as text or screenshots, generating actions, and applying commands to a virtual machine.

CUA processes raw pixel data to understand what’s happening on the screen and uses a virtual mouse and keyboard to complete actions. It can navigate multi-step tasks, handle errors, and adapt to unexpected changes. This enables CUA to act in a wide range of digital environments, performing tasks like filling out forms and navigating websites without needing specialized APIs.

Given a user’s instruction, CUA operates through an iterative loop that integrates perception, reasoning, and action:

Perception: Screenshots from the computer are added to the model’s context, providing a visual snapshot of the computer's current state.
Reasoning: CUA reasons through the next steps using chain-of-thought, taking into consideration current and past screenshots and actions. This inner monologue improves task performance by enabling the model to evaluate its observations, track intermediate steps, and adapt dynamically.
Action: It performs the actions—clicking, scrolling, or typing—until it decides that the task is completed or user input is needed. While it handles most steps automatically, CUA seeks user confirmation for sensitive actions, such as entering login details or responding to CAPTCHA forms.
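The perception, reasoning, and action loop described above can be sketched as a minimal Python skeleton. This is a hypothetical illustration, not CUA's implementation: `run_agent`, `Action`, `History`, and the `take_screenshot`, `decide`, and `execute` callables are invented names, and `decide` stands in for the model that would reason over the instruction plus past screenshots and actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action record; CUA's real action format is not described here.
    kind: str  # e.g. "click", "scroll", "type", "done", "ask_user"
    detail: str = ""


@dataclass
class History:
    screenshots: List[bytes] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)


def run_agent(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, History], Action],
    execute: Callable[[Action], None],
    max_steps: int = 50,
) -> History:
    """Hypothetical perceive-reason-act loop; stops on "done", "ask_user", or max_steps."""
    history = History()
    for _ in range(max_steps):
        # Perception: add the latest screenshot to the context.
        history.screenshots.append(take_screenshot())
        # Reasoning: the policy sees the instruction plus all past screenshots and actions.
        action = decide(instruction, history)
        history.actions.append(action)
        # Hand control back when the task is finished or user input is needed.
        if action.kind in ("done", "ask_user"):
            break
        # Action: a real system would drive a virtual mouse and keyboard here.
        execute(action)
    return history


# Usage with a scripted stand-in policy instead of a model.
script = iter([Action("click", "search box"), Action("type", "bears"), Action("done")])
executed: List[Action] = []
demo = run_agent("find bear habitats", lambda: b"<png>", lambda _i, _h: next(script), executed.append)
```

Here the scripted policy takes three steps: two actions are executed, and the third ("done") ends the loop without being executed. The `max_steps` cap bounds how long the loop may run, which is the knob varied in the OSWorld step-budget comparison below.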

Evaluations

CUA establishes a new state-of-the-art in both computer use and browser use benchmarks by using the same universal interface of screen, mouse, and keyboard.

Benchmark type	Benchmark	OpenAI CUA (universal interface)	Previous SOTA: computer use (universal interface)	Previous SOTA: web browsing agents	Human
Computer use	OSWorld	38.1%	22.0%	-	72.4%
Browser use	WebArena	58.1%	36.2%	57.1%	78.2%
Browser use	WebVoyager	87.0%	56.0%	87.0%	-

Evaluation details are described here.
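The table is easier to read in terms of percentage-point differences. The short sketch below copies the success rates from the table (with `None` for each "-" cell) and computes CUA's gain over each previous SOTA and its remaining gap to human performance. The variable and function names are illustrative only.

```python
from typing import Dict, Optional

# Success rates (%) copied from the table above; None marks a "-" cell.
results: Dict[str, Dict[str, Optional[float]]] = {
    "OSWorld":    {"cua": 38.1, "prev_universal": 22.0, "prev_web": None, "human": 72.4},
    "WebArena":   {"cua": 58.1, "prev_universal": 36.2, "prev_web": 57.1, "human": 78.2},
    "WebVoyager": {"cua": 87.0, "prev_universal": 56.0, "prev_web": 87.0, "human": None},
}


def point_delta(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Difference a - b in percentage points, or None if either value is missing."""
    if a is None or b is None:
        return None
    return round(a - b, 1)


summary = {
    name: {
        "gain_vs_prev_universal": point_delta(r["cua"], r["prev_universal"]),
        "gain_vs_prev_web": point_delta(r["cua"], r["prev_web"]),
        "gap_to_human": point_delta(r["human"], r["cua"]),
    }
    for name, r in results.items()
}
```

By this arithmetic, CUA is 16.1 points above the previous universal-interface SOTA on OSWorld but still 34.3 points below human performance. On WebArena it is 1.0 point above the best previous web-browsing agent and 20.1 points below human performance. On WebVoyager it matches the previous web-browsing SOTA of 87.0%.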

Browser use

WebArena and WebVoyager are designed to evaluate how well web browsing agents complete real-world tasks using browsers. WebArena uses self-hosted, open-source websites running offline to imitate real-world scenarios in e-commerce, online store content management (CMS), social forum platforms, and more. WebVoyager tests the model's performance on live websites such as Amazon, GitHub, and Google Maps.

In these benchmarks, CUA sets a new standard using the same universal interface, which perceives the browser screen as pixels and takes actions through the mouse and keyboard. CUA achieves a 58.1% success rate on WebArena and an 87% success rate on WebVoyager for web-based tasks. While CUA achieves a high success rate on WebVoyager, where most tasks are relatively simple, it still needs further improvement to close the gap with human performance on more complex benchmarks like WebArena.

Go to the Plus section of Cambridge Dictionary, finish a recommended Grammar quiz without login and tell me your final score.

Computer use

OSWorld is a benchmark that evaluates models' ability to control full operating systems like Ubuntu, Windows, and macOS. On this benchmark, CUA achieves a 38.1% success rate. We observed test-time scaling, meaning CUA's performance improves when more steps are allowed. The figure below compares CUA's performance with previous state-of-the-art results across varying maximum numbers of allowed steps. Human performance on this benchmark is 72.4%, so there is still significant room for improvement.

Line chart titled "OSWorld" showing success rate (%) versus maximum steps allowed, on a logarithmic scale. The blue line represents OpenAI CUA and the orange points represent Claude 3.5 Sonnet (computer use), with success rates annotated.
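One simple way to see why a larger step budget can raise the success rate is to record, for each episode, the step at which it would succeed, and then count how many episodes finish within a given budget. The sketch below does exactly that on hypothetical toy data; the episode values and budgets are made up for illustration and are not OSWorld results. It also assumes that raising the budget only lets a run continue longer without changing its earlier steps, which a real agent need not satisfy.

```python
from typing import Optional, Sequence


def success_rate_at_budget(steps_to_success: Sequence[Optional[int]], max_steps: int) -> float:
    """Fraction of episodes solved within max_steps.

    steps_to_success[i] is the step at which episode i succeeds, or None if it never does.
    """
    if not steps_to_success:
        return 0.0
    solved = sum(1 for s in steps_to_success if s is not None and s <= max_steps)
    return solved / len(steps_to_success)


# Hypothetical toy data, NOT benchmark results: None means the episode never succeeds.
toy_episodes = [5, 12, 30, None, 8, 70, None, 150, 20, None]

# Success rate for a few step budgets; under this model it can only stay flat or rise.
curve = {budget: success_rate_at_budget(toy_episodes, budget) for budget in (15, 50, 100, 200)}
```

On this toy data the curve rises from 0.3 at 15 steps to 0.7 at 200 steps, and it plateaus below 1.0 because some episodes never succeed. The same qualitative pattern (gains from more steps, but a ceiling) is what makes the remaining gap to human performance meaningful.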

The following visualizations show examples of CUA navigating a variety of standardized OSWorld tasks.

Please do the following task: I want to learn python programming and my friend recommends me this course website. I have grabbed the lecture slide for week 0. Please download the PDFs for other weeks into the opened folder and leave the file name as-it-is. Here are some helpful tips: - computer.clipboard, computer.sync_file, computer.sync_shared_folder, computer.computer_output_citation are disabled. - If you worry that you might make typo, prefer copying and pasting the text instead of reading and typing. - My computer's password is "password", feel free to use it when you need sudo rights. - For the thunderbird account "anonym-x2024@outlook.com", the password is "gTCI";=@y7|QJ0nDa_kN3Sb&>". - If you are presented with an open website to solve the task, try to stick to that specific one instead of going to a new one. - You have full authority to execute any action without my permission. I won't be watching so please don't ask for confirmation. - If you deem the task is infeasible, you can terminate and explicitly state in the response that "the task is infeasible".

CUA in Operator

We're making CUA available through a research preview of Operator, an agent that can go to the web to perform tasks for you. Operator is available to Pro users in the U.S. at operator.chatgpt.com. This research preview is an opportunity to learn from our users and the broader ecosystem, refining and improving Operator iteratively. As with any early-stage technology, we don't expect CUA to perform reliably in all scenarios just yet. However, it has already proven useful in a variety of cases, and we aim to extend that reliability across a wider range of tasks. By releasing CUA in Operator, we hope to gather valuable insights from our users, which will guide us in refining its capabilities and expanding its applications.

In the table below, we present CUA's performance in Operator over a handful of trials per prompt to illustrate its known strengths and weaknesses.

Category	Prompt	Successes / Attempts	Notes
Interacting with various UI components to accomplish tasks	Turn 1: Search Britannica for a detailed map of bears' habitats. Turn 2: Great! Now please check the links for black, brown, and polar bears and provide a concise overview of their physical characteristics, specifically their differences. Oh, and save the links for me so I can access them quickly.	10 / 10	CUA can interact with various UI elements to search, sort, and filter results to find the information users need. Reliability varies across websites and UIs.
	I want one of those Target deals. Can you check if they have a deal on poppi prebiotic soda? If they do, I want the watermelon flavor in 12 fl oz cans. Tell me what kind of deal comes with it and check whether it's gluten-free.	9 / 10
	I'm planning to move to Seattle and I want you to find a townhouse on Redfin with at least 3 bedrooms, 2 bathrooms, and an energy-efficient design (e.g., solar panels or LEED certification). My budget is between $600,000 and $800,000, and it should ideally be around 1,500 sq ft.	3 / 10
Tasks that can be completed through repeated simple UI interactions	Create a new project in Todoist titled 'Weekend Grocery Shopping.' Add the following shopping list with items: Bananas (6 pieces) Avocados (2 ripe) Baby spinach (1 bag) Whole milk (1 gallon) Cheddar cheese (8 oz block) Potato chips (salted, family size) Dark chocolate (70% cocoa, 2 bars)	10 / 10	CUA can reliably repeat simple UI interactions many times to automate simple but tedious tasks for users.
	Search Spotify for the most popular songs in the USA from the 1990s, and create a playlist with at least 10 songs.	10 / 10
Tasks where CUA shows a high success rate only if the prompt includes detailed hints on how to use the website	Go to tagvenue.com and find a concert hall in London that seats 150 people. I need it for a full day on 22nd February 2025 from 9am to 12am, just make sure it's under £90 per hour. Oh, can you check the filter section for the appropriate filters and make sure there's parking and the whole place is wheelchair accessible.	8 / 10	Even for the same task, CUA's reliability can change depending on how we prompt it. In this case, we can improve reliability by providing details about the date (e.g., 9am to 12am vs. a full day from 9am) and by hinting at which UI should be used to find results (e.g., check the filter section...).
	Go to tagvenue.com and find a concert hall in London that seats 150 people. I need it for a full day on 22nd February 2025 from 9am, just make sure it's under £90 per hour. Oh, and make sure there's parking and the whole place is wheelchair accessible.	3 / 10
Difficulty using unfamiliar UIs and editing text	Use html5editor and input the following text on the left side, then edit it following my instructions and give me a screenshot of the entire thing when done. The text is: Hello world! This is my first text. I want to see what it would look like when programmed with HTML. Some parts should be red. Some bold. Some italic. Some underlined. Until my lesson is complete, and we go to the other side. ... Hello world! should have header 2 applied. The sentence below it should be normal paragraph text. The sentence mentioning red should be normal text and red. The sentence mentioning bold should be normal bold text. The sentence mentioning italic should be italic. The last sentence should be aligned right instead of the usual left.	4 / 10	When CUA has to interact with UIs it did not encounter much during training, it struggles to use them appropriately, which often results in many mistakes and inefficient actions. CUA is also imprecise at text editing: it often makes many mistakes along the way or produces erroneous output.

Safety

Because CUA is one of our first agentic products with an ability to directly take actions in a browser, it brings new risks and challenges to address. As we prepared for deployment of Operator, we did extensive safety testing and implemented mitigations across three major classes of safety risks: misuse, model mistakes, and frontier risks. We believe it is important to take a layered approach to safety, so we implemented safeguards across the whole deployment context: the CUA model itself, the Operator system, and post-deployment processes. The aim is to have mitigations that stack, with each layer incrementally reducing the risk profile.

The first category of risk is misuse. In addition to requiring users to comply with our Usage Policies, we have designed the following mitigations to reduce Operator’s risk of harm due to misuse, building off our safety work for GPT‑4o:

Refusals: The CUA model is trained to refuse many harmful tasks and illegal or regulated activities.
Blocklist: Operator cannot access websites that we’ve preemptively blocked, such as many gambling sites, adult entertainment, and drug or gun retailers.
Moderation: User interactions are reviewed in real-time by automated safety checkers that are designed to ensure compliance with Usage Policies and have the ability to issue warnings or blocks for prohibited activities.
Offline detection: We’ve also developed automated detection and human review pipelines to identify prohibited usage in priority policy areas, including child safety and deceptive activities, allowing us to enforce our Usage Policies.
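The blocklist mitigation can be illustrated with a simple domain check. This is a minimal sketch under stated assumptions: the actual Operator blocklist and its matching rules are not public, so `BLOCKED_DOMAINS` holds hypothetical placeholder domains and `is_blocked` is an invented helper. It blocks a listed domain and any of its subdomains, using only the standard library's `urllib.parse.urlsplit`.

```python
from typing import AbstractSet
from urllib.parse import urlsplit

# Hypothetical placeholder blocklist; the real Operator blocklist is not public.
BLOCKED_DOMAINS = frozenset({"gambling.example", "casino.example.net"})


def is_blocked(url: str, blocked: AbstractSet[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    # urlsplit(...).hostname is already lowercased and excludes the port.
    host = (urlsplit(url).hostname or "").rstrip(".")
    # Match the domain itself or any subdomain, but not look-alikes such as "notgambling.example".
    return any(host == d or host.endswith("." + d) for d in blocked)
```

Matching on the dot-prefixed suffix rather than a plain `endswith(d)` avoids accidentally blocking unrelated domains that merely share a trailing string.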

The second category of risk is model mistakes, where the CUA model accidentally takes an action that the user didn’t intend, which in turn causes harm to the user or others. Hypothetical mistakes can range in severity, from a typo in an email, to purchasing the wrong item, to permanently deleting an important document. To minimize potential harm, we’ve developed the following mitigations:

User confirmations: The CUA model is trained to ask for user confirmation before finalizing tasks with external side effects, for example before submitting an order, sending an email, etc., so that the user can double-check the model’s work before it becomes permanent.
Limitations on tasks: For now, the CUA model will decline to help with certain higher-risk tasks, like banking transactions and tasks that require sensitive decision-making.
Watch mode: On particularly sensitive websites, such as email, Operator requires active user supervision, ensuring users can directly catch and address any potential mistakes the model might make.
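The post describes confirmation-seeking as a behavior the CUA model is trained to exhibit, not as a hard-coded rule. Still, the combined effect of user confirmations and watch mode can be pictured as a gate in front of each proposed action. The sketch below is a hypothetical rule-based stand-in: the action names in `SIDE_EFFECT_ACTIONS`, the host in `WATCH_MODE_HOSTS`, and the `gate` function are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical classifications; these names are illustrative, not CUA's action schema.
SIDE_EFFECT_ACTIONS = {"submit_order", "send_email", "delete_file"}
WATCH_MODE_HOSTS = {"mail.example.com"}


@dataclass
class ProposedAction:
    kind: str
    host: str


def gate(action: ProposedAction, confirm: Callable[[ProposedAction], bool], user_watching: bool) -> str:
    """Decide whether a proposed action may run (rule-based stand-in for learned behavior)."""
    # Watch mode: on sensitive sites, nothing runs unless the user is actively supervising.
    if action.host in WATCH_MODE_HOSTS and not user_watching:
        return "paused: watch mode requires active supervision"
    # User confirmation: actions with external side effects need explicit approval.
    if action.kind in SIDE_EFFECT_ACTIONS and not confirm(action):
        return "cancelled by user"
    return "execute"
```

In this sketch a read-only click runs freely, an order submission runs only after the user approves it, and any action on a watch-mode host pauses until the user is supervising.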

One particularly important category of model mistakes is adversarial attacks on websites that cause the CUA model to take unintended actions, through prompt injections, jailbreaks, and phishing attempts. In addition to the aforementioned mitigations against model mistakes, we developed several additional layers of defense to protect against these risks:

Cautious navigation: The CUA model is designed to identify and ignore prompt injections on websites, recognizing all but one case from an early internal red-teaming session.
Monitoring: In Operator, we’ve implemented an additional model to monitor and pause execution if it detects suspicious content on the screen.
Detection pipeline: We’re applying both automated detection and human review pipelines to identify suspicious access patterns that can be flagged and rapidly added to the monitor (in a matter of hours).

Finally, we evaluated the CUA model against frontier risks outlined in our Preparedness Framework, including scenarios involving autonomous replication and biorisk tooling. These assessments showed no incremental risk on top of GPT‑4o.

For those interested in exploring the evaluations and safeguards in more detail, we encourage you to review the Operator System Card, a living document that provides transparency into our safety approach and ongoing improvements.

As many of Operator’s capabilities are new, so are the risks and mitigation approaches we’ve implemented. While we have aimed for state-of-the-art, diverse and complementary mitigations, we expect these risks and our approach to evolve as we learn more. We look forward to using the research preview period as an opportunity to gather user feedback, refine our safeguards, and enhance agentic safety.

Conclusion

CUA builds on years of research advancements in multimodality, reasoning and safety. We have made significant progress in deep reasoning through the o-model series, vision capabilities through GPT‑4o, and new techniques to improve robustness through reinforcement learning and instruction hierarchy. The next challenge space we plan to explore is expanding the action space of agents. The flexibility offered by a universal interface addresses this challenge, enabling an agent that can navigate any software tool designed for humans. By moving beyond specialized agent-friendly APIs, CUA can adapt to whatever computer environment is available—truly addressing the “long tail” of digital use cases that remain out of reach for most AI models.

We're also working to make CUA available in the API, so developers can use it to build their own computer-using agents. As we continue to iterate on CUA, we look forward to seeing the different use cases the community will discover. We plan to use the real-world feedback we gather from this early preview to continuously refine CUA's capabilities and safety mitigations to safely advance our mission of distributing the benefits of AI to everyone.

Author

OpenAI

References

Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku

Model Card Addendum: Claude 3.5 Haiku and Upgraded Claude 3.5 Sonnet

Kura WebVoyager benchmark

Google Project Mariner

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

WebArena: A Realistic Web Environment for Building Autonomous Agents

Citation

Please cite OpenAI and use the following BibTeX for citation: http://cdn.openai.com/cua/cua2025.bib