၂၀၂၆ မတ် ၁၁

From model to agent: Equipping the Responses API with a computer environment

By Bo Xu, Danny Zhang, and Rohit Arunachalam

ဖွင့်နေသည်…

We're currently in a shift from using models, which excel at particular tasks, to using agents capable of handling complex workflows. By prompting models, you can only access trained intelligence. However, giving the model a computer environment can achieve a much wider range of use cases, like running services, requesting data from APIs, or generating more useful artifacts like spreadsheets or reports.

A few practical problems emerge when you try to build agents: where to put intermediate files, how to avoid pasting large tables into a prompt, how to give the workflow network access without creating a security headache, and how to handle timeouts and retries without building a workflow system yourself.

Instead of putting it on developers to build their own execution environments, we built the necessary components to equip the Responses API⁠(ဝင်းဒိုးအသစ်တွင် ဖွင့်မည်) with a computer environment to reliably execute real-world tasks.

OpenAI’s Responses API, together with the shell tool and a hosted container workspace, is designed to address these practical problems. The model proposes steps and commands; the platform runs them in an isolated environment with a filesystem for inputs and outputs, optional structured storage (like SQLite), and restricted network access.

In this post, we’ll break down how we built a computer environment for agents and share some early lessons on how to use it for faster, more repeatable, and safer production workflows.

The shell tool

A good agent workflow starts with a tight execution loop: the model proposes an action like reading files or fetching data with API, the platform runs it, and the result feeds into the next step. We’ll start with the shell tool—the simplest way to see this loop in action—and then cover the container workspace, networking, reusable skills, and context compaction.

To understand the shell tool, it’s first useful to understand how a language model uses tools in general: to do things like call a function or interact with a computer. During training, a model is shown examples of how tools are used and the resulting effects, step by step. This helps the model learn to decide when to use a tool and how to use it. When we say “using a tool”, we mean the model actually only proposes a tool call. It can't execute the call on its own.

shell tool သည် ပုံနှင့်အတူ “အခြား tool တစ်ခုသာ” ဖြစ်သည်

The shell tool makes the model dramatically more powerful: it interacts with a computer through the command line to carry out a wide range of tasks, from searching for text to sending API requests on your computer. Built on familiar Unix tooling, our shell tool can do anything you'd expect, with utilities like grep, curl, and awk available out of the box.

Compared to our existing code interpreter, which only executes Python, the shell tool enables a much wider range of use cases, like running Go or Java programs or starting a NodeJS server. This flexibility lets the model fulfill complex agentic tasks.

Orchestrating the agent loop

On its own, a model can only propose shell commands, but how are these commands executed? We need an orchestrator to get model output, invoke tools, and pass the tool response back to the model in a loop, until the task is complete.

The Responses API is how developers interact with OpenAI models. When used with custom tools, the Responses API yields control back to the client, and the client requires its own harness for running the tools. However, this API can also orchestrate between the model and hosted tools out of the box.

When the Responses API receives a prompt, it assembles model context: user prompt, prior conversation state, and tool instructions. For shell execution to work, the prompt must mention using the shell tool and the selected model must be trained to propose shell commands—models GPT‑5.2 and later are trained for this. With all of this context, the model then decides the next action. If it chooses shell execution, it returns one or more shell commands to Responses API service. The API service forwards those commands to the container runtime, streams back shell output, and feeds it to the model in the next request’s context. The model can then inspect the results, issue follow-up commands, or produce a final answer. The Responses API repeats this loop until the model returns a completion without additional shell commands.

အေးဂျင့် loop ပုံ - Responses API သည် container အတွင်း မော်ဒယ်နှင့် shell execution ကို စုစည်းထိန်းချုပ်ပေးသည်

When the Responses API executes a shell command, it maintains a streaming connection to the container service. As output is produced, the API relays it to the model in near real time so the model can decide whether to wait for more output, run another command, or move on to a final response.

Streaming လုပ်ထားသော shell command execution output

The model can propose multiple shell commands in one step, and the Responses API can execute them concurrently using separate container sessions. Each session streams output independently, and the API multiplexes those streams back into structured tool outputs as context. In other words, the agent loop can parallelize work, such as searching files, fetching data, and validating intermediate results.

Responses API သည် command execution sessions များကို multiplex လုပ်ပေးသည်

When the command involves file operations or data processing, shell output can become very large and consume context budgets without adding useful signals. To control this, the model specifies an output cap per command. The Responses API enforces that cap and returns a bounded result that preserves both the beginning and end of the output, while marking omitted content. For example, you might bound the output to 1,000 characters, with preserved beginning and end:

text at the beginning ... 1000 chars truncated ... text at the end

Together, concurrent execution and bounded output make the agent loop both fast and context-efficient so the model can keep reasoning over relevant results instead of getting overwhelmed by raw terminal logs.

When the context window gets full: compaction

One potential issue with agent loops is that tasks can run for a long time. Long-running tasks fill the context window, which is important for providing context across turns and across agents. Picture an agent calling a skill, getting a response, adding tool calls and reasoning summaries—the limited context window quickly fills up. To avoid losing the important context as the agent continues running, we need a way to keep the key details and remove anything extraneous. Instead of requiring developers to design and maintain custom summarization or state-carrying systems, we added native compaction in the Responses API, designed to align with how the model behaves and how it's been trained.

Our latest models are trained to analyze prior conversation state and produce a compaction item that preserves key prior state in an encrypted token-efficient representation. After compaction, the next context window consists of this compaction item and high-value portions of the earlier window. This allows workflows to continue coherently across window boundaries, even in extended multi-step and tool-driven sessions. Codex relies on this mechanism to sustain long-running coding tasks and iterative tool execution without degrading quality.

Compaction is available either built-in on the server or through a standalone `/compact` endpoint. Server-side compaction lets you configure a threshold, and the system handles compaction timing automatically, eliminating the need for complex client-side logic. It allows a slightly larger effective input context window to tolerate small overages right before compaction, so requests near the limit can still be processed and compacted rather than rejected. As model training evolves, the native compaction solution evolves with it for every OpenAI model release.

Codex helped us build the compaction system while serving as an early user of it. When one Codex instance hit a compaction error, we'd spin up a second instance to investigate. The result was that Codex got a native, effective compaction system just by working on the problem. This ability for Codex to inspect and refine itself has become an especially interesting part of working at OpenAI. Most tools only require the user to learn how to use them; Codex learns alongside us.

Container context

Now let’s cover state and resources. The container is not only a place to run commands but also the working context for the model. Inside the container, the model can read files, query databases, and access external systems under network policy controls.

runtime container အတွင်းရှိ Files, databases, skills နှင့် policy-controlled network ကို ပြထားသော ပုံ

File systems

The first part of container context is the file system for uploading, organizing, and managing resources. We built container and file⁠(ဝင်းဒိုးအသစ်တွင် ဖွင့်မည်) APIs to give the model a map of available data and help it choose targeted file operations instead of performing broad, noisy scans.

A common anti-pattern is packing all input directly into prompt context. As inputs grow, overfilling the prompt becomes expensive and hard for the model to navigate. A better pattern is to stage resources in the container file system and let the model decide what to open, parse, or transform with shell commands. Much like humans, models work better with organized information.

Databases

The second part of container context is databases. In many cases, we suggest developers store structured data in databases as SQLite and query them. Instead of copying an entire spreadsheet into the prompt, for example, you can give the model a description of the tables—what columns exist and what they mean—and let it pull the rows it needs.

For example, if you ask, “Which products had declining sales this quarter?” the model can query just the relevant rows instead of scanning the whole spreadsheet. This is faster, cheaper, more scalable to larger datasets.

Network access

container context ၏ တတိယအစိတ်အပိုင်းမှာ network access ဖြစ်ပြီး၊ ၎င်းသည် အေးဂျင့် workload များအတွက် မရှိမဖြစ် လိုအပ်သော အစိတ်အပိုင်းတစ်ခုဖြစ်သည်။ အေးဂျင့် workflow သည် live data ကို fetch လုပ်ရန်၊ ပြင်ပ API များကို ခေါ်ရန် သို့မဟုတ် packages များ install လုပ်ရန် လိုအပ်နိုင်သည်။ တစ်ချိန်တည်းမှာပင် container များကို ကန့်သတ်မထားသော internet access ပေးခြင်းသည် အန္တရာယ်ရှိနိုင်သည် - ၎င်းသည် ပြင်ပ website များထံ သတင်းအချက်အလက် ပေါက်ကြားစေနိုင်ပြီး၊ ထိခိုက်လွယ်သော internal သို့မဟုတ် third-party systems များကို မရည်ရွယ်ဘဲ ထိတွေ့စေနိုင်သလို credential leak နှင့် data exfiltration ကို ကာကွယ်ရန်လည်း ပိုခက်ခဲစေနိုင်သည်။

ဤစိုးရိမ်ချက်များကို ဖြေရှင်းပြီး အေးဂျင့်များ၏ အသုံးဝင်မှုကို မကန့်သတ်ရန် hosted containers များတွင် sidecar egress proxy ကို အသုံးပြုအောင် ကျွန်ုပ်တို့ တည်ဆောက်ခဲ့သည်။ အပြင်သို့ ထွက်သော network request များအားလုံးသည် allowlists နှင့် access controls များကို အကောင်အထည်ဖော်ပေးသည့် centralized policy layer တစ်ခုမှတစ်ဆင့် စီးဆင်းသွားပြီး traffic ကိုလည်း စောင့်ကြည့်နိုင်စေသည်။ credentials အတွက်တော့ egress တွင် domain-scoped secret injection ကို အသုံးပြုပါသည်။ မော်ဒယ်နှင့် container သည် placeholders များကိုသာ မြင်ရပြီး raw secret values များသည် model-visible context ၏ အပြင်ဘက်တွင်သာ ရှိနေကာ ခွင့်ပြုထားသော destination များအတွက်သာ သက်ရောက်စေပါသည်။ ၎င်းကြောင့် leakage အန္တရာယ်ကို လျှော့ချပေးသော်လည်း authenticated external calls များကိုတော့ ဆက်လက်လုပ်ဆောင်နိုင်စေပါသည်။

access egress proxy မှတစ်ဆင့် ထိန်းချုပ်ထားသော network access ပုံ - container setup

အေးဂျင့် skills

Shell commands များသည် အင်အားကြီးသော်လည်း task အများအပြားသည် multi-step pattern တူညီမှုများကို ထပ်ခါတလဲလဲ ပြုလုပ်ရလေ့ရှိသည်။ အေးဂျင့်များသည် run တိုင်း workflow ကို ပြန်လည်ရှာဖွေရသည် - replanning လုပ်ရ၊ commands များကို ပြန်ထုတ်ရ၊ conventions များကို ပြန်သင်ရပြီး၊ ၎င်းကြောင့် ရလဒ်မတည်ငြိမ်ခြင်းနှင့် execution ပျက်စီးဆုံးရှုံးမှု ဖြစ်စေသည်။ Agent skills⁠(ဝင်းဒိုးအသစ်တွင် ဖွင့်မည်) သည် ထို pattern များကို ပြန်အသုံးချနိုင်ပြီး ပေါင်းစပ်အသုံးပြုနိုင်သော building blocks များအဖြစ် package လုပ်ပေးသည်။ တိတိကျကျဆိုရလျှင် skill တစ်ခုသည် ‘SKILL.md⁠(ဝင်းဒိုးအသစ်တွင် ဖွင့်မည်)’ (metadata နှင့် instructions ပါဝင်သည်) နှင့် API specs၊ UI assets စသည့် supporting resources များပါဝင်သော folder bundle တစ်ခုဖြစ်သည်။

ဤဖွဲ့စည်းပုံသည် အစောပိုင်းက ဖော်ပြခဲ့သော runtime architecture နှင့် သဘာဝကျစွာ ကိုက်ညီပါသည်။ container သည် persistent files နှင့် execution context ကို ပံ့ပိုးပေးပြီး shell tool သည် execution interface ကို ပံ့ပိုးပေးသည်။ ထိုနှစ်ခုလုံး ရှိနေသောအခါ မော်ဒယ်သည် လိုအပ်သည့်အချိန်တွင် shell commands (`ls`, `cat` စသည်) ဖြင့် skill files များကို ရှာဖွေနိုင်ပြီး instructions များကို အဓိပ္ပာယ်ဖွင့်ဆိုကာ skill scripts များကို အေးဂျင့် loop တစ်ခုတည်းအတွင်း run လုပ်နိုင်သည်။

OpenAI platform တွင် skills များကို စီမံခန့်ခွဲရန် API များ⁠(ဝင်းဒိုးအသစ်တွင် ဖွင့်မည်) ကို ကျွန်ုပ်တို့ ပံ့ပိုးပေးပါသည်။ developer များသည် skill folders များကို versioned bundles အဖြစ် upload လုပ်ပြီး သိမ်းဆည်းနိုင်ကာ နောက်ပိုင်းတွင် skill ID ဖြင့် ပြန်လည်ရယူနိုင်သည်။ prompt ကို မော်ဒယ်ထံ မပို့မီ Responses API သည် skill ကို load လုပ်ပြီး model context ထဲတွင် ထည့်သွင်းပေးသည်။ ဤအစီအစဉ်သည် deterministic ဖြစ်သည် -

name နှင့် description အပါအဝင် skill metadata ကို fetch လုပ်ပါ။
skill bundle ကို fetch လုပ်ပြီး container ထဲသို့ copy လုပ်ကာ unpack လုပ်ပါ။
skill metadata နှင့် container path ဖြင့် model context ကို update လုပ်ပါ။

skill တစ်ခု သက်ဆိုင်မှုရှိမရှိ ဆုံးဖြတ်ရာတွင် မော်ဒယ်သည် ၎င်း၏ instructions များကို တဖြည်းဖြည်း လေ့လာပြီး container အတွင်း shell commands များမှတစ်ဆင့် ၎င်း၏ scripts များကို run လုပ်ပါသည်။

အေးဂျင့်များကို ဘယ်လိုတည်ဆောက်သလဲ

အစိတ်အပိုင်းအားလုံးကို ပေါင်းစည်းပြောရလျှင် - Responses API သည် orchestration ကို ပံ့ပိုးပေးသည်၊ shell tool သည် လုပ်ဆောင်နိုင်သော action များကို ပံ့ပိုးပေးသည်၊ hosted container သည် persistent runtime context ကို ပံ့ပိုးပေးသည်၊ skills သည် ပြန်အသုံးချနိုင်သော workflow logic ကို အထပ်ထပ်ဖြစ်စေပြီး compaction သည် အေးဂျင့်အား လိုအပ်သော context နှင့်အတူ အချိန်ကြာမြင့်စွာ လည်ပတ်နိုင်စေပါသည်။

ဤ primitive များဖြင့် prompt တစ်ခုတည်းသည် end-to-end workflow တစ်ခုအဖြစ် ချဲ့ထွင်သွားနိုင်သည် - သင့်တော်သော skill ကို ရှာဖွေခြင်း၊ ဒေတာကို fetch လုပ်ခြင်း၊ ၎င်းကို local structured state အဖြစ် ပြောင်းလဲခြင်း၊ ထို state ကို ထိရောက်စွာ query လုပ်ခြင်းနှင့် durable artifacts များ ထုတ်ပေးခြင်း။

အောက်ပါပုံသည် live data မှ spreadsheet တစ်ခု ဖန်တီးရာတွင် ဤစနစ် မည်သို့ အလုပ်လုပ်သည်ကို ပြထားသည်။

တောင်းဆိုမှု lifecycle ပုံ - prompt တစ်ခုမှ durable artifacts နှင့် skill discovery အထိ

သင့်ကိုယ်ပိုင် အေးဂျင့်ကို တည်ဆောက်ပါ

shell tool နှင့် computer environment ကို ပေါင်းစပ်၍ end-to-end workflows များ ဆောင်ရွက်သည့် နက်နက်ရှိုင်းရှိုင်း ဥပမာတစ်ခုအတွက် skill တစ်ခု package လုပ်ကာ Responses API မှတစ်ဆင့် execute လုပ်ပုံကို အဆင့်လိုက်ရှင်းပြထားသည့် ကျွန်ုပ်တို့၏ developer blog post⁠(ဝင်းဒိုးအသစ်တွင် ဖွင့်မည်) နှင့် cookbook⁠(ဝင်းဒိုးအသစ်တွင် ဖွင့်မည်) ကို ကြည့်ပါ။

ဤ primitive set ဖြင့် developer များက ဘာတွေ တည်ဆောက်မလဲဆိုတာ မြင်ရဖို့ ကျွန်ုပ်တို့ စိတ်လှုပ်ရှားနေပါသည်။ Language models များသည် စာသား၊ ရုပ်ပုံနှင့် အသံများ ထုတ်ပေးရုံထက် ပိုလုပ်နိုင်ရန် ရည်ရွယ်ထားပြီး—complex ဖြစ်သော လက်တွေ့ကမ္ဘာ task များကို scale ဖြင့် ကိုင်တွယ်ရာတွင် ပိုမိုစွမ်းဆောင်နိုင်စေရန် ကျွန်ုပ်တို့၏ platform ကို ဆက်လက် တိုးတက်အောင်လုပ်သွားမည်။

စာရေးသူ

Bo Xu - Danny Zhangနှင့် Rohit Arunachalam

ဆက်ဖတ်ရှုပါ

အားလုံးကို ကြည့်ရန်

GPT-5.6 ၏ စွမ်းဆောင်ရည်အမြင့်ဆုံး ဉာဏ်ရည်နှင့် ထိရောက်မှု

အင်ဂျင်နီယာနယ်ပယ်၂၀၂၆ ဇူ ၂၉

Core dump epidemiology: ၁၈ နှစ်ကြာ bug ကို ပြင်ခြင်း

အင်ဂျင်နီယာနယ်ပယ်၂၀၂၆ ဇွန် ၃၀

Codex ဖြင့် ကိုယ်တိုင်တိုးတက်သော အခွန်အေးဂျင့်များ တည်ဆောက်ခြင်း

အင်ဂျင်နီယာနယ်ပယ်၂၀၂၆ မေ ၂၇