OpenAI

January 29, 2026

Engineering

Inside OpenAI's in-house data agent

By Bonnie Xu, Aravind Suresh, and Emma Tang

Data powers how systems learn, how products evolve, and how companies make decisions. But getting answers quickly, correctly, and with the right context is often harder than it should be. To make this easier as OpenAI scales, we built our own bespoke in-house AI data agent that explores and reasons over our own platform.

Our agent is a custom internal-only tool (not an external offering), built specifically around OpenAI's data, permissions, and workflows. We're showing how we built and use it to surface examples of the real, impactful ways AI can support the day-to-day work of our teams. The OpenAI tools we used to build and run it (Codex, our GPT‑5 flagship model, the Evals API, and the Embeddings API) are the same tools we make available to developers everywhere.

Our data agent lets employees go from question to insight in minutes rather than days. This lowers the bar for pulling data and doing nuanced analysis across all functions, not just on the data team. Today, teams across Engineering, Data Science, Go-To-Market, Finance, and Research at OpenAI lean on the agent to answer high-impact data questions. For example, it can help answer how to evaluate launches and understand business health, all through the intuitive format of natural language. The agent combines Codex-powered table-level knowledge with product and organizational context. Its continuously learning memory system means it also improves with every turn.

Screenshot showing a user asking for ChatGPT WAU on October 6, 2025 compared with DevDay 2023. The agent reports ≈800M WAU in 2025 and ≈100M in 2023, with notes showing a +700M change and an increase of about 8×, followed by explanatory context.

In this post, we'll break down why we needed a bespoke AI data agent, what makes its code-enriched data context and self-learning so useful, and the lessons we learned along the way.

Why we needed a custom tool

OpenAI's data platform serves more than 3.5k internal users working across Engineering, Product, and Research, spanning over 600 petabytes of data across 70k datasets. At that size, simply finding the right table can be one of the most time-consuming parts of an analysis.

As one of our internal users put it:

“We have a lot of tables that are fairly similar, and I spend tons of time trying to figure out how they differ and which one to use. Some include logged-out users, some don't. Some have overlapping fields; it's hard to tell what is what.”

Even with the correct tables selected, producing correct results can be challenging. Analysts must reason about table data and table relationships to ensure that transformations and filters are applied correctly. Common failure modes (many-to-many joins, filter pushdown errors, and unhandled nulls) can silently invalidate results. At OpenAI's scale, analysts should not have to sink time into debugging SQL semantics or query performance: their focus should be on defining metrics, validating assumptions, and making data-driven decisions.

Screenshot of SQL code defining two CTEs, order_enriched and monthly_segment, which join customer geography data, derive order-month fields, and compute monthly aggregates such as order counts, total revenue, taxed revenue, and average ship-to-receipt days.

This SQL statement is 180+ lines long. It's not easy to know whether we're joining the right tables and querying the right columns.
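To make one of the failure modes above concrete, here is a minimal sketch of how a join on a non-unique key silently inflates a total. It uses Python's built-in sqlite3 module as a stand-in for a warehouse; the table names, columns, and the max_rows_per_key helper are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customer_segments (customer_id INTEGER, segment TEXT);
    -- customer 10 has two orders AND two segments: a many-to-many join key
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 20.0), (3, 11, 50.0);
    INSERT INTO customer_segments VALUES (10, 'smb'), (10, 'trial'), (11, 'smb');
""")

# Ground truth: revenue summed straight from the orders table.
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# The same sum after the join: every order for customer 10 is counted twice.
joined_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN customer_segments AS s ON s.customer_id = o.customer_id
""").fetchone()[0]

def max_rows_per_key(conn, table, key):
    """Return the largest number of rows sharing one value of `key`.

    A result above 1 means joining on `key` can duplicate rows.
    Identifiers are interpolated directly, so only pass trusted names.
    """
    sql = f"SELECT MAX(n) FROM (SELECT COUNT(*) AS n FROM {table} GROUP BY {key})"
    return conn.execute(sql).fetchone()[0]

fanout = max_rows_per_key(conn, "customer_segments", "customer_id")
print(true_total, joined_total, fanout)
```

No error is raised anywhere; the only signal is that the joined total no longer matches the source table, which is why a uniqueness check on the join key is a useful guard before aggregating.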

How it works

Let's walk through what our agent is, how it curates context, and how it keeps improving itself.

Our agent is powered by GPT‑5.2 and is designed to reason over OpenAI's data platform. It's available wherever employees already work: as a Slack agent, through a web interface, inside IDEs, in the Codex CLI via MCP, and directly in OpenAI's internal ChatGPT app through an MCP connector.

Diagram titled “How the data agent works.” Entry points (Agent-UI, Local Agent-MCP, Remote Agent-MCP, and Slack Agent) feed into the Agent-API. The API connects to internal data knowledge and company context, syncs with the data warehouse and platform sources, and exchanges requests with the GPT-5.2 model via Agent-MCP.

Users can ask complex, open-ended questions that would typically require multiple rounds of manual exploration. Take this example prompt, which uses a test dataset: “For NYC taxi trips, which pickup-to-dropoff ZIP pairs are the most unreliable, with the largest gap between typical and worst-case travel times, and when does that variability occur?”

The agent handles the analysis end-to-end, from understanding the question to exploring the data, running queries, and synthesizing the findings.

Screenshot showing a user asking which NYC taxi pickup→dropoff ZIP pairs are the most “unreliable.” The agent explains that it used ~21k trips from samples.nyctaxi.trips, defines typical (p50) and worst-case (p95), applies filters, and explains how it identifies when the longest trip for each ZIP pair occurred.

The agent's response to the question.
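The "typical versus worst-case" gap in the example question can be sketched as a p95 minus p50 spread per ZIP pair. This is a minimal illustration, not the agent's actual query: the percentile and reliability_spread helpers, the min_trips threshold, and the sample trips are all assumptions, and the nearest-rank percentile is just one of several common definitions.

```python
import math
from collections import defaultdict

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def reliability_spread(trips, min_trips=5):
    """Rank (pickup_zip, dropoff_zip) pairs by p95 - p50 trip minutes.

    `trips` is an iterable of (pickup_zip, dropoff_zip, minutes) tuples.
    Pairs with fewer than `min_trips` trips are dropped as too noisy.
    """
    by_pair = defaultdict(list)
    for pickup_zip, dropoff_zip, minutes in trips:
        by_pair[(pickup_zip, dropoff_zip)].append(minutes)

    rows = []
    for pair, durations in by_pair.items():
        if len(durations) < min_trips:
            continue
        p50 = percentile(durations, 50)
        p95 = percentile(durations, 95)
        rows.append((pair, p50, p95, p95 - p50))
    # Largest spread first: these are the least predictable routes.
    return sorted(rows, key=lambda row: row[3], reverse=True)

# Made-up sample data for illustration only.
trips = (
    [("10001", "10019", m) for m in (10, 11, 12, 12, 13, 40)]
    + [("10003", "10011", m) for m in (8, 8, 9, 9, 10, 10)]
    + [("10002", "10005", 30)]  # a single trip, filtered out by min_trips
)
ranked = reliability_spread(trips)
print(ranked[0])
```

Answering the "when does it occur" half of the question would need a pickup timestamp per trip, which this sketch leaves out.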

One of the agent's superpowers is how it reasons through problems. Rather than following a fixed script, the agent evaluates its own progress. If an intermediate result looks wrong (e.g., if it has zero rows because of an incorrect join or filter), the agent investigates what went wrong, adjusts its approach, and tries again. Throughout this process, it retains full context and carries learnings forward between steps. This closed-loop, self-learning process shifts iteration from the user to the agent itself, enabling faster results and analyses of consistently higher quality than manual workflows.

Screenshot of a task workflow showing an AI agent's step-by-step plan for analyzing NYC taxi trip durations. It includes goals, internal checks, schema inspection, code snippets, and reasoning about computing p50/p95 spreads, identifying unreliable ZIP pairs, and planning SQL queries.

The agent's reasoning to identify the most unreliable NYC taxi pickup–dropoff pairs.
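The evaluate-and-retry behavior described above can be sketched as a small loop that runs a query, checks whether the result looks plausible, and moves to a revised attempt if not. This is a simplified illustration under stated assumptions: in the real agent a model proposes each revision, whereas here the candidate queries are hard-coded, and run_with_self_check, not_empty, and the trips table are hypothetical.

```python
import sqlite3

def run_with_self_check(conn, attempts, validate):
    """Try candidate queries in order until one passes validation.

    Returns (rows, notes). `notes` records why earlier attempts were
    rejected, standing in for the context an agent carries between steps.
    """
    notes = []
    for sql in attempts:
        rows = conn.execute(sql).fetchall()
        problem = validate(rows)
        if problem is None:
            return rows, notes
        notes.append(f"rejected ({problem}): {sql}")
    raise RuntimeError("no attempt produced a plausible result")

def not_empty(rows):
    """Flag the zero-row case that often signals a bad join or filter."""
    return "zero rows" if not rows else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (trip_id INTEGER, status TEXT);
    INSERT INTO trips VALUES (1, 'completed'), (2, 'completed'), (3, 'cancelled');
""")

attempts = [
    # First guess uses the wrong casing; SQLite compares text case-sensitively.
    "SELECT trip_id FROM trips WHERE status = 'COMPLETED' ORDER BY trip_id",
    # Revised attempt after inspecting the actual values in the column.
    "SELECT trip_id FROM trips WHERE status = 'completed' ORDER BY trip_id",
]
rows, notes = run_with_self_check(conn, attempts, not_empty)
print(rows, notes)
```

The point of the design is that the failed attempt is kept as a note rather than discarded, so later steps can avoid repeating the same mistake.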

The agent covers the full analytics workflow: discovering data, running SQL, and publishing notebooks and reports. It understands internal company knowledge, can search the web for external information, and improves over time through learned usage and memory.

Context is everything

High-quality answers depend on rich, accurate context. Without context, even strong models can produce wrong results, such as badly misestimating user counts or misinterpreting internal terminology.

Screenshot of a user asking, “What was ChatGPT Image Gen logged-in DAU over the past 30 days?” with a status line below showing the agent “Working for 22m 41s,” indicating a long query that is still running.

The agent without memory, unable to query effectively.

Screenshot showing a user asking, “What was ChatGPT Image Gen logged-in DAU over the past 30 days?” Below the message, a status line reads “Worked for 1m 22s,” indicating that the query completed in a little over a minute.

The agent's memory enables faster queries by locating the correct tables.

To avoid these failure modes, the agent is built around multiple layers of context that ground it in OpenAI's data and institutional knowledge.

Diagram titled “Data agent context layers” showing six stacked tiers: 1) Table Usage, 2) Human Annotations, 3) Codex Enrichment, 4) Institutional Knowledge, 5) Memory, and 6) Runtime Context. Each layer appears as a horizontal bar in a pyramid layout.

Layer #1: Table Usage

  • Metadata grounding: The agent relies on schema metadata (column names and data types) to inform SQL writing and uses table lineage (e.g., upstream and downstream table relationships) to provide context on how different tables relate.
  • Query inference: Ingesting historical queries helps the agent understand how to write its own queries and which tables are typically joined together.
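As a rough illustration of query inference, the sketch below counts which tables appear together in historical queries. It is an assumption-laden toy: the regular expression only catches plain FROM and JOIN references (it will miscount CTE names and some subqueries), and co_join_counts and the sample query history are hypothetical rather than a description of the production system.

```python
import re
from collections import Counter
from itertools import combinations

# Naive pattern: a table name directly after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def co_join_counts(queries):
    """Count how often each pair of tables appears in the same query."""
    counts = Counter()
    for sql in queries:
        tables = sorted({name.lower() for name in TABLE_REF.findall(sql)})
        counts.update(combinations(tables, 2))
    return counts

# Made-up query history for illustration only.
history = [
    "SELECT * FROM events e JOIN users u ON e.user_id = u.id",
    "select count(*) from users u join events e on e.user_id = u.id",
    "SELECT * FROM events e JOIN sessions s ON s.id = e.session_id",
]
counts = co_join_counts(history)
print(counts.most_common(1))
```

A production version would more likely use a real SQL parser, since string patterns cannot reliably tell tables apart from aliases and CTEs.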

Layer #2: Human Annotations

  • Curated descriptions of tables and columns provided by domain experts, capturing intent, semantics, business meaning, and known caveats that are not easy to infer from schemas or past queries.

Metadata alone isn't enough. To really tell tables apart, you need to understand how they were created and where they originate.

Layer #3: Codex Enrichment

  • By deriving a code-level definition of a table, the agent builds a deeper understanding of what the data actually contains.
    • Nuances about what is stored in the table and how it is derived from an analytics event provide extra information. For example, this can give context on the uniqueness of values, how often the table data is updated, the scope of the data (e.g., whether the table excludes certain fields, and at what level of granularity it is kept), etc.
  • This provides enhanced usage context by showing how the table is used beyond SQL, in Spark, Python, and other data systems.
  • This means the agent can distinguish between tables that look similar but differ in critical ways. For example, it can tell whether a table only includes first-party ChatGPT traffic. This context is also refreshed automatically, so it stays up to date without manual maintenance.
Diagram titled “Codex-enriched knowledge pipeline.” Popular tables feed into multiple Codex tasks, which extract details from the OpenAI codebase, including a table’s purpose, grain and primary keys, downstream usage patterns, alternate table options, and data freshness.

Layer #4: Institutional Knowledge

  • The agent can access Slack, Google Docs, and Notion, which capture critical company context such as launches, reliability incidents, internal codenames and tools, and the canonical definitions and computation logic for key metrics.
  • These documents are ingested, embedded, and stored with metadata and permissions. A retrieval service handles access control and caching at runtime, enabling the agent to pull in this information efficiently and safely.
Screenshot of a user asking why connector usage dropped in December. The agent explains that the drop stemmed from a logging issue that began on November 13, 2025, which caused usage to be undercounted after the ChatGPT 5.1 launch. The earlier telemetry was empty until a new event became the source of truth.

Layer #5: Memory

  • When the agent is given corrections or discovers nuances about certain data questions, it is able to save these learnings for next time, allowing it to constantly improve with its users.
    • As a result, future answers begin from a more accurate baseline rather than repeatedly running into the same issues.
    • The goal of memory is to retain and reuse non-obvious corrections, filters, and constraints that are critical for data correctness but difficult to infer from the other layers alone.
    • For example, in one case, the agent didn't know how to filter for a particular analytics experiment (the filter relied on matching a specific string defined in an experiment gate). Memory was crucial here to ensure it filtered correctly, instead of attempting a fuzzy string match.
  • When you give the agent a correction or when it finds a learning in your conversation, it will prompt you to save that memory for next time.
    • Memories can also be created and edited manually by users.
    • Memories are scoped at the global and personal level, and the agent's tooling makes it easy to edit them.
Banner notification showing “Data agent wants to save 2 learnings to memory,” with an item labeled “ChatGPT Top-level Metrics” and a confirmation message on the right reading “Saved to global memory” with a green checkmark.
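The global and personal memory scopes described above could be modeled roughly as follows. This is a minimal sketch, not the agent's storage design: the MemoryStore class, its save and recall methods, and the idea of keying memories by a topic string are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy memory with two scopes: global (shared) and personal (per user)."""
    global_notes: dict = field(default_factory=dict)    # topic -> list of notes
    personal_notes: dict = field(default_factory=dict)  # (user, topic) -> list of notes

    def save(self, topic, note, user=None):
        """Save a learning; without a user it is global, otherwise personal."""
        if user is None:
            self.global_notes.setdefault(topic, []).append(note)
        else:
            self.personal_notes.setdefault((user, topic), []).append(note)

    def recall(self, topic, user):
        """Return global notes for the topic, then this user's personal notes."""
        shared = self.global_notes.get(topic, [])
        own = self.personal_notes.get((user, topic), [])
        return shared + own

store = MemoryStore()
store.save("experiment filter", "Filter on the exact gate string, not a fuzzy match.")
store.save("experiment filter", "Default to the last 14 days.", user="alice")

print(store.recall("experiment filter", user="alice"))
print(store.recall("experiment filter", user="bob"))
```

Here a correction saved globally reaches every user, while a personal preference only affects the user who saved it.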

Layer #6: Runtime Context

  • When no prior context exists for a table or the existing information is stale, the agent can issue live queries to the data warehouse to inspect and query the table directly. This allows it to validate schemas, understand the data in real time, and respond accordingly.
  • The agent can also talk to other Data Platform systems (metadata service, Airflow, Spark) as needed to get broader data context that exists outside the warehouse.
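A live inspection of the kind described in this layer might look like the sketch below, which checks a table's columns, row count, and latest date directly. It uses SQLite as a stand-in, so the PRAGMA call is SQLite-specific; a real warehouse would expose schemas through its own catalog. The inspect_table helper and the events table are hypothetical.

```python
import sqlite3

def inspect_table(conn, table, freshness_column=None):
    """Look up a table's columns, row count, and latest value with live queries.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    latest = None
    if freshness_column is not None:
        latest = conn.execute(f"SELECT MAX({freshness_column}) FROM {table}").fetchone()[0]
    return {"columns": columns, "row_count": row_count, "latest": latest}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER, event_date TEXT);
    INSERT INTO events VALUES (1, '2026-01-27'), (2, '2026-01-28');
""")
info = inspect_table(conn, "events", freshness_column="event_date")
print(info)
```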

We run a daily offline pipeline that aggregates table usage, human annotations, and Codex-derived enrichment into a single, normalized representation. This enriched context is then converted into embeddings using the OpenAI embeddings API and stored for retrieval. At query time, the agent pulls only the most relevant embedded context via retrieval-augmented generation (RAG) instead of scanning raw metadata or logs. This makes table understanding fast and scalable, even across tens of thousands of tables, while keeping runtime latency predictable and low. Runtime queries are issued to our data warehouse live as needed.

Diagram titled “Data agent context retrieval.” Offline preprocessing layers (table usage, human annotations, Codex enrichment, institutional knowledge, and memory) feed into RAG embeddings. Live retrieval shows the agent querying the database using semantic search or exact text retrieval to produce runtime context.
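The offline-embed, online-retrieve split can be sketched in a few lines. To keep the example self-contained and runnable without network access, the embed function below is a toy bag-of-words stand-in, not the OpenAI embeddings API the pipeline is described as using; the table names, descriptions, and helper functions are likewise hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(docs):
    """Offline step: embed each table's enriched context once."""
    return {name: embed(text) for name, text in docs.items()}

def retrieve(index, question, k=2):
    """Query-time step: return the k table names most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda name: cosine(query_vector, index[name]), reverse=True)
    return ranked[:k]

# Made-up enriched context for three hypothetical tables.
docs = {
    "daily_active_users": "daily active users per product logged-in only",
    "billing_invoices": "monthly invoices revenue tax per customer",
    "trip_durations": "taxi trip durations pickup dropoff zip",
}
index = build_index(docs)
top = retrieve(index, "logged-in daily active users", k=1)
print(top)
```

Because the index is built ahead of time, the query-time cost is one embedding plus a similarity ranking, which is what keeps latency predictable as the number of tables grows.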

Together, these layers ensure the agent’s reasoning is grounded in OpenAI’s data, code, and institutional knowledge, dramatically reducing errors and improving answer quality.

Built to think and work like a teammate

One-shot answers work when the problem is clear, but most questions aren’t. More often, arriving at the correct result requires back-and-forth refinement and some course correction.

The agent is built to behave like a teammate you can reason with. It’s conversational and always-on, and it handles both quick answers and iterative exploration.

It carries over complete context across turns, so users can ask follow-up questions, adjust their intent, or change direction without restating everything. If the agent starts heading down the wrong path, users can interrupt mid-analysis and redirect it, just like working with a human collaborator who listens instead of plowing ahead.

When instructions are unclear or incomplete, the agent proactively asks clarifying questions. If no response is provided, it applies sensible defaults to make progress. For example, if a user asks about business growth with no date range specified, it may assume the last seven or 30 days. These priors allow it to stay responsive and non-blocking while still converging on the right outcome.
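Falling back to a default window when no date range is given can be sketched as below. The resolve_date_range helper, its 30-day default, and the returned assumed flag are illustrative assumptions; the flag is there so the answer can state which default was applied.

```python
from datetime import date, timedelta

def resolve_date_range(start=None, end=None, today=None, default_days=30):
    """Fill in a missing date range with a trailing window.

    Returns (start, end, assumed), where `assumed` is True if any
    part of the range came from a default rather than the user.
    """
    assumed = start is None or end is None
    end = end or today or date.today()
    start = start or end - timedelta(days=default_days)
    return start, end, assumed

# No range given: assume the 30 days ending on the reference date.
print(resolve_date_range(today=date(2026, 1, 29)))

# Explicit range given: nothing is assumed.
print(resolve_date_range(start=date(2026, 1, 1), end=date(2026, 1, 7)))
```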

The result is an agent that works well when you know exactly what you want (e.g., “Tell me about this table”) and is just as strong when you’re exploring (e.g., “I’m seeing a dip here, can we break this down by customer type and timeframe?”).

After rollout, we observed that users frequently ran the same analyses for routine repetitive work. To expedite this, the agent's workflows package recurring analyses into reusable instruction sets. Examples include workflows for weekly business reports and table validations. By encoding context and best practices once, workflows streamline repeat analyses and ensure consistent results across users.

UI input bar with the placeholder text “Ask a data question.” Below it is a button labeled “Use workflow,” and on the right are microphone and send icons. The bar has rounded corners and sits on a dark background.

Moving fast without breaking trust

Building an always-on, evolving agent means quality can drift just as easily as it can improve. Without a tight feedback loop, regressions are inevitable and invisible. The only way to scale capability without breaking trust is through systematic evaluation.

In this section, we’ll discuss how we leverage OpenAI’s Evals API to measure and protect the agent’s response quality.

Our evals are built on curated sets of question-answer pairs. Each question targets an important metric or analytical pattern we care deeply about getting right, paired with a manually authored “golden” SQL query that produces the expected result. For each eval, we send the natural language question to the agent’s query-generation endpoint, execute the generated SQL, and compare the output against the result of the expected SQL.

Diagram titled “Data agent evaluation pipeline.” Q&A eval pairs with expected SQL feed into a generation step that produces SQL and results. OpenAI Evals compares the generated and expected results using dataframe and SQL comparison, producing a score and reasoning.

Evaluation doesn’t rely on naive string matching. Generated SQL can differ syntactically while still being correct, and result sets may include extra columns that don’t materially affect the answer. To account for this, we compare both the SQL and the resulting data, and feed these signals into OpenAI’s Evals grader. The grader produces a final score along with an explanation, capturing both correctness and acceptable variation.
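One way to compare result data without penalizing harmless differences is sketched below: it ignores row order and extra columns, and rounds floats before comparing. This covers only the data-comparison signal, not the Evals grader itself, and the project and results_match helpers are hypothetical names chosen for illustration.

```python
def project(rows, columns, ndigits=6):
    """Keep only `columns`, round floats, and sort rows so order is ignored."""
    projected = []
    for row in rows:
        values = tuple(
            round(row[c], ndigits) if isinstance(row[c], float) else row[c]
            for c in columns
        )
        projected.append(values)
    # Sorting by repr avoids errors when a column mixes None with numbers.
    return sorted(projected, key=repr)

def results_match(expected, actual):
    """True if `actual` has the same rows as `expected` on the expected columns.

    Both arguments are lists of dicts. Extra columns in `actual` are ignored.
    """
    if len(expected) != len(actual):
        return False
    if not expected:
        return True
    columns = sorted(expected[0])
    if any(c not in row for row in actual for c in columns):
        return False
    return project(expected, columns) == project(actual, columns)

# Made-up golden result and two candidate results.
expected = [{"day": "2025-10-01", "dau": 120}, {"day": "2025-10-02", "dau": 135}]
reordered = [
    {"day": "2025-10-02", "dau": 135, "product": "image_gen"},
    {"day": "2025-10-01", "dau": 120, "product": "image_gen"},
]
wrong = [{"day": "2025-10-01", "dau": 120}, {"day": "2025-10-02", "dau": 136}]

print(results_match(expected, reordered), results_match(expected, wrong))
```

A boolean like this is deliberately strict; a grader that also sees the SQL can then decide whether a mismatch is a real error or an acceptable variation.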

These evals work like unit tests during development, where they run continuously to identify regressions, and like canaries in production. This lets us catch issues early and iterate confidently as the agent's capabilities expand.

Agent security

Our agent plugs directly into OpenAI’s existing security and access-control model. It operates purely as an interface layer, inheriting and enforcing the same permissions and guardrails that govern OpenAI’s data. 

All of the agent’s access is strictly pass-through, meaning users can only query tables they already have permission to access. When access is missing, it flags this or falls back to alternative datasets the user is authorized to use.
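The pass-through behavior, including the fallback to an authorized alternative, can be sketched as a simple lookup. This is an illustration only: resolve_table, the table names, and the representation of a user's grants as a set are assumptions, and a real deployment would defer to the warehouse's own access-control checks.

```python
def resolve_table(user_tables, preferred, alternatives=()):
    """Pick a table the user may query, or explain why none is available.

    Returns (table, note). `note` is None when the preferred table is
    accessible, and otherwise describes the fallback or the missing access.
    """
    if preferred in user_tables:
        return preferred, None
    for candidate in alternatives:
        if candidate in user_tables:
            return candidate, f"No access to {preferred}; using {candidate} instead."
    return None, f"No access to {preferred} and no authorized alternative."

# Hypothetical grants for one user.
grants = {"usage_daily_agg", "public_docs"}

print(resolve_table(grants, "usage_daily_agg"))
print(resolve_table(grants, "usage_raw_events", alternatives=["usage_daily_agg"]))
print(resolve_table(grants, "finance_ledger"))
```

The function never widens access: it only chooses among tables already in the user's grants and surfaces a note whenever it deviates from the preferred table.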

Finally, it's built for transparency. Like any system, it can make mistakes. It exposes its reasoning process by summarizing assumptions and execution steps alongside each answer. When queries are executed, it links directly to the underlying results, allowing users to inspect raw data and verify every step of the analysis.

Lessons learned

Building our agent from scratch surfaced practical lessons about how agents behave, where they struggle, and what actually makes them reliable at scale.

Lesson #1: Less is More

Early on, we exposed our full tool set to the agent, and quickly ran into problems with overlapping functionality. While this redundancy can be helpful for specific custom cases and is more obvious to a human when manually invoking, it’s confusing to agents. To reduce ambiguity and improve reliability, we restricted and consolidated certain tool calls.

Lesson #2: Guide the Goal, Not the Path

We also discovered that highly prescriptive prompting degraded results. While many questions share a general analytical shape, the details vary enough that rigid instructions often pushed the agent down incorrect paths. By shifting to higher-level guidance and relying on GPT‑5’s reasoning to choose the appropriate execution path, the agent became more robust and produced better results.

Lesson #3: Meaning Lives in Code

Schemas and query history describe a table’s shape and usage, but its true meaning lives in the code that produces it. Pipeline logic captures assumptions, freshness guarantees, and business intent that never surface in SQL or metadata. By crawling the codebase with Codex, our agent understands how datasets are actually constructed and is able to better reason about what each table actually contains. It can answer “what’s in here” and “when can I use it” far more accurately than from warehouse signals alone. 

Same vision, new tools

We’re constantly working to improve our agent by increasing its ability to handle ambiguous questions, improving its reliability and accuracy with stronger validations, and integrating it more deeply into workflows. We believe it should blend naturally into how people already work, instead of functioning like a separate tool.

While our tooling will keep benefiting from underlying improvements in agent reasoning, validation, and self-correction, our team’s mission remains the same: seamlessly deliver fast, trustworthy data analysis across OpenAI’s data ecosystem.

Authors

Bonnie Xu, Aravind Suresh, Emma Tang

Acknowledgements

Special thanks to the Data Productivity and Data Science teams, as well as our many users across departments, for their testing and feedback.