Data powers how systems learn, how products evolve, and how companies make decisions. But getting answers quickly, correctly, and with the right context is often harder than it should be. To make this easier as OpenAI scales, we built our own bespoke, in-house AI data agent that explores and reasons over our own platform.
Our agent is a custom tool for internal use only (not an external offering), built specifically around OpenAI’s data, permissions, and workflows. We’re sharing how we built and use it to help surface examples of the real, impactful ways AI can support day-to-day work across our teams. The OpenAI tools we used to build and run it (Codex, our flagship GPT‑5 model, the Evals API, and the Embeddings API) are the same tools we make available to developers everywhere.
Our data agent lets employees go from question to insight in minutes, not days. It lowers the barrier to pulling data and running nuanced analysis across all functions, not just our data team. Today, teams across Engineering, Data Science, Go-To-Market, Finance, and Research at OpenAI rely on the agent to answer high-impact data questions. For example, it can help answer how to evaluate launches and understand business health, all through the intuitive format of natural language. The agent combines Codex-powered, table-level knowledge with product and organizational context. Its continuously learning Memory system also means it improves with every turn.

In this post, we’ll explain why we needed a bespoke AI data agent, what makes its code-enriched data context and self-learning so useful, and what lessons we learned along the way.
OpenAI’s data platform serves more than 3.5k internal users working across Engineering, Product, and Research, spanning more than 600 petabytes of data in more than 70k datasets. At that size, simply finding the right table can be one of the most time-consuming parts of doing an analysis.
As one internal user put it:
“We have a lot of tables that are fairly similar, and I spend tons of time trying to figure out how they’re different and which one to use. Some include logged-out users, some don’t. Some have overlapping fields; it’s hard to tell what is what.”
Even with the right tables selected, producing correct results can be challenging. Analysts must reason about table data and the relationships between tables to make sure transformations and filters are applied correctly. Common failure modes (many-to-many joins, filter pushdown errors, and unhandled nulls) can silently invalidate results. At OpenAI’s scale, analysts shouldn’t have to sink their time into debugging SQL semantics or query performance: their focus should be on defining metrics, validating assumptions, and making data-driven decisions.

This SQL statement is more than 180 lines long. It’s not easy to tell whether we’re joining the right tables and querying the right columns.
Let’s walk through what our agent is, how it curates context, and how it keeps improving itself.
Our agent is powered by GPT‑5.2 and is designed to reason over OpenAI’s data platform. It’s available wherever employees already work: as a Slack agent, through a web interface, inside IDEs, in the Codex CLI via MCP, and directly in OpenAI’s internal ChatGPT app through an MCP connector.
Users can ask complex, open-ended questions that would normally require several rounds of manual exploration. Take this prompt as an example, which uses a test dataset: “For NYC taxi trips, which pickup-to-dropoff ZIP pairs are the most unreliable, with the largest gap between typical and worst-case travel times, and when does that variability occur?”
The agent handles the analysis end to end, from understanding the question to exploring the data, running queries, and synthesizing findings.
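To make two of these failure modes concrete, here is a small, self-contained sketch using Python’s built-in `sqlite3` module. The `orders` and `user_plans` tables and their values are hypothetical. Because `user_plans` holds more than one row per user, the join fans out and inflates the total, and the `<>` filter silently drops the row whose `plan` is NULL; neither query raises an error.

```python
import sqlite3

def fanout_and_null_demo() -> tuple[float, float, float]:
    """Return (correct total, total after a fan-out join, total after a NULL-dropping filter)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables: orders has one row per order; user_plans may hold several rows per user.
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE user_plans (user_id INTEGER, plan TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 20.0)])
    # User 1 changed plans (two rows); user 2's plan is unknown (NULL).
    conn.executemany("INSERT INTO user_plans VALUES (?, ?)", [(1, "free"), (1, "plus"), (2, None)])

    correct = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    # Many-to-many risk: user_id is not unique in user_plans, so user 1's orders are counted twice.
    fanned_out = conn.execute(
        "SELECT SUM(o.amount) FROM orders o JOIN user_plans p ON o.user_id = p.user_id"
    ).fetchone()[0]
    # Unhandled NULL: NULL <> 'free' evaluates to NULL, so user 2's order is silently excluded.
    non_free = conn.execute(
        "SELECT SUM(o.amount) FROM orders o JOIN user_plans p ON o.user_id = p.user_id "
        "WHERE p.plan <> 'free'"
    ).fetchone()[0]
    conn.close()
    return correct, fanned_out, non_free
```

Each wrong number looks plausible on its own, which is why a quick check such as comparing a joined total against the unjoined total is worth running before trusting a result.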

The agent’s response to the question.
One of the agent’s superpowers is how it reasons through problems. Rather than following a fixed script, the agent evaluates its own progress. If an intermediate result looks wrong (e.g., it has zero rows because of an incorrect join or filter), the agent investigates what went wrong, adjusts its approach, and tries again. Throughout this process it retains full context and carries learnings forward between steps. This closed-loop, self-learning process shifts iteration from the user to the agent itself, producing faster results and consistently higher-quality analyses than manual workflows.
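The example prompt boils down to a grouped percentile computation. The sketch below assumes trips arrive as hypothetical `(pickup_zip, dropoff_zip, pickup_hour, duration_minutes)` tuples rather than in the real test dataset’s schema, and it uses the median as the “typical” time and the 95th percentile as the “worst case”. Both choices are assumptions for illustration, not necessarily what the agent chose.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical record layout: (pickup_zip, dropoff_zip, pickup_hour, duration_minutes).
Trip = tuple[str, str, int, float]

def rank_unreliable_pairs(trips: list[Trip], min_trips: int = 5) -> list[dict]:
    """Rank ZIP pairs by the gap between typical (median) and worst-case (p95) duration."""
    by_pair: dict[tuple[str, str], list[tuple[int, float]]] = defaultdict(list)
    for pickup, dropoff, hour, minutes in trips:
        by_pair[(pickup, dropoff)].append((hour, minutes))

    ranked = []
    for pair, rows in by_pair.items():
        if len(rows) < max(min_trips, 2):  # too few trips for a meaningful percentile
            continue
        durations = [minutes for _, minutes in rows]
        typical = median(durations)
        # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
        worst = quantiles(durations, n=20, method="inclusive")[18]
        # "When": the pickup hour with the highest mean duration for this pair.
        by_hour: dict[int, list[float]] = defaultdict(list)
        for hour, minutes in rows:
            by_hour[hour].append(minutes)
        slowest_hour = max(by_hour, key=lambda h: mean(by_hour[h]))
        ranked.append({"pair": pair, "median": typical, "p95": worst,
                       "gap": worst - typical, "slowest_hour": slowest_hour})
    return sorted(ranked, key=lambda r: r["gap"], reverse=True)
```

A percentile is used rather than the maximum so that a single extreme trip does not dominate the ranking; with very few trips per pair, though, the two are close, hence the `min_trips` cutoff.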

The agent’s reasoning as it identifies the most unreliable NYC taxi pickup-to-dropoff pairs.
The agent covers the full analytics workflow: discovering data, running SQL, and publishing notebooks and reports. It understands internal company knowledge, can search the web for external information, and improves over time through learned usage and Memory.
High-quality answers depend on rich, accurate context. Without context, even strong models can produce wrong results, such as badly misestimating user counts or misinterpreting internal terminology.
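The check-and-retry behavior described above can be sketched as a small control loop. In this minimal illustration, a precomputed list of candidate queries stands in for the revisions the model would actually write, and “looks wrong” is reduced to a single hypothetical heuristic: the query returned zero rows.

```python
import sqlite3

def run_with_self_check(conn: sqlite3.Connection, candidates: list[str]) -> tuple[list[tuple], list[str]]:
    """Run candidate queries in order, treating an empty result as a sign something went wrong."""
    log: list[str] = []
    for sql in candidates:
        rows = conn.execute(sql).fetchall()
        if rows:
            log.append(f"accepted after {len(log) + 1} attempt(s): {len(rows)} row(s)")
            return rows, log
        # Here the real agent would inspect the data (e.g. distinct filter values)
        # and rewrite the query; the next candidate stands in for that revision.
        log.append(f"zero rows, revising: {sql}")
    raise RuntimeError("no candidate query produced a non-empty result")
```

Keeping the log alongside the result mirrors the idea of carrying context forward: each failed attempt is recorded, so later steps (and the user) can see why the final query differs from the first.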

The agent without Memory, unable to query effectively.

The agent’s Memory enables faster queries by locating the correct tables.
To avoid these failure modes, the agent is built around multiple layers of context that ground it in OpenAI’s data and institutional knowledge.
- Metadata grounding: The agent relies on schema metadata (column names and data types) to inform how it writes SQL, and uses table lineage (e.g., upstream and downstream table relationships) for context on how different tables relate.
- Query inference: Ingesting historical queries helps the agent understand how to write its own queries and which tables are typically joined together.
- Curated descriptions of tables and columns, provided by domain experts, capture intent, semantics, business meaning, and known caveats that can’t easily be inferred from schemas or past queries.
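As an illustration of query inference, the sketch below counts which tables appear together in a log of historical queries. The regular expression is a deliberately crude stand-in for a real SQL parser (it misses CTE names and quoted identifiers, and can misfire on expressions like `EXTRACT(YEAR FROM ts)`), and the query log in the test is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Crude table extraction: identifiers that follow FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def co_join_counts(query_log: list[str]) -> Counter:
    """Count how often each pair of tables is referenced by the same historical query."""
    counts: Counter = Counter()
    for sql in query_log:
        tables = sorted(set(TABLE_RE.findall(sql)))  # sorted, so each pair has one canonical order
        for pair in combinations(tables, 2):
            counts[pair] += 1
    return counts
```

Counts like these can be turned into hints such as “`a.events` is usually joined with `a.users`”, which is the kind of usage signal the bullet above describes.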
Metadata alone isn’t enough. To truly tell tables apart, you need to understand how they were created and where they originate.
- By deriving a code-level definition of a table, the agent builds a deeper understanding of what the data actually contains.
- Nuances about what is stored in the table and how it is derived from an analytics event provide extra information. For example, they can give context on the uniqueness of values, how often the table’s data is refreshed, and the scope of the data (e.g., whether the table excludes certain fields, or what level of granularity it has).
- It also provides richer usage context by showing how the table is used beyond SQL, in Spark, Python, and other data systems.
- This means the agent can distinguish between tables that look similar but differ in critical ways. For example, it can tell whether a table includes only first-party ChatGPT traffic. This context is also refreshed automatically, so it stays up to date without manual maintenance.
- The agent can access Slack, Google Docs, and Notion, which capture critical company context such as launches, reliability incidents, internal codenames and tools, and the canonical definitions and calculation logic for key metrics.
- These documents are ingested, embedded, and stored with metadata and permissions. A retrieval service handles access control and caching at runtime, letting the agent pull this information efficiently and securely.
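A permission-aware retrieval step might look roughly like the following sketch. Word overlap stands in for embedding similarity, the `Doc` records and group names are hypothetical, and `functools.lru_cache` serves as a simple runtime cache. Keying the cache on the caller’s groups, and filtering before ranking, keeps one user’s results from leaking to another.

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]

# Hypothetical corpus; in practice these would be ingested and embedded Slack, Docs, and Notion content.
CORPUS = (
    Doc("metrics-wau", "WAU is the count of distinct logged-in users over 7 days", frozenset({"all"})),
    Doc("incident-42", "Reliability incident postmortem for the ingest pipeline", frozenset({"infra"})),
)

def _score(query: str, text: str) -> int:
    # Stand-in relevance score (shared words); a real service would compare embeddings.
    return len(set(query.lower().split()) & set(text.lower().split()))

@lru_cache(maxsize=1024)
def retrieve(query: str, user_groups: frozenset[str], k: int = 3) -> tuple[str, ...]:
    """Return ids of the top-k relevant documents the caller is allowed to see."""
    # Access control runs before ranking, so hidden documents never influence results.
    visible = [d for d in CORPUS if d.allowed_groups & user_groups]
    scored = [(_score(query, d.text), d.doc_id) for d in visible]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return tuple(doc_id for _, doc_id in scored[:k])
```

In a sketch like this, the cache has to be cleared (`retrieve.cache_clear()`) whenever documents or permissions change.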

- When the agent is given corrections or discovers nuances about certain data questions, it can save these learnings for next time, so it keeps improving alongside its users.
- As a result, future answers start from a more accurate baseline instead of repeatedly running into the same problems.
- The goal of Memory is to retain and reuse non-obvious corrections, filters, and constraints that are critical for data correctness but hard to infer from the other layers alone.
- For example, in one case the agent didn’t know how to filter for a particular analytics experiment (the filter relied on matching a specific string defined in an experiment gate). Here, Memory was crucial in ensuring it could filter correctly instead of attempting fuzzy string matching.
- When you give the agent a correction, or when it picks up a learning from your conversation, it will prompt you to save that Memory for next time.
- Memories can also be created and edited manually by users.
- Memories are scoped at the global and personal level, and the agent’s tooling makes them easy to edit.
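The scoping rules above can be expressed as a small data structure. This is a hypothetical sketch, not the agent’s actual storage: a global memory is visible to everyone, a personal memory only to its owner, and both can be edited in place.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Memory:
    text: str
    scope: str                   # "global" or "personal"
    owner: Optional[str] = None  # only set for personal memories

class MemoryStore:
    """Hypothetical in-memory store illustrating global vs. personal scoping."""

    def __init__(self) -> None:
        self._items: list[Memory] = []

    def save(self, text: str, scope: str = "personal", owner: Optional[str] = None) -> int:
        if scope not in ("global", "personal"):
            raise ValueError(f"unknown scope: {scope}")
        if scope == "personal" and owner is None:
            raise ValueError("a personal memory needs an owner")
        self._items.append(Memory(text, scope, owner if scope == "personal" else None))
        return len(self._items) - 1  # index used as a simple handle for edits

    def edit(self, handle: int, new_text: str) -> None:
        self._items[handle].text = new_text

    def visible_to(self, user: str) -> list[str]:
        # Global memories apply to everyone; personal ones only to their owner.
        return [m.text for m in self._items if m.scope == "global" or m.owner == user]
```

The memories visible to a user would then be injected into the agent’s context for that user’s sessions.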

- When no prior context exists for a table, or the existing information is stale, the agent can issue live queries to the data warehouse to inspect and query the table directly. This lets it validate schemas, understand the data in real time, and respond accordingly.
- The agent can also talk to other Data Platform systems (metadata service, Airflow, Spark) as needed to get broader data context that lives outside the warehouse.
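A minimal sketch of the fallback to live inspection, assuming cached context carries a fetch timestamp and using a hypothetical 24-hour freshness threshold. SQLite’s `PRAGMA table_info` stands in for whatever schema-inspection query the real warehouse supports.

```python
import sqlite3
import time
from typing import Optional

MAX_AGE_SECONDS = 24 * 3600  # hypothetical freshness threshold

def table_columns(conn: sqlite3.Connection, table: str,
                  cached: Optional[dict] = None, now: Optional[float] = None) -> tuple[list[str], str]:
    """Return (column names, source), preferring fresh cached context over a live query."""
    now = time.time() if now is None else now
    if cached is not None and now - cached["fetched_at"] < MAX_AGE_SECONDS:
        return cached["columns"], "cache"
    # No prior context, or it is stale: inspect the live table instead.
    if not table.isidentifier():  # the name is interpolated into SQL, so validate it
        raise ValueError(f"unexpected table name: {table!r}")
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in rows], "live"
```

Returning the source alongside the columns makes it easy to report whether an answer relied on cached context or on a live look at the table.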
We run a daily offline pipeline that aggregates table usage, human annotations, and Codex-derived enrichment into a single, normalized representation. This enriched context is then converted into embeddings using the OpenAI embeddings API and stored for retrieval. At query time, the agent pulls only the most relevant embedded context via retrieval-augmented generation (RAG) instead of scanning raw metadata or logs. This makes table understanding fast and scalable, even across tens of thousands of tables, while keeping runtime latency predictable and low. Runtime queries are issued to our data warehouse live as needed.
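The offline/online split can be sketched in a few functions. Everything here is illustrative: the per-table signal format is hypothetical, and `toy_embed` (a normalized bag of words) stands in for a call to the embeddings API so the example runs offline. In the OpenAI Python SDK, that call would be `client.embeddings.create(model=..., input=texts)`.

```python
import math
from collections import Counter

def normalize_table_context(name: str, usage: dict, annotation: str, code_notes: str) -> str:
    """Merge usage, human annotation, and code-derived notes into one document (hypothetical format)."""
    joined = ", ".join(usage.get("joined_with", [])) or "n/a"
    return (
        f"table: {name}\n"
        f"usage: joined with {joined}; {usage.get('queries_last_30d', 0)} queries in 30 days\n"
        f"description: {annotation or 'n/a'}\n"
        f"derived from code: {code_notes or 'n/a'}"
    )

def toy_embed(text: str) -> dict[str, float]:
    """Stand-in for the embeddings API: an L2-normalized bag of words."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are already normalized, so the dot product is the cosine similarity.
    return sum(v * b.get(word, 0.0) for word, v in a.items())

def build_index(tables: dict[str, dict]) -> dict[str, tuple[str, dict[str, float]]]:
    """Offline step (e.g. run daily): one normalized document and one embedding per table."""
    index = {}
    for name, signals in tables.items():
        doc = normalize_table_context(name, **signals)
        index[name] = (doc, toy_embed(doc))
    return index

def top_k(index: dict[str, tuple[str, dict[str, float]]], question: str, k: int = 3) -> list[str]:
    """Query-time step: embed the question and return the k most similar tables."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda name: cosine(q, index[name][1]), reverse=True)
    return ranked[:k]
```

Only the `top_k` step runs per question; the expensive normalization and embedding happen ahead of time, which is what keeps runtime latency low.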
Together, these layers ensure the agent’s reasoning is grounded in OpenAI’s data, code, and institutional knowledge, dramatically reducing errors and improving answer quality.
One-shot answers work when the problem is clear, but most questions aren’t. More often, arriving at the correct result requires back-and-forth refinement and some course correction.
The agent is built to behave like a teammate you can reason with. It’s conversational and always on, and it handles both quick answers and iterative exploration.
It carries over complete context across turns, so users can ask follow-up questions, adjust their intent, or change direction without restating everything. If the agent starts heading down the wrong path, users can interrupt mid-analysis and redirect it, just like working with a human collaborator who listens instead of plowing ahead.
When instructions are unclear or incomplete, the agent proactively asks clarifying questions. If no response is provided, it applies sensible defaults to make progress. For example, if a user asks about business growth with no date range specified, it may assume the last seven or 30 days. These priors allow it to stay responsive and non-blocking while still converging on the right outcome.
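Default-filling can be kept explicit by returning the assumptions alongside the resolved values, so they can be shown to the user. A minimal sketch, assuming a 7-day default window that ends on the last complete day (both hypothetical choices; the text above mentions 7 or 30 days):

```python
from datetime import date, timedelta
from typing import Optional

def resolve_date_range(start: Optional[date], end: Optional[date], today: date,
                       lookback_days: int = 7) -> tuple[date, date, list[str]]:
    """Fill in a missing date range with a default window and record what was assumed."""
    assumptions: list[str] = []
    if end is None:
        end = today - timedelta(days=1)  # last complete day
        assumptions.append(f"end date defaulted to {end.isoformat()}")
    if start is None:
        start = end - timedelta(days=lookback_days - 1)  # window includes both endpoints
        assumptions.append(f"start date defaulted to {start.isoformat()} ({lookback_days}-day window)")
    if start > end:
        raise ValueError("start date is after end date")
    return start, end, assumptions
```

Because the assumptions travel with the result, the agent can keep working without blocking on the user while still stating exactly which window it chose.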
The result is an agent that works well both when you know exactly what you want (e.g., “Tell me about this table”) and when you’re exploring (e.g., “I’m seeing a dip here, can we break this down by customer type and timeframe?”).
After rollout, we observed that users frequently ran the same analyses for routine repetitive work. To expedite this, the agent's workflows package recurring analyses into reusable instruction sets. Examples include workflows for weekly business reports and table validations. By encoding context and best practices once, workflows streamline repeat analyses and ensure consistent results across users.
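One way to picture a workflow is as a named, parameterized list of instructions that is rendered into the agent’s prompt. The `Workflow` class and the weekly-report steps below are hypothetical; `string.Template` is used so that a missing parameter fails loudly (`KeyError`) instead of producing a half-filled instruction set.

```python
from dataclasses import dataclass, field
from string import Template

@dataclass
class Workflow:
    """A hypothetical reusable instruction set for a recurring analysis."""
    name: str
    steps: list[str]  # ordered instructions with $placeholders
    defaults: dict[str, object] = field(default_factory=dict)

    def render(self, **params: object) -> str:
        values = {**self.defaults, **params}  # explicit parameters override defaults
        lines = [f"Workflow: {self.name}"]
        for i, step in enumerate(self.steps, start=1):
            # substitute() raises KeyError if a placeholder has no value.
            lines.append(f"{i}. {Template(step).substitute(values)}")
        return "\n".join(lines)

weekly_report = Workflow(
    name="weekly business report",
    steps=[
        "Use table $table for the week ending $week_end.",
        "Compare each metric against the previous week.",
        "Flag changes larger than $threshold%.",
    ],
    defaults={"threshold": 5},
)
```

Encoding the steps once and varying only the parameters is what makes repeated runs comparable across users.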

Building an always-on, evolving agent means quality can drift just as easily as it can improve. Without a tight feedback loop, regressions are inevitable and invisible. The only way to scale capability without breaking trust is through systematic evaluation.
In this section, we’ll discuss how we use OpenAI’s Evals API to measure and protect the agent’s response quality.
The agent’s evals are built on curated sets of question-answer pairs. Each question targets an important metric or analytical pattern we care deeply about getting right, paired with a manually authored “golden” SQL query that produces the expected result. For each eval, we send the natural-language question to the agent’s query-generation endpoint, execute the generated SQL, and compare the output against the result of the golden SQL.
Evaluation doesn’t rely on naive string matching. Generated SQL can differ syntactically while still being correct, and result sets may include extra columns that don’t materially affect the answer. To account for this, we compare both the SQL and the resulting data, and feed these signals into OpenAI’s Evals grader. The grader produces a final score along with an explanation, capturing both correctness and acceptable variation.
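The data side of that comparison can be sketched as follows. This is not the Evals API itself; it is a hypothetical local check whose output could be passed to a grader as one signal. It assumes columns are matched by name, tolerates extra columns in the generated result, ignores row order, and compares values exactly (a real check would likely need a tolerance for floats).

```python
import sqlite3
from collections import Counter

def compare_results(conn: sqlite3.Connection, generated_sql: str, golden_sql: str) -> dict:
    """Compare the output of generated SQL with the golden SQL's output (hypothetical signal)."""
    gen = conn.execute(generated_sql)
    gen_cols = [d[0] for d in gen.description]
    gen_rows = gen.fetchall()
    gold = conn.execute(golden_sql)
    gold_cols = [d[0] for d in gold.description]
    gold_rows = gold.fetchall()

    missing = [c for c in gold_cols if c not in gen_cols]
    extra = [c for c in gen_cols if c not in gold_cols]
    if missing:
        return {"match": False, "missing_columns": missing, "extra_columns": extra}
    # Project generated rows onto the golden columns so extra columns don't count against it.
    positions = [gen_cols.index(c) for c in gold_cols]
    projected = [tuple(row[i] for i in positions) for row in gen_rows]
    # Compare as multisets: row order is ignored, duplicate rows still matter.
    match = Counter(projected) == Counter(gold_rows)
    return {"match": match, "missing_columns": [], "extra_columns": extra}
```

A result like `{"match": True, "extra_columns": ["doubled"]}` is exactly the kind of “correct, with acceptable variation” case that string matching on the SQL would miss.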
These evals act like unit tests that run continuously during development to catch regressions, and as canaries in production; this lets us catch issues early and iterate confidently as the agent’s capabilities expand.
Our agent plugs directly into OpenAI’s existing security and access-control model. It operates purely as an interface layer, inheriting and enforcing the same permissions and guardrails that govern OpenAI’s data.
All of the agent’s access is strictly pass-through, meaning users can only query tables they already have permission to access. When access is missing, it flags this or falls back to alternative datasets the user is authorized to use.
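In a pass-through model the warehouse itself enforces permissions, because queries run with the user’s own credentials. A pre-check like the hypothetical sketch below would only exist to give a clearer message and suggest an authorized alternative; the grant set and the alternatives mapping are assumptions.

```python
from typing import Optional

def choose_table(grants: set[str], requested: str,
                 alternatives: dict[str, list[str]]) -> tuple[Optional[str], str]:
    """Pick a table the user may query, or explain why none is available (hypothetical helper)."""
    if requested in grants:
        return requested, "ok"
    for alt in alternatives.get(requested, []):
        if alt in grants:
            return alt, f"no access to {requested}; falling back to {alt}"
    return None, f"no access to {requested} and no authorized alternative"
```

The helper never widens access: it can only pick among tables already in the user’s grants, so the worst case is a clear “no access” message.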
Finally, it's built for transparency. Like any system, it can make mistakes. It exposes its reasoning process by summarizing assumptions and execution steps alongside each answer. When queries are executed, it links directly to the underlying results, allowing users to inspect raw data and verify every step of the analysis.
Building our agent from scratch surfaced practical lessons about how agents behave, where they struggle, and what actually makes them reliable at scale.
Early on, we exposed our full tool set to the agent and quickly ran into problems with overlapping functionality. While this redundancy can be helpful for specific custom cases and is easier for a human to navigate when invoking tools manually, it confuses agents. To reduce ambiguity and improve reliability, we restricted and consolidated certain tool calls.
We also discovered that highly prescriptive prompting degraded results. While many questions share a general analytical shape, the details vary enough that rigid instructions often pushed the agent down incorrect paths. By shifting to higher-level guidance and relying on GPT‑5’s reasoning to choose the appropriate execution path, the agent became more robust and produced better results.
Schemas and query history describe a table’s shape and usage, but its true meaning lives in the code that produces it. Pipeline logic captures assumptions, freshness guarantees, and business intent that never surface in SQL or metadata. By crawling the codebase with Codex, our agent understands how datasets are actually constructed and is able to better reason about what each table actually contains. It can answer “what’s in here” and “when can I use it” far more accurately than from warehouse signals alone.
We’re constantly working to improve our agent by increasing its ability to handle ambiguous questions, improving its reliability and accuracy with stronger validations, and integrating it more deeply into workflows. We believe it should blend naturally into how people already work, instead of functioning like a separate tool.
While our tooling will keep benefiting from underlying improvements in agent reasoning, validation, and self-correction, our team’s mission remains the same: seamlessly deliver fast, trustworthy data analysis across OpenAI’s data ecosystem.
Author
Acknowledgments
Special thanks to the Data Productivity and Data Science teams, as well as our many cross-functional users, for their experimentation and feedback.


