January 29, 2026

Inside OpenAI’s in-house data agent

By Bonnie Xu, Aravind Suresh, and Emma Tang


Data determines how systems learn, how products evolve, and how companies make decisions. But getting answers quickly, correctly, and with the right context is often harder than it should be. To make this easier as OpenAI scales, we built our own bespoke in-house AI data agent that explores and reasons over our own platform.

Our agent is a custom internal-only tool (not an external offering), built specifically around OpenAI’s data, permissions, and workflows. We’re showing how we built and use it to surface real, impactful ways AI can support day-to-day work across our teams. The OpenAI tools we used to build and run it (Codex, our GPT‑5 flagship model, the Evals API, and the Embeddings API) are the same tools we make available to developers everywhere.

Our data agent takes employees from question to insight in minutes, not days. This lowers the barrier to pulling data and doing nuanced analysis across all functions, not just for our data team. Today, teams across Engineering, Data Science, Go-To-Market, Finance, and Research at OpenAI rely on the agent to answer high-impact data questions. For example, it can help evaluate launches and understand business health, all through the intuitive format of natural language. The agent combines Codex-powered table-level knowledge with product and organizational context. Its continuously learning memory system also means it improves with every turn.

Screenshot in which a user asks for a comparison of ChatGPT WAU on Oct 6, 2025 with DevDay 2023. The agent reports ≈800M WAU for 2025 and ≈100M for 2023, notes a change of +700M and roughly 8× growth, and follows with explanatory context.

In this post, we’ll look at why we needed a bespoke AI data agent, how its code-enriched data context and self-learning make it so useful, and what we learned along the way.

Why we needed a custom tool

OpenAI’s data platform serves more than 3.5k internal users working across Engineering, Product, and Research, and covers more than 600 petabytes of data spread across 70k datasets. At that size, simply finding the right table can be one of the most time-consuming parts of doing analysis.

As one internal user put it:

“We have a lot of tables that are fairly similar, and I spend a lot of time trying to figure out how they’re different and which one to use. Some include logged-out users, some don’t. Some have overlapping fields; it’s hard to tell what is what.”

Even with the right tables selected, producing correct results can be difficult. Analysts must reason about table data and table relationships to ensure transformations and filters are applied correctly. Common failure modes (many-to-many joins, filter pushdown errors, and unhandled nulls) can silently invalidate results. At OpenAI’s scale, analysts shouldn’t have to spend time debugging SQL semantics or query performance: their focus should be on defining metrics, validating assumptions, and making data-driven decisions.
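To make the first of these failure modes concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `orders` and `customer_tags` tables and all of their values are invented for illustration; they are not OpenAI data. The sketch shows how a join on a key that repeats on both sides inflates a sum without raising any error, and how a simple row-count comparison can catch it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customer_tags (customer_id INTEGER, tag TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 20, 25.0);
    -- customer 10 appears twice in both tables, so the join is many-to-many
    INSERT INTO customer_tags VALUES (10, 'enterprise'), (10, 'beta'), (20, 'free');
""")

# The correct revenue total, computed without any join.
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Joining on the repeated key duplicates order rows and silently inflates the sum.
joined_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o JOIN customer_tags t ON o.customer_id = t.customer_id
""").fetchone()[0]

# A cheap guard: compare row counts before and after the join.
rows_before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
rows_after = conn.execute("""
    SELECT COUNT(*)
    FROM orders o JOIN customer_tags t ON o.customer_id = t.customer_id
""").fetchone()[0]
fan_out_detected = rows_after > rows_before

print(true_total, joined_total, fan_out_detected)
```

The query runs cleanly and returns a plausible-looking number, which is what makes this class of error easy to miss in a 180-line statement.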

Screenshot of SQL code defining two CTEs, order_enriched and monthly_segment, which join customer geography data, extract order-month fields, and compute monthly aggregates such as order counts, gross revenue, revenue with tax, and average ship-to-receipt days.

This SQL statement is 180+ lines long. It’s not easy to know whether we’re joining the right tables and querying the right columns.

How it works

Let’s walk through what our agent is, how it curates context, and how it keeps improving itself.

Our agent is powered by GPT‑5.2 and is designed to reason over OpenAI’s data platform. It’s available wherever employees already work: as a Slack agent, through a web interface, inside IDEs, in the Codex CLI via MCP, and directly in OpenAI’s internal ChatGPT app through an MCP connector.

Diagram titled “How the data agent works.” Entrypoints (Agent-UI, Local Agent-MCP, Remote Agent-MCP, and Slack Agent) feed into an Agent-API. The API connects to internal data knowledge and company context, syncs with the data warehouse and platform sources, and exchanges requests with the GPT-5.2 model through Agent-MCP.

Users can ask complex, open-ended questions that would typically require multiple rounds of manual exploration. Take this example prompt, which uses a test data set: “For NYC taxi trips, which pickup-to-dropoff ZIP pairs are the most unreliable, with the largest gap between typical and worst-case travel times, and when does that variability occur?”

The agent handles the analysis end-to-end, from understanding the question to exploring the data, running queries, and synthesizing findings.

Screenshot in which a user asks which NYC taxi pickup→dropoff ZIP pairs are the most “unreliable.” The agent explains that it uses ~21k trips from samples.nyctaxi.trips, defines typical (p50) versus worst-case (p95), applies filters, and describes how it identifies when each ZIP pair’s longest trip occurred.

The agent’s response to the question.
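The “typical versus worst-case” framing in the response can be sketched in a few lines of Python. This is an illustration of the p50/p95 idea only, not the agent’s actual query: the nearest-rank percentile definition, the `min_trips` threshold, and the sample ZIP codes and durations are all assumptions made for the example.

```python
import math
from collections import defaultdict

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def reliability_gaps(trips, min_trips=5):
    """trips: iterable of (pickup_zip, dropoff_zip, duration_minutes) tuples."""
    by_pair = defaultdict(list)
    for pickup, dropoff, minutes in trips:
        by_pair[(pickup, dropoff)].append(minutes)

    gaps = []
    for pair, durations in by_pair.items():
        if len(durations) < min_trips:
            continue  # too few trips to say anything about reliability
        p50 = percentile(durations, 50)   # typical trip
        p95 = percentile(durations, 95)   # worst-case trip
        gaps.append((pair, p50, p95, p95 - p50))
    # Largest gap first: these are the least reliable pairs.
    return sorted(gaps, key=lambda g: g[3], reverse=True)

# Hypothetical sample data: one pair with a long tail, one steady pair.
trips = (
    [("10001", "10019", m) for m in (10, 11, 12, 12, 13, 40)]
    + [("10001", "10003", m) for m in (8, 8, 9, 9, 10, 11)]
)
ranked = reliability_gaps(trips)
print(ranked[0])
```

Ranking by the p95 minus p50 spread rather than by the mean keeps a single extreme trip from dominating while still surfacing pairs whose bad days are far worse than their normal ones.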

One of the agent’s greatest strengths is how it reasons through problems. Rather than following a fixed script, the agent evaluates its own progress. If an intermediate result looks wrong (for example, zero rows caused by an incorrect join or filter), the agent investigates what went wrong, adjusts its approach, and tries again. Throughout this process, it retains full context and carries learnings forward between steps. This closed-loop, self-learning process shifts iteration from the user into the agent itself, enabling faster results and consistently higher-quality analyses than manual workflows.
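The shape of that loop can be sketched as follows. In the real agent the model proposes each revised query; here a fixed list of candidate queries stands in for that step, and the `events` table, the `run_with_self_check` helper, and the zero-rows check are hypothetical simplifications.

```python
import sqlite3

def run_with_self_check(conn, candidate_queries):
    """Try candidate queries in order, treating an empty result as a signal to revise."""
    notes = []  # findings carried forward between attempts
    for sql in candidate_queries:
        rows = conn.execute(sql).fetchall()
        if rows:
            return rows, notes
        notes.append(f"zero rows from: {sql}")
    return [], notes

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, platform TEXT);
    INSERT INTO events VALUES (1, 'web'), (2, 'ios'), (3, 'web');
""")

attempts = [
    # First attempt uses the wrong casing, so the filter matches nothing.
    "SELECT user_id FROM events WHERE platform = 'Web'",
    # Revised attempt after noticing the empty result.
    "SELECT user_id FROM events WHERE platform = 'web' ORDER BY user_id",
]
rows, notes = run_with_self_check(conn, attempts)
print(rows, notes)
```

The point of the sketch is that the check and the retry live inside the loop, so the user sees only the final result plus a record of what was tried.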

Screenshot of a task workflow showing an AI agent’s step-by-step plan for analyzing NYC taxi trip durations. It includes goals, internal searches, schema inspection, code snippets, and reasoning about computing p50/p95 spreads, identifying unreliable ZIP pairs, and planning SQL queries.

The agent’s reasoning to identify the most unreliable NYC taxi pickup-dropoff pairs.

The agent covers the full analytics workflow: discovering data, running SQL, and publishing notebooks and reports. It understands internal company knowledge, can search the web for external information, and improves over time through learned usage and memory.

Context is everything

High-quality answers depend on rich, accurate context. Without context, even strong models can produce wrong results, such as vastly misestimating user counts or misinterpreting internal terminology.

Screenshot in which a user asks, “What was ChatGPT Image Gen logged-in DAU for the last 30 days?” A status line below shows the agent “Working for 22m 41s,” indicating that a long-running query is still in progress.

The agent without memory, unable to query effectively.

Screenshot in which a user asks, “What was ChatGPT Image Gen logged-in DAU for the last 30 days?” Below the message, a status line shows “Worked for 1m 22s,” indicating that the query finished in just over a minute.

The agent’s memory enables faster queries by locating the correct tables.

To avoid these failure modes, the agent is built around multiple layers of context that keep it grounded in OpenAI’s data and institutional knowledge.

Diagram titled “Layers of context for the data agent” showing six stacked tiers: 1) Table Usage, 2) Human Annotations, 3) Codex Enrichment, 4) Institutional Knowledge, 5) Memory, and 6) Runtime Context. Each layer appears as a horizontal bar in a pyramid shape.

Layer #1: Table Usage

Metadata grounding: The agent relies on schema metadata (column names and data types) to write SQL, and uses table lineage (such as upstream and downstream table relationships) to provide context on how different tables relate to one another.
Query inference: Ingesting historical queries helps the agent understand how to write its own queries and which tables are typically joined together.
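As a rough illustration of query inference, the sketch below counts which tables appear together in historical queries. It is a simplified stand-in, not the agent’s ingestion pipeline: the regular expression only recognizes plain `FROM` and `JOIN` references (it would misread CTE names and subqueries), and the sample queries and table names are made up.

```python
import re
from collections import Counter
from itertools import combinations

# Naive pattern: a table name is whatever identifier follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def co_join_counts(queries):
    """Count how often each pair of tables is referenced in the same query."""
    counts = Counter()
    for sql in queries:
        tables = sorted({t.lower() for t in TABLE_REF.findall(sql)})
        for pair in combinations(tables, 2):
            counts[pair] += 1
    return counts

# Hypothetical query history.
history = [
    "SELECT * FROM sessions s JOIN users u ON s.user_id = u.id",
    "SELECT count(*) FROM users u JOIN sessions s ON u.id = s.user_id",
    "SELECT * FROM users u JOIN plans p ON u.plan_id = p.id",
]
counts = co_join_counts(history)
print(counts.most_common(1))
```

Sorting the table names before pairing them means the same two tables are counted together regardless of which side of the join they appear on.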

Layer #2: Human Annotations

Curated descriptions of tables and columns provided by domain experts, capturing intent, semantics, business meaning, and known caveats that are not easily inferred from schemas or past queries.

Metadata alone isn’t enough. To really tell tables apart, you need to understand how they were created and where they originate.

Layer #3: Codex Enrichment

By deriving a code-level definition of a table, the agent builds a deeper understanding of what the data actually contains.
- Nuances about what is stored in the table and how it is derived from an analytics event provide extra information. For example, the definition can give context on the uniqueness of values, how often the table data is updated, the scope of the data (such as which fields the table excludes and what level of granularity it has), and so on.
- This also enriches usage context by showing how the table is used beyond SQL, in Spark, Python, and other data systems.
This means the agent can distinguish between tables that look similar but differ in important ways. For example, it can tell whether a table only includes first-party ChatGPT traffic. This context is also refreshed automatically, so it stays up to date without manual maintenance.

Diagram titled “Codex-enriched knowledge pipeline.” Popular tables feed into multiple Codex tasks, which extract details from the OpenAI codebase, including a table’s purpose, grain and primary keys, downstream usage patterns, alternate table options, and data freshness.

Layer #4: Institutional Knowledge

The agent can access Slack, Google Docs, and Notion, which capture critical company context such as launches, reliability incidents, internal codenames and tools, and the canonical definitions and computation logic for key metrics.
These documents are ingested, embedded, and stored with metadata and permissions. A retrieval service handles access control and caching at runtime, enabling the agent to pull in this information efficiently and safely.
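One way to picture retrieval that respects permissions is to filter by access before ranking, so a document the user cannot read never influences the answer. The sketch below assumes that ordering; the `Doc` type, the group names, and the documents are hypothetical, and simple term overlap stands in for embedding similarity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document

def retrieve(docs, query, user_groups, k=2):
    """Rank only the documents the user may read, by simple term overlap."""
    terms = set(query.lower().split())
    # Access control first: unreadable documents are dropped before scoring.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d.title for _, d in scored[:k]]

# Hypothetical corpus.
docs = [
    Doc("Connector logging incident",
        "connector usage undercounted after logging issue",
        frozenset({"eng", "data"})),
    Doc("Finance forecast",
        "connector usage revenue forecast",
        frozenset({"finance"})),
    Doc("WAU definition",
        "weekly active users definition",
        frozenset({"eng", "finance", "data"})),
]

eng_hits = retrieve(docs, "why did connector usage drop", frozenset({"eng"}))
finance_hits = retrieve(docs, "why did connector usage drop", frozenset({"finance"}))
print(eng_hits, finance_hits)
```

The same question returns different documents for different users, which is the behavior the stored permissions are meant to guarantee.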

Screenshot in which a user asks why connector usage dropped in December. The agent explains that the decline was due to a logging issue that began on Nov 13, 2025, which caused usage to be undercounted after the ChatGPT 5.1 launch. Legacy telemetry went empty until a new event became the source of truth.

Layer #5: Memory

When the agent is given corrections or discovers nuances about certain data questions, it can save these learnings for next time, allowing it to keep improving alongside its users.
- As a result, future answers start from a more accurate baseline rather than repeatedly running into the same issues.
- The goal of memory is to retain and reuse non-obvious corrections, filters, and constraints that are critical for data correctness but hard to infer from the other layers alone.
- For example, in one case the agent didn’t know how to filter for a particular analytics experiment (it relied on matching against a specific string defined in an experiment gate). Memory was crucial here so that it could filter correctly instead of attempting a fuzzy string match.
When you give the agent a correction or it finds a learning in your conversation, it will prompt you to save that memory for next time.
- Memories can also be created and edited manually by users.
- Memories are scoped at the global and personal level, and the agent’s tooling makes them easy to edit.
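A small sketch of how global and personal scoping might fit together is shown below. The post does not describe the storage model, so everything here is an assumption: the `MemoryStore` class, its method names, the topic keys, and in particular the rule that a personal memory takes precedence over a global one.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    global_notes: dict = field(default_factory=dict)    # topic -> note, shared by everyone
    personal_notes: dict = field(default_factory=dict)  # (user, topic) -> note

    def save(self, topic, note, user=None):
        """Save to global memory by default, or to one user's personal memory."""
        if user is None:
            self.global_notes[topic] = note
        else:
            self.personal_notes[(user, topic)] = note

    def recall(self, topic, user):
        """Assumed precedence: a personal note, if present, overrides the global one."""
        return self.personal_notes.get((user, topic), self.global_notes.get(topic))

store = MemoryStore()
# Hypothetical learnings about an experiment filter.
store.save("experiment_filter", "match the gate string exactly")
store.save("experiment_filter", "also exclude internal accounts", user="alice")

print(store.recall("experiment_filter", user="bob"))
print(store.recall("experiment_filter", user="alice"))
```

Because saving is an explicit call, the same structure supports both agent-proposed memories that a user confirms and memories a user writes or edits by hand.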

Notification banner showing “Data agent wants to save 2 learnings to memory,” with an item labeled “ChatGPT Top-level Metrics,” and on the right a confirmation message “Saved to global memory” with a green checkmark.

Layer #6: Runtime Context

When no prior context exists for a table, or when the existing information is stale, the agent can issue live queries to the data warehouse to inspect and query the table directly. This allows it to validate schemas, understand the data in real time, and respond accordingly.
To get broader data context that exists outside the warehouse, the agent can also talk to other Data Platform systems (metadata service, Airflow, Spark) as needed.
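Schema validation at runtime can be sketched with SQLite standing in for the data warehouse. The `trips` table, the cached context, and both helper functions are hypothetical; a real warehouse would expose its schema through its own catalog rather than `PRAGMA table_info`.

```python
import sqlite3

def live_schema(conn, table):
    """Return {column_name: declared_type} by inspecting the table directly."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

def stale_columns(cached, live):
    """Columns in the cached context that no longer exist or have changed type."""
    return sorted(col for col, col_type in cached.items() if live.get(col) != col_type)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, pickup_zip TEXT, fare REAL)")

# Hypothetical cached context that has drifted from the real table.
cached = {"trip_id": "INTEGER", "pickup_zip": "INTEGER", "dropoff_zip": "TEXT"}
live = live_schema(conn, "trips")
stale = stale_columns(cached, live)
print(live, stale)
```

A non-empty `stale` list is the cue to trust the live schema over the stored context for this request.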

We run a daily offline pipeline that aggregates table usage, human annotations, and Codex-derived enrichment into a single, normalized representation. This enriched context is then converted into embeddings using the OpenAI embeddings API and stored for retrieval. At query time, the agent pulls only the most relevant embedded context via retrieval-augmented generation (RAG) instead of scanning raw metadata or logs. This makes table understanding fast and scalable, even across tens of thousands of tables, while keeping runtime latency predictable and low. Runtime queries are issued to our data warehouse live as needed.
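The offline-then-retrieve pattern can be sketched end to end with a toy embedding. This is an illustration under stated assumptions: the bag-of-words `embed` function is a stand-in for a real embedding model, the `usage`, `annotation`, and `enrichment` field names are hypothetical, and the two tables are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(tables):
    """Offline step: merge each table's context layers into one document and embed it."""
    index = {}
    for name, layers in tables.items():
        doc = " ".join([layers["usage"], layers["annotation"], layers["enrichment"]])
        index[name] = embed(doc)
    return index

def top_k(index, question, k=1):
    """Query-time step: return the k tables whose context best matches the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:k]

# Hypothetical per-table context gathered from the different layers.
tables = {
    "daily_active_users": {
        "usage": "joined with sessions",
        "annotation": "logged-in daily active users",
        "enrichment": "refreshed daily one row per user per day",
    },
    "taxi_trips": {
        "usage": "grouped by pickup zip",
        "annotation": "taxi trip durations",
        "enrichment": "one row per trip",
    },
}
index = build_index(tables)
print(top_k(index, "daily active users logged-in"))
```

Building the index ahead of time means the query-time work is one embedding and one similarity ranking, which is what keeps latency predictable as the number of tables grows.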

Diagram titled “Context retrieval in the data agent.” Offline preprocessing layers (table usage, human annotations, Codex enrichment, institutional knowledge, and memory) feed into RAG embeddings. Live retrieval shows the agent querying a database through semantic search or exact text retrieval to build runtime context.

Together, these layers ensure the agent’s reasoning is grounded in OpenAI’s data, code, and institutional knowledge, dramatically reducing errors and improving answer quality.

Built to think and work like a teammate

One-shot answers work when the problem is clear, but most questions aren’t. More often, arriving at the correct result requires back-and-forth refinement and some course correction.

The agent is built to behave like a teammate you can reason with. It’s conversational and always-on, and it handles both quick answers and iterative exploration.

It carries over complete context across turns, so users can ask follow-up questions, adjust their intent, or change direction without restating everything. If the agent starts heading down the wrong path, users can interrupt mid-analysis and redirect it, just like working with a human collaborator who listens instead of plowing ahead.

When instructions are unclear or incomplete, the agent proactively asks clarifying questions. If no response is provided, it applies sensible defaults to make progress. For example, if a user asks about business growth with no date range specified, it may assume the last seven or 30 days. These priors allow it to stay responsive and non-blocking while still converging on the right outcome.
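A non-blocking default of this kind might look like the sketch below. The `resolve_date_range` helper and the choice of 30 days as the fallback are illustrative assumptions; the key idea is that every filled-in value is recorded so it can be shown to the user alongside the answer.

```python
from datetime import date, timedelta

def resolve_date_range(start=None, end=None, today=None, default_days=30):
    """Fill in a missing date range and report each assumption that was made."""
    assumptions = []
    if end is None:
        end = today or date.today()
        assumptions.append(f"end date defaulted to {end.isoformat()}")
    if start is None:
        start = end - timedelta(days=default_days)
        assumptions.append(f"start date defaulted to the last {default_days} days")
    return start, end, assumptions

# No dates supplied: fall back to the last 30 days ending on a fixed "today".
start, end, assumptions = resolve_date_range(today=date(2026, 1, 29))
print(start, end, assumptions)
```

Returning the assumptions alongside the range lets the user correct the default in a follow-up turn instead of being blocked by a clarifying question.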

The result is an agent that works well when you know exactly what you want (e.g., “Tell me about this table”) and just as well when you’re exploring (e.g., “I’m seeing a dip here, can we break this down by customer type and timeframe?”).

After rollout, we observed that users frequently ran the same analyses for routine repetitive work. To expedite this, the agent's workflows package recurring analyses into reusable instruction sets. Examples include workflows for weekly business reports and table validations. By encoding context and best practices once, workflows streamline repeat analyses and ensure consistent results across users.

UI input bar with the placeholder text “Ask a data question.” Below it is a button labeled “Use a workflow,” and on the right are microphone and send icons. The bar has rounded corners and sits on a dark background.

Moving fast without breaking trust

Building an always-on, evolving agent means quality can drift just as easily as it can improve. Without a tight feedback loop, regressions are inevitable and invisible. The only way to scale capability without breaking trust is through systematic evaluation.

In this section, we’ll discuss how we leverage OpenAI’s Evals API to measure and protect the agent’s response quality.

The agent’s evals are built on curated sets of question-answer pairs. Each question targets an important metric or analytical pattern we care deeply about getting right, paired with a manually authored “golden” SQL query that produces the expected result. For each eval, we send the natural language question to the agent’s query-generation endpoint, execute the generated SQL, and compare the output against the result of the expected SQL.

Diagram titled “The data agent’s evaluation pipeline.” Q&A eval pairs with expected SQL go into a generation step that produces SQL and results. OpenAI Evals compares the generated and expected results using dataframe and SQL comparison, and outputs a score and reasoning.

Evaluation doesn’t rely on naive string matching. Generated SQL can differ syntactically while still being correct, and result sets may include extra columns that don’t materially affect the answer. To account for this, we compare both the SQL and the resulting data, and feed these signals into OpenAI’s Evals grader. The grader produces a final score along with an explanation, capturing both correctness and acceptable variation.
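The result-comparison half of this check can be sketched with SQLite. This is a simplified, deterministic stand-in and not the Evals grader, which also weighs the SQL itself and produces an explanation. The `result_matches` helper, the rule of matching columns by name, and the `signups` data are assumptions for illustration.

```python
import sqlite3

def result_matches(conn, generated_sql, golden_sql):
    """True if every golden column appears in the generated result with the same rows."""
    gen = conn.execute(generated_sql)
    gen_cols = [c[0] for c in gen.description]
    gen_rows = gen.fetchall()

    gold = conn.execute(golden_sql)
    gold_cols = [c[0] for c in gold.description]
    gold_rows = gold.fetchall()

    if not set(gold_cols) <= set(gen_cols):
        return False  # a required column is missing from the generated result
    # Project the generated rows onto the golden columns, ignoring any extras.
    idx = [gen_cols.index(c) for c in gold_cols]
    projected = sorted(tuple(row[i] for i in idx) for row in gen_rows)
    return projected == sorted(gold_rows)  # row order is not significant

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signups (day TEXT, plan TEXT, n INTEGER);
    INSERT INTO signups VALUES
        ('2025-01-01', 'free', 5), ('2025-01-01', 'plus', 2), ('2025-01-02', 'free', 7);
""")

golden = "SELECT day, SUM(n) AS total FROM signups GROUP BY day"
# Different syntax, an extra column, and a different order, but the same answer.
equivalent = ("SELECT s.day AS day, COUNT(*) AS plans, SUM(s.n) AS total "
              "FROM signups AS s GROUP BY s.day ORDER BY day DESC")
# Looks similar but silently drops the 'plus' plan.
wrong = "SELECT day, SUM(n) AS total FROM signups WHERE plan = 'free' GROUP BY day"

print(result_matches(conn, equivalent, golden), result_matches(conn, wrong, golden))
```

A string comparison would reject the equivalent query and could not tell the wrong one apart from a harmless rewrite, which is why the comparison is made on the data.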

These evals work like unit tests that run continuously during development and like canaries in production, where they identify regressions; this allows us to catch issues early and iterate confidently as the agent’s capabilities expand.

Agent security

Our agent plugs directly into OpenAI’s existing security and access-control model. It operates purely as an interface layer, inheriting and enforcing the same permissions and guardrails that govern OpenAI’s data.

All of the agent’s access is strictly pass-through, meaning users can only query tables they already have permission to access. When access is missing, it flags this or falls back to alternative datasets the user is authorized to use.
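The pass-through rule reduces to a simple invariant: the agent never selects a table outside the set the user can already read. The sketch below expresses that invariant with a hypothetical `choose_table` helper and made-up table names; in practice the grants come from the existing access-control system rather than a Python set.

```python
def choose_table(user_grants, preferred, alternatives=()):
    """Return (table, message); never select a table the user cannot already read."""
    if preferred in user_grants:
        return preferred, None
    for alt in alternatives:
        if alt in user_grants:
            return alt, f"no access to {preferred}; using {alt} instead"
    return None, f"no access to {preferred}; request access to continue"

# Hypothetical grants for one user.
grants = {"usage_daily_aggregated", "public_launch_calendar"}

direct = choose_table(grants, "usage_daily_aggregated")
fallback = choose_table(grants, "usage_raw_events", alternatives=["usage_daily_aggregated"])
blocked = choose_table(grants, "finance_ledger")
print(direct, fallback, blocked)
```

Returning a message with every fallback or refusal keeps the substitution visible to the user rather than silent.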

Finally, it's built for transparency. Like any system, it can make mistakes. It exposes its reasoning process by summarizing assumptions and execution steps alongside each answer. When queries are executed, it links directly to the underlying results, allowing users to inspect raw data and verify every step of the analysis.

Lessons learned

Building our agent from scratch surfaced practical lessons about how agents behave, where they struggle, and what actually makes them reliable at scale.

Lesson #1: Less is More

Early on, we exposed our full tool set to the agent, and quickly ran into problems with overlapping functionality. While this redundancy can be helpful for specific custom cases and is more obvious to a human when manually invoking, it’s confusing to agents. To reduce ambiguity and improve reliability, we restricted and consolidated certain tool calls.

Lesson #2: Guide the Goal, Not the Path

We also discovered that highly prescriptive prompting degraded results. While many questions share a general analytical shape, the details vary enough that rigid instructions often pushed the agent down incorrect paths. By shifting to higher-level guidance and relying on GPT‑5’s reasoning to choose the appropriate execution path, the agent became more robust and produced better results.

Lesson #3: Meaning Lives in Code

Schemas and query history describe a table’s shape and usage, but its true meaning lives in the code that produces it. Pipeline logic captures assumptions, freshness guarantees, and business intent that never surface in SQL or metadata. By crawling the codebase with Codex, our agent understands how datasets are actually constructed and is able to better reason about what each table actually contains. It can answer “what’s in here” and “when can I use it” far more accurately than from warehouse signals alone.

Same vision, new tools

We’re constantly working to improve our agent by increasing its ability to handle ambiguous questions, improving its reliability and accuracy with stronger validations, and integrating it more deeply into workflows. We believe it should blend naturally into how people already work, instead of functioning like a separate tool.

While our tooling will keep benefiting from underlying improvements in agent reasoning, validation, and self-correction, our team’s mission remains the same: seamlessly deliver fast, trustworthy data analysis across OpenAI’s data ecosystem.