Today, we're releasing GPT‑5.4 in ChatGPT (as GPT‑5.4 Thinking), the API, and Codex. It's our most capable and efficient frontier model for professional work. We're also releasing GPT‑5.4 Pro in ChatGPT and the API, for people who want maximum performance on complex tasks.
GPT‑5.4 brings together the best of our recent advances in reasoning, coding, and agentic workflows into a single frontier model. It incorporates the industry-leading coding capabilities of GPT‑5.3‑Codex while improving how the model works across tools, software environments, and professional tasks involving spreadsheets, presentations, and documents. The result is a model that gets complex real work done accurately, effectively, and efficiently, delivering what you asked for with less back and forth.
In ChatGPT, GPT‑5.4 Thinking can now provide an upfront plan of its thinking, so you can adjust course mid-response while it's working and arrive at a final output that's more closely aligned with what you need without additional turns. GPT‑5.4 Thinking also improves deep web research, particularly for highly specific queries, while better maintaining context for questions that require longer thinking. Together, these improvements mean higher-quality answers that arrive faster and stay relevant to the task at hand.
In Codex and the API, GPT‑5.4 is the first general-purpose model we've released with native, state-of-the-art computer-use capabilities, enabling agents to operate computers and carry out complex workflows across applications. It supports up to 1M tokens of context, allowing agents to plan, execute, and verify tasks across long horizons. GPT‑5.4 also improves how models work across large ecosystems of tools and connectors with tool search, helping agents find and use the right tools more efficiently without sacrificing intelligence. Finally, GPT‑5.4 is our most token-efficient reasoning model yet, using significantly fewer tokens to solve problems than GPT‑5.2, which translates into reduced token usage and faster speeds.
Together with advances in general reasoning, coding, and professional knowledge work, GPT‑5.4 enables more reliable agents, faster developer workflows, and higher-quality outputs across ChatGPT, the API, and Codex.
Eval | GPT‑5.4 | GPT‑5.3‑Codex | GPT‑5.2 |
GDPval (wins or ties) | 83.0% | 70.9% | 70.9% |
SWE-Bench Pro (Public) | 57.7% | 56.8% | 55.6% |
OSWorld-Verified | 75.0% | 74.0%* | 47.3% |
Toolathlon | 54.6% | 51.9% | 46.3% |
BrowseComp | 82.7% | 77.3% | 65.8% |
*Previously reported as 64.7%. GPT‑5.3‑Codex achieves 74.0% with a newly introduced API parameter that preserves the original image resolution.
Building on the general reasoning capabilities of GPT‑5.2, GPT‑5.4 delivers even more consistent and polished results on the real-world tasks that matter to professionals.
On GDPval, which tests agents' abilities to produce well-specified knowledge work across 44 occupations, GPT‑5.4 achieves a new state of the art, matching or exceeding industry professionals in 83.0% of comparisons, compared to 70.9% for GPT‑5.2.
In GDPval, models attempt well-specified knowledge work spanning 44 occupations from the top 9 industries contributing to U.S. GDP. Tasks request real work products, such as sales presentations, accounting spreadsheets, urgent care schedules, manufacturing diagrams, or short videos. Reasoning effort was set to xhigh for GPT‑5.4 and heavy for GPT‑5.2 (a slightly lower level in ChatGPT).
“GPT-5.4 is the best model we've ever tried. It's now top of the leaderboard on our APEX-Agents benchmark, which measures model performance for professional services work. It excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis, delivering top performance while running faster and at a lower cost than competitive frontier models.”
We put a particular focus on improving GPT‑5.4's ability to create and edit spreadsheets, presentations, and documents. On an internal benchmark of spreadsheet modeling tasks that a junior investment banking analyst might do, GPT‑5.4 achieves a mean score of 87.3%, compared to 68.4% for GPT‑5.2. On a set of presentation evaluation prompts, human raters preferred presentations from GPT‑5.4 68.0% of the time over those from GPT‑5.2, citing stronger aesthetics, greater visual variety, and more effective use of image generation.

Documents were generated with reasoning effort set to xhigh.
You can try these capabilities in ChatGPT using GPT‑5.4 Thinking or Pro. If you're an Enterprise customer, we recommend using the ChatGPT for Excel add-in, which also launched today. We've also updated the spreadsheet and presentation skills available in Codex and the API.
To make GPT‑5.4 better at real-world work, we continued our progress on reducing hallucinations and errors. GPT‑5.4 is our most factual model yet: on a set of de-identified prompts where users flagged factual errors, GPT‑5.4's individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors, relative to GPT‑5.2.
“GPT-5.4 sets a new bar for document-heavy legal work. On our BigLaw Bench eval, it scored 91%. Compared to other models, GPT-5.4 is now better at structuring complex transactional analysis, maintaining accuracy across lengthy contracts, and delivering the high level of detail that legal practitioners require.”
GPT‑5.4 is our first general-purpose model with native computer-use capabilities and marks a major step forward for developers and agents alike. It's the best model currently available for developers building agents that complete real tasks across websites and software systems.
We designed GPT‑5.4 to perform well across a wide range of computer-use workloads. It is excellent at writing code to operate computers via libraries like Playwright, as well as issuing mouse and keyboard commands in response to screenshots. Its behavior is steerable via developer messages, meaning that developers can adjust behavior to suit particular use cases. Developers can even configure the model's safety behavior to suit different levels of risk tolerance by specifying custom confirmation policies.
The model's performance and flexibility are reflected across benchmarks that test computer use in different settings. On OSWorld-Verified, which measures a model's ability to navigate a desktop environment through screenshots and keyboard and mouse actions, GPT‑5.4 achieves a state-of-the-art 75.0% success rate, far exceeding GPT‑5.2's 47.3% and surpassing human performance at 72.4%.[1]
On WebArena-Verified, which tests browser use, GPT‑5.4 achieves a leading 67.3% success rate when using both DOM- and screenshot-driven interaction, compared to GPT‑5.2's 65.4%. On Online-Mind2Web, which also tests browser use, GPT‑5.4 achieves a 92.8% success rate using screenshot-based observations alone, improving over ChatGPT Atlas's Agent Mode, which achieves a success rate of 70.9%.
A tool yield is when the assistant pauses to await tool responses. If 3 tools are called in parallel, followed by 3 more tools called in parallel, the number of yields would be 2. Tool yields are a better proxy for latency than tool calls because they reflect the benefits of parallelization.
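The difference between tool yields and tool calls can be sketched in a few lines of Python. This is only an illustration of the definition above: the representation of a run as a list of parallel batches, and the function and tool names, are assumptions made for the example rather than anything from the benchmark harness.

```python
from typing import Sequence


def count_tool_yields(batches: Sequence[Sequence[str]]) -> int:
    """Count yields: each non-empty batch of parallel calls pauses the assistant once."""
    return sum(1 for batch in batches if len(batch) > 0)


def count_tool_calls(batches: Sequence[Sequence[str]]) -> int:
    """Count every individual tool call, regardless of batching."""
    return sum(len(batch) for batch in batches)


# Hypothetical run: 3 tools called in parallel, then 3 more in parallel.
run = [
    ["search", "fetch", "read"],
    ["search", "fetch", "read"],
]

print(count_tool_yields(run))  # 2
print(count_tool_calls(run))   # 6
```

The same six calls issued one at a time would produce six yields, which is why yields track wall-clock waiting more closely than the raw call count.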
GPT‑5.4 interprets screenshots of a browser interface and interacts with UI elements through coordinate-based clicking to send emails and schedule a calendar event. The video is not sped up.
GPT‑5.4's improved computer use builds on the model's improved general visual perception capabilities. On MMMU-Pro, a test of the model's visual understanding and reasoning, GPT‑5.4 achieves an 81.2% success rate without tool use, improving over GPT‑5.2's 79.5%. Improved visual perception also translates into better document parsing. On OmniDocBench, GPT‑5.4 without reasoning effort achieves an average error (measured by normalized edit distance between the model's prediction and the ground truth) of 0.109, improving over GPT‑5.2's 0.140.
MMMU-Pro was run with reasoning effort set to xhigh. OmniDocBench was run with reasoning effort set to none, to reflect low-cost, low-latency performance.
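For readers unfamiliar with the OmniDocBench error metric, the following is a minimal sketch of a normalized edit distance. It assumes character-level Levenshtein distance divided by the length of the longer string; the benchmark's exact tokenization and normalization may differ, so treat the function names and the normalization choice as assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, truth: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means the strings are identical."""
    longest = max(len(prediction), len(truth))
    if longest == 0:
        return 0.0
    return levenshtein(prediction, truth) / longest
```

Under this definition lower is better, which is why 0.109 is an improvement over 0.140.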
We're also improving visual understanding for dense, high-resolution images where full fidelity matters. Starting with GPT‑5.4, we're introducing an original image input detail level that supports full-fidelity perception up to 10.24M total pixels or a 6000-pixel maximum dimension, whichever is lower; the high image input detail level now supports up to 2.56M total pixels or a 2048-pixel maximum dimension. In early testing with API users, we saw strong gains in localization ability, image understanding, and click accuracy when using original or high detail.
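To make the two limits concrete, here is a small sketch that checks whether an image fits within a detail level and, if not, how much it would need to shrink. The pixel and dimension limits come from the paragraph above; the idea that an oversized image is scaled down proportionally, and all names in the code, are assumptions for illustration and not a description of what the API actually does.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DetailLimit:
    max_total_pixels: int
    max_dimension: int


# Limits quoted in the text for the two detail levels.
LIMITS = {
    "original": DetailLimit(max_total_pixels=10_240_000, max_dimension=6000),
    "high": DetailLimit(max_total_pixels=2_560_000, max_dimension=2048),
}


def fit_scale(width: int, height: int, detail: str) -> float:
    """Return the scale factor (<= 1.0) needed to satisfy both limits.

    Assumption: the stricter of the two limits wins and aspect ratio is kept.
    """
    limit = LIMITS[detail]
    by_dimension = limit.max_dimension / max(width, height)
    by_area = math.sqrt(limit.max_total_pixels / (width * height))
    return min(1.0, by_dimension, by_area)


# A 4K screenshot fits "original" untouched but exceeds the "high" limits.
print(fit_scale(3840, 2160, "original"))
print(fit_scale(3840, 2160, "high"))
```

For the 3840x2160 example, the 2048-pixel dimension cap is the binding constraint under high detail, while the same image is within both limits under original detail.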
“On our evaluation measuring computer-use performance across roughly 30K HOA and property tax portals, GPT-5.4 achieved a 95% success rate on the first attempt and 100% within three attempts, compared to roughly 73–79% for prior CUA (computer-using agent) models. It also completed sessions roughly 3x faster while using roughly 70% fewer tokens, materially improving reliability and cost efficiency at scale.”
In the API, developers can access these capabilities through the updated computer tool. Please see our updated documentation for recommended best practices.
GPT‑5.4 combines the coding strengths of GPT‑5.3‑Codex with leading knowledge-work and computer-use capabilities, which matter most on longer-running tasks where the model can use tools, iterate, and push work further with less manual intervention. It matches or outperforms GPT‑5.3‑Codex on SWE-Bench Pro while having lower latency across reasoning efforts.
We estimate latency by looking at the production behavior of our models, then simulating it offline. The latency estimate accounts for tool call duration (code execution time), sampled tokens, and input tokens. Real-world latency can vary substantially and depends on many factors not captured in our simulation. Reasoning efforts were swept from none to xhigh.
When toggled on, /fast mode in Codex delivers up to 1.5x faster token velocity with GPT‑5.4. It's the same model and the same intelligence, just faster. That means users can move more quickly through coding tasks, iteration, and debugging while staying in flow. Developers can access GPT‑5.4 at the same fast speeds via the API by using priority processing.
In internal evaluations and testing, we found that GPT‑5.4 excels at complex frontend tasks, with results that are noticeably more aesthetic and more functional than those of any model we've previously released.
To showcase the improved computer-use and coding capabilities working together, we're also releasing an experimental Codex skill called “Playwright (Interactive)”. This allows Codex to visually debug web and Electron apps; it can even be used to test an app it's building, as it's building it.
A theme park simulation game made by GPT‑5.4 from a single lightly specified prompt, using Playwright Interactive for browser playtesting and image generation for the isometric assets. The simulation includes tile-based path placement, ride and scenery construction, guest pathfinding, queueing, and ride cycles, while park metrics such as money, guest count, happiness, cleanliness, and rating rise or fall depending on how the layout performs and how guests respond. Playwright was used to automate browser playtests by building and expanding the park, placing and removing paths and attractions, checking camera navigation, and verifying that guests, queues, ride states, and UI metrics updated correctly over multiple rounds of play.
Prompt: Use $playwright-interactive and $imagegen. Create an interactive isometric theme park simulation game that I can build and navigate in the browser. Use imagegen to establish the overall visual vision and generate the game’s assets, including rides, paths, terrain, trees, water, food stalls, decorations, buildings, icons, and UI illustrations. The world should feel cohesive, polished, and visually rich, with a premium art direction that works well from an isometric perspective. Let me place and remove paths, add attractions, position scenery, and move around the park smoothly while monitoring guest activity, ride status, and park growth. Include believable guest movement, simple park management systems like money, cleanliness, queueing, and happiness, and make the experience feel playful, clear, and complete rather than like a rough prototype. Prioritize charm, readability, and strong game feel over realism.
When play testing, be sure to build and expand a park through several rounds of play, verify that placement and navigation work smoothly, confirm that guests react to the park layout and attractions, and ensure the visuals, UI, and interactions feel stable and cohesive.
“GPT-5.4 is now the leader on our internal benchmarks. Our engineers find it more natural and more assertive than previous models. It works through ambiguous problems without second-guessing itself, and it proactively parallelizes work to keep things moving.”
With GPT‑5.4, we've significantly improved how models work with external tools. Agents can now operate across larger tool ecosystems, choose the right tools more reliably, and complete multi-step workflows with lower cost and latency.
In the API, GPT‑5.4 introduces tool search, which allows models to work efficiently when given many tools.
Previously, when a model was given tools, all tool definitions were included in the prompt upfront. For systems with many tools, this could add thousands, or even tens of thousands, of tokens to every request, increasing cost, slowing responses, and crowding the context with information the model might never use.
With tool search, GPT‑5.4 instead receives a lightweight list of available tools along with a tool search capability. When the model needs to use a tool, it can look up that tool's definition and append it to the conversation at that moment.
This approach dramatically reduces the number of tokens required for tool-heavy workflows and preserves the cache, making requests faster and cheaper. It also enables agents to work reliably with much larger tool ecosystems. For MCP servers that may contain tens of thousands of tokens of tool definitions, the efficiency gains can be substantial.
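The idea of deferring tool definitions can be illustrated with a toy registry. This is a conceptual sketch only: it is not the API's tool search interface, every class, method, and tool name here is hypothetical, and character counts stand in for token counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDefinition:
    name: str
    summary: str
    schema: str  # full definition text, potentially very large


class ToolRegistry:
    """Hypothetical registry that exposes short listings and on-demand lookup."""

    def __init__(self, tools: list[ToolDefinition]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def lightweight_listing(self) -> list[str]:
        """One short line per tool: what the model sees upfront."""
        return [f"{t.name}: {t.summary}" for t in self._tools.values()]

    def search(self, query: str) -> list[ToolDefinition]:
        """Return full definitions whose name or summary mentions the query."""
        q = query.lower()
        return [
            t for t in self._tools.values()
            if q in t.name.lower() or q in t.summary.lower()
        ]


big_schema = "{" + "p" * 500 + "}"  # placeholder for a long JSON schema
registry = ToolRegistry([
    ToolDefinition("send_email", "Send an email message", big_schema),
    ToolDefinition("create_event", "Create a calendar event", big_schema),
    ToolDefinition("read_sheet", "Read rows from a spreadsheet", big_schema),
])

# Everything upfront versus listing plus only the definitions actually needed.
upfront_chars = sum(len(t.schema) for t in registry.search(""))
needed = registry.search("email")
deferred_chars = (
    sum(len(line) for line in registry.lightweight_listing())
    + sum(len(t.schema) for t in needed)
)
print(upfront_chars, deferred_chars)
```

Because the short listing stays the same from request to request, a design like this also keeps the start of the prompt stable, which is consistent with the cache-preservation benefit described above.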
To demonstrate the efficiency gains, we evaluated 250 tasks from Scale's MCP Atlas benchmark with all 36 MCP servers enabled in two configurations: (1) exposing every MCP function directly in the model's context, and (2) placing all MCP servers behind tool search. The tool-search configuration reduced total token usage by 47% while achieving the same accuracy.
The example token counts are the average across 250 tasks from the MCP Atlas public dataset.
GPT‑5.4 also improves tool calling, making it more accurate and efficient when deciding when and how to use tools during reasoning, especially in the API. Compared to GPT‑5.2, it achieves higher accuracy in fewer turns on Toolathlon, a benchmark testing how well AI agents can use real-world tools and APIs to complete multi-step tasks. For example, an agent needs to read emails, extract assignment attachments, upload them, grade them, and record the results in a spreadsheet.
A tool yield is when the assistant pauses to await tool responses. If 3 tools are called in parallel, followed by 3 more tools called in parallel, the number of yields would be 2. Tool yields are a better proxy for latency than tool calls because they reflect the benefits of parallelization.
For latency-sensitive use cases where a reasoning effort of none is preferred, GPT‑5.4 further improves on its predecessors.
In τ2-bench, the model must use tools to carry out a customer service task, where there may be a simulated user who can communicate and take actions on the world state. Reasoning effort was set to none.
GPT‑5.4 is better at agentic web search. On BrowseComp, a measure of how well AI agents can persistently browse the web to find hard-to-locate information, GPT‑5.4 leaps 17 percentage points (absolute) over GPT‑5.2, and GPT‑5.4 Pro sets a new state of the art of 89.3%.
In practice, this means GPT‑5.4 Thinking is stronger at answering questions that require pulling together information from many sources on the web. It can more persistently search across multiple rounds to identify the most relevant sources, particularly for “needle-in-a-haystack” questions, and then synthesize them into a clear, well-reasoned answer.
In BrowseComp, we used a search blocklist that excludes pages containing the benchmark's answers, to prevent contamination and ensure a fair measurement of performance. GPT‑5.4 was measured at a later date than GPT‑5.2, so scores reflect changes in the model, our search system, and the state of the internet. GPT‑5.4 was tested with a longer, updated blocklist. Models use the ChatGPT search tool, which may differ slightly from search in the API.
“GPT-5.4 xhigh is the new state of the art for multi-step tool use. Zapier runs some of the hardest tool-use evaluations in the industry, testing models on hundreds of advanced, real-world workflows. GPT-5.4 finished the job where previous models gave up, and it is the most persistent model to date.”
Similar to how Codex outlines its approach when it starts working, GPT‑5.4 Thinking in ChatGPT will now outline its work with a preamble for longer, more complex queries. You can also add instructions or adjust its direction mid-response. This makes it easier to guide the model toward the exact outcome you want without starting over or requiring multiple additional turns. This feature is available now on chatgpt.com and the Android app, and is coming soon to the iOS app.
The model can also think longer on difficult tasks while maintaining stronger awareness of earlier steps in the conversation. This allows it to handle longer workflows and more complex prompts while keeping answers coherent and relevant throughout.
This video is sped up for illustration purposes.
Over the past several months, we've continued to improve the safeguards we introduced with GPT‑5.3‑Codex while preparing GPT‑5.4 for release. As with GPT‑5.3‑Codex, we're treating GPT‑5.4 as High cyber capability under the Preparedness Framework, and deploying it with the corresponding protections as documented in the system card. These include an expanded cyber safety stack, including monitoring systems, trusted access controls, and asynchronous blocking of higher-risk requests for customers on Zero Data Retention (ZDR) surfaces, alongside ongoing investment in the broader security ecosystem.
Because cybersecurity capabilities are inherently dual-use, we maintain a precautionary approach to deployment while continuing to refine our policies and classifiers. For certain customers on ZDR surfaces, request-level blocking remains part of our cyber risk mitigation stack; because classifiers are still improving, some false positives may occur as we continue to refine these safeguards. These updates are intended to improve how safeguards work in practice, including reducing unnecessary refusals and overly cautious responses, while maintaining strong protections against misuse.
We've continued our safety research on Chain-of-Thought (CoT) monitorability to better understand how models reason and to help detect potential misbehavior. As part of this work, we're introducing a new open-source evaluation, CoT controllability, which measures whether models can deliberately obfuscate their reasoning to evade monitoring. We found that GPT‑5.4 Thinking's ability to control its CoT is low, which is a positive safety property, suggesting that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.
GPT‑5.4 is rolling out gradually today across ChatGPT and Codex. In the API, GPT‑5.4 is available now as gpt-5.4. GPT‑5.4 Pro is also available in the API as gpt-5.4-pro, for developers who need maximum performance on the most complex tasks.
In ChatGPT, GPT‑5.4 Thinking is available starting today to ChatGPT Plus, Team, and Pro users, replacing GPT‑5.2 Thinking. GPT‑5.2 Thinking will remain available for three months for paid users in the model picker under the Legacy Models section, after which it will be retired on June 5, 2026. Those on Enterprise and Edu plans can enable early access via admin settings. GPT‑5.4 Pro is available to Pro and Enterprise plans. Context windows in ChatGPT for GPT‑5.4 Thinking remain unchanged from GPT‑5.2 Thinking.
GPT‑5.4 is our first mainline reasoning model that incorporates the frontier coding capabilities of GPT‑5.3‑Codex and that is rolling out across ChatGPT, the API, and Codex. We're calling it GPT‑5.4 to reflect that jump, and to simplify the choice between models when using Codex. Over time, you can expect our Instant and Thinking models to evolve at different speeds.
GPT‑5.4 in Codex includes experimental support for the 1M context window. Developers can try this by configuring model_context_window and model_auto_compact_token_limit. Requests that exceed the standard 272K context window count against usage limits at 2x the normal rate.
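As a rough illustration of the 2x rule, the sketch below computes how many tokens a request would count against usage limits. It assumes the multiplier applies to the entire request once it exceeds 272K tokens; the text does not say whether the whole request or only the excess is counted at 2x, so this interpretation, along with the constant and function names, is an assumption.

```python
STANDARD_CONTEXT_TOKENS = 272_000  # standard context window quoted in the text
LONG_CONTEXT_MULTIPLIER = 2.0      # rate for requests above the standard window


def usage_charge(request_tokens: int) -> float:
    """Tokens counted against usage limits for one request.

    Assumption: the whole request is counted at 2x once it exceeds the
    standard window, rather than only the portion above 272K.
    """
    if request_tokens < 0:
        raise ValueError("request_tokens must be non-negative")
    over_standard = request_tokens > STANDARD_CONTEXT_TOKENS
    rate = LONG_CONTEXT_MULTIPLIER if over_standard else 1.0
    return request_tokens * rate


print(usage_charge(100_000))  # within the standard window
print(usage_charge(500_000))  # long-context request
```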
In the API, GPT‑5.4 is priced higher per token than GPT‑5.2 to reflect its improved capabilities, while its greater token efficiency helps reduce the total number of tokens required for many tasks. Batch and Flex pricing are available at half the standard API rate, while priority processing is available at twice the standard API rate.
API model | Input price | Cached input price | Output price |
gpt-5.2 | $1.75 / M tokens | $0.175 / M tokens | $14 / M tokens |
gpt-5.4 | $2.50 / M tokens | $0.25 / M tokens | $15 / M tokens |
gpt-5.2-pro | $21 / M tokens | - | $168 / M tokens |
gpt-5.4-pro | $30 / M tokens | - | $180 / M tokens |
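The pricing table and tier multipliers can be combined into a small cost estimator. The per-token prices and the half and double rates are taken from the text above; the function name, the treatment of input_tokens as uncached input only, and the assumption that a tier multiplier applies uniformly to input, cached input, and output are illustrative choices, not confirmed billing rules.

```python
from typing import Optional

# Dollars per 1M tokens, from the pricing table (None = no cached rate listed).
PRICES: dict[str, dict[str, Optional[float]]] = {
    "gpt-5.2": {"input": 1.75, "cached_input": 0.175, "output": 14.0},
    "gpt-5.4": {"input": 2.50, "cached_input": 0.25, "output": 15.0},
    "gpt-5.2-pro": {"input": 21.0, "cached_input": None, "output": 168.0},
    "gpt-5.4-pro": {"input": 30.0, "cached_input": None, "output": 180.0},
}

# Batch and Flex at half the standard rate, priority processing at twice.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}


def estimate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    tier: str = "standard",
) -> float:
    """Estimated cost in dollars for one request (uncached input counted separately)."""
    price = PRICES[model]
    cached_rate = price["cached_input"]
    if cached_input_tokens and cached_rate is None:
        raise ValueError(f"no cached input price listed for {model}")
    dollars = (
        input_tokens * price["input"]
        + cached_input_tokens * (cached_rate or 0.0)
        + output_tokens * price["output"]
    ) / 1_000_000
    return dollars * TIER_MULTIPLIER[tier]


print(estimate_cost("gpt-5.4", 1_000_000, 1_000_000))                # 17.5
print(estimate_cost("gpt-5.4", 1_000_000, 1_000_000, tier="batch"))  # 8.75
```

Comparing the same token counts across models shows only the price difference; any savings from GPT‑5.4 using fewer tokens per task would have to be measured on a real workload.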
Professional
Eval | GPT‑5.4 | GPT‑5.4 Pro | GPT‑5.3-Codex | GPT‑5.2 | GPT‑5.2 Pro |
GDPval | 83.0% | 82.0% | 70.9% | 70.9% | 74.1% |
FinanceAgent v1.1 | 56.0% | 61.5% | 54.0% | 59.5% | — |
Investment Banking Modeling Tasks (Internal) | 87.3% | 83.6% | 79.3% | 68.4% | 71.7% |
OfficeQA | 68.1% | — | 65.1% | 63.1% | — |
Coding
Eval | GPT‑5.4 | GPT‑5.4 Pro | GPT‑5.3-Codex | GPT‑5.2 | GPT‑5.2 Pro |
SWE-Bench Pro (Public) | 57.7% | — | 56.8% | 55.6% | — |
Terminal-Bench 2.0 | 75.1% | — | 77.3% | 62.2% | — |
Computer use and vision
Eval | GPT‑5.4 | GPT‑5.4 Pro | GPT‑5.3-Codex | GPT‑5.2 | GPT‑5.2 Pro |
OSWorld-Verified | 75.0% | — | 74.0% | 47.3% | — |
MMMU Pro (no tools) | 81.2% | — | — | 79.5% | — |
MMMU Pro (with tools) | 82.1% | — | — | 80.4% | — |
Tool use
Eval | GPT‑5.4 | GPT‑5.4 Pro | GPT‑5.3-Codex | GPT‑5.2 | GPT‑5.2 Pro |
BrowseComp | 82.7% | 89.3% | 77.3% | 65.8% | 77.9% |
MCP Atlas | 67.2% | — | — | 60.6% | — |
Toolathlon | 54.6% | — | 51.9% | 45.7% | — |
Tau2-bench Telecom | 98.9% | — | — | 98.7% | — |
Academic
Eval | GPT‑5.4 | GPT‑5.4 Pro | GPT‑5.3-Codex | GPT‑5.2 | GPT‑5.2 Pro |
Frontier Science Research | 33.0% | 36.7% | — | 25.2% | — |
FrontierMath Tier 1–3 | 47.6% | 50.0% | — | 40.7% | — |
FrontierMath Tier 4 | 27.1% | 38.0% | — | 18.8% | 31.3% |
GPQA Diamond | 92.8% | 94.4% | 92.6% | 92.4% | 93.2% |
Humanity's Last Exam (no tools) | 39.8% | 42.7% | — | 34.5% | 36.6% |
Humanity's Last Exam (with tools) | 52.1% | 58.7% | — | 45.5% | 50.0% |
Long context
Eval | GPT‑5.4 | GPT‑5.4 Pro | GPT‑5.3-Codex | GPT‑5.2 | GPT‑5.2 Pro |
Graphwalks BFS 0K–128K | 93.0% | — | — | 94.0% | — |
Graphwalks BFS 256K–1M | 21.4% | — | — | — | — |
Graphwalks parents 0K–128K (accuracy) | 89.8% | — | — | 89.0% | — |
Graphwalks parents 256K–1M (accuracy) | 32.4% | — | — | — | — |
OpenAI MRCR v2 8-needle 4K–8K | 97.3% | — | — | 98.2% | — |
OpenAI MRCR v2 8-needle 8K–16K | 91.4% | — | — | 89.3% | — |
OpenAI MRCR v2 8-needle 16K–32K | 97.2% | — | — | 95.3% | — |
OpenAI MRCR v2 8-needle 32K–64K | 90.5% | — | — | 92.0% | — |
OpenAI MRCR v2 8-needle 64K–128K | 86.0% | — | — | 85.6% | — |
OpenAI MRCR v2 8-needle 128K–256K | 79.3% | — | — | 77.0% | — |
OpenAI MRCR v2 8-needle 256K–512K | 57.5% | — | — | — | — |
OpenAI MRCR v2 8-needle 512K–1M | 36.6% | — | — | — | — |
Abstract reasoning
Eval | GPT‑5.4 | GPT‑5.4 Pro | GPT‑5.3-Codex | GPT‑5.2 | GPT‑5.2 Pro |
ARC-AGI-1 (Verified) | 93.7% | 94.5% | — | 86.2% | 90.5% |
ARC-AGI-2 (Verified) | 73.3% | 83.3% | — | 52.9% | 54.2% (high) |
Evals without reasoning
Eval | GPT‑5.4 | GPT‑5.2 | GPT‑4.1 |
OmniDocBench (normalized edit distance) | 0.109 | 0.140 | — |
Tau2-bench Telecom | 64.3% | 57.2% | 43.6% |
Evals were run with reasoning effort set to xhigh, except where otherwise specified. Benchmarks were run in a research environment, which may sometimes give slightly different output from production ChatGPT.
Footnotes
[1] Human performance is as reported in OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments.


