March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

At OpenAI, we have long believed that image generation should be a primary capability of our language models. That's why we've built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and drawing on GPT‑4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and advancing image generation into a practical tool with precision and power.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other. Combined with aggressive post-training, the resulting model has surprising visual fluency, capable of generating images that are useful, consistent, and context-aware.
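
The post does not spell out the training objective. As a rough illustration only, the whiteboard prompt earlier mentions modeling p(text, pixels, sound) "with one big autoregressive transformer"; assuming that reading, a joint distribution over a mixed sequence of text and image tokens can be factorized token by token as sketched below. The symbols x_i and N are introduced here for illustration and do not come from the source.

```latex
% Assumption: x_1, ..., x_N is a single interleaved sequence of text and image tokens.
% Standard autoregressive (chain rule) factorization of the joint distribution:
p(x_1, \dots, x_N) \;=\; \prod_{i=1}^{N} p\left(x_i \mid x_1, \dots, x_{i-1}\right)
```

Under this factorization each token is predicted from everything before it, regardless of modality, which is one way a single model could learn how images relate both to text and to other images.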

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on the images and text in the chat context, keeping everything consistent throughout. For example, if you're designing a video game character, the character's appearance stays coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed prompts with close attention to detail. While other systems struggle with ~5–8 objects, GPT‑4o can handle up to 10–20 different objects. The tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}
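
The grid prompt above follows a regular pattern: a header stating the grid size, then a numbered list read left to right, top to bottom. As a minimal sketch, the helper below builds a prompt of that shape and maps a list position to its grid cell. Both function names are hypothetical illustrations, not part of any OpenAI tooling, and the code only assembles text; it does not call a model.

```javascript
// Hypothetical helper: builds a grid prompt in the same shape as the
// 16-object example above. It only produces a string.
function buildGridPrompt(objects, rows, cols) {
  if (objects.length !== rows * cols) {
    throw new Error(`expected ${rows * cols} objects, got ${objects.length}`);
  }
  const header =
    `A square image containing a ${rows} row by ${cols} column grid containing ` +
    `${objects.length} objects on a white background. ` +
    `Go from left to right, top to bottom. Here's the list:`;
  // Number the items starting at 1, matching the list style of the example.
  const items = objects.map((name, i) => `${i + 1}. ${name}`);
  return [header, ...items].join("\n");
}

// Maps a 1-based list position to its 1-based (row, col) cell,
// assuming row-major order (left to right, then top to bottom).
function gridCell(position, cols) {
  const index = position - 1;
  return { row: Math.floor(index / cols) + 1, col: (index % cols) + 1 };
}
```

With a 4-column grid, `gridCell(8, 4)` places item 8 (the tiedye "42") at the end of the second row, which makes it easy to check a generated image against the list.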

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation enables 4o to link its knowledge between text and images, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a wide variety of image styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model isn't perfect. We're aware of several limitations at the moment, and we're working to address them through model improvements after the initial launch.

We've noticed that GPT‑4o can occasionally crop longer images, such as posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, all while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we're working to enable safe, high-utility content and support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which identifies an image as coming from GPT‑4o, to provide transparency. We've also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
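
As a rough sketch of what checking for C2PA metadata could look like, the code below walks the chunks of a PNG file and reports whether a `caBX` chunk is present. It assumes the PNG embedding described in the C2PA specification, where the manifest store is carried in a chunk of that type; the post itself does not say which file format or embedding is used. The function names are hypothetical, and this is only a presence check: it does not parse the manifest or validate any signature, which a proper C2PA verification tool would do.

```javascript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper: returns the chunk type names found in a PNG byte array.
// Each PNG chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
function listPngChunkTypes(bytes) {
  if (bytes.length < 8) throw new Error("not a PNG file");
  for (let i = 0; i < 8; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) throw new Error("not a PNG file");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const types = [];
  let offset = 8;
  while (offset + 8 <= bytes.length) {
    const length = view.getUint32(offset); // big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7]
    );
    types.push(type);
    offset += 12 + length; // skip length, type, data, and CRC fields
  }
  return types;
}

// Assumption: a C2PA manifest store in a PNG lives in a chunk of type "caBX".
// This says nothing about whether the manifest is valid or who signed it.
function hasC2paChunk(bytes) {
  return listPngChunkTypes(bytes).includes("caBX");
}
```

In Node.js, the bytes could come from `new Uint8Array(fs.readFileSync(path))`. A result of `false` does not prove an image was not generated by a model, since metadata can be stripped when an image is re-encoded or screenshotted.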

Blocking the bad stuff
We're continuing to block requests for generated images that may violate our content policies, such as child sexual abuse materials and sexual deepfakes. When images of real people are in context, we apply heightened restrictions on what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished and is rather an ongoing area of investment. As we learn more about real-world use of this model, we'll adjust our policies accordingly.

For more on our approach, visit the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we've trained a reasoning LLM to work directly from human-written and interpretable safety specifications. We used this reasoning LLM during development to help us identify and address ambiguities in our policies. Together with our multimodal advancements and existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It's also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out over the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including any specifics such as aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
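
The specifics mentioned above (aspect ratio, hex colors, transparent background) are all plain text in the request. As a small illustration, the hypothetical helper below validates those details and folds them into a single description that could be pasted into a chat. It is a sketch of one way to keep requests precise, not an OpenAI API; the function name, its parameters, and the exact wording it produces are assumptions.

```javascript
// Hypothetical helper: assembles a text description from structured options.
// It only builds a string and does not send anything to a model.
function describeImageRequest({ subject, aspectRatio, hexColors = [], transparentBackground = false }) {
  // Aspect ratio must look like "16:9" or "1:1".
  if (!/^\d+:\d+$/.test(aspectRatio)) {
    throw new Error(`invalid aspect ratio: ${aspectRatio}`);
  }
  // Each color must be a six-digit hex code such as "#1A1A1A".
  for (const color of hexColors) {
    if (!/^#[0-9a-fA-F]{6}$/.test(color)) {
      throw new Error(`invalid hex color: ${color}`);
    }
  }
  const parts = [subject, `Aspect ratio ${aspectRatio}.`];
  if (hexColors.length > 0) {
    parts.push(`Use exactly these colors: ${hexColors.join(", ")}.`);
  }
  if (transparentBackground) {
    parts.push("Transparent background.");
  }
  return parts.join(" ");
}
```

For example, calling it with the subject "A minimalist fox logo.", the ratio "1:1", the colors `#FF6600` and `#1A1A1A`, and a transparent background yields one sentence-per-detail description; validating up front catches typos such as a five-digit hex code before a slow render.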

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image Generation

Jackie Shannon: ChatGPT Product

Mengchao Zhong, Wayne Chang: ChatGPT Engineering

Rohan Sahai: Sora Product & Engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal Organization

Research

Foundational Research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core Research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research Contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model Behavior

Laurentia Romaniuk

Multimodal Organization

Andrew Gibiansky, Yang Lu

Data

Data Leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human Data Advisors

Long Ouyang

Scaling

Inference Leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT Product Lead

Jackie Shannon

ChatGPT Engineering Leads

Mengchao Zhong, Wayne Chang

Product Design Lead

Matt Chan

Data Science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora Product Leads

Rohan Sahai, Wesam Manassra

Sora Product & Engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety Lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing & Comms

Comms & Marketing Leads

Minnia Feng, Natalie Summers, Taya Christianson

Comms

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design & Creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special Thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco