March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.


At OpenAI, we have long believed that image generation should be a primary capability of our language models. That's why we've built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}
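The whiteboard in the first prompt above sketches a framing, not a documented architecture: model text, pixels, and sound jointly with one big autoregressive transformer. As an illustration only, with symbols introduced here rather than taken from the post, the chain rule turns a joint distribution over one interleaved token sequence x_1, ..., x_T (tokens from any modality) into a product of next-token predictions:

```latex
p(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

Each factor is what a single transformer would predict at step t, regardless of which modality x_t belongs to. The board's proposed fixes (modeling compressed representations and pairing the autoregressive prior with a diffusion decoder) change what the tokens represent and how pixels are produced, not this factorization; the post does not confirm that GPT‑4o is built exactly this way.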

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and drawing on 4o's inherent knowledge base and the chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and turning image generation into a practical tool with precision and power.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other. Combined with intensive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on the images and text in the chat context, keeping results consistent throughout. For example, if you're designing a video game character, the character's appearance stays coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o image generation follows detailed prompts closely, attending to fine details. While other systems struggle with roughly 5–8 objects, GPT‑4o can handle up to 10–20 distinct objects. Tighter binding of objects to their attributes and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}
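The grid prompt above relies on row-major ordering ("left to right, top to bottom"), so each numbered item maps to a fixed cell. A minimal sketch of how such a prompt could be assembled and how an item number maps to a row and column; `gridPosition` and `buildGridPrompt` are hypothetical helpers written for this illustration, not part of ChatGPT or any OpenAI API.

```javascript
// Hypothetical helper: map a 1-based item number in a row-major grid
// ("left to right, top to bottom") to zero-based row and column indices.
function gridPosition(index, columns = 4) {
  if (!Number.isInteger(index) || index < 1) {
    throw new RangeError("index must be a positive integer");
  }
  const zeroBased = index - 1;
  return { row: Math.floor(zeroBased / columns), col: zeroBased % columns };
}

// Hypothetical helper: build a numbered grid prompt in the style of the example above.
function buildGridPrompt(items, rows = 4, columns = 4) {
  if (items.length !== rows * columns) {
    throw new RangeError(`expected ${rows * columns} items, got ${items.length}`);
  }
  const header =
    `A square image containing a ${rows} row by ${columns} column grid ` +
    `containing ${items.length} objects on a white background. ` +
    "Go from left to right, top to bottom. Here's the list:";
  const lines = items.map((item, i) => `${i + 1}. ${item}`);
  return [header, ...lines].join("\n");
}

// Item 9 ("an orange cat wearing a black baseball cap") lands in the third row, first column.
console.log(gridPosition(9)); // { row: 2, col: 0 }
```

Keeping the list and the layout arithmetic in one place makes it easy to check, after generation, whether each object ended up in the cell the prompt asked for.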

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation enables 4o to link its knowledge between text and images, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
    <!-- The example modules import the bare specifier 'three', so map it (and the addons path) to the pinned CDN build -->
    <script type="importmap">
      {
        "imports": {
          "three": "https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js",
          "three/addons/": "https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/"
        }
      }
    </script>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'three';
      import { OrbitControls } from 'three/addons/controls/OrbitControls.js';
      import { FontLoader } from 'three/addons/loaders/FontLoader.js';
      import { TextGeometry } from 'three/addons/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setPixelRatio(window.devicePixelRatio); // sharp rendering on high-DPI screens
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        texture.colorSpace = THREE.SRGBColorSpace; // color images should be tagged as sRGB
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2, // extrusion depth (named 'height' in three r160)
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>
```

make an image of what this means to you

Photorealism and style

Training on images that reflect a vast variety of styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model isn't perfect. We are currently aware of several limitations, which we will work to address through model improvements after the initial launch.

We've noticed that GPT‑4o can occasionally crop longer images, like posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases like game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below, we describe additional risk areas we are working on to enable safe, high-utility content and support broader creative expression for users.

Provenance via C2PA and internal reverse search
To support transparency, all generated images include C2PA metadata, which identifies an image as generated by GPT‑4o. We've also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
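The C2PA specification embeds the manifest store in PNG files as a chunk of type `caBX`. As a rough sketch only, assuming a PNG image and a plain byte buffer as input, the functions below walk the PNG chunk list and report whether such a chunk is present. They do not check CRCs, verify the manifest's signature, or confirm that an image came from GPT‑4o (that requires a full C2PA validator such as the open-source c2patool), and they say nothing about OpenAI's internal search tool.

```javascript
// Every PNG file starts with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// List the chunk types in a PNG byte buffer (Uint8Array or Node Buffer).
// Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
// CRCs are skipped, not verified.
function listPngChunkTypes(bytes) {
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) throw new Error("not a PNG file");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const types = [];
  let offset = PNG_SIGNATURE.length;
  while (offset + 8 <= bytes.length) {
    const length = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(...bytes.subarray(offset + 4, offset + 8));
    types.push(type);
    offset += 12 + length; // length field + type field + data + CRC
    if (type === "IEND") break; // IEND is always the last chunk
  }
  return types;
}

// Presence check only: a "caBX" chunk means a C2PA manifest store is embedded,
// not that its signature is valid.
function hasC2paManifestChunk(bytes) {
  return listPngChunkTypes(bytes).includes("caBX");
}
```

In Node.js, a file could be checked with `hasC2paManifestChunk(fs.readFileSync("image.png"))`, where `image.png` is a placeholder path. Note that metadata can be stripped (for example, by screenshots or re-encoding), so a missing chunk does not prove an image was not generated.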

Blocking harmful content
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are involved, we apply heightened restrictions on what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an area of ongoing investment. As we learn more about real-world use of this model, we'll adjust our policies accordingly.

To learn more about our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we've trained a reasoning LLM to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help us identify and resolve ambiguities in our policies. Together with our multimodal advancements and the existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

Starting today, 4o image generation is rolling out to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access for Enterprise and Edu coming soon. It is also available in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out over the next few weeks.

Creating and customizing images with GPT‑4o is as simple as chatting: just describe what you need, including any specifics such as the aspect ratio, exact colors as hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
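Because these specifications are simply part of the natural-language request, one way to keep them consistent across many requests is to assemble them programmatically. The sketch below is illustrative only: `buildImageRequest` and its option names are hypothetical and not part of ChatGPT or any OpenAI API. It just validates the aspect ratio and hex codes and formats the text you would then send as a prompt.

```javascript
// Accepts #RGB or #RRGGBB hex color codes.
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;

// Hypothetical helper: fold explicit specifications (aspect ratio, hex colors,
// transparent background) into a plain-language image request.
function buildImageRequest({ description, aspectRatio, colors = [], transparentBackground = false }) {
  const parts = [description.trim()];
  if (aspectRatio) {
    const match = /^(\d+):(\d+)$/.exec(aspectRatio);
    if (!match || Number(match[1]) === 0 || Number(match[2]) === 0) {
      throw new RangeError(`invalid aspect ratio: ${aspectRatio}`);
    }
    parts.push(`Use a ${aspectRatio} aspect ratio.`);
  }
  for (const color of colors) {
    if (!HEX_COLOR.test(color)) throw new RangeError(`invalid hex color: ${color}`);
  }
  if (colors.length > 0) parts.push(`Use exactly these colors: ${colors.join(", ")}.`);
  if (transparentBackground) parts.push("Make the background transparent.");
  return parts.join(" ");
}

console.log(buildImageRequest({
  description: "A flat icon of a paper airplane.",
  aspectRatio: "1:1",
  colors: ["#1A1A1A", "#00FFCC"],
  transparentBackground: true,
}));
// A flat icon of a paper airplane. Use a 1:1 aspect ratio. Use exactly these colors: #1A1A1A, #00FFCC. Make the background transparent.
```

Validating hex codes before sending catches typos like `#00FFC` that the model would otherwise have to guess at.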

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image generation

Jackie Shannon: ChatGPT product

Mengchao Zhong, Wayne Chang: ChatGPT engineering

Rohan Sahai: Sora product and engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal organization

Research

Foundational research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model behavior

Laurentia Romaniuk

Multimodal organization

Andrew Gibiansky, Yang Lu

Data

Data leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human resources advisors

Long Ouyang

Scaling

Inference leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT product lead

Jackie Shannon

ChatGPT engineering leads

Mengchao Zhong, Wayne Chang

Product design lead

Matt Chan

Data science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora product leads

Rohan Sahai, Wesam Manassra

Sora product and engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing and communications

Communications and marketing leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design and creative

Leadership

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco