March 25, 2025

Introducing 4o image generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

At OpenAI, we have long believed that image generation should be a primary capability of our language models. That is why we have built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and turning image generation into a practical tool with precision and power.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other. Combined with aggressive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.
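As a hedged illustration of what modeling a joint distribution over images and text can mean (the post does not give the actual formulation), one standard approach flattens text and image content into a single token sequence $x_1, \dots, x_N$ and factorizes the joint probability with the chain rule:

```latex
p(x_1, \dots, x_N) \;=\; \prod_{i=1}^{N} p\left(x_i \mid x_1, \dots, x_{i-1}\right)
```

Here each $x_i$ may be a text token or an image token. That shared tokenization is an assumption made for illustration, not a description of GPT‑4o's architecture.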

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on the images and text in the chat context, ensuring consistency throughout. For example, if you are designing a video game character, the character's appearance remains coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed prompts with attention to detail. While other systems struggle with roughly 5-8 objects, GPT‑4o can handle up to 10-20 different objects. The tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation enables 4o to link its knowledge between text and image, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a vast variety of image styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model is not perfect. We are aware of several limitations at the moment, which we will address through model improvements after the initial launch.

We have noticed that GPT‑4o can occasionally crop longer images, such as posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we are working to enable safe, high-utility content and to support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which identifies an image as coming from GPT‑4o, to provide transparency. We have also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
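As a rough sketch of how a reader might check a downloaded image for embedded C2PA data, the snippet below scans a PNG file's chunks for the `caBX` chunk type, which the C2PA specification uses to embed a manifest store in PNG files. It assumes the image was saved as a PNG and that the metadata survived any re-encoding. It only detects the presence of the chunk and does not validate the manifest or its signatures. The function names are hypothetical helpers, not part of any OpenAI or C2PA tooling.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
C2PA_CHUNK_TYPE = b"caBX"  # PNG chunk type used for an embedded C2PA manifest store


def iter_png_chunks(data: bytes):
    """Yield (chunk_type, chunk_body) pairs from raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
    while offset + 8 <= len(data):
        (length,) = struct.unpack(">I", data[offset:offset + 4])
        chunk_type = data[offset + 4:offset + 8]
        body = data[offset + 8:offset + 8 + length]
        yield chunk_type, body
        offset += 12 + length


def has_c2pa_chunk(data: bytes) -> bool:
    """Return True if the PNG bytes contain a caBX chunk (presence check only)."""
    return any(chunk_type == C2PA_CHUNK_TYPE for chunk_type, _ in iter_png_chunks(data))


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build a well-formed PNG chunk; used here only to construct test inputs."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)
```

For a real file, `has_c2pa_chunk(open("image.png", "rb").read())` would report whether such a chunk is present. Verifying the manifest itself needs a dedicated C2PA validator.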

Blocking the bad stuff
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are in context, we apply heightened restrictions on what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished and is instead an ongoing area of investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.

For more on our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we trained a reasoning LLM to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help us identify and resolve ambiguities in our policies. Together with our multimodal advancements and the existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It is also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the coming weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including any specifics such as aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
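To make the kinds of specifics mentioned above concrete, here is a minimal sketch of a helper that assembles a plain-text prompt with an aspect ratio, hex color codes, and a transparent-background request. The function `build_image_prompt` and its wording are hypothetical illustrations. It does not call any API, and the phrasing it produces is only one plausible way to state these constraints in a chat message.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")
ASPECT_RATIO = re.compile(r"\d+:\d+")


def build_image_prompt(description, aspect_ratio=None, hex_colors=(),
                       transparent_background=False):
    """Compose a prompt string that spells out common image specifics."""
    parts = [description.strip()]

    if aspect_ratio is not None:
        if not ASPECT_RATIO.fullmatch(aspect_ratio):
            raise ValueError(f"aspect ratio must look like '16:9', got {aspect_ratio!r}")
        parts.append(f"Aspect ratio: {aspect_ratio}.")

    hex_colors = list(hex_colors)
    for color in hex_colors:
        # Reject anything that is not a six-digit hex code such as #FF6600.
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"invalid hex color: {color!r}")
    if hex_colors:
        parts.append("Use exactly these colors: " + ", ".join(hex_colors) + ".")

    if transparent_background:
        parts.append("Use a transparent background.")

    return " ".join(parts)
```

For example, `build_image_prompt("A flat logo of a fox.", aspect_ratio="1:1", hex_colors=["#FF6600"], transparent_background=True)` produces a single message that can be pasted into a chat.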

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image generation

Jackie Shannon: ChatGPT product

Mengchao Zhong, Wayne Chang: ChatGPT engineering

Rohan Sahai: Sora product and engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal organization

Research

Foundational research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model behavior

Laurentia Romaniuk

Multimodal organization

Andrew Gibiansky, Yang Lu

Data

Data leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human data advisors

Long Ouyang

Scaling

Inference leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT product lead

Jackie Shannon

ChatGPT engineering leads

Mengchao Zhong, Wayne Chang

Product design lead

Matt Chan

Data science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora product leads

Rohan Sahai, Wesam Manassra

Sora product and engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing & communications

Communications and marketing leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design & creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco
