March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

At OpenAI, we have long believed that image generation should be a primary capability of our language models. That's why we've built our most advanced image generator yet into GPT‑4o. The result is image generation that is not only beautiful, but also useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the earliest cave paintings to modern infographics, humans have always used images to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the everyday imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when they are augmented with symbols that refer to a shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you have in mind, helping you communicate more effectively through visuals and turning image generation into a practical, precise, and powerful tool.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but also how they relate to each other. Combined with intensive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.
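As a rough illustration of what modeling a joint distribution over images and text can look like, the sketch below writes the standard chain-rule factorization used by autoregressive models. It assumes, purely for illustration, that text and image content are serialized into one token sequence $x_1, \dots, x_T$ with model parameters $\theta$; both the symbols and the serialization are assumptions, not a description of GPT‑4o's actual architecture.

```latex
p_\theta(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p_\theta\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

Under that assumption, a single next-token objective covers captions, interleaved documents, and image sequences alike, which is one way a model could learn how images relate both to language and to each other.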

Text rendering

A picture is worth a thousand words, but sometimes adding a few words in the right place can elevate an image's meaning. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on the images and text in the chat context, keeping results consistent throughout the conversation. For example, if you are designing a video game character, the character's appearance stays coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed prompts with close attention to detail. While other systems struggle with around 5-8 objects, GPT‑4o can handle up to 10-20 different objects. The tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}
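A prompt like the grid above is easy to get wrong by hand (for example, 15 items for a 16-cell grid). The following is a minimal sketch, assuming you want to assemble such a prompt programmatically; `buildGridPrompt` and `cellOf` are hypothetical helper names, and nothing here calls an image API.

```javascript
// Hypothetical helper: builds a numbered grid prompt like the one above.
// It only assembles prompt text; it does not call any image API.
function buildGridPrompt(items, rows, cols) {
  if (!Number.isInteger(rows) || !Number.isInteger(cols) || rows < 1 || cols < 1) {
    throw new RangeError("rows and cols must be positive integers");
  }
  if (items.length !== rows * cols) {
    throw new RangeError(
      `expected ${rows * cols} items for a ${rows}x${cols} grid, got ${items.length}`
    );
  }
  const header =
    `A square image containing a ${rows} row by ${cols} column grid ` +
    `containing ${items.length} objects on a white background. ` +
    `Go from left to right, top to bottom. Here's the list:`;
  const list = items.map((item, i) => `${i + 1}. ${item}`);
  return [header, ...list].join("\n");
}

// Map a 1-based list position to its 1-based (row, col) cell in reading order.
function cellOf(position, cols) {
  const index = position - 1;
  return { row: Math.floor(index / cols) + 1, col: (index % cols) + 1 };
}

const gridPrompt = buildGridPrompt(
  ["a blue star", "red triangle", "green square", "pink circle"],
  2,
  2
);
console.log(gridPrompt);
console.log(cellOf(3, 2)); // the third item lands in row 2, column 1
```

Knowing which cell each item should occupy (via `cellOf`) also makes it straightforward to check a generated image against the request by eye.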

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation enables 4o to link its knowledge between text and images, resulting in a smarter and more efficient model.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a wide variety of styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model isn't perfect. We're aware of multiple limitations at the moment, which we will work to address through model improvements after the initial launch.

We've noticed that GPT‑4o can occasionally crop longer images, such as posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we are working to enable safe, high-utility content and support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which identifies the image as coming from GPT‑4o, to provide transparency. We've also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
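For readers who want a quick first look at whether a file might carry such metadata, here is a heuristic sketch. It assumes, per the public C2PA specification, that manifests are embedded as JUMBF boxes (box type `jumb`) whose manifest store is labeled `c2pa`; the function names are hypothetical, and a byte scan like this neither validates signatures nor confirms which tool produced the image.

```javascript
// Heuristic sketch only: checks whether raw image bytes contain markers that
// suggest an embedded C2PA manifest. It does NOT validate signatures or
// confirm provenance; use a real C2PA validator for that.

// Find an ASCII marker inside a byte array (naive search, fine for a sketch).
function containsAscii(bytes, marker) {
  const needle = Array.from(marker, (ch) => ch.charCodeAt(0));
  outer: for (let i = 0; i + needle.length <= bytes.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (bytes[i + j] !== needle[j]) continue outer;
    }
    return true;
  }
  return false;
}

// Assumption: a C2PA manifest store lives in a JUMBF box ("jumb") and is
// labeled "c2pa". Both markers must be present for a positive hint.
function mayContainC2pa(bytes) {
  return containsAscii(bytes, "jumb") && containsAscii(bytes, "c2pa");
}

// Synthetic byte strings for illustration (these are not real image files).
const toBytes = (text) => Uint8Array.from(text, (ch) => ch.charCodeAt(0));

console.log(mayContainC2pa(toBytes("....jumb....jumdc2pa...."))); // true
console.log(mayContainC2pa(toBytes("plain image bytes")));        // false
```

A positive result only means the file is worth passing to a proper C2PA verification tool; metadata can also be stripped by re-encoding, so a negative result says nothing about an image's origin.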

Blocking harmful content
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are in context, we apply heightened restrictions on the kind of imagery that can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an area of ongoing investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.

For more on our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we've trained a reasoning LLM to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help us identify and resolve ambiguities in our policies. Together with our multimodal advancements and the existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation is available starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It is also available in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including any specifics such as aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed images, rendering takes longer, often up to one minute.
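As a small illustration of spelling out those specifics, the sketch below turns structured options into a plain-language request you could paste into a chat. The helper `describeImageRequest` and its option names (`aspectRatio`, `hexColors`, `transparentBackground`) are hypothetical and are not parameters of any OpenAI API; the code only validates the inputs and builds text.

```javascript
// Hypothetical helper: turns structured options into a plain-language request.
// The option names are illustrative; they are not parameters of any real API.
const HEX_COLOR = /^#[0-9a-fA-F]{6}$/;
const ASPECT_RATIO = /^[1-9]\d*:[1-9]\d*$/;

function describeImageRequest({
  subject,
  aspectRatio,
  hexColors = [],
  transparentBackground = false,
}) {
  if (!subject || !subject.trim()) throw new Error("subject is required");
  const parts = [subject.trim()];

  if (aspectRatio !== undefined) {
    if (!ASPECT_RATIO.test(aspectRatio)) {
      throw new Error(`invalid aspect ratio: ${aspectRatio}`);
    }
    parts.push(`Use a ${aspectRatio} aspect ratio.`);
  }

  // Reject malformed colors early so the request states exact hex codes.
  for (const color of hexColors) {
    if (!HEX_COLOR.test(color)) throw new Error(`invalid hex color: ${color}`);
  }
  if (hexColors.length > 0) {
    parts.push(`Use exactly these colors: ${hexColors.join(", ")}.`);
  }

  if (transparentBackground) parts.push("Use a transparent background.");
  return parts.join(" ");
}

console.log(
  describeImageRequest({
    subject: "A minimalist fox logo.",
    aspectRatio: "16:9",
    hexColors: ["#FF6600", "#1A1A1A"],
    transparentBackground: true,
  })
);
```

Validating the hex codes and ratio up front is a design choice for this sketch: it catches typos such as `#FF660` before they reach the conversation.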

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image Generation

Jackie Shannon: ChatGPT Product

Mengchao Zhong, Wayne Chang: ChatGPT Engineering

Rohan Sahai: Sora Product and Engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal Organization

Research

Foundational Research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core Research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research Contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model Behavior

Laurentia Romaniuk

Multimodal Organization

Andrew Gibiansky, Yang Lu

Data

Data Leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Personal Data Advisors

Long Ouyang

Scaling

Inference Leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT Product Lead

Jackie Shannon

ChatGPT Technical Leads

Mengchao Zhong, Wayne Chang

Product Design Lead

Matt Chan

Data Science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora Product Leads

Rohan Sahai, Wesam Manassra

Sora Product and Engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety Lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing and Communications

Communications and Marketing Leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design & Creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special Thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco
