March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

Try in ChatGPT


At OpenAI, we have long believed that image generation should be a primary capability of our language models. That is why we have built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at rendering text accurately, following prompts precisely, and drawing on 4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create the image you have in mind, helping you communicate more effectively through visuals and turning image generation into a practical, precise, and powerful tool.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other. Combined with aggressive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on images and text in the chat context, ensuring consistency throughout. For example, if you are designing a video game character, the character's appearance stays coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed prompts with close attention to detail. While other systems struggle with roughly 5 to 8 objects, GPT‑4o can handle up to 10 to 20 different objects. The tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation enables 4o to link its knowledge between text and images, resulting in a model that is smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a vast range of styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model is not perfect. We are aware of several limitations at the moment, which we will work to address through model improvements after the initial launch.

We have noticed that GPT‑4o can occasionally crop long images, such as posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we are working to enable safe, high-utility content and to support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which will identify an image as coming from GPT‑4o, to provide transparency. We have also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
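
As a rough illustration of what "comes with C2PA metadata" can mean for someone holding a file, here is a minimal sketch that scans raw bytes for the markers of an embedded manifest. It assumes the manifest is stored in a JUMBF box whose bytes include the ASCII strings `jumb` and `c2pa`; that assumption comes from the C2PA specification, not from this post. The function names are hypothetical, and this is a presence heuristic only. It does not validate the manifest or its signature, which requires a C2PA-aware verifier.

```javascript
// Heuristic only: finding these markers does not validate a manifest
// or its signature. All names here are hypothetical.

// Return the first index at which the ASCII string `text` occurs in `bytes`, or -1.
function indexOfAscii(bytes, text) {
  for (let i = 0; i + text.length <= bytes.length; i++) {
    let match = true;
    for (let j = 0; j < text.length; j++) {
      if (bytes[i + j] !== text.charCodeAt(j)) {
        match = false;
        break;
      }
    }
    if (match) return i;
  }
  return -1;
}

// True when both the JUMBF box type and the C2PA label appear somewhere in the bytes.
function mayContainC2paManifest(bytes) {
  return indexOfAscii(bytes, 'jumb') !== -1 && indexOfAscii(bytes, 'c2pa') !== -1;
}

// Demo with synthetic bytes; in practice `bytes` would be the contents of an image file.
const toBytes = (s) => Uint8Array.from(s, (ch) => ch.charCodeAt(0));
console.log(mayContainC2paManifest(toBytes('....jumb....jumdc2pa....'))); // true
console.log(mayContainC2paManifest(toBytes('plain image bytes')));        // false
```

A negative result is not conclusive either, since metadata can be stripped when an image is re-encoded or screenshotted.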

Blocking inappropriate content
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are in context, we apply heightened restrictions on the kind of imagery that can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished and is instead an ongoing area of investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.

To learn more about our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we trained a reasoning LLM to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help us identify and resolve ambiguities in our policies. Together with our multimodal advances and the existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It is also available in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including specifics such as aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed images, rendering takes longer, often up to one minute.
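
To make those specifics concrete, here is a minimal sketch of a helper that assembles a plain-text request covering aspect ratio, hex colors, and background. The function `buildImagePrompt` and its option names are hypothetical and are not part of any OpenAI API; the output is simply a string you could paste into a chat.

```javascript
// Hypothetical helper (not an OpenAI API): assembles a plain-text image request
// that spells out aspect ratio, hex colors, and background.
function buildImagePrompt({ subject, aspectRatio, hexColors = [], transparentBackground = false }) {
  if (typeof subject !== 'string' || subject.trim() === '') {
    throw new Error('subject must be a non-empty string');
  }
  const lines = [subject.trim()];

  if (aspectRatio !== undefined) {
    // Accept ratios written as "width:height", e.g. "16:9".
    if (!/^[1-9]\d*:[1-9]\d*$/.test(aspectRatio)) {
      throw new Error(`invalid aspect ratio: ${aspectRatio}`);
    }
    lines.push(`Aspect ratio: ${aspectRatio}`);
  }

  if (hexColors.length > 0) {
    // Require six-digit hex codes and normalize them to uppercase.
    const normalized = hexColors.map((color) => {
      if (!/^#[0-9a-fA-F]{6}$/.test(color)) {
        throw new Error(`invalid hex color: ${color}`);
      }
      return color.toUpperCase();
    });
    lines.push(`Colors (hex): ${normalized.join(', ')}`);
  }

  if (transparentBackground) {
    lines.push('Background: transparent');
  }
  return lines.join('\n');
}

const examplePrompt = buildImagePrompt({
  subject: 'A flat icon of a paper plane',
  aspectRatio: '16:9',
  hexColors: ['#1a1a1a', '#00ffcc'],
  transparentBackground: true,
});
console.log(examplePrompt);
```

Validating the hex codes up front is a design choice of this sketch: a malformed color is caught before the request is sent rather than being silently interpreted by the model.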

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)


Author

OpenAI

Leadership

Gabriel Goh: Image generation

Jackie Shannon: ChatGPT product

Mengchao Zhong, Wayne Chang: ChatGPT engineering

Rohan Sahai: Sora product and engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal organization

Research

Foundational research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model behavior

Laurentia Romaniuk

Multimodal organization

Andrew Gibiansky, Yang Lu

Data

Data leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human data advisors

Long Ouyang

Scaling

Inference leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT product lead

Jackie Shannon

ChatGPT engineering leads

Mengchao Zhong, Wayne Chang

Product design lead

Matt Chan

Data science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora product leads

Rohan Sahai, Wesam Manassra

Sora product and engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing and communications

Communications and marketing leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Whitcomb-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design and creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco