March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, photorealistic outputs.

At OpenAI, we have long believed that image generation should be a primary capability of our language models. That's why we've built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

(Best of 8)

selfie view of the photographer, as she turns around to high five him

(Best of 8)

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the workhorse imagery people use to create and share information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you have in mind, help you communicate more effectively through visuals, and turn image generation into a practical, precise, and powerful tool.

Improved capabilities

We trained our models on the joint distribution of online images and text, teaching them not only how images relate to language but also how they relate to one another. Combined with aggressive post-training, the resulting model has surprising visual fluency: it can generate images that are useful, consistent, and context-aware.
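The whiteboard prompt at the top of this post sketches the idea of modeling p(text, pixels, sound) with one big autoregressive transformer. Assuming every modality is serialized into a single token sequence $x_1, \dots, x_T$, that joint distribution factorizes as below, and such a model is typically trained by minimizing the negative log-likelihood. The post does not describe the tokenizer or the training objective, so treat this as the generic textbook form, not as OpenAI's stated method:

```latex
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta\left(x_t \mid x_{<t}\right),
\qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right),
\qquad
x_{<t} = (x_1, \dots, x_{t-1})
```

The same whiteboard lists two fixes for the cons it names: modeling compressed representations, and composing the autoregressive prior with a powerful decoder (drawn as tokens -> [transformer] -> [diffusion] -> pixels). Under that reading, the pixels themselves need not be produced token by token.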

Text rendering

A picture is worth a thousand words, but sometimes a few well-placed words can sharpen an image's meaning. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

(Best of ~8)

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o builds on the images and text in the chat context, keeping the result consistent throughout. For example, if you're designing a video game character, the character's appearance stays consistent across multiple iterations as you refine it.
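One way to picture multi-turn refinement is as a transcript in which each new request is conditioned on every earlier turn, text and images alike. The sketch below is a hypothetical illustration of that bookkeeping only: `ImageChat`, `addTurn`, `contextFor`, `imageHistory`, and the image IDs are invented names, and nothing here describes how ChatGPT or GPT‑4o actually stores context.

```javascript
// Hypothetical sketch: a transcript in which every new generation request is
// built from all earlier turns (text plus image references), not just the last one.
class ImageChat {
  constructor() {
    this.turns = []; // ordered list of { role, text, images }
  }

  // Append a turn; `images` holds hypothetical opaque IDs for images in that turn.
  addTurn(role, text, images = []) {
    this.turns.push({ role, text, images: [...images] });
    return this; // allow chaining
  }

  // Context for the next request: every earlier turn, then the new instruction.
  contextFor(nextInstruction) {
    return [...this.turns, { role: "user", text: nextInstruction, images: [] }];
  }

  // Every image reference seen so far, oldest first.
  imageHistory() {
    return this.turns.flatMap((turn) => turn.images);
  }
}

// Usage, loosely following the detective-cat example below (IDs are made up).
const chat = new ImageChat()
  .addTurn("user", "Give this cat a detective hat and a monocle", ["upload-1"])
  .addTurn("assistant", "", ["gen-1"]);

const context = chat.contextFor("turn this into a triple A video game");
console.log(context.length); // 3
console.log(chat.imageHistory()); // [ 'upload-1', 'gen-1' ]
```

The only design choice here is that earlier turns are never dropped, so a later request can still refer back to the first image.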

Give this cat a detective hat and a monocle

(Best of 1)

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

(Best of 1)

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

(Best of 2)

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

(Best of 8)

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed prompts closely, down to the details. While other systems struggle with around 5 to 8 objects, GPT‑4o can handle up to 10 to 20 different objects. Tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

(Best of 5)
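The grid prompt above pins every object to a cell purely by reading order. A minimal sketch of building such a prompt programmatically, assuming row-major order (left to right, top to bottom) and a near-square grid by default; `cellOf` and `gridPrompt` are hypothetical helper names, and tagging each item with its row and column is our own addition, not something the original prompt does.

```javascript
// Map a 0-based list index to a 1-based row and column in row-major (reading) order.
function cellOf(index, columns) {
  return { row: Math.floor(index / columns) + 1, column: (index % columns) + 1 };
}

// Build a grid prompt in the style of the example above.
// By default the column count is the smallest value that makes the grid near-square.
function gridPrompt(objects, columns = Math.ceil(Math.sqrt(objects.length))) {
  const rows = Math.ceil(objects.length / columns);
  const header =
    `A square image containing a ${rows} row by ${columns} column grid ` +
    `containing ${objects.length} objects on a white background. ` +
    "Go from left to right, top to bottom. Here's the list:";
  const items = objects.map((name, i) => {
    const { row, column } = cellOf(i, columns);
    return `${i + 1}. ${name} (row ${row}, column ${column})`;
  });
  return [header, ...items].join("\n");
}

// Five objects give a 2 row by 3 column grid; the last line is
// "5. orange hourglass (row 2, column 2)".
console.log(gridPrompt(["a blue star", "red triangle", "green square", "pink circle", "orange hourglass"]));
```

Spelling out the cell for each item is redundant with the reading-order instruction, but it gives the model a second, explicit cue for where each object belongs.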

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

(Best of ~16)

now put this in a photo taken in new york city.

(Best of ~16)

World knowledge

Native image generation lets 4o link its knowledge across text and images, resulting in a model that is smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
    <!-- The three.js addon modules import the bare specifier "three", which the browser can only resolve through an import map -->
    <script type="importmap">
      {
        "imports": {
          "three": "https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js",
          "three/addons/": "https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/"
        }
      }
    </script>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'three';
      import { OrbitControls } from 'three/addons/controls/OrbitControls.js';
      import { FontLoader } from 'three/addons/loaders/FontLoader.js';
      import { TextGeometry } from 'three/addons/geometries/TextGeometry.js';

      // Scene, camera, and renderer sized to the window
      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting: ambient fill plus a directional light so the standard materials are shaded
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Mouse/touch orbit controls around the banner
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        texture.colorSpace = THREE.SRGBColorSpace; // color images are sRGB-encoded
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2, // extrusion thickness of the text in this three.js version
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center(); // center the geometry so position.set places the middle of the text

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Keep the aspect ratio and canvas size in sync with the window
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop: update the controls, then draw the frame
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images spanning a wide range of styles lets the model create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model is far from perfect. We are aware of its current limitations and will work to address them through model improvements after the initial launch.

We've noticed that GPT‑4o sometimes crops longer images, such as posters, too tightly, especially at the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strict safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we are working to ensure safe, useful content and to support broader creative expression for users.

Provenance via C2PA and internal reverse search
All generated images come with C2PA metadata that identifies them as coming from GPT‑4o, to provide transparency. We've also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
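As a rough illustration of where such metadata lives, the sketch below only checks whether a PNG file contains a chunk of type `caBX`, which, to our understanding of the C2PA specification, is the chunk PNG files use to carry a C2PA manifest store. It assumes the image is a PNG, does not parse the manifest, and does not validate signatures; real verification should use official C2PA tooling. `hasC2paChunk` is a hypothetical helper name.

```javascript
// PNG files start with this fixed 8-byte signature.
const PNG_SIGNATURE = [137, 80, 78, 71, 13, 10, 26, 10];

// Presence check only: returns true if a "caBX" chunk appears before IEND.
// Accepts a Uint8Array (a Node.js Buffer also works, since Buffer extends Uint8Array).
function hasC2paChunk(bytes) {
  if (bytes.length < 8 || PNG_SIGNATURE.some((b, i) => bytes[i] !== b)) {
    throw new Error("not a PNG file");
  }
  let offset = 8; // the first chunk starts right after the signature
  while (offset + 8 <= bytes.length) {
    // Chunk layout: 4-byte big-endian data length, 4-byte ASCII type, data, 4-byte CRC.
    const length =
      ((bytes[offset] << 24) |
        (bytes[offset + 1] << 16) |
        (bytes[offset + 2] << 8) |
        bytes[offset + 3]) >>> 0; // >>> 0 keeps the value unsigned
    const type = String.fromCharCode(...bytes.slice(offset + 4, offset + 8));
    if (type === "caBX") return true;
    if (type === "IEND") break; // end of image; nothing can follow
    offset += 12 + length; // skip length, type, data, and CRC
  }
  return false;
}
```

In Node.js, for example, the result of `fs.readFileSync(path)` can be passed in directly. The CRCs are not checked, which keeps the sketch short but means a corrupted file can still report `true`.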

Blocking harmful content
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are involved, we apply heightened restrictions on the kinds of images that can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an ongoing area of investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.

To learn more about our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to strengthen safety
Similar to our work on deliberative alignment, we trained a reasoning LLM to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help identify and resolve ambiguities in our policies. Together with our multimodal advances and the existing safety techniques developed for ChatGPT and Sora, this lets us moderate both the input text and the output images against our policies.

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It’s also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
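As an illustration of stating those specifics explicitly, here is a hypothetical helper that folds an aspect ratio, exact hex colors, and a transparent-background request into one plain-language prompt. `buildImagePrompt` and its option names are our own; it only builds a string and does not call ChatGPT or any API. The sample colors reuse the two hex values from the Three.js example earlier in this post.

```javascript
// Hypothetical helper: assemble a plain-language image request from structured options.
// It only returns a string; it does not call any API.
function buildImagePrompt({ description, aspectRatio, hexColors = [], transparentBackground = false }) {
  // Accept only six-digit hex codes such as #1A1A1A, to catch typos early.
  const hexPattern = /^#[0-9a-f]{6}$/i;
  for (const color of hexColors) {
    if (!hexPattern.test(color)) throw new Error(`not a 6-digit hex color: ${color}`);
  }
  const parts = [description.trim()];
  if (aspectRatio) parts.push(`Use a ${aspectRatio} aspect ratio.`);
  if (hexColors.length) parts.push(`Use exactly these colors: ${hexColors.join(", ")}.`);
  if (transparentBackground) parts.push("Make the background transparent.");
  return parts.join(" ");
}

const prompt = buildImagePrompt({
  description: "A flat logo of a paper airplane.",
  aspectRatio: "16:9",
  hexColors: ["#1A1A1A", "#00FFCC"],
  transparentBackground: true,
});
console.log(prompt);
// → A flat logo of a paper airplane. Use a 16:9 aspect ratio. Use exactly these colors: #1A1A1A, #00FFCC. Make the background transparent.
```

The pattern deliberately rejects three-digit shorthand such as `#0FC`, so every color in the prompt is stated in full.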

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Author

OpenAI

Leadership

Gabriel Goh: Image generation

Jackie Shannon: ChatGPT product

Mengchao Zhong, Wayne Chang: ChatGPT engineering

Rohan Sahai: Sora product and engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal organization

Research

Foundational research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Foundational research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model behavior

Laurentia Romaniuk

Multimodal organization

Andrew Gibiansky, Yang Lu

Data

Data leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human data advisors

Long Ouyang

Scaling

Inference leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT product lead

Jackie Shannon

ChatGPT engineering leads

Mengchao Zhong, Wayne Chang

Product design lead

Matt Chan

Data science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora product leads

Rohan Sahai, Wesam Manassra

Sora product and engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing and communications

Communications and marketing leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design and graphics

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Acknowledgments

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco