March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.


At OpenAI, we have long believed image generation should be a primary capability of our language models. That's why we've built our most advanced image generator yet into GPT‑4o. The result is image generation that is not only beautiful, but also useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery not just for decoration, but to communicate, persuade, and analyze. Today's generative models can conjure surreal, breathtaking scenes, but struggle with the purpose-built imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to a shared language and shared experience.

Accurate text rendering, precise prompt following, and use of 4o's full knowledge base and chat context are the defining strengths of GPT‑4o image generation. Uploaded images can also be transformed or used as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals. In this way, image generation becomes a practical, precise, and powerful tool.

Improved capabilities

We trained our models on the joint distribution of images and text found online, so that they learn not just how images relate to language, but how images relate to each other. Combined with aggressive post-training, the resulting model shows surprising visual fluency and can generate images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place is even better. 4o blends precise symbols with imagery and graphics, turning image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on the images and text in the chat context to keep results consistent. For example, if you're designing a video game character, the character's appearance stays consistent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed prompts closely. While other systems struggle with around 5–8 objects, GPT‑4o can handle up to 10–20 different objects. Tighter binding between objects and their attributes and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}
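The prompt above asks for its 16 items in reading order: left to right, then top to bottom. A minimal sketch of that row-major mapping, assuming 1-based item numbers and 4 columns as in the prompt, shows which cell each numbered item should land in when checking a generated grid by hand. The `gridCell` function is purely illustrative and not part of any OpenAI tool.

```javascript
// Map a 1-based item number to its 1-based (row, column) cell in a grid that is
// filled left to right, top to bottom (row-major order).
// gridCell is a hypothetical helper for checking outputs by hand.
function gridCell(itemNumber, columns = 4) {
  if (!Number.isInteger(itemNumber) || itemNumber < 1) {
    throw new RangeError(`item number must be a positive integer, got ${itemNumber}`);
  }
  const index = itemNumber - 1; // convert to a 0-based position
  return {
    row: Math.floor(index / columns) + 1, // every `columns` items start a new row
    column: (index % columns) + 1, // position within the current row
  };
}

// Item 11 ("a pair of googly eyes") should sit in row 3, column 3.
console.log(gridCell(11)); // { row: 3, column: 3 }
```

The same function works for other grid sizes by passing a different `columns` value.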

In-context learning

GPT‑4o can analyze and learn from user-uploaded images. Their details are seamlessly integrated into the context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation enables 4o to link its knowledge of text and images, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
    <!-- The three.js addons import the bare specifier 'three'; an import map lets the browser resolve it -->
    <script type="importmap">
      {
        "imports": {
          "three": "https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js",
          "three/addons/": "https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/"
        }
      }
    </script>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'three';
      import { OrbitControls } from 'three/addons/controls/OrbitControls.js';
      import { FontLoader } from 'three/addons/loaders/FontLoader.js';
      import { TextGeometry } from 'three/addons/geometries/TextGeometry.js';

      // Scene, camera and renderer
      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Mouse orbit controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI logo texture (placeholder image URL)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        texture.colorSpace = THREE.SRGBColorSpace; // treat the image as sRGB color data
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2, // extrusion depth (three.js r160 names this option `height`)
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler keeps the aspect ratio and canvas size in sync with the window
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>
```

make an image of what this means to you

Photorealism and style

Because it was trained on images reflecting a wide variety of visual styles, the model can create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model isn't perfect. We're aware of several limitations, which we will work to address through model improvements after launch.

We've noticed that GPT‑4o can occasionally crop longer images, like posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases like game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are additional risk areas where we're working to enable safe, useful content and give users more room for creative expression.

Provenance via C2PA and internal reverse search
All generated images come with C2PA metadata, which identifies an image as coming from GPT‑4o, to provide transparency. We've also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
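Because C2PA metadata is embedded in the image file itself, its presence can be checked outside ChatGPT. The sketch below only tests whether a PNG contains a C2PA manifest chunk. It assumes the C2PA specification's PNG embedding, in which the manifest store sits in an ancillary chunk of type `caBX`; the function names are illustrative, and the sketch neither parses the manifest nor verifies its signature, which is the job of a full C2PA validator.

```javascript
// Minimal sketch: report whether PNG bytes contain a C2PA manifest chunk.
// Assumption: per the C2PA specification, PNG files carry the manifest store in a
// chunk of type "caBX". This only detects the chunk; it does NOT parse the JUMBF
// payload or verify the cryptographic signature. Other formats (e.g. JPEG) are not handled.

const PNG_SIGNATURE = [137, 80, 78, 71, 13, 10, 26, 10];

// Read a big-endian unsigned 32-bit integer (PNG chunk lengths are stored this way).
function readUint32BE(bytes, offset) {
  return (
    ((bytes[offset] << 24) >>> 0) +
    (bytes[offset + 1] << 16) +
    (bytes[offset + 2] << 8) +
    bytes[offset + 3]
  );
}

// Walk the chunk list: 4-byte length, 4-byte type, `length` data bytes, 4-byte CRC.
function listPngChunkTypes(bytes) {
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) throw new Error("not a PNG file");
  }
  const types = [];
  let offset = PNG_SIGNATURE.length;
  while (offset + 8 <= bytes.length) {
    const length = readUint32BE(bytes, offset);
    const type = String.fromCharCode(...bytes.subarray(offset + 4, offset + 8));
    types.push(type);
    offset += 12 + length; // skip length + type + data + CRC (CRC is not checked here)
    if (type === "IEND") break; // IEND marks the end of the image
  }
  return types;
}

function hasC2paManifest(bytes) {
  return listPngChunkTypes(bytes).includes("caBX");
}
```

In a browser, the bytes could come from `new Uint8Array(await (await fetch(url)).arrayBuffer())`; in Node.js, a `Buffer` from `fs.readFileSync` also works, since `Buffer` is a subclass of `Uint8Array`.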

Blocking harmful content
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are in context, we have stricter restrictions on what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished but is rather an area of ongoing investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.

For more on our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
As in our work on deliberative alignment, we've trained a reasoning LLM to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help us identify and address ambiguities in our policies. Together with our multimodal advancements and existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.
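Moderating both input text and output images implies two checkpoints around the generator. A minimal sketch of that control flow, assuming caller-supplied classifier functions (`checkPrompt`, `checkImage`) and a generator (`generateImage`), all hypothetical stand-ins rather than OpenAI's actual moderation stack, might look like this:

```javascript
// Minimal sketch of two-sided moderation: check the text prompt before generation,
// then check the produced image before returning it.
// checkPrompt, generateImage and checkImage are hypothetical callbacks supplied by the caller;
// each verdict is assumed to look like { allowed: boolean, reason?: string }.
async function generateWithModeration(prompt, { checkPrompt, generateImage, checkImage }) {
  const inputVerdict = await checkPrompt(prompt);
  if (!inputVerdict.allowed) {
    // Refuse before spending any compute on generation.
    return { status: "refused", stage: "input", reason: inputVerdict.reason };
  }

  const image = await generateImage(prompt);

  // The prompt is passed along so the output check can take the request's context into account.
  const outputVerdict = await checkImage(image, prompt);
  if (!outputVerdict.allowed) {
    return { status: "blocked", stage: "output", reason: outputVerdict.reason };
  }
  return { status: "ok", image };
}
```

Running the input check first means disallowed requests are refused without generating anything; the output check is still needed because an allowed prompt can yield a disallowed image.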

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It’s also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
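Those specifics are plain text in the prompt, so they can also be assembled programmatically. A minimal sketch, assuming a hypothetical `buildImagePrompt` helper (not an OpenAI API) that validates a `W:H` aspect ratio and six-digit hex colors before appending them to a description:

```javascript
// Hypothetical helper (not an OpenAI API): append the specifics mentioned above
// (aspect ratio, exact hex colors, transparent background) to a free-form description.
const HEX_COLOR = /^#[0-9A-Fa-f]{6}$/; // six-digit hex, e.g. #10A37F
const ASPECT_RATIO = /^\d+:\d+$/; // width:height, e.g. 16:9

function buildImagePrompt(description, { aspectRatio, hexColors = [], transparentBackground = false } = {}) {
  const parts = [description.trim()];
  if (aspectRatio) {
    if (!ASPECT_RATIO.test(aspectRatio)) {
      throw new Error(`aspect ratio must look like "16:9", got "${aspectRatio}"`);
    }
    parts.push(`Aspect ratio ${aspectRatio}.`);
  }
  for (const color of hexColors) {
    if (!HEX_COLOR.test(color)) throw new Error(`not a six-digit hex color: "${color}"`);
  }
  if (hexColors.length > 0) parts.push(`Use exactly these colors: ${hexColors.join(", ")}.`);
  if (transparentBackground) parts.push("Transparent background.");
  return parts.join(" ");
}

console.log(
  buildImagePrompt("A flat icon of a lighthouse.", {
    aspectRatio: "1:1",
    hexColors: ["#10A37F", "#FFFFFF"],
    transparentBackground: true,
  })
);
// → A flat icon of a lighthouse. Aspect ratio 1:1. Use exactly these colors: #10A37F, #FFFFFF. Transparent background.
```

Validating the values locally catches typos such as a five-digit hex code before sending a request that can take up to a minute to render.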

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image Generation

Jackie Shannon: ChatGPT Product

Mengchao Zhong, Wayne Chang: ChatGPT Engineering

Rohan Sahai: Sora Product and Engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal Organization

Research

Foundational Research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core Research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research Contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model Behavior

Laurentia Romaniuk

Multimodal Organization

Andrew Gibiansky, Yang Lu

Data

Data Leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human Data Advisor

Long Ouyang

Scaling

Inference Leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT Product Lead

Jackie Shannon

ChatGPT Engineering Leads

Mengchao Zhong, Wayne Chang

Product Design Lead

Matt Chan

Data Science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora Product Leads

Rohan Sahai, Wesam Manassra

Sora Product and Engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety Lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing and Communications

Communications and Marketing Leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design and Creative Production

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco