March 25, 2025

Introducing 4o image generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

At OpenAI, we have long believed that image generation should be a primary capability of our language models. That is why we have built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but also useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the earliest cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the everyday imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following instructions, and drawing on 4o's inherent knowledge base and the chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and turning image generation into a practical, precise, and powerful tool.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but also how they relate to each other. Combined with aggressive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes a few words in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can draw on the images and text in the chat context, keeping results consistent across turns. For example, if you are designing a video game character, the character's appearance stays coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed instructions with attention to detail. While other systems struggle with about 5 to 8 objects, GPT‑4o can handle up to 10 to 20 different objects. The tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here’s the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to generate new images.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation lets 4o link its knowledge between text and images, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a vast range of styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model is not perfect. We are aware of several limitations at the moment and will work to address them through model improvements after the initial launch.

We have noticed that GPT‑4o occasionally crops longer images, such as posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strong safety standards. It remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we are working to enable safe, high-utility content and to support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which identifies an image as coming from GPT‑4o, to provide transparency. We have also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.

Blocking inappropriate content
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are in context, we apply stricter restrictions on the kind of imagery that can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an ongoing area of investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.

For more on our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to advance safety
Similar to our deliberative alignment work, we have trained a reasoning LLM (large language model) to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help us identify and address ambiguities in our policies. Together with our multimodal advances and the existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It’s also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

Creating and customizing images is as simple as chatting using GPT‑4o - just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
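The specifics mentioned above (aspect ratio, exact hex colors, a transparent background) can be folded into a single prompt string. The sketch below is a hypothetical helper and not part of any OpenAI API: the function name `buildImagePrompt` and its option names are assumptions made for illustration, and the output is simply text you could paste into ChatGPT.

```javascript
// Hypothetical helper: composes a plain-text image prompt.
// It does not call any OpenAI API; names and option keys are illustrative assumptions.
const HEX_COLOR = /^#[0-9a-fA-F]{6}$/;

function buildImagePrompt(description, options = {}) {
  const { aspectRatio, colors = [], transparentBackground = false } = options;
  if (typeof description !== 'string' || description.trim() === '') {
    throw new Error('description must be a non-empty string');
  }

  // Start with the free-form description, then append each requested specific.
  const parts = [description.trim()];
  if (aspectRatio) {
    parts.push(`Aspect ratio: ${aspectRatio}.`);
  }
  if (colors.length > 0) {
    // Reject anything that is not a 6-digit hex code so typos surface early.
    for (const color of colors) {
      if (!HEX_COLOR.test(color)) {
        throw new Error(`not a 6-digit hex color: ${color}`);
      }
    }
    parts.push(`Use exactly these colors: ${colors.join(', ')}.`);
  }
  if (transparentBackground) {
    parts.push('Transparent background.');
  }
  return parts.join(' ');
}

// Example usage with made-up values.
const examplePrompt = buildImagePrompt('A flat-style sticker of a paper airplane.', {
  aspectRatio: '1:1',
  colors: ['#1A1A1A', '#00FFCC'],
  transparentBackground: true,
});
console.log(examplePrompt);
```

Keeping the specifics as structured options makes it easy to vary one detail at a time (for example, only the aspect ratio) while leaving the rest of the description unchanged.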

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image Generation

Jackie Shannon: ChatGPT Product

Mengchao Zhong, Wayne Chang: ChatGPT Engineering

Rohan Sahai: Sora Product and Engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal Organization

Research

Foundational Research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core Research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research Contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model Behavior

Laurentia Romaniuk

Multimodal Organization

Andrew Gibiansky, Yang Lu

Data

Data Leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human Data Advisors

Long Ouyang

Scaling

Inference Leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT Product Lead

Jackie Shannon

ChatGPT Engineering Leads

Mengchao Zhong, Wayne Chang

Product Design Lead

Matt Chan

Data Science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, and Yilei Qian

Sora

Sora Product Leads

Rohan Sahai and Wesam Manassra

Sora Product and Engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, and Wesam Manassra

Safety

Safety Lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, and Zoe Stoll

Marketing and Communications

Communications and Marketing Leads

Minnia Feng, Natalie Summers, and Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, and Souki Mansoor

Design and Creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, and Yara Khakbaz

Special thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, and Vinnie Monaco