March 25, 2025

Introducing 4o image generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, photorealistic outputs.


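At OpenAI, we have long believed that image generation should be a core capability of language models. That is why we have built our most advanced image generator yet into GPT‑4o. The result is image generation that is both beautiful and useful.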

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a t-shirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

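From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when they are combined with symbols that refer to shared language and experience.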

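GPT‑4o image generation excels at rendering text accurately, following prompts precisely, and drawing on 4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, make visual communication more effective, and turn image generation into a precise and powerful practical tool.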

Improved capabilities

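We trained our models on the joint distribution of online images and text, so they learned not only how images relate to language but also how images relate to each other. Combined with intensive post-training, the resulting model has remarkable visual fluency and can generate images that are useful, consistent, and context-aware.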

Text rendering

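A picture is worth a thousand words, but sometimes generating a few words in the right place can greatly elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.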

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

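Because image generation is now native to GPT‑4o, you can refine images step by step through natural conversation. GPT‑4o builds on the images and text in the chat context, keeping results consistent throughout. For example, if you are designing a video game character, the character's appearance stays coherent across repeated revisions and experiments.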

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

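GPT‑4o's image generation follows detailed prompts with close attention to detail. While other systems struggle with about 5 to 8 objects, GPT‑4o can handle 10 to 20 different objects. Binding objects more tightly to their traits and relations gives better control.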

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}
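
The grid prompt above follows a regular pattern: a grid size, a reading order, and a numbered list of objects. As a minimal sketch, assuming you want to assemble such prompts programmatically, the hypothetical helpers below (`buildGridPrompt` and `gridPosition` are illustrative names, not part of any OpenAI tooling) build the prompt text and map a list number back to its expected cell. Nothing here calls a model; the output is only a string you could paste into a chat.

```javascript
// Hypothetical helper: builds a numbered grid prompt in the style shown above.
// It only assembles text; it does not call any model or API.
function buildGridPrompt(objects, rows, cols) {
  if (!Number.isInteger(rows) || !Number.isInteger(cols) || rows < 1 || cols < 1) {
    throw new RangeError("rows and cols must be positive integers");
  }
  if (objects.length !== rows * cols) {
    throw new RangeError(
      `expected ${rows * cols} objects for a ${rows}x${cols} grid, got ${objects.length}`
    );
  }
  const header =
    `A square image containing a ${rows} row by ${cols} column grid ` +
    `containing ${objects.length} objects on a white background. ` +
    `Go from left to right, top to bottom. Here's the list:`;
  const items = objects.map((name, i) => `${i + 1}. ${name}`);
  return [header, ...items].join("\n");
}

// Maps a 1-based list number to its expected 1-based (row, col) cell,
// which helps when checking a generated image against the prompt.
function gridPosition(index, cols) {
  return { row: Math.ceil(index / cols), col: ((index - 1) % cols) + 1 };
}

const gridPrompt = buildGridPrompt(
  ["a blue star", "red triangle", "green square", "pink circle"],
  2,
  2
);
console.log(gridPrompt);
console.log(gridPosition(7, 4)); // item 7 in a 4-column grid: row 2, column 3
```

In the 16-object example, `gridPosition(7, 4)` places the polka dot bowtie in the second row, third column, which is one way to spot-check whether a generated grid respected the requested order.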

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation lets 4o link its knowledge across text and images, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a wide variety of styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model is not perfect. We are aware of several limitations at the moment, which we plan to address through model improvements after the initial launch.

GPT‑4o can sometimes crop longer images, such as posters, too tightly, especially near the bottom.

Safety

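In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we are working to enable safe, high-utility content and support broader creative expression for users.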

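Provenance via C2PA and internal reversible search
All generated images include C2PA metadata that identifies them as coming from GPT‑4o, to provide transparency. We have also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.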

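Blocking harmful content
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images involve real people, we apply stricter restrictions on the kinds of imagery that can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an ongoing area of investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.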

For more details, see the image generation addendum to the GPT‑4o system card.

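Safer with reasoning
Similar to our deliberative alignment work, we trained a reasoning LLM to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help identify and resolve ambiguities in our policies. Combined with our multimodal advancements and the existing safety techniques developed for ChatGPT and Sora, this approach lets us moderate both input text and output images against our policies.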

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It’s also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

Creating and customizing images is as simple as chatting using GPT‑4o - just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
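
As a rough illustration of "just describe what you need," the sketch below turns a few structured preferences (aspect ratio, hex colors, transparent background) into a single plain-language request. The helper name `describeImageRequest`, its option names, and the exact wording it produces are all assumptions made for this example; the post only says these specifics can be stated in chat, and no API call is shown here.

```javascript
// Hypothetical helper: formats image preferences as a plain-language request.
// The sentence wording is an assumption; any clear phrasing should work in chat.
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;
const ASPECT_RATIO = /^[1-9]\d*:[1-9]\d*$/;

function describeImageRequest(description, options = {}) {
  const { aspectRatio, colors = [], transparentBackground = false } = options;
  const parts = [description.trim()];

  if (aspectRatio !== undefined) {
    if (!ASPECT_RATIO.test(aspectRatio)) {
      throw new Error(`invalid aspect ratio: ${aspectRatio}`);
    }
    parts.push(`Use a ${aspectRatio} aspect ratio.`);
  }

  // Reject malformed hex codes early so the request stays unambiguous.
  const invalid = colors.filter((c) => !HEX_COLOR.test(c));
  if (invalid.length > 0) {
    throw new Error(`invalid hex color(s): ${invalid.join(", ")}`);
  }
  if (colors.length > 0) {
    parts.push(`Use these exact colors: ${colors.join(", ")}.`);
  }

  if (transparentBackground) {
    parts.push("Use a transparent background.");
  }
  return parts.join(" ");
}

console.log(
  describeImageRequest("A logo for a bakery.", {
    aspectRatio: "16:9",
    colors: ["#00ffcc", "#1a1a1a"],
    transparentBackground: true,
  })
);
```

Validating the hex codes and ratio up front is a design choice for this sketch: it catches typos such as `#00ffc` before they reach the conversation, where they could be interpreted loosely.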

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image Generation

Jackie Shannon: ChatGPT Product

Mengchao Zhong, Wayne Chang: ChatGPT Engineering

Rohan Sahai: Sora Product and Engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal Organization

Research

Foundational Research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core Research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research Contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model Behavior

Laurentia Romaniuk

Multimodal Organization

Andrew Gibiansky, Yang Lu

Data

Data Leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human Data Advisor

Long Ouyang

Scaling

Inference Leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT Product Lead

Jackie Shannon

ChatGPT Engineering Leads

Mengchao Zhong, Wayne Chang

Product Design Lead

Matt Chan

Data Science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora Product Leads

Rohan Sahai, Wesam Manassra

Sora Product and Engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety Lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing and Communications

Communications and Marketing Leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design and Creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special Thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco