March 25, 2025

Introducing 4o Image Generation

Unlock useful, valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic output.

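OpenAI has long believed that image generation should be a primary capability of our language models. That is why we have built our most advanced image generator to date into GPT‑4o. The result is image generation that is not only beautiful but useful.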

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

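Useful image generation

From the earliest cave paintings to modern infographics, humans have used visual imagery not merely for decoration but to communicate, persuade, and analyze. Today's generative models can produce surreal, breathtaking scenes, but they struggle with the practical images people use to share and create information. Images augmented with symbols that refer to shared language and experience, such as logos and diagrams, can convey exactly what is intended.

GPT‑4o image generation excels at rendering text accurately, following prompts precisely, and drawing on 4o's inherent knowledge base and the chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you have in mind, help you communicate more effectively through visuals, and turn image generation into a practical tool with precision and power.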

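Improved capabilities

We trained the model on the joint distribution of online images and text, so it learned not only how images relate to language but also how images relate to one another. Combined with aggressive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.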
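
One way to read "joint distribution" here, sketched under assumptions the post does not confirm: treat an interleaved stream of text and image tokens $s = (s_1, \dots, s_n)$ as a single sequence and factor its probability with the chain rule. The tokenization and the left-to-right ordering are assumptions borrowed from the whiteboard prompt near the top of the post ("p(text, pixels, sound) ... with one big autoregressive transformer"), not a description of the shipped model.

```latex
p(s_1, \dots, s_n) \;=\; \prod_{i=1}^{n} p\left(s_i \mid s_1, \dots, s_{i-1}\right)
```

Because any prefix $s_1, \dots, s_{i-1}$ may contain text, images, or both, the same factorization covers text conditioned on images, images conditioned on text, and images conditioned on earlier images. That is consistent with the claim that the model learns how images relate to one another as well as to language.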

Text rendering

"A picture is worth a thousand words," as the saying goes, but sometimes adding a few words in the right place elevates an image's meaning. Because 4o can blend precise symbols with imagery, image generation becomes a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

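Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on the images and text in the chat context, which keeps the results consistent throughout. For example, when you design a video game character, the character's appearance stays consistent across multiple iterations of refinement and experimentation.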

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

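Instruction following

GPT‑4o image generation follows detailed prompts with attention to detail. While other systems struggle with roughly 5 to 8 objects, GPT‑4o can handle up to 10 to 20 different objects. Binding objects more tightly to their traits and relations allows for better control.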

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here’s the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}

In-context learning

GPT‑4o can analyze and learn from images that users upload, seamlessly integrating those details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation lets 4o link its knowledge between text and images, resulting in a smarter and more efficient model.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a wide variety of styles lets the model create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

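Limitations

Our model is not perfect. We are aware of multiple limitations at present and will work to address them through model improvements after the initial release.

We have found that GPT‑4o can sometimes crop long images such as posters too tightly, especially near the bottom, leaving less margin than the design needs.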

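Safety

Guided by our Model Spec, we aim to maximize creative freedom while maintaining strong safety standards, by supporting valuable use cases such as game development, historical exploration, and education. At the same time, blocking requests that violate those standards remains as important as ever. Below we describe how we evaluate the additional risk areas where we are working to keep content safe and highly useful while supporting a broad range of creative expression.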

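Provenance via C2PA and internal reversible search
Every generated image carries C2PA metadata that identifies it as generated by GPT‑4o, which provides transparency. We have also built an internal search tool that uses technical attributes of generations to help verify whether a piece of content came from our model.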
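
As a rough illustration of what "carries C2PA metadata" can mean at the file level, the sketch below lists the chunks of a PNG file and checks for one named `caBX`. It assumes, based on the C2PA specification rather than on anything this post states, that a PNG stores its manifest in a chunk of that type; the function names are hypothetical, and the post does not say which file format generated images use. Finding the chunk only shows that metadata is present. It does not validate signatures or prove where an image came from.

```javascript
// Sketch only: detect a possible embedded C2PA manifest store in PNG bytes.
// Assumption: the manifest store lives in a PNG chunk of type "caBX".
// Presence of the chunk is NOT verification of the manifest or its signature.

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Return the chunk types (for example "IHDR", "IDAT") found in a PNG.
// `bytes` must be a Uint8Array (a Node.js Buffer also works).
function listPngChunkTypes(bytes) {
  const isPng =
    bytes.length >= 8 && PNG_SIGNATURE.every((b, i) => bytes[i] === b);
  if (!isPng) {
    throw new Error("not a PNG file");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const types = [];
  let offset = 8;
  // Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
  while (offset + 12 <= bytes.length) {
    const length = view.getUint32(offset); // big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4],
      bytes[offset + 5],
      bytes[offset + 6],
      bytes[offset + 7]
    );
    types.push(type);
    if (type === "IEND") break; // last chunk in a well-formed PNG
    offset += 12 + length;
  }
  return types;
}

// True when the PNG contains a chunk of the assumed C2PA type.
function hasC2paChunk(bytes) {
  return listPngChunkTypes(bytes).includes("caBX");
}
```

In Node.js, calling `hasC2paChunk` on the bytes of a local PNG file would run the check. A real provenance check needs a full C2PA validator, since this sketch ignores signatures and other containers such as JPEG.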

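Blocking inappropriate content
We continue to block requests to generate images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are involved, we apply tighter restrictions on what can be created, with particularly strict safeguards around nudity and graphic violence. As with any release, safety work is never finished; it is an area of ongoing investment. As we learn more about how this model is used in practice, we will adjust our policies accordingly.

For more on our approach, see "GPT‑4o System Card Addendum: 4o Image Generation".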

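Using reasoning to strengthen safety
As in our deliberative alignment approach, we trained a reasoning LLM to work directly from human-written, interpretable safety specifications. During development, we used this reasoning LLM to identify and resolve ambiguities in our policies. Combined with our multimodal advances and the existing safety techniques developed for ChatGPT and Sora, this lets us moderate both input text and output images against our policies.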

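Access and availability

4o Image Generation begins rolling out today as the default image generator in ChatGPT for Plus, Pro, Team, and Free users. Access for Enterprise and Edu plans is coming soon. It is also available in Sora. For those who are fond of DALL·E and want to keep using it, it remains accessible through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o through the API, with access rolling out over the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including details such as aspect ratio, exact colors given as hex codes, or a transparent background. Because this model creates more detailed images, rendering takes longer, often about a minute.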
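
To make the "just describe what you need" step concrete, here is a small sketch that turns structured options into a plain-language request. The function name, option names, and wording are all hypothetical; it calls no API and only builds the text you might type into the chat. It also checks that colors look like six-digit hex codes, so a typo is caught before the request is sent.

```javascript
// Hypothetical helper: compose a plain-language image request from options.
// It only builds a string; it does not call ChatGPT or any API.
function describeImageRequest({
  subject,
  aspectRatio,
  hexColors = [],
  transparentBackground = false,
}) {
  const hexPattern = /^#[0-9a-fA-F]{6}$/; // e.g. "#FF6600"
  const ratioPattern = /^\d+:\d+$/; // e.g. "16:9"

  for (const color of hexColors) {
    if (!hexPattern.test(color)) {
      throw new Error(`invalid hex color: ${color}`);
    }
  }
  if (aspectRatio !== undefined && !ratioPattern.test(aspectRatio)) {
    throw new Error(`invalid aspect ratio: ${aspectRatio}`);
  }

  // Add one sentence per requested detail, in a fixed order.
  const parts = [`Create an image of ${subject}.`];
  if (aspectRatio !== undefined) {
    parts.push(`Use a ${aspectRatio} aspect ratio.`);
  }
  if (hexColors.length > 0) {
    parts.push(`Use exactly these colors: ${hexColors.join(", ")}.`);
  }
  if (transparentBackground) {
    parts.push("Make the background transparent.");
  }
  return parts.join(" ");
}
```

For example, passing a subject of "a sticker of a fox", an aspect ratio of "1:1", one color "#FF6600", and a transparent background yields a four-sentence request that states each detail explicitly.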

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Authors

OpenAI

Leadership

Gabriel Goh: Image Generation

Jackie Shannon: ChatGPT Product

Mengchao Zhong, Wayne Chang: ChatGPT Engineering

Rohan Sahai: Sora Product & Engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal Organization

Research

Foundational Research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core Research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research Contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model Behavior

Laurentia Romaniuk

Multimodal Organization

Andrew Gibiansky, Yang Lu

Data

Data Leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human Data Advisor

Long Ouyang

Scaling

Inference Leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT Product Lead

Jackie Shannon

ChatGPT Engineering Leads

Mengchao Zhong, Wayne Chang

Product Design Lead

Matt Chan

Data Science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora Product Leads

Rohan Sahai, Wesam Manassra

Sora Product & Engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety Lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing & Communications

Communications & Marketing Leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design & Creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Acknowledgments

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco
