March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

Try in ChatGPT

At OpenAI, we have long believed that image generation should be a primary capability of our language models. That is why we built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the practical imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following instructions, and drawing on 4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and advancing image generation into a practical tool with precision and power.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but also how they relate to each other. Combined with intensive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on images and text in the chat context, ensuring consistency throughout. For example, if you are designing a video game character, the character's appearance stays coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed instructions with attention to detail. While other systems struggle with about 5–8 objects, GPT‑4o can handle up to 10–20 different objects. The tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation enables 4o to link its knowledge between text and images, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a vast variety of image styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model is not perfect. We are aware of several limitations at the moment, which we will work to address through model improvements after the initial launch.

We have noticed that GPT‑4o can occasionally crop longer images, such as posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we are working to enable safe, high-utility content and support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which will identify an image as coming from GPT‑4o, to provide transparency. We have also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
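As a rough illustration of what checking for provenance metadata can look like, here is a minimal Node.js sketch that lists the chunk types in a PNG buffer and looks for an embedded manifest. It rests on two assumptions that this post does not state: that the image is a PNG, and that the C2PA manifest store sits in a PNG chunk of type `caBX`. The function names `listPngChunkTypes` and `hasC2paChunk` are hypothetical. Finding the chunk only shows that a manifest is present; it does not validate the manifest or its signatures, which requires a full C2PA implementation.

```javascript
// PNG files start with this fixed 8-byte signature.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Return the chunk types of a PNG buffer, in file order.
function listPngChunkTypes(buffer) {
  if (buffer.length < 8 || !buffer.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error('not a PNG file');
  }
  const types = [];
  let offset = 8;
  // Chunk layout: 4-byte big-endian data length, 4-byte type, data, 4-byte CRC.
  while (offset + 8 <= buffer.length) {
    const length = buffer.readUInt32BE(offset);
    types.push(buffer.toString('latin1', offset + 4, offset + 8));
    offset += 12 + length; // skip length, type, data, and CRC
  }
  return types;
}

// Assumption: a C2PA manifest store in a PNG lives in a "caBX" chunk.
function hasC2paChunk(buffer) {
  return listPngChunkTypes(buffer).includes('caBX');
}
```

In practice the buffer would come from something like `fs.readFileSync('image.png')`; the file name here is only a placeholder.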

Blocking the bad stuff
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are in context, we apply heightened restrictions on what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an ongoing area of investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.

For more on our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our work on deliberative alignment, we trained a reasoning large language model (LLM) to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help us identify and resolve ambiguities in our policies. Together with our multimodal advancements and the existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation begins rolling out today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It is also available in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out over the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including specifics such as aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
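To make the kinds of specifics mentioned above concrete, here is a small sketch of a helper that assembles a chat prompt from a description, an aspect ratio, hex colors, and a transparency flag. The helper `buildImagePrompt`, its parameter names, and the exact phrasing it emits are all hypothetical; it only builds a string to paste into ChatGPT and does not call any API.

```javascript
// Hypothetical helper: composes prompt text only, it calls no API.
function buildImagePrompt({ description, aspectRatio, hexColors = [], transparentBackground = false }) {
  // Validate inputs so typos do not silently end up in the prompt.
  if (aspectRatio !== undefined && !/^\d+:\d+$/.test(aspectRatio)) {
    throw new Error(`invalid aspect ratio: ${aspectRatio}`);
  }
  for (const color of hexColors) {
    if (!/^#[0-9a-fA-F]{6}$/.test(color)) {
      throw new Error(`invalid hex color: ${color}`);
    }
  }
  const parts = [description.trim()];
  if (aspectRatio !== undefined) parts.push(`Aspect ratio: ${aspectRatio}.`);
  if (hexColors.length > 0) parts.push(`Use exactly these colors: ${hexColors.join(', ')}.`);
  if (transparentBackground) parts.push('Use a transparent background.');
  return parts.join(' ');
}

const prompt = buildImagePrompt({
  description: 'A minimalist fox logo.',
  aspectRatio: '16:9',
  hexColors: ['#FF6600', '#1A1A1A'],
  transparentBackground: true,
});
console.log(prompt);
```

Stating each constraint as its own short sentence is a design choice of this sketch, not a documented requirement of the model.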

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image Generation

Jackie Shannon: ChatGPT Product

Mengchao Zhong, Wayne Chang: ChatGPT Engineering

Rohan Sahai: Sora Product and Engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal Organization

Research

Foundational Research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core Research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research Contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model Behavior

Laurentia Romaniuk

Multimodal Organization

Andrew Gibiansky, Yang Lu

Data

Data Leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human Data Advisors

Long Ouyang

Scaling

Inference Leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT Product Lead

Jackie Shannon

ChatGPT Engineering Leads

Mengchao Zhong, Wayne Chang

Product Design Lead

Matt Chan

Data Science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora Product Leads

Rohan Sahai, Wesam Manassra

Sora Product and Engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety Lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing and Communications

Communications and Marketing Leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design and Creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special Thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco