March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

At OpenAI, we have long believed that image generation should be a primary capability of our language models. That is why we have built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and drawing on GPT‑4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and turning image generation into a practical, precise, and powerful tool.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other. Combined with aggressive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on the images and text in the chat context, keeping results consistent throughout. For example, when designing a video game character, the character's appearance remains coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed prompts with attention to detail. While other systems struggle with roughly 5-8 objects, GPT‑4o can handle up to 10-20 different objects. The tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}
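
The grid prompt above asks for objects in reading order (left to right, top to bottom). As a minimal sketch that is not part of the original post, the mapping from a list number to its expected grid cell is plain row-major arithmetic, which can help when checking a generated image against the list. The helper name `gridCell` is hypothetical.

```javascript
// Hypothetical helper: map a 1-based list position to a 1-based (row, column)
// cell in a grid filled left to right, top to bottom (row-major order).
function gridCell(position, columns) {
  const index = position - 1; // switch to 0-based for the arithmetic
  return {
    row: Math.floor(index / columns) + 1,
    column: (index % columns) + 1,
  };
}

// Item 10 ("a map with a treasure chest") in the 4-column grid:
const cell = gridCell(10, 4);
console.log(cell); // { row: 3, column: 2 }
```

Under this convention, item 16 (the lightning bolt) should land in the bottom-right cell, row 4, column 4.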

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation lets 4o link its knowledge between text and images, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
    <!-- The example modules (OrbitControls, FontLoader, TextGeometry) import the
         bare specifier 'three', so map it to the same build that is imported below. -->
    <script type="importmap">
      { "imports": { "three": "https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js" } }
    </script>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images that reflect a wide range of styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model isn't perfect. We're aware of multiple limitations at the moment, which we will work to address through model improvements after the initial launch.

We've noticed that GPT‑4o can occasionally crop longer images, such as posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we're working to enable safe, high-utility content and support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which identifies an image as coming from GPT‑4o, to provide transparency. We've also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.

Blocking the bad stuff
We're continuing to block requests for generated images that may violate our content policies, such as child sexual abuse materials and sexual deepfakes. When images of real people are in context, we apply heightened restrictions on what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an ongoing area of investment. As we learn more about real-world use of this model, we'll adjust our policies accordingly.

For more on our approach, visit the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we've trained a reasoning LLM to work directly from human-written, interpretable safety specifications. We used this reasoning LLM during development to help us identify and address ambiguities in our policies. Together with our multimodal advancements and existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It's also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including specifics such as aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
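
As a rough illustration of the kinds of specifics mentioned above, the sketch below assembles a plain-text request that states an aspect ratio, exact hex colors, and a transparent background. It only builds a string to paste into a chat and calls no service; `buildImageRequest` and its option names are hypothetical and do not correspond to any OpenAI API.

```javascript
// Hypothetical helper: builds a plain-text image request. It does not call any API.
function buildImageRequest({ description, aspectRatio, hexColors = [], transparentBackground = false }) {
  const parts = [description.trim()];
  if (aspectRatio) {
    parts.push(`Aspect ratio: ${aspectRatio}.`);
  }
  // Keep only well-formed 6-digit hex codes such as #1A1A1A.
  const validColors = hexColors.filter((code) => /^#[0-9a-fA-F]{6}$/.test(code));
  if (validColors.length > 0) {
    parts.push(`Use these exact colors: ${validColors.join(", ")}.`);
  }
  if (transparentBackground) {
    parts.push("Use a transparent background.");
  }
  return parts.join(" ");
}

const request = buildImageRequest({
  description: "A minimalist logo of a paper plane.",
  aspectRatio: "16:9",
  hexColors: ["#00FFCC", "not-a-color"], // the malformed entry is dropped
  transparentBackground: true,
});
console.log(request);
```

Stating each constraint as its own short sentence keeps the request easy to adjust in a follow-up message, for example by changing only the aspect ratio.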

credit creator: [Alex Duffy](https://every.to/@AlxAi)
credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: image generation

Jackie Shannon: ChatGPT product

Mengchao Zhong, Wayne Chang: ChatGPT engineering

Rohan Sahai: Sora product and engineering

Brendan Quinn, Tomer Kaftan: inference

Prafulla Dhariwal: multimodal organization

Research

Foundational research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model behavior

Laurentia Romaniuk

Multimodal organization

Andrew Gibiansky, Yang Lu

Data

Data leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human data advisors

Long Ouyang

Scaling

Inference leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT product lead

Jackie Shannon

ChatGPT engineering leads

Mengchao Zhong, Wayne Chang

Product design lead

Matt Chan

Data science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora product leads

Rohan Sahai, Wesam Manassra

Sora product and engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing and communications

Communications and marketing leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design and creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco