March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

Try in ChatGPT


At OpenAI, we have long believed image generation should be a primary capability of our language models. That's why we've built our most advanced image generator yet into GPT‑4o. The result: image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o's inherent knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and advancing image generation into a practical tool with precision and power.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other. Combined with aggressive post-training, the resulting model has surprising visual fluency, capable of generating images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build upon images and text in chat context, ensuring consistency throughout. For example, if you're designing a video game character, the character's appearance remains coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows detailed prompts with attention to detail. While other systems struggle with ~5-8 objects, GPT‑4o can handle up to 10-20 different objects. The tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}
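The grid prompt above follows a regular pattern (grid size, reading order, numbered list), so a prompt of that shape can be assembled from a plain list of objects. The following is a minimal sketch in JavaScript; `buildGridPrompt` is a hypothetical helper introduced here for illustration, not part of any OpenAI tooling, and it only produces text to paste into a chat.

```javascript
// Hypothetical helper (not an OpenAI API): builds a grid prompt shaped like
// the example above from a list of object descriptions.
function buildGridPrompt(rows, cols, items) {
  // One object per cell, so the list must match the grid size exactly.
  if (items.length !== rows * cols) {
    throw new Error(`Expected ${rows * cols} items, got ${items.length}`);
  }
  const header =
    `A square image containing a ${rows} row by ${cols} column grid ` +
    `containing ${items.length} objects on a white background. ` +
    `Go from left to right, top to bottom. Here's the list:`;
  // Number the objects in reading order, starting from 1.
  const list = items.map((item, i) => `${i + 1}. ${item}`);
  return [header, ...list].join("\n");
}

const prompt = buildGridPrompt(2, 2, [
  "a blue star",
  "red triangle",
  "green square",
  "pink circle",
]);
console.log(prompt);
```

Checking the item count up front keeps the stated grid size and the list in agreement, which matters when a prompt relies on each object landing in a specific cell.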

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation enables 4o to link its knowledge between text and image, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

HTML

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>

make an image of what this means to you

Photorealism and style

Training on images reflecting a vast variety of image styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model isn't perfect. We're aware of multiple limitations at the moment, which we will work to address through model improvements after the initial launch.

We've noticed that GPT‑4o can occasionally crop longer images, like posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases like game development, historical exploration, and education, while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we're working to enable safe, high-utility content and support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which will identify an image as coming from GPT‑4o, to provide transparency. We've also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.
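As a rough illustration of what embedded provenance metadata looks like on disk, the sketch below scans a file's bytes for the markers of an embedded C2PA manifest. It assumes Node.js and assumes the manifest is embedded as a JUMBF box labeled `c2pa`, as the C2PA specification describes for embedded manifests; `mayContainC2paManifest` is a hypothetical name. This is a heuristic only: it does not parse or cryptographically validate anything, and metadata can be stripped from a file, so a negative result says little about an image's origin.

```javascript
// Rough heuristic, assuming Node.js: looks for the byte markers of an
// embedded JUMBF box ("jumb") carrying a C2PA label ("c2pa").
// It does NOT parse the manifest or verify its signature.
function mayContainC2paManifest(bytes) {
  const buf = Buffer.from(bytes); // accepts a Buffer or Uint8Array
  return buf.includes("jumb") && buf.includes("c2pa");
}

// Example with in-memory bytes standing in for file contents.
console.log(mayContainC2paManifest(Buffer.from("..jumb..jumd..c2pa..")));
console.log(mayContainC2paManifest(Buffer.from("plain image bytes")));
```

With a real file, the bytes could come from `require("fs").readFileSync(path)`. Actual verification calls for a dedicated C2PA validator rather than a byte scan.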

Blocking the bad stuff
We're continuing to block requests for generated images that may violate our content policies, such as child sexual abuse materials and sexual deepfakes. When images of real people are in context, we have heightened restrictions regarding what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished and is an ongoing area of investment. As we learn more about real-world use of this model, we'll adjust our policies accordingly.

For more on our approach, visit the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we've trained a reasoning LLM to work directly from human-written and interpretable safety specifications. We used this reasoning LLM during development to help us identify and address ambiguities in our policies. Together with our multimodal advancements and existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It's also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out over the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
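To make the kinds of specifics mentioned above concrete, here is a minimal sketch that folds an aspect ratio, hex colors, and a transparent-background request into one description. `describeImageRequest` and its option names are hypothetical, introduced only for illustration; the function just formats text for a chat message and calls no OpenAI API.

```javascript
// Hypothetical helper (not an OpenAI API): it only formats a text request.
const HEX_COLOR = /^#[0-9a-fA-F]{6}$/; // six-digit hex codes such as #FF6600
const ASPECT_RATIO = /^\d+:\d+$/;      // ratios such as 16:9

function describeImageRequest({ subject, aspectRatio, colors = [], transparentBackground = false }) {
  // Reject malformed values early so the request stays unambiguous.
  const badColors = colors.filter((c) => !HEX_COLOR.test(c));
  if (badColors.length > 0) {
    throw new Error(`Invalid hex color(s): ${badColors.join(", ")}`);
  }
  if (aspectRatio !== undefined && !ASPECT_RATIO.test(aspectRatio)) {
    throw new Error(`Invalid aspect ratio: ${aspectRatio}`);
  }
  const parts = [subject];
  if (aspectRatio) parts.push(`Aspect ratio ${aspectRatio}.`);
  if (colors.length > 0) parts.push(`Use exactly these colors: ${colors.join(", ")}.`);
  if (transparentBackground) parts.push("Use a transparent background.");
  return parts.join(" ");
}

console.log(
  describeImageRequest({
    subject: "A flat icon of a fox.",
    aspectRatio: "16:9",
    colors: ["#FF6600"],
    transparentBackground: true,
  })
);
```

Validating the hex codes before composing the text catches typos such as a missing digit, which would otherwise leave the color request open to interpretation.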

credit creator: [Alex Duffy](https://every.to/@AlxAi)

credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image generation

Jackie Shannon: ChatGPT product

Mengchao Zhong, Wayne Chang: ChatGPT engineering

Rohan Sahai: Sora product and engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal organization

Research

Foundational research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model behavior

Laurentia Romaniuk

Multimodal organization

Andrew Gibiansky, Yang Lu

Data

Data leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human data advisors

Long Ouyang

Scaling

Inference leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT product lead

Jackie Shannon

ChatGPT engineering leads

Mengchao Zhong, Wayne Chang

Product design lead

Matt Chan

Data science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora product leads

Rohan Sahai, Wesam Manassra

Sora product and engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing & communications

Communications and marketing leads

Minnia Feng, Natalie Summers, Taya Christianson

Communications

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design & creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco
