March 25, 2025

Introducing 4o Image Generation

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.

Try it in ChatGPT

At OpenAI, we have long believed that image generation should be a primary capability of our language models. That is why we built our most advanced image generator yet into GPT‑4o. The result is image generation that is not only beautiful, but useful.

A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The text reads:

(left)
"Transfer between Modalities:

Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.

Pros:
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack

Cons:
* varying bit-rate across modalities
* compute not adaptive"

(Right)
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"

On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

^{Best of 8}

selfie view of the photographer, as she turns around to high five him

^{Best of 8}

Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze, not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but they struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and drawing on 4o's knowledge base and chat context, including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, help you communicate more effectively through visuals, and turn image generation into a practical, precise, and powerful tool.

Improved capabilities

We trained our models on the joint distribution of online images and text, so that they learn not only how images relate to language, but also how images relate to each other. After aggressive post-training, the resulting model has surprising visual fluency and can generate images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes a few words generated in the right place can elevate the meaning of an image. 4o's ability to blend precise symbols with imagery turns image generation into a tool for visual communication.

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)
Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

^{Best of ~8}

Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build on the images and text in the chat context, which keeps results consistent. For example, if you are designing a video game character, the character's appearance stays coherent across the iterations you create as you refine and experiment.

Give this cat a detective hat and a monocle

^{Best of 1}

turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

^{Best of 1}

update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

^{Best of 2}

create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

^{Best of 8}

credit creator: Manuel Sainsily

Instruction following

GPT‑4o's image generation follows prompts closely and pays attention to detail. Other systems struggle with 5–8 objects, while GPT‑4o can handle 10–20 different objects. Tighter binding of objects to their traits and relations allows for better control.

A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

^{Best of 5}
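
The "left to right, top to bottom" ordering requested in the prompt above is row-major order. As a small sketch of that ordering (the function name `gridCell` is hypothetical and introduced only for illustration; it is not part of any OpenAI tool), a 1-based item number maps to a row and column as follows:

```javascript
// Hypothetical helper: maps a 1-based item number to its 1-based row and
// column in a grid filled left to right, top to bottom (row-major order).
function gridCell(itemNumber, columns) {
  const index = itemNumber - 1; // switch to 0-based for the arithmetic
  return {
    row: Math.floor(index / columns) + 1,
    column: (index % columns) + 1,
  };
}

// Item 7 (the polka dot bowtie) in the 4-column grid from the prompt.
console.log(gridCell(7, 4)); // { row: 2, column: 3 }
```

A check like this can help when comparing a generated grid against the numbered list: item 16 should land in row 4, column 4.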

In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

draw a design for a vehicle with triangular wheels, using these images as reference.
label the front wheel, the back wheel, and at the of the diagram say (in small caps)
TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

^{Best of ~16}

now put this in a photo taken in new york city.

^{Best of ~16}

World knowledge

Native image generation lets 4o link its knowledge across text and images, resulting in a model that feels smarter and more efficient.

Code Example (Three.js)

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>OpenAI Banner</title>
    <style>
      body { margin: 0; overflow: hidden; }
      canvas { display: block; }
    </style>
  </head>
  <body>
    <script type="module">
      import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
      import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
      import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
      import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

      const scene = new THREE.Scene();
      const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
      const renderer = new THREE.WebGLRenderer({ antialias: true });
      renderer.setSize(window.innerWidth, window.innerHeight);
      document.body.appendChild(renderer.domElement);

      // Lighting
      const light = new THREE.AmbientLight(0xffffff, 1);
      scene.add(light);

      const dirLight = new THREE.DirectionalLight(0xffffff, 1);
      dirLight.position.set(0, 5, 10);
      scene.add(dirLight);

      // Camera position
      camera.position.z = 20;

      // Controls
      const controls = new OrbitControls(camera, renderer.domElement);

      // Banner background
      const bannerGeometry = new THREE.PlaneGeometry(20, 10);
      const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
      const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
      scene.add(banner);

      // OpenAI Logo texture (placeholder)
      const loader = new THREE.TextureLoader();
      loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
        const logoGeometry = new THREE.PlaneGeometry(4, 4);
        const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
        const logo = new THREE.Mesh(logoGeometry, logoMaterial);
        logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
        scene.add(logo);
      });

      // Load font and add text
      const fontLoader = new FontLoader();
      fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
        const textGeometry = new TextGeometry("I am 4-o", {
          font: font,
          size: 1,
          height: 0.2,
          curveSegments: 12,
          bevelEnabled: true,
          bevelThickness: 0.02,
          bevelSize: 0.02,
          bevelOffset: 0,
          bevelSegments: 5
        });

        textGeometry.center();

        const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
        const textMesh = new THREE.Mesh(textGeometry, textMaterial);
        textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
        scene.add(textMesh);
      });

      // Resize handler
      window.addEventListener('resize', () => {
        camera.aspect = window.innerWidth / window.innerHeight;
        camera.updateProjectionMatrix();
        renderer.setSize(window.innerWidth, window.innerHeight);
      });

      // Render loop
      function animate() {
        requestAnimationFrame(animate);
        controls.update();
        renderer.render(scene, camera);
      }

      animate();
    </script>
  </body>
</html>
```

make an image of what this means to you

Photorealism and style

Training on images that reflect a vast variety of styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.

A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water

Limitations

Our model is not perfect. We are aware of many limitations, which we will work to address through model improvements after the initial launch.

We have noticed that GPT‑4o can occasionally crop longer images, such as posters, too tightly, especially near the bottom.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases such as game development, historical exploration, and education, while maintaining strong safety standards. It also remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we are working to enable safe, high-utility content and support broader creative expression for users.

Provenance via C2PA and internal reverse search
All generated images carry C2PA metadata that identifies the image as coming from GPT‑4o, to provide transparency. We have also built an internal search tool that uses technical attributes of generations to help verify whether content came from our model.

Blocking inappropriate content
We continue to block requests for generated images that may violate our content policies, such as child sexual abuse material and sexual deepfakes. When images of real people are in context, we apply heightened restrictions on what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an ongoing area of investment. As we learn more about real-world use of this model, we will adjust our policies accordingly.

For more on our approach, see the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we trained a reasoning LLM to work directly from human-written and interpretable safety specifications. During development, this reasoning LLM helped us identify and resolve ambiguities in our policies. Together with our multimodal advances and the existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation starts rolling out today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu users. It is also available in Sora. For those who feel nostalgic for DALL·E, it remains available through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out over the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including any specifics such as aspect ratio, exact colors given as hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
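
As a rough illustration of the specifics mentioned above, the sketch below assembles a plain-text prompt that states an aspect ratio, hex colors, and a transparent background. The helper name `buildImagePrompt` and its option names are hypothetical and are not part of any OpenAI API; the function only formats text that could be pasted into a chat.

```javascript
// Hypothetical helper: formats a plain-text image prompt. It calls no API.
const HEX_COLOR = /^#[0-9a-fA-F]{6}$/;

function buildImagePrompt({ subject, aspectRatio, hexColors = [], transparentBackground = false }) {
  if (!subject) throw new Error('subject is required');
  // Reject anything that is not a six-digit hex code such as #1A1A1A.
  for (const color of hexColors) {
    if (!HEX_COLOR.test(color)) throw new Error(`invalid hex color: ${color}`);
  }
  const parts = [subject];
  if (aspectRatio) parts.push(`Aspect ratio: ${aspectRatio}.`);
  if (hexColors.length > 0) parts.push(`Use exactly these colors: ${hexColors.join(', ')}.`);
  if (transparentBackground) parts.push('Transparent background.');
  return parts.join(' ');
}

const examplePrompt = buildImagePrompt({
  subject: 'A flat vector icon of a paper airplane.',
  aspectRatio: '16:9',
  hexColors: ['#00FFCC', '#1A1A1A'],
  transparentBackground: true,
});
console.log(examplePrompt);
```

Validating the hex codes up front is a design choice of this sketch: a mistyped color is caught before the prompt is sent rather than discovered in the rendered image.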

credit creator: [Alex Duffy](https://every.to/@AlxAi)
credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)

Livestream replay

Author

OpenAI

Leadership

Gabriel Goh: Image Generation

Jackie Shannon: ChatGPT Product

Mengchao Zhong, Wayne Chang: ChatGPT Engineering

Rohan Sahai: Sora Product & Engineering

Brendan Quinn, Tomer Kaftan: Inference

Prafulla Dhariwal: Multimodal Organization

Research

Foundational Research

Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal

Core Research

Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra

Research Contributors

Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song

Model Behavior

Laurentia Romaniuk

Multimodal Organization

Andrew Gibiansky, Yang Lu

Data

Data Leads

Gildas Chabot, James Park Lennon

Data

Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian

Moderators

Hazel Byrne, Jennifer Luckenbill, Mariano López

Human Data Advisors

Long Ouyang

Scaling

Inference Leads

Brendan Quinn, Tomer Kaftan

Inference

Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh

Applied

ChatGPT Product Lead

Jackie Shannon

ChatGPT Engineering Leads

Mengchao Zhong, Wayne Chang

Product Design Lead

Matt Chan

Data Science

Xiaolin Hao

ChatGPT

Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian

Sora

Sora Product Leads

Rohan Sahai, Wesam Manassra

Sora Product & Engineering

Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra

Safety

Safety Lead

Somay Jain

Safety

Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson

Strategy

Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll

Marketing & Comms

Comms & Marketing Leads

Minnia Feng, Natalie Summers, Taya Christianson

Comms

Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor

Design & Creative

Leads

Kendra Rimbach, Veit Moeller

Design

Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz

Special Thanks

Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco