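We are teaching AI to understand and simulate the dynamics of the real world, with the goal of training models that help people solve problems requiring real-world interaction.
Introducing Sora, our text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and closely following the user's prompt.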
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
Prompt: Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.
Prompt: A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.
Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.
Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.
Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.
Prompt: This close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird’s head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird’s striking appearance.
Prompt: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.
Prompt: A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
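Sora is now available to red teamers, who will assess critical areas for harms or risks. We are also granting access to a number of visual artists, designers, and filmmakers to gather their feedback on how to advance the model so that it is as useful as possible to creative professionals.
We are sharing our research progress early so that we can start working with people outside OpenAI, collect their feedback, and give the public a sense of the AI capabilities on the horizon.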
Prompt: Historical footage of California during the gold rush.
Prompt: A close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.
Prompt: Extreme close up of a 24 year old woman’s eye blinking, standing in Marrakech during magic hour, cinematic film shot in 70mm, depth of field, vivid colors, cinematic
Prompt: A cartoon kangaroo disco dances.
Prompt: A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.
Prompt: A petri dish with a bamboo forest growing within it that has tiny red pandas running around.
Prompt: The camera rotates around a large stack of vintage televisions all showing different programs: 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.
Prompt: 3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest.
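Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the real world.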
Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.
Prompt: A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.
Prompt: A large orange octopus is seen resting on the bottom of the ocean floor, blending in with the sandy and rocky terrain. Its tentacles are spread out around its body, and its eyes are closed. The octopus is unaware of a king crab that is crawling towards it from behind a rock, its claws raised and ready to attack. The crab is brown and spiny, with long legs and antennae. The scene is captured from a wide angle, showing the vastness and depth of the ocean. The water is clear and blue, with rays of sunlight filtering through. The shot is sharp and crisp, with a high dynamic range. The octopus and the crab are in focus, while the background is slightly blurred, creating a depth of field effect.
Prompt: A flock of paper airplanes flutters through a dense jungle, weaving around trees as if they were migrating birds.
Prompt: A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.
Prompt: Borneo wildlife on the Kinabatangan River
Prompt: A Chinese Lunar New Year celebration video with Chinese Dragon.
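Sora has a deep understanding of language, which lets it interpret prompts accurately and generate vivid characters that express rich emotions. It can also create multiple shots within a single generated video while keeping characters and visual style consistent.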
Prompt: Tour of an art gallery with many beautiful works of art in different styles.
Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.
Prompt: A stop motion animation of a flower growing out of the windowsill of a suburban house.
Prompt: The story of a robot’s life in a cyberpunk setting.
Prompt: An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.
Prompt: A beautiful silhouette animation shows a wolf howling at the moon, feeling lonely, until it finds its pack.
Prompt: New York City submerged like Atlantis. Fish, whales, sea turtles and sharks swim through the streets of New York.
Prompt: A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in.
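The current model still has room for improvement. It may struggle to simulate the physics of a complex scene, and it may not understand specific instances of cause and effect. For example, after a character takes a bite of a cookie, the cookie may show no bite mark. The model may also confuse spatial details in a prompt, such as telling left from right, and it may struggle to accurately depict events that unfold over time, such as a specific camera trajectory.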
Prompt: Step-printing scene of a person running, cinematic film shot in 35mm.
Weakness: Sora sometimes produces motion that is physically impossible.
Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.
Weakness: Animals or people can appear spontaneously, especially in scenes containing many entities.
Prompt: Basketball through hoop then explodes.
Weakness: An example of inaccurate physical modeling and unnatural object "morphing."
Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care.
Weakness: In this example, Sora fails to model the chair as a rigid object, which leads to inaccurate physical interactions.
Prompt: A grandmother with neatly combed grey hair stands behind a colorful birthday cake with numerous candles at a wood dining room table, expression is one of pure joy and happiness, with a happy glow in her eye. She leans forward and blows out the candles with a gentle puff, the cake has pink frosting and sprinkles and the candles cease to flicker, the grandmother wears a light blue blouse adorned with floral patterns, several happy friends and family sitting at the table can be seen celebrating, out of focus. The scene is beautifully captured, cinematic, showing a 3/4 view of the grandmother and the dining room. Warm color tones and soft lighting enhance the mood.
Weakness: Simulating complex interactions between objects and multiple characters is often challenging for Sora, and sometimes results in humorous footage.
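We will take several important safety steps before making Sora available in OpenAI products. We are working with red teamers (domain experts in areas such as misinformation, hateful content, and bias) who will adversarially test Sora.
We are also building tools to help detect misleading content, such as a detection classifier that can identify videos generated by Sora. If we deploy the model in an OpenAI product, we plan to include C2PA metadata in the future.
Beyond developing new techniques to prepare for deployment, we are applying the existing safety measures that we built for products using DALL·E 3, which are applicable to Sora as well.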
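For example, once Sora is in an OpenAI product, our text classifier will check and reject text input prompts that violate our usage policies, such as prompts requesting extreme violence, sexual content, hateful imagery, celebrity likenesses, or the intellectual property of others. We have also developed robust image classifiers that review the frames of every generated video to ensure it complies with our usage policies before it is shown to the user.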
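The two checks described above can be pictured as gates on either side of generation. The following is only a minimal sketch of that control flow, not OpenAI's implementation: `moderated_generate`, `generate`, `text_ok`, and `frame_ok` are hypothetical names, and the classifiers are passed in as plain callables.

```python
def moderated_generate(prompt, generate, text_ok, frame_ok):
    """Return the generated frames only if every check passes, else None.

    All three callables are hypothetical stand-ins:
      generate(prompt) -> list of frames
      text_ok(prompt)  -> True if the prompt complies with policy
      frame_ok(frame)  -> True if a single frame complies with policy
    """
    # Gate 1: reject a violating prompt before any video is generated.
    if not text_ok(prompt):
        return None
    frames = generate(prompt)
    # Gate 2: review every frame; one failing frame withholds the video.
    if not all(frame_ok(frame) for frame in frames):
        return None
    return frames
```

Running the text check first means a rejected prompt never reaches the generator, while the frame check covers output that a compliant-looking prompt might still produce.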
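We will engage with policymakers, educators, and artists around the world to understand their concerns and to identify constructive use cases for this new technology. Despite extensive research and testing, we cannot fully predict how people will put this technology to good use, or how they might abuse it. That is why we believe learning from real-world use is an essential part of building and releasing increasingly safe AI systems.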
Prompt: The camera directly faces colorful buildings in Burano Italy. An adorable dalmation looks through a window on a building on the ground floor. Many people are walking and cycling along the canal streets in front of the buildings.
Prompt: An adorable happy otter confidently stands on a surfboard wearing a yellow lifejacket, riding along turquoise tropical waters near lush tropical islands, 3D digital render art style.
Prompt: This close-up shot of a chameleon showcases its striking color changing capabilities. The background is blurred, drawing attention to the animal’s striking appearance.
Prompt: A corgi vlogging itself in tropical Maui.
Prompt: A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. the scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field.
Prompt: Aerial view of Santorini during the blue hour, showcasing the stunning architecture of white Cycladic buildings with blue domes. The caldera views are breathtaking, and the lighting creates a beautiful, serene atmosphere.
Prompt: Tiltshift of a construction site filled with workers, equipment, and heavy machinery.
Prompt: A giant, towering cloud in the shape of a man looms over the earth. The cloud man shoots lighting bolts down to the earth.
Prompt: A Samoyed and a Golden Retriever dog are playfully romping through a futuristic neon city at night. The neon lights emitted from the nearby buildings glistens off of their fur.
Prompt: The Glenfinnan Viaduct is a historic railway bridge in Scotland, UK, that crosses over the west highland line between the towns of Mallaig and Fort William. It is a stunning sight as a steam train leaves the bridge, traveling over the arch-covered viaduct. The landscape is dotted with lush greenery and rocky mountains, creating a picturesque backdrop for the train journey. The sky is blue and the sun is shining, making for a beautiful day to explore this majestic spot.
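Sora is a diffusion model: it starts from a video that looks like static noise and gradually transforms it by removing the noise over many steps until the final video emerges.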
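To make the "start from noise, remove it step by step" idea concrete, here is a toy sketch in plain Python. It is not Sora's sampler: the "video" is a short list of numbers, and `toy_denoiser` is a hypothetical stand-in that reads the noise off a known clean reference, whereas a real diffusion model learns to predict the noise from training data and the prompt.

```python
import random


def toy_denoiser(noisy, clean_reference):
    """Hypothetical stand-in for a learned network.

    Returns the noise currently present in `noisy`. It cheats by using a
    known clean signal; a trained model would have to predict this.
    """
    return [n - c for n, c in zip(noisy, clean_reference)]


def denoise(clean_reference, steps=50, seed=0):
    """Start from Gaussian noise and remove the noise over `steps` steps."""
    rng = random.Random(seed)
    # Step 0: a sample that is pure static.
    x = [rng.gauss(0.0, 1.0) for _ in clean_reference]
    for step in range(steps):
        predicted_noise = toy_denoiser(x, clean_reference)
        # Remove a growing share of the remaining noise each step, so the
        # error shrinks linearly and the last step removes what is left.
        fraction = 1.0 / (steps - step)
        x = [xi - fraction * ni for xi, ni in zip(x, predicted_noise)]
    return x
```

With this schedule the distance to the clean signal falls by an equal amount at every step, which mirrors the gradual transformation described above; real samplers use learned predictions and more elaborate noise schedules.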
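Sora can generate an entire video at once, or extend a generated video to make it longer. By giving the model the ability to predict many frames at a time, we have solved a challenging problem: keeping a subject unchanged even when it temporarily leaves the frame.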
Like GPT models, Sora uses a transformer architecture to achieve superior scaling performance.
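We represent videos and images as collections of smaller units of data called "patches," each of which is analogous to a token in GPT. By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was previously possible, spanning different durations, resolutions, and aspect ratios.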
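As a rough illustration of the patch representation, the sketch below cuts a small video into non-overlapping blocks and flattens each block into one token-like list. The layout `video[t][y][x]`, the function name `patchify`, and the choice to let a patch span several frames are assumptions made for this example; the text above does not specify how Sora forms its patches.

```python
def patchify(video, pt, ph, pw):
    """Split video[t][y][x] into non-overlapping pt x ph x pw patches.

    Each patch is flattened into one list of pt * ph * pw values. A still
    image can be passed as a one-frame video with pt=1.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Gather one block of pixels in (t, y, x) order.
                patches.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches
```

Inputs of different durations, resolutions, or aspect ratios simply yield sequences of different lengths whose elements all have the same size, which is what allows a single sequence model to consume them.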
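Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which generates highly descriptive captions for the visual training data. As a result, the model follows the user's text instructions more faithfully in the generated video.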
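In addition to generating a video from text instructions alone, the model can take an existing still image and generate a video from it, animating the image's contents accurately and with attention to detail. The model can also take an existing video and extend it or fill in missing frames. Read our technical report to learn more.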
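Sora lays the foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone toward achieving artificial general intelligence (AGI).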