OpenAI's Sora Turns AI Prompts Into Photorealistic Videos

Thursday, 15 February 2024, 21:50, by Slashdot
An anonymous reader quotes a report from Wired: We already know that OpenAI's chatbots can pass the bar exam without going to law school. Now, just in time for the Oscars, a new OpenAI app called Sora hopes to master cinema without going to film school. For now a research product, Sora is going out to a few select creators and a number of security experts who will red-team it for safety vulnerabilities. OpenAI plans to make it available to all wannabe auteurs at some unspecified date, but it decided to preview it in advance. Other companies, from giants like Google to startups like Runway, have already revealed text-to-video AI projects. But OpenAI says that Sora is distinguished by its striking photorealism -- something I haven't seen in its competitors -- and its ability to produce longer clips than the brief snippets other models typically do, up to one minute. The researchers I spoke to won't say how long it takes to render all that video, but when pressed, they described it as more in the 'going out for a burrito' ballpark than 'taking a few days off.' If the hand-picked examples I saw are to be believed, the effort is worth it.

OpenAI didn't let me enter my own prompts, but it shared four instances of Sora's power. (None approached the purported one-minute limit; the longest was 17 seconds.) The first came from a detailed prompt that sounded like an obsessive screenwriter's setup: 'Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.' The result is a convincing view of what is unmistakably Tokyo, in that magic moment when snowflakes and cherry blossoms coexist. The virtual camera, as if affixed to a drone, follows a couple as they slowly stroll through a streetscape. One of the passersby is wearing a mask. Cars rumble by on a riverside roadway to their left, and to the right shoppers flit in and out of a row of tiny shops.

It's not perfect. Only when you watch the clip a few times do you realize that the main characters -- a couple strolling down the snow-covered sidewalk -- would have faced a dilemma had the virtual camera kept running. The sidewalk they occupy seems to dead-end; they would have had to step over a small guardrail to a weird parallel walkway on their right. Despite this mild glitch, the Tokyo example is a mind-blowing exercise in world-building. Down the road, production designers will debate whether it's a powerful collaborator or a job killer. Also, the people in this video -- who are entirely generated by a digital neural network -- aren't shown in close-up, and they don't do any emoting. But the Sora team says that in other instances they've had fake actors showing real emotions. 'It will be a very long time, if ever, before text-to-video threatens actual filmmaking,' concludes Wired. 'No, you can't make coherent movies by stitching together 120 of the minute-long Sora clips, since the model won't respond to prompts in the exact same way -- continuity isn't possible. But the time limit is no barrier for Sora and programs like it to transform TikTok, Reels, and other social platforms.'

'In order to make a professional movie, you need so much expensive equipment,' says Bill Peebles, another researcher on the project. 'This model is going to empower the average person making videos on social media to make very high-quality content.'

Further reading: OpenAI Develops Web Search Product in Challenge To Google

Read more of this story at Slashdot.
https://slashdot.org/story/24/02/15/2047250/openais-sora-turns-ai-prompts-into-photorealistic-videos...
