Everything About OpenAI’s Video Generator, Sora, Which Can Create Videos From Text


Step aside, ChatGPT. OpenAI has a new baby in town.

On 15 February 2024, OpenAI announced a new AI text-to-video model called Sora.

It’s not available to the public yet, but the company decided to share its research progress early to get feedback from people outside OpenAI.

If you’ve always wanted to make videos without much effort, Sora seems like the perfect solution.

Here’s everything that is known about Sora so far.

Sora Can Create Video From Text

The new model is named after the Japanese word for “sky”, and can produce realistic footage up to a minute long.

Moreover, it adheres to a user’s instructions on subject matter and style.

In a blog post, OpenAI wrote, “We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.”

According to the blog post, Sora can create complex scenes with multiple characters, specific types of motion, and accurate details.

OpenAI said Sora understands what a user has asked for in a prompt and how those things exist in the physical world.

In a post on X, formerly Twitter, OpenAI demonstrated Sora’s capabilities.

Using the prompt “beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes”, Sora created a 17-second video depicting this scene.

The company posted the results of various other prompts on its X account, including one involving woolly mammoths frolicking in a snowy meadow and another featuring a spaceman.

The blog post further explained that Sora deeply comprehends language, allowing it to interpret prompts correctly and generate characters.


It also added that the AI model can create multiple shots within a single generated video.

Who Can Use Sora?

Currently, Sora isn’t open to everyone.

OpenAI announced that only a few researchers and video creators can access Sora. 

Experts will also “red team” the product.

This means they will test whether the model can be used to bypass OpenAI’s terms of service, which prohibit “extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others”.


OpenAI gave some visual artists, designers, and filmmakers access to the model so that the company may gain better insights into advancing the model and making it useful for creative professionals. 

OpenAI said it is also building the necessary tools to detect misleading information. 

Sora’s Weaknesses

Of course, the current model isn’t perfect yet.

OpenAI acknowledged that Sora “may struggle with accurately simulating the physics of a complex scene”.

Moreover, Sora has difficulty understanding the concept of cause and effect.

The company gave the example of a video depicting a person taking a bite of a cookie. 


As Sora currently struggles to comprehend cause and effect, the cookie may not have a bite mark afterwards.

Text-to-video AI models are not a new concept. 

In 2023, Meta improved its image generation model Emu and added two AI features that could edit and generate videos based on written commands.

Nonetheless, the announcement of Sora has generated plenty of excitement.

Reece Hayden, a senior analyst at ABI Research, said that Sora is set apart from similar existing models by the length and accuracy of its output.


He added that text-to-video AI models can significantly impact digital entertainment markets.

For instance, new personalised content could be streamed across channels.

He explained that such AI models could help create short scenes to support narratives on television.

Arvind Narayanan, a professor of computer science at Princeton University, acknowledged that Sora seems more advanced than similar models.

However, he noted that the videos created with Sora that OpenAI posted still have inconsistencies.

For instance, in the Sora-generated video of a Tokyo street, a woman’s right and left legs switch places.

In the same video, people in the background vanish after something passes in front of them.

He noted that a casual viewer may not notice these inconsistencies, which means people may need to adapt to the idea that realism is “no longer a marker of authenticity”.


Criticisms of Sora

Remember when everyone tried to sue OpenAI for alleged copyright infringement in training its generative AI tools?

If you didn’t know, generative AI models are trained on massive amounts of material scraped from the internet, which they draw on to imitate the images or text requested.

However, the material had to come from somewhere, and the owners of these works were unhappy as they were not compensated.

This criticism may resurface with the introduction of Sora.

OpenAI did not disclose how much footage was used to train Sora.

It also did not reveal where the training videos originated.

However, it told the New York Times that the corpus consisted of videos that were either publicly available or licensed from copyright owners.

Furthermore, the recent strikes by writers’ and actors’ guilds raised questions about the use of AI language tools in screenwriting and the use of actors’ likenesses in AI-generated scenes.

More specifically, these questions centred on the livelihoods of actors and writers.

After all, with AI, would human extras or a team of writers be needed to produce a film?

Thus, AI could impact the livelihoods of such professionals.

Mutale Nkonde, a visiting policy fellow at the Oxford Internet Institute, added that policymakers must consider humans “in the loop” when such tools are involved.

Prof Narayanan also noted that high-quality tools like Sora could be used to create deepfake videos.

In Singapore, scams involving deepfake videos are on the rise.

Scammers use AI tools to create deepfake videos to fool people into transferring money. 

With Sora, an AI model that produces higher-quality videos, people may find distinguishing between a deepfake and an authentic video more challenging.

Other OpenAI Updates

Although AI has some risks, it’s still an exciting development in the world of technology.

It can make people’s lives more convenient.

For instance, many users rely on it for coding work or for generating ideas for creative writing.

However, tools like ChatGPT are still not perfect.

Using the technology for prolonged periods would lead one to discover that ChatGPT has the memory of a goldfish.

Sometimes it forgets earlier parts of a conversation, and at other times it crashes.

Fortunately, on 13 February, OpenAI gave an early Valentine’s Day gift to its ChatGPT users.

It announced that it is testing a new feature allowing ChatGPT to remember specific details. 

According to an X post by Joanne Jang, Product Lead of DALL-E at OpenAI, there are now a few ways to manage memory:

  • Users can tell ChatGPT to remember or forget something in the chat.
  • There is now a “temporary chat” feature for one-off conversations that involve topics that users don’t want the AI to pick up on.
  • Users can delete individual memory snippets, delete all memories, or turn off memory altogether.

Notably, not everyone has access to the feature yet as “memory, almost by definition, will take a bit longer to build”.

However, Jang added, “We can’t wait to learn from this experiment and iterate on the feedback, so that personalisation features like this can be truly useful to everyone.”