0413 Teacher's Article

From: https://www.nytimes.com/2023/04/04/technology/runway-ai-videos.html
Instant Videos Could Represent the Next Leap in A.I. Technology

A start-up in New York is among a group of companies working on systems that can produce
short videos based on a few words typed into a computer.
By Cade Metz
April 4, 2023
Ian Sansavera, a software architect at a New York start-up called Runway AI, typed a short
description of what he wanted to see in a video. “A tranquil river in the forest,” he wrote.
Less than two minutes later, an experimental internet service generated a short video of a
tranquil river in a forest. The river’s running water glistened in the sun as it cut between
trees and ferns, turned a corner and splashed gently over rocks.
Runway, which plans to open its service to a small group of testers this week, is one of
several companies building artificial intelligence technology that will soon let people
generate videos simply by typing several words into a box on a computer screen.
They represent the next stage in an industry race — one that includes giants like Microsoft
and Google as well as much smaller start-ups — to create new kinds of artificial intelligence
systems that some believe could be the next big thing in technology, as important as web
browsers or the iPhone.
The new video-generation systems could speed the work of moviemakers and other digital
artists, while becoming a new and quick way to create hard-to-detect online misinformation,
making it even harder to tell what’s real on the internet.
The systems are examples of what is known as generative A.I., which can instantly create
text, images and sounds. Another example is ChatGPT, the online chatbot made by a San
Francisco start-up, OpenAI, that stunned the tech industry with its abilities late last year.
Google and Meta, Facebook’s parent company, unveiled the first video-generation systems
last year, but did not share them with the public because they were worried that the
systems could eventually be used to spread disinformation with newfound speed and
efficiency.
But Runway’s chief executive, Cristóbal Valenzuela, said he believed the technology was too
important to keep in a research lab, despite its risks. “This is one of the single most
impressive technologies we have built in the last hundred years,” he said. “You need to have
people actually using it.”
The ability to edit and manipulate film and video is nothing new, of course. Filmmakers have
been doing it for more than a century. In recent years, researchers and digital artists have
been using various A.I. technologies and software programs to create and edit videos that
are often called deepfake videos.
But systems like the one Runway has created could, in time, replace editing skills with the
press of a button.
Runway’s technology generates videos from any short description. To start, you simply type
a description much as you would type a quick note.
That works best if the scene has some action — but not too much action — something like
“a rainy day in the big city” or “a dog with a cellphone in the park.” Hit enter, and the system
generates a video in a minute or two.
The technology can reproduce common images, like a cat sleeping on a rug. Or it can
combine disparate concepts to generate videos that are strangely amusing, like a cow at a
birthday party.
The videos are only four seconds long, and they look choppy and blurry if you look closely.
Sometimes, the images are weird, distorted and disturbing. The system has a way of merging
animals like dogs and cats with inanimate objects like balls and cellphones. But given the
right prompt, it produces videos that show where the technology is headed.
“At this point, if I see a high-resolution video, I am probably going to trust it,” said Phillip
Isola, a professor at the Massachusetts Institute of Technology who specializes in A.I. “But
that will change pretty quickly.”
Like other generative A.I. technologies, Runway’s system learns by analyzing digital data —
in this case, photos, videos and captions describing what those images contain. By training
this kind of technology on increasingly large amounts of data, researchers are confident they
can rapidly improve and expand its skills. Soon, experts believe, they will generate
professional-looking mini-movies, complete with music and dialogue.
It is difficult to define what the system creates currently. It’s not a photo. It’s not a cartoon.
It’s a collection of a lot of pixels blended together to create a realistic video. The company
plans to offer its technology with other tools that it believes will speed up the work of
professional artists.
Several start-ups, including OpenAI, have released similar technology that can generate still
images from short prompts like “photo of a teddy bear riding a skateboard in Times Square.”
And the rapid advancement of A.I.-generated photos could suggest where the new video
technology is going.
Last month, social media services were teeming with images of Pope Francis in a white
Balenciaga puffer coat — surprisingly trendy attire for an 86-year-old pontiff. But the images
were not real. A 31-year-old construction worker from Chicago had created the viral
sensation using a popular A.I. tool called Midjourney.
Dr. Isola has spent years building and testing this kind of technology, first as a researcher at
the University of California, Berkeley, and at OpenAI, and then as a professor at M.I.T. Still, he
was fooled by the sharp, high-resolution but completely fake images of Pope Francis.
“There was a time when people would post deepfakes and they wouldn’t fool me, because
they were so outlandish or not very realistic,” he said. “Now, we can’t take any of the images
we see on the internet at face value.”
Midjourney is one of many services that can generate realistic still images from a short
prompt. Others include Stable Diffusion and DALL-E, an OpenAI technology that started this
wave of photo generators when it was unveiled a year ago.
Midjourney relies on a neural network, which learns its skills by analyzing enormous
amounts of data. It looks for patterns as it combs through millions of digital images as well
as text captions that describe what the images depict.
When someone describes an image for the system, it generates a list of features that the
image might include. One feature might be the curve at the top of a dog’s ear. Another
might be the edge of a cellphone. Then, a second neural network, called a diffusion model,
creates the image and generates the pixels needed for the features. It eventually transforms
the pixels into a coherent image.
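
The idea of starting from random pixels and refining them step by step can be illustrated with a toy sketch. This is not how Midjourney or any real diffusion model is implemented: a real model uses a trained neural network to estimate what a cleaner image should look like, whereas this example assumes a hypothetical stand-in that already knows the answer (`target`). The names `denoise_step` and `generate` are invented for illustration.

```python
import random
from typing import List


def denoise_step(noisy: List[float], predicted_clean: List[float], rate: float) -> List[float]:
    """Move every pixel a fraction `rate` of the way toward the predicted clean value."""
    return [n + rate * (c - n) for n, c in zip(noisy, predicted_clean)]


def generate(target: List[float], steps: int = 50, rate: float = 0.2, seed: int = 0) -> List[float]:
    """Start from pure random noise and refine it repeatedly.

    ASSUMPTION: `target` stands in for the prediction a trained network would
    make at each step. A real system does not know the final image in advance.
    """
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # random pixels, no structure
    for _ in range(steps):
        image = denoise_step(image, target, rate)
    return image


if __name__ == "__main__":
    # A tiny four-pixel "image" with brightness values between 0 and 1.
    target = [0.0, 0.5, 1.0, 0.5]
    result = generate(target)
    print([round(p, 3) for p in result])
```

Each pass removes a fixed fraction of the remaining error, so the random starting pixels drift toward a coherent result over many small steps rather than in one jump.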
Companies like Runway, which has roughly 40 employees and has raised $95.5 million, are
using this technique to generate moving images. By analyzing thousands of videos, their
technology can learn to string many still images together in a similarly coherent way.
“A video is just a series of frames — still images — that are combined in a way that gives
the illusion of movement,” Mr. Valenzuela said. “The trick lies in training a model that
understands the relationship and consistency between each frame.”
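
Mr. Valenzuela's description of a video as a series of frames can be made concrete with a small sketch. It assumes a video is a list of grayscale frames, each a grid of brightness values, and it measures consistency between neighboring frames as the average pixel change. That measure and the function names are illustrative assumptions, not Runway's method.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of brightness values in [0, 1]


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two frames of the same size."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def temporal_consistency(video: List[Frame]) -> List[float]:
    """Difference between each pair of consecutive frames (smaller means smoother)."""
    return [frame_difference(video[i], video[i + 1]) for i in range(len(video) - 1)]


def make_moving_dot(width: int, height: int, num_frames: int) -> List[Frame]:
    """Build a toy video: one bright dot that moves one pixel to the right per frame."""
    video = []
    for t in range(num_frames):
        frame = [[0.0] * width for _ in range(height)]
        frame[height // 2][t % width] = 1.0
        video.append(frame)
    return video


if __name__ == "__main__":
    video = make_moving_dot(width=8, height=4, num_frames=5)
    print(temporal_consistency(video))
```

In the toy video only two pixels change between neighboring frames, so every difference is small and equal. A sequence of unrelated images would score much higher, which is one simple way to see why frame-to-frame consistency matters for the illusion of movement.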
Like early versions of tools such as DALL-E and Midjourney, the technology sometimes
combines concepts and images in curious ways. If you ask for a teddy bear playing
basketball, it might give a kind of mutant stuffed animal with a basketball for a hand. If you
ask for a dog with a cellphone in the park, it might give you a cellphone-wielding pup with
an oddly human body.
But experts believe they can iron out the flaws as they train their systems on more and more
data. They believe the technology will ultimately make creating a video as easy as writing a
sentence.
“In the old days, to do anything remotely like this, you had to have a camera. You had to
have props. You had to have a location. You had to have permission. You had to have
money,” said Susan Bonser, an author and a publisher in Pennsylvania who has been
experimenting with early incarnations of generative video technology. “You don’t have to
have any of that now. You can just sit down and imagine it.”
