MIT Researchers Develop an AI that Can Predict Future Scenes

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed an AI that can predict future scenes based on a single still image.

In the study “Generating Videos with Scene Dynamics,” MIT researchers led by Carl Vondrick showed that their deep-learning algorithm can take a single still image from a scene and create a short video predicting what happens next. To make this possible, the team trained the AI on 2 million unlabeled videos, the equivalent of about a year’s worth of footage. Vondrick told MIT News that the videos created by the newly developed AI “show us what computers think can happen in a scene.” Vondrick, the study’s first author, added, “If you can predict the future, you must have understood something about the present.”

Multiple researchers have worked on scene prediction. What differentiates this latest study from the MIT team is that, while previous models focused on extrapolating existing videos into the future, the present model creates completely new videos of scenes that have not been seen before. Predictions from previous models carry a large margin for error; according to the researchers, the present model’s predictions are more accurate.

To generate more accurate scenes, the MIT researchers’ model predicts all frames simultaneously. The researchers taught the model to generate the foreground separately from the background and then place the objects in the scene, which lets the model determine which objects are moving and which are static.
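
The foreground and background separation can be illustrated with a small NumPy sketch. This is not the researchers' code: it assumes the two streams are combined by a per-pixel mask, and the function name, array shapes and random placeholder data are hypothetical. In a real model, a trained network would produce the foreground, mask and background.

```python
import numpy as np

def composite_video(foreground, mask, background):
    """Blend a moving foreground with a static background (illustrative only).

    foreground: (T, H, W, C) array, one image per frame
    mask:       (T, H, W, 1) array in [0, 1]; 1 means "use the foreground"
    background: (H, W, C) array, a single image shared by every frame
    """
    foreground = np.asarray(foreground, dtype=float)
    mask = np.asarray(mask, dtype=float)
    background = np.asarray(background, dtype=float)
    # Adding a leading time axis lets the one background image broadcast
    # across all T frames, so every frame is produced in a single step.
    return mask * foreground + (1.0 - mask) * background[np.newaxis]

# Toy data standing in for network outputs (assumed shapes, random values).
T, H, W, C = 4, 2, 2, 3
rng = np.random.default_rng(0)
foreground = rng.random((T, H, W, C))
background = rng.random((H, W, C))

# Mark only the top-left pixel as moving in every frame.
mask = np.zeros((T, H, W, 1))
mask[:, 0, 0, 0] = 1.0

video = composite_video(foreground, mask, background)
```

Wherever the mask is 0, every frame shows the same background pixel, so that part of the scene stays static; wherever it is 1, the pixel comes from the foreground and can change from frame to frame.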

The researchers said that the deep-learning model they have developed is not limited to predicting the future. Generative videos, they said, can be used for adding animation to still images, helping detect anomalies in security footage and compressing data for storing and sending longer videos. “In the future, this will let us scale up vision systems to recognize objects and scenes without any supervision, simply by training them on video,” Vondrick said.