
MIT model exhibits human-like understanding of object behavior

Technology News
By Rich Pell

Similar to how infants hold expectations for how objects should move and interact with each other, a model developed by MIT researchers, called ADEPT, registers “surprise” when objects in simulations move in unexpected ways, such as rolling behind a wall and not reappearing on the other side. The model, say the researchers, could help build smarter artificial intelligence (AI) and, in turn, help scientists understand infant cognition.

“By the time infants are 3 months old, they have some notion that objects don’t wink in and out of existence, and can’t move through each other or teleport,” says Kevin A. Smith, a research scientist in the Department of Brain and Cognitive Sciences (BCS) and a member of the Center for Brains, Minds, and Machines (CBMM). “We wanted to capture and formalize that knowledge to build infant cognition into artificial-intelligence agents. We’re now getting near human-like in the way models can pick apart basic implausible or plausible scenes.”

The model observes objects moving around a scene and makes predictions about how those objects should behave, based on their underlying physics. While tracking the objects, the model outputs a signal at each video frame that corresponds to a level of “surprise.” The bigger the signal, the greater the surprise – for example, if an object dramatically mismatches the model’s predictions by vanishing or teleporting across a scene, the model’s surprise level will spike.

ADEPT relies on two modules: an “inverse graphics” module that captures object representations – such as shape, pose, and velocity – from raw images, and a “physics engine” that predicts the objects’ future representations from a distribution of possibilities. ADEPT requires only some approximate geometry of each shape to function, which helps the model generalize predictions to new objects, not just those it’s trained on.
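
To make the division of labor concrete, here is a minimal Python skeleton of such a two-module pipeline. Everything in it (the ObjectState fields, the function names, the placeholder bodies) is a hypothetical illustration, not the researchers' actual code.

```python
# Hypothetical skeleton of a two-module pipeline in the spirit of ADEPT.
# All names and structures are illustrative, not the actual codebase.
from dataclasses import dataclass

@dataclass
class ObjectState:
    """Coarse object representation extracted from one frame."""
    shape: str       # approximate geometry only, e.g. "box" or "ellipsoid"
    position: tuple  # (x, y, z) in scene coordinates
    velocity: tuple  # (vx, vy, vz) per frame

def inverse_graphics(frame) -> list:
    """Capture coarse object states (shape, pose, velocity) from a raw image."""
    ...  # in practice, a learned vision model

def physics_engine(objects, n_samples=100) -> list:
    """Sample possible next-frame states for the objects, forming a
    'belief distribution' over what should happen next."""
    ...  # in practice, a stochastic physics simulator
```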

“It doesn’t matter if an object is a rectangle or a circle, or if it’s a truck or a duck,” says Smith. “ADEPT just sees there’s an object with some position, moving in a certain way, to make predictions. Similarly, young infants also don’t seem to care much about some properties like shape when making physical predictions.”

The coarse object descriptions are then fed into the physics engine, which “pushes the objects forward in time” to create a range of predictions, or a “belief distribution,” for what will happen to those objects in the next frame. The model then observes the actual next frame and once again captures the object representations, which it aligns to one of the predicted object representations from its belief distribution.
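
As a toy illustration of this predict-and-align loop, the sketch below uses a particle-filter-style belief: constant-velocity motion plus Gaussian noise stands in for the real stochastic physics engine, and alignment is reduced to nearest-sample distance. All of it is an assumption for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def push_forward(positions, velocities, n_samples=100, noise=0.05):
    """Propagate objects one frame ahead under constant velocity,
    adding noise to each sample to form a belief distribution."""
    jitter = rng.normal(0.0, noise, size=(n_samples,) + positions.shape)
    return positions + velocities + jitter  # (n_samples, n_objects, dims)

def mismatch(observed, belief):
    """Align each observed object to its nearest predicted sample
    and return the per-object mismatch distance."""
    dists = np.linalg.norm(belief - observed, axis=-1)  # (n_samples, n_objects)
    return dists.min(axis=0)

# One object moving right at 1 unit per frame:
pos = np.array([[0.0, 0.0]])
vel = np.array([[1.0, 0.0]])
belief = push_forward(pos, vel)

print(mismatch(np.array([[1.0, 0.0]]), belief))  # small: plausible motion
print(mismatch(np.array([[9.0, 0.0]]), belief))  # large: a "teleport"
```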

If the object obeys the laws of physics, there won’t be much mismatch between the two representations. On the other hand, if the object does something implausible – such as moving behind a wall and then not being there when the wall drops – there will be a major mismatch.

In the latter case, ADEPT resamples from its belief distribution and notes a very low probability that the object had simply vanished. If there’s a low enough probability, the model registers great “surprise” as a signal spike. Basically, say the researchers, surprise is inversely proportional to the probability of an event occurring – if the probability is very low, the signal spike is very high.
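
The article doesn’t give the exact formula, but a standard way to formalize “inversely proportional to the probability” is information-theoretic surprisal, the negative log of the probability. The snippet below is a sketch under that assumption, not necessarily the measure used in the paper.

```python
import math

def surprisal(prob, eps=1e-12):
    """Surprise as -log(p): the less probable the observed event,
    the larger the spike. (Assumed formalization for illustration.)"""
    return -math.log(max(prob, eps))

print(surprisal(0.9))    # ~0.11 -> object behaved as predicted
print(surprisal(0.001))  # ~6.91 -> object "vanished": large spike
```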

“If an object goes behind a wall, your physics engine maintains a belief that the object is still behind the wall,” says CBMM investigator Tomer D. Ullman, a co-author of a paper on the research. “If the wall goes down, and nothing is there, there’s a mismatch. Then, the model says, ‘There’s an object in my prediction, but I see nothing. The only explanation is that it disappeared, so that’s surprising.’”

In response to videos showing objects moving in physically plausible and implausible ways, say the researchers, the model registered levels of surprise that matched levels reported by humans who had watched the same videos – especially in cases where objects moved behind walls and disappeared when the wall was removed. The model also matched surprise levels on videos that humans weren’t surprised by but maybe should have been, say the researchers.

For example, in a video where an object moving at a certain speed disappears behind a wall and immediately comes out the other side, the object might have sped up dramatically when it went behind the wall, or it might have teleported to the other side. In general, humans and ADEPT were both less certain about whether that event was or wasn’t surprising. The researchers also found traditional neural networks that learn physics from observations – but don’t explicitly represent objects – are far less accurate at differentiating “surprising” from “unsurprising” scenes, and their picks for surprising scenes don’t often align with those of humans.

Looking ahead, the researchers say they plan to delve further into how infants observe and learn about the world, with the aim of incorporating any new findings into their model. Studies, for example, show that infants up until a certain age actually aren’t very surprised when objects completely change in some ways – such as if a truck disappears behind a wall, but reemerges as a duck.

“We want to see what else needs to be built in to understand the world more like infants, and formalize what we know about psychology to build better AI agents,” says Smith.

For more, see “Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations.”

