Quatar, where QCRI is based, is “not a priority for the large companies building digital maps,” Madden says. Yet, it’s constantly building new roads and improving old ones, especially in preparation for hosting the 2022 FIFA World Cup.
“While visiting Qatar, we’ve had experiences where our Uber driver can’t figure out how to get where he’s going, because the map is so off,” Madden says. “If navigation apps don’t have the right information, for things such as lane merging, this could be frustrating or worse.”
RoadTagger relies on a novel combination of a convolutional neural network (CNN) — commonly used for images-processing tasks — and a graph neural network (GNN). GNNs model relationships between connected nodes in a graph and have become popular for analyzing things like social networks and molecular dynamics. The model is “end-to-end,” meaning it’s fed only raw data and automatically produces output, without human intervention.
The CNN takes as input raw satellite images of target roads. The GNN breaks the road into roughly 20-meter segments, or “tiles.” Each tile is a separate graph node, connected by lines along the road. For each node, the CNN extracts road features and shares that information with its immediate neighbours. Road information propagates along the whole graph, with each node receiving some information about road attributes in every other node. If a certain tile is occluded in an image, RoadTagger uses information from all tiles along the road to predict what’s behind the occlusion.
This combined architecture represents a more human-like intuition, the researchers say. Say part of a four-lane road is occluded by trees, so certain tiles show only two lanes. Humans can easily surmise that a couple lanes are hidden behind the trees. Traditional machine-learning models — say, just a CNN — extract features only of individual tiles and most likely predict the occluded tile is a two-lane road.
“Humans can use information from adjacent tiles to guess the number of lanes in the occluded tiles, but networks can’t do that,” Madden explains. “Our approach tries to mimic the natural behavior of humans, where we capture local information from the CNN and global information from the GNN to make better predictions.”