By zeroing in on humans’ gait, body symmetry and foot placement, researchers at the University of Michigan (U-M) are teaching self-driving cars to recognize and predict pedestrian movements with greater precision than current technologies.
Data collected by vehicles through cameras, lidar and GPS allows the U-M researchers to capture video snippets of humans in motion and then recreate them in 3D computer simulation.
From that data, they have created a ‘biomechanically inspired recurrent neural network’ (Bio-LSTM) that catalogs human movements and can predict poses and future locations for one or several pedestrians up to about 150ft (45.7m) from the vehicle – roughly the scale of a city intersection.
Equipping vehicles with the necessary predictive power requires the network to dive into the minutiae of human movement: the pace of a human’s gait, the mirror symmetry of limbs, and the way in which foot placement affects stability during walking.
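The article does not describe the network’s internals, but a minimal sketch may help illustrate how cues such as limb symmetry and foot placement could be expressed as training signals for an LSTM pose predictor. Everything below (the skeleton layout, joint indices, loss weights and the assumed ground plane) is an illustrative assumption, not the team’s actual Bio-LSTM implementation.

```python
# Illustrative sketch only: an LSTM pose predictor with a loss that adds
# biomechanical penalties for asymmetric limbs and floating feet.
import torch
import torch.nn as nn

class PosePredictor(nn.Module):
    """LSTM that maps a history of 3D joint positions to the next pose."""
    def __init__(self, n_joints=17, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_joints * 3, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_joints * 3)

    def forward(self, poses):          # poses: (batch, time, n_joints * 3)
        out, _ = self.lstm(poses)
        return self.head(out[:, -1])   # predicted next pose, flattened

def biomechanical_loss(pred, target, left_idx, right_idx,
                       w_sym=0.1, w_foot=0.1):
    """Position error plus penalties for asymmetry and ground-plane violations."""
    pred = pred.view(pred.shape[0], -1, 3)
    target = target.view(target.shape[0], -1, 3)
    pos_err = (pred - target).norm(dim=-1).mean()

    # Mirror symmetry: corresponding left/right joints should sit at
    # similar distances from the body centre (hypothetical formulation).
    sym_err = (pred[:, left_idx].norm(dim=-1)
               - pred[:, right_idx].norm(dim=-1)).abs().mean()

    # Foot placement: penalise feet predicted below the ground plane (z = 0).
    foot_z = pred[:, [3, 6], 2]        # assumed ankle joint indices
    foot_err = torch.relu(-foot_z).mean()

    return pos_err + w_sym * sym_err + w_foot * foot_err

# Example: one forward pass on random stand-in data.
model = PosePredictor()
history = torch.randn(8, 30, 17 * 3)   # 8 pedestrians, 30 observed frames
target = torch.randn(8, 17 * 3)
loss = biomechanical_loss(model(history), target,
                          left_idx=[1, 2, 3], right_idx=[4, 5, 6])
```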
Much of the machine learning used to bring autonomous vehicle (AV) technology to its current level has dealt with two-dimensional images – still photos. A computer shown several million photos of stop signs will eventually come to recognize them in the real world and in real time.
But by using video clips that run for several seconds, the U-M system can study the first half of the snippet to make its predictions, and then verify their accuracy with the second half.
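As a rough illustration of that observe-then-verify split, the sketch below feeds a model the first half of a pose clip and scores its forecast against the second half. The array shapes and the predict() interface are assumptions made for the example, not details of the U-M system.

```python
# Illustrative sketch: use the first half of a clip to forecast, the second
# half to check the forecast.
import numpy as np

def split_clip(clip):
    """clip: (frames, n_joints, 3) array of 3D joint positions for one pedestrian."""
    half = clip.shape[0] // 2
    return clip[:half], clip[half:]          # observed half, held-out future

def evaluate_clip(clip, predict):
    """Score a forecasting model on one clip using the observe/verify split."""
    observed, future = split_clip(clip)
    forecast = predict(observed, n_steps=future.shape[0])   # assumed interface
    # Mean per-joint Euclidean error between the forecast and what happened.
    return np.linalg.norm(forecast - future, axis=-1).mean()
```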
Supported by a grant from the Ford Motor Company, the team has shown that the new system improves a driverless vehicle’s capacity to recognize what is most likely to happen next.
To reduce the number of options for predicting the next movement, the researchers applied the physical constraints of the human body, such as our fastest possible speed on foot (a simple version of this check is sketched below).

To create the dataset used to train U-M’s neural network, researchers parked a vehicle with SAE Level 4 autonomous features at several Ann Arbor intersections.
With the car’s cameras and lidar facing the intersection, the AV could record multiple days of data at a time. The researchers bolstered that real-world, ‘in the wild’ data with traditional pose datasets captured in a lab. The result is a system that will raise the bar for what driverless vehicles are capable of.
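To give a sense of how a hard kinematic limit such as a person’s fastest possible speed on foot can prune implausible forecasts, here is a minimal sketch. The 9m/s ceiling is a rough figure for a sprinting human and, like the function names, is an assumption for illustration rather than a value from the U-M study.

```python
# Illustrative sketch: clamp a predicted step to what a human could
# physically cover in the elapsed time.
import numpy as np

MAX_SPEED = 9.0   # metres per second, assumed ceiling for a sprinting human

def clamp_step(prev_xy, pred_xy, dt):
    """Limit a predicted planar displacement to MAX_SPEED * dt."""
    step = pred_xy - prev_xy
    dist = np.linalg.norm(step)
    limit = MAX_SPEED * dt
    if dist > limit:
        pred_xy = prev_xy + step * (limit / dist)
    return pred_xy
```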
“Prior work in this area has typically only looked at still images. It wasn’t really concerned with how people move in three dimensions,” explained Ram Vasudevan, U-M assistant professor of mechanical engineering.
“But if these vehicles are going to operate and interact in the real world, we need to make sure our predictions of where a pedestrian is going don’t coincide with where the vehicle is going next.”
U-M associate professor Matthew Johnson-Roberson noted, “Now, we’re training the system to recognize motion and make predictions of not just one single thing, but of where that pedestrian’s body will be at the next step and the next and the next.
“The median translation error of our prediction was approximately 10cm after one second and less than 80cm after six seconds. All other comparison methods were up to 7m off. We’re better at figuring out where a person is going to be.”
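For readers unfamiliar with the metric quoted above, the median translation error at a given horizon can be computed as in the sketch below; the arrays are stand-ins, not the study’s data.

```python
# Illustrative sketch of the metric: median Euclidean distance between
# predicted and observed pedestrian positions at one prediction horizon.
import numpy as np

def median_translation_error(pred_xyz, true_xyz):
    """pred_xyz, true_xyz: (n_samples, 3) positions at a fixed horizon."""
    errors = np.linalg.norm(pred_xyz - true_xyz, axis=1)
    return np.median(errors)
```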