Car Talk: U of I Research May Improve Your Car's Hearing

1/1/2005 Doug Peterson, Freelance Writer

Someday, your car might just become a good listener.

University of Illinois researchers are trying to improve the ability of smart cars to hear verbal commands from drivers by combining visual information with the audio. ECE Professor Mark Hasegawa-Johnson compares this visual/auditory approach to what people naturally do while trying to hear in a noisy room.

“When humans listen to speech in a very noisy environment, there is a tendency for the eyes to go down to the lip region,” he said. “By watching the lips while listening to the speech, you can correct possible mistakes that arise due to the noise level. We’re trying to make it possible for an automatic speech recognizer to do the same thing in the noisy environment of a car.”
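
The article doesn’t say how the recognizer actually merges the two streams. One common approach in audio-visual speech recognition is feature-level fusion, in which per-frame audio features and lip features are weighted by an estimate of how reliable the audio is, then stacked into a single vector for the recognizer. The sketch below is purely illustrative; the fuse_features helper and the feature sizes are hypothetical, not the team’s code.

    import numpy as np

    def fuse_features(audio_feats, visual_feats, audio_weight):
        """Feature-level fusion: scale the audio features by an estimated
        reliability (low in a noisy cabin), scale the lip features by the
        complement, and stack them into one vector for the recognizer."""
        return np.concatenate([audio_weight * audio_feats,
                               (1.0 - audio_weight) * visual_feats])

    # Hypothetical per-frame features: 13 audio coefficients and
    # 6 lip-shape measurements from the dashboard cameras.
    audio_frame = np.random.randn(13)
    lip_frame = np.random.randn(6)

    # With the windows down (low SNR), lean more heavily on the video.
    fused = fuse_features(audio_frame, lip_frame, audio_weight=0.3)

The weighting mirrors what the quote describes: the noisier the cabin, the more the combined system should trust the lips.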

To combine the visual with the audio, researchers have mounted an array of four cameras on the dashboard and eight microphones on the sun visor of test cars.

“We get the best audio recognition if the microphones are as close to the user’s lips as possible, and the sun visor was the closest reasonable place to put them,” Hasegawa-Johnson pointed out. In addition, mounting cameras on the dashboard provides the clearest view of a person’s mouth.
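
The story doesn’t describe the team’s array processing, but a standard way to exploit multiple microphones is delay-and-sum beamforming: each channel is time-shifted so the driver’s voice lines up across microphones, then the channels are averaged, reinforcing the speech while uncorrelated road noise partially cancels. A minimal sketch, with made-up delays and noise:

    import numpy as np

    def delay_and_sum(channels, delays_samples):
        """Shift each microphone channel back by its arrival delay so the
        driver's voice lines up, then average across the array."""
        aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays_samples)]
        return np.mean(aligned, axis=0)

    # Hypothetical setup: 8 visor microphones sampling at 16 kHz.
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs
    voice = np.sin(2 * np.pi * 220 * t)     # stand-in for the driver's voice
    delays = [0, 2, 4, 6, 8, 10, 12, 14]    # per-mic arrival delays, in samples
    channels = [np.roll(voice, d) + 0.5 * rng.standard_normal(fs)
                for d in delays]            # add uncorrelated "road noise"

    enhanced = delay_and_sum(channels, delays)

Averaging eight aligned channels cuts uncorrelated noise power by a factor of eight, roughly a 9 dB gain in signal-to-noise ratio, which is one reason a microphone array is an attractive upgrade.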

Hasegawa-Johnson said the idea is for drivers to be able to tell their cars to dial a telephone—the ultimate hands-free phone system. Another possibility is for drivers to be able to give verbal commands that control non-critical functions of the car, such as the radio or air conditioner.

With funding from the Motorola Corporation, researchers have embarked on this project by training the audio-visual speech recognition system to listen to different “talkers.” The goal is to eventually record 100 talkers so the system can understand variations in speech patterns.

The research team, which includes Hasegawa-Johnson, ECE Professors Tom Huang and Steve Levinson, and Motorola’s Mike McLaughlin, is also developing and applying robust audio-visual extraction algorithms that make it possible for the system to recognize speech in a noisy car.

The researchers are collecting data by asking subjects to read two sets of scripts under varying noise levels: while the car is idling, while it is running at 35 and 55 mph with the windows rolled up, and again at those speeds with the windows rolled down.

Hasegawa-Johnson expects to find that the visual component will boost the system’s accuracy considerably. He bases this expectation on earlier work by Huang, who has studied speech recognition in noisy rooms using visual information.

Huang’s research shows that without visual information, the accuracy of a speech recognition system begins to drop off at a signal-to-noise ratio (SNR) of about 20 decibels (dB). Adding visual information postpones that degradation until the SNR falls to about 5 to 10 dB.
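
For a sense of scale, SNR in decibels is 10 · log10 of the ratio of signal power to noise power. A quick back-of-the-envelope check of what those thresholds mean (the helper functions are mine, not the study’s):

    import math

    def snr_db(signal_power, noise_power):
        """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
        return 10 * math.log10(signal_power / noise_power)

    def power_ratio(snr_in_db):
        """Invert: how many times stronger the signal is than the noise."""
        return 10 ** (snr_in_db / 10)

    print(power_ratio(20))  # 100.0 -> audio-only recognition still holds up
    print(power_ratio(10))  # 10.0  -> with video, recognition still works
    print(power_ratio(5))   # ~3.2  -> roughly where audio-visual degrades

In other words, the visual channel lets the recognizer keep working when the speech is only a few times stronger than the road noise, rather than a hundred times stronger.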

According to Hasegawa-Johnson, some luxury cars, such as the Jaguar, already come with a single built-in microphone, but he’s hoping to demonstrate ways to boost the accuracy. Adding more microphones, as his team has done, is a fairly straightforward solution, he said, although putting cameras in cars might be more difficult because of the expense involved.

Hasegawa-Johnson anticipates that multi-microphone systems will appear in cars within four or five years.

