Optical human-to-machine communication sensors promise 'near-perfect' speech recognition

October 25, 2016 // By Julien Happich
Headquartered in Israel with a sales office in California, startup VocalZoom was founded in 2010 to tackle speech-enhancement challenges in Human-to-Machine Communication (HMC). It started product development in 2013, shortly after signing strategic agreements with a large automotive OEM and with 3M Corporation, and began shipping the first engineering samples of its VocalZoom HMC sensor last summer.

The sensor consists of an off-the-shelf 850nm VCSEL laser with an embedded photodiode, packaged with the company's proprietary ASIC, which processes the sound waves detected optically from the vibrations of the speaker's skin. Interviewed by eeNews Europe, Rammy Bahalul, Vice President of Sales and Business Development for VocalZoom, gave more details about the technology.

"When someone speaks, the sound propagates all over the skin too, and we can measure these vibrations by detecting the laser's reflection on the skin through an interferometer. The way the interferometer works is that any back reflections interfere with the stabilized laser wavelength in the cavity, and that impacts the laser power".

The ASIC monitors the laser power fluctuations as read by the built-in photodiode and turns them into a noise-free “audio” signal that can then be fused with the real audio signal recorded by a microphone, either in an audio processor or in cloud software.
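The article does not say how the two signals are fused. One simple possibility, sketched here purely as an illustration and not as VocalZoom's method, is to use the optical channel as a voice-activity gate: microphone frames are attenuated whenever the optical channel shows no skin vibration. The function names, the 16kHz sample rate, the 10ms frame length, the threshold and the attenuation floor are all assumptions made for the example.

```python
import math

def frame_energy(samples, frame_len):
    """Mean squared amplitude of each consecutive, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def gate_microphone(mic, optical, frame_len=160, threshold=1e-3, floor=0.1):
    """Attenuate microphone frames in which the optical channel is silent.

    Hypothetical fusion rule for illustration only: the optical signal is
    treated as a noise-immune indicator of when the user is speaking.
    """
    n = min(len(mic), len(optical))          # work on the common length
    out = list(mic[:n])
    for k, energy in enumerate(frame_energy(optical[:n], frame_len)):
        if energy < threshold:               # no skin vibration detected
            start = k * frame_len
            for i in range(start, start + frame_len):
                out[i] *= floor              # suppress background noise
    return out

# Synthetic demo: constant "noise" on the microphone, and an optical
# channel that is silent for one frame, then carries a 200 Hz tone.
SAMPLE_RATE = 16000                          # assumed sample rate
mic = [0.5] * 320
optical = [0.0] * 160 + [
    math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(160)
]
gated = gate_microphone(mic, optical)
print(gated[0], gated[160])                  # first frame attenuated, second kept
```

A production system would more likely combine the two channels in the frequency domain, but the gate shows the basic idea: the optical signal tells the audio chain when the user is actually speaking, regardless of ambient noise.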

"It is similar to bone conduction, but without contact. We can measure vibrations up to 1.5kHz", continued Bahalul. "We are not reading lips but actual facial vibrations; these can be detected from the cheeks, all around the neck and even behind the ears."

The optical sensor can be placed anywhere from a few millimetres to a metre away from the speaker, making it practical for headsets, wearables, smartphones and laptops, but also for ATMs and for automotive applications, where it could be mounted in the rear-view mirror.

How VocalZoom's optical sensor cleans up acoustic audio.

The startup claims that, when tested with leading speech recognition providers, its HMC sensor makes all the difference in noisy environments, even with strong and complex noise, eliminating almost all errors and making speech recognition more widely usable. In a high-noise environment, the company says it can recover the original speech from -10dB (a voice inaudible against high noise) to 20dB when VocalZoom is enabled.
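To put those figures in perspective, the short calculation below converts them to linear power ratios. It assumes that the quoted -10dB and 20dB values are signal-to-noise ratios, which the company's claim implies but does not state explicitly; the function name is chosen for the example.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a level in decibels to a linear power ratio."""
    return 10 ** (db / 10)

snr_before_db = -10.0   # quoted figure, assumed to be an SNR
snr_after_db = 20.0     # quoted figure, assumed to be an SNR

improvement_db = snr_after_db - snr_before_db
print(improvement_db)                      # 30.0
print(db_to_power_ratio(improvement_db))   # 1000.0
```

Under that assumption, noise power starts out ten times higher than speech power and ends up a hundred times lower, a thousandfold improvement in the power ratio.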

As well as improving speech recognition, fusing the audio signals from the optical sensor and a microphone could enable many features currently served by discrete sensors. It could be used to perform more robust voice identification through multi-factor biometrics (each individual having a unique facial "sound signature"), but could also serve as an accurate, low-power voice wake-up solution. The sensor is accurate enough to detect the speaker's heart rate from the skin, so it doubles as a liveness sensor, able to tell a loudspeaker from a live person.

Consolidating multiple sensors and features into one.
