Voice processors, dev kit for far-field voice capture

Fabless chip maker XMOS (Bristol, UK) has introduced two voice interface processors and a smart-speaker development kit. The company has adopted “VocalFusion” as its brand for voice products, and looks to a future in which voice is the primary HMI input to many domestic and other systems.
By Rich Pell


The XVF3000 devices and the VocalFusion Speaker development kit enable far-field voice capture, at ranges measured in meters. XMOS’ fundamental expertise is in I/O-centric microcontrollers, and it has been building a specialty in audio processing.

The task of aggregating several microphone streams into a single digitised stream is a natural fit, the company says, and it has also been adding DSP operations to its core to enable the required processing. XMOS aims to provide a “front-end” facility – that is, to identify the source voice, lock on to it with microphone beamforming, carry out echo, reverberation and noise cancellation, and capture the essential spoken content.

The interface to speech recognition, whether local or through a cloud service such as Amazon Alexa, would be implemented by XMOS’ customer, the product designer. XMOS does, however, provide the option of on-chip trigger-word recognition so that this function can be performed ‘at source’ (the XVF3100 has this facility; the XVF3000 variant does not).

XMOS chips, the company says, offer an integration path forward from today’s designs, which typically employ multiple ICs, including DSPs, to effect voice capture; this is a “flexible, programmable solution … a cost effective always-on voice interface in a single device.” There is the option of adding voice-trigger functions with Sensory’s TrulyHandsfree technology.

In the same release is the VocalFusion Speaker development kit (XK-VF3100-C43), which includes an XVF3000 processor card and a 4-mic circular microphone array. This kit provides a quick way to start developing far-field voice capture applications.

XVF3000 devices include speech enhancement algorithms, among them an adaptive beamformer that uses signals from four microphones to track a talker as they move, coupled with high-performance, full-duplex acoustic echo cancellation. XVF3000 devices can be integrated with an applications processor or host PC via either USB for data and control or a combination of I2S and I2C.
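
For readers unfamiliar with microphone beamforming, the basic idea can be illustrated with a fixed delay-and-sum beamformer, which is a much simpler technique than the adaptive beamformer XMOS describes and is not taken from XMOS code. The sketch below assumes a four-microphone circular array; the array radius, sample rate, speed of sound and all function names are illustrative assumptions, not XVF3000 specifications.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def circular_array(num_mics, radius_m):
    """Return (x, y) positions of mics evenly spaced on a circle."""
    return [
        (radius_m * math.cos(2 * math.pi * k / num_mics),
         radius_m * math.sin(2 * math.pi * k / num_mics))
        for k in range(num_mics)
    ]


def steering_delays(positions, azimuth_deg, sample_rate_hz):
    """Whole-sample delays that align a far-field plane wave arriving
    from azimuth_deg. Delays are shifted so the smallest is zero."""
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    # A mic lying further toward the talker hears the wavefront earlier,
    # so it needs more delay to line up with the other channels.
    advance = [(x * ux + y * uy) / SPEED_OF_SOUND for x, y in positions]
    earliest = min(advance)
    return [round((a - earliest) * sample_rate_hz) for a in advance]


def delay_and_sum(signals, delays):
    """Delay each channel by its whole-sample delay and average them."""
    length = len(signals[0])
    out = [0.0] * length
    for channel, d in zip(signals, delays):
        for n in range(d, length):
            out[n] += channel[n - d]
    return [v / len(signals) for v in out]


# Hypothetical setup: 4 mics on a 5 cm radius, sampled at 16 kHz.
positions = circular_array(4, 0.05)
delays = steering_delays(positions, 0.0, 16000)
```

Sound arriving from the steered direction adds coherently after the delays are applied, while sound from other directions is partly averaged away. An adaptive beamformer goes further by updating its steering continuously as the talker moves.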

Developers can add custom voice and audio processing using the XMOS free development tools. XMOS notes that the captured voice profile that best suits ASR (automatic speech recognition) software doesn’t sound good to the human ear, so a ‘communications’ output is also provided.

The speaker development kit – XMOS refers to it as the “puck” – is intended as a proof-of-concept tool to demonstrate voice capture in the presence of music being played, for example. As well as the circular microphone array, there is also a linear array option. The complete package moves XMOS “away from being [only] a silicon-plus-software supplier to a voice solution provider,” a spokesman commented.

XVF3000 devices are available; the VocalFusion Speaker development kit will be available in July 2017, and there is a beta-development programme. More at www.xmos.com/xcorevocalfusion
