Embedded Audio & Robotic Ear Marc HERVIEU IoT Marketing Manager Marc.Hervieu@st.com
Voice Communication: key driver of innovation since the 1800s
IoT Evolution of Voice Automation: the IoT Voice Assistant ("How can I help you?")
Indoor Voice Capture: the Problem
- Voice is picked up along with reflections, diffusion, background noise and acoustic echo
- Audio input (e.g. music, or far-end speaker)
- Reference signal (same as audio input)
- Audio output (e.g. speaker's voice, clean)
The Robotic Ear
- Augmented hearing
- Acoustic Beamforming
- Voice over Bluetooth Low Energy
- Embedded Processing
- Sound Localization
Entry Level to Advanced Audio in 3 Steps
- MEMS microphones on Function Packs
- Advanced free-licensed premium libraries: Audio over BLE, Beamforming, Source Localization, Acoustic Echo Cancellation
- Integrated Development Platforms
Audio Front End: Example of Signal Processing Architecture
- MEMS microphone array
- Source Localization
- Beamforming
- Acoustic Echo Cancellation (reference)
- Speech Recognition (embedded, cloud): Trigger, ASR
- Audio Analytics
- Noise Reduction
- Statistical Dereverberation
- Auto Gain Control
- Voice Activity Detection
- Statistical moments
- Noise estimation
- ...
Audio SW IP and Ecosystem
- MEMS microphone array
- osxacousticsl, osxacousticbf, osxacousticec (reference)
- 3rd-party ASR (embedded, cloud): Trigger, ASR
- Audio Analytics
- Noise Reduction
- Statistical Dereverberation
- Auto Gain Control
- Voice Activity Detection
- Statistical moments
- Noise estimation
- ...
Each osxacoustic library may easily be replaced by 3rd-party SW IP.
All are released under free evaluation and production licensing.
Spatial Audio Processing
MEMS microphone array
Freely licensed FW libraries for STM32: www.st.com/openaudio
Beamforming: osxacousticbf
- Spatial filter
- Outputs the audio that comes from a given direction
- Adaptively cancels audio signals coming from other directions
Sound Localization: osxacousticsl
- Estimates the Direction of Arrival of the main sound source
- Independent from beamforming
- May control the beam direction
Beamforming (osxacousticbf): Differential MEMS Microphone Array
- Small and compact geometry, powered by MEMS microphone technology
- Endfire cardioid beamforming with 2 digital MEMS microphones
- Scalable performance vs. MIPS to match application requirements
- Polar pattern shape is independent of frequency
- Polar patterns shown: Figure of 8, Basic Cardioid, Cardioid, Strong Cardioid (figure labels: 170, 60)
Beamforming (osxacousticbf): Beamforming Algorithm Options
- Cardioid basic: 1st-order Differential Microphone Array (DMA). One microphone signal is delayed and subtracted from the other to form the output, with $\mathrm{Delay} = d / c$, where $d$ is the inter-microphone distance and $c$ is the speed of sound.
- Cardioid denoise: a denoise filter is added to the endfire beamforming output. The delay is the same, $\mathrm{Delay} = d / c$.
- Strong: back-to-back cardioids and an adaptive noise removal filter (diagram blocks: Enhance, Remove).
- ASR ready: same as Strong, without the denoise filter. Best performance for Automatic Speech Recognition applications.
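The delay-and-subtract structure of the basic cardioid can be checked numerically. The sketch below is not the osxacousticbf implementation; it only evaluates the textbook far-field response of a two-microphone first-order DMA with Delay = d / c. The function name `dma_response`, the angle convention (0 degrees is the endfire look direction) and the value 343 m/s for the speed of sound are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def dma_response(theta_deg, freq_hz, d_m, c=SPEED_OF_SOUND):
    """Magnitude response of a two-microphone delay-and-subtract array.

    theta_deg = 0 is the endfire look direction, 180 is the rear.
    """
    tau = d_m / c  # internal delay applied to the rear microphone
    # A plane wave from angle theta reaches the rear microphone later
    # (or earlier, for rear sources) by d * cos(theta) / c.
    acoustic_delay = d_m * math.cos(math.radians(theta_deg)) / c
    omega = 2.0 * math.pi * freq_hz
    # Output = front signal minus delayed rear signal.
    return abs(1.0 - cmath.exp(-1j * omega * (tau + acoustic_delay)))


front = dma_response(0, 1000.0, 0.004)
side = dma_response(90, 1000.0, 0.004)
rear = dma_response(180, 1000.0, 0.004)
```

With a 4 mm spacing at 1 kHz, the rear response is a null and the side response is about half the front response, which is the (1 + cos θ) / 2 cardioid shape.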
Beamforming (osxacousticbf): Polar Pattern Tests
Test setup (BlueCoin eval platform, integrated MEMS micro-array):
- Microphone array mounted on a rotating support
- Inter-microphone distance: 4 mm
- Rotation in steps of 10 degrees
- Gaussian white noise played by a high-quality loudspeaker
Resulting beam pattern:
- Blue: omnidirectional microphone
- Red: «Basic cardioid» mode
- Green: «Strong» mode
Beamforming (osxacousticbf): ASR Test Setup
Test setup (BlueCoin eval platform, integrated MEMS micro-array)
Inputs:
- WORDS: male and female spoken words, at 0°
- NOISE: Gaussian white noise, at 90°
Output: 4 synchronous output channels
- Omnidirectional microphone
- Basic Cardioid
- ASR Ready
- Strong Cardioid
Recorded words are sent to Google ASR and recognition data are collected.
Beamforming (osxacousticbf): ASR Test Results
Plot: ASR confidence against signal-to-noise ratio for the omnidirectional, cardioid, ASR and strong modes.
Beamforming Profiling
- The MIPS footprint has been computed for input signals in PDM and in PCM format.
- Both the generic distance and the special case d = 2.1 cm are shown. PCM input is possible only when d = 2.1 cm.
- The PDM clock is taken as 1.024 MHz; a different clock speed may result in a different MIPS count.
- The audio frame is 8 ms.
- The RAM footprint varies with the algorithm level used.
- Profiling done using IAR Embedded Workbench, version 7.70; optimization set to High Speed; target STM32F446 @ 168 MHz.
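The profiling conditions translate into per-frame sizes with simple arithmetic. The sketch below takes the PDM clock, frame length and CPU clock from the profiling notes; the 16 kHz PCM output rate is an assumption that the notes do not state, and `frame_budget` is a hypothetical helper name.

```python
def frame_budget(pdm_clock_hz=1_024_000, frame_ms=8, cpu_hz=168_000_000,
                 pcm_rate_hz=16_000):
    """Per-frame sizes for one microphone.

    pcm_rate_hz is an assumption; the other defaults come from the
    profiling conditions.
    """
    pdm_bits = pdm_clock_hz * frame_ms // 1000
    return {
        "pdm_bits": pdm_bits,                            # one-bit samples per frame
        "pdm_bytes": pdm_bits // 8,
        "pcm_samples": pcm_rate_hz * frame_ms // 1000,
        "decimation": pdm_clock_hz // pcm_rate_hz,       # PDM bits per PCM sample
        "cpu_cycles": cpu_hz * frame_ms // 1000,         # cycles available per frame
    }


budget = frame_budget()
```

Under these assumptions each 8 ms frame carries 8192 PDM bits (1024 bytes) per microphone, 128 PCM samples, and leaves 1,344,000 CPU cycles at 168 MHz for all processing.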
Sound Source Localization
Signals are acquired by one or two pairs of microphones in order to estimate the sound Direction of Arrival (DoA).
Angle α = Direction of Arrival
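For a single microphone pair, the usual far-field relation between the inter-microphone time delay and the angle is α = arccos(c · delay / d). The sketch below illustrates that relation only; measuring α from the axis of the pair, the 343 m/s speed of sound and the name `doa_from_delay` are assumptions, not details taken from the library.

```python
import math


def doa_from_delay(delay_s, mic_distance_m, c=343.0):
    """Far-field angle of arrival in degrees (0 to 180) for one microphone pair.

    delay_s is the arrival-time difference between the two microphones;
    the angle is measured from the axis that joins them (assumption).
    """
    ratio = c * delay_s / mic_distance_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding and noise
    return math.degrees(math.acos(ratio))


broadside = doa_from_delay(0.0, 0.021)           # no delay: source at 90 degrees
endfire = doa_from_delay(0.021 / 343.0, 0.021)   # maximum delay: source on the axis
```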
Sound Localization (osxacousticsl): Algorithm Options
- Scalable library allows a MIPS vs. resolution trade-off
- Selectable angle resolution, up to 1 degree (theoretical)
- Selectable algorithm; two algorithms are implemented:
  - XCORR: supports cm-sized microphone arrays; low MIPS and low resolution
  - GCC-PHAT: supports mm-sized differential arrays
- A simple Voice Activity Detector is included, based on an energy threshold. It avoids false recognitions in case of low signal energy.
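A cross-correlation estimate gated by an energy threshold can be sketched in a few lines. This is a plain time-domain illustration of the XCORR idea with synthetic data, not the library's code; the function names, the threshold value and the test signal are all hypothetical.

```python
import random


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def xcorr_lag(a, b, max_lag):
    """Lag in samples that maximizes the cross-correlation of a and b.

    A positive result means b is a delayed copy of a.
    """
    best_lag, best_val = 0, float("-inf")
    n = len(a)
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += a[i] * b[j]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def localize_frame(a, b, max_lag, energy_threshold):
    """Return the lag, or None when the frame is too quiet (energy-based VAD)."""
    if frame_energy(a) < energy_threshold:
        return None
    return xcorr_lag(a, b, max_lag)


# Synthetic demo: channel b is channel a delayed by 3 samples.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(256)]
b = [0.0] * 3 + a[:-3]
lag = localize_frame(a, b, max_lag=5, energy_threshold=0.01)
quiet = localize_frame([0.0] * 256, [0.0] * 256, max_lag=5, energy_threshold=0.01)
```

The lag resolution is one sample period, which is why a plain cross-correlation suits cm-sized arrays better than mm-sized ones: with a small spacing the whole delay range spans less than one sample.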
Source Localization (osxacousticsl): Application Considerations
- Range: due to spatial symmetry, 2 microphones cover a range of 180° and 4 microphones cover a range of 360°
- MIPS performance: in a typical home application, source localization may run as a low-priority task
- Depending on the use case, localization info may not require continuous updates (e.g. a few times per second)
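The 180° versus 360° range follows from geometry: one pair only observes the cosine of the angle, so mirrored directions look identical, while a second pair resolves the ambiguity. The sketch below assumes two orthogonal pairs with equal spacing, which is an illustrative layout and not necessarily the one the library expects; both function names are hypothetical.

```python
import math


def pair_angle(delay_s, d_m, c=343.0):
    """Angle seen by a single pair, limited to 0..180 degrees."""
    ratio = max(-1.0, min(1.0, c * delay_s / d_m))
    return math.degrees(math.acos(ratio))


def azimuth_from_two_pairs(delay_x_s, delay_y_s):
    """Azimuth in 0..360 degrees from the delays of two orthogonal pairs.

    Assumes both pairs share the same spacing, so the spacing and the
    speed of sound cancel in the ratio of the two delays.
    """
    return math.degrees(math.atan2(delay_y_s, delay_x_s)) % 360.0


d, c = 0.021, 343.0


def delays(azimuth_deg):
    """Ideal far-field delays on the x and y pairs for a given azimuth."""
    phi = math.radians(azimuth_deg)
    return d * math.cos(phi) / c, d * math.sin(phi) / c


tx45, ty45 = delays(45)
tx315, ty315 = delays(315)
```

Here `pair_angle(tx45, d)` and `pair_angle(tx315, d)` agree, so the x pair alone cannot tell 45° from 315°, whereas `azimuth_from_two_pairs` separates them.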
Source Localization Profiling
- RAM, FLASH and MIPS footprints have been computed for continuous execution of the source localization library at every input frame.
- In typical applications, a simple and effective optimization is to reduce the frequency of localization updates, leading to a lower MIPS count.
- Profiling done using IAR Embedded Workbench, version 7.70; optimization set to High Speed; target STM32F446 @ 168 MHz.
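One way to reduce the update frequency is to call the localization routine on every Nth frame and reuse the last result in between. The wrapper below is a hypothetical sketch of that idea; `ThrottledLocalizer` and the callback signature are not part of any ST library.

```python
class ThrottledLocalizer:
    """Runs a localization callback on every Nth frame and reuses the last result otherwise."""

    def __init__(self, localize, every_n_frames):
        if every_n_frames < 1:
            raise ValueError("every_n_frames must be at least 1")
        self._localize = localize
        self._every = every_n_frames
        self._count = 0
        self.last_angle = None
        self.calls = 0  # number of times the expensive callback actually ran

    def process(self, frame):
        if self._count % self._every == 0:
            self.last_angle = self._localize(frame)
            self.calls += 1
        self._count += 1
        return self.last_angle


# Demo with a stand-in callback that always reports 42 degrees.
throttled = ThrottledLocalizer(lambda frame: 42.0, every_n_frames=25)
angles = [throttled.process([0.0]) for _ in range(100)]
```

In this demo the callback runs 4 times for 100 frames, so the localization cost drops by a factor of 25 while every frame still receives an angle.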
Acoustic Echo Cancellation (osxacousticec)
- Removes the echo of playback audio in speech capture applications
- Single-microphone application
- The STM32 is connected to both the microphone and the loudspeaker
- Diagram: known audio source (e.g. music / voice), reverberant room, AEC (estimates room reverberation)
- The Open.AUDIO AEC library is an optimized STM32 port based on the open-source project Speex: http://www.speex.org/
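The principle of estimating the room response from a known reference and subtracting the predicted echo can be shown with a small adaptive filter. The sketch below uses a time-domain NLMS filter, which is a simpler technique than the block frequency-domain algorithm used in Speex, and it is not the osxacousticec code. The filter length, step size and the synthetic four-tap "room" are illustrative assumptions.

```python
import random


def nlms_echo_cancel(mic, ref, taps=16, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `ref` from `mic` (NLMS).

    `taps` plays the role of the tail length: more taps cover a longer
    echo but need more memory and more operations per sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, ref):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate  # echo-cancelled output sample
        power = sum(x * x for x in history) + eps
        gain = mu * error / power
        weights = [w + gain * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Synthetic demo: the microphone hears only the echo of the reference.
rng = random.Random(1)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
room = [0.6, 0.3, -0.2, 0.1]  # made-up echo path
mic = [sum(room[k] * ref[n - k] for k in range(len(room)) if n >= k)
       for n in range(len(ref))]
cleaned = nlms_echo_cancel(mic, ref)
echo_energy = sum(v * v for v in mic[-500:])
residual_energy = sum(v * v for v in cleaned[-500:])
```

After the filter has converged, the residual energy in the last 500 samples is a small fraction of the echo energy. With near-end speech present, a practical canceller also needs double-talk handling, which this sketch omits.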
AEC Profiling
- Profiling has been done in order to estimate MIPS, RAM and FLASH consumption.
- RAM utilization depends on the chosen tail length and on the activated options.
- Profiling done using IAR Embedded Workbench, version 7.70; optimization set to High Speed; target STM32F446 @ 168 MHz.
BlueVoice: Audio Over BLE
BlueVoice at a Glance
The ST HW and SW solution for ultra-low-power voice streaming over Bluetooth Low Energy
Hardware:
- MEMS microphone
- STM32 microcontroller: signal processing and application
- BlueNRG: Bluetooth Low Energy RF connectivity
Firmware & Software:
- BLUEVOICELINK: BLE and microphone reference application based on STM32Cube
- OSXBLUEVOICE: Voice-over-BLE vendor-specific profile library for STM32 and BlueNRG
- BLUEMICROSYSTEM: voice and sensor data over a BLE link to an Android / iOS smartphone app
- BlueMS, BlueST-SDK: Bluetooth Smart and Sensors Technology application for Android and iOS
BlueVoice Mapping over the Standard Bluetooth 4.0 Protocol Stack
BlueVoice Vendor Specific Profile
Roles:
- Central Unit (Master), Client: audio processing, GAP configuration, GATT configuration
- Peripheral Unit (Slave), Server
Bluetooth Low Energy stack:
- Application: BlueVoice Profile (vendor specific)
- Generic Attribute Profile (GATT), Generic Access Profile (GAP)
- Attribute Protocol (Server, Client), Security Manager
- Logical Link Control and Adaptation Protocol
- Host-Controller Interface
- Link Layer, Direct Test Mode
- Physical Layer
Audio is exported as a Service: a Service holds Characteristics, each with a Characteristic Descriptor.
BlueVoice Architecture: Audio Processing and Transmission
- Server (TX): Audio Acquisition -> PDM -> PDM to PCM conversion -> PCM -> Audio Compression -> Raw Data
- Client (RX): Audio Decompression -> Serial Audio Out (USB, I2S, ...)
- Symmetrical architecture for bi-directional communication
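The PDM to PCM stage of the transmit chain can be illustrated with a deliberately crude converter that averages blocks of one-bit samples. Real firmware uses a proper low-pass decimation filter; the block averaging, the decimation factor of 64 and the first-order sigma-delta generator used to produce test data are all assumptions made for this sketch.

```python
def sigma_delta(samples):
    """First-order sigma-delta modulator, used here only to create test PDM bits.

    Input samples are expected in the range [-1, 1].
    """
    bits, acc = [], 0.0
    for s in samples:
        acc += s
        if acc >= 0.0:
            bits.append(1)
            acc -= 1.0
        else:
            bits.append(0)
            acc += 1.0
    return bits


def pdm_to_pcm(pdm_bits, decimation=64):
    """Crude PDM to PCM conversion: average each block of `decimation` bits.

    The density of ones in a block maps linearly to a value in [-1, 1].
    """
    pcm = []
    for start in range(0, len(pdm_bits) - decimation + 1, decimation):
        ones = sum(pdm_bits[start:start + decimation])
        pcm.append(2.0 * ones / decimation - 1.0)
    return pcm


pdm = sigma_delta([0.5] * 640)  # constant input level of 0.5
pcm = pdm_to_pcm(pdm)           # 10 PCM samples, each close to 0.5
```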
BlueVoice Applications
- Cloud-based ASR service
- ST BlueMS 3.0.0 (or higher): from version 4.4 KitKat (API 19); from version 8; ADPCM @ 8 or 16 kHz
- BlueVoiceLink1 2.0.0 Central: data rate of 32 kbps to maximize compatibility, ADPCM @ 8 kHz
- Peripheral firmware: osxbluevoice 2.0.0, BlueMicrosystem 2.2.0, BlueVoiceLink1 2.0.0 Peripheral
- Boards listed: NUCLEO F401, NUCLEO L476, NUCLEO L053, SensorTile, BlueCoin
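The quoted data rates are consistent with ADPCM at 4 bits per sample, which is an assumption here since the slides give only the sampling rates and the resulting kbps figures. The helper names below are hypothetical.

```python
def adpcm_bitrate_kbps(sample_rate_hz, bits_per_sample=4):
    """Audio payload bit rate in kbps; 4 bits per sample is an assumption."""
    return sample_rate_hz * bits_per_sample / 1000


def compression_ratio(pcm_bits=16, adpcm_bits=4):
    """Size reduction relative to 16-bit PCM, under the same assumption."""
    return pcm_bits / adpcm_bits


rate_8k = adpcm_bitrate_kbps(8_000)    # 32.0 kbps
rate_16k = adpcm_bitrate_kbps(16_000)  # 64.0 kbps
```

These figures cover the audio payload only; protocol overhead on the BLE link comes on top.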
BlueVoice Service Integration in the BlueMicrosystem Environment
- ST BlueMS + Bluemicrosystem2
- ST BlueMS is available on Google Play and the App Store
- Cloud-based ASR service
- BlueVoiceLink1 2.0.0 Peripheral, osxbluevoice 2.0.0, 8 kHz ADPCM
BlueVoice Solutions
Table: compression scheme (ADPCM, Opus) against link mode (Simplex, Half-duplex, Full-duplex) and audio quality (Full band music, High quality music). Cells give the status (Demo or R&D) and the bit rate (8-10 kbps, 8-16 kbps, 32/64 kbps, <64 kbps, >64 kbps).
Embedded HW Demo Solutions
Putting Together SW Libraries: SmartAcoustic1
- Example project in source code, built on STM32Cube software technology
- Includes acoustic Beamforming, Echo Cancellation and Source Localization
- Immediate test and performance evaluation
- 4-MEMS-microphone array, reference audio
Source Localization:
- User-selectable angle resolution
- User-selectable activation threshold
- Based on 4 MEMS microphones
- 360° localization range
Beamforming:
- User-selectable beam direction
- User-selectable beamforming algorithm
- Based on 4 MEMS microphones
- GUI highlights the chosen microphone pair
Acoustic Echo Cancellation:
- Based on a single MEMS microphone
- Reference audio is stored in STM32 FLASH
- Uses Audio OUT to play back audio while streaming cleaned speech over USB
BlueCoin
SmartAcoustic1 Evaluation System
- Software reference design
- Multi-platform support
- Supports ODE expansion boards X-NUCLEO-CCA01M1 and X-NUCLEO-CCA02M1 connected to a NUCLEO-F446RE board
- Supports BlueCoin, the integrated audio and sensors platform
Support for BlueVoice
- Embedded Voice Terminal
- Mobile Device
- Cloud-based Services
- «Natural Language» Platform
- Signals, Comm Interface
Samantha VUI: Conversational Interface Development with Android and BlueCoin
- CoinStation + BlueCoin, 2x VL53L0X
- Gesture recognition to start communication
- Asymmetric BlueVoice communication: 8 kHz ADPCM, 24 kHz Opus
- Your question: voice input, beamforming, cloud-based ASR service, question transcription
- Your answer: Computational Knowledge Engine, voice output
Questions? Marc HERVIEU, IoT Marketing Manager, Marc.Hervieu@st.com