Making an on-device personal assistant a reality
Qualcomm Technologies, Inc.
... understanding and behaviors to the machines
Perception: Hear, see, and observe
Reasoning: Learn, infer context
Action: Act intuitively, interact naturally, and protect privacy
Advancing AI research to make on-device AI ubiquitous
A common platform is fundamental to scaling AI internally and across the industry
System architecture
Multi-task and multi-modal learning, sensor fusion, and cloud-edge systems
A true personal assistant
One of many use cases requiring a broad set of AI capabilities
Voice is the transformative user interface (UI) we’ve been waiting for
Designed to be:
• Always-on
• Conversational
• Personal
• Private
Critical to create a true virtual assistant
Voice UI components required for an end-to-end solution
Machine speech chain: listener and speaker
Text-to-speech: speech synthesis
Signal acquisition and playback: front-end processing
Figure: speech recognition accuracy (chart). Human accuracy: 95%. Other values shown: 50%, 55%, 60%, 62%, 70%.
GMM: Gaussian Mixture Model, CNN: Convolutional Neural Network, RNN: Recurrent Neural Network
“As speech recognition accuracy goes from say 95% to 99%, all of us in the room will go from barely using it today to using it all
the time. Most people underestimate the difference between 95% and 99% accuracy — 99% is a gamechanger. No one wants
to wait 10 seconds for a response. Accuracy, followed by latency, are the two key metrics for a production speech system.”
- Andrew Ng
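The quote treats accuracy as the first metric of a production speech system. The slides do not say how accuracy is measured; one common convention, assumed here, is word accuracy defined as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the recognizer output divided by the reference length. The function names below are illustrative, not taken from the presentation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: accuracy = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

Under this word-level reading, 95% accuracy corresponds to roughly one wrong word in every 20, and 99% to roughly one in every 100.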
Voice UI is proliferating across product categories
• IoT
• XR
• Smartphones and tablets
• Smart speakers
• In-car entertainment systems
• Headphones and headsets
• TV
Natural language understanding (NLU)
Text-to-speech (TTS)
News, Music, Weather, Stocks
On-device processing of voice UI
Provides unique benefits complementing the cloud
Cloud tasks
• Complex voice fallback
• Training and model update
• Knowledge base
• Services
On-device tasks
• Automatic speech recognition
• Natural language processing
• Always-on audio cognition
• On-device training
Challenge
Providing the voice UI functionality within the power/thermal envelope
Benefits
• Privacy
• Instant response
• Always-on
• Device context
Offline raw data
Queries
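The split between on-device tasks and a cloud "complex voice fallback" can be pictured as a simple routing rule. The sketch below is only an illustration under stated assumptions: the `AsrResult` type, the `route_query` function, the confidence score, and the 0.8 threshold are all hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class AsrResult:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def route_query(
    audio: bytes,
    on_device_asr: Callable[[bytes], AsrResult],
    cloud_asr: Optional[Callable[[bytes], AsrResult]] = None,
    min_confidence: float = 0.8,  # hypothetical threshold
) -> Tuple[str, AsrResult]:
    """Return (where_it_ran, result), preferring on-device processing."""
    local = on_device_asr(audio)
    # Stay on the device when the local result is confident enough,
    # or when no cloud connection is available (offline use).
    if local.confidence >= min_confidence or cloud_asr is None:
        return "device", local
    # Otherwise fall back to the cloud for the harder query.
    return "cloud", cloud_asr(audio)
```

With this shape, audio leaves the device only when the local recognizer is unsure, which fits the privacy and instant-response benefits listed above.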
DL-based denoising
• Single or multiple mics
• Applicable for
◦ Two-way conversation
◦ Voice/speaker recognition
◦ Keyword spotting
• Robust in challenging interference and noise scenarios
DL-based denoising model trained with extensive speech noise databases
Figure: clean speech spectrogram (frequency vs. time) of the utterance “If people were more generous, there would be no need for welfare”
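The slide does not describe the denoising model's architecture. A common structure for spectrogram-domain denoisers, assumed here for illustration, is to compute a short-time Fourier transform, let a model predict a gain between 0 and 1 for every time-frequency bin, and multiply that mask into the noisy spectrogram. In the sketch below, `estimate_mask` is a simple Wiener-style stand-in for the trained network, and all function names and parameters are hypothetical.

```python
import numpy as np


def stft(signal, frame_len=256, hop=128):
    """Short-time Fourier transform with a periodic Hann window."""
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] for i in range(n_frames)]
    )
    return np.fft.rfft(frames * window, axis=1)  # shape: (frames, bins)


def estimate_mask(noisy_spec, noise_frames=5):
    """Stand-in for a trained model: a Wiener-style gain in [0, 1].

    Assumes the first `noise_frames` frames contain noise only.
    """
    power = np.abs(noisy_spec) ** 2
    noise_power = power[:noise_frames].mean(axis=0, keepdims=True)
    return np.maximum(power - noise_power, 0.0) / (power + 1e-12)


def denoise_spectrogram(noisy_spec, mask_fn=estimate_mask):
    """Apply a time-frequency mask to the noisy spectrogram."""
    return mask_fn(noisy_spec) * noisy_spec
```

A learned model would replace `estimate_mask` with a network trained on pairs of noisy and clean speech; the masking step itself stays the same.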
Qualcomm® Voice Activation (VA)
High accuracy, robust to background noise, and supports multiple languages
Qualcomm Voice Activation supports: Google Assistant
Figure: Qualcomm VA power consumption, 2014 to 2017, annotated "-47%" (WCD9330 labeled)
Qualcomm Voice Activation, Qualcomm WCD9330, Qualcomm WCD9335, and Qualcomm WCD9340 are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
On-device automatic speech recognition (ASR)
Diagram flow: automatic speech recognition → “Turn on the light” → natural language understanding (NLU) → user intention
Automatic speech recognition
• Personalization: adaptation to individual accent and acoustic environment
Natural language understanding (NLU)
• Allows the same intention to be expressed in multiple ways
• Adapted to each user’s intent expressions
An end-to-end on-device voice UI example for smart homes
Demo of automatic speech recognition and natural language understanding
Large command set
• Turn on the living room lights
• Click the kitchen lights off
• Turn off all lights
• Switch on the ceiling fan
• Shut off the sprinklers
• Start music
• Pause song
• Next track
• Go back one
• Play previous song
• Turn speaker off
• Increase temperature
Intent understanding
• Turn on the kitchen light
• Click kitchen light on
• Switch on light in the kitchen
• Turn the light on in the kitchen
NLU: These four phrases map to the same intent
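To make "map to the same intent" concrete, here is a deliberately simple keyword-based sketch. It is not the NLU model from the demo, which the slides do not describe; it is a rule-based stand-in, and the vocabulary tables and the `parse_intent` function are hypothetical. It covers only the on/off commands from the lists above.

```python
import re

# Hypothetical vocabulary covering only on/off commands.
ACTIONS = {"on": "turn_on", "off": "turn_off"}
DEVICES = ("light", "fan", "sprinkler", "speaker")
ROOMS = ("kitchen", "living room")


def parse_intent(utterance: str):
    """Return a normalized intent dict, or None if no on/off command is found."""
    text = utterance.lower()
    # First "on" or "off" token decides the action.
    action = next(
        (ACTIONS[word] for word in re.findall(r"\b(on|off)\b", text)), None
    )
    # Accept singular or plural device names ("light" or "lights").
    device = next((d for d in DEVICES if re.search(rf"\b{d}s?\b", text)), None)
    room = next((r for r in ROOMS if r in text), None)
    if "all" in text.split():
        room = "all"
    if action is None or device is None:
        return None
    return {"action": action, "device": device, "room": room}
```

Word order does not matter to this parser, so the four kitchen-light phrasings all normalize to the same dictionary. A learned NLU model would generalize far beyond such hand-written rules, for example to commands like "Start music" that this sketch does not handle.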
Contextual intelligence is required for personalization
The fusion of many types of sensors and personal information
Sensor fusion: sensor data, on-device data, off-device data
Examples shown: gyroscope, pulse, C-V2X, apps
Visual analysis: A sunset over the ocean in La Jolla
Live sentiment analysis: Strolling on the beach at sunset in La Jolla talking with my son and laughing
“Remember the time I was strolling with my son after the party at La Jolla beach?”
“Yes I do, here is a picture you took of the sunset. Should I share it with your family group on WeChat?”
“I noticed that you are tired and stressed, I’m turning on the Rocky III soundtrack and navigating you to the gym for a workout and sauna.”
“This music gets my blood going and a workout and sauna will help me relieve stress.”
The first step to an on-device virtual assistant
Enabling on-device voice UI
Natural language understanding (NLU)
Text-to-speech (TTS)
News, Music, Weather, Stocks
Adding an “AI agent” to create a true virtual assistant
The on-device AI agent continuously learns personal knowledge and acts intuitively
Sensors
AI agent
Adding an “AI agent” to create a true virtual assistant
Contextualization allows personalization at acoustic, intent, and behavior levels
Sensors
Acoustic event detection
• ML techniques are used to
◦ Classify acoustic signals into a set of predefined events
always-on
ML-based acoustic event classification
Figure: posterior-gram for various kitchen noise samples. Classes: object rustling, cupboard, dishes, glass jingling, walking, water tap running.
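A posterior-gram gives, for each audio frame, a probability for each predefined event class. One plausible way to turn it into discrete detections, sketched below, is to pick the most likely class per frame and keep runs that are both confident and long enough. The threshold, the minimum run length, and the `detect_events` function are assumptions made for illustration; the slide does not specify a decoding rule.

```python
import numpy as np

# Class names taken from the figure labels.
EVENTS = [
    "object rustling", "cupboard", "dishes",
    "glass jingling", "walking", "water tap running",
]


def detect_events(posteriorgram, labels=EVENTS, threshold=0.5, min_frames=3):
    """Convert a (frames, classes) posterior-gram into event segments.

    Returns a list of (label, start_frame, end_frame) with end exclusive.
    """
    post = np.asarray(posteriorgram, dtype=float)
    best = post.argmax(axis=1)                 # most likely class per frame
    confident = post.max(axis=1) >= threshold  # ignore uncertain frames
    segments = []
    start = None
    for t in range(len(best) + 1):
        continues = (
            t < len(best) and start is not None
            and confident[t] and best[t] == best[start]
        )
        if start is not None and not continues:
            # Close the current run; keep it only if it is long enough.
            if t - start >= min_frames:
                segments.append((labels[best[start]], start, t))
            start = None
        if start is None and t < len(best) and confident[t]:
            start = t
    return segments
```

The minimum run length suppresses single-frame spikes, which can matter for a detector that listens continuously.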
Thank you!
For more information, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2018 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.