Verbmobil – The Nuance Perspective

Stephan Kanthak, Automotive R&D
About Me
• 1993: joined Hermann Ney’s group
• 1998: joined Verbmobil project, responsible for:
– “Fast” German recognizer (10k)
– German recognizer, large vocabulary (30k, aka scenario 3/C)
• 2000: missed final Verbmobil meeting due to illness 
• 2002: co-founded AIXPLAIN AG
• 2005: freelancer
• 2006: joined AT&T Labs in Florham Park, NJ, USA
• 2008: joined Nuance Automotive in Aachen
Nuance Organization & Product Portfolio
• Healthcare: medical record management and manual transcription services in healthcare
• Enterprise: customer service and call center applications
• Mobile: command and control capabilities, and voice search and messaging applications for mobile phones and automobiles
• Imaging: MFP scanning, PDF and document automation solutions
Global Embedded R&D Centers
• Locations: Montréal, Boston, Detroit, Ithaca, Merelbeke, Tokyo, Aachen, Ulm, Seoul, Shanghai
• A global team of about 170 engineers serving Automotive/Embedded (ASR & TTS):
– Research & Development
– Professional Services
Embedded Speech Recognition – What for?
• Most feature-rich mass-market deployments: automotive
• Application areas: mobile phones, CD, DVD, MP3, radio, navigation, telematics, rear-seat entertainment, connected services, game consoles, and industrial, warehousing and military use
• Cellphones: largest volumes
Driving Safety
[Figure: four panels comparing driving with no in-vehicle information system (No IVIS), manual operation and speech operation for tasks including navigation, POI, phone and audio. Deviation [meters]: reduced deviation with speech. Reaction Time [seconds]: faster reaction. Distraction [gaze duration %]: less distraction. Subjective Workload [low to high]: reduced workload.]
Source: 2008 In-car Distraction Study, University of Brunswick, Germany
VoCon Family Tree – A long history
[Figure: timeline 2002–2010 of the engines in the VoCon family: ASR1600, ASR3200v1, CREC, VoCon3200v2, VoCon3200v3, Speech2Go, VoCon3200v4, eVV4.5, StarRec, VoCon SF, VoCon X3, smARTspeak XGT, VoCon XGT (Mobile), ELVIS/VSuite3, ELVIS/VSuite4]
Summary of Nuance Embedded Technology Offers
• ASR: 24 languages
• TTS: 35 languages
• NLU
• “Light-Weight” Dialogue Management
• Connected services:
– Message (SMS / E-Mail) Dictation
– Open Voice Search: Voice-enabled Web Search
– Device Command & Control: Name Dialing, Music Selection, ...
ASR: Task Size Explosion
Trend: shift towards large-scale problems
• Command and Control (C&C), Phone Dialing: 1K
• Music (MP3): 1K-100K
• Points-of-Interest (POI): 100K-10M
• Voice Destination Entry (VDE): 1M-100M
[Chart: task size growth, 2005–2010]
State-of-the-Art Speech Recognizer Architecture
• Most research and commercial speech recognizers today, including VoCon, use a variant of the same architecture:
[Figure: block diagram. Front end: speech signal → noise reduction → feature extraction. Back end: search with the language-specific acoustic model → result → post-processors (spelling, NLU, SLM matcher). The dictionary, G2P and application-specific CFG grammars feed a compiler used by the search.]
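The front end / back end split above can be illustrated with a toy pipeline. This is a minimal sketch under loud assumptions, not VoCon's implementation: the functions `reduce_noise`, `extract_features`, `dtw_distance`, `search` and `recognize` are hypothetical; noise reduction is a crude fixed-floor subtraction; features are per-frame log energies rather than cepstral vectors; and search is dynamic-time-warping template matching, swapped in for the HMM decoding against an acoustic model and compiled grammar that a real recognizer would use.

```python
"""Toy speech-recognition pipeline mirroring the front end / back end split.

Every stage is a hypothetical, deliberately simplified stand-in, not VoCon's
algorithms: noise reduction is a fixed-floor magnitude subtraction, features are
per-frame log energies (instead of cepstral vectors), and search is template
matching with dynamic time warping (instead of HMM decoding).
"""
import math
from typing import Dict, List, Sequence


def reduce_noise(signal: Sequence[float], floor: float) -> List[float]:
    """Front end, step 1: shrink each sample's magnitude by `floor`, clamping at zero."""
    return [math.copysign(max(abs(x) - floor, 0.0), x) for x in signal]


def extract_features(signal: Sequence[float], frame_len: int) -> List[float]:
    """Front end, step 2: log energy of each non-overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        energy = sum(x * x for x in signal[start:start + frame_len])
        feats.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return feats


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic-time-warping distance between two feature sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def search(feats: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Back end: return the vocabulary entry whose template is closest to the input."""
    return min(templates, key=lambda word: dtw_distance(feats, templates[word]))


def recognize(signal: Sequence[float], templates: Dict[str, Sequence[float]],
              floor: float = 0.01, frame_len: int = 4) -> str:
    """Run the front end, then the back end, as in the block diagram."""
    feats = extract_features(reduce_noise(signal, floor), frame_len)
    return search(feats, templates)


if __name__ == "__main__":
    # Templates are built by passing reference signals through the same front end.
    refs = {"loud": [0.9, -0.9] * 8, "quiet": [0.1, -0.1] * 8}
    templates = {w: extract_features(reduce_noise(s, 0.01), 4) for w, s in refs.items()}
    print(recognize([0.85, -0.85] * 8, templates))  # prints "loud"
```

The point of the sketch is only the data flow: the front end turns samples into a compact feature sequence, and the back end scores that sequence against what the application allows; everything domain-specific lives in the templates, much as grammars and dictionaries are kept separate from the engine in the diagram.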
TTS: Nuance Vocalizer Product Family
[Figure: Network, Automotive and Mobile products built on the shared Vocalizer core engine, with features, voice models and language models as separate components]
• Family of products using a flexible core engine
• Feature set scales to different markets' requirements
• Voices and languages configurable by data only, offering different sampling rates and quality levels
• Voice and language development work fully shared across different product builds
• Supports fluent mixing of recorded and generated prompts
• Largest voice and language portfolio in the industry
Nuance Vocalizer for Automotive Solution
• Based on state-of-the-art technology and a strong service offering, Nuance Vocalizer provides solutions for all speech output needs
• German portfolio voice
Bitte geben Sie die Adresse noch einmal an. Fahren Sie nach Berlin? (Please state the address again. Are you driving to Berlin?)
Meinten Sie Albrechtstrasse? Ist Berlin, Albrechtstrasse 237 korrekt? (Did you mean Albrechtstrasse? Is Berlin, Albrechtstrasse 237 correct?)
• US English custom voice
In 500 meters, turn right.
In six hundred meters slight right turn onto,
Proceed about four tenths of a mile to,
/+'hE.R+$dz.'2bE0R+g_'R+o&Ud /+.
/+m$.'dR+o&U.n$_'@.v$.nu /+.
Go straight onto, South State Street.
Vocalizer for Automotive Innovation
Multi-Lingual Speech Output | Examples
• Navigation
– Sie fahren nach « Wakefield » (You are driving to Wakefield)
– Nach 100 Metern rechts abbiegen, « King George Street » (In 100 meters turn right, King George Street)
– Sie fahren nach « Saint-Etienne »
– In zwei Meilen scharf nach links abbiegen, « Rue de Saint-Julien » (In two miles turn sharp left, Rue de Saint-Julien)
– Sie fahren nach « Jerez de la Frontera »
– Nach 500 Metern rechts abbiegen, « Avenida Mediterráneo »
– Ihr Ziel ist « San Giorgio a Cremano » (Your destination is San Giorgio a Cremano)
– Nach 100 Metern rechts abbiegen, « Via Francesco Petrarca »
• Music
– Der Künstler: « Enrique Iglesias » (The artist: Enrique Iglesias)
– Hier ist der Titel: « Il mare calmo della sera » (Here is the title: Il mare calmo della sera)
• Travel
– Heute fahren wir nach Frankreich, « Paris, les Champs-Elysées, la Tour Eiffel, le Louvre » (Today we are going to France)
– Morgen reisen wir nach Spanien, « Barcelona, la Rambla, la Sagrada Familia, el Museo Picasso » (Tomorrow we travel to Spain)
– Vielleicht wollen wir auch gerne Italien sehen: « Firenze, il Duomo, il Ponte Vecchio, la Galleria dell'Accademia » (Perhaps we would also like to see Italy)
– Auf unserer nächsten Reise zeige ich England, « London, the Houses of Parliament, Trafalgar Square, Buckingham Palace » (On our next trip I will show England)
– Da freue ich mich jetzt schon drauf (I am already looking forward to it)
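Prompts like these mix a German carrier sentence with names that should be spoken by another language's rules. A minimal sketch of the first step, splitting the text into (language, text) runs, assuming (hypothetically) that foreign spans are marked with guillemets as in the examples and that the caller supplies one language tag per span. The function `split_mixed` and tags such as `de-DE` are illustrative only; this is not Vocalizer's API or its language-identification method.

```python
"""Split a mixed-language prompt into (language, text) runs.

Hypothetical helper: assumes foreign-language spans are marked with guillemets
« » and that the caller supplies one language tag per span. This is not
Vocalizer's API or its language-identification method.
"""
import re
from typing import List, Sequence, Tuple

# Non-greedy match of the text between « and », trimming inner whitespace.
QUOTED = re.compile(r"«\s*(.*?)\s*»", re.DOTALL)


def split_mixed(text: str, base_lang: str,
                span_langs: Sequence[str]) -> List[Tuple[str, str]]:
    matches = list(QUOTED.finditer(text))
    if len(matches) != len(span_langs):
        raise ValueError("expected one language tag per quoted span")
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match, lang in zip(matches, span_langs):
        carrier = text[pos:match.start()].strip()
        if carrier:  # base-language text before the quoted name
            segments.append((base_lang, carrier))
        segments.append((lang, match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:  # base-language text after the last quoted name
        segments.append((base_lang, tail))
    return segments


if __name__ == "__main__":
    prompt = "Nach 100 Metern rechts abbiegen, « King George Street »"
    for lang, run in split_mixed(prompt, "de-DE", ["en-GB"]):
        print(lang, run)
```

Each run could then be handed to the matching language's text processing; deciding the span languages automatically is the harder problem this sketch leaves to the caller.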
Where to Find & Test Nuance Mobile Products
• Cars (>33M)
• Personal Navigation Devices (>30M): TomTom, Magellan, Garmin, Medion, Blaupunkt, Mio, Sony, Navigon, Falk
• Apps (Million downloads)
Experience Nuance
experience commitment
experience satisfaction
experience results
experience leadership
experience speech
What can Nuance do for you?