Verbmobil – The Nuance Perspective
Stephan Kanthak, Automotive R&D

About Me
• 1993: joined Hermann Ney’s group
• 1998: joined Verbmobil project, responsible for:
  – “Fast” German recognizer (10k)
  – German recognizer, large vocabulary (30k, aka scenario 3/C)
• 2000: missed final Verbmobil meeting due to illness
• 2002: co-founded AIXPLAIN AG
• 2005: freelancer
• 2006: joined AT&T Labs in Florham Park, NJ, USA
• 2008: joined Nuance Automotive in Aachen

Nuance Organization & Product Portfolio
• Healthcare: medical record management and manual transcription services in healthcare
• Enterprise: customer service and call center applications
• Mobile: command and control capabilities, and voice search and messaging applications for mobile phones and automobiles
• Imaging: MFP scanning, PDF and document automation solutions

Global Embedded R&D Centers
• Locations: Montréal, Boston, Detroit, Ithaca, Merelbeke, Tokyo, Aachen, Ulm, Seoul, Shanghai
• A global team of about 170 engineers serving Automotive/Embedded (ASR & TTS)
• Research & Development
• Professional Services

Embedded Speech Recognition – What for?
Most feature-rich mass-market deployments: automotive
• Application areas shown: Mobile Phones; CD; Navigation; DVD; Telematics; MP3; Rear-seat Entertainment; Industrial, Warehousing, Military; Connected Services; Games consoles; Radio; Cellphones (largest volumes)

Driving Safety
[Figure: four bar charts comparing the conditions No IVIS, Manual, and Speech across the tasks Navigation, Audio, Phone, Single POI, and Multiple POI. Panels: Deviation (meters), Reaction Time (seconds), Distraction (gaze duration [%]), Subjective Workload (low to high). Callouts: Reduced Deviation, Faster Reaction, Less Distraction, Reduced Workload.]
Source: 2008 In-car Distraction Study, University of Brunswick, Germany

VoCon Family Tree – A long history
[Figure: product timeline, 2002 to 2010. Labels shown: ASR1600, ASR3200v1, CREC, VoCon3200v2, VoCon3200v3, Speech2Go, VoCon3200v4, eVV4.5, StarRec, VoCon SF, VoCon X3, smARTspeak XGT, Mobile, VoCon XGT, ELVIS/VSuite3, ELVIS/VSuite4.]

Summary of Nuance Embedded Technology Offers
• ASR: 24 languages
• TTS: 35 languages
• NLU
• “Light-Weight” Dialogue Management
• Connected services:
  – Message (SMS / E-Mail) Dictation
  – Open Voice Search: Voice-enabled Web Search
  – Device Command & Control: Name Dialing, Music Selection, ...

ASR: Task Size Explosion
Trend: shift towards large-scale problems
• Command and Control (C&C), Phone Dialing: 1K
• Music (MP3): 1K-100K
• Points-of-Interest (POI): 100K-10M
• Voice Destination Entry (VDE): 1M-100M
[Figure: chart with a vertical axis from 0 to 35 and a horizontal axis covering 2005 to 2010.]

State-of-the-Art Speech Recognizer Architecture
• Most research and commercial speech recognizers today use a variant of the same architecture, incl.
VoCon
[Figure: recognizer block diagram. Labels shown: Language-specific, Acoustic Model, Speech signal, Noise Reduction, Feature Extraction, Search, Result, Spelling, Front End, Back End, Dictionary, G2P, CFG Grammars, Application-specific, Compiler, NLU, SLM, Matcher, Post-processors.]

TTS: Nuance Vocalizer Product Family
[Figure: product family diagram. Labels shown: Network, Automotive, Features, Vocalizer Core Engine, Voice Models, Mobile, Language Models.]
• Family of products using a flexible core engine
• Feature set scales to the requirements of different markets
• Voices and languages configurable by data only, offering different sampling rates and quality levels
• Voice and language development work fully shared across different product builds
• Supports fluent mixing of recorded and generated prompts
• Largest voice and language portfolio in the industry

Nuance Vocalizer for Automotive Solution
• Based on state-of-the-art technology and a strong service offer, Nuance Vocalizer provides solutions for all speech output needs
• German portfolio voice
  – Bitte geben Sie die Adresse noch einmal an. (English: Please enter the address again.)
  – Fahren Sie nach Berlin? (English: Are you driving to Berlin?)
  – Meinten Sie Albrechtstrasse? (English: Did you mean Albrechtstrasse?)
  – Ist Berlin, Albrechtstrasse 237 korrekt? (English: Is Berlin, Albrechtstrasse 237 correct?)
• US English custom voice
  – In 500 meters, turn right.
  – In six hundred meters slight right turn onto,
  – Proceed about four tenths of a mile to,
  – /+'hE.R+$dz.'2bE0R+g_'R+o&Ud /+.
  – /+m$.'dR+o&U.n$_'@.v$.nu /+.
  – Go straight onto, South State Street.
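
Vocalizer for Automotive Innovation
Multi-Lingual Speech Output | Examples
• Navigation
  – Sie fahren nach « Wakefield » (English: You are driving to "Wakefield")
  – Nach 100 Metern rechts abbiegen, « King George Street » (English: After 100 meters, turn right, "King George Street")
  – Sie fahren nach « Saint-Etienne » (English: You are driving to "Saint-Etienne")
  – In zwei Meilen scharf nach links abbiegen, « Rue de Saint-Julien » (English: In two miles, turn sharp left, "Rue de Saint-Julien")
  – Sie fahren nach « Jérez de la Frontera » (English: You are driving to "Jérez de la Frontera")
  – Nach 500 Metern rechts abbiegen, « Avenida Mediterráneo » (English: After 500 meters, turn right, "Avenida Mediterráneo")
  – Ihr Ziel ist « San Giorgio a Cremano » (English: Your destination is "San Giorgio a Cremano")
  – Nach 100 Metern rechts abbiegen, « Via Francesco Petrarca » (English: After 100 meters, turn right, "Via Francesco Petrarca")
• Music
  – Der Künstler: « Enrique Iglesias » (English: The artist: "Enrique Iglesias")
  – Hier ist der Titel: « Il mare calmo della sera » (English: Here is the title: "Il mare calmo della sera")

Vocalizer for Automotive Innovation
Multi-Lingual Speech Output | Examples
• Travel
  – Heute fahren wir nach Frankreich, « Paris, les Champs-Elysées, la Tour Eiffel, le Louvre » (English: Today we are driving to France, ...)
  – Morgen reisen wir nach Spanien, « Barcelona, la Rambla, la Sagrada Familia, el Museo Picasso » (English: Tomorrow we travel to Spain, ...)
  – Vielleicht wollen wir auch gerne Italien sehen: « Firenze, il Duomo, il Ponte Vecchio, la Galleria dell'Accademia » (English: Perhaps we would also like to see Italy: ...)
  – Auf unserer nächsten Reise zeige ich England, « London, the Houses of Parliament, Trafalgar Square, Buckingham Palace » (English: On our next trip I will show England, ...)
  – Da freue ich mich jetzt schon drauf (English: I am already looking forward to that)

Where to Find & Test Nuance Mobile Products
• Cars (>33M)
• Personal Navigation Devices (>30M): TomTom, Magellan, Garmin, Medion, Blaupunkt, Mio, Sony, Navigon, Falk
• Apps (million downloads)

Experience Nuance
experience commitment, experience satisfaction, experience results, experience leadership, experience speech
What can Nuance do for you?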