VITHEA - INESC-ID

VITHEA
An online system for distance treatment of aphasia
Annamaria Pompili, Alberto Abad, Isabel Trancoso,
Jose Fonseca, Isabel P. Martins, Gabriela Leal, Luisa Farrajota
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
L2F - Spoken Language Systems Laboratory
Outline
Introduction
Aphasia language disorder
Classic therapeutic approaches
Motivations and goals
The Vithea System
Architectural overview
Client side
Server side
Evaluations and future work
http://vithea.l2f.inesc-id.pt
Aphasia language disorder
Figure: Broca's area and Wernicke's area
Types of aphasia
Non-fluent (a.k.a. Broca's aphasia):
➔ Example: "Walk dog", meaning "I will take the dog for a walk"
Fluent (a.k.a. Wernicke's aphasia):
➔ Example: "You know that smoodle pinkered and that I want to get him round like you want before", meaning "The dog needs to go out so I will take him for a walk"
Aphasia Language Disorder
Major causes:
➔ CVA (cerebrovascular accident, i.e. stroke), brain tumors, brain infections, car or work accidents
Increasingly frequent:
➔ an estimated 200,000 new cases in the EU each year
Economic impact:
➔ communication disorders cost the US from $154 to $186 billion per year, i.e. 2.5% to 3% of GNP
Social impact:
➔ altered interpersonal relationships, loss of autonomy, social restrictions
Classical therapeutic approaches
Common to all aphasia syndromes:
➔ word-retrieval problems
Word–picture matching exercises
Figure 1: Some images from the original Snodgrass & Vanderwart set
Figure 2: Example of the object–colour decision task
Motivations and goals
Frequency of therapy is essential, but...
➔ therapy is expensive
➔ reaching therapy centers can be uncomfortable and/or time-consuming
Development of a virtual therapist for aphasia treatment:
➔ focused on the word-retrieval problem
➔ improving patients' quality of life
➔ lowering the cost of care
Main challenges:
➔ users with physical impairments need a simple and intuitive user interface
➔ automatic speech recognition (ASR) is even harder with aphasic speech, which contains hesitations and repetitions
The Vithea System: architectural overview
Client computer: web browser / Flash application
➔ connects to the server over the Internet
Server:
➔ Tomcat web application server (JSP/Servlet)
➔ AUDIMUS engine (automatic speech recognition system)
➔ MySQL database management system
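To make the request flow concrete, here is a minimal Python sketch of what the server side might do with one spoken answer: pass the audio to a recognizer, grade it against the target word, and store the result. It is an illustration under stated assumptions, not the VITHEA implementation: the recognizer is a hypothetical stand-in for AUDIMUS, Python's built-in `sqlite3` replaces MySQL so the example runs without a database server, and the names `ExerciseServer`, `handle_answer` and the `answers` table are invented for this sketch.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for the AUDIMUS engine: audio bytes in, recognized words out.
Recognizer = Callable[[bytes], List[str]]


@dataclass
class ExerciseServer:
    """Sketch of the server-side flow: receive audio, run ASR, grade, store."""

    recognize: Recognizer
    # sqlite3 replaces MySQL here so the sketch is self-contained.
    db: sqlite3.Connection = field(default_factory=lambda: sqlite3.connect(":memory:"))

    def __post_init__(self) -> None:
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS answers "
            "(patient TEXT, target TEXT, hypothesis TEXT, correct INTEGER)"
        )

    def handle_answer(self, patient: str, target: str, audio: bytes) -> bool:
        """Grade one spoken answer and persist the outcome."""
        words = self.recognize(audio)
        correct = target.lower() in (w.lower() for w in words)
        self.db.execute(
            "INSERT INTO answers VALUES (?, ?, ?, ?)",
            (patient, target, " ".join(words), int(correct)),
        )
        self.db.commit()
        return correct
```

In a quick test, a fake recognizer such as `lambda audio: audio.decode("utf-8").split()` can replace the real ASR engine. Keeping the recognizer behind a plain callable is what lets the speech engine and the web application evolve separately.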
The Vithea System: Patient side
The Vithea System: Clinician side
AUDIMUS, the speech recognition module
Structure of the AUDIMUS recognizer:
➔ hybrid recognizer: combines hidden Markov models (HMMs) with multilayer perceptrons (MLPs)
➔ MLPs trained on 3 distinct feature sets (PLP, RASTA, MSG)
➔ acoustic models trained with 57 hours of Broadcast News data downsampled to 8 kHz and 58 hours of mixed mobile and fixed telephone data
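In a hybrid HMM/MLP recognizer, the MLPs estimate phone posteriors P(q|x) for each frame, and the HMM decoder uses them as emission scores after dividing by the phone priors P(q), since P(q|x)/P(q) = p(x|q)/p(x). The Python sketch below shows that step for one frame, plus one possible way of merging the outputs of the three feature-stream MLPs. The merging rule (a normalized geometric mean of the stream posteriors) is an assumption for illustration; the slides do not say how AUDIMUS combines the PLP, RASTA and MSG networks.

```python
import math
from typing import List


def combine_streams(stream_posteriors: List[List[float]]) -> List[float]:
    """Merge one frame's phone posteriors from several MLPs (one per feature set).

    Assumption: a normalized geometric mean, i.e. the average of the
    log-posteriors, re-normalized so the result sums to 1.
    """
    n_streams = len(stream_posteriors)
    n_phones = len(stream_posteriors[0])
    avg_log = [
        sum(math.log(stream[k]) for stream in stream_posteriors) / n_streams
        for k in range(n_phones)
    ]
    unnormalized = [math.exp(v) for v in avg_log]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def scaled_likelihoods(posteriors: List[float], priors: List[float]) -> List[float]:
    """Hybrid HMM/MLP emission scores: P(q|x) / P(q), proportional to p(x|q)."""
    return [post / prior for post, prior in zip(posteriors, priors)]
```

Dividing by the priors keeps frequent phones from dominating the search simply because they were frequent in the MLP training data.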
Keyword spotting (KWS) approaches
➔ acoustic matching of the audio data against keyword models, in contrast to a background (BG) model
➔ large vocabulary continuous speech recognition (LVCSR)
Keyword models in contrast to a BG model
The BG model must provide:
➔ low recognition likelihoods for keywords
➔ high likelihoods for out-of-vocabulary words
Two possible acoustic matches:
➔ phoneme loop network
➔ a-posteriori probability: phoneme classification network, posterior probability of other phones
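The keyword-versus-background idea can be sketched in Python on top of per-frame phone log-posteriors. The sketch assumes the phoneme-loop option: the background (and the filler around the keyword) is a loop over all phones with free transitions, so its best path simply takes the most likely phone in each frame. The keyword is aligned left to right, each phone lasting at least one frame, anywhere inside the utterance. The detection score is the per-frame log-likelihood ratio between the two paths; with this background it is never positive, so the threshold is negative. The function names and the threshold value are hypothetical, not taken from the VITHEA system.

```python
import math
from typing import List

NEG_INF = -math.inf


def background_score(log_post: List[List[float]]) -> float:
    """Phone-loop background model with free phone-to-phone transitions:
    its best path picks the most likely phone in every frame."""
    return sum(max(frame) for frame in log_post)


def keyword_path_score(log_post: List[List[float]], phones: List[int]) -> float:
    """Viterbi score of the best path 'filler* keyword filler*'.

    States: 0 = leading filler, 1..n = keyword phones (left to right,
    each lasting >= 1 frame), n + 1 = trailing filler. Fillers reuse the
    phone-loop score of the background model.
    """
    n = len(phones)

    def emit(t: int, s: int) -> float:
        if s == 0 or s == n + 1:
            return max(log_post[t])
        return log_post[t][phones[s - 1]]

    score = [NEG_INF] * (n + 2)
    score[0] = emit(0, 0)  # start in the leading filler...
    score[1] = emit(0, 1)  # ...or directly in the first keyword phone
    for t in range(1, len(log_post)):
        new_score = [NEG_INF] * (n + 2)
        for s in range(n + 2):
            # Each state is reached only from itself or from the previous state.
            best_prev = max(score[s], score[s - 1] if s > 0 else NEG_INF)
            if best_prev > NEG_INF:
                new_score[s] = best_prev + emit(t, s)
        score = new_score
    # The whole keyword must have been traversed.
    return max(score[n], score[n + 1])


def keyword_llr(log_post: List[List[float]], phones: List[int]) -> float:
    """Per-frame log-likelihood ratio of the keyword path vs. the background path."""
    return (keyword_path_score(log_post, phones) - background_score(log_post)) / len(log_post)


def detect(log_post: List[List[float]], phones: List[int], threshold: float = -0.3) -> bool:
    """Accept the keyword when the LLR reaches a (hypothetical) threshold."""
    return keyword_llr(log_post, phones) >= threshold
```

Because the filler states allow any amount of speech or silence before and after the keyword, this kind of detector naturally matches a criterion where the target word may appear anywhere in the answer.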
Large vocabulary continuous speech recognition
➔ search for the target keyword in the recognition result
➔ several hypotheses can be searched in parallel (n-best lists, lattices, confusion networks), which improves performance compared with searching only the raw single-best output
➔ training requires large amounts of data
➔ relies on a fixed, large vocabulary: a keyword that is not in the dictionary is never detected
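Searching several hypotheses instead of the single best output can be illustrated with an n-best list: turn the hypothesis scores into posteriors with a softmax and sum the probability mass of the hypotheses that contain the keyword. This is a generic Python sketch, not the VITHEA code; the `(text, log_score)` input format and the `scale` factor (often used to flatten overconfident recognizer scores) are assumptions. It also illustrates the out-of-vocabulary limitation: a word the recognizer cannot output never appears in any hypothesis, so its posterior is always zero.

```python
import math
from typing import List, Tuple


def keyword_posterior(
    nbest: List[Tuple[str, float]], keyword: str, scale: float = 1.0
) -> float:
    """Estimate P(keyword spoken) from an n-best list of (hypothesis, log score).

    The log scores are scaled and normalized with a softmax over the list;
    the posterior is the total mass of hypotheses containing the keyword.
    """
    scaled = [scale * log_score for _, log_score in nbest]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in scaled]
    total = sum(weights)
    hit = sum(w for (text, _), w in zip(nbest, weights) if keyword in text.split())
    return hit / total
```

Lattices and confusion networks follow the same principle with a more compact representation of many more hypotheses.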
Preliminary evaluations
Two subsets of the Portuguese SpeechDat II corpus:
➔ development set: 3334 utterances
➔ evaluation set: 481 utterances
➔ number of keywords: 27
Experimental results:
➔ Detection Error Trade-off (DET) curves (false rejection probability in % vs. false alarm probability in %)
➔ one of the approaches achieved promising performance in terms of equal error rate (EER), false alarm (FA) and false rejection (FR)
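The FA, FR and EER figures follow from sweeping the decision threshold over the detector scores: every threshold gives one (FA, FR) point of the DET curve, and the EER is the operating point where both rates are equal. A minimal Python sketch, assuming higher scores mean "keyword present"; the EER is approximated at the observed score that best balances the two rates, and the sample scores in the check are made up.

```python
from typing import List, Tuple


def error_rates(targets: List[float], nontargets: List[float], threshold: float) -> Tuple[float, float]:
    """Return (false alarm rate, false rejection rate) when accepting scores >= threshold."""
    false_rejections = sum(s < threshold for s in targets)
    false_alarms = sum(s >= threshold for s in nontargets)
    return false_alarms / len(nontargets), false_rejections / len(targets)


def det_points(targets: List[float], nontargets: List[float]) -> List[Tuple[float, float]]:
    """(FA, FR) pairs using every observed score as a threshold: the points of a
    DET curve before the normal-deviate scaling of its axes."""
    thresholds = sorted(set(targets) | set(nontargets))
    return [error_rates(targets, nontargets, t) for t in thresholds]


def equal_error_rate(targets: List[float], nontargets: List[float]) -> float:
    """Approximate EER: average of FA and FR at the point where they are closest."""
    fa, fr = min(det_points(targets, nontargets), key=lambda p: abs(p[0] - p[1]))
    return (fa + fr) / 2
```

Raising the threshold trades false alarms for false rejections; the EER summarizes that trade-off in a single number.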
Evaluations - corpus
Evaluation data:
➔ collected from therapy sessions with 8 patients
➔ each session consists of naming exercises with 103 objects per patient
➔ recorded with 2 inexpensive microphones: a headset and a table-top microphone
➔ only the sessions recorded with the headset were considered
➔ segmentation and word-level transcriptions were produced manually, totaling 996 segments
➔ the complete evaluation corpus lasts approximately 1 hour and 20 minutes
Evaluations - criteria
Correctness:
➔ a word naming exercise is considered correctly completed whenever the target word is spoken
➔ regardless of its position or of the amount of silence before the valid answer
Extended word list:
➔ contains, in addition to the canonical valid answer, the most frequent synonyms and diminutives
➔ total KWS vocabulary of 252 words
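The correctness criterion and the extended word list translate into a small check: an answer counts as correct if any accepted form of the target (canonical word, synonym or diminutive) appears anywhere in the recognized words, whatever comes before it. The Python sketch below assumes the recognizer output is a list of words; the Portuguese entries ("gato", its diminutive "gatinho" and the synonym "bichano") are illustrative, not taken from the VITHEA word list.

```python
from typing import Dict, Iterable, Set

# Hypothetical extended word list: canonical answer -> accepted variants.
EXTENDED_WORDS: Dict[str, Set[str]] = {
    "gato": {"gatinho", "bichano"},
}


def accepted_forms(target: str, extended: Dict[str, Set[str]]) -> Set[str]:
    """Canonical answer plus its listed synonyms and diminutives."""
    return {target} | extended.get(target, set())


def naming_correct(
    recognized: Iterable[str], target: str, extended: Dict[str, Set[str]] = EXTENDED_WORDS
) -> bool:
    """Correct whenever an accepted form is spoken, in any position."""
    forms = accepted_forms(target, extended)
    return any(word.lower() in forms for word in recognized)
```

Every variant added this way also enlarges the KWS vocabulary the detector has to handle.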
Evaluations - results
Preliminary evaluations
Global evaluation:
➔ Pearson's coefficient between human and automatic evaluation: 0.9043
Figure: average word naming score per patient (1–8), human vs. automatic evaluation
Individual evaluation:
➔ remarkable variability in FA and FR rates depending on the specific patient
➔ most common cause of FA: many nonexistent words phonetically close to the target ones, often with the stressed syllable pronounced correctly
Figure: false alarm and false rejection rates per patient (1–8)
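The two quantities on this slide can be computed from paired human and automatic judgments. In the Python sketch below, a false alarm is an answer the therapist judged wrong but the system accepted, a false rejection is an answer judged right but rejected, and Pearson's coefficient compares the per-patient average naming scores of the two evaluations. The function names and the toy data in the checks are hypothetical.

```python
import math
from typing import List, Tuple


def fa_fr(human: List[bool], auto: List[bool]) -> Tuple[float, float]:
    """False alarm and false rejection rates of the automatic judgments."""
    auto_on_wrong = [a for h, a in zip(human, auto) if not h]
    auto_on_right = [a for h, a in zip(human, auto) if h]
    fa = sum(auto_on_wrong) / len(auto_on_wrong) if auto_on_wrong else 0.0
    fr = sum(not a for a in auto_on_right) / len(auto_on_right) if auto_on_right else 0.0
    return fa, fr


def naming_score(judgments: List[bool]) -> float:
    """Average word naming score: share of exercises judged correct."""
    return sum(judgments) / len(judgments)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

A high correlation of per-patient scores can coexist with unbalanced FA and FR rates for individual patients, which is why both views are reported.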
Evaluations - results
Customized approach:
➔ based on the user profile
➔ word detector calibrated following a 5-fold cross-validation strategy
Global evaluation:
➔ Pearson's coefficient between human and automatic evaluation: 0.9652
Figure: average word naming score per patient (1–8), human vs. automatic evaluation
Individual evaluation:
➔ more balanced performance (in terms of FA and FR rates) is observed for most patients
Figure: false alarm and false rejection rates per patient (1–8)
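Calibration with cross-validation can be sketched as follows: a patient's answers are split into 5 folds, a detection threshold is chosen on 4 folds and applied to the held-out one, so every answer is judged by a threshold that never saw it. This is a Python sketch under assumptions: the selection criterion (minimizing FA rate + FR rate, with candidate thresholds halfway between observed scores) and the interleaved fold assignment are choices made here, since the slides only state that the detector was calibrated following a 5-fold cross-validation strategy.

```python
from typing import List


def pick_threshold(scores: List[float], labels: List[bool]) -> float:
    """Threshold minimizing FA rate + FR rate on the given (training) answers.

    Candidates lie halfway between consecutive distinct scores, plus one
    below the minimum and one above the maximum.
    """
    uniq = sorted(set(scores))
    candidates = (
        [uniq[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
        + [uniq[-1] + 1.0]
    )
    positives = [s for s, ok in zip(scores, labels) if ok]
    negatives = [s for s, ok in zip(scores, labels) if not ok]

    def cost(threshold: float) -> float:
        fr = sum(s < threshold for s in positives) / len(positives) if positives else 0.0
        fa = sum(s >= threshold for s in negatives) / len(negatives) if negatives else 0.0
        return fa + fr

    return min(candidates, key=cost)


def cross_validated_decisions(scores: List[float], labels: List[bool], k: int = 5) -> List[bool]:
    """Decide each answer with a threshold calibrated on the other k - 1 folds."""
    n = len(scores)
    decisions = [False] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        threshold = pick_threshold([scores[i] for i in train], [labels[i] for i in train])
        for i in test:
            decisions[i] = scores[i] >= threshold
    return decisions
```

Calibrating the threshold on each patient's own data is one way to obtain the more balanced FA/FR behaviour reported on this slide, because the operating point adapts to how each patient's speech scores.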
Conclusions
Speech recognition technology contributed to building a system
designed to support recovery from a particular communication disorder.
The virtual therapist has been designed following relevant accessibility
principles tailored to the particular category of users targeted by the
system.
Early experiments conducted to evaluate ASR performance with speech
from aphasic patients yielded quite promising results.
Future work
Implementing new exercises and incorporating tools such as goodness of pronunciation scoring
Providing help to the patient, both semantic and phonological
Integrating text-to-speech synthesis
Incorporating an intelligent animated agent
Extending the system to the treatment of other speech disorders
Thank you for your attention
http://vithea.l2f.inesc-id.pt