Onto.PT: towards the automatic construction of a
lexical ontology for Portuguese
Hugo Gonçalo Oliveira¹
Paulo Gomes
{hroliv,pgomes}@dei.uc.pt
Cognitive & Media Systems Group
CISUC, Universidade de Coimbra
June 8, 2012
¹ Supported by the FCT scholarship grant SFRH/BD/44955/2008, co-funded by FSE
Gonçalo Oliveira & Gomes (CISUC)
Onto.PT
June 8, 2012
1 / 46
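Introduction
Request: Músicos famosos com carreira no cinema (Famous musicians with a career in cinema)
Snippet A: David Bowie é um músico famoso que também fez carreira no cinema. (David Bowie is a famous musician who also made a career in cinema.)
Snippet B: Elvis Presley foi um musicista e actor célebre, conhecido como o Rei do Rock. (Elvis Presley was a celebrated musician and actor, known as the King of Rock.)
Snippet C: Amália Rodrigues foi talvez a mais ilustre fadista portuguesa. Para além de cantar, durante o seu percurso profissional participou em vários filmes. (Amália Rodrigues was perhaps the most illustrious Portuguese fado singer. Besides singing, she took part in several films during her professional career.)
Snippet D: João apanhou a carreira, famosa por chegar atrasada, para ir ao cinema, na cidade. (João caught the bus, notorious for running late, to go to the cinema in town.)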
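Introduction
Request: Músicos famosos com carreira no cinema (Famous musicians with a career in cinema)
Snippet A: David Bowie é um músico famoso que também fez carreira no cinema. (David Bowie is a famous musician who also made a career in cinema.)
Snippet B: Rob Zombie é um musicista americano e realizador de filmes de terror. (Rob Zombie is an American musician and director of horror films.)
Snippet C: Amália Rodrigues foi talvez a mais ilustre fadista portuguesa. Para além de cantar, durante o seu percurso profissional participou em várias películas. (Amália Rodrigues was perhaps the most illustrious Portuguese fado singer. Besides singing, she took part in several films during her professional career.)
Snippet D: João apanhou a carreira, famosa por chegar atrasada, para ir ao cinema, na cidade. (João caught the bus, notorious for running late, to go to the cinema in town.)
musicista synonym-of músico
ilustre synonym-of famoso
película synonym-of filme
percurso profissional synonym-of carreira
fadista hyponym-of músico
realizador producer-of cinema
filme part-of cinema
carreira hyponym-of actividade (career, as an activity)
carreira hyponym-of transporte (bus service, as transport)
cinema hyponym-of arte (cinema as an art)
cinema hyponym-of edifício (cinema as a building)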
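The relations listed on this slide can drive a simple word-level query expansion. The sketch below is only an illustration under stated assumptions: the query is reduced by hand to four lemmas, matching is naive substring search, and the names `expand` and `matches` are hypothetical helpers, not part of Onto.PT.

```python
from collections import defaultdict

# Word-level relation triples copied from the slide: (word, relation, word).
TRIPLES = [
    ("musicista", "synonym-of", "músico"),
    ("ilustre", "synonym-of", "famoso"),
    ("película", "synonym-of", "filme"),
    ("percurso profissional", "synonym-of", "carreira"),
    ("fadista", "hyponym-of", "músico"),
    ("filme", "part-of", "cinema"),
    ("realizador", "producer-of", "cinema"),
]

def expand(term, triples):
    """Return `term` plus every word linked to it, directly or transitively."""
    related = defaultdict(set)
    for left, relation, right in triples:
        related[right].add(left)
        if relation == "synonym-of":  # synonymy works in both directions
            related[left].add(right)
    seen, stack = {term}, [term]
    while stack:
        for word in related[stack.pop()]:
            if word not in seen:
                seen.add(word)
                stack.append(word)
    return seen

def matches(snippet, query_terms, triples):
    """True if every query term, or one of its expansions, occurs in the snippet."""
    text = snippet.lower()
    return all(any(w in text for w in expand(t, triples)) for t in query_terms)

# Hand-lemmatised query (an assumption; the slide gives the full sentence).
QUERY = ["músico", "famoso", "carreira", "cinema"]
SNIPPET_C = ("Amália Rodrigues foi talvez a mais ilustre fadista portuguesa. "
             "Para além de cantar, durante o seu percurso profissional "
             "participou em várias películas.")
SNIPPET_D = ("João apanhou a carreira, famosa por chegar atrasada, "
             "para ir ao cinema, na cidade.")

print(matches(SNIPPET_C, QUERY, TRIPLES))
print(matches(SNIPPET_D, QUERY, TRIPLES))
```

Snippet C shares no content word with the request, yet it matches once the relations are used. Snippet D contains both "carreira" and "cinema" in other senses and is rejected here only because it mentions no musician: a word-level expansion like this one cannot tell the senses apart.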
Introduction
Lexical Knowledge Bases (LKBs)
Resources for natural language processing (NLP)
Cover the whole language, not a specific domain
Structured on words and meanings
  - Meaning is inherently conceptual...
  - [Hirst, 2004]: Ontology + lexicon = Lexical ontology
Essential in the development of NLP tools for a language
See Princeton WordNet [Fellbaum, 1998]!
Introduction
WordNet: dictionary + thesaurus
Applications
Writing aids
Question answering
Determining similarities
Automatic summarization
Word sense disambiguation
Machine translation
Natural language generation
...
Contents
1 Introduction
2 Related resources
3 Relation acquisition
4 Synset discovery
5 Ontologisation of semantic relations
6 Approach summary
7 Presenting Onto.PT v.0.3.1
8 Concluding remarks
Related resources
Portuguese LKBs
Wordnets
WordNet.PT (a) [Marrafa, 2002]
  - According to the EuroWordNet [Vossen, 1997] model
  - Not public, browsable through the Web
  - Handcrafted, as Princeton WordNet
  - Only covers some domains
MWN.PT (b)
  - In the scope of MultiWordNet [Pianta et al., 2002]
  - About 69,000 synsets, 69,000 relations
  - Not public, browsable through the Web and purchasable
  - Handcrafted, as Princeton WordNet
  - Only covers nouns
  - Several lexical gaps
(a) http://www.clul.ul.pt/clg/eng/wordnetpt/index.html
(b) http://mwnpt.di.fc.ul.pt/
Related resources
Portuguese LKBs (cont.)
Public thesauri, structured on synsets
TeP (a) [Maziero et al., 2008]
  - Synset-base of WordNet.Br [Dias da Silva et al., 2002]
  - About 20,000 synsets
  - Handcrafted
  - Only covers synonymy and antonymy
OpenThesaurus.PT (b)
  - Suggestions for OpenOffice Writer
  - Handcrafted (collaboratively)
  - Only covers synonymy
  - Too small (≈ 4,000 synsets)
(a) http://www.nilc.icmc.usp.br/tep2/
(b) http://openthesaurus.caixamagica.pt/
Related resources
Portuguese LKBs (cont.)
(Enhanced) public dictionaries
Wiktionary.PT (a)
  - About 180,000 entries (not all in Portuguese!)
  - Handcrafted (collaboratively)
  - (Still) very incomplete
  - Little explicit and unambiguous semantic information
Dicionário Aberto (b) [Simões and Farinha, 2011]
  - Electronic version of a dictionary from 1913
  - About 128,000 entries
  - Static resource
  - Little explicit and unambiguous semantic information
(a) http://pt.wiktionary.org/
(b) http://www.dicionario-aberto.net/search
Related resources
Portuguese LKBs (cont.)
Other
PAPEL (a) [Gonçalo Oliveira et al., 2010]
  - Lexical-semantic resource extracted automatically from one dictionary
  - About 102,000 words, 190,000 relations
  - Structured on words, not sense-aware
  - Used in several NLP tasks:
      * Computing similarity between lexical items
      * Adaptation of textual contents for low-literacy readers
      * Generation of distractors for cloze questions
      * Creation of knowledge bases for question answering and generation
      * Creation of sentiment lexicons
      * ...
(a) http://www.linguateca.pt/PAPEL
Related resources
Onto.PT
New lexical ontology for Portuguese
Constructed automatically
Exploitation of public resources
  - Thesauri
  - Dictionaries/encyclopedias
  - Corpora
Structure according to the wordnet model
  - Synsets: groups of synonymous words → concepts
  - Connected by semantic relations
Three independent stages
  1. Acquisition of semantic relations
  2. Discovery of synsets
  3. Integration (ontologisation) of relations
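The wordnet model named on this slide (synsets as concepts, connected by semantic relations) can be pictured with a small data structure. This is a minimal sketch, not Onto.PT's actual storage format: the class `LexicalOntology`, its method names, and the singleton synsets in the example are assumptions made for illustration, with the words taken from the introductory slide.

```python
from dataclasses import dataclass, field

@dataclass
class LexicalOntology:
    synsets: list = field(default_factory=list)   # each synset is a frozenset of words
    relations: set = field(default_factory=set)   # (source id, relation name, target id)

    def add_synset(self, *words):
        """Store a new synset and return its numeric id."""
        self.synsets.append(frozenset(words))
        return len(self.synsets) - 1

    def relate(self, source, relation, target):
        """Connect two synsets (not two words) by a named relation."""
        self.relations.add((source, relation, target))

    def senses(self, word):
        """Ids of all synsets containing `word`, i.e. its candidate senses."""
        return [i for i, synset in enumerate(self.synsets) if word in synset]

onto = LexicalOntology()
career = onto.add_synset("carreira", "percurso profissional")
activity = onto.add_synset("actividade")   # hypothetical singleton synset
bus = onto.add_synset("carreira")          # second sense of the same word
transport = onto.add_synset("transporte")  # hypothetical singleton synset
onto.relate(career, "hyponym-of", activity)
onto.relate(bus, "hyponym-of", transport)

print(onto.senses("carreira"))
```

Because relations attach to synset ids rather than to words, the two senses of "carreira" keep separate hypernyms.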
Relation acquisition
Relation extraction from dictionaries
Why dictionaries?
  - Earlier works [Calzolari et al., 1973, Amsler, 1980, Michiels et al., 1980]
  - Main sources of general lexical information of a language
  - Structured on words and meanings
  - (Try to) Cover the whole language
  - Created by experts
  - Simple structure, (almost) predictable vocabulary
But...
  - Not ready to be used as LKBs
  - Static resources
CARTÃO: Semantic relations extracted from three dictionaries!
  - Dicionário PRO da Língua Portuguesa (DLP), through PAPEL
  - Dicionário Aberto (DA)
  - Wiktionary.PT
Relation acquisition
CARTÃO: regularities in dictionary definitions
Frequency of each pattern in DLP, DA and Wikt.PT:

Pattern | POS | DLP | DA | Wikt.PT | Relation
o mesmo que | Noun | 0 | 10,627 | 1,107 | Synonymy
a[c]to ou efeito de | Noun | 3,851 | 2,501 | 645 | Causation
pessoa que | Noun | 1,320 | 47 | 329 | Hypernymy
aquele que | Noun | 1,148 | 3,357 | 545 | Hypernymy
conjunto de | Noun | 1,004 | 316 | 298 | Member-of
espécie de | Noun | 798 | 2,846 | 223 | Hypernymy
género/gênero de | Noun | 29 | 4,148 | 48 | Hypernymy
variedade de | Noun | 455 | 621 | 52 | Hypernymy
[a] parte do/da | Noun | 445 | 433 | 107 | Part-of
qualidade de | Noun | 777 | 775 | 126 | Quality-of
qualidade do que é | Noun | 663 | 543 | 105 | Quality-of
estado de | Noun | 299 | 223 | 73 | State-of
natural ou habitante de/da/do | Noun | 536 | 0 | 79 | Place-of
instrumento[,] para | Noun | 94 | 284 | 25 | Purpose-of
.. produzid[o/a] por/pel[o/a] | Noun | 155 | 146 | 60 | Producer-of
o mesmo que | Verb | 0 | 166 | 97 | Synonymy
fazer | Verb | 1,680 | 1,294 | 364 | Causation
tornar | Verb | 1,359 | 1,672 | 266 | Causation
ter | Verb | 467 | 519 | 139 | Property-of
o mesmo que | Adjective | 0 | 2,685 | 197 | Synonymy
relativo a/à/ao | Adjective | 1,236 | 5,554 | 1,063 | Property-of
que se | Adjective | 1,602 | 1,599 | 485 | Property-of
que tem | Adjective | 2,698 | 4,291 | 477 | Part-of
diz-se de | Adjective | 2,066 | 738 | 313 | Property-of
relativo ou pertencente | Adjective | 1,647 | 9 | 61 | Member-of
habitante ou natural de | Adjective | 0 | 0 | 189 | Place-of
que não é/está | Adjective | 485 | 608 | 98 | Antonymy
de modo | Adverb | 398 | 2,261 | 109 | Manner-of
de maneira | Adverb | 49 | 9 | 36 | Manner-of
de forma | Adverb | 30 | 3 | 19 | Manner-of
o mesmo que | Adverb | 0 | 182 | 21 | Synonymy
Relation acquisition
CARTÃO: extraction examples
candeia s.f. utensílio doméstico rústico usado para iluminação, com pavio abastecido a óleo (oil lamp: rustic household utensil used for lighting, with an oil-fed wick)
  - utensílio hypernym-of candeia
  - iluminação purpose-of candeia
espiga s.f. parte das gramíneas que contém os grãos (ear: part of grasses that contains the grains)
  - espiga part-of gramínea
  - espiga contains grão
inquietar v.t. causar ansiedade (to disquiet: to cause anxiety)
  - inquietar causation-of ansiedade
severo adj. grave, crítico (severe: grave, critical)
  - grave synonym-of severo
  - crítico synonym-of severo
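Pattern-based extraction of this kind can be sketched with a few regular expressions. The rules below are illustrative assumptions, far simpler than CARTÃO's real grammars: `extract` is a hypothetical helper, only four patterns are covered, the first word of a noun definition is taken as the hypernym, and there is no lemmatisation, so "gramíneas" stays in the plural where the slide shows "gramínea".

```python
import re

def extract(entry, pos, definition):
    """Return (arg1, relation, arg2) triples found in one dictionary definition."""
    d = definition.strip().rstrip(".")
    parts = [p.strip() for p in d.split(",")]
    # A definition made only of single words is read as a list of synonyms.
    if all(re.fullmatch(r"\w+", p) for p in parts):
        return [(p, "synonym-of", entry) for p in parts]
    triples = []
    part = re.match(r"parte d[oa]s? (\w+)", d)   # "parte do/da/dos/das X"
    cause = re.match(r"causar (\w+)", d)         # "causar X"
    if part:
        triples.append((entry, "part-of", part.group(1)))
    elif cause:
        triples.append((entry, "causation-of", cause.group(1)))
    elif pos.startswith("s.") and d:
        # Naive assumption: a noun definition starts with its hypernym.
        triples.append((d.split()[0], "hypernym-of", entry))
    purpose = re.search(r"usad[oa] para (\w+)", d)  # "usado/usada para X"
    if purpose:
        triples.append((purpose.group(1), "purpose-of", entry))
    return triples

for entry, pos, definition in [
    ("candeia", "s.f.", "utensílio doméstico rústico usado para iluminação, "
                        "com pavio abastecido a óleo"),
    ("espiga", "s.f.", "parte das gramíneas que contém os grãos"),
    ("inquietar", "v.t.", "causar ansiedade"),
    ("severo", "adj.", "grave, crítico"),
]:
    print(entry, extract(entry, pos, definition))
```

The "contains" relation of the espiga example is left out on purpose; covering it would need one more pattern and, realistically, a lemmatiser.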
Relation acquisition
CARTÃO: extraction results
CARTÃO contains:
  - About 155,000 lexical items
  - About 327,000 relations, including:

Relation | Args. | Quantity | Example | Gloss
Synonym-of | n,n | 67,620 | alegria,satisfação | (joy,satisfaction)
Synonym-of | v,v | 28,108 | esticar,estender | (to extend,to stretch)
Synonym-of | adj,adj | 32,364 | racional,filosófico | (rational,philosophical)
Synonym-of | adv,adv | 2,286 | imediatamente,já | (immediately,now)
Hypernym-of | n,n | 97,924 | sentimento,afecto | (feeling,affection)
Part-of | n,n | 3,893 | núcleo,átomo | (nucleus,atom)
Part-of | n,adj | 5,872 | vício,vicioso | (addiction,addictive)
Member-of | n,n | 7,328 | aluno,escola | (student,school)
Member-of | adj,n | 1,071 | rural,campo | (rural,country)
Causation-of | n,n | 1,423 | vírus,doença | (virus,disease)
Causation-of | adj,n | 748 | horrível,horror | (horrible,horror)
Causation-of | v,n | 10,664 | mover,movimento | (to move,movement)
Producer-of | n,n | 1,741 | oliveira,azeitona | (olive tree,olive)
Producer-of | adj,n | 515 | fonador,som | (phonetic,sound)
Purpose-of | n,n | 6,978 | sustentação,mastro | (support,mast)
Purpose-of | v,n | 7,824 | calcular,cálculo | (to calculate,calculation)
Purpose-of | v,adj | 374 | comprimir,compressivo | (to compress,compressive)
Has-quality | n,n | 1,055 | mórbido,morbidez | (morbid,morbidity)
Has-quality | n,adj | 1,273 | assíduo,assiduidade | (assiduous,assiduity)
Has-state | n,n | 376 | exaltação,desvairo | (exaltation,rant)
Place-of | n,n | 1,483 | Equador,equatoriano | (Ecuador,Ecuadorian)
Manner-of | adv,n | 2,172 | ociosamente,indolência | (idly,indolence)
Manner-of | adv,adj | 1,854 | virtualmente,virtual | (virtually,virtual)
Antonym-of | adj,adj | 684 | direito,torto | (straight,crooked)
Property-of | adj,n | 10,652 | daltónico,daltonismo | (daltonic,daltonism)
Property-of | adj,v | 27,902 | musculoso,ter músculo | (beefy,to have muscle)
Relation acquisition
CARTÃO: manual evaluation
Results of manual evaluation
100 instances per relation type/resource (300/type)
2 judges for each instance
  - wrong instance (0)
  - wrong relation (1)
  - correct (2)

Totals per score (out of 300), with inter-annotator agreement (IAA) and κ:

Relation | Judge | 0 | 1 | 2 | IAA | κ
n synonym-of n | J1 | 2 (.01) | 0 | 298 (.99) | 0.99 | 0.66
n synonym-of n | J2 | 3 (.01) | 1 (≈0) | 296 (.99) | |
v synonym-of v | J1 | 6 (.02) | 0 | 294 (.98) | 0.98 | 0.68
v synonym-of v | J2 | 7 (.02) | 3 (.01) | 290 (.97) | |
n hypernym-of n | J1 | 11 (.04) | 19 (.06) | 270 (.90) | 0.93 | 0.64
n hypernym-of n | J2 | 16 (.05) | 21 (.07) | 263 (.88) | |
v causation-of n | J1 | 12 (.04) | 14 (.05) | 274 (.91) | 0.93 | 0.60
v causation-of n | J2 | 15 (.05) | 18 (.06) | 267 (.89) | |
adj property-of v | J1 | 67 (.22) | 21 (.07) | 212 (.71) | 0.81 | 0.56
adj property-of v | J2 | 39 (.13) | 30 (.10) | 231 (.77) | |
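To make the two agreement columns concrete, the sketch below computes both from per-instance judgements. It assumes that IAA is the plain proportion of instances on which the judges agree and that κ is Cohen's kappa; the slides do not spell out either definition. The label lists are invented toy data, not the judgements behind the table.

```python
from collections import Counter

def agreement_and_kappa(labels_a, labels_b):
    """Observed agreement and Cohen's kappa for two judges' label lists."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both judges pick the same label
    # if each followed only their own label distribution.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa

# Hypothetical judgements for ten instances, scored 0, 1 or 2 as on the slide.
judge_1 = [2, 2, 2, 2, 2, 2, 2, 2, 0, 1]
judge_2 = [2, 2, 2, 2, 2, 2, 2, 2, 0, 2]

iaa, kappa = agreement_and_kappa(judge_1, judge_2)
print(round(iaa, 2), round(kappa, 2))
```

When almost every instance is scored 2, chance agreement is already high, which is one reason a very high raw agreement can sit next to a much lower κ, as in the synonymy rows.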
Relation acquisition
Discussion
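Lexical graph: relations hold between words, not word senses, so chaining them can yield wrong inferences
massa synonym-of povo ∧ massa hypernym-of tortellini (massa: "crowd" or "pasta"; povo: "people")
  → povo hypernym-of tortellini
dinheiro synonym-of cacau ∧ fruto hypernym-of cacau (cacau: "cocoa" or, informally, "money"; fruto: "fruit")
  → fruto hypernym-of dinheiro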
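The two faulty inferences above are exactly what a naive propagation rule produces on a word-level graph. The sketch below reproduces them; `propagate_hypernymy` is a hypothetical helper written for this illustration, not a step of the Onto.PT pipeline.

```python
def propagate_hypernymy(triples):
    """Copy each hypernymy link to the synonyms of both of its arguments."""
    synonyms = {}
    for a, rel, b in triples:
        if rel == "synonym-of":
            synonyms.setdefault(a, set()).add(b)
            synonyms.setdefault(b, set()).add(a)
    inferred = set()
    for a, rel, b in triples:
        if rel == "hypernym-of":
            for s in synonyms.get(a, ()):
                inferred.add((s, rel, b))   # synonym of the hypernym
            for s in synonyms.get(b, ()):
                inferred.add((a, rel, s))   # synonym of the hyponym
    return inferred - set(triples)

triples = [
    ("massa", "synonym-of", "povo"),
    ("massa", "hypernym-of", "tortellini"),
    ("dinheiro", "synonym-of", "cacau"),
    ("fruto", "hypernym-of", "cacau"),
]
for triple in sorted(propagate_hypernymy(triples)):
    print(triple)
```

Both inferred triples are wrong because each source pair involves a different sense of the shared word, which motivates moving from words to synsets.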
Synset discovery
Synonymy network
Established by synonymy pairs
Propagate synonymy?
  - Large network (≈ 40,000 nodes for nouns, ≈ 15,000 for adjectives)
  - Large connected subgraphs (≈ 26,000 nodes for nouns, ≈ 11,000 for adjectives)
  - Problems, such as:
      * queda synonym-of ruína ∧ queda synonym-of habilidade (queda: "fall" or "knack"; ruína: "ruin"; habilidade: "skill")
      * → ruína synonym-of habilidade
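Propagating synonymy amounts to treating every connected component of the network as one synset. A minimal sketch of that reading, using the slide's pairs plus one pair from the CARTÃO examples; `connected_components` is a hypothetical helper, and nothing here is claimed to be the authors' code.

```python
from collections import defaultdict

def connected_components(pairs):
    """Group words that are linked by any chain of synonymy pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

synpairs = [
    ("queda", "ruína"),
    ("queda", "habilidade"),
    ("alegria", "satisfação"),
]
for component in connected_components(synpairs):
    print(sorted(component))
```

One ambiguous word is enough to merge unrelated senses into a single component, which is how components of tens of thousands of nodes arise.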
Synset discovery
Clustering for synsets
Synonymy networks extracted from dictionaries tend to have a
clustered structure [Gfeller et al., 2005]
Clusters may be seen as synsets
Words with more than one sense → overlapping clusters!
Synset discovery
Clustering algorithm
Main idea: each word and its neighbourhood is a potential cluster
1. Network as a matrix M

        v1  v2  v3  v4  v5  v6  v7  v8  v9  v10
  v1     1   1   0   0   0   0   0   0   0   0
  v2     1   1   1   0   0   0   0   0   0   0
  v3     0   1   1   1   0   0   0   0   0   0
  v4     0   0   1   1   1   0   0   0   0   0
  v5     0   0   0   1   1   1   1   0   0   0
  v6     0   0   0   0   1   1   1   1   1   1
  v7     0   0   0   0   1   1   1   0   0   0
  v8     0   0   0   0   0   1   0   1   0   0
  v9     0   0   0   0   0   1   0   0   1   0
  v10    0   0   0   0   0   1   0   0   0   1

2. Similarity matrix, θ = 0.5

\[
\mathrm{sim}(a, b) = \cos(\vec{v}_a, \vec{v}_b) = \frac{\vec{v}_a \cdot \vec{v}_b}{|\vec{v}_a|\,|\vec{v}_b|} = \frac{\sum_{i=0}^{|V|} v_{ai} \times v_{bi}}{\sqrt{\sum_{i=0}^{|V|} v_{ai}^2} \times \sqrt{\sum_{i=0}^{|V|} v_{bi}^2}} \qquad (1)
\]

        v1   v2   v3   v4   v5   v6   v7   v8   v9   v10
  v1    1.0  0.6  0.4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
  v2    0.6  1.0  0.7  0.3  0.0  0.0  0.0  0.0  0.0  0.0
  v3    0.4  0.7  1.0  0.7  0.3  0.0  0.0  0.0  0.0  0.0
  v4    0.0  0.3  0.7  1.0  0.6  0.2  0.3  0.0  0.0  0.0
  v5    0.0  0.0  0.3  0.6  1.0  0.6  0.9  0.4  0.4  0.4
  v6    0.0  0.0  0.0  0.2  0.6  1.0  0.7  0.6  0.6  0.6
  v7    0.0  0.0  0.0  0.3  0.9  0.7  1.0  0.4  0.4  0.4
  v8    0.0  0.0  0.0  0.0  0.4  0.6  0.4  1.0  0.5  0.5
  v9    0.0  0.0  0.0  0.0  0.4  0.6  0.4  0.5  1.0  0.5
  v10   0.0  0.0  0.0  0.0  0.4  0.6  0.4  0.5  0.5  1.0

3. Clusters
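The first two steps can be reproduced directly from the matrix M on the slide. The third step was shown only as a figure, so the rule used below (each row yields the cluster of words whose similarity exceeds θ) is an assumption based on the slide's main idea, not necessarily the exact procedure used for Onto.PT. Function names are hypothetical and indices are 0-based (index 0 is v1).

```python
from math import sqrt

# Adjacency matrix M from the slide (row i is the vector of word v(i+1)).
M = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
]

def cosine(u, v):
    """Cosine of the angle between two vectors, as in equation (1)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def similarity_matrix(m):
    return [[cosine(a, b) for b in m] for a in m]

def clusters(sim, theta):
    """One candidate cluster per row: all words with similarity above theta."""
    found = []
    for row in sim:
        cluster = frozenset(j for j, s in enumerate(row) if s > theta)
        if cluster not in found:
            found.append(cluster)
    return found

SIM = similarity_matrix(M)
CLUSTERS = clusters(SIM, theta=0.5)
for cluster in CLUSTERS:
    print(sorted(f"v{j + 1}" for j in cluster))
```

Rounded to one decimal, most computed values coincide with the slide's similarity matrix (for instance sim(v5, v7) ≈ 0.87, shown as 0.9), although sim(v1, v2) comes out as ≈ 0.82 rather than the 0.6 shown. The resulting clusters overlap, which is what allows a word with several senses to sit in several synsets.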
Synset discovery
Take advantage of handcrafted thesauri
What about TeP?
TeP is...
I
I
I
Structured on synsets
Created manually
Free
TeP is more complementary than overlapping with PAPEL/CARTÃO
[Santos et al., 2010, Teixeira et al., 2010,
Gonçalo Oliveira et al., 2011]
Take advantage of TeP, instead of using it merely as a reference for
comparison!
1
2
3
Integrate synpairs of CARTÃO in TeP synsets
Discover clusters in remaining synpairs
Add new clusters as synsets
Gonçalo Oliveira & Gomes (CISUC)
Onto.PT
June 8, 2012
21 / 46
Synset discovery
Take advantage of handcrafted thesauri
Assigning synpairs to synsets
Starting point:
- Thesaurus T, with synsets $S = \{v_1, v_2, \ldots, v_n\}$
- Synonymy network N, with synpairs $p = (v_x, v_y)$
Goal:
Synpair                      →  Synset
(alimentação, mantença)      →  {sustento, alimento, mantimento, alimentação, mantença}
(escravizar, servilizar)     →  {oprimir, tiranizar, escravizar, esmagar, servilizar}
(permanente, inextinguível)  →  {durador, duradoiro, duradouro, durável, permanente, perdurável, inextinguível}
Synset discovery
Take advantage of handcrafted thesauri
Assigning $p = (v_x, v_y)$ to a synset
1. Select the candidate set C of all synsets containing one of the elements of p: $\forall (C_j \in C) : v_x \in C_j \lor v_y \in C_j$.
2. Represent the synpair and the candidate synsets as adjacency vectors in N.
3. Compute the similarity between $\vec{p}$ and each synset $C_k \in C$.
4. Assign p to the most similar candidate, provided its similarity reaches the threshold $\sigma$:
   $p \to C_{best} : sim(\vec{p}, \vec{C}_{best}) \ge \sigma \land sim(\vec{p}, \vec{C}_{best}) = \max(sim(\vec{p}, \vec{C}_k))$.
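The four assignment steps can be sketched as follows. This is a hedged illustration, not the authors' code: the way a synset is turned into an adjacency vector (the union of its members' neighbourhoods), the use of cosine as sim, and the names build_network, adjacency_vector and assign are all assumptions of this example.

```python
from math import sqrt

def build_network(synpairs):
    """Synonymy network N as a dict: word -> set of neighbouring words."""
    network = {}
    for a, b in synpairs:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network

def adjacency_vector(words, network, vocabulary):
    """Assumed representation: 1 for every word in, or adjacent to, `words`."""
    reach = set(words)
    for w in words:
        reach |= network.get(w, set())
    return [1 if v in reach else 0 for v in vocabulary]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def assign(synpair, thesaurus, network, sigma=0.15):
    """Return the best candidate synset for `synpair`, or None below sigma."""
    # Step 1: candidates are the synsets containing one element of the pair.
    candidates = [s for s in thesaurus if synpair[0] in s or synpair[1] in s]
    if not candidates:
        return None
    # Step 2: synpair and candidates as adjacency vectors over one vocabulary.
    vocabulary = sorted(set(network) | {w for s in thesaurus for w in s})
    p_vec = adjacency_vector(synpair, network, vocabulary)
    # Step 3: similarity between the synpair and each candidate.
    scored = [(cosine(p_vec, adjacency_vector(s, network, vocabulary)), s)
              for s in candidates]
    # Step 4: keep the most similar candidate if it reaches the threshold.
    best_score, best = max(scored, key=lambda item: item[0])
    return best if best_score >= sigma else None
```

Pairs for which assign returns None are the ones left over for the clustering step.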
Synset discovery
Take advantage of handcrafted thesauri
Assignment settings
250 synpairs + TeP, three gold references:
- Annotator 1 (A1)
- Annotator 2 (A2)
- Intersection (∩)
IAA(A1, A2) = 68%, κ(A1, A2) = 0.40
Best settings, $\cos(\vec{p}, \vec{C}_k) \ge 0.15$
Ref.  Setting  Precision  Recall  RRecall  F0.5  RF0.5
A1    All      44%        100%    100%     61%   50%
A1    Random   60%        31%     65%      62%   61%
A1    Best     74%        34%     71%      73%   74%
A2    All      60%        100%    100%     75%   65%
A2    Random   68%        34%     80%      73%   70%
A2    Best     82%        36%     85%      83%   82%
∩     All      34%        100%    100%     51%   39%
∩     Random   46%        41%     64%      53%   48%
∩     Best     64%        48%     74%      69%   66%
Synset discovery
Take advantage of handcrafted thesauri
Clustering evaluation
1. Select random pairs of words from discovered synsets
2. Classify each pair as correct or incorrect
Using the whole synonymy network
- 440 noun pairs
- Two human judges (IAA = 83%, κ = 0.43)
- Correct: 75%
Using only clusters of the network after assignment
- 330 pairs (110 nouns, 110 verbs, 110 adjectives)
- Two human judges
  - IAA: 96%, 85%, 95%
  - κ: 0.73, 0.39, 0.37
- Correct: 85%, 91%, 90%
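The evaluation procedure (sample word pairs from the discovered synsets, then have judges label them) can be sketched as below. The function names, the fixed seed and the boolean encoding of judgements are assumptions of this example; the slides do not describe how the sampling was implemented.

```python
import random
from itertools import combinations

def sample_pairs(synsets, n_pairs, seed=0):
    """Step 1: draw up to n_pairs distinct word pairs that share a synset."""
    all_pairs = sorted({tuple(sorted(p))
                        for s in synsets
                        for p in combinations(s, 2)})
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(all_pairs, min(n_pairs, len(all_pairs)))

def proportion_correct(judgements):
    """Step 2: share of pairs a judge marked as correct (True)."""
    return sum(1 for ok in judgements.values() if ok) / len(judgements)
```

Enumerating every pair is affordable only for a sketch; with synsets of more than a hundred words a real run would sample synsets first and pairs second.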
Synset discovery
Take advantage of handcrafted thesauri
TRIP: a large thesaurus for Portuguese
Words
Thesaurus  POS        Total   Ambiguous  Avg(senses)  Max(senses)
TeP 2.0    Noun       17,149  5,802      1.71         20
TeP 2.0    Verb       8,280   4,680      2.69         50
TeP 2.0    Adjective  14,568  3,730      1.46         19
TeP 2.0    Adverb     1,095   227        1.30         11
TRIP       Noun       45,457  15,392     1.80         22
TRIP       Verb       11,924  6,607      2.87         52
TRIP       Adjective  22,316  7,782      1.83         22
TRIP       Adverb     2,488   694        1.42         12
Synsets
Thesaurus  POS        Total   Avg(size)  size = 2  size > 25  max(size)
TeP 2.0    Noun       8,254   3.56       3,083     0          21
TeP 2.0    Verb       3,899   5.71       907       48         53
TeP 2.0    Adjective  6,062   3.5        3,032     18         43
TeP 2.0    Adverb     497     2.87       258       0          9
TRIP       Noun       16,936  4.84       5,986     226        131
TRIP       Verb       4,424   7.75       873       193        132
TRIP       Adjective  7,948   5.14       3,127     161        117
TRIP       Adverb     813     4.34       270       1          27
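As a reading aid for the table columns, the sketch below computes the same kind of statistics from a list of synsets. The definitions are assumptions of this example, since the slides do not spell them out: a word's senses are taken to be the synsets that contain it, and a word is ambiguous when it has more than one sense.

```python
from collections import Counter

def thesaurus_stats(synsets):
    """Word-level and synset-level statistics for a list of synsets."""
    senses = Counter(w for s in synsets for w in set(s))  # word -> no. of synsets
    sizes = [len(set(s)) for s in synsets]
    return {
        "words": len(senses),
        "ambiguous": sum(1 for n in senses.values() if n > 1),
        "avg_senses": sum(senses.values()) / len(senses),
        "max_senses": max(senses.values()),
        "synsets": len(synsets),
        "avg_size": sum(sizes) / len(sizes),
        "size_2": sum(1 for n in sizes if n == 2),
        "max_size": max(sizes),
    }
```

Under these definitions words × Avg(senses) equals synsets × Avg(size); the TeP noun row agrees with that to within rounding (17,149 × 1.71 ≈ 29,325 and 8,254 × 3.56 ≈ 29,384).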
Synset discovery
Examples of large (noun) synsets
imbecile/stupid person
patamaz, boca-aberta, imbecil, lucas, malhadeiro, orate, zé-cuecas, lerdaço, tantã,
boleima, babão, jato, zambana, badó, ânsar, bolônio, chapetão, parvalhão, haule,
papa-moscas, lerdo, patau, sànona, perturbado, possidónio, babaquara, tolo, galafura,
babuíno, zângano, inepto, badana, cabaça, andor, pax-vóbis, idiota, pascoal-bailão,
sandeu, asneirão, zé, capadócio, calino, doudivanas, pasguate, parreco, babanca, palerma,
molusco, parrana, moco, ansarinho, bajoujo, burro, truão, estulto, pexote, maninelo,
lérias, banana, banazola, patego, bobo, estúpido, asno, sonso, ignorante, troixa, otário,
simplório, pancrácio, patola, songo-mongo, toleirão, totó, burgesso, morcão, microcéfalo,
patinho, bacoco, babancas, inhenha, pàteta, néscio, matias, parvoinho, mané, anastácio,
manembro, tatamba, bobalhão, bertoldo, patavina, tonto, apedeuto, pachocho, ingênuo,
bocoió, simplacheirão, jerico, zote, sebastião, lorpa, atónito, patacão, pato, parvoeirão,
ingénuo, papalvo, pateta, tanso, cretino, bolónio, basbaque, mentecapto, pachola,
apaixonado, pasmão, pascácio, tarola, trouxa, parvo, jumento, geta, arara, gato-bravo,
pedaço-de-asno, parvajola, pacóvio, laparoto, crendeiro, loura
Synset discovery
Examples of large (noun) synsets
alcoholic intoxication
torcida, embriagamento, veneno, mona, zurca, trapisonda, lontra, rosca, perua, raposada,
rola, tertúlia, carraspana, peleira, pizorga, cabra, chuva, tachada, caroça, ardina, girgolina,
égua, carrega, zerenamora, rasca, touca, venena, gardunho, ema, porre, ebriez, carapanta,
chiba, ebriedade, bico, inebriamento, bebedeira, carrapata, penca, taçada, canja, garça,
ganso, tortelia, turca, cabrita, mela, resina, senisga, bebedice, bezana, vinhaça,
zangurrina, bêbeda, bibra, borrachice, zuca, coca, torta, doninha, piela, graxa, trabuzana,
água, cegonha, gateira, bicancra, samatra, galinhola, gata, pala, ganza, pifão, bode,
cobra, prego, zola, nêspera, narda, parrascana, vinho, gardinhola, tropecina, embriaguez,
cardina, tiorga, temulência, narceja, pisorga, grossura, dosa, trovoada, carneira, perunca,
bruega, canjica, raposa, garrana, raposeira, cartola, cachorra, entusiasmo, carpanta,
piteira, borracheira, cabeleira, carrocha, pifo, camoeca, marta, cachaceira, zangurriana,
verniz, carrada
Synset discovery
Examples of large (noun) synsets
money
jimbo, pastel, guines, baguines, parrolo, marcaureles, ouro, grana, arame, massaroca,
tutu, metal, bagalho, bilhestres, milho, jan-da-cruz, china, cum-quibus, cobre, mussuruco,
pilim, pasta, bagaço, zerzulho, painço, chelpa, finanças, calique, tostão, pecuniária,
bagalhoça, boro, dieiro, pila, gaita, pataco, verba, cacau, matambira, gimbo, cunques,
caroço, fanfa, maco, pecúnia, estilha, jibungo, roço, massa, dinheiro, maquia, bago, teca,
pecunia, quantia, espécie, guita, patacaria, carcanhol, pingo
Ontologisation of semantic relations
Moving from term-based to synset-based relations
Goal: move from a R b ∈ G to A R B, A ∈ T, B ∈ T
- porta part-of carro → {porta, entrada, portão} part-of {carro, automóvel}
Available information:
- Thesaurus T, with synsets
- Relational triples between terms, in a lexical graph G
Output: semantic graph, wordnet W
- Same relations as in G
- But between synsets
Ontologisation of semantic relations
Ontologising algorithms
Related Proportion (RP)
Number of Triples (NT)
Average Cosine (AC)
Related Proportion + Average Cosine (RP+AC)
Number of Triples + Average Cosine (NT+AC)
Minimum distance (MD)
PageRank (PR)
Ontologisation of semantic relations
Evaluating the ontologising algorithms
Gold reference
- Thesaurus: TeP + OpenThesaurus.PT
- Term-based triples: 452 (hypernymy, part-of, purpose-of), from PAPEL
- All possible attachments
tb-triple = (documento hypernym-of recibo)
  A1: documento, declaração (document, declaration)
  A2: escritura, documento (deed, document)
  B1: recibo, comprovante, nota, quitação, senha (receipt, confirming, note, quittance)
tb-triple = (planta part-of floresta)
  A1: relação, quadro, planta, mapa (relation, frame, plant, map)
  A2: vegetal, planta (vegetable, plant)
  A3: traçado, desenho, projeto, planta, plano (design, project, plant, plan)
  B1: bosque, floresta, mata, brenha, selva (wood, forest, jungle)
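"All possible attachments" for a term-based triple can be read as every combination of a synset containing the first term with a synset containing the second. The sketch below enumerates them for the two triples on this slide; the function name and the tuple representation of synsets are assumptions of this example.

```python
from itertools import product

def candidate_attachments(tb_triple, thesaurus):
    """All synset-based triples (A, R, B) with a in A and b in B."""
    a, relation, b = tb_triple
    a_synsets = [s for s in thesaurus if a in s]
    b_synsets = [s for s in thesaurus if b in s]
    return [(sa, relation, sb) for sa, sb in product(a_synsets, b_synsets)]

# Synsets copied from the slide.
THESAURUS = [
    ("documento", "declaração"),
    ("escritura", "documento"),
    ("recibo", "comprovante", "nota", "quitação", "senha"),
    ("relação", "quadro", "planta", "mapa"),
    ("vegetal", "planta"),
    ("traçado", "desenho", "projeto", "planta", "plano"),
    ("bosque", "floresta", "mata", "brenha", "selva"),
]

doc_candidates = candidate_attachments(("documento", "hypernym-of", "recibo"), THESAURUS)
plant_candidates = candidate_attachments(("planta", "part-of", "floresta"), THESAURUS)
```

Here doc_candidates holds two attachments (A1-B1, A2-B1) and plant_candidates three (A1-B1, A2-B1, A3-B1); the ontologising algorithms have to pick among such candidates.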
Ontologisation of semantic relations
Evaluating the ontologising algorithms
Compared to attachments using...
- Ontologising algorithms, random candidate baseline
- CARTÃO
Best algorithms:
Relation                      Algorithm  P%    R%    F1%   F0.5%  Fr%
Hypernym-of (210 tb-triples)  Random     42.1  10.7  17.1  26.5   42.1
                              RP         53.3  12.4  20.0  32.1   49.9
                              AC         60.6  15.8  25.1  38.7   60.3
                              RP+AC      55.8  14.8  23.4  35.9   55.8
Part-of (175 tb-triples)      Random     47.4  12.6  19.9  30.6   47.4
                              RP         56.9  10.6  17.9  30.4   47.0
                              AC         58.7  14.9  23.8  37.0   58.7
                              RP+AC      63.3  16.3  25.9  40.1   63.3
Purpose-of (67 tb-triples)    Random     44.8  9.0   15.0  25.0   44.8
                              RP         51.5  5.1   9.3   18.3   32.6
                              AC         63.2  13.0  21.5  35.6   63.2
                              RP+AC      63.4  13.6  22.3  36.5   63.4
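The F1 and F0.5 columns follow the usual F-measure, $F_\beta = (1 + \beta^2) P R / (\beta^2 P + R)$, where β < 1 weights precision more than recall. The short sketch below (the function name is an assumption of this example) reproduces the Random row for hypernym-of from its precision and recall; how the Fr column is computed is not stated on the slide, so it is not modelled here.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Random baseline, hypernym-of: P = 42.1%, R = 10.7%
f1 = f_beta(0.421, 0.107, beta=1.0)    # about 0.171
f05 = f_beta(0.421, 0.107, beta=0.5)   # about 0.265
```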
Ontologisation of semantic relations
Evaluating the ontologising algorithms
RP+AC, a R b
1. To ontologise a/b (θ = 0.5), fix b/a
2. [Example] for each $A_i \in A$:
   - $A_1 = (a, c, d, e)$, $p_{a1} = 3/4$
   - $A_2 = (a, f, g)$, $p_{a2} = 2/3$
   - $A_3 = (a, h, i, j)$, $p_{a3} = 1/4$
3. $p_{a1} = \max(p_{ai}) \land p_{a1} \ge \theta$, $a \to A_1$
4. If no suitable $A_i$ or $B_j$, represent candidate synsets as matrices in N:
   - $\vec{A}_i = \{\vec{a}_{i0}, \ldots, \vec{a}_{in}\}$, $n = |A_i|$
   - $\vec{B}_j = \{\vec{b}_{j0}, \ldots, \vec{b}_{jm}\}$, $m = |B_j|$
5. Compute the average similarity of the elements of each pair of synsets: $\cos(\vec{A}_i, \vec{B}_j)$
6. Select the most similar pair $(A_x, B_y)$: $\cos(A_x, B_y) = \max(\cos(A_i \in A, B_j \in B))$
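A hedged Python sketch of the two stages follows. It assumes that the related proportion of a candidate synset is the share of its members that hold relation R with the fixed term in G, which is consistent with the 3/4, 2/3 and 1/4 of the slide's example but is not stated explicitly there. The function names and the toy triples are likewise assumptions.

```python
from math import sqrt

def related_proportion(synset, fixed_term, relation, triples):
    """Share of the synset's members related to the fixed term in G."""
    related = sum(1 for w in synset if (w, relation, fixed_term) in triples)
    return related / len(synset)

def attach_by_rp(candidates, fixed_term, relation, triples, theta=0.5):
    """RP stage: best candidate if its proportion reaches theta, else None."""
    scored = [(related_proportion(s, fixed_term, relation, triples), s)
              for s in candidates]
    best_score, best = max(scored, key=lambda item: item[0])
    return best if best_score >= theta else None

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def average_cosine(synset_a, synset_b, vectors):
    """AC stage: mean cosine over all member pairs of the two synsets."""
    sims = [cosine(vectors[x], vectors[y]) for x in synset_a for y in synset_b]
    return sum(sims) / len(sims)

# Toy graph chosen so that the proportions match the slide's example.
TRIPLES = {("a", "R", "b"), ("c", "R", "b"), ("d", "R", "b"), ("f", "R", "b")}
A1, A2, A3 = ("a", "c", "d", "e"), ("a", "f", "g"), ("a", "h", "i", "j")
chosen = attach_by_rp([A1, A2, A3], "b", "R", TRIPLES)  # A1, since 3/4 >= 0.5
```

When attach_by_rp returns None, the fallback would score every candidate pair with average_cosine and keep the maximum, as in steps 4 to 6.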
Approach summary
From dictionaries to a wordnet in three steps
1. gado s.m. conjunto de animais criados para diversos fins; rebanho
   (cattle, masc. noun: set of animals raised for various purposes; herd)
   - tb-triple1 = rebanho SINONIMO DE gado (herd synonym-of cattle)
   - tb-triple2 = animal MEMBRO DE gado (animal member-of cattle)
2. synset1 = (manada, rebanho, mancheia, boiada)
   - + tb-triple1 = (manada, rebanho, mancheia, boiada, gado)
3. synset2 = (bicho, animal, alimal, béstia, minante)
   - sb-triple1 = synset2 MEMBRO DE synset1
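The walk-through for gado can be replayed with a few lines of Python. This is a deliberately simplified sketch: it attaches each term to the first synset that contains it, whereas the approach described in the earlier slides uses similarity-based assignment and ontologising algorithms. Function names and the index-based synset identifiers are assumptions of this example.

```python
def integrate_synonym(synsets, triple):
    """Step 2 (simplified): add both terms to the first synset holding one of them."""
    a, _, b = triple
    for s in synsets:
        if a in s or b in s:
            s.update((a, b))
            return s
    synsets.append({a, b})  # no host synset: the pair becomes a new synset
    return synsets[-1]

def ontologise(triple, synsets):
    """Step 3 (simplified): replace each term by the index of its synset."""
    a, relation, b = triple
    syn_a = next(i for i, s in enumerate(synsets) if a in s)
    syn_b = next(i for i, s in enumerate(synsets) if b in s)
    return (syn_a, relation, syn_b)

# Step 1 output: term-based triples extracted from the definition of "gado".
tb_triple1 = ("rebanho", "SINONIMO DE", "gado")
tb_triple2 = ("animal", "MEMBRO DE", "gado")

synsets = [
    {"manada", "rebanho", "mancheia", "boiada"},          # synset1 (index 0)
    {"bicho", "animal", "alimal", "béstia", "minante"},   # synset2 (index 1)
]
integrate_synonym(synsets, tb_triple1)
sb_triple1 = ontologise(tb_triple2, synsets)  # synset2 MEMBRO DE synset1
```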
Presenting Onto.PT v.0.3.1
Synsets
About 150,000 lexical items
Organised in about 110,000 synsets
Synsets are ordered according to the AC/DC [Santos and Bick, 2000] frequency of their words
- Words inside synsets are ordered according to their AC/DC frequency
Synsets
POS         size > 1  size = 1  Total
Nouns       19,211    45,654    64,865
Verbs       3,998     21,344    25,342
Adjectives  7,272     10,680    17,952
Adverbs     710       1,283     1,993
Presenting Onto.PT v.0.3.1
Relations (excluding inverse)
About 170,000 relations
Same types as in PAPEL/CARTÃO
Relation        Predicate                      Instances
Hypernym        n hiperonimoDe n               83,552
Part            n parteDe n                    3,672
                n parteDeAlgoComProp adj       4,911
                adj propDeAlgoParteDe n        91
Member          n membroDe n                   5,847
                n membroDeAlgoComProp adj      106
                adj propDeAlgoMembroDe n       909
Contains        n contidoEm n                  355
                n contidoEmAlgoComProp adj     264
Material        n materialDe n                 835
Causation       n causadorDe n                 1,347
                n causadorDeAlgoComProp adj    26
                adj propDeAlgoQueCausa n       619
                n causadorDaAccao v            56
                v accaoQueCausa n              8,052
Place           n localOrigemDe n              1,293
Antonym         adj antonimoAdjDe adj          538
Producer        n produtorDe n                 1,718
                n produtorDeAlgoComProp adj    88
                adj propDeAlgoProdutorDe n     529
Purpose         n fazSeCom n                   6,551
                n fazSeComAlgoComProp adj      79
                v finalidadeDe n               7,271
                v finalidadeDeAlgoComProp adj  322
Quality         n temQualidade n               934
                n devidoAQualidade adj         1,059
State           n temEstado n                  327
                n devidoAEstado adj            197
Manner          adv maneiraPorMeioDe n         1,833
                adv maneiraComProp adj         1,561
Manner without  adv maneiraSem n               216
                adv maneiraSemAccao v          14
Property        adj dizSeSobre n               9,145
                adj dizSeDoQue v               25,014
Presenting Onto.PT v.0.3.1
Onto.PT as a Semantic Web model
Adaptation of the W3C WordNet RDF/OWL Basic [van Assem et al., 2006]
Presenting Onto.PT v.0.3.1
OntoBusca: Onto.PT’s interface
Presenting Onto.PT v.0.3.1
Usage example
Onto.PT for query expansion
1. Disambiguate the head of the query
   - WSD algorithm, e.g. Personalized PageRank [Agirre and Soroa, 2009]
   - Use the words in the query as context
   - Select a suitable synset for the head
2. Use the words of the synset as search alternatives
Approach followed in the Págico joint evaluation [Rodrigues et al., 2012]
- Runs with WSD performed better than without
Examples:
- Doces brasileiros que têm origem nos doces portugueses (Brazilian sweets that originate from Portuguese sweets)
  - doce OR confeito OR guloseima OR gulodice ...
- Doenças letais comuns em países lusófonos transmitidas por mosquitos (Lethal diseases common in Portuguese-speaking countries, transmitted by mosquitoes)
  - doença OR mal-estar OR enfermidade OR mal OR patologia OR distúrbio OR padecimento ...
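The two query-expansion steps can be sketched as follows. The score argument is a hypothetical callback standing in for a real WSD algorithm such as Personalized PageRank, which is not implemented here; the function names are assumptions of this example.

```python
def select_synset(head, context, synsets, score):
    """Step 1: among synsets containing the head, keep the best-scoring one.

    `score(synset, context)` is a hypothetical stand-in for a WSD algorithm.
    """
    candidates = [s for s in synsets if head in s]
    if not candidates:
        return None
    return max(candidates, key=lambda s: score(s, context))

def expand_head(head, synset):
    """Step 2: OR-join the head with the other words of the chosen synset."""
    alternatives = [head] + [w for w in synset if w != head]
    return " OR ".join(alternatives)

# The words of the first example on the slide (the slide elides the rest).
expanded = expand_head("doce", ["doce", "confeito", "guloseima", "gulodice"])
# expanded == "doce OR confeito OR guloseima OR gulodice"
```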
Concluding remarks
Main contributions
CARTÃO, the largest lexical graph for Portuguese
- Larger than PAPEL
TRIP, the largest Portuguese thesaurus
- Larger than TeP
- Alternative to OpenThesaurus.PT in OpenOffice
Onto.PT, a new public lexical ontology
- Created automatically, higher growth potential
- An addition or alternative to existing Portuguese LKBs
A flexible approach that enables the integration of several resources
- May be adapted to the construction/enrichment of wordnets in other languages
Concluding remarks
Future
Onto.PT is in constant development!
- New version coming soon...
- Improvement of each construction step
- Augmentation by exploiting other resources (e.g. Wikipedia)
- Associate definitions/example sentences with synsets [Henrich et al., 2011]
More evaluation:
- Quality, e.g. manual evaluation of parts of the resource
- Coverage, e.g. mapping with the Global WordNet base concepts [2]
- Utility, e.g. utilisation in (more) NLP tasks
Availability
- Updates and other resources at http://ontopt.dei.uc.pt
[2] http://www.globalwordnet.org/gwa/ewn to bc/corebcs.html
References
References I
[Agirre and Soroa, 2009] Agirre, E. and Soroa, A. (2009).
Personalizing PageRank for word sense disambiguation.
In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL’09,
pages 33–41, Stroudsburg, PA, USA. ACL Press.
[Amsler, 1980] Amsler, R. A. (1980).
The structure of the Merriam-Webster Pocket dictionary.
PhD thesis, The University of Texas at Austin.
[Calzolari et al., 1973] Calzolari, N., Pecchia, L., and Zampolli, A. (1973).
Working on the Italian machine dictionary: a semantic approach.
In Proceedings of 5th Conference on Computational Linguistics, COLING’73, pages 49–52, Morristown, NJ, USA. ACL Press.
[Dias da Silva et al., 2002] Dias da Silva, B. C., de Oliveira, M. F., and de Moraes, H. R. (2002).
Groundwork for the Development of the Brazilian Portuguese Wordnet.
In Advances in Natural Language Processing (PorTAL 2002), LNAI, pages 189–196, Berlin/Heidelberg. Springer.
[Fellbaum, 1998] Fellbaum, C., editor (1998).
WordNet: An Electronic Lexical Database (Language, Speech, and Communication).
The MIT Press.
[Gfeller et al., 2005] Gfeller, D., Chappelier, J.-C., and Rios, P. D. L. (2005).
Synonym Dictionary Improvement through Markov Clustering and Clustering Stability.
In Proceedings of International Symposium on Applied Stochastic Models and Data Analysis, ASMDA 2005, pages 106–113.
[Gonçalo Oliveira et al., 2011] Gonçalo Oliveira, H., Antón Pérez, L., Costa, H., and Gomes, P. (2011).
Uma rede léxico-semântica de grandes dimensões para o português, extraída a partir de dicionários electrónicos.
Linguamática, 3(2):23–38.
References
References II
[Gonçalo Oliveira et al., 2010] Gonçalo Oliveira, H., Santos, D., and Gomes, P. (2010).
Extracção de relações semânticas entre palavras a partir de um dicionário: o PAPEL e sua avaliação.
Linguamática, 2(1):77–93.
[Henrich et al., 2011] Henrich, V., Hinrichs, E., and Vodolazova, T. (2011).
Semi-automatic extension of GermaNet with sense definitions from Wiktionary.
In Proceedings of 5th Language & Technology Conference, LTC 2011, pages 126–130, Poznan, Poland.
[Hirst, 2004] Hirst, G. (2004).
Ontology and the lexicon.
In Staab, S. and Studer, R., editors, Handbook on Ontologies, International Handbooks on Information Systems, pages 209–230. Springer.
[Marrafa, 2002] Marrafa, P. (2002).
Portuguese Wordnet: general architecture and internal semantic relations.
DELTA, 18:131–146.
[Maziero et al., 2008] Maziero, E. G., Pardo, T. A. S., Felippo, A. D., and Dias-da-Silva, B. C. (2008).
A Base de Dados Lexical e a Interface Web do TeP 2.0 - Thesaurus Eletrônico para o Português do Brasil.
In VI Workshop em Tecnologia da Informação e da Linguagem Humana (TIL), pages 390–392.
[Michiels et al., 1980] Michiels, A., Mullenders, J., and Noël, J. (1980).
Exploiting a large data base by Longman.
In Proceedings of the 8th conference on Computational Linguistics, COLING’80, pages 374–382, Morristown, NJ, USA. ACL
Press.
[Pianta et al., 2002] Pianta, E., Bentivogli, L., and Girardi, C. (2002).
MultiWordNet: developing an aligned multilingual database.
In 1st International Conference on Global WordNet.
References
References III
[Rodrigues et al., 2012] Rodrigues, R., Gonçalo Oliveira, H., and Gomes, P. (2012).
Uma abordagem ao Págico baseada no processamento e análise de sintagmas dos tópicos.
Linguamática, 4(1):31–39.
[Santos et al., 2010] Santos, D., Barreiro, A., Freitas, C., Gonçalo Oliveira, H., Medeiros, J. C., Costa, L., Gomes, P., and
Silva, R. (2010).
Relações semânticas em português: comparando o TeP, o MWN.PT, o Port4NooJ e o PAPEL.
In Textos seleccionados. XXV Encontro Nacional da Associação Portuguesa de Linguística, pages 681–700. APL, Lisboa,
Portugal.
[Santos and Bick, 2000] Santos, D. and Bick, E. (2000).
Providing Internet access to Portuguese corpora: the AC/DC project.
In Proceedings of 2nd International Conference on Language Resources and Evaluation, LREC 2000, pages 205–210.
[Simões and Farinha, 2011] Simões, A. and Farinha, R. (2011).
Dicionário Aberto: Um novo recurso para PLN.
Vice-Versa, pages 159–171.
[Teixeira et al., 2010] Teixeira, J., Sarmento, L., and Oliveira, E. (2010).
Comparing verb synonym resources for Portuguese.
In Proceedings of Computational Processing of the Portuguese Language, 9th International Conference, PROPOR 2010,
volume 6001 of LNAI, pages 100–109. Springer.
[van Assem et al., 2006] van Assem, M., Gangemi, A., and Schreiber, G. (2006).
RDF/OWL representation of WordNet.
W3C working draft, World Wide Web Consortium.
[Vossen, 1997] Vossen, P. (1997).
EuroWordNet: a multilingual database for information retrieval.
In Proceedings of DELOS workshop on Cross-Language Information Retrieval, Zurich.
The end
Thank you!
Check http://ontopt.dei.uc.pt