Onto.PT: towards the automatic construction of a lexical ontology for Portuguese

Hugo Gonçalo Oliveira (1), Paulo Gomes
{hroliv,pgomes}@dei.uc.pt
Cognitive & Media Systems Group, CISUC, Universidade de Coimbra
June 8, 2012

(1) Supported by the FCT scholarship grant SFRH/BD/44955/2008, co-funded by FSE.

Introduction

Request: Músicos famosos com carreira no cinema (famous musicians with a career in cinema)

Snippet A: David Bowie é um músico famoso que também fez carreira no cinema. (David Bowie is a famous musician who also had a career in cinema.)
Snippet B: Elvis Presley foi um musicista e actor célebre, conhecido como o Rei do Rock. (Elvis Presley was a celebrated musician and actor, known as the King of Rock.)
Snippet C: Amália Rodrigues foi talvez a mais ilustre fadista portuguesa. Para além de cantar, durante o seu percurso profissional participou em vários filmes. (Amália Rodrigues was perhaps the most illustrious Portuguese fado singer. Besides singing, she took part in several films during her professional career.)
Snippet D: João apanhou a carreira, famosa por chegar atrasada, para ir ao cinema, na cidade. (João caught the bus, notorious for running late, to go to the cinema in town.)

Request: Músicos famosos com carreira no cinema

Snippet A: David Bowie é um músico famoso que também fez carreira no cinema.
Snippet B: Rob Zombie é um musicista americano e realizador de filmes de terror. (Rob Zombie is an American musician and director of horror films.)
Snippet C: Amália Rodrigues foi talvez a mais ilustre fadista portuguesa. Para além de cantar, durante o seu percurso profissional participou em várias películas. (Same as above, with "películas" instead of "filmes" for "films".)
Snippet D: João apanhou a carreira, famosa por chegar atrasada, para ir ao cinema, na cidade.

musicista synonym-of músico
realizador producer-of cinema
filme part-of cinema
ilustre synonym-of famoso
fadista hyponym-of músico
película synonym-of filme
percurso profissional synonym-of carreira
carreira hyponym-of actividade
carreira hyponym-of transporte
cinema hyponym-of arte
cinema hyponym-of edifício

Lexical Knowledge Bases (LKBs)
- Resources for natural language processing (NLP)
- Cover the whole language, not a specific domain
- Structured on words and meanings
  - Meaning is inherently conceptual...
  - [Hirst, 2004]: Ontology + lexicon = Lexical ontology
- Essential in the development of NLP tools for a language
- See Princeton WordNet [Fellbaum, 1998]!

WordNet: dictionary + thesaurus

Applications
- Writing aids
- Determining similarities
- Word sense disambiguation
- Natural language generation
- Question answering
- Automatic summarization
- Machine translation
- ...

Contents
1 Introduction
2 Related resources
3 Relation acquisition
4 Synset discovery
5 Ontologisation of semantic relations
6 Approach summary
7 Presenting Onto.PT v.0.3.1
8 Concluding remarks

Related resources

Portuguese LKBs

Wordnets
- WordNet.PT (a) [Marrafa, 2002]
  - According to the EuroWordNet [Vossen, 1997] model
  - Not public, browsable through the Web
  - Handcrafted, as Princeton WordNet
  - Only covers some domains
- MWN.PT (b)
  - In the scope of MultiWordNet [Pianta et al., 2002]
  - About 69,000 synsets, 69,000 relations
  - Not public, browsable through the Web and purchasable
  - Handcrafted, as Princeton WordNet
  - Only covers nouns
  - Several lexical gaps

(a) http://www.clul.ul.pt/clg/eng/wordnetpt/index.html
(b) http://mwnpt.di.fc.ul.pt/

Portuguese LKBs (cont.)

Public thesauri, structured on synsets
- TeP (a) [Maziero et al., 2008]
  - Synset-base of WordNet.Br [Dias da Silva et al., 2002]
  - About 20,000 synsets
  - Handcrafted
  - Only covers synonymy and antonymy
- OpenThesaurus.PT (b)
  - Suggestions for OpenOffice Writer
  - Handcrafted (collaboratively)
  - Only covers synonymy
  - Too small (≈ 4,000 synsets)

(a) http://www.nilc.icmc.usp.br/tep2/
(b) http://openthesaurus.caixamagica.pt/

Portuguese LKBs (cont.)

(Enhanced) public dictionaries
- Wiktionary.PT (a)
  - About 180,000 entries (not all in Portuguese!)
  - Handcrafted (collaboratively)
  - (Still) very incomplete
  - Little explicit and unambiguous semantic information
- Dicionário Aberto (b) [Simões and Farinha, 2011]
  - Electronic version of a dictionary from 1913
  - About 128,000 entries
  - Static resource
  - Little explicit and unambiguous semantic information

(a) http://pt.wiktionary.org/
(b) http://www.dicionario-aberto.net/search

Portuguese LKBs (cont.)

Other
- PAPEL (a) [Gonçalo Oliveira et al., 2010]
  - Lexical-semantic resource extracted automatically from one dictionary
  - About 102,000 words, 190,000 relations
  - Structured on words, not sense-aware
  - Used in several NLP tasks:
    - Computing similarity between lexical items
    - Adaptation of textual contents for poor literacy readers
    - Generation of distractors for cloze questions
    - Creation of knowledge bases for question answering and generation
    - Creation of sentiment lexicons
    - ...

(a) http://www.linguateca.pt/PAPEL

Onto.PT
- New lexical ontology for Portuguese
- Constructed automatically
- Exploitation of public resources
  - Thesauri
  - Dictionaries/encyclopedias
  - Corpora
- Structure according to the wordnet model
  - Synsets: groups of synonymous words → concepts
  - Connected by semantic relations
- Three independent stages
  1. Acquisition of semantic relations
  2. Discovery of synsets
  3. Integration (ontologisation) of relations

Relation acquisition

Relation extraction from dictionaries

Why dictionaries?
- Earlier works [Calzolari et al., 1973, Amsler, 1980, Michiels et al., 1980]
- Main sources of general lexical information of a language
- Structured on words and meanings
- (Try to) cover the whole language
- Created by experts
- Simple structure, (almost) predictable vocabulary

But...
- Not ready to be used as LKBs
- Static resources

CARTÃO: semantic relations extracted from three dictionaries!
- Dicionário PRO da Língua Portuguesa (DLP), through PAPEL
- Dicionário Aberto (DA)
- Wiktionary.PT

CARTÃO: regularities in dictionary definitions

Frequency of each definition pattern per dictionary:

Pattern                          POS        DLP     DA      Wikt.PT  Relation
o mesmo que                      Noun       0       10,627  1,107    Synonymy
a[c]to ou efeito de              Noun       3,851   2,501   645      Causation
pessoa que                       Noun       1,320   47      329      Hypernymy
aquele que                       Noun       1,148   3,357   545      Hypernymy
conjunto de                      Noun       1,004   316     298      Member-of
espécie de                       Noun       798     2,846   223      Hypernymy
género/gênero de                 Noun       29      4,148   48       Hypernymy
variedade de                     Noun       455     621     52       Hypernymy
[a] parte do/da                  Noun       445     433     107      Part-of
qualidade de                     Noun       777     775     126      Quality-of
qualidade do que é               Noun       663     543     105      Quality-of
estado de                        Noun       299     223     73       State-of
natural ou habitante de/da/do    Noun       536     0       79       Place-of
instrumento[,] para              Noun       94      284     25       Purpose-of
.. produzid[o/a] por/pel[o/a]    Noun       155     146     60       Producer-of
o mesmo que                      Verb       0       166     97       Synonymy
fazer                            Verb       1,680   1,294   364      Causation
tornar                           Verb       1,359   1,672   266      Causation
ter                              Verb       467     519     139      Property-of
o mesmo que                      Adjective  0       2,685   197      Synonymy
relativo a/à/ao                  Adjective  1,236   5,554   1,063    Property-of
que se                           Adjective  1,602   1,599   485      Property-of
que tem                          Adjective  2,698   4,291   477      Part-of
diz-se de                        Adjective  2,066   738     313      Property-of
relativo ou pertencente          Adjective  1,647   9       61       Member-of
habitante ou natural de          Adjective  0       0       189      Place-of
que não é/está                   Adjective  485     608     98       Antonymy
de modo                          Adverb     398     2,261   109      Manner-of
de maneira                       Adverb     49      9       36       Manner-of
de forma                         Adverb     30      3       19       Manner-of
o mesmo que                      Adverb     0       182     21       Synonymy
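
Regularities like those in the table lend themselves to simple pattern matching over definitions. The following is only a minimal sketch under stated assumptions: `PATTERNS` and `extract_relations` are hypothetical names, only four of the noun patterns are covered, the regular expressions are simplified guesses, and no lemmatisation is done. The grammars actually used to build CARTÃO are not shown in the slides.

```python
import re

# Hypothetical, simplified subset of the noun patterns in the table.
# Each entry: (regex on the start of the definition, relation,
#              True if the headword is the first argument of the triple).
PATTERNS = [
    (re.compile(r"^o mesmo que (\w+)"), "synonym-of", False),
    (re.compile(r"^(?:espécie|variedade) de (\w+)"), "hypernym-of", False),
    (re.compile(r"^conjunto de (\w+)"), "member-of", False),
    (re.compile(r"^parte d[oa]s? (\w+)"), "part-of", True),
]


def extract_relations(headword, definition):
    """Return (arg1, relation, arg2) triples signalled by the definition."""
    text = definition.strip().lower()
    triples = []
    for regex, relation, headword_first in PATTERNS:
        match = regex.match(text)
        if match is None:
            continue
        other = match.group(1)  # surface form; a real system would lemmatise
        if headword_first:
            triples.append((headword, relation, other))
        else:
            triples.append((other, relation, headword))
    return triples


# Definition of "espiga" as quoted in the slides.
print(extract_relations("espiga", "parte das gramíneas que contém os grãos"))
# Made-up definition, only to exercise the "variedade de" pattern.
print(extract_relations("tangerina", "variedade de laranja"))
```

Because the sketch keeps the surface form, the first call yields "gramíneas" rather than the lemma "gramínea"; mapping inflected forms to lemmas would be a separate step.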

CARTÃO: extraction examples

candeia s.f. utensílio doméstico rústico usado para iluminação, com pavio abastecido a óleo (rustic household utensil used for lighting, with an oil-fed wick)
  - utensílio hypernym-of candeia
  - iluminação purpose-of candeia
espiga s.f. parte das gramíneas que contém os grãos (part of grasses that contains the grains)
  - espiga part-of gramínea
  - espiga contains grão
inquietar v.t. causar ansiedade (to cause anxiety)
  - inquietar causation-of ansiedade
severo adj. grave, crítico (grave, critical)
  - grave synonym-of severo
  - crítico synonym-of severo

CARTÃO: extraction results

CARTÃO contains:
- About 155,000 lexical items
- About 327,000 relations, including:

Relation      Args.    Quantity  Example                    English
Synonym-of    n,n      67,620    alegria, satisfação        (joy, satisfaction)
              v,v      28,108    esticar, estender          (to extend, to stretch)
              adj,adj  32,364    racional, filosófico       (rational, philosophical)
              adv,adv  2,286     imediatamente, já          (immediately, now)
Hypernym-of   n,n      97,924    sentimento, afecto         (feeling, affection)
Part-of       n,n      3,893     núcleo, átomo              (nucleus, atom)
              n,adj    5,872     vício, vicioso             (addiction, addictive)
Member-of     n,n      7,328     aluno, escola              (student, school)
              adj,n    1,071     rural, campo               (rural, country)
Causation-of  n,n      1,423     vírus, doença              (virus, disease)
              adj,n    748       horrível, horror           (horrible, horror)
              v,n      10,664    mover, movimento           (to move, movement)
Producer-of   n,n      1,741     oliveira, azeitona         (olive tree, olive)
              adj,n    515       fonador, som               (phonetic, sound)
Purpose-of    n,n      6,978     sustentação, mastro        (support, mast)
              v,n      7,824     calcular, cálculo          (to calculate, calculation)
              v,adj    374       comprimir, compressivo     (to compress, compressive)
Has-quality   n,n      1,055     mórbido, morbidez          (morbid, morbidity)
              n,adj    1,273     assíduo, assiduidade       (assiduous, assiduity)
Has-state     n,n      376       exaltação, desvairo        (exaltation, rant)
Place-of      n,n      1,483     Equador, equatoriano       (Ecuador, Ecuadorian)
Manner-of     adv,n    2,172     ociosamente, indolência    (idly, indolence)
              adv,adj  1,854     virtualmente, virtual      (virtually, virtual)
Antonym-of    adj,adj  684       direito, torto             (straight, crooked)
Property-of   adj,n    10,652    daltónico, daltonismo      (daltonic, daltonism)
              adj,v    27,902    musculoso, ter músculo     (beefy, to have muscle)

CARTÃO: manual evaluation

Results of manual evaluation
- 100 instances per relation type/resource (300/type)
- 2 judges for each instance
  - wrong instance (0)
  - wrong relation (1)
  - correct (2)

Total number (and proportion) of instances given each score, with inter-annotator agreement (IAA) and κ:

Relation           Judge  0         1         2          IAA   κ
n synonym-of n     J1     2 (.01)   0         298 (.99)  0.99  0.66
                   J2     3 (.01)   1 (≈0)    296 (.99)
v synonym-of v     J1     6 (.02)   0         294 (.98)  0.98  0.68
                   J2     7 (.02)   3 (.01)   290 (.97)
n hypernym-of n    J1     11 (.04)  19 (.06)  270 (.90)  0.93  0.64
                   J2     16 (.05)  21 (.07)  263 (.88)
v causation-of n   J1     12 (.04)  14 (.05)  274 (.91)  0.93  0.60
                   J2     15 (.05)  18 (.06)  267 (.89)
adj property-of v  J1     67 (.22)  21 (.07)  212 (.71)  0.81  0.56
                   J2     39 (.13)  30 (.10)  231 (.77)

Discussion

Lexical graph: relations hold between words, not senses, so ambiguous words license wrong inferences.
- massa synonym-of povo ∧ massa hypernym-of tortellini → povo hypernym-of tortellini
  (massa: "the masses" or "pasta"; povo: "people")
- dinheiro synonym-of cacau ∧ fruto hypernym-of cacau → fruto hypernym-of dinheiro
  (cacau: "cocoa", also slang for "money"; dinheiro: "money"; fruto: "fruit")

Synset discovery

Synonymy network
- Established by synonymy pairs
- Propagate synonymy?
  - Large network (≈ 40,000 nodes for nouns, ≈ 15,000 for adjectives)
  - Large connected subgraphs (≈ 26,000 nodes for nouns, ≈ 11,000 for adjectives)
  - Problems, such as:
    - queda synonym-of ruína ∧ queda synonym-of habilidade → ruína synonym-of habilidade
      (queda: "fall" or "knack"; ruína: "ruin"; habilidade: "skill")

Clustering for synsets
- Synonymy networks extracted from dictionaries tend to have a clustered structure [Gfeller et al., 2005]
- Clusters may be seen as synsets
- Words with more than one sense → overlapping clusters!
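
The problem with plain propagation can be reproduced in a few lines. The sketch below is an illustration, not the authors' code: it treats every connected component of the synonymy network as one synset, which is what unrestricted transitive propagation amounts to. The function name `connected_components` and the three-pair toy network are assumptions made for the example; the word pairs themselves appear in the slides.

```python
from collections import defaultdict


def connected_components(synpairs):
    """Group words so that any chain of synpairs ends up in a single group."""
    neighbours = defaultdict(set)
    for a, b in synpairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            stack.extend(neighbours[word] - component)
        seen |= component
        components.append(component)
    return components


synpairs = [
    ("queda", "ruína"),       # queda in the sense "fall"
    ("queda", "habilidade"),  # queda in the sense "knack"
    ("alegria", "satisfação"),
]
components = connected_components(synpairs)
print(components)
```

Here "ruína" and "habilidade" land in the same group only because both are linked to the ambiguous word "queda", which is why overlapping clusters are preferred over connected components.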

Clustering algorithm

Main idea: each word and its neighbourhood is a potential cluster

1. Network as a matrix M

       v1  v2  v3  v4  v5  v6  v7  v8  v9  v10
v1     1   1   0   0   0   0   0   0   0   0
v2     1   1   1   0   0   0   0   0   0   0
v3     0   1   1   1   0   0   0   0   0   0
v4     0   0   1   1   1   0   0   0   0   0
v5     0   0   0   1   1   1   1   0   0   0
v6     0   0   0   0   1   1   1   1   1   1
v7     0   0   0   0   1   1   1   0   0   0
v8     0   0   0   0   0   1   0   1   0   0
v9     0   0   0   0   0   1   0   0   1   0
v10    0   0   0   0   0   1   0   0   0   1

2. Similarity matrix

$$\mathrm{sim}(a, b) = \cos(\vec{v}_a, \vec{v}_b) = \frac{\vec{v}_a \cdot \vec{v}_b}{|\vec{v}_a|\,|\vec{v}_b|} = \frac{\sum_{i=0}^{|V|} v_{ai} \times v_{bi}}{\sqrt{\sum_{i=0}^{|V|} v_{ai}^2} \times \sqrt{\sum_{i=0}^{|V|} v_{bi}^2}} \qquad (1)$$

Similarity matrix, θ = 0.5

       v1   v2   v3   v4   v5   v6   v7   v8   v9   v10
v1     1.0  0.6  0.4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
v2     0.6  1.0  0.7  0.3  0.0  0.0  0.0  0.0  0.0  0.0
v3     0.4  0.7  1.0  0.7  0.3  0.0  0.0  0.0  0.0  0.0
v4     0.0  0.3  0.7  1.0  0.6  0.2  0.3  0.0  0.0  0.0
v5     0.0  0.0  0.3  0.6  1.0  0.6  0.9  0.4  0.4  0.4
v6     0.0  0.0  0.0  0.2  0.6  1.0  0.7  0.6  0.6  0.6
v7     0.0  0.0  0.0  0.3  0.9  0.7  1.0  0.4  0.4  0.4
v8     0.0  0.0  0.0  0.0  0.4  0.6  0.4  1.0  0.5  0.5
v9     0.0  0.0  0.0  0.0  0.4  0.6  0.4  0.5  1.0  0.5
v10    0.0  0.0  0.0  0.0  0.4  0.6  0.4  0.5  0.5  1.0

3. Clusters
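
The first two steps can be sketched directly from the matrix M and equation (1). How clusters are then read off the similarity matrix is not spelled out in the slides, so the last step below is an assumption: each word's cluster is taken to be every word whose similarity to it is at least θ, and identical clusters are merged. The names `cosine` and `discover_clusters` are illustrative, and similarities are recomputed rather than copied from the rounded values shown above.

```python
import math

LABELS = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10"]

# Adjacency matrix M from the slide (each word is adjacent to itself).
M = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
]


def cosine(u, v):
    """Cosine similarity between two adjacency vectors, as in equation (1)."""
    dot = sum(x * y for x, y in zip(u, v))
    squared_norms = sum(x * x for x in u) * sum(y * y for y in v)
    return dot / math.sqrt(squared_norms) if squared_norms else 0.0


def discover_clusters(matrix, labels, theta):
    """Assumed reading of step 3: one candidate cluster per word, made of
    all words at least `theta`-similar to it; duplicate clusters are merged."""
    clusters = set()
    for row in matrix:
        members = frozenset(
            label for label, other in zip(labels, matrix)
            if cosine(row, other) >= theta
        )
        clusters.add(members)
    return clusters


clusters = discover_clusters(M, LABELS, theta=0.5)
for cluster in sorted(clusters, key=lambda c: sorted(c)):
    print(sorted(cluster))
```

Under this reading a word may belong to several clusters, which is how a word with more than one sense can end up in more than one synset.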

Take advantage of handcrafted thesauri

What about TeP? TeP is...
- Structured on synsets
- Created manually
- Free

TeP is more complementary than overlapping with PAPEL/CARTÃO [Santos et al., 2010, Teixeira et al., 2010, Gonçalo Oliveira et al., 2011]

Take advantage of TeP, instead of using it merely as a reference for comparison!
1. Integrate synpairs of CARTÃO in TeP synsets
2. Discover clusters in remaining synpairs
3. Add new clusters as synsets

Assigning synpairs to synsets

Starting point:
- Thesaurus T, with synsets S = {v_1, v_2, ..., v_n}
- Synonymy network N, with synpairs p = (v_x, v_y)

Goal:

Synpair                          Synset
(alimentação, mantença)       →  {sustento, alimento, mantimento, alimentação, mantença}
(escravizar, servilizar)      →  {oprimir, tiranizar, escravizar, esmagar, servilizar}
(permanente, inextinguível)   →  {durador, duradoiro, duradouro, durável, permanente, perdurável, inextinguível}
2 Synpair and candidate synsets as adjacency vectors in N 3 Compute the similarity between ~p and each synset Ck ∈ C : Gonçalo Oliveira & Gomes (CISUC) Onto.PT June 8, 2012 23 / 46 Synset discovery Take advantage of handcrafted thesauri Assigning p = (vx , vy ) to a synset C 1 Select all synsets containing one of the elements of p, ∀(Cj ∈ C ) : vx ∈ Cj ∨ vy ∈ Cj . 2 Synpair and candidate synsets as adjacency vectors in N 3 Compute the similarity between ~p and each synset Ck ∈ C : 4 ~ ) ≥ σ ∧ sim(~p , Cbest ~ ) = max(sim(~p , C~k )). p → Cbest : sim(~p , Cbest Gonçalo Oliveira & Gomes (CISUC) Onto.PT June 8, 2012 23 / 46 Synset discovery Take advantage of handcrafted thesauri Assignment settings 250 synpairs + TeP, three gold references: Gonçalo Oliveira & Gomes (CISUC) Onto.PT June 8, 2012 24 / 46 Synset discovery Take advantage of handcrafted thesauri Assignment settings 250 synpairs + TeP, three gold references: I I I Annotator 1 (A1) Annotator 2 (A2) Intersection (∩) Gonçalo Oliveira & Gomes (CISUC) Onto.PT June 8, 2012 24 / 46 Synset discovery Take advantage of handcrafted thesauri Assignment settings 250 synpairs + TeP, three gold references: I I I Annotator 1 (A1) Annotator 2 (A2) Intersection (∩) IAA(A1, A2) = 68%, κ(A1, A2) = 0.40 Gonçalo Oliveira & Gomes (CISUC) Onto.PT June 8, 2012 24 / 46 Synset discovery Take advantage of handcrafted thesauri Assignment settings 250 synpairs + TeP, three gold references: I I I Annotator 1 (A1) Annotator 2 (A2) Intersection (∩) IAA(A1, A2) = 68%, κ(A1, A2) = 0.40 Best settings, cos(~p , C~k ) ≥ 0.15 Ref. 
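The four assignment steps, with the reported best threshold σ = 0.15, can be sketched as follows. The slides do not say how a synpair or a synset is turned into a single adjacency vector, so this sketch assumes a binary vector covering the words themselves plus all of their neighbours in N, stored as a Python set. The toy synpairs and the second synset are made up for illustration; only the pair (alimentação, mantença) and its target synset come from the slide.

```python
import math


def build_network(synpairs):
    """Synonymy network N as a dict: word -> set of neighbouring words."""
    network = {}
    for x, y in synpairs:
        network.setdefault(x, set()).add(y)
        network.setdefault(y, set()).add(x)
    return network


def adjacency(words, network):
    """ASSUMPTION: binary adjacency vector of a word group, stored as the
    set of its non-zero dimensions (the words plus all their neighbours)."""
    dims = set(words)
    for word in words:
        dims |= network.get(word, set())
    return dims


def cosine(a, b):
    """Cosine between two binary vectors given as sets of dimensions."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def assign_synpair(pair, synsets, network, sigma=0.15):
    """Steps 1-4: return the best candidate synset, or None if no synset
    contains a word of the pair or the best similarity is below sigma."""
    candidates = [s for s in synsets if pair[0] in s or pair[1] in s]
    if not candidates:
        return None
    p_vec = adjacency(pair, network)
    scored = [(cosine(p_vec, adjacency(s, network)), s) for s in candidates]
    best_sim, best = max(scored, key=lambda item: item[0])
    return best if best_sim >= sigma else None


# Hypothetical toy data, for illustration only.
SYNPAIRS = [
    ("alimentação", "mantença"), ("mantença", "sustento"),
    ("alimentação", "sustento"), ("alimentação", "alimento"),
    ("sustento", "mantimento"), ("dieta", "nutrição"),
]
SYNSETS = [
    frozenset({"sustento", "alimento", "mantimento", "alimentação"}),
    frozenset({"alimentação", "dieta"}),
]
NETWORK = build_network(SYNPAIRS)

if __name__ == "__main__":
    target = assign_synpair(("alimentação", "mantença"), SYNSETS, NETWORK)
    print(sorted(target | {"alimentação", "mantença"}))
```

A pair that returns None here would remain in the network for the later clustering step.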
Assignment results for each gold reference:

   Ref.  Setting  Precision  Recall  RRecall  F0.5  RF0.5
   A1    All         44%      100%    100%    61%    50%
   A1    Random      60%       31%     65%    62%    61%
   A1    Best        74%       34%     71%    73%    74%
   A2    All         60%      100%    100%    75%    65%
   A2    Random      68%       34%     80%    73%    70%
   A2    Best        82%       36%     85%    83%    82%
   ∩     All         34%      100%    100%    51%    39%
   ∩     Random      46%       41%     64%    53%    48%
   ∩     Best        64%       48%     74%    69%    66%

Synset discovery: Clustering evaluation

1. Select random pairs of words from discovered synsets
2. Classify each pair as correct or incorrect

Using the whole synonymy network:
- 440 noun pairs
- Two human judges (IAA = 83%, κ = 0.43)
- Correct: 75%

Using only clusters of the network after assignment:
- 330 pairs (110 nouns, 110 verbs, 110 adjectives)
- Two human judges
  - IAA: 96%, 85%, 95%
  - κ: 0.73, 0.39, 0.37
- Correct: 85%, 91%, 90%

Synset discovery: TRIP, a large thesaurus for Portuguese

Words:

   Thesaurus  POS        Total    Ambiguous  Avg(senses)  Max(senses)
   TeP 2.0    Noun       17,149     5,802       1.71          20
   TeP 2.0    Verb        8,280     4,680       2.69          50
   TeP 2.0    Adjective  14,568     3,730       1.46          19
   TeP 2.0    Adverb      1,095       227       1.30          11
   TRIP       Noun       45,457    15,392       1.80          22
   TRIP       Verb       11,924     6,607       2.87          52
   TRIP       Adjective  22,316     7,782       1.83          22
   TRIP       Adverb      2,488       694       1.42          12

Synsets:

   Thesaurus  POS        Total    Avg(size)  size = 2  size > 25  max(size)
   TeP 2.0    Noun        8,254     3.56       3,083        0         21
   TeP 2.0    Verb        3,899     5.71         907       48         53
   TeP 2.0    Adjective   6,062     3.5        3,032       18         43
   TeP 2.0    Adverb        497     2.87         258        0          9
   TRIP       Noun       16,936     4.84       5,986      226        131
   TRIP       Verb        4,424     7.75         873      193        132
   TRIP       Adjective   7,948     5.14       3,127      161        117
   TRIP       Adverb        813     4.34         270        1         27

Synset discovery: Examples of large (noun) synsets

imbecile/stupid person: patamaz, boca-aberta, imbecil, lucas, malhadeiro, orate, zé-cuecas, lerdaço, tantã, boleima, babão, jato, zambana, badó, ânsar, bolônio, chapetão, parvalhão, haule, papa-moscas, lerdo, patau, sànona, perturbado, possidónio, babaquara, tolo, galafura, babuíno, zângano, inepto, badana, cabaça, andor, pax-vóbis, idiota, pascoal-bailão, sandeu, asneirão, zé, capadócio, calino, doudivanas, pasguate, parreco, babanca, palerma, molusco, parrana, moco, ansarinho, bajoujo, burro, truão, estulto, pexote, maninelo, lérias, banana, banazola, patego, bobo, estúpido, asno, sonso, ignorante, troixa, otário, simplório, pancrácio, patola, songo-mongo, toleirão, totó, burgesso, morcão, microcéfalo, patinho, bacoco, babancas, inhenha, pàteta, néscio, matias, parvoinho, mané, anastácio, manembro, tatamba, bobalhão, bertoldo, patavina, tonto, apedeuto, pachocho, ingênuo, bocoió, simplacheirão, jerico, zote, sebastião, lorpa, atónito, patacão, pato, parvoeirão, ingénuo, papalvo, pateta, tanso, cretino, bolónio, basbaque, mentecapto, pachola, apaixonado, pasmão, pascácio, tarola, trouxa, parvo, jumento, geta, arara, gato-bravo, pedaço-de-asno, parvajola, pacóvio, laparoto, crendeiro, loura
Synset discovery: Examples of large (noun) synsets

alcoholic intoxication: torcida, embriagamento, veneno, mona, zurca, trapisonda, lontra, rosca, perua, raposada, rola, tertúlia, carraspana, peleira, pizorga, cabra, chuva, tachada, caroça, ardina, girgolina, égua, carrega, zerenamora, rasca, touca, venena, gardunho, ema, porre, ebriez, carapanta, chiba, ebriedade, bico, inebriamento, bebedeira, carrapata, penca, taçada, canja, garça, ganso, tortelia, turca, cabrita, mela, resina, senisga, bebedice, bezana, vinhaça, zangurrina, bêbeda, bibra, borrachice, zuca, coca, torta, doninha, piela, graxa, trabuzana, água, cegonha, gateira, bicancra, samatra, galinhola, gata, pala, ganza, pifão, bode, cobra, prego, zola, nêspera, narda, parrascana, vinho, gardinhola, tropecina, embriaguez, cardina, tiorga, temulência, narceja, pisorga, grossura, dosa, trovoada, carneira, perunca, bruega, canjica, raposa, garrana, raposeira, cartola, cachorra, entusiasmo, carpanta, piteira, borracheira, cabeleira, carrocha, pifo, camoeca, marta, cachaceira, zangurriana, verniz, carrada

money: jimbo, pastel, guines, baguines, parrolo, marcaureles, ouro, grana, arame, massaroca, tutu, metal, bagalho, bilhestres, milho, jan-da-cruz, china, cum-quibus, cobre, mussuruco, pilim, pasta, bagaço, zerzulho, painço, chelpa, finanças, calique, tostão, pecuniária, bagalhoça, boro, dieiro, pila, gaita, pataco, verba, cacau, matambira, gimbo, cunques, caroço, fanfa, maco, pecúnia, estilha, jibungo, roço, massa, dinheiro, maquia, bago, teca, pecunia, quantia, espécie, guita, patacaria, carcanhol, pingo

Ontologisation of semantic relations: Moving from term-based to synset-based relations

Goal: move from a R b ∈ G to A R B, with A ∈ T, B ∈ T
- porta part-of carro → {porta, entrada, portão} part-of {carro, automóvel}

Available information:
- Thesaurus T, with synsets
- Relational triples between terms, in a lexical graph G

Output: semantic graph, wordnet W
- Same relations as in G
- But between synsets

Ontologisation of semantic relations: Ontologising algorithms

- Related Proportion (RP)
- Number of Triples (NT)
- Average Cosine (AC)
- Related Proportion + Average Cosine (RP+AC)
- Number of Triples + Average Cosine (NT+AC)
- Minimum distance (MD)
- PageRank (PR)

Ontologisation of semantic relations: Evaluating the ontologising algorithms

Gold reference:
- Thesaurus: TeP + OpenThesaurus.PT
- Term-based triples: 452 (hypernymy, part-of, purpose-of), from PAPEL
- All possible attachments

tb-triple = (documento hypernym-of recibo)
- A1: documento, declaração (document, declaration)
- A2: escritura, documento (deed, document)
- B1: recibo, comprovante, nota, quitação, senha (receipt, confirming, note, quittance)

tb-triple = (planta part-of floresta)
- A1: relação, quadro, planta, mapa (relation, frame, plant, map)
- A2: vegetal, planta (vegetable, plant)
- A3: traçado, desenho, projeto, planta, plano (design, project, plant, plan)
- B1: bosque, floresta, mata, brenha, selva (wood, forest, jungle)

Compared to attachments using...
- Ontologising algorithms, random candidate baseline
- CARTÃO

Best algorithms:

   Relation                      Algorithm   P%    R%    F1%   F0.5%  Fr%
   Hypernym-of (210 tb-triples)  Random     42.1  10.7  17.1   26.5  42.1
                                 RP         53.3  12.4  20.0   32.1  49.9
                                 AC         60.6  15.8  25.1   38.7  60.3
                                 RP+AC      55.8  14.8  23.4   35.9  55.8
   Part-of (175 tb-triples)      Random     47.4  12.6  19.9   30.6  47.4
                                 RP         56.9  10.6  17.9   30.4  47.0
                                 AC         58.7  14.9  23.8   37.0  58.7
                                 RP+AC      63.3  16.3  25.9   40.1  63.3
   Purpose-of (67 tb-triples)    Random     44.8   9.0  15.0   25.0  44.8
                                 RP         51.5   5.1   9.3   18.3  32.6
                                 AC         63.2  13.0  21.5   35.6  63.2
                                 RP+AC      63.4  13.6  22.3   36.5  63.4
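The F1 and F0.5 columns of the table above appear consistent with the standard F-beta measure computed from P and R, where beta = 0.5 weights precision more heavily than recall. The sketch below reproduces the Random baseline row for hypernym-of; some other rows differ in the last digit, presumably because the published P and R are already rounded. The function name is hypothetical, and the Fr column is not defined on the slides, so it is not reproduced here.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta: weighted harmonic mean of precision and recall.
    Inputs and output share the same scale (e.g. percentages)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Random baseline, hypernym-of: P = 42.1, R = 10.7 (from the table).
    print(round(f_beta(42.1, 10.7, beta=1.0), 1))
    print(round(f_beta(42.1, 10.7, beta=0.5), 1))
```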
Ontologisation of semantic relations: Evaluating the ontologising algorithms

RP+AC, for a triple a R b (θ = 0.5):
1. To ontologise a/b, fix b/a.
2. [Example] For each Ai ∈ A:
   - A1 = (a, c, d, e), p_a1 = 3/4
   - A2 = (a, f, g), p_a2 = 2/3
   - A3 = (a, h, i, j), p_a3 = 1/4
3. p_a1 = max(p_ai) ∧ p_a1 ≥ θ, so a → A1.
4. If there is no suitable Ai or Bj, represent candidate synsets as matrices in N:
   - $\vec{A}_i = \{\vec{a}_{i0}, ..., \vec{a}_{in}\}$, n = |Ai|
   - $\vec{B}_j = \{\vec{b}_{j0}, ..., \vec{b}_{jm}\}$, m = |Bj|
5. Compute the average similarity of the elements of each pair of synsets: $\cos(\vec{A}_i, \vec{B}_j)$.
6. Select the most similar pair (Ax, By): cos(Ax, By) = max(cos(Ai ∈ A, Bj ∈ B)).
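The RP+AC steps can be sketched as below. Two points are assumptions, since the slide leaves them implicit: p_ai is read as the proportion of words in Ai that hold relation R with the fixed term b in the term-based triples, and the adjacency vector of a word is taken to be the word plus its neighbours in N. All function names are hypothetical, and the triples are chosen only so that the proportions match the slide's example (3/4, 2/3, 1/4).

```python
import math


def related_proportion(synset, fixed_term, relation, triples, synset_first=True):
    """ASSUMPTION: share of words in `synset` that hold `relation` with
    the fixed term, according to the term-based triples."""
    def holds(word):
        if synset_first:
            return (word, relation, fixed_term) in triples
        return (fixed_term, relation, word) in triples
    return sum(1 for word in synset if holds(word)) / len(synset)


def attach_by_rp(term, fixed_term, relation, synsets, triples,
                 theta=0.5, synset_first=True):
    """Steps 1-3: pick the candidate synset with the highest related
    proportion, provided it reaches theta; otherwise return None."""
    candidates = [s for s in synsets if term in s]
    if not candidates:
        return None
    def score(s):
        return related_proportion(s, fixed_term, relation, triples, synset_first)
    best = max(candidates, key=score)
    return best if score(best) >= theta else None


def adjacency(word, network):
    """ASSUMPTION: adjacency vector as the set {word} plus its neighbours."""
    return network.get(word, set()) | {word}


def cosine(a, b):
    """Cosine between two non-empty binary vectors stored as sets."""
    return len(a & b) / math.sqrt(len(a) * len(b))


def average_cosine(syn_a, syn_b, network):
    """Step 5: average pairwise cosine between the words of two synsets."""
    sims = [cosine(adjacency(a, network), adjacency(b, network))
            for a in syn_a for b in syn_b]
    return sum(sims) / len(sims)


def attach_by_ac(candidates_a, candidates_b, network):
    """Step 6: the pair of candidate synsets with the highest average cosine."""
    pairs = [(sa, sb) for sa in candidates_a for sb in candidates_b]
    return max(pairs, key=lambda pair: average_cosine(pair[0], pair[1], network))


# Slide example: three candidate synsets for a, with b fixed.
A1, A2, A3 = ("a", "c", "d", "e"), ("a", "f", "g"), ("a", "h", "i", "j")
TRIPLES = {("a", "R", "b"), ("c", "R", "b"), ("d", "R", "b"), ("f", "R", "b")}

if __name__ == "__main__":
    print(attach_by_rp("a", "b", "R", [A1, A2, A3], TRIPLES))
```

In a full RP+AC run, `attach_by_ac` would only be called when `attach_by_rp` returns None for a or for b.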
Approach summary: From dictionaries to a wordnet in three steps

1. gado s.m. conjunto de animais criados para diversos fins; rebanho
   (cattle, masc. noun: set of animals raised for various purposes; herd)
   - tb-triple1 = rebanho SINONIMO DE gado (synonym-of)
   - tb-triple2 = animal MEMBRO DE gado (member-of)
2. synset1 = (manada, rebanho, mancheia, boiada)
   - + tb-triple1 = (manada, rebanho, mancheia, boiada, gado)
3. synset2 = (bicho, animal, alimal, béstia, minante)
   - sb-triple1 = synset2 MEMBRO DE synset1

Presenting Onto.PT v.0.3.1: Synsets

- About 150,000 lexical items
- Organised in about 110,000 synsets
- Synsets are ordered according to the AC/DC [Santos and Bick, 2000] frequency of their words
  - Words inside synsets are ordered according to their AC/DC frequency

   POS          size > 1   size = 1    Total
   Nouns          19,211     45,654   64,865
   Verbs           3,998     21,344   25,342
   Adjectives      7,272     10,680   17,952
   Adverbs           710      1,283    1,993

Presenting Onto.PT v.0.3.1: Relations (excluding inverse)

- About 170,000 relations
- Same types as in PAPEL/CARTÃO

   Relation        Predicate                          Instances
   Hypernym        n hiperonimoDe n                      83,552
   Part            n parteDe n                            3,672
                   n parteDeAlgoComProp adj               4,911
                   adj propDeAlgoParteDe n                   91
   Member          n membroDe n                           5,847
                   n membroDeAlgoComProp adj                106
                   adj propDeAlgoMembroDe n                 909
   Contains        n contidoEm n                            355
                   n contidoEmAlgoComProp adj               264
   Material        n materialDe n                           835
   Causation       n causadorDe n                         1,347
                   n causadorDeAlgoComProp adj               26
                   adj propDeAlgoQueCausa n                 619
                   n causadorDaAccao v                       56
                   v accaoQueCausa n                      8,052
   Place           n localOrigemDe n                      1,293
   Antonym         adj antonimoAdjDe adj                    538
   Producer        n produtorDe n                         1,718
                   n produtorDeAlgoComProp adj               88
                   adj propDeAlgoProdutorDe n               529
   Purpose         n fazSeCom n                           6,551
                   n fazSeComAlgoComProp adj                 79
                   v finalidadeDe n                       7,271
                   v finalidadeDeAlgoComProp adj            322
   Quality         n temQualidade n                         934
                   n devidoAQualidade adj                 1,059
   State           n temEstado n                            327
                   n devidoAEstado adj                      197
   Manner          adv maneiraPorMeioDe n                 1,833
                   adv maneiraComProp adj                 1,561
   Manner without  adv maneiraSem n                         216
                   adv maneiraSemAccao v                     14
   Property        adj dizSeSobre n                       9,145
                   adj dizSeDoQue v                      25,014

Presenting Onto.PT v.0.3.1: Onto.PT as a Semantic Web model

Adaptation of the W3C WordNet RDF/OWL Basic [van Assem et al., 2006]

Presenting Onto.PT v.0.3.1: OntoBusca, Onto.PT's interface

Presenting Onto.PT v.0.3.1: Usage example

Onto.PT for query expansion:
1. Disambiguate the head of the query
   - WSD algorithm, e.g. Personalized PageRank [Agirre and Soroa, 2009]
   - Use the words in the query as context
   - Select a suitable synset for the head
2. Use the words of the synset as search alternatives

Approach to the joint evaluation Págico [Rodrigues et al., 2012]
- Runs with WSD performed better than without

Examples:
- Doces brasileiros que têm origem nos doces portugueses (Brazilian sweets that originate from Portuguese sweets)
  - doce OR confeito OR guloseima OR gulodice ...
- Doenças letais comuns em países lusófonos transmitidas por mosquitos (Lethal diseases common in Portuguese-speaking countries, transmitted by mosquitoes)
  - doença OR mal-estar OR enfermidade OR mal OR patologia OR distúrbio OR padecimento ...
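The expansion step can be sketched as below. The WSD component is left as a pluggable function: the default used here simply takes the first listed synset that contains the head, which is a baseline stand-in and not Personalized PageRank. The synsets are the truncated word lists shown in the examples, and all function names are hypothetical.

```python
def first_sense(head, context, synsets):
    """Baseline stand-in for WSD (NOT Personalized PageRank): return the
    first listed synset containing the head; the context is ignored."""
    for synset in synsets:
        if head in synset:
            return synset
    return None


def expand_head(head, context, synsets, disambiguate=first_sense):
    """Replace the query head by an OR-list of the words in its synset."""
    synset = disambiguate(head, context, synsets)
    if synset is None:
        return head
    alternatives = [head] + [word for word in synset if word != head]
    return " OR ".join(alternatives)


# Word lists copied from the slide examples (truncated there with "...").
SYNSETS = [
    ["doce", "confeito", "guloseima", "gulodice"],
    ["doença", "mal-estar", "enfermidade", "mal", "patologia",
     "distúrbio", "padecimento"],
]

if __name__ == "__main__":
    print(expand_head("doce", ["brasileiros", "origem", "portugueses"], SYNSETS))
```

Passing a different `disambiguate` callable is where a graph-based WSD method would plug in; the rest of the expansion stays unchanged.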
Concluding remarks: Main contributions

- CARTÃO, the largest lexical graph for Portuguese
  - Larger than PAPEL
- TRIP, the largest Portuguese thesaurus
  - Larger than TeP
  - Alternative to OpenThesaurus.PT in OpenOffice
- Onto.PT, a new public lexical ontology
  - Created automatically, higher growth potential
  - An addition or alternative to existing Portuguese LKBs
- A flexible approach that enables the integration of several resources
  - May be adapted to the construction/enrichment of wordnets in other languages

Concluding remarks: Future

Onto.PT is in constant development!
- New version coming soon...
- Improvement of each construction step
- Augmentation by exploiting other resources (e.g. Wikipedia)
- Associate definitions/example sentences with synsets [Henrich et al., 2011]

More evaluation:
- Quality, e.g. manual evaluation of parts of the resource
- Coverage, e.g. mapping with the Global WordNet base concepts (http://www.globalwordnet.org/gwa/ewn_to_bc/corebcs.html)
- Utility, e.g. utilisation in (more) NLP tasks

Availability:
- Updates and other resources in http://ontopt.dei.uc.pt

References

[Agirre and Soroa, 2009] Agirre, E. and Soroa, A. (2009). Personalizing PageRank for word sense disambiguation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL'09, pages 33–41, Stroudsburg, PA, USA. ACL Press.

[Amsler, 1980] Amsler, R. A. (1980). The structure of the Merriam-Webster Pocket dictionary. PhD thesis, The University of Texas at Austin.

[Calzolari et al., 1973] Calzolari, N., Pecchia, L., and Zampolli, A. (1973). Working on the Italian machine dictionary: a semantic approach. In Proceedings of 5th Conference on Computational Linguistics, COLING'73, pages 49–52, Morristown, NJ, USA. ACL Press.

[Dias da Silva et al., 2002] Dias da Silva, B. C., de Oliveira, M. F., and de Moraes, H. R. (2002). Groundwork for the Development of the Brazilian Portuguese Wordnet. In Advances in Natural Language Processing (PorTAL 2002), LNAI, pages 189–196, Berlin/Heidelberg. Springer.

[Fellbaum, 1998] Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press.

[Gfeller et al., 2005] Gfeller, D., Chappelier, J.-C., and Rios, P. D. L. (2005). Synonym Dictionary Improvement through Markov Clustering and Clustering Stability. In Proceedings of International Symposium on Applied Stochastic Models and Data Analysis, ASMDA 2005, pages 106–113.

[Gonçalo Oliveira et al., 2011] Gonçalo Oliveira, H., Antón Pérez, L., Costa, H., and Gomes, P. (2011). Uma rede léxico-semântica de grandes dimensões para o português, extraída a partir de dicionários electrónicos. Linguamática, 3(2):23–38.
[Gonçalo Oliveira et al., 2010] Gonçalo Oliveira, H., Santos, D., and Gomes, P. (2010). Extracção de relações semânticas entre palavras a partir de um dicionário: o PAPEL e sua avaliação. Linguamática, 2(1):77–93.

[Henrich et al., 2011] Henrich, V., Hinrichs, E., and Vodolazova, T. (2011). Semi-automatic extension of GermaNet with sense definitions from Wiktionary. In Proceedings of 5th Language & Technology Conference, LTC 2011, pages 126–130, Poznan, Poland.

[Hirst, 2004] Hirst, G. (2004). Ontology and the lexicon. In Staab, S. and Studer, R., editors, Handbook on Ontologies, International Handbooks on Information Systems, pages 209–230. Springer.

[Marrafa, 2002] Marrafa, P. (2002). Portuguese Wordnet: general architecture and internal semantic relations. DELTA, 18:131–146.

[Maziero et al., 2008] Maziero, E. G., Pardo, T. A. S., Felippo, A. D., and Dias-da-Silva, B. C. (2008). A Base de Dados Lexical e a Interface Web do TeP 2.0 - Thesaurus Eletrônico para o Português do Brasil. In VI Workshop em Tecnologia da Informação e da Linguagem Humana (TIL), pages 390–392.

[Michiels et al., 1980] Michiels, A., Mullenders, J., and Noël, J. (1980). Exploiting a large data base by Longman. In Proceedings of the 8th conference on Computational Linguistics, COLING'80, pages 374–382, Morristown, NJ, USA. ACL Press.

[Pianta et al., 2002] Pianta, E., Bentivogli, L., and Girardi, C. (2002). MultiWordNet: developing an aligned multilingual database. In 1st International Conference on Global WordNet.

[Rodrigues et al., 2012] Rodrigues, R., Gonçalo Oliveira, H., and Gomes, P. (2012). Uma abordagem ao Págico baseada no processamento e análise de sintagmas dos tópicos. Linguamática, 4(1):31–39.

[Santos et al., 2010] Santos, D., Barreiro, A., Freitas, C., Gonçalo Oliveira, H., Medeiros, J. C., Costa, L., Gomes, P., and Silva, R. (2010). Relações semânticas em português: comparando o TeP, o MWN.PT, o Port4NooJ e o PAPEL. In Textos seleccionados. XXV Encontro Nacional da Associação Portuguesa de Linguística, pages 681–700. APL, Lisboa, Portugal.

[Santos and Bick, 2000] Santos, D. and Bick, E. (2000). Providing Internet access to Portuguese corpora: the AC/DC project. In Proceedings of 2nd International Conference on Language Resources and Evaluation, LREC 2000, pages 205–210.

[Simões and Farinha, 2011] Simões, A. and Farinha, R. (2011). Dicionário Aberto: Um novo recurso para PLN. Vice-Versa, pages 159–171.

[Teixeira et al., 2010] Teixeira, J., Sarmento, L., and Oliveira, E. (2010). Comparing verb synonym resources for Portuguese. In Proceedings of Computational Processing of the Portuguese Language, 9th International Conference, PROPOR 2010, volume 6001 of LNAI, pages 100–109. Springer.

[van Assem et al., 2006] van Assem, M., Gangemi, A., and Schreiber, G. (2006). RDF/OWL representation of WordNet. W3C Working Draft, World Wide Web Consortium.

[Vossen, 1997] Vossen, P. (1997). EuroWordNet: a multilingual database for information retrieval. In Proceedings of DELOS workshop on Cross-Language Information Retrieval, Zurich.

The end

Thank you! Check http://ontopt.dei.uc.pt