Semantic modeling of collocations for lexicographic purposes
Lothar Lemnitzer, Alexander Geyken
Berlin-Brandenburgische Akademie der Wissenschaften
CCLCC Workshop at ESSLLI 2014
Tübingen, 14 Aug 2014

14.8.14 Berlin-Brandenburgische Akademie der Wissenschaften • Jägerstrasse 22/23 • 10117 Berlin www.bbaw.de

Content
• Context and motivation
• Tools, data, and theoretical framework
• Expected impact

The BB Academy of Sciences
• Founded 1700 by Gottfried Wilhelm Leibniz
• Preußische Akademie → Deutsche Akademie → AW der DDR → BB Akademie der Wissenschaften
• Organisational division into "clusters", e.g. Alte Welt, Preußen, Zentrum Sprache
• Hosts many long-term projects (> 25 years)

Digitales Wörterbuch der deutschen Sprache (DWDS)
• Long-term project (2007 - 2024)
• 10 researchers (lexicographers, corpus linguists, computational linguists)
• Goal: bringing a dictionary of contemporary German up to date
• Integrating resources into a Digital Lexical System (www.dwds.de)
  • Dictionaries, corpora, statistics

Legacy: Wörterbuch der deutschen Gegenwartssprache
• Compiled between 1961 and 1977
• 90,000 full entries
• ~ 230,000 usage examples and collocation groups
• Outdated (missing: Handy, Smartphone, downloaden, …)!

Lexicographic goals (until 2018)
• Add: ~ 20,000 full entries, 25,000 base entries
• Information program for full entries includes:
  • Collocations (drawn from a word profile)
  • Citations (drawn from a corpus of contemporary German)
• Approach: intellectual work, supported by tools (one of which is the Wortprofil)

Word profile ("Sketch Engine")

Authoring environment (Oxygen)

Phase 1: Testing (until 2012)
• 136 sample articles (incl. Allergie, Jeans, Handy)
• ~ 800 collocations drawn from the word profile

Phase 2: Production (2013 – now)
• ~ 2,500 articles written
• ~ 5-6 collocations/entry
• Projection: > 100,000 collocations for 20,000 entries

Requirements
To make collocational information easier for the user to handle, we need to:
• Group the collocates according to their syntactic relation with the headword / base
• Group the collocates according to their semantic relation with the headword / base
We need an appropriate model of collocation.

Model of collocation
R(B,C) (binary relation, for the sake of simplicity)
• Collocate: shall we group collocates according to their semantic traits? Intuition: light, red, blue, … are somehow distinct from worn out, washed out
• R: can we provide a fixed set of (semantic) relations between base and collocate (Lexical Functions?)
• Base: are we able to inherit c-relations along a hierarchy of bases?
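
One way to picture the R(B,C) model and the two grouping requirements is as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `Collocation`, its field names, the syntactic label `attr_adj` and the semantic labels `colour` and `condition` are hypothetical and not taken from the DWDS data model. The collocates are the ones from the intuition on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Collocation:
    """One instance of R(B, C): a base, a collocate, and two relation labels."""
    base: str
    collocate: str
    syntactic_relation: str                  # hypothetical label, e.g. "attr_adj"
    semantic_relation: Optional[str] = None  # hypothetical label; None = not yet classified


def group_collocates(collocations):
    """Group collocates by base, then syntactic relation, then semantic relation."""
    groups = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for c in collocations:
        groups[c.base][c.syntactic_relation][c.semantic_relation].append(c.collocate)
    return groups


# Toy data following the slide's intuition; all labels are assumptions.
jeans = [
    Collocation("Jeans", "light", "attr_adj", "colour"),
    Collocation("Jeans", "red", "attr_adj", "colour"),
    Collocation("Jeans", "blue", "attr_adj", "colour"),
    Collocation("Jeans", "worn out", "attr_adj", "condition"),
    Collocation("Jeans", "washed out", "attr_adj", "condition"),
]

grouped = group_collocates(jeans)
for semantic_label, collocates in grouped["Jeans"]["attr_adj"].items():
    print(semantic_label, collocates)
```

Keeping the semantic relation optional reflects the open question on the slide: a collocate can be recorded with its syntactic relation first and receive a semantic label later, once a fixed set of relations has been agreed on.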

R

Lexical Function
• The concept was introduced by Mel'čuk (Meaning-Text Theory)
• Prototypical LF: MAGN
  MAGN(Raucher) = stark
  MAGN(smoker) = heavy
• "Universal" set of ~ 60 elementary lexical functions
• Combinations of elementary functions
• Language-specific sets of values for individual relations
• Used for lexicographical work, e.g. DEC

Ex.: Allergie / allergy
The base denotes an overreaction of the immune system to sth., a durable state which is caused by sth.
Applicable LF:
  MAGN(Allergie) = schwer, heftig (severe)
  S0_Incep_Func(Allergie) = Entstehung, Entstehen (development)
  S0_Caus_Oper(Allergie) = auslösen (cause)
  Propt(Allergie) = gegen (to)
  Labor(Allergie) = leiden_an (suffer_from)

Ex.: Jeans
The base denotes a piece of wearable textile, a garment, i.e. an artifact.
Applicable LF:
  Ver_Pred_Minus = hauteng (clinging)
  Magn_Ver_Pred_Minus = knalleng (skin-tight)
Q: What to do with: schwarze Jeans (black), Jeans mit Streifen (striped)?

Open questions / issues with LF
• Do the (combinations of) LF suffice to describe all collocations (to an appropriate level of specificity)?
• Are LF better suited for some (classes of) bases (e.g. bases denoting events and states) than for others (e.g. bases denoting artifacts)?
• Are we able to "translate" the LF labels into expressions which the average user is able to understand (following work of Polguère 2000)?
→ Broaden our view by looking into other theoretical frameworks, e.g. Generative Lexicon

B

Inheritance of collocations
• Q: Are we able to model an inheritance relation for collocates along a lexical hierarchy of bases?
• Starting point: GermaNet
• To do: find and extract collocates which are shared by two (closely related) bases

Ex. 1: co-hyponyms

Ex.: hyponym / hypernym

Expected impact
• Large set of manually selected and annotated collocations, which could serve as a starting point for building multilingual resources
• A tool for generating (contrastive) collocational profiles (web service, adaptable to other languages)
• The model should be transferable to other lexical-semantic resources → networks of syntagmatic relations
• This is work in progress, with the risk of failure; still more questions than answers

Thank you for your attention
http://www.dwds.de
Correspondence: [email protected]
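
The to-do item from the "Inheritance of collocations" slide (finding collocates shared by two closely related bases) can be sketched as a set intersection over two word profiles. This is a minimal illustration, not the project's implementation: the function name `shared_collocates`, the relation labels and the profile for Hose (trousers) are invented for the example, and GermaNet is not queried here; the two bases are simply assumed to be co-hyponyms.

```python
def shared_collocates(profile_a, profile_b):
    """Return, per syntactic relation, the collocates found in both profiles."""
    shared = {}
    # Only relations present in both profiles can have shared collocates.
    for relation in profile_a.keys() & profile_b.keys():
        common = set(profile_a[relation]) & set(profile_b[relation])
        if common:
            shared[relation] = sorted(common)
    return shared


# Collocates of Jeans as mentioned on the slides; relation label is hypothetical.
jeans_profile = {
    "attr_adj": {"hauteng", "knalleng", "schwarz"},
}

# Invented toy profile for an assumed co-hyponym; not real corpus data.
hose_profile = {
    "attr_adj": {"hauteng", "schwarz", "kurz"},
    "verb_obj": {"tragen"},
}

candidates = shared_collocates(jeans_profile, hose_profile)
print(candidates)
```

Collocates returned by such a comparison would only be candidates for inheritance; whether they belong to a common hypernym still has to be decided by the lexicographer.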