Semantic modeling of collocations for lexicographic purposes

Lothar Lemnitzer, Alexander Geyken
Berlin-Brandenburgische Akademie der
Wissenschaften
CCLCC Workshop at ESSLLI 2014
Tübingen, 14 Aug 2014
14.8.14
Berlin-Brandenburgische Akademie der Wissenschaften • Jägerstrasse 22/23 • 10117 Berlin
www.bbaw.de
Content
•  Context and motivation
•  Tools, data, and theoretical framework
•  Expected impact
The BB Academy of Sciences
Founded in 1700 by Gottfried Wilhelm Leibniz
Preußische Akademie → Deutsche Akademie → AW der DDR → BB Akademie der Wissenschaften
Organised into "Clusters", e.g. Alte Welt, Preußen, Zentrum Sprache
Hosts many long-term projects (> 25 years)
Digitales Wörterbuch der
deutschen Sprache (DWDS)
Long-term project (2007 - 2024)
10 researchers (lexicographers, corpus linguists,
computational linguists)
Goal:
Bringing a dictionary of contemporary German up to date
Integrating resources into a Digital Lexical System (www.dwds.de)
•  Dictionaries, corpora, statistics
Legacy: Wörterbuch der deutschen Gegenwartssprache
•  Compiled between 1961 and 1977
•  90,000 full entries
•  ~ 230,000 usage examples and collocation groups
•  Outdated (missing: Handy, Smartphone, downloaden ...)!
Lexicographic goals - 2018
Add: ~ 20,000 full entries, 25,000 base entries
Information program for full entries includes:
•  Collocations (drawn from a word profile)
•  Citations (drawn from a corpus of contemporary German)
Approach: Intellectual work, supported by tools (one of which is the Wortprofil)
word profile („Sketch Engine“)
Authoring environment (Oxygen)
Phase 1: Testing (until 2012)
•  136 sample articles (incl. Allergie, Jeans, Handy)
•  ~ 800 collocations drawn from the word profile
Phase 2: Production (2013 – now)
•  ~ 2,500 articles written
•  ~ 5-6 collocations/entry
Projection: > 100,000 collocations for 20,000 entries
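The projection follows from simple arithmetic on the figures above. A minimal sketch in Python; the function name is purely illustrative and not part of any project tooling:

```python
def project_collocations(entries, per_entry_low, per_entry_high):
    """Return the (low, high) range of expected collocations."""
    return entries * per_entry_low, entries * per_entry_high

# Figures from the slide: 20,000 planned entries, roughly 5-6 collocations each.
low, high = project_collocations(20_000, 5, 6)
print(f"{low:,} to {high:,} collocations")  # 100,000 to 120,000 collocations
```

The lower bound matches the "> 100,000" stated on the slide.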
Requirements
To make collocational information easier for the user to handle, we need to:
•  Group the collocates according to their syntactic relation with the headword / base
•  Group the collocates according to their semantic relation with the headword / base
We need an appropriate model of collocation
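The two grouping requirements can be pictured as a two-level index. The sketch below is only an illustration: the collocates are taken from the Allergie example in this talk, while the syntactic and semantic labels are invented for the example and are not the project's actual categories.

```python
from collections import defaultdict

# (collocate, syntactic relation, semantic relation); all labels are hypothetical.
COLLOCATES_ALLERGIE = [
    ("schwer", "adjective attribute", "intensity"),
    ("heftig", "adjective attribute", "intensity"),
    ("auslösen", "verb with base as object", "causation"),
    ("leiden an", "verb with prepositional object", "experiencing"),
]

def group_collocates(rows):
    """Group collocates first by syntactic, then by semantic relation."""
    grouped = defaultdict(lambda: defaultdict(list))
    for collocate, syntactic, semantic in rows:
        grouped[syntactic][semantic].append(collocate)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {syn: dict(sem) for syn, sem in grouped.items()}

for syntactic, by_semantics in group_collocates(COLLOCATES_ALLERGIE).items():
    print(syntactic)
    for semantic, collocates in by_semantics.items():
        print(f"  {semantic}: {', '.join(collocates)}")
```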
Model of collocation
R(B,C) (binary relation, for the sake of simplicity)
Collocate: Should we group collocates according to their semantic traits? Intuition: light, red, blue ... is somehow distinct from worn out, washed out
R: Can we provide a fixed set of (semantic) relations between base and collocate (Lexical Functions)?
Base: Are we able to inherit c-relations along a hierarchy of bases?
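One way to picture R(B,C) is as a record with a slot for each of the three components. The following is a sketch under stated assumptions, not the project's data model: the class, the field names, the trait labels "colour" and "condition", and the choice of Jeans as the base for the intuition above are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Collocation:
    """One instance of R(B, C)."""
    base: str                       # B, the headword
    collocate: str                  # C
    relation: Optional[str] = None  # R, left open until a relation inventory exists
    trait: Optional[str] = None     # semantic trait of the collocate (hypothetical label)

def group_by_trait(collocations):
    """Group collocates by the semantic trait assigned to them."""
    groups = {}
    for c in collocations:
        groups.setdefault(c.trait, []).append(c.collocate)
    return groups

# Assumed base: Jeans. The trait labels are invented for illustration.
JEANS = [
    Collocation("Jeans", "light", trait="colour"),
    Collocation("Jeans", "red", trait="colour"),
    Collocation("Jeans", "blue", trait="colour"),
    Collocation("Jeans", "worn out", trait="condition"),
    Collocation("Jeans", "washed out", trait="condition"),
]

print(group_by_trait(JEANS))
```

Leaving `relation` optional reflects that the inventory of relations is still an open question at this point in the talk.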
R
Lexical Function
The concept was introduced by Mel'čuk (Meaning-Text Theory)
•  Prototypical LF: MAGN
MAGN(Raucher) = stark
MAGN(smoker) = heavy
•  "Universal" set of ~ 60 elementary lexical functions
•  Combinations of elementary functions
•  Language-specific sets of values for individual relations
•  Used for lexicographical work, e.g. DEC
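A lexical function can be read as a lookup whose name is shared across languages and whose values are language-specific. A minimal sketch, assuming a simple table keyed by function, language, and base; the table layout and the function name `apply_lf` are hypothetical:

```python
# Values of lexical functions, keyed by (function, language, base).
# The two entries are the examples from the slide.
LF_VALUES = {
    ("Magn", "de", "Raucher"): ["stark"],
    ("Magn", "en", "smoker"): ["heavy"],
}

def apply_lf(function, language, base):
    """Return the collocates recorded for `function` applied to `base`, or [] if none."""
    return LF_VALUES.get((function, language, base), [])

print(apply_lf("Magn", "de", "Raucher"))  # ['stark']
print(apply_lf("Magn", "en", "smoker"))   # ['heavy']
```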
Ex.: Allergie / allergy
The base denotes an overreaction of the immune system to something, a durable state which is caused by something
Applicable LF
MAGN(Allergie) = schwer, heftig (severe)
S0_Incep_Func(Allergie) = Entstehung, Entstehen (development)
S0_Caus_Oper(Allergie) = auslösen (cause)
Propt(Allergie) = gegen (to)
Labor(Allergie) = leiden_an (suffer_from)
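The labels above mix elementary functions with combinations of them. Assuming that the underscore in the slide's notation joins elementary functions (an assumption about this notation, not a rule of Meaning-Text Theory), the labels can be decomposed mechanically; the helper name is hypothetical:

```python
# LF values for Allergie as listed on the slide.
ALLERGIE_LFS = {
    "MAGN": ["schwer", "heftig"],
    "S0_Incep_Func": ["Entstehung", "Entstehen"],
    "S0_Caus_Oper": ["auslösen"],
    "Propt": ["gegen"],
    "Labor": ["leiden_an"],
}

def elementary_functions(label):
    """Split a label into its parts, assuming '_' joins elementary functions."""
    return label.split("_")

for label, values in ALLERGIE_LFS.items():
    parts = elementary_functions(label)
    kind = "elementary" if len(parts) == 1 else "combination of " + " + ".join(parts)
    print(f"{label}: {kind} -> {', '.join(values)}")
```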
Ex.: Jeans
The base denotes a piece of wearable textile, a garment, i.e. an artifact
Applicable LF
Ver_Pred_Minus(Jeans) = hauteng (clinging)
Magn_Ver_Pred_Minus(Jeans) = knalleng (skin-tight)
Q: What to do with schwarze Jeans (black jeans), Jeans mit Streifen (striped jeans)?
Open questions / issues with LF
•  Do the (combinations of) LF suffice to describe all collocations (to an appropriate level of specificity)?
•  Are LF better suited for some (classes of) bases (e.g. bases denoting events and states) than for others (e.g. bases denoting artifacts)?
•  Are we able to "translate" the LF labels into expressions which the average user is able to understand (following work of Polguère 2000)?
→ Broaden our view by looking into other theoretical frameworks, e.g. Generative Lexicon
B
Inheritance of collocations
Q: Are we able to model an inheritance relation for collocates along a lexical hierarchy of bases?
Starting point: GermaNet
To do: find and extract collocates which are shared by two (closely related) bases
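The "to do" step amounts to intersecting the collocate sets of two related bases. The sketch below uses toy data only: the collocate sets are invented rather than corpus results, and the hypernym link Jeans → Hose is assumed for illustration and stands in for a real GermaNet lookup, which is not called here. All function names are hypothetical.

```python
# Toy collocate sets standing in for word-profile output; not real corpus results.
COLLOCATES = {
    "Jeans": {"hauteng", "knalleng", "schwarz", "ausgewaschen"},
    "Hose": {"hauteng", "schwarz", "kurz", "lang"},
}

# Toy hypernym links standing in for a GermaNet lookup (assumed, not verified).
HYPERNYM = {"Jeans": "Hose"}

def shared_collocates(base_a, base_b, collocates):
    """Return the collocates that two bases have in common."""
    return collocates.get(base_a, set()) & collocates.get(base_b, set())

def inheritable(base, collocates, hypernym):
    """Collocates a base shares with its hypernym: candidates for inheritance."""
    parent = hypernym.get(base)
    if parent is None:
        return set()
    return shared_collocates(base, parent, collocates)

print(sorted(inheritable("Jeans", COLLOCATES, HYPERNYM)))  # ['hauteng', 'schwarz']
```

Collocates that remain specific to the hyponym (here "knalleng", "ausgewaschen") would still have to be recorded at the hyponym itself.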
Ex. 1: co-hyponyms
Ex.: hyponym / hypernym
Expected impact
Large set of manually selected and annotated collocations, which could serve as a starting point for building multilingual resources
A tool for generating (contrastive) collocational profiles (web service, adaptable to other languages)
The model should be transferable to other lexical-semantic resources → networks of syntagmatic relations
This is work in progress, with the risk of failure; there are still more questions than answers
Thank you for your attention
http://www.dwds.de
Correspondence: [email protected]