A Mapping of CIDOC CRM Events to German Wordnet for Event
Detection in Texts
Martin Scholz
Friedrich-Alexander-University Erlangen-Nürnberg Digital Humanities Research Group
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 2
Outline
●
Motivation: information extraction from free text
●
GermaNet ― a German wordnet
●
Simple mapping CRM events → GermaNet
●
Fine-grained mapping
●
Statistics & evaluation
●
Difficulties with modelling event mentions
Semantic Annotation of Free Text
●
In CH documentation, information is often encoded as free text.
●
Poorly machine-processable
●
Full text search
●
Extract information as structured data
●
Entities, events and relations
●
Poor quality with automatic extraction
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 4
Free text ― Example
Die Tafel zeigt die legendäre, drei Tage und zwei Nächte dauernde Schlacht.
Ein Engel kam zu Hilfe und führte den Sieg herbei.
Zum Dank stiftete Karl das Schottenkloster in Regensburg.
1514 entbrannte um dessen Fortbestand ein Streit zwischen Bischof, Kaiser und Stadtrat.
The panel shows the legendary, three days long battle.
An angel came to their aid and led them to victory.
In return, Karl founded the Schottenkloster in Regensburg.
In 1514, the quesion about its continued existence gave rise to a conflict
between the bishop, emperor and the city council.
Use case: WissKI
●
Semi-automatic information extraction
●
Not unsupervised
●
User annotates text
●
From annotations structured data is generated
●
Machine gives annotation proposals
●
Not only one best solution, but
●
Choice between possible/likely annotations
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 6
WissKI ― Text annotation
Text analysis pipe
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 8
Building a lexicon
●
Lexicon-based detection of events (in its core)
●
Lexicon of words that support an event class
●
Lexicon creation
●
Manual creation is time-consuming
●
No corpus to build from
●
Idea: use an intermediate lexicon for bootstrapping
GermaNet
●
A word net for German
●
Similar to Princeton Wordnet
●
Word (sense) ↔ Synset
●
Relations between synsets
●
Hyponymy, Meronymy, ...
●
~ 75k synsets, ~ 100k word senses
●
This work uses version 7 (latest version: 8)
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 10
GermaNet ― Example
GermaNet ― Example
This supports
a E67 Birth
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 12
A simple mapping
<class name="ecrm:E67_Birth">
<synset pos="v" word="gebären" sense="1" />
<synset pos="n" word="Geburt" sense="1" />
<synset pos="n" word="Geburt" sense="2" />
<synset pos="n" word="Geburt" sense="3" />
</class>
A simple mapping ― Compiled
<class name="ecrm:E67_Birth">
<word lemma="gebären" pos="v"/>
<word lemma="entbinden" pos="v"/>
<word lemma="laichen" pos="v"/>
...
<word lemma="Geburt" pos="n"/>
<word lemma="Drillingsgeburt" pos="n"/>
<word lemma="Brut" pos="n"/>
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 14
A simple mapping ― Compiled
<class name="ecrm:E67_Birth">
<word lemma="gebären" pos="v"/>
<word lemma="entbinden" pos="v"/>
<word lemma="laichen" pos="v"/>
...
<word lemma="Geburt" pos="n"/>
<word lemma="Drillingsgeburt" pos="n"/>
<word lemma="Brut" pos="n"/>
...
</class>
GermaNet ― Example
These are not E67 Births These are not
E67 Births These do not
support
E67 Births
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 16
A simple mapping ― Problem
●
Meanings of GermaNet synsets don't match CRM events exactly
●
A synset supports an event class, but some or even all of its hyponymic synsets may not
●
Scope of CRM event is not reflected in language
●
E67 Birth / E69 Death only applies to humans
●
Personalized „actors“ / figurative meanings
●
E12 Production, E65 Creation, etc. distinguish
material ↔ immaterial
Fine-grained mapping
●
Sort out obvious false positive synsets/words
●
Exclude a synset and its descendants or
●
Exclude all descendants of a synset
●
Drawbacks
●
Creation and maintenance more complex and time-
consuming
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 18
Fine-grained mapping
<class name="ecrm:E67_Birth">
<synset pos="n" word="Geburt" sense="1">
<exclude_synset word="Brut" sense="1" />
...
</synset>
<class name="ecrm:E66_Formation">
<synset pos="n" word="Heirat" sense="1" descend="false" />
<synset pos="n" word="Liebesheirat" sense="1" />
...
Mapping statistics
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 20
Event detection evaluation
●
Lexicon-based
●
Preprocessor:
●
Detect separable verb particles
●
Postprocessor:
●
Detect light verbs
●
No word sense disambiguation (currently)
Event detection evaluation
●
Hand-made corpus
●
50 short texts from art history
●
3000 tokens, 500 event class annotations
●
Observations
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 22
Difficulties with modelling event mentions
●
Context influences how to model an event mention in text
●
Which CRM class
●
How many instances
Modelling event mentions:
Co-triggered events
●
Words primarily denoting objects may also support events
Gemälde (painting) → E12 Production
Geschenk (present / gift) → E8 Acquisition, E10 Transfer of Custody
●
Influences interpretation:
John's painting vs. John's house
●
Lexicalize to what extend?
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 24
Modelling event mentions:
CRM event or Symbolic Object
●
An event in the text may not necessarily be modelled as CRM event
●
Past events → E5 Event and subclasses
●
Future/hypothetical events
→ E55 Type or E29 Design or Procedure
Ein Engel kam zu Hilfe und führte den Sieg herbei.
An angel came to their aid and led them to victory.
Modelling event mentions:
Reclassification of events
●
Context may reclassify an CRM event supported by a word
●
Negation, interruption, alteration, lack of knowledge
●
More general CRM class
●
Document foreseen purpose as E55 Type
Der römische Feldherr Dentatus weist die Geschenke [...] zurück.
Der Beginn mit der Anfertigung [...] erfolgte [...] nach der Anmeldung.
Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 26
Modelling event mentions:
Instance(s) or class
●
How many event instances are referred to?
●
Individual
●
Collection → E5 or subclass
●
Class → E29 or E55
●