(1)

A Mapping of CIDOC CRM Events to German Wordnet for Event Detection in Texts

Martin Scholz

Friedrich-Alexander-University Erlangen-Nürnberg Digital Humanities Research Group

(2)

Martin Scholz, FAU Erlangen-Nürnberg, 26.09.2013 2

Outline

Motivation: information extraction from free text

GermaNet ― a German wordnet

Simple mapping CRM events → GermaNet

Fine-grained mapping

Statistics & evaluation

Difficulties with modelling event mentions

(3)

Semantic Annotation of Free Text

In CH documentation, information is often encoded as free text.

Poorly machine-processable

Full text search

Extract information as structured data

Entities, events and relations

Poor quality with automatic extraction

(4)

Free text ― Example

Die Tafel zeigt die legendäre, drei Tage und zwei Nächte dauernde Schlacht.

Ein Engel kam zu Hilfe und führte den Sieg herbei.

Zum Dank stiftete Karl das Schottenkloster in Regensburg.

1514 entbrannte um dessen Fortbestand ein Streit zwischen Bischof, Kaiser und Stadtrat.

The panel shows the legendary battle, which lasted three days and two nights.

An angel came to their aid and led them to victory.

In return, Karl founded the Schottenkloster in Regensburg.

In 1514, the question of its continued existence gave rise to a conflict between the bishop, the emperor and the city council.

(5)

Use case: WissKI

Semi-automatic information extraction

Not unsupervised

User annotates text

Structured data is generated from the annotations

Machine gives annotation proposals

Not just a single best solution, but a choice between possible/likely annotations

(6)

WissKI ― Text annotation

(7)

Text analysis pipe

(8)

Building a lexicon

Lexicon-based detection of events (at its core)

Lexicon of words that support an event class

Lexicon creation

Manual creation is time-consuming

No corpus to build from

Idea: use an intermediate lexicon for bootstrapping

(9)

GermaNet

A wordnet for German

Similar to Princeton WordNet

Word (sense) ↔ Synset

Relations between synsets

Hyponymy, Meronymy, ...

~ 75k synsets, ~ 100k word senses

This work uses version 7 (latest version: 8)

(10)

GermaNet ― Example

(11)

GermaNet ― Example

This supports an E67 Birth

(12)

A simple mapping

<class name="ecrm:E67_Birth">
  <synset pos="v" word="gebären" sense="1" />
  <synset pos="n" word="Geburt" sense="1" />
  <synset pos="n" word="Geburt" sense="2" />
  <synset pos="n" word="Geburt" sense="3" />
</class>
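
As a minimal sketch of how such a mapping file could be read, the snippet below parses the XML from this slide with Python's standard library. The function name `read_mapping` and the tuple layout are assumptions made for illustration; they are not taken from the WissKI compiler.

```python
import xml.etree.ElementTree as ET

# Mapping snippet copied from the slide.
MAPPING_XML = """
<class name="ecrm:E67_Birth">
  <synset pos="v" word="gebären" sense="1" />
  <synset pos="n" word="Geburt" sense="1" />
  <synset pos="n" word="Geburt" sense="2" />
  <synset pos="n" word="Geburt" sense="3" />
</class>
"""


def read_mapping(xml_text):
    """Return the class name and its (pos, word, sense) synset references."""
    root = ET.fromstring(xml_text.strip())
    refs = [
        (s.get("pos"), s.get("word"), int(s.get("sense")))
        for s in root.findall("synset")
    ]
    return root.get("name"), refs


class_name, synset_refs = read_mapping(MAPPING_XML)
print(class_name)
for ref in synset_refs:
    print(ref)
```

Each reference identifies one GermaNet synset by part of speech, word and sense number, which is all the simple mapping needs per event class.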

(13)

A simple mapping ― Compiled

<class name="ecrm:E67_Birth">
  <word lemma="gebären" pos="v"/>
  <word lemma="entbinden" pos="v"/>
  <word lemma="laichen" pos="v"/>
  ...
  <word lemma="Geburt" pos="n"/>
  <word lemma="Drillingsgeburt" pos="n"/>
  <word lemma="Brut" pos="n"/>
  ...
</class>
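
The compilation step can be pictured as collecting the lemmas of each mapped synset and of all its hyponyms. The sketch below does this over a tiny hand-made graph; the synset ids and the tree shape in `TOY_NET` are invented for illustration and do not reproduce GermaNet's actual structure or API.

```python
from collections import deque

# Toy stand-in for GermaNet: synset id -> (pos, lemmas, hyponym synset ids).
# Ids and tree shape are invented; only the words come from the slide.
TOY_NET = {
    "gebären#1": ("v", ["gebären", "entbinden"], ["laichen#1"]),
    "laichen#1": ("v", ["laichen"], []),
    "Geburt#1": ("n", ["Geburt"], ["Drillingsgeburt#1", "Brut#1"]),
    "Drillingsgeburt#1": ("n", ["Drillingsgeburt"], []),
    "Brut#1": ("n", ["Brut"], []),
}


def compile_class(seed_synsets, net):
    """Collect (lemma, pos) pairs of the seed synsets and all their hyponyms."""
    words, seen = [], set()
    queue = deque(seed_synsets)
    while queue:
        sid = queue.popleft()
        if sid in seen:
            continue
        seen.add(sid)
        pos, lemmas, hyponyms = net[sid]
        for lemma in lemmas:
            if (lemma, pos) not in words:
                words.append((lemma, pos))
        queue.extend(hyponyms)
    return words


lexicon = compile_class(["gebären#1", "Geburt#1"], TOY_NET)
for lemma, pos in lexicon:
    print(f'<word lemma="{lemma}" pos="{pos}"/>')
```

Because every hyponym is pulled in, words such as "laichen" and "Brut" end up in the E67 Birth lexicon.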

(15)

GermaNet ― Example

These are not E67 Births

These do not support E67 Births

(16)

A simple mapping ― Problem

Meanings of GermaNet synsets don't match CRM events exactly

A synset supports an event class, but some or even all of its hyponymic synsets may not

Scope of CRM event is not reflected in language

E67 Birth / E69 Death apply only to humans

Personified "actors" / figurative meanings

E12 Production, E65 Creation, etc. distinguish material ↔ immaterial

(17)

Fine-grained mapping

Filter out obvious false-positive synsets/words

Exclude a synset and its descendants, or

Exclude all descendants of a synset

Drawbacks

Creation and maintenance are more complex and time-consuming

(18)

Fine-grained mapping

<class name="ecrm:E67_Birth">
  <synset pos="n" word="Geburt" sense="1">
    <exclude_synset word="Brut" sense="1" />
    ...
  </synset>
</class>

<class name="ecrm:E66_Formation">
  <synset pos="n" word="Heirat" sense="1" descend="false" />
  <synset pos="n" word="Liebesheirat" sense="1" />
  ...
</class>
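
One possible reading of the two exclusion mechanisms is sketched below: `exclude_synset` drops a synset together with its descendants, while `descend="false"` keeps the synset but none of its hyponyms. The graph in `TOY_NET` is invented ("Vogelbrut" and "AndereHeirat" are placeholders), and the semantics are an interpretation of the slides, not the compiler's documented behaviour.

```python
# Toy stand-in for GermaNet: synset id -> (lemmas, hyponym synset ids).
# Tree shape is invented; "Vogelbrut" and "AndereHeirat" are placeholders.
TOY_NET = {
    "Geburt#1": (["Geburt"], ["Drillingsgeburt#1", "Brut#1"]),
    "Drillingsgeburt#1": (["Drillingsgeburt"], []),
    "Brut#1": (["Brut"], ["Vogelbrut#1"]),
    "Vogelbrut#1": (["Vogelbrut"], []),
    "Heirat#1": (["Heirat"], ["Liebesheirat#1", "AndereHeirat#1"]),
    "Liebesheirat#1": (["Liebesheirat"], []),
    "AndereHeirat#1": (["AndereHeirat"], []),
}


def expand(sid, net, excluded=frozenset(), descend=True):
    """Return lemmas of sid and, if descend is set, of its non-excluded hyponyms."""
    if sid in excluded:
        return []  # an excluded synset also cuts off everything below it
    lemmas, hyponyms = net[sid]
    result = list(lemmas)
    if descend:
        for child in hyponyms:
            result.extend(expand(child, net, excluded))
    return result


# E67 Birth: Geburt with the Brut subtree excluded.
birth = expand("Geburt#1", TOY_NET, excluded={"Brut#1"})

# E66 Formation: Heirat without descendants, Liebesheirat added back explicitly.
formation = (
    expand("Heirat#1", TOY_NET, descend=False)
    + expand("Liebesheirat#1", TOY_NET)
)

print(birth)
print(formation)
```

The extra bookkeeping per synset hints at why creating and maintaining the fine-grained mapping takes more effort than the simple one.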

(19)

Mapping statistics

(20)

Event detection evaluation

Lexicon-based

Preprocessor:

Detect separable verb particles

Postprocessor:

Detect light verbs

No word sense disambiguation (currently)
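
A minimal sketch of the lexicon lookup with the separable-verb preprocessor might look as follows. It assumes tokens arrive already lemmatized and tagged with STTS tags (PTKVZ for a separated verb particle); the lexicon entry for "herbeiführen" and its class are hypothetical, and light-verb handling is left out.

```python
# Hypothetical compiled lexicon: (lemma, pos) -> supported CRM event classes.
LEXICON = {
    ("gebären", "v"): ["ecrm:E67_Birth"],
    ("Geburt", "n"): ["ecrm:E67_Birth"],
    ("herbeiführen", "v"): ["ecrm:E7_Activity"],  # assumed entry, not from the slides
}


def join_separable_verbs(tokens):
    """Attach a separated particle (tag PTKVZ) to the closest preceding full verb."""
    tokens = [dict(t) for t in tokens]  # copy so the input stays untouched
    last_verb = None
    for tok in tokens:
        if tok["tag"].startswith("VV"):
            last_verb = tok
        elif tok["tag"] == "PTKVZ" and last_verb is not None:
            last_verb["lemma"] = tok["lemma"] + last_verb["lemma"]
            last_verb = None
    return tokens


def coarse_pos(tag):
    """Map an STTS tag to the coarse pos used in the lexicon (v, n or None)."""
    if tag.startswith("V"):
        return "v"
    if tag.startswith("N"):
        return "n"
    return None


def detect_events(tokens, lexicon):
    """Return (token index, lemma, candidate classes) for every lexicon hit."""
    hits = []
    for i, tok in enumerate(join_separable_verbs(tokens)):
        classes = lexicon.get((tok["lemma"], coarse_pos(tok["tag"])))
        if classes:
            hits.append((i, tok["lemma"], classes))
    return hits


# "Ein Engel ... führte den Sieg herbei." with hand-assigned lemmas and tags.
SENTENCE = [
    {"lemma": "ein", "tag": "ART"},
    {"lemma": "Engel", "tag": "NN"},
    {"lemma": "führen", "tag": "VVFIN"},
    {"lemma": "der", "tag": "ART"},
    {"lemma": "Sieg", "tag": "NN"},
    {"lemma": "herbei", "tag": "PTKVZ"},
]

hits = detect_events(SENTENCE, LEXICON)
print(hits)
```

Since there is no word sense disambiguation, the lookup returns every class a lemma supports and leaves the choice to the annotator.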

(21)

Event detection evaluation

Hand-made corpus

50 short texts from art history

3000 tokens, 500 event class annotations

Observations

(22)

Difficulties with modelling event mentions

Context influences how to model an event mention in text

Which CRM class

How many instances

(23)

Modelling event mentions: Co-triggered events

Words primarily denoting objects may also support events

Gemälde (painting) → E12 Production

Geschenk (present / gift) → E8 Acquisition, E10 Transfer of Custody

Influences interpretation:

John's painting vs. John's house

Lexicalize to what extent?

(24)

Modelling event mentions: CRM event or Symbolic Object

An event in the text is not necessarily modelled as a CRM event

Past events → E5 Event and subclasses

Future/hypothetical events → E55 Type or E29 Design or Procedure

Ein Engel kam zu Hilfe und führte den Sieg herbei.

An angel came to their aid and led them to victory.

(25)

Modelling event mentions: Reclassification of events

Context may reclassify a CRM event supported by a word

Negation, interruption, alteration, lack of knowledge

More general CRM class

Document foreseen purpose as E55 Type

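Der römische Feldherr Dentatus weist die Geschenke [...] zurück.

Der Beginn mit der Anfertigung [...] erfolgte [...] nach der Anmeldung.

The Roman general Dentatus rejects the gifts [...].

The start of the production [...] took place [...] after the registration.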

(26)

Modelling event mentions: Instance(s) or class

How many event instances are referred to?

Individual

Collection → E5 or subclass

Class → E29 or E55

Can often only be determined by the event's arguments

Die Einzelteile der zerlegten Retabel gelangten in den Kunsthandel.

Durch den Besitz dieser Güter stellten wohlhabende Bürger [...] zur Schau.

The parts of the disassembled retable ended up in the art trade.

By possessing these goods, wealthy citizens displayed [...].

} border?

(27)

Thank you!

Mapping & Compiler:

http://wiss-ki.eu/node/167

wiss-ki.eu
