Detailed Contents

Academic year: 2021

Contents . . . i

Acknowledgements . . . iii

List of Figures . . . v

List of Tables . . . vii

List of Abbreviations . . . ix

Introduction . . . 1

1 Motivation . . . 3

2 Objectives . . . 7

2.1 Research questions . . . 8

2.2 Method . . . 10

2.3 Use case . . . 12

3 Structure . . . 15

I Information Extraction . . . 19

1 Background . . . 21

1.1 Natural language processing . . . 21

1.2 Information extraction . . . 24

1.3 Related fields . . . 26

2 Named-Entity Recognition . . . 29

2.1 Task definition . . . 29

2.2 Typologies . . . 31

2.2.1 Named-entity typology . . . 31

2.2.2 Types of NER systems . . . 34

2.3 Entity ambiguity and disambiguation . . . 35

2.3.1 Synonymy . . . 36

2.3.2 Homonymy and polysemy . . . 37

2.3.3 Metonymy . . . 38

2.3.4 Disambiguation . . . 38

3 Relations and Events . . . 40

3.1.1 Typology of relations . . . 42

3.1.2 Relation detection systems . . . 43

3.2 Event extraction and temporal analysis . . . 44

3.2.1 Event annotation . . . 46

3.2.2 Event extraction systems . . . 46

3.3 Template filling . . . 48

II Semantic Enrichment with Linked Data . . . 51

1 Making Sense of the Web . . . 53

1.1 The original vision . . . 54

1.2 Data structure and interoperability . . . 56

1.2.1 XML and RDF . . . 56

1.2.2 SPARQL . . . 59

1.2.3 SKOS . . . 61

1.3 From the Semantic Web to Linked Data . . . 61

2 Semantic Resources . . . 64

2.1 Knowledge bases . . . 64

2.1.1 DBpedia . . . 65

2.1.2 YAGO . . . 66

2.1.3 Freebase . . . 66

2.1.4 Wikidata . . . 67

2.1.5 ConceptNet . . . 67

2.2 Ontologies . . . 70

2.2.1 Web Ontology Language . . . 71

2.2.2 Limits of ontologies . . . 72

2.3 Identifiers . . . 73

2.3.1 Uniform resource identifiers . . . 73

2.3.2 Identifiers and locators . . . 75

3 Enriching Content . . . 76

3.1 Terminology . . . 77

3.1.1 Terms and concepts . . . 78

3.1.2 Terms and entities . . . 80

3.2 Entity linking . . . 81

3.2.1 Wikification . . . 82

3.2.2 Semantic Annotation . . . 83

3.2.3 Knowledge Base Population . . . 84

3.3 Semantic relatedness . . . 85

III The Humanities and Empirical Content . . . 89

1.1 Deterministic and empirical data . . . 92

1.2 Crossover application domains . . . 93

1.3 Specificities of the humanities . . . 95

2 Digital Humanities . . . 96

2.1 Context . . . 96

2.1.1 From humanities computing to digital humanities . . . 97

2.1.2 The era of digitisation . . . 97

2.1.3 Information extraction for cultural heritage . . . 98

2.2 Close and distant reading . . . 100

2.2.1 Close reading and New Criticism . . . 100

2.2.2 Distant reading or the end of theory . . . 101

2.2.3 Reconciling the two approaches . . . 102

2.3 Critiques . . . 103

2.3.1 Over-interpretation . . . 103

2.3.2 The Hype cycle . . . 104

2.3.3 Picking the low-hanging fruit . . . 106

3 Historische Kranten . . . 107

3.1 Structure . . . 109

3.2 Linguistic distribution . . . 112

3.2.1 Hard-coded language tag . . . 112

3.2.2 Periodical titles . . . 113

3.2.3 Language detection . . . 114

3.3 People and needs . . . 118

3.3.1 Stakeholders . . . 118

3.3.2 Field survey . . . 119

3.3.3 Specifications . . . 121

IV Quality, Language, and Time . . . 123

1 Data Quality . . . 125

1.1 Fitness for use . . . 126

1.2 Optical character recognition . . . 128

1.3 Linked Open Data . . . 131

1.3.1 owl:sameAs and identity . . . 132

1.3.2 Quality of DBpedia . . . 134

2 Multilingualism . . . 137

2.1 Language-independent information extraction . . . 138

2.1.1 Multilingual NER . . . 140

2.1.2 Other cross-lingual applications . . . 143

2.3 Multilingual corpora . . . 146

3 Language Evolution . . . 147

3.1 The generative lexicon . . . 147

3.2 Stratified timescales . . . 148

3.2.1 Application to empirical databases . . . 149

3.2.2 Application to language evolution . . . 150

3.3 Concept drift . . . 151

3.3.1 Application to place names . . . 153

3.3.2 Emergence and salience of concepts . . . 154

V Knowledge Discovery . . . 157

1 MERCKX: A Knowledge Extractor . . . 159

1.1 Similar tools . . . 160

1.1.1 DBpedia Spotlight . . . 161

1.1.2 OpenCalais . . . 162

1.1.3 AlchemyAPI . . . 163

1.1.4 Stanford NER . . . 164

1.1.5 AIDA . . . 164

1.1.6 Zemanta . . . 165

1.1.7 Babelfy . . . 165

1.2 Components . . . 166

1.2.1 Python and NLTK . . . 167

1.2.2 X-Link . . . 168

1.2.3 DBpedia dump . . . 169

1.3 Workflow . . . 171

1.3.1 Download . . . 171

1.3.2 Dictionary . . . 172

1.3.3 Tokenisation, spotting, and annotation . . . 174

2 Evaluation . . . 175

2.1 Preliminary assessment . . . 176

2.1.1 Linguistic coverage . . . 176

2.1.2 SQuaRE analysis . . . 177

2.2 Methodology . . . 179

2.2.1 Objective . . . 179

2.2.2 Metrics . . . 180

2.2.3 Corpus . . . 182

2.3 Results and discussion . . . 185

2.3.1 Quantitative analysis . . . 186

2.3.2 Qualitative analysis . . . 187

3 Validation . . . 190

3.1 Beyond search engines . . . 190

3.2 Applications . . . 192

3.2.1 Search suggestions . . . 193

3.2.2 Related resources and data visualisation . . . 195

3.3 Generalisation . . . 197

3.3.1 Other languages . . . 198

3.3.2 Other domains . . . 200

3.3.3 Other entities . . . 203

Conclusions . . . 207

1 Overview . . . 209

2 Outcomes . . . 213

2.1 Main findings . . . 213

2.2 Limitations . . . 215

2.3 Operational recommendations . . . 216

3 Perspectives . . . 218

3.1 Implementation . . . 218

3.2 Extrinsic evaluation . . . 219

3.3 Other applications . . . 220

A Source Code . . . 223

B Guidelines for Annotators . . . 225

C Follow-up . . . 226

Detailed Contents . . . 227
