Zero-shot information extraction to enhance a knowledge graph describing silk textiles


Academic year: 2022


Weaving our past into the future

LaTeCH-CLfL 2021 @ EMNLP

Zero-Shot Information Extraction to Enhance a Knowledge Graph Describing Silk Textiles

Thomas Schleider and Raphael Troncy

[email protected] / [email protected]

A Knowledge Graph of European Silk Cultural Heritage

❏ We set up an ETL pipeline:

❏ that crawls records from museum websites and APIs (19 museums, 39,274 unique objects related to silk)

❏ that uses the CIDOC-CRM ontology and an expert-made thesaurus describing silk textile production

❏ that relies on a manual mapping between the museum fields and the classes and properties of the ontology to build a global RDF graph

❏ that further enriches the graph using external sources (Wikidata, Geonames, Getty AAT, etc.)
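The mapping step of such a pipeline can be sketched as follows. This is a minimal illustration only: the museum field names, the `BASE` namespace and the field-to-property table are hypothetical, and values are kept as plain strings, whereas a real graph would link them to thesaurus concepts. The three CIDOC-CRM properties used (P102 has title, P45 consists of, P2 has type) exist in the ontology, but whether the pipeline uses exactly these is an assumption.

```python
# Hypothetical sketch: map one crawled museum record to RDF-style triples.
# Field names, BASE and FIELD_TO_PROPERTY are assumptions, not project code.

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
BASE = "http://example.org/silk/"  # placeholder namespace

# Manual mapping from (hypothetical) museum fields to CIDOC-CRM properties.
FIELD_TO_PROPERTY = {
    "title": CRM + "P102_has_title",
    "material": CRM + "P45_consists_of",
    "type": CRM + "P2_has_type",
}


def record_to_triples(record):
    """Return (subject, predicate, object) tuples for one museum record."""
    subject = BASE + "object/" + record["id"]
    triples = []
    for field, prop in FIELD_TO_PROPERTY.items():
        value = record.get(field)
        if value:  # empty or missing fields produce no triple (a metadata gap)
            triples.append((subject, prop, value))
    return triples


# Invented record with an empty "type" field.
record = {"id": "0001", "title": "Textile fragment", "material": "silk", "type": ""}
triples = record_to_triples(record)
for triple in triples:
    print(triple)
```

Fields that are empty in the source record simply yield no triple, which is the kind of gap the research questions below aim to fill.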

❏ Research Questions:

❏ How do we extract information from rich textual descriptions to complement the structured data (categorical values) available in museum records?

❏ Can we use structured data to predict metadata gaps?

Baselines: Classical Supervised Approaches with CNN

❏ Models trained on textual descriptions or images, respectively, together with the properties they need to learn to infer

Text: 4 TextCNN models, with different weights per semantic property, trained using the textual description of the object and pre-trained aligned multilingual fastText embeddings.

Images: ResNet-152-based multi-task CNN with 1 shared FC layer of 100 nodes and 5 output branches, 1 for each semantic property. It is trained on pairs of image and semantic property.

❏ Evaluation by holding out the property that is being predicted
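One way to read the hold-out evaluation is sketched below: for records where the property is known, the value is hidden, predicted from the remaining fields, and compared with the hidden value. The record layout, the function names and the keyword-based predictor are placeholders for illustration; the actual baselines are the trained CNN models described above, and the exact split they use is not stated here.

```python
def hold_out_accuracy(records, prop, predict):
    """Hide `prop` in each labelled record, predict it, and compare."""
    labelled = [r for r in records if r.get(prop)]
    correct = 0
    for record in labelled:
        # The model only sees the fields other than the held-out property.
        visible = {k: v for k, v in record.items() if k != prop}
        if predict(visible) == record[prop]:
            correct += 1
    return correct / len(labelled) if labelled else 0.0


def keyword_predictor(visible):
    # Placeholder model: a trained classifier would go here.
    return "velvet" if "velvet" in visible["description"].lower() else "damask"


# Invented records; the last one has no known technique and is skipped.
records = [
    {"description": "Red velvet panel", "technique": "velvet"},
    {"description": "Damask with floral pattern", "technique": "damask"},
    {"description": "Silk fragment", "technique": "embroidery"},
    {"description": "Chasuble", "technique": None},
]
accuracy = hold_out_accuracy(records, "technique", keyword_predictor)
print(f"accuracy on held-out property: {accuracy:.2f}")
```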

Evaluation

Conclusion & Future Work

❏ Promising results for the ZSL method to enrich a domain specific knowledge graph

❏ More fine-grained class predictions are possible without requiring any training data

❏ Several limitations:

❏ Mapping between ConceptNet concepts with an appropriate topic neighbourhood and SILKNOW concepts is not always easy or even possible

❏ Supervised methods still generally perform better

❏ A hybrid approach through bootstrapping is possible for future improvements

https://github.com/silknow/ZSL-KG-silk

References

Schleider et al. The SILKNOW Knowledge Graph. Semantic Web Journal, Special Issue on Cultural Heritage and Semantic Web (submitted), 2021.

I. Harrando & R. Troncy. Explainable Zero-Shot Topic Extraction Using a Common-Sense Knowledge Graph. In 3rd International Conference on Language, Data and Knowledge (LDK), 2021.

See also https://zeste.tools.eurecom.fr/

Paper ID 44

Contribution: Zero-Shot Learning Text Classification

❏ Unsupervised approach based on the ConceptNet common-sense knowledge graph

❏ Requires establishing mappings between the targeted classes and the concepts defined in ConceptNet

❏ For each of these concepts, a “topic neighbourhood” is created with related candidate words

❏ A score for each label is computed based on the content of the input text

❏ The output is given together with a confidence score. The prediction (classification) is explainable, as the topic neighbourhood and the most important words that led to the prediction can be displayed.
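The steps above can be illustrated with a toy sketch. It is not the authors' implementation: `TOY_GRAPH` is an invented stand-in for ConceptNet, the path-product scoring is an assumed simplification, and the confidence is just the winning label's share of the total score.

```python
from collections import deque

# Toy stand-in for ConceptNet: node -> {neighbour: relatedness weight}.
# Every edge and weight here is invented for illustration.
TOY_GRAPH = {
    "velvet": {"pile": 0.8, "soft": 0.6, "fabric": 0.5},
    "embroidery": {"needle": 0.8, "thread": 0.7, "stitch": 0.9},
    "pile": {"cut": 0.5},
    "thread": {"gold": 0.4},
}


def topic_neighbourhood(label, graph, max_hops=2):
    """Collect words within max_hops of the label, each with a score."""
    scores = {label: 1.0}
    queue = deque([(label, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour, weight in graph.get(node, {}).items():
            score = scores[node] * weight  # relatedness decays along the path
            if score > scores.get(neighbour, 0.0):
                scores[neighbour] = score
                queue.append((neighbour, hops + 1))
    return scores


def classify(text, labels, graph):
    """Return (best label, confidence, words that supported the prediction)."""
    tokens = text.lower().split()
    results = {}
    for label in labels:
        neighbourhood = topic_neighbourhood(label, graph)
        # Each distinct token found in the neighbourhood counts once.
        hits = {t: neighbourhood[t] for t in tokens if t in neighbourhood}
        results[label] = (sum(hits.values()), hits)
    best = max(results, key=lambda name: results[name][0])
    total = sum(score for score, _ in results.values())
    confidence = results[best][0] / total if total else 0.0
    return best, confidence, results[best][1]


best, confidence, evidence = classify(
    "Cut pile fabric with a soft surface", ["velvet", "embroidery"], TOY_GRAPH
)
print(best, confidence, evidence)
```

The returned `evidence` dictionary is what makes the prediction explainable in this sketch: it lists the input words that fell inside the winning label's neighbourhood, with their scores.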

| Approach | Language | Class | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| ZSL | ES | Embroidery | 89.40% | 6% | 12.3% |
| ZSL | ES | Satin | 3.5% | 61.5% | 18.8% |
| ZSL | EN | Damask | 52.0% | 33.3% | 40.6% |
| ZSL | EN | Embroidery | 32.4% | 86.7% | 47.1% |
| ZSL | EN | Velvet | 61.6% | 50.0% | 55.2% |
| ZSL | EN | Tabby | 1.5% | 50.0% | 2.9% |
| ZSL | EN | Satin | 77.4% | 28.5% | 41.6% |
| CNN - Image | EN/ES/FR/IT | Embroidery | 83.9% | 83.2% | 83.6% |
| CNN - Image | EN/ES/FR/IT | Tabby | 17.4% | 30.8% | 22.3% |
| CNN - Image | EN/ES/FR/IT | Velvet | 70.1% | 67.8% | 68.9% |
| CNN - Text | EN/ES/FR/IT | Damask | 98% | 90% | 94% |
| CNN - Text | EN/ES/FR/IT | Embroidery | 96% | 98% | 97% |
| CNN - Text | EN/ES/FR/IT | Velvet | 95% | 84% | 89% |

Excerpt of results for the technique property across approaches
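As a reading aid for the F1-score column, the sketch below recomputes F1 as the harmonic mean of precision and recall for three of the English ZSL rows. That the reported scores use this standard per-class definition is an assumption; values recomputed from rounded percentages can also differ from the reported ones in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (approach, language, class) -> (precision, recall), copied from the table.
rows = {
    ("ZSL", "EN", "Damask"): (52.0, 33.3),
    ("ZSL", "EN", "Velvet"): (61.6, 50.0),
    ("ZSL", "EN", "Tabby"): (1.5, 50.0),
}
for key, (precision, recall) in rows.items():
    print(key, round(f1_score(precision, recall), 1))
```

The Tabby row shows why F1 is reported alongside recall: a recall of 50.0% combined with a precision of 1.5% still yields a very low F1.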

Confusion matrix for the English results of ZSL for technique. The left side shows the true labels, the bottom the predictions. A brighter colour indicates higher agreement. “Embroidery” and “velvet” are correctly predicted the most. “Tabby”, however, is almost never predicted, not even mistakenly, so the whole row is completely dark.

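| Property | KG Concept | ConceptNet |
|---|---|---|
| Material | Vegetal Fibre | /c/es/vegetal |
| Material | Gold Thread | /c/es/oro, /c/en/gold |
| Technique | Embroidery | /c/es/bordado, /c/en/embroidery |
| Technique | Velvet | /c/es/terciopelo, /c/en/velvet |
| Technique | Voided Velvet | /c/en/velvet |
| Technique | Tabby (silk weave) | /c/en/tabby |
| Technique | Louisine | /c/es/tafetán |
| Technique | Satin (Fabric) | /c/es/raso, /c/en/satin |
| Technique | Brocaded | /c/en/brocaded |
| Depiction | Vegetal Motif | /c/es/planta |
| Depiction | Floral Motif | /c/es/flor |
| Depiction | Fleur-de-lis | /c/es/flor |
| Depiction | Rhombus | /c/es/geometría |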
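A subset of this mapping can be written as a small data structure, as in the sketch below. The dictionary layout and the helper names `parse_uri` and `shared_targets` are illustrative choices, not part of the project code. The second helper lists ConceptNet nodes that several KG concepts map to, one plausible reason why the mapping is not always easy.

```python
# Subset of the mapping table, keyed by (property, KG concept).
MAPPING = {
    ("Material", "Gold Thread"): ["/c/es/oro", "/c/en/gold"],
    ("Technique", "Velvet"): ["/c/es/terciopelo", "/c/en/velvet"],
    ("Technique", "Voided Velvet"): ["/c/en/velvet"],
    ("Depiction", "Floral Motif"): ["/c/es/flor"],
    ("Depiction", "Fleur-de-lis"): ["/c/es/flor"],
}


def parse_uri(uri):
    """Split a ConceptNet URI such as /c/en/velvet into (language, term)."""
    _, kind, language, term = uri.split("/", 3)
    if kind != "c":
        raise ValueError(f"not a concept URI: {uri}")
    return language, term


def shared_targets(mapping):
    """Return ConceptNet URIs that more than one KG concept maps to."""
    owners = {}
    for (_, concept), uris in mapping.items():
        for uri in uris:
            owners.setdefault(uri, []).append(concept)
    return {uri: names for uri, names in owners.items() if len(names) > 1}


print(parse_uri("/c/es/terciopelo"))
print(shared_targets(MAPPING))
```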

Record 37.80.1 from the MET museum, as represented in the knowledge graph. The technique used to produce this textile is unknown.
