
Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?


Academic year: 2022


Roberto Navigli, University La Sapienza, Rome, Italy

[email protected]

In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is constructed automatically by integrating lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, machine translation is applied to enrich the resource with lexical information for all languages. We present experiments on new and existing gold-standard datasets that show the high quality and coverage of the resource. In a second set of experiments, we show that, when provided with a vast amount of high-quality semantic relations, knowledge-rich word sense disambiguation (WSD) algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.

The second part of the talk is devoted to analyzing cases in which BabelNet can be of help in cross-language plagiarism detection. Can a large multilingual semantic network provide hints for detecting plagiarized text? We will see examples of how and when multilingual concepts and disambiguated text can support this task.
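As a rough illustration of the idea, the sketch below maps words in two languages to shared, language-independent concept identifiers and measures how much two texts overlap at the concept level. Everything in it is an assumption made for illustration: the lexicon, the concept identifiers, and the function names are invented and are not taken from BabelNet or from the talk. Each word is given a single sense, so the disambiguation step that a real system would need is skipped.

```python
# Toy illustration only: this lexicon and its concept IDs are invented,
# not taken from BabelNet. Each (language, word) pair has exactly one sense,
# so no word sense disambiguation is performed here.
TOY_LEXICON = {
    ("en", "bank"): "c:financial_institution",
    ("en", "money"): "c:money",
    ("en", "loan"): "c:loan",
    ("en", "river"): "c:river",
    ("en", "water"): "c:water",
    ("es", "banco"): "c:financial_institution",
    ("es", "dinero"): "c:money",
    ("es", "préstamo"): "c:loan",
}


def to_concepts(text, lang):
    """Return the set of (hypothetical) concept IDs for the known words in text."""
    tokens = text.lower().split()
    return {TOY_LEXICON[(lang, t)] for t in tokens if (lang, t) in TOY_LEXICON}


def concept_overlap(a, b):
    """Jaccard similarity between two concept sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


english = to_concepts("the bank gave money as a loan", "en")
spanish = to_concepts("el banco dio dinero como préstamo", "es")
unrelated = to_concepts("the river carries water", "en")

print(concept_overlap(english, spanish))    # same concepts in both languages
print(concept_overlap(unrelated, spanish))  # no shared concepts
```

The two sentences share no surface words, yet their concept sets coincide, which is the kind of hint a multilingual semantic network could provide. With an ambiguous word such as "bank", a real pipeline would first have to choose the correct sense, which is where disambiguated text comes in.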
