Authorship Verification with Entity Coherence and Other Rich Linguistic Features Notebook for PAN at CLEF 2013
Texte intégral
Documents relatifs
These representative words are selected through the tool “SubDue” (des- cribed in Section 2.2) in order to construct a representation of the whole documents written by the given
The simplest approach we prepared is purely based on compression distances between the unknown document on the one hand and the reference documents and collected documents on the
Abstract In this work, we explore how well can we perform the PAN‘13 author- ship identification task by using correlation of frequent significant features found in documents written
Authorship verification problem is a type of authorship attribution problem, in which given a set of documents written by one author, and a sample document, one is to answer
Hence, if several statistical models had been created using documents that belong to different classes and cross-entropies are calculated for an unknown text on the basis of each
We used the following information retrieval tech- nique, suggested by Koppel and Winter (2013), to generate the impostors: We chose a few (3-4) seed documents and randomly chose
Having extracted an author profile P (A) for the unknown document and a profile P(B) for the set of known author documents, the distance between these profiles is calculated using
The frequencies of word class, positive word, negative words, pronoun, punctuation and foreign words have been normalized by dividing the total number of words present in