Local n-grams for Author Identification: Notebook for PAN at CLEF 2013
Related documents
We propose as future work the following ideas: (1) add new heuristics to handle badly constructed sentences in tweets instead of ignoring them, (2) combine the proposed features
We have adopted a machine learning approach based on several representations of the texts and on optimized decision trees which take various attributes as input and which are
1) Traditional Authorship Attribution: given unknown documents and sets of known documents from different authors, the task was ... a) to denote an author for each document
In word processing, we use the average word length, the average number of unique words per sentence, and the average number of words per sentence as features (in this section, we process
Our main approach uses linear programming to find a linear combination of distances; however, we also present preliminary results with a support vector regression and neural
The Author Identification task was performed using a combination of Vector Space Model [1] (VSM) and Similarity Overlap Metric [3] (SOM) on the character n-grams extracted from
Abstract: Our work on author identification and author profiling is based on the question: Can the number and the types of grammatical errors serve as indicators for a specific
The most common framework for testing candidate algorithms is a closed-set text classification problem: given known sample documents from a small, finite set of candidate