• Aucun résultat trouvé

Topic Modeling without Generative Probabilistic Model: An Approach and its Validation

N/A
N/A
Protected

Academic year: 2022

Partager "Topic Modeling without Generative Probabilistic Model: An Approach and its Validation"

Copied!
1
0
0

Texte intégral

(1)

Topic Modeling without Generative Probabilistic Model: An Approach and its Validation

?

Mikhail G. Kreines, Elena M. Kreines

BaseTech LLC, Raketny boulevard, 11-7-164, Moscow, 129366, Russia [email protected]

Abstract. We propose a novel approach to compute a model of seman- tic structure of natural language texts corpora based on the models of the texts. This approach differs from LSI and standard ways of topic modeling (D. Blei, 2012). We do not use matrix factorization and prob- abilistic distributions of the words given a priory. As the base for the corpora model construction we use the set of the original vector models of the corpora texts. The ways of empirical validation and applications of the models are considered.

Keywords: natural languages, texts, text collections, models, Informa- tion Retrieval

?The work is partially financially supported by the Ministry of Education and Science, Russian Federation (Contract N. 14.579.21.0090 from 27.11.2014, project identifier RFMEFI57914X0090).

Références

Documents relatifs

In this work, since the primary goal of learning the models is to verify properties over the systems, we evaluate the learning algorithms by checking whether we can reliably

The linguistic component is a revised SKIP - GRAM model in which the text embeddings work in concert with the latent visual factors to explain the occurrence of word- context pairs in

Either these prac- tices are gradually recognized, named and defi ned so as to be integrated into the strategic text, or they remain invisible or exterior (perhaps because they

In the current study, the functionality of a Support Vector Machine (SVM) to detect plagiarism in Persian texts through extrinsic method has been studied.. The

The architecture of this tool is composed of four steps: i) text normalization where the retweet symbol, special characters (such as emojis) and additional white spaces are re-

ever, previous attempts to model automatically their meaning have been based on the analysis of isolated sentences and do not compute the exact scope of conditional sequences

They deal with the different kinds of definitions which the mathematicians talked about; the features of their defining processes (a focus is made on the interplay between

Unruly practice : narratives Consuming strategic texts (practical coping : resistance, bricolage, emplotment) / Producing everyday narratives (games, recipes, tales) Following