• Aucun résultat trouvé

Semantic is Beautiful: Clustering and Diversifying Search Results with Graph-based Word Sense Induction

N/A
N/A
Protected

Academic year: 2022

Partager "Semantic is Beautiful: Clustering and Diversifying Search Results with Graph-based Word Sense Induction"

Copied!
1
0
0

Texte intégral

(1)

Semantic is beautiful: clustering and diversifying search results with graph-based

Word Sense Induction

Roberto Navigli

Department of Computer Science, Sapienza University of Rome [email protected]

Abstract: Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of pos- sibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus taking into account the language ambiguity issue. However, Web clustering methods typically rely on some notion of textual similarity of search results. As a result, text snippets with no word in common tend to be clus- tered separately, even if they share the same meaning.

In this talk, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Key to our approach is to first acquire the senses (i.e., meanings) of a query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on datasets of ambiguous queries, show that our approach outperforms both Web clustering and search engines in the clustering and diversification of search results.

Références

Documents relatifs

Collection: Collection used as background knowledge, # Docs: Number of documents retrieved to build the co-occurrence graph, Filter: Type of filter used to filter out words in text..

Finally, two distinct word senses could not theoretically lead to a single lin- guistic mental image of an action, since two different meanings are likely to identify two

We write the conditional posterior distribution over topics for global context word token i in in- stance d as Pr(t (i) g = j | d, t − i , s, · ), where t (i) g = j is the

Also, when semantic relations are involved allow- ing multiple alignments increases the overall results (which is in line with the relatively high number of 1:n alignments in the

In our approach we thus rely on a) Sense tagged corpora to obtain contextualized sense representations. The objec- tive of which is to capture sense relations and interactions

In this paper we introduce a new method for constructing vectors of ambiguous words occurrences for word sense induction based on the recently introduced model

In monolingual contextual methods of sense induction, information used for clustering comes from the context of the occurrences of a polysemous word and the resulting

The evaluation results show that PageRank-based sense induction achieves interesting results when compared to state-of-the-art content-based algorithms in the context of Web