Information retrieval Query analysis

Top PDF Information retrieval Query analysis:

Query expansion in information retrieval: What can we learn from a deep analysis of queries?

Information retrieval aims at retrieving relevant documents that answer a user’s need expressed through a query. Users’ queries are generally shorter than three words, which makes a correct answer really difficult. Automatic query expansion (QE) improves precision on average, even if it can degrade the results for some queries. We propose a new automatic QE method that estimates the importance of expansion candidate terms by the strength of their relation to the query terms. The method combines local analysis and global analysis of texts. We evaluate the method using international benchmark collections and measures. We found comparable results on average compared to the Bo2 method. However, we show that a deep analysis of initial and expanded queries brings interesting insights that could help future research in the domain.
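
As a rough illustration of scoring expansion candidates by the strength of their relation to the query terms, the sketch below ranks candidates by document-level co-occurrence with the query terms. It is not the method proposed in the paper (which combines local and global analysis); the function names, the toy collection, and the use of a raw co-occurrence count as relation strength are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered term pair, the documents containing both terms."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def expand_query(query, docs, k=2):
    """Return the k non-query terms that co-occur most often with query terms."""
    query_terms = set(query.lower().split())
    scores = Counter()
    for (a, b), n in cooccurrence_counts(docs).items():
        if a in query_terms and b not in query_terms:
            scores[b] += n
        elif b in query_terms and a not in query_terms:
            scores[a] += n
    # Sort by descending score, then alphabetically, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy collection, for illustration only.
docs = [
    "query expansion improves retrieval",
    "query expansion adds related terms",
    "retrieval of relevant documents",
]
expanded = expand_query("query", docs, k=1)
```

Here `expanded` holds the single candidate that shares the most documents with the query term. A real system would typically normalise the counts and weight the added terms rather than append them blindly.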

Non-parametric motion activity analysis for statistical retrieval with partial query

As stressed previously, our approach relies on a statistical modeling of the distribution of the local motion-related measurements. It can be regarded as an extension of texture modeling for grey-level images. In our approach, local motion quantities play a role similar to grey levels for texture analysis, notwithstanding the continuous nature of the exploited motion information. Since the proposed Gibbsian modeling framework is based on cooccurrence measurements, it in fact requires motion-related observations defined over a finite set. Besides, to ensure a feasible comparison of motion content between different videos, we introduce in practice a quantization of the continuous motion measurements within a predefined bounded interval. Another reason to fix a limit beyond which these local motion measurements are no longer regarded as usable is that gradient-based motion measurements are known to be valid only for rather small motion magnitudes. Typically, sampling within [0, 4] on 16 levels proved accurate enough in our previous work [12, 15]. We also investigated logarithmic quantization, but it did not prove relevant in our experiments. Let Λ be the discretized range of values for {v_obs(p)}.
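
The quantization step described above can be sketched as a uniform mapping from [0, 4] onto 16 integer levels. This is only a minimal sketch: the function name is hypothetical, the bins are assumed to be of equal width, and values outside the interval are clipped here, whereas the excerpt only says they are no longer regarded as usable.

```python
def quantize(value, lower=0.0, upper=4.0, levels=16):
    """Map a continuous measurement to an integer level in [0, levels - 1].

    Assumption: out-of-range values are clipped to the interval bounds.
    """
    clipped = min(max(value, lower), upper)
    step = (upper - lower) / levels  # width of one bin (0.25 by default)
    # The upper bound itself would fall into bin `levels`, so cap the index.
    return min(int((clipped - lower) / step), levels - 1)


# The set of possible outputs plays the role of the discretized range.
discretized_range = list(range(16))
```

With the default settings, a measurement of 1.0 falls into level 4 and anything at or above 4.0 falls into level 15.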

Statistical Analysis to Establish the Importance of Information Retrieval Parameters

of WordNet senses for query terms [Mothe and Tanguy, 2005]), term discrimination (IDF [Jones, 1972]), document list homogeneity (NQC [Shtok et al., 2009]), and result list divergence (QF [Zhou and Croft, 2007]). Clustering queries. In this paper, we use query difficulty predictors as features for query clustering. We then learn a model to associate a query cluster with the relevant system; in this way, any new query that can be associated with an existing cluster can be processed by the corresponding system. In [He and Ounis, 2003] query features were also used to cluster TREC topics. The best term-weighting schema in terms of precision/recall measures was then associated with each topic cluster. Different term-weighting schemas were applied depending on query characteristics. On the TREC Robust benchmark, this method improved results on poorly performing queries. Kurland [Kurland, 2009] considered query-specific clusters for re-ranking the retrieved documents. Clustering queries has also been studied in the web context, based on query logs [Wen et al., 2002]. Bigot et al. [Bigot et al., 2011] showed how to group queries by their difficulty, using TREC submission results for training. A new query for the system, according to its type, was then assigned to the right search engine. Mothe and Tanguy [Mothe and Tanguy, 2007] also showed that queries can be clustered according to their linguistic features and that these features are correlated with system performance. In addition, combining various predictors for a better correlation with query performance [Kurland et al., 2012], [Chifu, 2013] also suggested that query difficulty predictor values could represent features to differentiate query types.
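
The idea of routing a new query to the system associated with its cluster can be sketched with a nearest-centroid rule. Everything concrete below is an assumption for illustration: the two-dimensional feature vectors (standing in for predictor values), the cluster names, the centroids, and the system names. The excerpt does not say which clustering or learning model the authors used.

```python
import math


def nearest_cluster(features, centroids):
    """Return the name of the centroid closest to the feature vector."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


# Hypothetical centroids over two predictor values, e.g. (IDF-like, NQC-like).
centroids = {"easy": (0.8, 0.2), "hard": (0.2, 0.9)}
# Hypothetical mapping from each query cluster to its best-performing system.
system_for_cluster = {"easy": "system_A", "hard": "system_B"}

new_query_features = (0.3, 0.7)
chosen_system = system_for_cluster[nearest_cluster(new_query_features, centroids)]
```

In practice the centroids would come from clustering the training queries, and the cluster-to-system mapping from the effectiveness each system obtained on them.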

How ontology based information retrieval systems may benefit from lexical text analysis

The idea is to use each approach where it excels rather than to somehow average their points of view at each step of the search process. We thus propose to first use a conceptual model for document retrieval. The relevance of documents w.r.t. a query is then computed using both semantic similarity based on the conceptual model and users’ preferences through a weight distribution over query concepts. Secondly, an explanation step, based on an original visualization system, helps users to gain insights into the results and facilitates interaction with the search engine for query reformulation. In addition to this relevance map, the user may require a more precise analysis of document relevance. Each relevant document is thus segmented to highlight all the text portions related to the query concepts. The text portions do not necessarily contain any query concept labels but rather terms that have been related to the concept lexicons in the segmentation step. Users can access a more detailed analysis of relevance at a glance, while also identifying new relevant concepts so that they can express their information needs more precisely and then reformulate their queries.

Performance Analysis of Information Retrieval Systems

The first statistical approach we used to model the data was Classification And Regression Trees (CART) [21]. An advantage of this method is that, besides predicting results, we can see which variables discriminate best by plotting the trees. Here, we are not interested in knowing exactly what the MAP value will be for one combination of parameters; we would rather like to know how good this value will be. Therefore, we need to define what a good or bad result is. To do so, we looked at the first and third quartiles of the MAP for each query group and decided that the results would be good when they were greater than the third quartile, bad when they were lower than the first quartile, and average otherwise. Trees were computed for all systems and all queries of each complexity level. Pruning was performed by setting a threshold on a complexity parameter in order to provide more interpretable trees.
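
The quartile-based labelling rule described above can be sketched as follows. The function name and the sample MAP values are hypothetical, and the quartiles are computed with Python's `statistics.quantiles` default method, which may differ slightly from the estimator the authors used.

```python
import statistics


def label_by_quartiles(map_values):
    """Label each MAP value as 'good', 'bad' or 'average' for one query group."""
    q1, _, q3 = statistics.quantiles(map_values, n=4)
    labels = []
    for value in map_values:
        if value > q3:
            labels.append("good")      # above the third quartile
        elif value < q1:
            labels.append("bad")       # below the first quartile
        else:
            labels.append("average")   # between the two quartiles, inclusive
    return labels


# Made-up MAP values for one query group, one per parameter combination.
map_values = [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875]
labels = label_by_quartiles(map_values)
```

The resulting three-class labels, rather than the raw MAP values, would then serve as the target variable when fitting the classification trees.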

Infrastructure and Algorithms for Information Retrieval Based On Social Network Analysis/Mining

There are several social content search engines, which index real-time content spreading systems. These include TwitterSearch, Social Bing, collecta [JLW+11], OneRiot [One09], etc. Social content search systems deal with a different kind of content than classic search engines. Indeed, posts and statuses published on social media are often short, frequent, and do not change after being published, while web pages are rich, generated more slowly, and evolve after creation [TRM11]. Dealing with such content is challenging because it requires real-time and recency-sensitive query processing. A recency-sensitive query is a query for which the user expects documents that are both topically relevant and fresh [DZK+10]. A study by Teevan et al. [TRM11] gives an overview of “What is the motivation behind a user to use a social content search system rather than a classic search engine?”. This study reveals that social content search systems are interrogated with queries that are shorter, more popular, and less likely to evolve as part of a session than Web queries. The main goal is to find temporally relevant information (e.g. breaking news, real-time content, and popular trends) and information related to people (e.g. content directed at the searcher, information about people of interest, and general sentiment and opinion).

Annotation of Scientific Summaries for Information Retrieval.

Keywords. Corpus annotation, discourse structure analysis, automatic summarization, document ranking, term weighting. 1. Introduction The question of assisting information seekers in locating a specific category (facet) of information has rarely been addressed in the IR community due to the inherent difficulty of such a task. Indeed, efficiency and effectiveness have been the main guiding principles in building IR models and tools. Our aim here is to delve into the problem of how to assist a researcher or a specialist in rapidly accessing a specific category or class of information in scientific texts. For this, we need annotated corpora where relevant sentences are marked up with the type of information they purportedly carry. We identified eight categories of information in abstracts which can be useful in the framework of information-category driven IR: OBJECTIVE, RESULT, NEWTHING, HYPOTHESIS, FINDINGS, RELATED WORK, CONCLUSION, FUTUREWORK. These categories enable the user to identify what a paper is about and what the author's contribution to his/her field is. We adopted a surface linguistic analysis using lexico-syntactic patterns that are generic to a given language and rely on surface cues to perform sentence annotation from scientific abstracts. Once annotated, the corpus is fed into an automatic summarizer which takes into account the different semantic annotations for query-oriented document ranking and automatic summarization. The automatic summarizer used here is Enertex, developed by the LIA team at the University of Avignon (Fernández et al., 2007a). Enertex is based on neural networks (NN), inspired by statistical physics, to study fundamental problems in Natural Language Processing, like automatic summarization and topic segmentation.
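
Sentence annotation driven by surface cues can be sketched with a small set of regular expressions. The category names come from the excerpt above, but the patterns themselves are invented for illustration and are not the authors' lexico-syntactic patterns; only three of the eight categories are covered.

```python
import re

# Hypothetical surface cues, checked in order; the first match wins.
PATTERNS = [
    ("OBJECTIVE", re.compile(r"\b(we aim|our aim|the goal of this paper)\b", re.I)),
    ("RESULT", re.compile(r"\b(results? show|we found|we obtained)\b", re.I)),
    ("FUTUREWORK", re.compile(r"\b(future work|we plan to)\b", re.I)),
]


def annotate(sentence):
    """Return the first category whose cue appears in the sentence, else None."""
    for label, pattern in PATTERNS:
        if pattern.search(sentence):
            return label
    return None


objective_label = annotate("Our aim here is to delve into the problem.")
unlabelled = annotate("Enertex is an automatic summarizer.")
```

Sentences that match no cue are left unlabelled here; a fuller rule set would also need a policy for sentences matching several categories.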

Browsing Information Retrieval System Results

... provides more than 100 measures for any run using an ad hoc collection. The analysis of the resulting evaluation files can be cumbersome because they contain one value per query and per effectiveness measure. For this reason, it is common practice to consider average effectiveness, where values are averaged over queries, which hides some potentially very interesting details and query-related results (Berto et al., 2013). In the research work we report in this paper, our goal was to propose an interface that helps researchers browse through the results of various information retrieval systems when using a reference collection, and compare them in various ways, both aggregated and detailed.
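
The point that averaging over queries hides query-level detail can be shown with a tiny made-up example: two hypothetical systems with identical mean average precision but opposite behaviour on individual queries. All names and numbers below are invented for illustration.

```python
from statistics import mean

# Made-up average precision per query for two hypothetical systems.
ap = {
    "system_A": {"q1": 0.5, "q2": 0.5},
    "system_B": {"q1": 1.0, "q2": 0.0},
}

# Aggregated view: one number per system.
mean_ap = {system: mean(scores.values()) for system, scores in ap.items()}

# Detailed view: per-query difference between the two systems.
per_query_gap = {
    query: ap["system_A"][query] - ap["system_B"][query]
    for query in ap["system_A"]
}
```

Both systems have the same `mean_ap`, yet `per_query_gap` shows that each one wins clearly on a different query, which is the kind of detail a browsing interface can surface.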

Health consumer-oriented information retrieval

external knowledge have yielded disappointing results. Indeed, the global benefits of the three query expansion strategies are limited, even though they appear very interesting for particular queries. These mixed results are similar to those of existing studies on query expansion for general language [17]. We plan to study further how to exploit biomedical terminologies in IR tasks. A detailed analysis of the results may lead to better ways of choosing which terms to consider in the queries, and which synonyms of these terms to add to the query. Incorporating terminological knowledge during the indexing step is also a promising avenue but raises computational issues.

On the collaboration support in Information Retrieval

The model is evaluated through a collaboration simulation-based framework, as done in [25]. The model proves effective for both relative and absolute expertise within the group, in other words, both (a) when users have different levels of expertise without necessarily being identified as expert or novice and (b) when users are clearly labeled as expert and novice. A deeper analysis at the role level shows that the model is able to improve the search experience of novices by displaying document lists that become more specific over the search session. In another model relying on a horizontal distinction between collaborators’ domain expertise levels, [119] propose to support collaboration within a group of domain experts aiming at solving a multi-faceted information need. This approach leverages users’ different knowledge expertise and assigns them implicit knowledge-based roles towards at least one query facet. These facets are modeled through document and user topical-based representations using the LDA generative model [8]. The proposed algorithm includes two main steps. The first one estimates the document relevance according to each expert with respect to his/her facet expertise and the shared information need. For this purpose, the authors combine (a) the document relevance probability p(π(u_j) | d_i)

A Possibilistic Query Translation Approach for Cross-Language Information Retrieval

In fact, query translation approaches need training and matching models which compute the similarities (or the relevance) between words and their translations. Existing models for query translation in CLIR are based on poor, uncertain and imprecise data. While probabilistic models are unable to deal with this type of data, possibility theory applies naturally to this kind of problem [8]. Thus, we propose a possibilistic approach for query translation derived from a probabilistic one using a probability/possibility transformation [6]. This approach begins with a query analysis step, then a lexical analysis step, and finally the selection of the best translation using different similarity measures.
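
To make the notion of a probability/possibility transformation concrete, the sketch below implements one classic transformation, commonly attributed to Dubois and Prade, in which the possibility of an outcome is the total probability of all outcomes that are no more probable than it. The excerpt does not say which transformation [6] defines, so this choice, the function name, and the toy translation probabilities are assumptions.

```python
def probability_to_possibility(probs):
    """Map a probability distribution to a possibility distribution.

    pi(x) is the sum of p(y) over all outcomes y with p(y) <= p(x), so the
    most probable outcome always receives possibility 1.
    """
    return {
        x: sum(q for q in probs.values() if q <= p)
        for x, p in probs.items()
    }


# Made-up translation probabilities for one source word.
translation_probs = {"bank": 0.5, "shore": 0.25, "bench": 0.25}
translation_poss = probability_to_possibility(translation_probs)
```

Equally probable candidates receive the same possibility degree, and the ranking of candidates is preserved, which is what a translation-selection step would rely on.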

Bayesian models for visual information retrieval

First, it is based on a universal recognition language (the language of probabilities) that provides a computational basis for the integration of information from mult[r]

Private Information Retrieval from Transversal Designs

B. Motivations and results As pointed out by Yekhanin [20], “the overwhelming computational complexity of PIR schemes (...) currently presents the main bottleneck to their practical deployment”. Consider a public database which is frequently queried, e.g. a database storing stock exchange prices where private queries could be very relevant. Fast retrieval is crucial in this context. Hence, one cannot afford each run of the PIR protocol to be computationally inefficient, for instance Ω(k) if k is the size of the database. Therefore, a relevant goal is to build PIR protocols with sublinear computational complexity in the length of the database stored by each server.
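
To see where linear server-side cost comes from, here is a minimal sketch of the textbook two-server XOR-based PIR scheme for a database of bits. It is not the transversal-design construction of the paper; it only illustrates that each server touches every database index per query. Function names are hypothetical, and the two servers are assumed not to collude.

```python
import secrets


def server_answer(database, subset):
    """XOR of the database bits whose indices are in the subset.

    The loop visits every index, so the work is linear in the database size.
    """
    answer = 0
    for index in range(len(database)):
        if index in subset:
            answer ^= database[index]
    return answer


def retrieve_bit(database, i):
    """Privately fetch bit i using two non-colluding servers holding `database`."""
    k = len(database)
    # A uniformly random subset of indices goes to the first server.
    subset_1 = {j for j in range(k) if secrets.randbits(1)}
    # The second server gets the same subset with index i flipped in or out.
    subset_2 = subset_1 ^ {i}
    # All bits except bit i cancel out in the XOR of the two answers.
    return server_answer(database, subset_1) ^ server_answer(database, subset_2)


database = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [retrieve_bit(database, i) for i in range(len(database))]
```

Each subset on its own is uniformly random, so neither server learns which index was requested, but both must scan the whole database, which is the cost the excerpt argues against.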

3D shape retrieval using uncertain semantic query: a preliminary study

Figure 1: Overview - we propose a process to go from a semantic request to a suitable 3D-shape that can be used in a retrieval machine in order to retrieve 3D-shapes matching the semantic request. We present a way to generate suitable 3D-shapes and examples of machine filters. precision on his request by selecting matching shapes when a concept is unknown. The idea is to generate several instances of the classes, based on several uncertainties in the definition of the class (see section 4.1). This is the first contribution of this paper. Then, a filter is applied in order to keep only instances of the desired concept. This is the second contribution. This filter takes two forms: the human one (manual selection of the generated shapes that match the concept) and the learned machine one (the concept is learned from previously human-selected shapes). The basic process is that human filtering is required for any unknown concept and is used to train the machine filter. Once a reliable filter has been trained for a concept, human intervention is no longer required.

Relating Dependent Terms in Information Retrieval

Notice that the above assignment satisfies the conditions required for BPA of Dempster-Shafer theory. 4.3.3 Retrieval Model Given a query , it can also be segmented into a sequence of segments , … , . For example, the query 爱滋病防治 (prevention and treatment of AIDS) can be segmented into 爱滋病 (AIDS) and 防治 (prevention and treatment). The corresponding term sets are {爱滋病, 爱滋, 爱, 滋, 病} and {防治, 防, 治}. For a query, we assume a logical AND relation between different term sets. To simplify the notation, we will also represent the term sets of the query by ∗ , … , ∗ , and the term sets of terms , , … in a document by ∗ , ∗ , …. Then the correspondence between a document and this query can be determined by the following conditional probability using language modeling:

Passage Retrieval in Log Files: An Approach Based on Query Enrichment

http://www.lirmm.fr/~{saneifar,laurent,poncelet,mroche} Abstract. Question answering systems are considered the next generation of search engines. This paper focuses on the first step of this process, which is to search for relevant passages containing answers. Passage retrieval can be difficult because of the complexity of the data, log files in our case. Our contribution is based on the enrichment of queries by using a learning method and a novel term weighting function. This original term weighting function, used within the enrichment process, aims to assign a weight to terms according to their relatedness to the context of answers. Experiments conducted on real data show that our protocol of primitive query enrichment makes it possible to retrieve relevant passages.

User-Driven System-Mediated Collaborative Information Retrieval

Two main approaches for role mining are proposed in the literature. The first one, arising from the social network perspective, relies on the analysis of the topological properties of the users’ network through information interaction [18, 12, 19, 7]. Nowicki and Snijders [18] presented a block-structure mining method based on pair-wise graph relations using Gibbs sampling. In [12], the authors identified leaders of microblog networks by applying the PageRank algorithm to the followers network, while Pal et al. [19] proposed to apply a probabilistic clustering method that uses a set of features from microblogging practices. Henderson et al. [7] provided an unsupervised learning approach for role mining relying on the transformation of a feature-based multi-dimensional matrix where nodes represent users and edges represent information streams. The underlying objective is to select the best compression model that summarizes the users’ behavior leading to roles. Different from the first approach, the second one discovers users’ roles by looking at the similarities vs. dissimilarities between interaction data distributions among users [14]. The authors first proposed the LDA-based Author-Recipient-Topic (ART) model, which is a global role mining approach applied in a communication social network for highlighting emerging roles through users’ behavior similarities. Second, they proposed a pair-wise focused version of ART by introducing the Role-Author-Recipient-Topic (RART) model, which emphasizes the roles of the author and the recipient according to the message contents, assuming that these roles could be fixed or could evolve throughout the message writing process.
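
Identifying leaders by applying PageRank to a followers network can be sketched with plain power iteration. This is a generic PageRank sketch, not the exact procedure of [12]; the function name, the damping factor of 0.85, the fixed iteration count, and the toy network are assumptions.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank; `follows` maps each user to the users they follow."""
    users = set(follows) | {v for targets in follows.values() for v in targets}
    n = len(users)
    rank = {user: 1.0 / n for user in users}
    for _ in range(iterations):
        new_rank = {user: (1.0 - damping) / n for user in users}
        for user in users:
            targets = follows.get(user, [])
            if targets:
                # A user passes rank, in equal shares, to everyone they follow.
                share = damping * rank[user] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A user who follows nobody spreads rank evenly over all users.
                for other in users:
                    new_rank[other] += damping * rank[user] / n
        rank = new_rank
    return rank


# Hypothetical followers network: ann and bob follow carl, carl follows ann.
follows = {"ann": ["carl"], "bob": ["carl"], "carl": ["ann"]}
rank = pagerank(follows)
leader = max(rank, key=rank.get)
```

In this toy network `leader` is the user followed by both others; rank flows along "follows" edges, so being followed by well-ranked users matters more than the raw follower count.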

Influence diagrams for contextual information retrieval

Keywords: personalized information access, influence diagrams, user context 1 Introduction A key challenge in information retrieval is the use of contextual evidence within ad-hoc retrieval. Several approaches in the contextual retrieval area have explored techniques for building the user’s profile [3, 6, 2] and using it in the retrieval process [2, 7, 5, 8]. Most of these approaches employ implicit user feedback to model the related long-term interests as contexts represented by word vectors [2], classes of concepts [3] or a hierarchy of concepts [6]. Once the contexts are modeled, they are exploited in order to refine the query [2, 8], re-rank the documents [7] or model the whole retrieval process [5, 8]. The last goal is precisely our own. Our contribution is based in particular on the belief that contextual retrieval is a decision-making problem. For this reason, we propose to apply influence diagrams, which are an extension of Bayesian networks to such problems, in order to solve the hard problem of estimating the user’s relevance.