Top PDF The Image Shape Spectrum for Image Retrieval

The Image Shape Spectrum for Image Retrieval

Unité de recherche INRIA Lorraine, Technopôle de Nancy-Brabois, Campus scientifique, 615 rue du Jardin Botanique, BP 101, 54600 VILLERS LÈS NANCY Unité de recherche INRIA Rennes, Ir[r]

Learning the Shape of Image Moments for Optimal 3D Structure Estimation

Despite this state of the art, it is worth noting that the selection of a good set of image moments for 6-dof visual control or SfM is still an open problem. Ideally, one would like to find a unique set of visual features resulting in the ‘most linear’ control problem with the largest convergence domain, or in maximum observability (i.e., information gain) for a given camera displacement in the case of SfM tasks. However, to the best of our knowledge, only local, partial (e.g., depending on the particular shape of the object) or heuristic results are currently available. For instance, [2], [6], [7] propose different combinations of image moments that can only guarantee local 6-dof stability of the servoing loop around the desired pose, with a basin of attraction to be heuristically determined case by case. As for the SfM case, the choice of which moments to exploit to obtain a converging estimation of the scene structure is also not straightforward. In [8], [11] the area a

Using Neighborhood Distributions of Wavelet Coefficients for On-the-Fly, Multiscale-Based Image Retrieval

where H is the differential entropy and H x is the cross entropy. The estimation of statistical measures in the multidimensional case is hard. In particular, kernel-based methods such as Parzen estimates become ill-suited due to the sparsity of samples in high dimension (curse of dimensionality): the tradeoff between a kernel with a large bandwidth, which performs well where the local sample density is low but oversmooths the estimator, and a kernel with a smaller bandwidth, which preserves local statistical variability but results in an unstable estimator, cannot always be achieved. We instead use the kth nearest neighbor (kNN) framework [10] to compute the KL divergence. It follows the dual approach to the fixed-size kernel above: the bandwidth adapts to the local sample density by letting the kernel contain exactly k neighbors of a given sample. Moreover, it allows direct estimation of the divergence without explicitly estimating the pdf.
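The kNN idea described above can be illustrated with a small sketch. It assumes a commonly used form of the kNN divergence estimator, in which the log-ratio of the k-th neighbor distances in the two sample sets is averaged; the excerpt does not say whether reference [10] uses exactly this form, so this is an illustration and not the authors' implementation. The function names `kth_nn_distance` and `knn_kl_divergence` are hypothetical.

```python
import math


def kth_nn_distance(point, samples, k, skip_index=None):
    """Distance from `point` to its k-th nearest neighbor among `samples`.

    `skip_index` excludes one sample (used when `point` itself belongs to
    `samples`). Brute force: fine for a sketch, slow for large sets.
    """
    dists = sorted(
        math.dist(point, s) for j, s in enumerate(samples) if j != skip_index
    )
    return dists[k - 1]


def knn_kl_divergence(x, y, k=5):
    """Estimate KL(P || Q) from samples x ~ P and y ~ Q (lists of tuples).

    Assumed estimator form (not confirmed by the excerpt):
        (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1))
    where rho_k(i) is the k-th NN distance of x_i within x (excluding x_i)
    and nu_k(i) is the k-th NN distance of x_i within y.
    Assumes continuous data without duplicate points (rho_k must be > 0).
    """
    n, m, d = len(x), len(y), len(x[0])
    total = 0.0
    for i, xi in enumerate(x):
        rho = kth_nn_distance(xi, x, k, skip_index=i)  # adaptive "bandwidth" in P
        nu = kth_nn_distance(xi, y, k)                 # adaptive "bandwidth" in Q
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))
```

No density is estimated explicitly here: only neighbor distances enter the formula, which matches the direct-estimation property mentioned in the excerpt.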

On the Choice of Similarity Measures for Image Retrieval by Example

INRIA, Rocquencourt Domaine de Voluceau, BP-105 F-78153 Le Chesnay Cedex, France. Jean-Philippe.Tarel@inria.fr Sabri.Boughorbel@inria.fr ABSTRACT.[r]

Random Subwindows and Randomized Trees for Image Retrieval, Classification, and Annotation

Image Annotation. Goal: Given a training set of images with pixel-wise labelling (i.e., every pixel is labeled with one class among a finite set of predefined classes), the goal is to build a model that can accurately predict the class of every pixel of any new, unseen image.

Challenging deep image descriptors for retrieval in heterogeneous iconographic collections

By introducing the Alegoria dataset, we aim at proposing a complementary benchmark, designed for precise evaluation of robustness on a broader panel of appearance variations. These variations bring into play challenging conditions such as long-term acquisitions (multi-date contents) as well as multi-source contents (drawings, engravings, photographs, etc.) that are not widely represented in the other popular datasets and have the additional interest of bridging the cultural heritage and geographical information domains. We also generalize the content to a larger panel of geographical landscapes, including urban contents and landmarks as well as more natural landscapes such as mountains and rivers. The cathedral Notre-Dame de Paris is a good example of this complementarity: this landmark can be found in both the Alegoria and Paris datasets, the difference being in what is evaluated. On the Paris dataset, we assess the absolute performance of the retrieval method, whereas on Alegoria we can assess how the method reacts to different types of variations, including variations due to multi-date and multi-source contents.

Image Retrieval with Reciprocal and shared Nearest Neighbors

see (Jégou et al., 2008). All the experiments presented below that use this bag-of-features representation of the Holidays image set are labeled “Holidays BOF”. Many systems have tried using a more sophisticated representation of the local features extracted from an image set. Typically, aggregating local features gives better result quality. We thus felt it necessary to apply our approach to these sophisticated description schemes and evaluate the gains. We therefore computed VLAD features over the Holidays image set (Jégou et al., 2010), clustering them to 64 centroids. The experiments using the VLAD representation of Holidays are labeled “Holidays VLAD”. The other reason for using VLAD is to determine the ability of our technique to also improve quality when using lower dimensional features.
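As a rough illustration of how local features are aggregated into a VLAD vector, the sketch below assigns each local descriptor to its nearest centroid, accumulates the residuals per centroid, and L2-normalizes the concatenation. The final L2 normalization and the function name `vlad` are assumptions for illustration; the excerpt states only that 64 centroids were used and does not describe the normalization or any other implementation detail.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one VLAD vector.

    descriptors: list of d-dimensional tuples (local features of one image)
    centroids:   list of k d-dimensional tuples (the visual vocabulary)
    Returns a flat list of length k * d.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Hard-assign the descriptor to its nearest centroid.
        nearest = min(range(k), key=lambda c: math.dist(desc, centroids[c]))
        # Accumulate the residual (descriptor minus centroid).
        for j in range(d):
            acc[nearest][j] += desc[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    # Assumed global L2 normalization; left unnormalized if all residuals are 0.
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

With 64 centroids and d-dimensional local descriptors, the resulting vector has 64 × d components.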

Review of Recent Deep Learning Based Methods for Image-Text Retrieval

After that, Sarafianos et al. propose Text-Image Modality Adversarial Matching (TIMAM [34]), which adopts an Adversarial Representation Learning (ARL) framework to learn modality-invariant representations for more effective image-text matching. In the ARL framework, an adversarial discriminator, implemented as a two-layer fully-connected network, is optimized in the common space. The better discriminator pain, the better cross-modal retrieval gain. TIMAM also adds Bidirectional Encoder Representations from Transformers (BERT [3]) in front of the LSTM branch to optimize text features. At the same time, Liu et al. propose a new deep adversarial graph attention convolution network (A-GANet [23]). A-GANet extracts image features not only from the CNN branch but also from a graph attention convolution layer based on a visual scene graph. The visual scene graph carries information about object regions and relationships according to human visual perception characteristics. High-level structured semantic visual features are learned from these graph attention convolution layers. Dedicated joint embedding layers connect the image and text features through the adversarial learning module. Furthermore, Wang et al. [37] and Zhu et al. [46] use adversarial learning for matching food images and recipes.

Shape-based analysis on component-graphs for multivalued image processing

The efficiency of the component-tree first relies on its low computation cost. In this context, many efforts were made to build component-trees in quasi-linear time, in sequential [9,25] and distributed ways [26]. (The reader is referred to [27] for a recent survey.) The success of component-trees also relies on the development of efficient algorithmic processes for node selection. To cope with filtering and segmentation issues, two main approaches were developed. The first consists of minimizing an energy globally defined over the tree nodes, leading to the definition of an optimal cut [28] that can be interpreted as a segmentation of the underlying image. This approach is the basis for carrying out interactive segmentation [29]. The second consists of determining locally the nodes that should be preserved or discarded, based on attribute values [30]. This approach is formalized as an anti-extensive filtering framework [9,14], recalled in Section 4, that constitutes the methodological basis of the present work.

Shape-based analysis on component-graphs for multivalued image processing

The success of these methods relies on the development of efficient algorithmic processes for node selection. Two main approaches were developed to cope with filtering and segmentation issues. The first approach consists of minimizing an energy globally defined over the tree nodes, leading to the computation of an optimal cut [37], interpreted as a segmentation of the image. This is often formulated as an optimization problem, where the space of solutions is composed of partitions from the hierarchy (in this context, a notion of braids of partitions [39] was introduced as a general framework for the optimization of segmentation based on hierarchical partitions). This is the basis for interactive segmentation [38]. The second approach consists of determining locally the nodes that should be preserved or discarded, based on attribute values [40] stored at each node of the tree. The computed attributes are chosen according to the application context. This approach is formalized as an anti-extensive filtering framework [26, 32], recalled in Sec. 4, and constitutes the methodological basis of the present work. The subtree obtained by pruning the component-tree of the image, with respect to these attributes, can then be used to reconstruct a binary or grey-level result.

Graph laplacian for interactive image retrieval

using a linear operator learned on both the labeled and the unlabeled sets. The main contributions of this work are: (i) In contrast to existing relevance feedback methods, which rely only on the labeled set of images, our approach integrates the unlabeled data in the training process through the cluster assumption [8, 9] (as discussed in Section 3.1). These unlabeled data turn out to be very useful when only a few labeled images are available, since they allow us to favor decision boundaries located in low density regions of the image database, which are very often encountered in practice. (ii) As the second main contribution of this work, we derive a new form of the graph Laplacian which makes it possible to embed the dataset in a robust way. This graph Laplacian, based on diffusion maps, captures the conditional probabilities of transition from any sample to another along a path of a given length. Its particularity is to consider only the intermediate paths with high transition likelihoods (see Section 3.2).

An hybrid method for fine-grained content based image retrieval

descriptors presented in chapter 1 for example. No mutation was used in the GA; however, as individuals generated after crossover did not match any image in the database, a matching technique is used so that individuals always represent images. After the crossover, the image whose representation vector is closest to the generated individual is selected to be the offspring instead of the generated individual. The new individuals are then ranked to be presented to the user for evaluation before the next generation starts. The fitness function is a weighted sum of the user's evaluations given at each generation and a similarity function on the image representation, both having the same weight. Giving the same weight to those two components is questionable: a multi-objective algorithm could have been used, or more importance could have been given to the user's evaluation. Indeed, the user's preference is what matters most, even though at the beginning of the search not much information about it is available. [Lai & Chen 2011] presents the population to the user at each generation for feedback, and the process continues until the user is satisfied. The user's fatigue must be taken into account in such a system: with a population of 20 individuals and a 10-level scale for giving feedback, the user quickly feels fatigue and provides inaccurate feedback or simply stops using the system. Therefore ways to reduce the user's involvement must be found. Despite the basic representation of images and the user's fatigue, the results show that using the GA to explore the search space brought some benefit over a simple nearest neighbors search performed from the image description. Experiments were done on a small dataset with only 1000 images, so the improvements are expected to be even better on bigger datasets, in which exploration is even more important. This encourages the use of GAs to explore the search space.
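The matching step and the equally weighted fitness described above might look like the following sketch. Single-point crossover is an assumption (the excerpt does not say which crossover operator was used), Euclidean distance stands in for the unspecified similarity measure, and all function names are hypothetical.

```python
import math


def crossover(parent_a, parent_b, point):
    """Single-point crossover of two representation vectors (assumed operator)."""
    return parent_a[:point] + parent_b[point:]


def snap_to_database(vector, database):
    """Return the index of the database image closest to `vector`.

    This replaces a generated individual, which usually matches no real
    image, with the nearest actual image so that every offspring is an image.
    """
    return min(range(len(database)), key=lambda i: math.dist(vector, database[i]))


def fitness(user_score, similarity, weight=0.5):
    """Weighted sum of user evaluation and similarity.

    weight=0.5 gives both terms the same weight, as in the described system;
    raising it would give more importance to the user's evaluation.
    """
    return weight * user_score + (1.0 - weight) * similarity
```

Exposing `weight` as a parameter makes it easy to experiment with the alternative raised in the text, namely giving the user's evaluation more importance than the similarity term.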

Statistical models for deformable templates in image and shape analysis

of this model detailed in Subsection 2.3. The requirements for convergence concern the Markov chain, which has to be geometrically ergodic on any compact subset. Several choices may be made to satisfy this criterion [4, 2]. In [47], some particular representations of deformations and templates are introduced using a common finite element decomposition of the image domain. The resulting deformation and template fields have Markovian properties that facilitate sampling and accelerate algorithm convergence. When the data are observed sequentially, one can use the online EM [16]. Using a sequential algorithm is useful for several reasons. First, storing the observations throughout the process (which might be resource intensive in the case of high dimensional data) is no longer required in an online setting, because each observation is processed only once. Then, when the E-step conditional expectation is intractable, as is the case in the deformable template model, its approximation is much lighter than in a batch algorithm. Finally, in cases where the observed data evolve during the acquisition time, sequential learning provides an evolution of the trend of the parameters throughout the iterations.

Distance-based shape statistics for image segmentation with priors

In the last part a definition of the mean shape of a sample set of shapes is given, as well as the one of characteristic deformations that convey the shape variability, and then this sha[r]

An exploration of diversified user strategies for image retrieval with relevance feedback

4 Setting of the Study. 4.1 Ground Truth Databases and Description of the Visual Content. Ground truth image databases are used for evaluating the selection criteria and the user strategies described above; for every database, the ground truth consists in the definition of a set of crisp classes (mutually exclusive here), covering the entire database. For a ground truth database a user can usually find many other classes overlapping those of the ground truth, so the evaluation of a retrieval algorithm on such a database cannot be considered exhaustive, even with respect to the content of that single database. To cover a wide range of contexts, it is very important to use several databases and to have complexity differences not only among the databases, but also among classes of each database.

A New Combination Method Based on Adaptive Genetic Algorithm for Medical Image Retrieval

6 Conclusion and Future Works. The purpose of this paper is to better study the importance of combining two retrieval methods: a textual method and a conceptual method. To do this, we used the UMLS as a specific ontology for the medical domain, which allowed us to extract concepts representative of each document. Our approach begins by extracting concepts with the MetaMap tool; afterwards, it uses graph theory to retain only the relevant concepts; finally, to refine the results obtained, we used the genetic algorithm to combine the two types of indexing, textual and conceptual. Our results showed that the combination method improves, on the one hand, the efficiency of our model and, on the other hand, the semantic representation of the document.

Leveraging large scale Web data for image retrieval and user credibility estimation

3.5.1 Flickr group modeling. Similar to the processing of the ImageNet images, we extract all three image features detailed in Subsection 3.2.2 for Flickr group images. To maintain comparability with ImageNet models, we use the same large class of ImageNet images as negatives (i.e., the negative class described in 3.4.2). Similarly, we train models using all three image features but we perform the CV analysis only with models built upon Overfeat features. While the meaning of each concept is known for ImageNet, in the case of Flickr groups, a first challenge is finding the proper textual description of the group that best describes its visual content. A first possible choice is the group’s title. After an investigation of a set of group titles, we noticed a high level of noise among titles (i.e., non-alphanumeric characters, different languages, subjective statements). There are also many titles that have a narrative nature, making them impractical for a proper textual representation of the group. Another problem with choosing titles is the bias towards the initial choice of a group’s author. This is notably encountered among less carefully curated groups, where the content may evolve in a direction other than the one initially intended by its creator. For all of these reasons, we chose a data-driven approach that is based on the predominant tags associated with the images found in the groups. In this way, we also capture the collective social intention behind tagging for the set of users who contributed to the group. Text pre-processing consists in extracting the most salient tags of each group. Groups are structured thematically but a single tag might not be sufficient to describe them. Tags are ranked by the number of unique users who annotate images of a group with them. This measure is chosen instead of tag frequency, which is sensitive to bulk uploads, in order to maximize the social relevance of tags. 
In this way, we eliminate the possibility that a single user would have a high influence over the tags selected for describing the group. After an initial examination, we empirically retain the top three tags as a textual representation of groups and write this representation as FG_t = {T_1, T_2, T_3}.
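The tag ranking described above can be sketched in a few lines. The input format (a list of `(user_id, tag)` pairs, one per tagged image) and the alphabetical tie-break are assumptions made for illustration, and `top_tags` is a hypothetical name.

```python
from collections import defaultdict


def top_tags(annotations, n=3):
    """Rank tags by the number of distinct users who applied them.

    annotations: iterable of (user_id, tag) pairs, one per tagged image.
    Returns the n highest-ranked tags. Ties are broken alphabetically,
    which is an arbitrary choice not taken from the source.
    """
    users_per_tag = defaultdict(set)
    for user_id, tag in annotations:
        # A set counts each user once, however many images they uploaded.
        users_per_tag[tag].add(user_id)
    ranked = sorted(users_per_tag, key=lambda t: (-len(users_per_tag[t]), t))
    return ranked[:n]
```

Because users are stored in a set, a single user who bulk-uploads fifty images with the same tag contributes one vote, which is the property the text relies on.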

An improved feature vector for content-based image retrieval in DCT domain

2.1 AC-Pattern. In the DCT domain, energy is packed into a few coefficients, and some coefficients represent directional information. The proposed approach therefore selects 9 coefficients out of all 15 AC coefficients in each block and uses their statistical information to construct the AC-Pattern. These 9 coefficients are categorized into 3 groups: horizontal (Group H), vertical (Group V) and diagonal (Group D). For each group, the sum of the coefficients is first calculated, and then the squared differences between each coefficient and the sum of this group are calculated. Finally, the sums of squared differences of each group are used to construct the AC-Pattern. The process of forming the AC-Pattern is shown in Figure 1. Compared with the method of (Zhong and Defée, 2005), this way of constructing the AC-Pattern clearly reduces the complexity of the feature vector. For example, the maximal dimension of the AC-Pattern of (Zhong and Defée, 2005) is 15 and that of ours is only 3. This number remains the same as in the method of (Bai et al., 2012).
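The per-group computation described above can be written down directly. The sketch follows the excerpt literally, measuring squared differences against the group sum (not the group mean). Which 9 of the 15 AC coefficients belong to Groups H, V and D is not stated in the excerpt, so the groups are passed in by the caller; the function names are hypothetical.

```python
def group_feature(coeffs):
    """Sum of squared differences between each coefficient and the group sum."""
    total = sum(coeffs)
    return sum((c - total) ** 2 for c in coeffs)


def ac_pattern(group_h, group_v, group_d):
    """Build the 3-dimensional AC-Pattern of one DCT block.

    Each argument is the sequence of AC coefficients assigned to the
    horizontal, vertical and diagonal group respectively; the assignment
    itself is left to the caller because the excerpt does not specify it.
    """
    return (
        group_feature(group_h),
        group_feature(group_v),
        group_feature(group_d),
    )
```

For instance, a group with coefficients 1, 2 and 3 has sum 6, so its feature is 25 + 16 + 9 = 50. The result always has three components, one per group, which is the dimension reduction from 15 to 3 noted in the text.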

Delta-TSR: a description of spatial relationships between objects for image retrieval

1.2 Contributions and outline of the paper. The objective of this work is to propose an efficient and effective representation of the spatial layout of objects for image or sub-image retrieval in large collections of images, under the query-by-example paradigm. The proposed approach is called ∆-TSR and is implemented in this paper on objects which are local visual features based on salient points represented in a BoF model; it is presented in Section 2. ∆-TSR uses the same idea as TSR, i.e., representing image layout by the triangular relationships between its objects. However, the adopted coding has a higher filtering capacity than TSR, leading to superior performance in terms of both retrieval quality and online retrieval time. On several image data sets varying from 600 to 6000 images, we demonstrate its relevance both in terms of quality of the description (Section 3) and of retrieval time (Section 4), compared with TSR and a classical bag-of-features representation of visual content. Finally, Section 5 concludes.

Result diversification in social image retrieval: a benchmarking framework

– CEA (re-ranking, social cues, informativeness; multimodal) [29]: focuses on the use of social cues (user and temporal information) in the retrieval process and on their combination with visual cues. The diversification relies on the use of an informativeness measure which accounts for the novelty brought by each candidate with respect to candidates which were already selected. The simplest runs exploit the initial Flickr ranking and diversify images iteratively by selecting, in each round, images which are new using a social criterion that can be either the user ID or, in a more relaxed version, the user ID and the date of the image. The same algorithm as that used by the MUCKE [3] team is used to obtain an initial re-ranking of images, the only difference being that visual content is described using HoG instead of GIST. In the diversification step, new images are selected by maximizing their visual distance to the images which were already selected;
– UPMC (re-ranking, clustering; multimodal) [22]: uses re-ranking to improve relevance. To compare images, several similarity metrics are used, e.g., Euclidean distance for visual descriptors, Dirichlet Prior Smoothing and cosine for textual models, and the classical great-circle (Haversine) distance between two GPS coordinates. In addition, to better exploit geographical granularity between images, the Xilopix thesaurus is used to convert image information into concepts (i.e., by matching the query