
7.5.3 Submitted runs to the ImageCLEF 2009

Participating in this competition, we built 3 graph models based on the previous validation process. We eliminated the HSV histogram model because of its poor performance under different lighting conditions. We used the same visual vocabulary of 500 visual concepts generated for the night-condition set. Each model provided a ranked result corresponding to the released test sequence. The post-processing steps were performed as in the validation process, with the same configuration. The visual language models built for the competition are listed as follows:

Me1: visual language model based on edge histograms extracted from a 10x10 patch division

Me2: visual language model based on edge histograms extracted from a 5x5 patch division

Ms: visual language model based on color SIFT local features

Based on these 3 visual models, we submitted 5 runs to the ImageCLEF evaluation:

01-LIG-Me1Me2Ms: linear fusion of the results from the 3 models (Score = 328)

02-LIG-Me1Me2Ms-Rk15: re-ranking the result of 01-LIG-Me1Me2Ms by regrouping the top 15 scores for each room (Score = 415)

03-LIG-Me1Me2Ms-Rk15-Fil003: if the difference between the scores of the 1st and the 4th images in the ranked list is too small (i.e., β < 0.003), we remove that image from the list. In such cases we refrain from making a decision and instead mark the image as an unknown room (Score = 456.5)

04-LIG-Me1Me2Ms-Rk2-Diff20: re-ranking the result of 01-LIG-Me1Me2Ms by regrouping the top 2 scores for each room and using a smoothing window (20 images/frame) to update the room-id across image sequences (Score = 706)

05-LIG-Me1Ms-Rk2-Diff20: same as 04-LIG-Me1Me2Ms-Rk2-Diff20 but fusing only the 2 models Me1 and Ms (Score = 697)
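To make the post-processing steps behind these runs concrete, the following is a minimal sketch of the four operations they combine: linear fusion of model scores, top-k regrouping per room, the uncertainty filter with threshold β, and the smoothing window over frame labels. All function names and the majority-vote interpretation of the smoothing window are our illustrative assumptions, not the exact implementation used for the submitted runs.

```python
# Hypothetical sketch of the post-processing pipeline; names are illustrative.
from collections import Counter

def linear_fusion(score_lists, weights=None):
    """Linearly fuse per-model score dicts {room_id: score} for one image."""
    weights = weights or [1.0] * len(score_lists)
    fused = Counter()
    for w, scores in zip(weights, score_lists):
        for room, s in scores.items():
            fused[room] += w * s
    return dict(fused)

def rerank_topk(ranked, k):
    """Regroup: sum the top-k scores per room, then rank rooms by that sum."""
    per_room = {}
    for room, score in ranked:
        per_room.setdefault(room, []).append(score)
    regrouped = {r: sum(sorted(s, reverse=True)[:k]) for r, s in per_room.items()}
    return sorted(regrouped.items(), key=lambda x: x[1], reverse=True)

def filter_uncertain(ranked, beta=0.003):
    """Refrain from deciding when the 1st and 4th scores are too close."""
    if len(ranked) >= 4 and ranked[0][1] - ranked[3][1] < beta:
        return "unknown"
    return ranked[0][0]

def smooth_labels(labels, window=20):
    """Majority vote over a sliding window of per-frame room labels."""
    out = []
    for i in range(len(labels)):
        lo = max(0, i - window + 1)
        out.append(Counter(labels[lo:i + 1]).most_common(1)[0][0])
    return out
```

Under this reading, run 04 would correspond to `rerank_topk(..., k=2)` followed by `smooth_labels(..., window=20)`, while run 03 uses `rerank_topk(..., k=15)` followed by `filter_uncertain`.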

Our best run for the obligatory track, 03-LIG-Me1Me2Ms-Rk15-Fil003, was ranked 10th among the 21 runs submitted. The best run in the competition (score = 793 points) was obtained with an approach based on local feature matching. Run 04-LIG-Me1Me2Ms-Rk2-Diff20 did not meet the criteria of the optional task, which only allows using the sequence before the classified image.

Nevertheless, this run improved on our best obligatory run by roughly 250 points. This means that there is still room to improve the performance of our system with a valid smoothing window.

7.6 Summary

To summarize, we have shown in this chapter the second application of the visual graph model, namely mobile robot self-localization. Coping with the specific conditions of an indoor laboratory environment, we implemented another instance of the proposed graph model. The proposed visual graph models have to adapt to the specific visual content of the image collection, as well as to environment changes (such as lighting conditions, moving objects, human activity, and unknown rooms).

We constructed different graph models based on patch concepts and SIFT concepts, which represent the abstract form and the object details respectively.

A particular relation between the two concept types is also included to capture the co-occurrence information among the concepts. The results obtained showed that the integration of spatial relations into the visual graph model outperformed the standard language model and the SVM classification, which are based only on the visual concepts.

We also performed a validation process based on the validation sets to choose the visual features best adapted to the environment changes. A post-processing step on the ranked list was also studied. Finally, we reported the official results of our runs submitted to the ImageCLEF 2009 forum.

In the next chapter, we will conclude this thesis and give some perspectives on future work.

Part IV Conclusion


Chapter 8

Conclusions and Perspectives

We are not interested in the unusual, but in the usual seen unusually.

Beaumont Newhall

Content-Based Image Retrieval (CBIR) has been an open problem for the past two decades. Several attempts have been made to overcome the information gap between low-level visual features and the semantic layer of images. In [Marr 1982], Marr proposed a common paradigm for designing a visual recognition system which includes three sub-modules: image processing, mapping and high-level interpretation. Our work aimed at solving the two latter problems.

In this thesis, we have introduced a graph-based model for representing image content which adds an intermediate layer to the image representation. This graph captures the spatial relations among visual concepts associated with extracted image regions. The graph matching process is based on an extension of the unigram conceptual modeling proposed initially in [Maisonnasse et al. 2008].

Theoretically, our model fits within the language modeling approach for information retrieval, and expands previous proposals for graph-based representation.

Even though we have chosen to illustrate the proposed approach with scene recognition problems, this method is not fundamentally tied to a specific type of images. The designed framework can be extended to several types of image representations, as well as several applications in different fields, such as image retrieval/annotation, object recognition, video classification/categorization, or medical imaging classification. This list, by all means, is not exhaustive. As suggested by Nicolas Maillot, the combination with a reasoning layer or an ontology network [Maillot 2005] will equip the graph model with the capacity of understanding scenic contents. The system would then be able to detect multiple object instances embedded in a particular scene, e.g., cars, people, buildings, streets, etc.


8.1 Summary

We summarize here some main points mentioned in this dissertation:

Part I introduced the current state-of-the-art in the Content-Based Image Retrieval field.

In chapter 2, we gave a survey of different methods of image processing, such as image decomposition and visual feature extraction. This is a basic step in representing image contents. Based on the extracted visual features, the bag-of-words model was introduced. The bag-of-words model often represents image content by a sparse vector of visual concepts. Images are matched based on the Euclidean distance or the cosine similarity of the quantized vectors. The bag-of-words model is simple but lacks information on the spatial relations between visual concepts.
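The bag-of-words matching described above can be sketched in a few lines. This is a generic illustration, not code from the thesis: images are reduced to sparse concept-count vectors (here plain dicts mapping a visual-concept id to its count) and compared by cosine similarity.

```python
# Minimal sketch of bag-of-visual-words matching by cosine similarity.
# The concept ids and counts below are hypothetical quantized histograms.
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse concept-count vectors (dicts)."""
    dot = sum(u[c] * v[c] for c in u if c in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Two images with identical concept histograms score 1.0, while images sharing no visual concept score 0.0; as the text notes, any spatial arrangement of the concepts is invisible to this measure.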

In chapter 3, we reviewed two principal branches of learning methods based on the conceptual representation: generative approaches and discriminative approaches. Important approaches such as Naive Bayes, Language Modeling, and Support Vector Machines were also introduced. Then, we discussed the need for embedding the structural information of visual concepts into a graph-based image representation. We also investigated some current graph matching algorithms and their limitations. Finally, an initial proposal of the graph-based image retrieval framework was sketched.

Part II described the proposed approach based on the graph-based image representation and a generative matching algorithm.

In chapter 4, we presented the system architecture for graph-based image modeling. This framework includes three main stages: image processing, graph modeling and graph retrieval. The image processing step aims at extracting the different visual features from image regions to build a set of visual vocabularies.

The graph modeling step consists of visual concept construction and spatial relation extraction. Each image is then represented by a corresponding visual graph. Finally, the graph retrieval stage computes the likelihood of the query image given the trained graphs in the database. Images are ranked based on their relevance values.

Chapter 5 formally defined the visual graph model based on a set of concept sets and a set of relation sets. Two instances of the visual graph model were used to illustrate its adaptability to real applications. Then, we showed how the document graphs are matched against the query graph using an extension of the language modeling framework. For better understanding, we demonstrated graph matching with an intuitive example. Finally, we showed how visual graphs are actually ranked in the log-probability space.
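The log-probability ranking mentioned above can be illustrated with the concept part of such a model. The sketch below assumes a Jelinek-Mercer-style mixture of document and collection concept frequencies; the smoothing choice and all names are our assumptions for illustration, not the exact formulation of chapter 5, which additionally scores the relation sets.

```python
# Illustrative smoothed unigram concept model scored in log-probability space.
import math
from collections import Counter

def log_likelihood(query_concepts, doc_concepts, collection_counts,
                   collection_size, lam=0.8):
    """log P(query concepts | document model), Jelinek-Mercer smoothed."""
    doc = Counter(doc_concepts)
    doc_len = sum(doc.values())
    score = 0.0
    for c in query_concepts:
        p_doc = doc[c] / doc_len if doc_len else 0.0          # document estimate
        p_coll = collection_counts.get(c, 0) / collection_size  # smoothing term
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

def rank_images(query_concepts, docs, collection_counts, collection_size):
    """Rank document ids by decreasing query log-likelihood."""
    scored = [(doc_id, log_likelihood(query_concepts, concepts,
                                      collection_counts, collection_size))
              for doc_id, concepts in docs.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

Working in log space keeps the products of many small per-concept probabilities numerically stable while preserving the ranking, which is why the final ordering is computed on log-likelihoods.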
