
1.4 Thesis outline


We describe here the structure of this thesis. Beyond this introduction, the thesis comprises seven chapters:

Chapter 2 introduces the early works on image indexing and retrieval. We give an overview of image processing steps such as image decomposition (grid partitioning, region segmentation or local keypoints) and visual feature extraction (color, edge histograms and local invariant features). A preliminary indexing model based on the Bag-of-Words (BoW) model is also introduced: we describe how visual concepts are constructed from the low-level visual features and quantized with the vector model, and how latent semantic techniques have been used successfully with the BoW model. Our goal in this chapter is to present the basic steps in representing image contents. Based on these elementary steps, chapter 3 presents the different methods for learning visual concepts in the literature.
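As a concrete illustration of the BoW representation surveyed in chapter 2, the sketch below quantizes local descriptors against a visual vocabulary and counts them into a fixed-length histogram. The k-means vocabulary and the helper signature are illustrative assumptions, not the exact formulation used later in the thesis.

```python
# Minimal Bag-of-Words sketch: each local descriptor is assigned to its
# nearest visual word, and the image becomes a normalized word histogram.
import numpy as np
from sklearn.cluster import KMeans

def bow_histogram(descriptors: np.ndarray, vocab: KMeans) -> np.ndarray:
    words = vocab.predict(descriptors)      # nearest visual word per descriptor
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()                # term-frequency vector (vector model)
```

In practice the vocabulary would be obtained by fitting KMeans on descriptors pooled from a training collection.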

Chapter 3 concentrates on different machine learning techniques based on the numerical representation of an image. We review the two main approaches in information retrieval: generative models and discriminative models.

The generative models include two main methods: Naive Bayes and Probabilistic Latent Semantic Analysis (pLSA). The discriminative models also include two main methods: k-NN classification and the well-known Support Vector Machine (SVM). We also discuss in this chapter how structure can be captured to represent image content with graph-based models. One important model on which our method relies, the Language Modeling (LM) approach, is detailed in this chapter as well.

Chapter 4 gives an overview of our proposed approach. The proposed model includes three main stages (a minimal sketch of the pipeline follows the stage descriptions below):

The image processing stage aims at extracting image regions and keypoints from the image. It also consists of computing the numerical feature vectors associated with these regions or keypoints.

The graph modeling stage consists of grouping similar visual features into clusters using an unsupervised learning algorithm. Visual concepts are generated for each type of visual feature. Then, the spatial relations between the visual concepts are extracted. Finally, an image is represented by a visual graph composed of a set of visual concepts and a set of spatial relations.

The graph retrieval stage retrieves the graphs relevant to a new image query. Inspired by the language model, we extend this framework to match the query graph against the trained graphs from the database. Images are then ranked based on their likelihoods.
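The sketch below makes the first two stages concrete under stated assumptions: k-means stands in for the unsupervised learner, and a single "left_of" relation between horizontally adjacent keypoints stands in for the spatial relations developed in the thesis.

```python
# Illustrative two-stage sketch: quantize local features into visual concepts,
# then link concepts of adjacent keypoints with a spatial relation. The
# "left_of" relation and the k-means learner are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans

def train_vocabulary(pooled_descriptors: np.ndarray, n_concepts: int = 500) -> KMeans:
    # Stage 2a: cluster low-level features into visual concepts.
    return KMeans(n_clusters=n_concepts, n_init=10).fit(pooled_descriptors)

def build_visual_graph(descriptors, positions, vocab):
    # Stage 2b: map each keypoint to a concept, then relate horizontal neighbors.
    concepts = vocab.predict(descriptors)
    order = np.argsort([p[0] for p in positions])   # keypoints sorted by x
    relations = [(concepts[a], "left_of", concepts[b])
                 for a, b in zip(order, order[1:])]
    return set(concepts), relations
```

The retrieval stage then scores each document graph against the query graph; that scoring is sketched after the chapter 5 description below.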

Chapter 5 details the proposed visual graph model. We formalize the definition of the visual graph model and give examples of two graph instances. The graph matching model takes the query graph and the document graph as input and ranks the images based on their likelihoods. The matching model is an extension of the language modeling approach to graphs. We also explain how we transform the probabilities into the log domain to compute the relevance status value of an image.
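To make this ranking concrete, the following is a sketch of the query-likelihood score in log space, assuming the query graph decomposes into a concept set C_Q and a relation set R_Q that are conditionally independent given the document graph; the exact factorization and smoothing are what chapter 5 develops.

```latex
% Illustrative decomposition of graph matching as query likelihood in log
% space (the thesis's exact factorization may differ).
RSV(Q, D) = \log P(G_Q \mid G_D)
          = \sum_{c \in C_Q} \log P(c \mid G_D)
            + \sum_{r \in R_Q} \log P(r \mid G_D)
```

Working in the log domain turns a product of many small probabilities into a numerically stable sum, which is why the relevance status value is computed there.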

Chapter 6 presents the first application using the proposed approach: an outdoor scene recognition system. We present the proposed visual graph models adapted to the STOIC collection. The experimental results are studied with respect to the impact of relations and of multiple image queries on the classification performance. We describe different techniques for optimizing the smoothing parameter: cross validation and optimization based on the test set. The implementation of the scene recognition system is also detailed in this chapter.
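As an illustration of the first technique, the sketch below selects a smoothing parameter by 5-fold cross validation. The parameter grid and the classify_and_score callback are assumptions made for the example; chapter 6 specifies the actual procedure.

```python
# Illustrative selection of a language-model smoothing parameter by 5-fold
# cross validation. The evaluation callback is hypothetical: it should train
# on train_idx, classify valid_idx with the given lambda, and return accuracy.
import numpy as np
from sklearn.model_selection import KFold

def select_lambda(items, classify_and_score, grid=np.linspace(0.1, 0.9, 9)):
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    mean_acc = []
    for lam in grid:
        scores = [classify_and_score(train_idx, valid_idx, lam)
                  for train_idx, valid_idx in folds.split(items)]
        mean_acc.append(np.mean(scores))
    return grid[int(np.argmax(mean_acc))]   # lambda with best validation accuracy
```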

Chapter 7 demonstrates the second application of the visual graph model, namely mobile robot localization. The proposed visual graph models adapted to this image collection are presented. We provide experimental results on the impact of relations on room classification accuracy. We also compare the proposed model with the SVM method. Then, we discuss how the validation set was used to choose appropriate features for representing the image contents. The post-processing step and the official results of the run submitted to ImageCLEF are also discussed.

Chapter 8 concludes this dissertation with a discussion of its contributions and of perspectives for future work.

Part I

State of the Art


Chapter 2

Image Indexing

To take photographs means to recognize - simultaneously and within a fraction of a second - both the fact itself and the rigorous organization of visually perceived forms that give it meaning.

Henri Cartier-Bresson.

2.1 Introduction

In [Marr 1982], Marr described the three layers of a classical paradigm in machine vision: the processing layer (1), the mapping layer (2) and the high-level interpretation layer (3) (detailed in Figure 2.1). These three layers can be aligned with the three levels of image representation in CBIR, namely the feature layer (low level), the conceptual layer (middle level) and the semantics layer (high level). The feature layer concerns how to extract good visual features from the pictorial data of an image. This layer is close to the actual computer representation of the image.

The conceptual layer maps the low-level signal information to a higher visual perception form, called a visual concept. A visual concept represents a homogeneous group of visual features. The semantics layer represents the image with the highest form of knowledge representation, which is close to human understanding, i.e., a textual description or textual concept.

For this reason, the “semantic gap” is often defined as “the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation” [Smeulders et al. 2000]. More precisely, it is the lack of knowledge representation between the low-level feature layer and the high-level semantics layer. Since this problem is still unsolved, our objective is to inject a new intermediate level of image representation between the conceptual layer and the semantics layer. We believe that this will help to reduce the gap.


Figure 2.1: Illustration of Marr’s paradigm [Marr 1982] for a vision system.

In this chapter, we describe the works concerning mostly the first two layers (the visual feature layer and the conceptual layer) in a CBIR system. In the next section, we present three different methods for region extraction: grid partitioning, region segmentation and interest point detection. Section 2.3 provides information on the visual feature extraction step. Section 2.4 gives more details on the indexing models from the CBIR field, such as the vector model, the bag-of-words model and the latent semantic indexing model. Finally, Section 2.5 summarizes this chapter.
