# Flexible Object Models for Category-Level 3D Object Recognition


### Akash Kushal

### Computer Science Department

### Cordelia Schmid

### Jean Ponce

### WILLOW Team–ENS/INRIA/ENPC

### methods represent object classes as assemblies of salient

### Lazebnik et al. [10] propose a bag-of-parts model in which each part is composed of multiple features linked together in an affinely rigid structure. These parts are quite discriminative and relatively stable against intra-class variations; however, under large viewpoint variations the affine model no longer holds for the object parts. In this pa-

### The learning process starts by selecting two images at random from the training set and computing appearance-


### predicted match (in terms of SIFT matching score) for each secondary patch is then refined using a non-linear refinement process similar to the one used to align the initial matches. However, the refinement process now also penalizes the deviation of the refined location from the predicted location, which prevents the match from drifting too far from its prediction.
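The penalized refinement described above can be illustrated with a small sketch. It assumes a quadratic penalty with a hypothetical weight `lam` and uses plain gradient descent on a toy matching cost; the source only states that the refinement is non-linear and penalizes deviation from the predicted location, so the penalty form, the optimizer, and all names here are assumptions.

```python
def refine_location(grad_match, predicted, lam=0.5, step=0.1, iters=200):
    """Minimise match_cost(x) + lam * ||x - predicted||^2 by gradient descent.

    grad_match: function returning the gradient of the (hypothetical)
                appearance matching cost at location x.
    predicted:  location predicted from the already-matched patches.
    """
    x = list(predicted)  # start the search at the predicted location
    for _ in range(iters):
        g = grad_match(x)
        # Gradient of the penalty term is 2 * lam * (x - predicted).
        x = [xi - step * (gi + 2.0 * lam * (xi - pi))
             for xi, gi, pi in zip(x, g, predicted)]
    return x


# Toy matching cost ||x - best||^2, whose gradient is 2 * (x - best).
best = (10.0, 4.0)

def toy_grad(x):
    return [2.0 * (xi - bi) for xi, bi in zip(x, best)]

refined = refine_location(toy_grad, predicted=(8.0, 4.0), lam=1.0)
```

With this toy cost the refined location settles between the best appearance match and the predicted location; a larger `lam` pulls it closer to the prediction.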

### The PSM hypotheses learned in Section 3.1 are scored by matching them to a set of validation images containing the object as well as a set of background images. Let N

### We train a logistic regression model to evaluate the quality of individual detections (PSM matches). Concretely, we attach to each match m of a PSM u consisting of n features,

### with every match m for a PSM u based on whether it matches the object part corresponding to u or some background texture in the image. Let us define $P_u(a \mid \ell)$ as the probability that a PSM match of u has appearance vector a given that it has label $\ell$. Since a PSM consists of multiple overlapping features, the individual feature matches (the components of a) are not independent, which makes their probabilistic modeling difficult. Thus, instead of learning $P_u(a \mid \ell)$ directly, we learn a parametric model for

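$$\frac{P_u(a \mid \text{obj})}{P_u(a \mid \text{bkg})} = \frac{P_u(\text{obj} \mid a)\,P_u(a)\,/\,P_u(\text{obj})}{P_u(\text{bkg} \mid a)\,P_u(a)\,/\,P_u(\text{bkg})} = \frac{P_u(\text{bkg})}{P_u(\text{obj})} \times \frac{P_u(\text{obj} \mid a)}{P_u(\text{bkg} \mid a)}$$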
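By Bayes' rule, the likelihood ratio $P_u(a \mid \text{obj}) / P_u(a \mid \text{bkg})$ equals the posterior odds divided by the prior odds, so a classifier that outputs $P_u(\text{obj} \mid a)$ is enough to recover it. The following is a minimal sketch of that computation. It assumes the logistic model is linear in the appearance vector `a`; the names `weights`, `bias`, and `prior_obj` are hypothetical and not taken from the paper.

```python
import math


def posterior_obj(a, weights, bias):
    """Logistic-regression estimate of P_u(obj | a).

    a is assumed to be the +1/-1 appearance vector of a PSM match.
    """
    z = bias + sum(w * x for w, x in zip(weights, a))
    return 1.0 / (1.0 + math.exp(-z))


def likelihood_ratio(a, weights, bias, prior_obj):
    """P_u(a | obj) / P_u(a | bkg), obtained as posterior odds / prior odds."""
    p = posterior_obj(a, weights, bias)
    posterior_odds = p / (1.0 - p)
    prior_odds = prior_obj / (1.0 - prior_obj)
    return posterior_odds / prior_odds


# Illustrative values only: three features, the second one unmatched.
ratio = likelihood_ratio([1, -1, 1], weights=[0.8, 0.5, 1.2], bias=-0.3,
                         prior_obj=0.2)
```

A ratio above 1 means the appearance vector is better explained by the object part than by background texture.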


### We can associate with any instance u

### model as a MRF (Markov Random Field) structure dubbed

### For efficiency reasons, we ignore cliques of size greater than two when modeling the intra-PSM relations. The pairwise consistency constraints between adjacent PSMs are modeled using normal distributions on the relative affine transformations. Concretely, let $R_{u:v} \equiv A_u^{-1} A_v$ denote the affine transformation between the patches of v and u, or equivalently the vector of $\mathbb{R}$

### We assume a diagonal form for the covariance matrix $\sigma_{u:v}$.
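The pairwise term can be sketched as follows, under stated assumptions: each 2D affine map is stored as six numbers `(a, b, c, d, tx, ty)`, the relative transformation is $A_u^{-1} A_v$, and the normal distribution has a hypothetical mean vector `mean` and per-coordinate standard deviations `sigma` standing in for the diagonal covariance. None of these names or parametrisation choices come from the paper.

```python
import math


def invert(A):
    """Invert the affine map x -> [[a, b], [c, d]] x + (tx, ty)."""
    a, b, c, d, tx, ty = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine map is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, ic, id_, -(ia * tx + ib * ty), -(ic * tx + id_ * ty))


def compose(A, B):
    """Return the map that applies B first and then A."""
    a, b, c, d, tx, ty = A
    e, f, g, h, sx, sy = B
    return (a * e + b * g, a * f + b * h,
            c * e + d * g, c * f + d * h,
            a * sx + b * sy + tx, c * sx + d * sy + ty)


def relative_transform(A_u, A_v):
    """R_{u:v} = A_u^{-1} A_v as a six-parameter vector."""
    return compose(invert(A_u), A_v)


def diag_gaussian_log_density(r, mean, sigma):
    """Log-density of r under a normal with diagonal covariance.

    sigma holds per-coordinate standard deviations (an assumption).
    """
    return sum(-0.5 * ((x - m) / s) ** 2 - math.log(s)
               - 0.5 * math.log(2.0 * math.pi)
               for x, m, s in zip(r, mean, sigma))
```

Two PSM matches whose relative transformation lies close to the learned mean receive a high log-density, which is what makes them geometrically consistent under the model.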

### p g (A i

### j

### • Compute gradient ∂L/∂M

### ∂L/∂M

### • Update σ

### • Run BP and update the messages on all validation images using the updated σ

### Poisson distribution are learned using the statistics observed on the background data. Note that the appearance models $P(m \mid \text{obj})$ and $P(m \mid \text{bkg})$ are not learned explicitly. Instead, the logistic regression model associated with every PSM is used to predict their ratio

### the subset of D corresponding to instance number i of the object in the image (there may be no such instances, or several ones). Even though D may contain multiple matches for a single PSM, each object instance contains at most one match per PSM. Let $\Phi$ denote the set of all the PSMs in the object model, and, for any set of PSM matches S, let $\Phi_S \subset \Phi$ denote the set of PSMs that are matched at least once by matches in S. Finally, let us denote the PSM corresponding to a PSM match m by $u_m$ and the appearance of m by $a_m$. Recall that the appearance of a PSM match m is a binary vector containing one coordinate for every feature in a m whose value is either 1 or −1 depending on whether the corresponding feature is matched in m or not. An explanation $E = \{O_1, O_2, \ldots, O_k, B\}$ of D is a partition
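The two definitions above (the ±1 appearance vector and the constraint that an object instance holds at most one match per PSM) can be sketched as follows. The data layout is an assumption made for illustration: matches are identified by integers, `matches` maps each match id to its PSM id, and an explanation is given as a list of object sets plus one background set.

```python
def appearance_vector(n_features, matched_features):
    """Binary appearance: +1 if feature i is matched, -1 otherwise."""
    return [1 if i in matched_features else -1 for i in range(n_features)]


def is_valid_explanation(matches, objects, background):
    """Check that (objects, background) is an explanation of the matches.

    matches:    dict mapping match id -> PSM id (the detected set D).
    objects:    list of sets of match ids, one set per object instance.
    background: set of match ids assigned to the background.
    """
    parts = list(objects) + [background]
    assigned = [m for part in parts for m in part]
    # Partition: every match in D is assigned exactly once.
    if sorted(assigned) != sorted(matches):
        return False
    # Each object instance contains at most one match per PSM.
    for obj in objects:
        psms = [matches[m] for m in obj]
        if len(psms) != len(set(psms)):
            return False
    return True


D = {0: "u", 1: "u", 2: "v"}  # two matches of PSM u, one of PSM v
ok = is_valid_explanation(D, objects=[{0, 2}], background={1})
bad = is_valid_explanation(D, objects=[{0, 1}], background={2})
```

Here `ok` holds a valid explanation, while `bad` places two matches of the same PSM in one object instance.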

### O

### ObservedImage O

### p g (m|bkg)P u

### m∈O

### E

### p({O

### p({O

### um (a m |bkg) ×

### ×

### poiss (n

### g (O

### where $n_m$ denotes the number of times m is matched in $B_i$. The first term can be computed using the logistic regression model for the PSM. The second and third terms are computed using the learned occlusion parameters $p_{vis}$, $p_{occ}$ and the background Poisson distribution parameter $K_u$. The final term is computed (approximately) by running loopy belief propagation on the graph. In fact, what we need to compute is the ratio

### g (O i |obj) which can be computed as follows. The nodes in Φ O


### [1] J. Besag. Spatial interaction and the statistical analysis of

### [2] N. Dalal and B. Triggs. Histograms of oriented gradients for

### [3] R. Fergus, P. Perona, and A. Zisserman. Object class recog-

### [4] V. Ferrari, T. Tuytelaars, and L. V. Gool. Integrating multiple

### [6] C. Garcia and M. Delakis. Convolutional face finder: A neural architecture for fast and robust face detection.

### [7] F. Jurie and C. Schmid. Scale-invariant shape features for

### [8] J. Koenderink and A. Van Doorn. Affine structure from mo-

### [9] A. Kushal and J. Ponce. Modeling 3D objects from stereo

### [10] S. Lazebnik, C. Schmid, and J. Ponce. A maximum entropy framework for part-based texture and object recognition. In

### [12] N. Loeff, H. Arora, A. Sorokin, and D. Forsyth. Efficient unsupervised learning for localization and detection in object

### [13] D. Lowe. Object recognition from local scale-invariant fea-

### [14] M. Everingham et al. The 2005 PASCAL Visual Object

### [15] K. Murphy, Y. Weiss, and M. Jordan. Loopy belief propaga-

### [16] F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. 3D object modeling and recognition using local affine-invariant
