A representation of contextual relationships knowledge in images


HAL Id: hal-00875623

https://hal.archives-ouvertes.fr/hal-00875623

Preprint submitted on 22 Oct 2013



Nguyen Vu Hoang, Valérie Gouet-Brunet, Marta Rukoz

To cite this version:

Nguyen Vu Hoang, Valérie Gouet-Brunet, Marta Rukoz. A representation of contextual relationships knowledge in images. 2012. ⟨hal-00875623⟩


LAMSADE

Laboratoire d'Analyses et Modélisation de Systèmes pour l'Aide à la Décision

UMR 7243

May 2012


Cahier du LAMSADE 320


Nguyen Vu Hoang (1,3), Valérie Gouet-Brunet (2), Marta Rukoz (1,3)

1: LAMSADE - Université Paris-Dauphine - Place de Lattre de Tassigny - F75775 Paris Cedex 16

2: CEDRIC/CNAM - 292, rue Saint-Martin - F75141 Paris Cedex 03

3: Université Paris Ouest Nanterre La Défense - 200, avenue de la République - F92001 Nanterre Cedex

nguyenvu.hoang@dauphine.fr, valerie.gouet@cnam.fr, marta.rukoz@dauphine.fr

23 March 2012

Abstract

This report focuses on the study of methods for image retrieval in collections of heterogeneous content. The spatial relationships between entities in an image make it possible to create a global description of the image, which we call the image context. Taking contextual spatial relationships into account in image similarity search can improve retrieval quality by limiting false alarms. We define the context of an image as the presence of entity categories and their spatial relationships in the image.

By statistically studying the relationships between different entity categories on LabelMe, a database of symbolic images of heterogeneous content, we create a cartography of their spatial relationships that can be integrated in a graph-based model of the contextual relationships, the principal contribution of this report. This graph describes the general knowledge of every entity category. Spatial reasoning on this knowledge graph can help improve image processing tasks such as detection and localization of an entity category by using the presence of another one. Further, this model can be applied to represent the context of an image. Similarity search based on context can be achieved by comparing the graphs: the contextual similarity between two images is evaluated by the similarity between their graphs. This work was evaluated on the symbolic image database of LabelMe. The experiments showed its relevance for image retrieval by context.

Keywords: image, similarity search, spatial relationships, image context.

1 Introduction

The interpretation of images by a machine requires a representation of images, preprocessed manually or automatically. This representation can be built from visual features (such as color or shape of elements in images) or from higher-level information (such as spatial relationships between elements or models of these elements). To date, it is still difficult to build a robust model for automatic image [...] say that what makes such human aptitudes possible is their ability to interpret visual features of images by using prior knowledge. Prior knowledge is very often related to the presence of multiple entities in an image and to the spatial information linking them, which can be called the context of the image. Humans can incorporate this knowledge to analyse the image and to create personal semantic concepts. According to [8], image interpretation can be classified into three levels of complexity:

Level 1: interpretation is based mainly on primary features such as color, texture, shape, segmented regions, interest points and/or the spatial location of image elements. These features are rather objective and are estimated directly, without requiring any knowledge base. Many approaches to image retrieval or image categorization can be classified into this level (e.g. Bag of Features and all its derived approaches).

Level 2: interpretation involves some degree of logical inference concerning the image content. At this level, queries are performed in order to retrieve entities of a given type or to retrieve a specific entity from another. The need for a processed knowledge base is obvious. This knowledge base can contain low-level information about entity categories (e.g. color, shape), relationships between categories (e.g. correlations, conditional probabilities, spatial relationships), and more.

Level 3: interpretation is based on symbolic features. At this level, a significant amount of high-level reasoning about the meaning and purpose of the entities or of the background of images can be involved. The result of this processing is that an image can be linked to a concept by a subjective judgement. To date, humans are the only ones who can propose an effective interpretation of an image at this level.

Most of the approaches proposed lie between levels 1 and 2. It is still difficult to lie between levels 2 and 3, which refer to high-level semantic image retrieval [12]. The main effort is to connect low-level features to high-level semantics of images. The effective approaches to date are:

1. using machine learning methods in order to associate high-level concepts with low-level features,
2. taking into account user feedback in order to improve subjective concepts,
3. inferring visual content based on textual information extracted from image context,
4. using entity ontology in order to define high-level concepts.

Most content-based image retrieval systems exploit a combination of two or more of these methods in order to perform high-level semantic image retrieval (see [24, 8, 18, 27, 6]). Although the results obtained are promising, designing systems that really understand image content at the semantic level is still an open problem.

In this paper, we propose a model to represent general knowledge about relationships, such as spatial ones, between entity categories existing in an image database. Furthermore, this model can be used to describe the context of a given image. The definition of image context is discussed in section 2.2. Finally, we observe that an image may be linked to multiple subjective interactions; we therefore hope to attribute a semantic meaning to each image context that can facilitate image retrieval or recognition tasks. Our work falls in the third and fourth categories of approaches listed above. In the [...] and consider that these concepts are known.

In section 2, we present the definition of image context and several spatial relationships between categories. Section 3 presents the concepts and definitions of our graph model. Next, we discuss the capacity of the graph to evolve with new knowledge, and spatial reasoning, in sections 4 and 5. Finally, in section 6, we present several experiments to evaluate our graph model.

2 Initial definitions

In this section, we present several definitions, such as spatial relationships and image context, before presenting our principal work.

2.1 Spatial relationship

In our framework, we are firstly interested in the representation of spatial relationships between symbolic objects in images, called entities. In CBIR, embedding such information into image content description provides a better representation of the content as well as new scenarios of interrogation. The spatial relationships can be unary, binary, or ternary relationships.

We call unary relationship the relationship between an entity and its localization in an image, where localization is defined as a region or an area of the image. Areas of an image can be represented in different ways, such as quad-tree or quin-tree; see for example [20, 28]. Since we do not have any a priori knowledge of the location of the categories in the images, we propose to split images into a fixed number of regular areas (i.e. equal-size areas). First, we divide each image with a fixed-size grid. Each cell of this grid, called atomic area, is represented by a code. Fig. 1 and 2 depict a splitting into 9 or 16 different basic areas and their codes, respectively. We then combine these codes to represent more complex areas; for example, for the 9-area splitting, code 009 represents the area grouping together areas 001 and 008.

001 008 064

002 016 128

004 032 256

Figure 1: Codes in unary relationship by splitting an image in nine areas.

00001 00016 00256 04096

00002 00032 00512 08192

00004 00064 01024 16384

00008 00128 02048 32768

Figure 2: Codes in unary relationship by splitting an image in 16 areas.
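The coding of Fig. 1 and 2 can be read as one power of two per atomic area, assigned column by column from top to bottom. The sketch below is only an illustration under that reading: the function names are hypothetical, and it assumes that codes are combined by a bitwise OR (which equals the sum for distinct atomic areas, as in 009 = 001 + 008).

```python
def atomic_code(row: int, col: int, n: int = 3) -> int:
    """Code of the atomic area at (row, col) in an n x n grid.

    Assumption: codes are powers of two assigned column by column,
    top to bottom, which reproduces Fig. 1 (n=3) and Fig. 2 (n=4).
    """
    if not (0 <= row < n and 0 <= col < n):
        raise ValueError("cell outside the grid")
    return 1 << (col * n + row)


def combine(*codes: int) -> int:
    """Code of the complex area formed by the union of the given areas."""
    area = 0
    for code in codes:
        area |= code  # assumed combination rule: bitwise OR
    return area


def atomic_areas(code: int) -> list[int]:
    """Atomic codes contained in a (possibly complex) area code."""
    return [1 << i for i in range(code.bit_length()) if (code >> i) & 1]


# Top-left cell (001) grouped with top-middle cell (008) of the 9-area grid.
print(combine(atomic_code(0, 0), atomic_code(0, 1)))  # 9, i.e. code 009
```

With this encoding, testing whether a complex area covers an atomic area is a single bitwise AND, which is one possible reason for choosing power-of-two codes.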

A binary relationship links two entities of distinct categories together in an image. In recent years, [...] classified as topological, directional or distance-based approaches (see [11] for more details), and can be applied on symbolic objects or low-level features. Here, we have focused on relationships between the entities of the database described in terms of directional relationships with approach 9DSpa [17], of topological relationships [9, 10] and of a combination of them with 2D projections [19]. We do not use orthogonal [3] and 9DLT relationships [2] because of their inconveniences mentioned in [17].

A ternary relationship describes a relationship of a triplet of categories. To our knowledge, only a few approaches have been proposed to describe crisp triangular relationships of three symbolic entities. We can mention the TSR approach [13] and our approach $\Delta$-TSR (see [16]). When applied to a set of heterogeneous symbolic entities that do not have fixed shape and size, these approaches cannot fully describe triangular spatial relationships between symbolic entities, since they take into account only the center of each entity as its representation.

2.2 Image context

In an image, the recognition or detection of an entity category requires different information from the raw image data. According to [26], in the real world, there exists a strong relationship between environments and the entities found within them, or between the entities. Entities are never in isolation. They tend to co-vary with other entities and particular environments, providing a rich collection of contextual associations. Recognition or detection will be accurate and quick if entities appear in a familiar background. Initially, then, we can define the context of an image as describing all possible types of relationship between the entities in this image, or between the entities and the background of this image. The use of image context is of strong interest not only for recognizing or detecting an entity category but also for image retrieval. For the recognition or detection of an entity category, it is natural to examine the general context of the image if the local features are insufficient (e.g. the entity is small, or appears partially). For image retrieval, the comparison of image contexts can help to filter out false alarms before entering the step of comparing the visual contents of images.

By using the visual features in an image, the context can be described by relationships between local information and global information of the image. This context definition can lead to heavy image processing work. Another natural way of representing the context of an image is to use the co-occurrence relationships of its entities. In the real world, co-occurrence might happen at a global level (for example, a bedroom will predict a bed) or at a local level (for example, a table will predict the presence of a chair). A probabilistic formulation can also be associated with this case. More complex still, the spatial relationships between entity categories in images can be taken into account. In general, it is difficult to give an exact definition of context; each use case can depend on a particular context definition.
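To make the co-occurrence view concrete, the following sketch estimates, from a toy annotated collection, the conditional probability that a category is present given that another one is. It is only an illustration: the function name, the data and the estimator (relative frequencies over images) are assumptions made here, not the statistics used in the report.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(images):
    """Estimate P(b present | a present) from per-image category sets.

    `images` is an iterable of category collections, one per image.
    Returns a dict mapping ordered pairs (a, b) to relative frequencies.
    """
    single = Counter()  # number of images containing each category
    pair = Counter()    # number of images containing both a and b
    for categories in images:
        present = set(categories)
        single.update(present)
        pair.update(permutations(present, 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


# Hypothetical annotations, for illustration only.
toy_images = [
    {"table", "chair"},
    {"table", "chair", "bed"},
    {"table"},
    {"bed"},
]
probabilities = cooccurrence_probabilities(toy_images)
print(probabilities[("chair", "table")])  # 1.0: every image with a chair has a table
```

Pairs that never co-occur are simply absent from the returned dictionary, which can be read as an estimated probability of zero.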

Here, we try to study the different relationships that could be present in images. According to [14], entities in an image can be things (e.g. car, people) or stuff (e.g. road, buildings; more precisely, those that are the regions in images). In general, we can have five types of relationship:

[...]

Stuff-Thing: texture regions that allow predicting the presence of an entity category.

Stuff-Stuff: relationships between regions of images.

Scene-Thing: scene information, such as scale or global direction, that allows determining the location of an entity category.

Scene-Stuff: scene information, such as scale or global direction, that allows determining the location of a region.

In our framework, we do not differentiate the entities present in an image as "Thing" or "Stuff", because we are interested in symbolic objects that are represented by polygons. These entities are simply classified by category. We define the image context by the presence of entity categories in the image and by the spatial relationships between these entity categories. The presence of at least one instance of an entity category confirms the presence of this category. The spatial relationships between categories in the image are represented in a general way (e.g. probabilities). There are two principal ways of using the context in a vision system:

A priori: in this way, the context serves to locate the entities, to limit the search region, and to decrease the retrieval time (for example the approaches proposed by [26, 14, 25]).

A posteriori: the context serves object recognition if the local information is not sufficient; it can help to reduce the ambiguities about the presence of objects in the same scene (for example the approaches proposed by [23, 22, 5]).

There has been a growing interest in exploiting contextual information for image retrieval, classification, or object detection and recognition. Different techniques have been exploited to describe the context of an image for this purpose. Intuitively, the spatial locations of objects and the background scene from a global view can be used as inside-image context. Further, combining object detection and classification tasks can provide natural, comprehensive context for each other without any external assistance. Moreover, a knowledge database, considered as an external element and based on machine learning (SVM) or probabilistic techniques, makes it possible to enhance other tasks such as localization, detection, etc. Our approach, proposed in the next sections, is an a priori one.

3 A Graph-based Knowledge Representation

In this section, we present how to represent knowledge about entity categories in images by using a graph. The concepts and definitions of this graph are presented in section 3.1. To avoid building an unreadable graph, the attributes of a node and the graph constraints are discussed in sections 3.2 and 3.3 respectively. Finally, section 3.4 presents a brief example of the use of the graph.

3.1 Concept and definitions

In reality, events can be expressed by two notions: entity and relationship. For example, if we have an event "our laboratory invited a professor last month", then this event can be represented by two entities, "our laboratory" and "a professor", that have a relationship "invite" with attribute "last month". We know that general knowledge presents a general event based on concrete events which have happened. For example, based on the previous event, a general event could be formed: "Laboratories [...] may be linked by a relationship of type "invite". Based on this argument, we would like to represent the learnt knowledge by using a graph concept developed only on the two notions of "category" and "relationship". In this graph, the instance of a category (an entity) is not meaningful, in order to guarantee a general representation of knowledge. However, entities are always examined before building a graph. The reason was explained previously: a general event is based on particular events.

Normally, in a classic graph, a vertex (or a node) represents a category and an edge represents a relationship. We observed that a "relationship" may be a unary, binary, ternary or n-ary relationship, so a "relationship" can concern one or many categories. A classic graph can represent only binary relationships between two vertices. In a hypergraph, a generalization of a graph [4], an edge can connect any number of vertices (see Fig. 3(a)). However, this connection of a set of vertices is represented by only one edge. We know that, between two or more "categories", there are many "relationships". This means that different edges can have the same end nodes, so a multigraph [1] can be another alternative. However, a multigraph describes multiple relations between two and only two vertices (see Fig. 4(a)). For our concept, we would like to use the advantages of these two graph models, and we know that a bipartite graph [29] can model both the more general multigraph and hypergraph (see examples of the representation of a hypergraph in Fig. 3(b) and of a multigraph in Fig. 4(b)).

(a) Hypergraph (b) Bipartite graph

Figure 3: An example of a hypergraph and its bipartite representation.

(a) Multigraph (b) Bipartite graph

Figure 4: An example of a multigraph and its bipartite representation.
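The bipartite representations of Fig. 3(b) and 4(b) both follow the standard incidence construction: every edge of the original graph becomes a node of its own, linked to each vertex it connects. The sketch below illustrates this on toy data; the function name and the edge labels are hypothetical.

```python
def to_bipartite(labelled_edges):
    """Return the edges of the bipartite incidence graph.

    `labelled_edges` maps an edge label to the collection of vertices that
    edge connects. Each label becomes a node on one side, each vertex a node
    on the other side, and a bipartite edge joins a label to every vertex
    it covers.
    """
    return sorted(
        (label, vertex)
        for label, vertices in labelled_edges.items()
        for vertex in set(vertices)
    )


# A hyperedge over three vertices becomes one node with three incident edges.
hypergraph = {"h1": {"a", "b", "c"}}
print(to_bipartite(hypergraph))

# Two parallel edges of a multigraph become two distinct nodes.
multigraph = {"e1": {"a", "b"}, "e2": {"a", "b"}}
print(to_bipartite(multigraph))
```

Because each relation is a node, several relations over the same vertices and relations over more than two vertices coexist in one structure, which is the property the graph model relies on.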

We would like to represent multiple relations between multiple vertices, which is why we decided to represent a relationship by a node. Our graph, denoted $G$, is a bipartite graph and contains two types of nodes: a category node, denoted $C$, and a relationship node, denoted $R$. We give a definition of each type of node in our graph $G$:

A category node $C$ represents the existence of a set of categories in a same environment, i.e. in the same database. For a set of categories $K = \{cat_i\}$, its representation node is $C_{\{cat_i\}}$ or $C_K$. $C_K$ can own different attributes describing some information of $K$, such as visual features or a dedicated object detection algorithm.

Similarly, a relationship node $R$ represents a $type$ of relationship between categories in a set $J = \{cat_j\}$; we denote this node $R_J^{type}$. From a node $R_J^{type}$, we can learn all possible configurations of the relationship $type$ involving $J$. In our framework, we are especially interested in spatial relationships; for example, the relationship $type$ can be the topological spatial relationship [10] that describes different configurations: disjoint, joint, overlaps, insides, etc.

Now, graph $G$ is defined as:

$$G = (V, E) \tag{1}$$

Knowing that $V$ is the set of nodes in the graph:

$$V = \{C_K\} \cup \{R_J\} \tag{2}$$

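This graph is an undirected graph. Then, $E$ is the set of edges:

$$
\begin{aligned}
& E = \{\, e_{(C_K,R_J)} \mid \forall C_K, R_J \in V \wedge K \subseteq J \,\} \\
& \forall C_K, R_J \in V;\; e_{(C_K,R_J)} \Rightarrow e_{(R_J,C_K)} \\
& \forall C_K, C_J \in V;\; \nexists\, e_{(C_K,C_J)} \\
& \forall R_K, R_J \in V;\; \nexists\, e_{(R_K,R_J)}
\end{aligned}
\tag{3}
$$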

Figure 5 gives an example of such a graph (with 3 levels).
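As a minimal sketch of the definitions in equations (1) to (3), the graph can be held as two kinds of nodes with edges generated by the inclusion rule $K \subseteq J$. The class and function names below are hypothetical, chosen only for illustration; an undirected edge is stored once, as a (category node, relationship node) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryNode:
    """Node C_K: the set of categories K (attributes omitted in this sketch)."""
    categories: frozenset


@dataclass(frozen=True)
class RelationshipNode:
    """Node R_J^type: a relationship type over the set of categories J."""
    categories: frozenset
    rel_type: str  # e.g. "topological"


def build_edges(nodes):
    """Edges of equation (3): C_K is linked to R_J whenever K is a subset of J.

    Only (C, R) pairs are produced, so the graph is bipartite by construction:
    no edge joins two category nodes or two relationship nodes.
    """
    cat_nodes = [n for n in nodes if isinstance(n, CategoryNode)]
    rel_nodes = [n for n in nodes if isinstance(n, RelationshipNode)]
    return {
        (c, r)
        for c in cat_nodes
        for r in rel_nodes
        if c.categories <= r.categories
    }


# Hypothetical categories, for illustration only.
car = CategoryNode(frozenset({"car"}))
road = CategoryNode(frozenset({"road"}))
sky = CategoryNode(frozenset({"sky"}))
car_road_topo = RelationshipNode(frozenset({"car", "road"}), "topological")

edges = build_edges([car, road, sky, car_road_topo])
print(len(edges))  # 2: car and road are linked to the relationship node, sky is not
```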

3.2 Attributes of a node

On the other hand, our requirement is that the graph-based representation must be simple to compute, clear to understand, and extendible to represent new complex general knowledge. To meet this requirement, we define specific attributes associated with each node, category or relationship, in sections 3.2.1 and 3.2.2.

3.2.1 Level attribute

To avoid building an unreadable graph, and based on the idea that complex knowledge is developed from a set of basic ones, we split our graph into many different levels. A graph level indicates the number of categories concerned, $|\{cat_i\}|$, meaning the cardinality of the set $\{cat_i\}$. Thus, each level of the graph is composed of a set of $C$ and $R$ nodes that have the same $|\{cat_i\}|$. We can therefore consider the level as an attribute of the node, denoted $lev$, that is defined as $lev = |\{cat_i\}| - 1$. Our idea is that a higher-level node can be built from lower-level nodes. A low-level node can connect only to the level immediately above, by edges between $C$ nodes of the lower level and $R$ nodes of the higher level.

Figure 5: [...] level 0, 1, and 2 respectively.

In consequence, we redefine the set $E$:

$$E = \{\, e_{(C_K,R_J)} \mid \forall C_K, R_J \in V \wedge K \subseteq J \wedge (C.lev = R.lev \vee C.lev + 1 = R.lev) \,\} \tag{4}$$
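The level constraint of equation (4) can be checked with a small predicate. The sketch below uses plain Python sets for $K$ and $J$; the function names are hypothetical and the categories are toy examples.

```python
def level(categories) -> int:
    """Level attribute of a node: lev = |{cat_i}| - 1."""
    return len(categories) - 1


def allowed_edge(k, j) -> bool:
    """Equation (4): an edge between C_K and R_J requires K to be a subset
    of J and R_J to sit at the same level as C_K or exactly one level above."""
    k, j = set(k), set(j)
    return k <= j and level(j) - level(k) in (0, 1)


print(allowed_edge({"car"}, {"car", "road"}))          # True: level 0 to level 1
print(allowed_edge({"car"}, {"car", "road", "tree"}))  # False: skips a level
```

Since $K \subseteq J$ together with equal levels forces $K = J$, a category node links either to relationship nodes over the same set of categories or to those with exactly one additional category.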

Note that we can expand the number of levels in the graph as needed. If we study $N$ categories, then at level $l$ we can have at most $\frac{N!}{(N-l)!}$ nodes $C$. A high number of levels can considerably increase the number of nodes in the graph; however, in reality, many categories never occur together. For example, in [15], by studying 86 different entity categories in a dataset of heterogeneous content, we found 879 couples of categories that never occur together among a set of 7310 possible couples, and only 38031 triplets present in total among 102340 possible triplets.
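The totals quoted from [15] can be reproduced with a few lines of arithmetic. In this sketch, 7310 is obtained as the number of ordered pairs of 86 categories, $N!/(N-2)!$, and 102340 as the number of unordered triplets, $\binom{86}{3}$; reading the two figures this way is an observation made here, not a statement from the report.

```python
from math import comb, perm

N = 86  # number of entity categories studied in [15]

possible_couples = perm(N, 2)   # N! / (N - 2)! ordered pairs
possible_triplets = comb(N, 3)  # unordered triplets

never_together = 879      # couples that never occur together (from the text)
present_triplets = 38031  # triplets observed at least once (from the text)

print(possible_couples)   # 7310
print(possible_triplets)  # 102340
print(never_together / possible_couples)      # share of couples never observed
print(present_triplets / possible_triplets)   # share of triplets observed
```

Roughly 12% of the possible couples never occur and roughly 37% of the possible triplets are present, which illustrates why the number of nodes actually needed stays well below the combinatorial maximum.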

In Fig. 5, we show the concept for a 3-level graph. Concretely, we can model unary, binary, and ternary relationships with this graph.