HAL Id: hal-00875623
https://hal.archives-ouvertes.fr/hal-00875623
Preprint submitted on 22 Oct 2013
A representation of contextual relationships knowledge in images
Nguyen Vu Hoang, Valérie Gouet-Brunet, Marta Rukoz
To cite this version:
Nguyen Vu Hoang, Valérie Gouet-Brunet, Marta Rukoz. A representation of contextual relationships knowledge in images. 2012. ⟨hal-00875623⟩
LAMSADE
Laboratoire d'Analyses et Modélisation de Systèmes pour l'Aide à la Décision
UMR 7243
Cahier du LAMSADE 320
May 2012
Nguyen Vu Hoang (1,3), Valérie Gouet-Brunet (2), Marta Rukoz (1,3)
1: LAMSADE - Université Paris-Dauphine - Place de Lattre de Tassigny - F75775 Paris Cedex 16
2: CEDRIC/CNAM - 292, rue Saint-Martin - F75141 Paris Cedex 03
3: Université Paris Ouest Nanterre La Défense - 200, avenue de la République - F92001 Nanterre Cedex
nguyenvu.hoang@dauphine.fr, valerie.gouet@cnam.fr, marta.rukoz@dauphine.fr
23 March 2012
Abstract
This report focuses on the study of methods for image retrieval in collections of heterogeneous contents. The spatial relationships between entities in an image allow us to create a global description of the image that we call the image context. Taking into account the contextual spatial relationships in the similarity search of images can improve the retrieval quality by limiting false alarms. We define the context of an image as the presence of entity categories and their spatial relationships in the image.
By statistically studying the relationships between different entity categories on LabelMe, a symbolic image database of heterogeneous content, we create a cartography of their spatial relationships that can be integrated in a graph-based model of the contextual relationships, the principal contribution of this report. This graph describes the general knowledge of every entity category. Spatial reasoning on this knowledge graph can help improve image processing tasks such as detection and localization of an entity category by using the presence of another one. Further, this model can be applied to represent the context of an image. The similarity search based on context can be achieved by comparing the graphs: the contextual similarity between two images is evaluated by the similarity between their graphs. This work was evaluated on the symbolic image database of LabelMe. The experiments showed its relevance for image retrieval by context.
Keywords: Image, similarity search, spatial relationships, image context.
1 Introduction
The interpretation of images by a machine requires a representation of the images, preprocessed manually or automatically. This representation can be built from visual features (such as color or shape of elements in images) or from higher-level information (such as spatial relationships between elements or models of these elements). To date, it is still difficult to build a robust model for automatic image [...] say that what makes such human aptitudes possible is their ability to interpret visual features of images by using prior knowledge. Prior knowledge is very often related to the presence of multiple entities in an image and to the spatial information linking them, which can be called the context of the image. Humans can incorporate this knowledge to analyse the image and to create personal semantic concepts. According to [8], image interpretation can be classified into three levels of complexity:
Level 1: interpretation is based mainly on primary features such as color, texture, shape, segmented regions, interest points and/or the spatial location of image elements. These features are rather objective and are estimated directly, without requiring any knowledge base. Many approaches to image retrieval or image categorization can be classified into this level (e.g. Bag of Features and all its derived approaches).
Level 2: interpretation involves some degree of logical inference concerning the image content. At this level, queries are performed in order to retrieve entities of a given type or to retrieve a specific entity from another. The need for a processed knowledge base is obvious. This knowledge base can contain low-level information on entity categories (e.g. color, shape), relationships between categories (e.g. correlations, conditional probabilities, spatial relationships), and more.
Level 3: interpretation is based on symbolic features. At this level, a significant amount of high-level reasoning about the meaning and purpose of the entities or of the background of images can be involved. The result of this processing is that an image can be linked to a concept by a subjective judgement. To date, humans are the only ones who can propose an effective interpretation of an image at this level.
Most of the approaches proposed lie between levels 1 and 2. It is still difficult to reach the range between levels 2 and 3, which refers to high-level semantic image retrieval [12]. The main effort is to connect low-level features to the high-level semantics of images. The effective approaches to date are:
1. using machine learning methods in order to associate high-level concepts to low-level features,
2. taking into account user feedback in order to improve subjective concepts,
3. inferring visual content based on textual information extracted from the image context,
4. using entity ontology in order to define high-level concepts.
Most content-based image retrieval systems exploit a combination of two or more of these methods in order to perform high-level semantic image retrieval (see [24, 8, 18, 27, 6]). Although the results obtained are promising, designing systems that really understand image content at the semantic level is still an open problem.
In this paper, we propose a model to represent general knowledge about relationships, such as spatial ones, between entity categories existing in an image database. Furthermore, this model can be used to describe the context of a given image. The definition of image context is discussed in section 2.2. Finally, we observe that an image may be linked to multiple subjective interactions; we therefore hope to attribute a semantic meaning to each image context that can facilitate image retrieval or recognition tasks. Our work falls in the third and fourth categories of approaches listed above. In the [...] and consider that these concepts are known.
In section 2, we present the definition of image context and several spatial relationships between categories. Section 3 presents the concepts and definitions of our graph model. Next, we discuss the capacity of the graph to evolve with new knowledge and spatial reasoning in sections 4 and 5. Finally, in section 6, we present several experiments to evaluate our graph model.
2 Initial definitions
In this section, we present several definitions, such as spatial relationships and image context, before presenting our principal work.
2.1 Spatial relationship
In our framework, we are firstly interested in the representation of spatial relationships between symbolic objects in images, called entities. In CBIR, embedding such information into the image content description provides a better representation of the content as well as new interrogation scenarios. Spatial relationships can be unary, binary, or ternary.
We call unary relationship the relationship between an entity and its localization in an image, where localization is defined as a region or an area of the image. Areas of an image can be represented in different ways, such as quad-tree or quin-tree; see for example [20, 28]. Since we do not have any a priori knowledge of the location of the categories in the images, we propose to split images into a fixed number of regular areas (i.e. areas of equal size). First, we divide each image with a fixed-size grid. Each cell of this grid, called an atomic area, is represented by a code. Fig. 1 and 2 depict a splitting into 9 or 16 different basic areas and their codes, respectively. We then combine these codes to represent more complex areas; for example, with the 9-area splitting, code 009 represents the area grouping together areas 001 and 008.

001 008 064
002 016 128
004 032 256
Figure 1: Codes in unary relationship by splitting an image in nine areas.

00001 00016 00256 04096
00002 00032 00512 08192
00004 00064 01024 16384
00008 00128 02048 32768
Figure 2: Codes in unary relationship by splitting an image in 16 areas.
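The codes of Fig. 1 and 2 are consecutive powers of two assigned column by column, so a combined area can be read as a bit mask. The sketch below is only an illustration under that reading: the function names are hypothetical, and combining codes with a bitwise OR is an assumption that is consistent with the example 009 = 001 + 008 given above.

```python
def atomic_code(row: int, col: int, n: int) -> int:
    """Code of the atomic area at (row, col), 0-based, in an n x n grid.

    Codes are powers of two numbered column by column, as read off Fig. 1 and 2.
    """
    if not (0 <= row < n and 0 <= col < n):
        raise ValueError("cell outside the grid")
    return 1 << (col * n + row)


def area_code(cells, n: int) -> int:
    """Combine several atomic areas into one code (assumed: bitwise OR)."""
    code = 0
    for row, col in cells:
        code |= atomic_code(row, col, n)
    return code


def atomic_areas(code: int, n: int):
    """Decode a combined code back into its list of (row, col) cells."""
    return [(i % n, i // n) for i in range(n * n) if code & (1 << i)]


# Reproduce the 3 x 3 layout of Fig. 1.
grid3 = [[atomic_code(r, c, 3) for c in range(3)] for r in range(3)]
# grid3 == [[1, 8, 64], [2, 16, 128], [4, 32, 256]]

# Area grouping the two leftmost cells of the top row: 001 and 008.
top_left_pair = area_code([(0, 0), (0, 1)], 3)  # 9
```

With this encoding, testing whether an entity's area intersects another area reduces to a bitwise AND of the two codes.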
A binary relationship links two entities of distinct categories together in an image. In recent years, [...] classified as topological, directional or distance-based approaches (see [11] for more details), and can be applied to symbolic objects or low-level features. Here, we have focused on relationships between the entities of the database described in terms of directional relationships with the 9DSpa approach [17], of topological relationships [9, 10] and of a combination of them with 2D projections [19]. We do not use the orthogonal [3] and 9DLT [2] relationships because of their inconveniences mentioned in [17].
A ternary relationship describes a relationship among a triplet of categories. To our knowledge, few approaches have been proposed to describe crisp triangular relationships of three symbolic entities. We can mention the TSR approach [13] and our approach ∆-TSR (see [16]). When applied to a set of heterogeneous symbolic entities that do not have fixed shape and size, these approaches cannot fully describe triangular spatial relationships between symbolic entities, since they take into account only the center of each entity as its representation.
2.2 Image context
In an image, the recognition or detection of an entity category requires different information from the raw image data. According to [26], in the real world, there exists a strong relationship between environments and the entities found within them, or between the entities themselves. Entities are never in isolation. They tend to co-vary with other entities and particular environments, providing a rich collection of contextual associations. Recognition or detection will be accurate and quick if entities usually appear in a familiar background. Initially, then, we can define the context of an image as describing all possible types of relationship between the entities in this image, or between the entities and the background of this image. The use of image context can be of strong interest not only for recognizing or detecting an entity category but also for image retrieval. For the recognition or detection of an entity category, it is natural to examine the general context of the image if the local features are insufficient (e.g. the entity is small, or appears partially). For image retrieval, the comparison of image contexts can help to filter out false alarms before entering the step of comparing the visual contents of images.
By using the visual features in the image, the context can be described by relationships between local information and global information of the image. This context definition can lead to heavy image processing work. Another natural way of representing the context of an image is to use the co-occurrence relationships of its entities. In the real world, co-occurrence might happen at a global level (for example, a bedroom will predict a bed) or at a local level (for example, a table will predict the presence of a chair). A probabilistic problem can also be associated with this case. In a more complex setting, the spatial relationships between entity categories in images can be taken into account. In general, it is difficult to give an exact definition of context; each use case can depend on a particular context definition.
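As an illustration of the co-occurrence view described above, the following sketch estimates how strongly the presence of one category predicts another. It is not the method of this report: the function name and the toy annotations are hypothetical, and the estimate is a plain conditional frequency computed from per-image category sets.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(images):
    """Estimate P(b present | a present) from per-image sets of categories."""
    present = Counter()   # number of images containing each category
    together = Counter()  # number of images containing both a and b
    for categories in images:
        cats = set(categories)
        present.update(cats)
        for a, b in combinations(sorted(cats), 2):
            together[(a, b)] += 1
            together[(b, a)] += 1
    return {(a, b): n / present[a] for (a, b), n in together.items()}


# Hypothetical toy annotations: one set of category names per image.
images = [
    {"table", "chair"},
    {"table", "chair", "lamp"},
    {"table"},
    {"bed", "lamp"},
]
probs = cooccurrence_probabilities(images)

p_chair_given_table = probs[("table", "chair")]      # 2 of the 3 "table" images
p_table_given_chair = probs[("chair", "table")]      # every "chair" image
p_bed_given_table = probs.get(("table", "bed"), 0.0)  # never seen together
```

Pairs that never occur together are simply absent from the result, which is why the last lookup uses a default value.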
Here, we try to study different relationships that could be present in images. According to [14], entities in an image can be things (e.g. car, people) or stuff (e.g. road, buildings; more precisely, the regions in images). In general, we can have five types of relationship:
Stuff-Thing: texture regions that allow predicting the presence of an entity category.
Stuff-Stuff: relationships between regions of images.
Scene-Thing: scene information, such as scale or global direction, that allows determining the location of an entity category.
Scene-Stuff: scene information, such as scale or global direction, that allows determining the location of a region.
In our framework, we do not differentiate the entities present in an image as "Thing" or as "Stuff" because we are interested in symbolic objects that are represented by polygons. These entities are simply classified by category. We define the image context by the presence of entity categories in the image and by the spatial relationships between these entity categories. The presence of at least one instance of an entity category confirms the presence of this category. The spatial relationships between categories in the image will be represented in a general way (e.g. probabilities). There are two principal ways of using the context in a vision system:
A priori: the context serves to locate the entities, to limit the search region, and to decrease the retrieval time (for example, the approaches proposed by [26, 14, 25]).
A posteriori: the context serves object recognition if the local information is not sufficient; it can help to reduce ambiguities about the presence of objects in the same scene (for example, the approaches proposed by [23, 22, 5]).
There has been a growing interest in exploiting contextual information for image retrieval, classification, or object detection and recognition. Different techniques have been exploited to describe the context of an image for this purpose. Intuitively, the spatial locations of objects and the background scene from a global view can be used as inside-image context. Further, combining object detection and classification tasks can provide natural, comprehensive context for each other without any external assistance. Moreover, a knowledge database, considered as an external element and based on machine learning (SVM) or probabilistic techniques, allows enhancing other tasks such as localization, detection, etc. Our approach, proposed in the next sections, is an a priori one.
3 A Graph-based Knowledge Representation
In this section, we present how to represent knowledge about entity categories in images by using a graph. The concepts and definitions of this graph are presented in section 3.1. To avoid building an unreadable graph, the attributes of a node and the graph constraints are discussed in sections 3.2 and 3.3, respectively. Finally, section 3.4 presents a brief example of the use of the graph.
3.1 Concept and definitions
In reality, events can be expressed by two notions: entity and relationship. For example, if we have an event "our laboratory invited a professor last month", then this event can be represented by two entities, "our laboratory" and "a professor", that have a relationship "invite" with attribute "last month". We know that general knowledge presents a general event based on concrete events which happened. For example, based on the previous event, a general event could be formed: "Laboratories may be linked by a relationship of type "invite"". Based on this argument, we would like to represent the learnt knowledge by using a graph concept developed only on the two notions of "category" and "relationship". In this graph, the instance of a category (an entity) is not meaningful, so as to guarantee a general representation of knowledge. However, entities are always examined before building a graph. The reason was explained previously: a general event is based on particular events.
Normally, in a classic graph, a vertex (or node) represents a category and an edge represents a relationship. We observed that a "relationship" may be unary, binary, ternary or n-ary, so a "relationship" can concern one or many categories. A classic graph can represent only binary relationships between two vertices. With a hypergraph, a generalization of a graph [4], an edge can connect any number of vertices (see Fig. 3(a)). However, this connection of a set of vertices is represented by only one edge. We know that, between two or more "categories", there are many "relationships". This means that different edges can have the same end nodes, so a multigraph [1] can be another alternative. However, a multigraph allows describing multiple relations between two and only two vertices (see Fig. 4(a)). For our concept, we would like to use the advantages of these two graph models, and we know that a bipartite graph [29] can model the more general multigraph and hypergraph (see examples of representation of a hypergraph in Fig. 3(b) and of a multigraph in Fig. 4(b)).
(a) Hypergraph (b) Bipartite graph
Figure 3: An example of hypergraph and its bipartite representation.
(a) Multigraph (b) Bipartite graph
Figure 4: An example of multigraph and its bipartite representation.
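The bipartite representation of Fig. 3(b) and 4(b) can be sketched in a few lines: every edge of the original structure becomes a node of its own, linked to each vertex it connects. The function name and the toy data below are hypothetical and serve only to illustrate the idea; they are not taken from the figures.

```python
def to_bipartite(edges):
    """Return the bipartite incidence graph of a hypergraph or multigraph.

    edges: dict mapping an edge label to the collection of vertices it connects.
    Result: set of (vertex, edge_label) pairs; each pair links a vertex node
    to an edge node, and no pair links two nodes of the same kind.
    """
    return {(v, label) for label, vertices in edges.items() for v in vertices}


# A hyperedge may connect any number of vertices.
hypergraph = {"e1": {"a", "b", "c"}, "e2": {"c", "d"}}
# A multigraph may hold several edges between the same two vertices.
multigraph = {"r1": ("a", "b"), "r2": ("a", "b")}

hyper_links = to_bipartite(hypergraph)
multi_links = to_bipartite(multigraph)
```

Both structures end up in the same form, which is the property the graph model relies on: several relationships over the same vertices, each possibly involving more than two of them.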
We would like to represent multiple relations between multiple vertices, which is why we decided to represent a relationship by a node. Our graph, denoted $G$, is a bipartite graph and contains two types of nodes: a category node, denoted $C$, and a relationship node, denoted $R$. We give a definition of each type of node in our graph $G$:
A category node $C$ represents the existence of a set of categories in the same environment, i.e. in the same database. For a set of categories $K = \{cat_i\}$, its representation node is $C_{\{cat_i\}}$ or $C_K$. $C_K$ can own different attributes describing some information about $K$, such as visual features or a dedicated object detection algorithm.
Similarly, a relationship node $R$ represents a type of relationship between the categories in a set $J = \{cat_j\}$; we denote this node $R_J^{type}$. From an $R_J^{type}$, we can learn all possible configurations of the relationship $type$ involving $J$. In our framework, we are especially interested in spatial relationships; for example, relationship $type$ can be the topological spatial relationship [10], which describes different configurations: disjoint, joint, overlaps, inside, etc.
Now, graph $G$ is defined as:
\[ G = (V, E) \tag{1} \]
where $V$ is the set of nodes in the graph:
\[ V = \{C_K\} \cup \{R_J\} \tag{2} \]
This graph is an undirected graph. Then, $E$ is the set of edges:
\[
\begin{aligned}
& E = \{\, e_{(C_K, R_J)} \mid \forall C_K, R_J \in V \wedge K \subseteq J \,\} \\
& \forall C_K, R_J \in V ;\; e_{(C_K, R_J)} \Rightarrow e_{(R_J, C_K)} \\
& \forall C_K, C_J \in V ;\; \nexists\, e_{(C_K, C_J)} \\
& \forall R_K, R_J \in V ;\; \nexists\, e_{(R_K, R_J)}
\end{aligned}
\tag{3}
\]
Figure 5 gives an example of graph (at 3 levels).
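The definitions (1) to (3) can be made concrete with a small data structure. The sketch below is a minimal illustration and not the implementation used in this work: the class and method names are hypothetical, node attributes are omitted, and edges are added explicitly while the constraints of Eq. (3) are checked (an edge joins one category node and one relationship node, with $K \subseteq J$).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CategoryNode:
    categories: frozenset  # the set K


@dataclass(frozen=True)
class RelationshipNode:
    categories: frozenset  # the set J
    rel_type: str          # e.g. "topological"


@dataclass
class KnowledgeGraph:
    category_nodes: set = field(default_factory=set)
    relationship_nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (C, R) pairs, read as undirected

    def add_category(self, *cats):
        node = CategoryNode(frozenset(cats))
        self.category_nodes.add(node)
        return node

    def add_relationship(self, rel_type, *cats):
        node = RelationshipNode(frozenset(cats), rel_type)
        self.relationship_nodes.add(node)
        return node

    def connect(self, c, r):
        # Bipartite constraint: no C-C and no R-R edges.
        if not isinstance(c, CategoryNode) or not isinstance(r, RelationshipNode):
            raise TypeError("an edge joins a category node and a relationship node")
        # Subset constraint of Eq. (3): K must be included in J.
        if not c.categories <= r.categories:
            raise ValueError("K must be a subset of J")
        self.edges.add((c, r))


g = KnowledgeGraph()
c_car = g.add_category("car")
c_road = g.add_category("road")
r_topo = g.add_relationship("topological", "car", "road")
g.connect(c_car, r_topo)
g.connect(c_road, r_topo)
```

Storing each edge once as a (category, relationship) pair is enough for an undirected graph, since the reverse direction is implied.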
3.2 Attributes of a node
In addition, our requirement is that the graph-based representation must be simple to compute, clear to understand, and extensible to represent new complex general knowledge. To meet this requirement, we define specific attributes associated with each node, category or relationship, in sections 3.2.1 and 3.2.2.
3.2.1 Level attribute
To avoid building an unreadable graph, and based on the idea that complex knowledge is developed from a set of basic pieces of knowledge, we split our graph into many different levels. A graph level indicates the number of categories concerned: $|\{cat_i\}|$, meaning the cardinality of set $\{cat_i\}$. Thus, each level of the graph is composed of a set of $C$ and $R$ nodes that have the same $|\{cat_i\}|$. We can therefore consider the level as an attribute of the node, denoted $lev$, that is defined as $lev = |\{cat_i\}| - 1$. Our idea is that a higher-level node can be built from lower-level nodes. A lower-level node can connect only to the next higher level, by edges between $C$ nodes of the lower level and $R$ nodes of the higher level. In consequence, we redefine set $E$:
\[ E = \{\, e_{(C_K, R_J)} \mid \forall C_K, R_J \in V \wedge K \subseteq J \wedge (C.lev = R.lev \vee C.lev + 1 = R.lev) \,\} \tag{4} \]
Note that we can expand the number of levels in the graph as we need. If we study $N$ categories, then at level $l$ we can have at most $\frac{N!}{(N-l)!}$ nodes $C$. A high number of levels can considerably increase the number of nodes in the graph; however, in reality, we can find many categories that never occur together. For example, in [15], by studying 86 different entity categories in a dataset of heterogeneous content, we found 879 couples of categories that never occur together among a set of 7310 possible couples, and only 38031 triplets present in total among 102340 possible triplets.
In Fig. 5, we show the concept for a 3-level graph. Concretely, we can model unary, binary, and ternary relationships with this graph, at levels 0, 1, and 2 respectively.
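The level attribute and the edge rule of Eq. (4) can be checked with a short sketch, together with the counts quoted from [15]. The function names are hypothetical. The only arithmetic facts used are that 86 × 85 = 7310 (ordered couples of distinct categories) and that the number of unordered triplets among 86 categories is 102340, which coincide with the totals reported above.

```python
import math


def level(categories) -> int:
    """lev = |{cat_i}| - 1, so a single category sits at level 0."""
    return len(set(categories)) - 1


def edge_allowed(k, j) -> bool:
    """Edge rule of Eq. (4) between a category node C_K and a relationship node R_J."""
    k, j = frozenset(k), frozenset(j)
    return k <= j and level(j) - level(k) in (0, 1)


# Same level, or exactly one level apart, with K included in J.
same_level = edge_allowed({"car", "road"}, {"car", "road"})       # True
next_level = edge_allowed({"car"}, {"car", "road"})               # True
two_levels = edge_allowed({"car"}, {"car", "road", "tree"})       # False
not_subset = edge_allowed({"car", "road"}, {"car"})               # False

# Totals quoted from [15] for N = 86 categories.
n = 86
possible_couples = math.perm(n, 2)    # 86 * 85 = 7310
possible_triplets = math.comb(n, 3)   # 102340
cooccurring_couples = possible_couples - 879  # couples seen together at least once
```

Under these figures, 6431 of the 7310 couples and 38031 of the 102340 triplets actually occur, so the graph stays well below its theoretical maximum size.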