HAL Id: inria-00115435
https://hal.inria.fr/inria-00115435v2
Submitted on 11 Dec 2006
An a contrario space-time grouping framework for the detection of coherent motions
Thomas Veit, Frédéric Cao, Patrick Bouthemy
To cite this version:
Thomas Veit, Frédéric Cao, Patrick Bouthemy. An a contrario space-time grouping framework for the detection of coherent motions. [Research Report] RR-6061, INRIA. 2006, pp. 33. ⟨inria-00115435v2⟩
ISSN …9-6399, ISRN INRIA/RR--6061--FR+ENG
Theme COG
An a contrario space-time grouping framework for the detection of coherent motions
Thomas Veit — Frédéric Cao — Patrick Bouthemy
Research Report No. 6061
November 2006
Theme COG: Cognitive Systems
Project Vista
33 pages
Abstract: This paper presents a method for detecting independent temporally-persistent motion patterns in image sequences. The result is a description of the dynamic content of video sequences in terms of moving objects, their number, image position and approximate motion. It provides for each detected motion pattern a local trajectory as well as a confidence level in the detection. The method is based on local motion measurements extracted from short video segments. These measurements are mapped in an adequate grouping space where independent trajectories correspond to distinct clusters. The automatic cluster detection is handled in an a contrario framework, which is general and involves no parameter tuning. The method was successfully applied to real video sequences featuring rigid and non-rigid moving objects, static and mobile cameras, and distracting motions. The output of this method could initialize tracking algorithms. Applications of interest are robot navigation, car-driver assistance, video surveillance and activity recognition.
Key-words: coherent motion detection, local trajectories, a contrario grouping, visual motion analysis
Résumé (translated from the French): This document presents a method for detecting independent motion patterns that persist over time. The method yields a description of the dynamic content of a video sequence in terms of moving objects: their number, their positions in the image and their displacements. Each detected motion pattern is characterized by a local trajectory and a confidence level. The method relies on the accumulation of local displacement measurements over short video segments. In a carefully chosen grouping space, independent trajectories correspond to distinct groups of measurements. An a contrario detection algorithm extracts these groups automatically. The method was tested successfully on real video sequences with varied content: rigid and non-rigid moving objects, static or mobile camera, presence of parasitic motions. The trajectory elements extracted by this method can be used to robustly initialize tracking algorithms. Possible applications are robot navigation, driver assistance, video surveillance and content recognition.
Key-words (translated from the French): coherent motion detection, local trajectories, a contrario grouping, visual motion analysis
1 Introduction
1.1 Problem setting
A general problem in motion analysis is the early reliable detection of pieces of trajectories of moving objects in natural image sequences. Accurately and efficiently solving this problem is of crucial interest for applications such as robot navigation and car-driver assistance (involving mobile obstacle detection and avoidance), or video-surveillance and human activity recognition. According to Ullman [1], the most fundamental questions when analysing the dynamic content of a video sequence are (in increasing order of complexity):
1. Are there moving objects in the observed scene?
2. How many?
3. Where are they?
4. What is their motion?
The method proposed in this paper aims at answering these four questions within a unified framework. The overall objective is to detect temporally-persistent independent motion patterns. In other words, the goal is to detect one short-term trajectory for each moving object of the scene. Based on characteristic image features, local motion measurements are extracted from the image sequence and mapped into a well-specified motion space. In this grouping space, independent objects moving along trajectories form clusters. These clusters are detected automatically by means of an innovative a contrario cluster detection framework. The involved cluster detection algorithm is fully automatic and provides a confidence level for each detected object trajectory.
It seems to us that there is a gap to be filled between two types of issues. On one hand, there are motion detection methods. Most methods are actually closer to change detection, since they make decisions on very local time intervals, with no real search for any spatio-temporal coherence [2, 3]. As a consequence, significant moving objects cannot be distinguished from parasitical motion. The temporal content alone is usually very noisy; hence, local spatial (and possibly temporal) regularity is usually introduced, which is the simplest means to enforce temporal coherence [4]. On the other hand, if the position of a given moving object is known, efficient methods allow one to track it. Many algorithms are variations or extensions of the celebrated Kalman filter. Recent progress based on the non-linear particle filtering approach led to very impressive results able to handle occlusions, shape deformation, etc. [5, 6, 7]. The weak point of these methods is their initialization, which is usually supervised.
The method proposed in this paper may be considered as addressing simultaneously coherent motion detection and track initialization. The purpose is to decide upon the existence of small pieces of trajectories on short durations (typically 10 or 20 frames). Detection thresholds for extracting these pieces of trajectories are computed automatically. It is clear that such thresholds exist also from a perceptual point of view. As an example, a slowly moving object has to be observed for a long time to be detected. Hence, there should be a relation between the size of an object, its velocity, the duration of observation and its detectability. When dealing with digital image sequences, detectability is also influenced by image quality. The method described in this paper uses a detection principle, intuited by Helmholtz and formulated by Desolneux, Moisan and Morel [8] (also following works by Attneave [9] and Lowe [10]). It states that a particular configuration is perceptually relevant if it cannot occur by chance, i.e., it contradicts a general random structure of the observations.
1.2 Overall strategy
The purpose of this work is to extract geometrical evidence for moving objects from a set of successive digital images (about 10-20). More precisely, is it possible to prove that image parts along a sequence display locally a coherent motion, and define a piece of trajectory? With which degree of confidence?
The strategy is the following. First, local motion measurements are extracted from successive pairs of images. These measurements are based on characteristic image features such as similarity invariant pieces of level lines [11], SIFT descriptors [12] or KLT features [13, 14]. These features have to be local enough, because of partial occlusions, shadows, etc. If the duration of observation is short enough, the motion of objects is approximately rectilinear with a constant velocity. This velocity, as well as the position of the shape element at time t = 0, is, in this simple case, completely determined by the displacement between two images. This results in a point in R^4: two real coordinates for the velocity and two for the initial position. Now, if these pairs correspond to the same moving object in different frames, then the corresponding points form clusters in R^4. As a consequence, the detection of pieces of trajectories results in a cluster detection problem.
Let us consider M data points, X_1, ..., X_M in R^4, each corresponding to a couple (initial position, velocity), possibly detected at different instants. Following the same argument as in [15], an a contrario method is adopted: assume all the pairs are casual, and do not correspond to a coherent trajectory. Then, it is sound to assume that the X_i are independent and identically distributed according to a probability distribution to be specified. It is very unlikely that an important proportion of the X_i's can be observed in a single small region of R^4. Whenever this is actually observed, then the hypothesis that the X_i are random is certainly false, and some of them should be grouped. Natural questions arise, that are answered in this paper: how many groups are there (if any)? Which groups are relevant? Is it possible to quantify the meaningfulness of a group of points? How to select among nested groups?
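The a contrario argument above can be illustrated numerically. The following is a minimal sketch, not the detection criterion of this report (which is defined in Section 4): it assumes a region of R^4 that each independent background point hits with a known probability p, and evaluates how unlikely it is to see at least k of the M points inside it. The function name `binomial_tail` and the numbers used are hypothetical.

```python
from math import comb


def binomial_tail(m: int, k: int, p: float) -> float:
    """Probability that at least k of m independent points fall in a region
    that each point hits with probability p (assumed background model)."""
    return sum(comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(k, m + 1))


# Hypothetical numbers: 200 measurements, a region holding 1% of the
# background probability mass, and 15 points observed inside it.
tail = binomial_tail(200, 15, 0.01)
print(f"P(at least 15 of 200 points in the region) = {tail:.3e}")
```

With these numbers only 2 points are expected in the region, so observing 15 is extremely improbable under independence, which is exactly the situation in which the points should be grouped.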
The outline of the paper is the following. Section 2 presents some related work. Section 3 describes how to extract local motion measurements based on image features and how to map them in an adequate motion grouping space. Section 4 introduces the a contrario grouping method and details its application to the detection of coherent motions. Section 5 experimentally validates the theory. Conclusion and perspectives are given in Section 6.
2 Related work
Different approaches to exhibit temporal motion coherence in image sequences have been developed. A first group of methods attempts to directly analyze the characteristics of motion over time or to extract some structures from the space-time volume defined by an image sequence. A second class of methods addresses the detection of coherent motions as a grouping problem. Most of these methods lack an efficient clustering framework. Finally, our method shares some ingredients with Structure From Motion methods, namely, the use of image features and clustering algorithms.
In [16], Wixson proposes to accumulate directionally consistent optical flow. An estimate of the total image distance moved by each pixel during the sequence makes it possible to discriminate between objects moving with a consistent direction and parasitical motion. Gryn et al. [17] have specified even more precise motion templates, driven by the application, in other words trading generality for better computational efficiency. Different methods attempt to analyze the space-time volume of image sequences. For instance, Ricquebourg and Bouthemy [18] as well as Sarkar et al. [19] look for motion structures (typically alignments) in spatio-temporal slices. The same type of idea is used by Kornprobst and Medioni [20] where trajectories are the result of a vote. Another approach to coherent motion detection developed by Laptev et al. [21] is to exploit space-time interest points. Focusing on the class of periodic motions makes it possible, for example, to extract pedestrians in cluttered environments. One of the most difficult issues in that context is the automatic computation of robust detection thresholds.
If local motion measurements are suitably parametrized, the detection of independent coherent motions can be viewed as a clustering problem. Yuille and Grzywacz [22] proposed a clustering approach after suitably representing visual patterns, and attempted to classify the typical configurations of visual motion. A complex observation would be a combination of these elementary motion templates, that should be detected by a grouping procedure. However, their work remains formal with no computational theory. Burgi et al. [23] propose a Bayesian framework along with a generative model of trajectory. More recently, Gao et al. [24] worked on motion detection via clustering. Motion information is extracted using edge elements which are grouped according to spatial proximity and motion persistence over time. The clustering strategy relies on several user-set parameters. This certainly harms the generality of the method.
The similarity of the ingredients involved in our method with those involved in Structure From Motion (SFM) methods might be misleading. The focus of SFM methods is more on characterizing the 3D geometry of the scene than on detecting coherent motion patterns [25, 26]. The presence of one or several moving objects is assumed and therefore the detection issue is not addressed. Furthermore, the features detected in the image sequences need to be tracked through the whole sequence [27, 28]. This requirement is obviously difficult to meet in the presence of occlusions or noisy image sequences. Factorization methods usually rely on spectral clustering for the clustering step. This clustering method, based on algebraic matrix manipulations, is known to be very sensitive to noise. Other methods rely on iterative optimization methods to build clusters, for example Expectation-Maximisation or K-means [29]. These methods require the number of clusters to be specified. Moreover, the results are sensitive to initialization. An alternative is to resort to model selection to determine the number of moving objects. In [30], a rank constraint is developed to estimate the number of moving objects. Torr and Murray [31] propose a stochastic clustering method to group local motion measurements from several moving objects based on 3D geometry. They address the different issues of clustering, namely cluster validity assessment and merging of clusters. Their method relies on the combination of several heterogeneous criteria involving several parameters. It is based on two frames, so the clustering relies on shape rather than on motion coherence over time.
3 Image features and local displacement measurements
The features to be extracted from images must be local (because of possible partial occlusions), stable, and invariant enough to the deformations an object may encounter through a sequence (approximate rigid motion, contrast change...). Different types of features meet these requirements:
- Similarity Invariant Pieces of Level Lines (SIPLL) [11],
- SIFT descriptors [12],
- KLT features [13].
The reader is referred to these articles for the exact definition and the computation of these features. Each type of feature has its advantages and drawbacks. The three types of features tested differ in terms of invariance to geometrical transformations, discriminative power and computational load. The first type of feature is a local piece of contrasted level lines (isophotes), as detailed in [32]. The main advantage is that its associated representation is invariant with respect to contrast change and similarity transformations. When the image resolution is fine enough, this first type of feature is accurate since level lines locally coincide with edges. On the other hand, the computational load is a bit heavy. Besides, satisfying the largest invariance group is useful when attempting to match images if there is no a priori knowledge that they have some content in common. When matching two consecutive images in a video, requiring such a degree of invariance may be unnecessary. The second type of feature is the SIFT descriptor [12]. SIFT descriptors are slightly less invariant than SIPLL, and less intuitive from a geometric point of view but faster to compute. They have proved very efficient for matching multiple views of a single scene. Still in decreasing order of complexity and invariance are KLT features [13, 33] obtained by correlation of patches around interest points (Harris points [34] in the original version). In contrast with SIPLL and SIFT descriptors, the KLT extraction framework includes the computation of a local displacement vector. Let us point out that our detection method is independent of the type of features and could therefore easily adapt to other types of features.
Given a pair of successive images of the sequence at time instants t and t + 1, any of these features makes it possible to compute local motion measurements. In the case of SIPLL or SIFT descriptors, a displacement measurement is obtained by matching a feature in the first frame with its best corresponding feature in the next frame. Of course, when looking for a match, the whole image does not need to be explored. Since object displacements in the image are limited (typically less than 10 pixels between two consecutive frames), focusing on a neighborhood of the feature position in the first image is sufficient. For example, it is reasonable to restrict the matching process to features in the second frame within a distance of 20 pixels from the position of the feature in the first frame. Now, the difference between the position x_t at time instant t and x_{t+1} at time instant t + 1 provides the displacement v. For KLT features, the displacement v is directly computed by an optimization process involving both image frames [13]. Let us define the vector (x_ref, v) ∈ R^4 by x_ref = x_t − t v. By first order approximation, the velocity v is constant and x_ref would be the theoretical initial position of the feature at time instant t = 0. This hypothesis is sound if the duration of observation is short enough. Moreover, let us point out that the aim is not to measure accurately the characteristics of motion, but only to robustly detect pieces of trajectories. Hence, this hypothesis does not need to be satisfied very accurately.
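The matching restriction and the mapping x_ref = x_t − t v described above can be sketched as follows. This is an illustrative sketch only: closest-position matching stands in for the descriptor-based matching of SIPLL or SIFT features, and the names `match_nearest` and `to_grouping_space` are hypothetical, not taken from the report.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def match_nearest(feats_t: List[Point], feats_t1: List[Point],
                  max_dist: float = 20.0) -> List[Tuple[Point, Point]]:
    """Pair each feature at time t with the closest feature at time t+1.

    Candidates farther than max_dist pixels are ignored. Closest-position
    matching is an assumption standing in for descriptor matching.
    """
    pairs = []
    for x in feats_t:
        candidates = [y for y in feats_t1
                      if hypot(y[0] - x[0], y[1] - x[1]) <= max_dist]
        if candidates:
            best = min(candidates,
                       key=lambda y: hypot(y[0] - x[0], y[1] - x[1]))
            pairs.append((x, best))
    return pairs


def to_grouping_space(x_t: Point, x_t1: Point,
                      t: float) -> Tuple[float, float, float, float]:
    """Map one match to (x_ref, v), with v = x_{t+1} - x_t and x_ref = x_t - t v."""
    vx, vy = x_t1[0] - x_t[0], x_t1[1] - x_t[1]
    return (x_t[0] - t * vx, x_t[1] - t * vy, vx, vy)


# A feature moving at (3, 1) pixels per frame, observed at frames 4 and 5.
pairs = match_nearest([(22.0, 24.0)], [(25.0, 25.0), (100.0, 100.0)])
points = [to_grouping_space(a, b, t=4) for a, b in pairs]
print(points)
```

In this example the distant candidate at (100, 100) is rejected by the 20-pixel radius, and the remaining match is mapped back to the position the feature would have had at t = 0 under constant velocity.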
Now, a part of the same moving object at different time instants, or different parts of the same moving object should lead to approximately the same values of initial position and velocity. Therefore, local motion measurements are accumulated over several successive pairs of frames. The total number of frames should be large enough so that clusters contain a sufficient number of data points in order to be detected. The total observation time should remain low so that the first order approximation on the trajectory remains valid. Typically, the number of frames involved ranges from 3 to 30. Let us emphasize that a given feature does not need to be tracked through all the frames. This makes the proposed method robust to noise, appearance changes, as well as partial and global occlusions. Fig. 1 schematically describes how local displacement measurements corresponding to objects following trajectories lead to clusters in the four-dimensional grouping space (x_ref, v). Local motion measurements in the images corresponding to the same trajectory accumulate and form clusters in the grouping space (x_ref, v). Fig. 2 displays the two-dimensional projections of the couples (x_ref, v) ∈ R^4 extracted from 10 successive frames of a highway surveillance sequence. The middle plot corresponds to x_ref, i.e., the vertical coordinates vs. the horizontal coordinates of the theoretical initial position. The right plot corresponds to the polar coordinates of v, orientation vs. magnitude. Three clusters in R^4 can be distinguished corresponding to the three moving objects that appear in the scene displayed in the left image. Automatically detecting clusters in this four-dimensional grouping space results in detecting the independent motion patterns that are temporally coherent, in other words the three moving objects. Local motion measurements corresponding to the background of the scene are scattered in position and velocity direction but highly concentrated at velocity magnitude 0. They do not form a distinct cluster in R^4.
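The accumulation over frame pairs and the polar view of the velocity used for the right plot of Fig. 2 can be sketched as below. The data layout (one list of 4-tuples per frame pair) and the names `accumulate` and `velocity_polar` are assumptions made for illustration.

```python
from math import atan2, hypot
from typing import List, Tuple

Measurement = Tuple[float, float, float, float]  # (x_ref_x, x_ref_y, v_x, v_y)


def accumulate(per_pair: List[List[Measurement]]) -> List[Measurement]:
    """Pool the measurements of every successive frame pair into one point set."""
    return [m for pair in per_pair for m in pair]


def velocity_polar(m: Measurement) -> Tuple[float, float]:
    """Return (magnitude, orientation in radians) of the velocity part of m."""
    _, _, vx, vy = m
    return hypot(vx, vy), atan2(vy, vx)


# Hypothetical measurements from two frame pairs: the first two points come
# from the same object, the third from a different one.
pooled = accumulate([
    [(10.0, 20.0, 3.0, 4.0)],
    [(10.2, 19.9, 3.0, 4.0), (50.0, 50.0, 0.0, -2.0)],
])
polar = [velocity_polar(m) for m in pooled]
print(polar)
```

No feature identity is carried from one pair to the next: measurements from the same object land close together in the pooled set simply because they share initial position and velocity.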
In order to deal with mobile cameras, dominant motion estimation and motion compensation are applied. A general and robust dominant motion estimation algorithm is applied [35]. The dominant motion is identified with camera motion. This identification is possible under some hypotheses such as the image size of the moving objects and the absence of significant depth discontinuities in the background. These hypotheses are usually verified in typical surveillance videos. Once the camera motion is compensated, local motion measurements corresponding to the background display almost null velocity exactly as in the static camera case.
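Motion compensation can be illustrated by subtracting the dominant flow from each measured displacement. The sketch below assumes a 2D affine dominant motion model whose six parameters have already been estimated by some robust procedure; the affine form, the parameter ordering and the function names are assumptions for illustration and are not specified in this section.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def affine_flow(params: Sequence[float], x: Point) -> Point:
    """Dominant (camera) displacement at position x under an assumed affine model:
    w(x) = (a1 + a2*x + a3*y, a4 + a5*x + a6*y)."""
    a1, a2, a3, a4, a5, a6 = params
    return (a1 + a2 * x[0] + a3 * x[1], a4 + a5 * x[0] + a6 * x[1])


def compensate(x: Point, v: Point, params: Sequence[float]) -> Point:
    """Residual displacement of a feature once the dominant motion is removed."""
    w = affine_flow(params, x)
    return (v[0] - w[0], v[1] - w[1])


# Hypothetical camera translation of (2, -1) pixels per frame.
camera = (2.0, 0.0, 0.0, -1.0, 0.0, 0.0)
background = compensate((40.0, 60.0), (2.0, -1.0), camera)
moving = compensate((80.0, 30.0), (5.0, -1.0), camera)
print(background, moving)
```

After compensation the background feature has zero residual displacement, as in the static camera case, while the independently moving feature keeps a non-zero residual.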
Since the computational load of the grouping procedure directly depends on the number of local motion measurements, discarding local motion measurements that obviously belong to the background dramatically saves computation time. Two simple strategies to discard background measurements can be adopted. If for each image of the sequence a detection map is available that indicates which regions of the image belong to the background and which regions are moving, only features corresponding to moving regions can be processed. For example, such a detection map can be obtained by applying an automatic moving region detection as proposed in [36]. This strategy is preferred when working with SIPLL or SIFT descriptors. The other strategy consists in discarding all features with an estimated inter-frame velocity magnitude smaller than a given threshold, typically 1 pixel. This threshold corresponds to the image sampling rate and is not very demanding. This second strategy is preferred when working with KLT features. Features remaining after discarding those belonging to the background are termed moving features. When applied to moving features, the task of the clustering procedure is to detect groups of features corresponding to each object moving independently and consistently over time. A similar background subtraction strategy is adopted in [28].
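The second strategy, thresholding the inter-frame velocity magnitude, amounts to a simple filter. A minimal sketch follows, assuming measurements stored as (x_ref_x, x_ref_y, v_x, v_y) tuples; the function name `moving_features` is hypothetical.

```python
from math import hypot
from typing import List, Tuple

Measurement = Tuple[float, float, float, float]  # (x_ref_x, x_ref_y, v_x, v_y)


def moving_features(points: List[Measurement],
                    min_speed: float = 1.0) -> List[Measurement]:
    """Keep measurements whose velocity magnitude is at least min_speed pixels
    per frame; slower ones are treated as background and discarded."""
    return [p for p in points if hypot(p[2], p[3]) >= min_speed]


# Hypothetical measurements: one near-static background point, two moving ones.
measurements = [
    (12.0, 7.0, 0.2, 0.1),
    (40.0, 55.0, 3.0, 0.0),
    (41.0, 54.0, 0.0, -1.5),
]
kept = moving_features(measurements)
print(kept)
```

Only the measurements in `kept` would then be passed to the clustering step, which reduces its cost in proportion to the number of background features removed.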
4 Coherent motion detection by a contrario clustering
This section presents an efficient clustering algorithm that makes it possible to answer the questions of Section 1.1 in a unified framework. Let us consider a set of points {X_1, ..., X_M} in R^4. Does this set contain any group? How many, and how meaningful are they? This problem is one of the numerous forms of cluster analysis. While many classical efficient techniques [37] propose sound cluster candidates, the above questions do not have a definitive answer. In particular, it is difficult to make a robust decision about the existence of a group (known as the problem of validity), or whether it should be cut into subgroups or not. These are precisely the problems this section deals with. Some ideas presented here have been somehow inspired by Bock [38] or more recently by Gordon [39]. A parallel work [15] develops a theory of grouping, but for a completely different application, namely planar shape recognition. For the sake of completeness, the main results of this theory are developed here in the context of motion analysis.
4.1 Number of false alarms of a group and cluster validity
The fact that some of the X_i's may be a group reveals a lack of independence of these points. Since the cause of the dependence is unknown, modeling the probability of such an event is difficult. Hence, the idea of the a contrario decision is that groups can be detected as large deviations from an independence model. Let us introduce the following background model.