An a contrario space-time grouping framework for the detection of coherent motions


HAL Id: inria-00115435

https://hal.inria.fr/inria-00115435v2

Submitted on 11 Dec 2006



Thomas Veit, Frédéric Cao, Patrick Bouthemy

To cite this version:

Thomas Veit, Frédéric Cao, Patrick Bouthemy. An a contrario space-time grouping framework for the detection of coherent motions. [Research Report] RR-6061, INRIA. 2006, pp.33. ⟨inria-00115435v2⟩


inria-00115435, version 2 - 11 Dec 2006

Research Report

9-6399 ISRN INRIA/RR--6061--FR+ENG

Theme COG


No. 6061

November 2006



Theme COG: Cognitive Systems

Project Vista

Research Report No. 6061, November 2006, 33 pages

Abstract: This paper presents a method for detecting independent, temporally-persistent motion patterns in image sequences. The result is a description of the dynamic content of video sequences in terms of moving objects: their number, image position and approximate motion. For each detected motion pattern, the method provides a local trajectory as well as a confidence level in the detection. The method is based on local motion measurements extracted from short video segments. These measurements are mapped into an adequate grouping space where independent trajectories correspond to distinct clusters. The automatic cluster detection is handled in an a contrario framework, which is general and involves no parameter tuning. The method was successfully applied to real video sequences featuring rigid and non-rigid moving objects, static and mobile cameras, and distracting motions. The output of this method could initialize tracking algorithms. Applications of interest are robot navigation, car-driver assistance, video surveillance and activity recognition.

Keywords: coherent motion detection, local trajectories, a contrario grouping, visual motion analysis


Résumé: This document presents a method for detecting independent motion patterns that persist over time. The method yields a description of the dynamic content of a video sequence in terms of moving objects: their number, their positions in the image and their displacements. Each detected motion pattern is characterized by a local trajectory and a confidence level. The method relies on the accumulation of local displacement measurements over short video segments. In a carefully chosen grouping space, independent trajectories correspond to distinct groups of measurements. An a contrario detection algorithm extracts these groups automatically. The method has been successfully tested on real video sequences with varied content: rigid and non-rigid moving objects, static or mobile camera, presence of distracting motions. The trajectory elements extracted by this method can be used to robustly initialize tracking algorithms. Possible applications are robot navigation, driver assistance, video surveillance and content recognition.

Keywords: coherent motion detection, local trajectories, a contrario grouping, visual motion analysis


1 Introduction

1.1 Problem setting

A general problem in motion analysis is the early and reliable detection of pieces of trajectories of moving objects in natural image sequences. Solving this problem accurately and efficiently is of crucial interest for applications such as robot navigation and car-driver assistance (involving mobile obstacle detection and avoidance), or video surveillance and human activity recognition. According to Ullman [1], the most fundamental questions when analysing the dynamic content of a video sequence are (in increasing order of complexity):

1. Are there moving objects in the observed scene?
2. How many?
3. Where are they?
4. What is their motion?

The method proposed in this paper aims at answering these four questions within a unified framework. The overall objective is to detect temporally-persistent independent motion patterns. In other words, the goal is to detect one short-term trajectory for each moving object of the scene. Based on characteristic image features, local motion measurements are extracted from the image sequence and mapped into a well-specified motion space. In this grouping space, independent objects moving along trajectories form clusters. These clusters are detected automatically by means of an innovative a contrario cluster detection framework. The cluster detection algorithm is fully automatic and provides a confidence level for each detected object trajectory.

It seems to us that there is a gap to be filled between two types of issues. On the one hand, there are motion detection methods. Most of them are actually closer to change detection, since they make decisions over very short time intervals, with no real search for any spatio-temporal coherence [2, 3]. As a consequence, significant moving objects cannot be distinguished from parasitic motion. The temporal content alone is usually very noisy; hence, local spatial (and possibly temporal) regularity is usually introduced, which is the simplest means to enforce temporal coherence [4]. On the other hand, if the position of a given moving object is known, efficient methods allow one to track it. Many algorithms are variations or extensions of the celebrated Kalman filter. Recent progress based on the non-linear particle filtering approach has led to very impressive results, able to handle occlusions, shape deformation, etc. [5, 6, 7]. The weak point of these methods is their initialization, which is usually supervised.

The method proposed in this paper may be considered as addressing coherent motion detection and track initialization simultaneously. The purpose is to decide upon the existence of small pieces of trajectories over short durations (typically 10 or 20 frames). Detection thresholds for extracting these pieces of trajectories are computed automatically. Such thresholds clearly also exist from a perceptual point of view. As an example, a slowly moving object has to be observed for a long time to be detected. Hence, there should be a relation between the size of an object, its velocity, the duration of observation and its detectability. When dealing with digital image sequences, detectability is also influenced by image quality. The method described in this paper uses a detection principle intuited by Helmholtz and formulated by Desolneux, Moisan and Morel [8] (also following works by Attneave [9] and Lowe [10]). It states that a particular configuration is perceptually relevant if it cannot occur by chance, i.e., if it contradicts a general random structure of the observations.

1.2 Overall strategy

The purpose of this work is to extract geometrical evidence for moving objects from a set of successive digital images (about 10 to 20). More precisely, is it possible to prove that image parts along a sequence locally display a coherent motion, and to define a piece of trajectory? With which degree of confidence?

The strategy is the following. First, local motion measurements are extracted from successive pairs of images. These measurements are based on characteristic image features such as similarity-invariant pieces of level lines [11], SIFT descriptors [12] or KLT features [13, 14]. These features have to be local enough because of partial occlusions, shadows, etc. If the duration of observation is short enough, the motion of objects is approximately rectilinear with a constant velocity. In this simple case, this velocity, as well as the position of the shape element at time $t = 0$, is completely determined by the displacement between two images. This results in a point in $\mathbb{R}^4$: two real coordinates for the velocity and two for the initial position. Now, if several such pairs correspond to the same moving object in different frames, then the corresponding points form clusters in $\mathbb{R}^4$. As a consequence, the detection of pieces of trajectories reduces to a cluster detection problem.

Let us consider $M$ data points $X_1, \dots, X_M$ in $\mathbb{R}^4$, each corresponding to a couple (initial position, velocity), possibly detected at different instants. Following the same argument as in [15], an a contrario method is adopted: assume that all the pairs are accidental and do not correspond to a coherent trajectory. Then it is sound to assume that the $X_i$ are independent and identically distributed according to a probability distribution to be specified. Under this assumption, it is very unlikely that a large proportion of the $X_i$'s is observed in a single small region of $\mathbb{R}^4$. Whenever this is actually observed, the hypothesis that the $X_i$ are random is most likely false, and some of them should be grouped. Natural questions arise, which are answered in this paper: How many groups are there (if any)? Which groups are relevant? Is it possible to quantify the meaningfulness of a group of points? How should one select among nested groups?
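The intuition that a dense region is "very unlikely" under independence can be quantified. The sketch below assumes, purely for illustration, that under the background model each $X_i$ falls independently in a fixed region of $\mathbb{R}^4$ with probability $p$. The probability of observing at least $k$ of the $M$ points there is then the binomial tail $B(M, k, p) = \sum_{j \ge k} \binom{M}{j} p^j (1-p)^{M-j}$, and multiplying it by the number of tested regions gives an expected number of false alarms. The function names, the binomial tail and the threshold `epsilon = 1` are assumptions of this sketch, not the report's definitions, which are given in Section 4.

```python
from math import comb


def binomial_tail(m: int, k: int, p: float) -> float:
    """Probability that at least k of m i.i.d. points fall in a region of probability p."""
    if k <= 0:
        return 1.0
    if k > m:
        return 0.0
    return sum(comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(k, m + 1))


def number_of_false_alarms(n_tests: int, m: int, k: int, p: float) -> float:
    """Illustrative NFA: number of tested regions times the binomial tail."""
    return n_tests * binomial_tail(m, k, p)


def is_meaningful(n_tests: int, m: int, k: int, p: float, epsilon: float = 1.0) -> bool:
    """Flag a region when its illustrative NFA falls below epsilon."""
    return number_of_false_alarms(n_tests, m, k, p) < epsilon


# 200 points, a region expected to hold 0.2 of them, one million tested regions.
print(is_meaningful(10**6, 200, 15, 0.001))  # → True  (15 points in the region)
print(is_meaningful(10**6, 200, 1, 0.001))   # → False (a single point)
```

The number of tests plays the role of a multiple-testing correction: the more regions are examined, the more concentrated a group must be before it is reported.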

The outline of the paper is the following. Section 2 presents some related work. Section 3 describes how to extract local motion measurements based on image features and how to map them into an adequate motion grouping space. Section 4 introduces the a contrario grouping method and details its application to the detection of coherent motions. Section 5 experimentally validates the theory. Conclusion and perspectives are given in Section 6.


2 Related work

Different approaches to exhibit temporal motion coherence in image sequences have been developed. A first group of methods attempts to directly analyze the characteristics of motion over time, or to extract some structures from the space-time volume defined by an image sequence. A second class of methods addresses the detection of coherent motions as a grouping problem. Most of these methods lack an efficient clustering framework. Finally, our method shares some ingredients with Structure From Motion methods, namely the use of image features and clustering algorithms.

In [16], Wixson proposes to accumulate directionally consistent optical flow. An estimate of the total image distance moved by each pixel during the sequence makes it possible to discriminate between objects moving in a consistent direction and parasitic motion. Gryn et al. [17] have specified even more precise motion templates driven by the application, in other words trading generality for better computational efficiency. Other methods attempt to analyze the space-time volume of image sequences. For instance, Ricquebourg and Bouthemy [18] as well as Sarkar et al. [19] look for motion structures (typically alignments) in spatio-temporal slices. The same type of idea is used by Kornprobst and Medioni [20], where trajectories are the result of a vote. Another approach to coherent motion detection, developed by Laptev et al. [21], is to exploit space-time interest points. Focusing on the class of periodic motions makes it possible, for example, to extract pedestrians in cluttered environments. One of the most difficult issues in that context is the automatic computation of robust detection thresholds.

If local motion measurements are suitably parametrized, the detection of independent coherent motions can be viewed as a clustering problem. Yuille and Grzywacz [22] proposed a clustering approach after suitably representing visual patterns, and attempted to classify the typical configurations of visual motion. A complex observation would be a combination of these elementary motion templates, which should be detected by a grouping procedure. However, their work remains formal, with no computational theory. Burgi et al. [23] propose a Bayesian framework along with a generative model of trajectory. More recently, Gao et al. [24] worked on motion detection via clustering. Motion information is extracted using edge elements, which are grouped according to spatial proximity and motion persistence over time. The clustering strategy relies on several user-set parameters, which certainly harms the generality of the method.

The similarity between the ingredients of our method and those of Structure From Motion (SFM) methods might be misleading. SFM methods focus more on characterizing the 3D geometry of the scene than on detecting coherent motion patterns [25, 26]. The presence of one or several moving objects is assumed, and therefore the detection issue is not addressed. Furthermore, the features detected in the image sequences need to be tracked through the whole sequence [27, 28]. This requirement is obviously difficult to meet in the presence of occlusions or in noisy image sequences. Factorization methods usually rely on spectral clustering for the clustering step. This clustering method, based on algebraic matrix manipulations, is known to be very sensitive to noise. Other methods rely on iterative optimization to build clusters, for example Expectation-Maximisation or K-means [29]. These methods require the number of clusters to be specified. Moreover, their results are sensitive to initialization. An alternative is to resort to model selection to determine the number of moving objects. In [30], a rank constraint is developed to estimate the number of moving objects. Torr and Murray [31] propose a stochastic clustering method to group local motion measurements from several moving objects based on 3D geometry. They address the different issues of clustering, namely cluster validity assessment and merging of clusters. Their method relies on the combination of several heterogeneous criteria involving several parameters. It is also based on two frames, and the clustering is therefore based on shape rather than on motion coherence over time.

3 Image features and local displacement measurements

The features to be extracted from images must be local (because of possible partial occlusions), stable, and invariant enough to the deformations an object may undergo through a sequence (approximately rigid motion, contrast change, etc.). Different types of features meet these requirements:

SimilarityInvariantPieesofLevelLines (SIPLL)[11℄,

SIFTdesriptors[12℄,

KLTfeatures[13℄.

The reader is referred to these articles for the exact definition and computation of these features. Each type of feature has its advantages and drawbacks. The three types of features tested differ in terms of invariance to geometrical transformations, discriminative power and computational load. The first type of feature is a local piece of contrasted level lines (isophotes), as detailed in [32]. Its main advantage is that the associated representation is invariant with respect to contrast changes and similarity transformations. When the image resolution is fine enough, this first type of feature is accurate, since level lines locally coincide with edges. On the other hand, its computational load is rather heavy. Besides, satisfying the largest invariance group is useful when attempting to match images with no a priori knowledge that they have some content in common. When matching two consecutive images of a video, requiring such a degree of invariance may be unnecessary. The second type of feature is the SIFT descriptor [12]. SIFT descriptors are slightly less invariant than SIPLL and less intuitive from a geometric point of view, but faster to compute. They have proved very efficient for matching multiple views of a single scene. Still in decreasing order of complexity and invariance come KLT features [13, 33], obtained by correlation of patches around interest points (Harris points [34] in the original version). In contrast with SIPLL and SIFT descriptors, the KLT extraction framework includes the computation of a local displacement vector. Let us point out that our detection method is independent of the type of features and could therefore easily be adapted to other types of features.

Given a pair of successive images of the sequence at time instants $t$ and $t+1$, any of these features makes it possible to compute local motion measurements. In the case of SIPLL or SIFT descriptors, a displacement measurement is obtained by matching a feature in the first frame with its best corresponding feature in the next frame. Of course, when looking for a match, the whole image does not need to be explored. Since object displacements in the image are limited (typically less than 10 pixels between two consecutive frames), focusing on a neighborhood of the feature position in the first image is sufficient. For example, it is reasonable to restrict the matching process to features in the second frame within a distance of 20 pixels from the position of the feature in the first frame. The difference between the position $x_t$ at time instant $t$ and the position $x_{t+1}$ at time instant $t+1$ then provides the displacement $v = x_{t+1} - x_t$. For KLT features, the displacement $v$ is directly computed by an optimization process involving both image frames [13]. Let us define the vector $(x_{\mathrm{ref}}, v) \in \mathbb{R}^4$ by $x_{\mathrm{ref}} = x_t - t\,v$. To a first-order approximation, the velocity $v$ is constant and $x_{\mathrm{ref}}$ is the theoretical initial position of the feature at time instant $t = 0$. This hypothesis is sound if the duration of observation is short enough. Moreover, let us point out that the aim is not to measure the characteristics of motion accurately, but only to robustly detect pieces of trajectories. Hence, this hypothesis does not need to be satisfied very accurately.
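The windowed matching step for descriptor-based features (SIPLL or SIFT) can be illustrated with a minimal sketch. It assumes each feature is a `(position, descriptor)` pair and that "best corresponding feature" means smallest Euclidean descriptor distance; the function names are hypothetical, and practical SIFT matchers usually add safeguards (such as a distance-ratio test) that are omitted here.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def match_in_window(feature, candidates, radius=20.0):
    """Best-matching candidate within `radius` pixels of `feature`.

    `feature` and each candidate are (position, descriptor) pairs. Returns the
    displacement v = x_{t+1} - x_t of the best match, or None if no candidate
    lies inside the search window.
    """
    (x0, y0), desc = feature
    best_v, best_d = None, math.inf
    for (x1, y1), cand_desc in candidates:
        if math.hypot(x1 - x0, y1 - y0) > radius:
            continue  # outside the search window: never considered
        d = descriptor_distance(desc, cand_desc)
        if d < best_d:
            best_v, best_d = (x1 - x0, y1 - y0), d
    return best_v


def displacements(frame_t, frame_t1, radius=20.0):
    """List of (x_t, v) measurements for the features of frame t that find a match."""
    out = []
    for feat in frame_t:
        v = match_in_window(feat, frame_t1, radius)
        if v is not None:
            out.append((feat[0], v))
    return out


frame_t = [((10, 10), [1.0, 0.0, 0.0]), ((100, 100), [0.0, 1.0, 0.0])]
frame_t1 = [((13, 14), [1.0, 0.1, 0.0]), ((40, 40), [1.0, 0.0, 0.0])]
print(displacements(frame_t, frame_t1))  # → [((10, 10), (3, 4))]
```

In this example the candidate at (40, 40) has a descriptor identical to the first feature but lies about 42 pixels away, so the 20-pixel window excludes it; the feature at (100, 100) finds no candidate in its window and produces no measurement.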

Now, a part of the same moving object at different time instants, or different parts of the same moving object, should lead to approximately the same values of initial position and velocity. Therefore, local motion measurements are accumulated over several successive pairs of frames. The total number of frames should be large enough for clusters to contain a sufficient number of data points to be detected. The total observation time should remain short, so that the first-order approximation of the trajectory remains valid. Typically, the number of frames involved ranges from 3 to 30. Let us emphasize that a given feature does not need to be tracked through all the frames. This makes the proposed method robust to noise and appearance changes, as well as to partial and global occlusions. Fig. 1 schematically describes how local displacement measurements corresponding to objects following trajectories lead to clusters in the four-dimensional grouping space $(x_{\mathrm{ref}}, v)$: local motion measurements in the images corresponding to the same trajectory accumulate and form clusters in this space. Fig. 2 displays the two-dimensional projections of the couples $(x_{\mathrm{ref}}, v) \in \mathbb{R}^4$ extracted from 10 successive frames of a highway surveillance sequence. The middle plot corresponds to $x_{\mathrm{ref}}$, i.e., the vertical coordinates vs. the horizontal coordinates of the theoretical initial position. The right plot corresponds to the polar coordinates of $v$, orientation vs. magnitude. Three clusters in $\mathbb{R}^4$ can be distinguished, corresponding to the three moving objects that appear in the scene displayed in the left image. Automatically detecting clusters in this four-dimensional grouping space amounts to detecting the independent motion patterns that are temporally coherent, in other words the three moving objects. Local motion measurements corresponding to the background of the scene are scattered in position and velocity direction but highly concentrated at velocity magnitude 0. They do not form a distinct cluster in $\mathbb{R}^4$.
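The mapping $x_{\mathrm{ref}} = x_t - t\,v$ and the accumulation over frame pairs can be sketched in a few lines. This sketch assumes time instants are counted from the first frame of the segment and that entry `t` of the input holds the `(x_t, v)` measurements of the pair $(t, t+1)$; the function names are illustrative. The polar helper mirrors the orientation vs. magnitude projection used for $v$ in Fig. 2.

```python
import math


def to_grouping_space(t, x_t, v):
    """Map a measurement observed at time t to (x_ref, v) in R^4, with x_ref = x_t - t*v."""
    return (x_t[0] - t * v[0], x_t[1] - t * v[1], v[0], v[1])


def accumulate(measurements_per_pair):
    """Pool the R^4 points of all frame pairs; entry t holds the (x_t, v) of pair (t, t+1)."""
    points = []
    for t, measurements in enumerate(measurements_per_pair):
        for x_t, v in measurements:
            points.append(to_grouping_space(t, x_t, v))
    return points


def velocity_polar(point):
    """Orientation (radians) and magnitude of the velocity part of an R^4 point."""
    vx, vy = point[2], point[3]
    return math.atan2(vy, vx), math.hypot(vx, vy)


# A feature moving at constant velocity (2, 1) from (50, 20), observed in 4 frame pairs.
pairs = [[((50 + 2 * t, 20 + t), (2, 1))] for t in range(4)]
print(accumulate(pairs))  # → [(50, 20, 2, 1), (50, 20, 2, 1), (50, 20, 2, 1), (50, 20, 2, 1)]
```

Measurements taken at different instants along the same constant-velocity trajectory collapse onto a single point of $\mathbb{R}^4$; with real, noisy measurements they instead form the compact clusters visible in Fig. 2.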

In order to deal with mobile cameras, dominant motion estimation and motion compensation are applied, using a general and robust dominant motion estimation algorithm [35]. The dominant motion is identified with the camera motion. This identification is valid under some hypotheses, such as moving objects covering only a limited part of the image and the absence of significant depth discontinuities in the background. These hypotheses are usually verified in typical surveillance videos. Once the camera motion is compensated, local motion measurements corresponding to the background display an almost null velocity, exactly as in the static camera case.
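The compensation step itself, once a dominant motion model is available, amounts to subtracting the displacement predicted by that model at each feature position. The sketch below assumes, as an illustration only, a 2D affine model with hypothetical parameters `(a1, ..., a6)`; the robust estimation of these parameters [35] is not reproduced here.

```python
def affine_flow(params, x):
    """Displacement predicted at position x by an affine model (illustrative parametrization)."""
    a1, a2, a3, a4, a5, a6 = params
    return (a1 + a2 * x[0] + a3 * x[1], a4 + a5 * x[0] + a6 * x[1])


def compensate(measurements, params):
    """Subtract the dominant (camera-induced) displacement from each (x_t, v) measurement."""
    out = []
    for x_t, v in measurements:
        wx, wy = affine_flow(params, x_t)
        out.append((x_t, (v[0] - wx, v[1] - wy)))
    return out


# A pure camera pan of (3, -1) pixels per frame; the second feature moves on its own.
pan = (3, 0, 0, -1, 0, 0)
print(compensate([((10, 10), (3, -1)), ((50, 60), (5, 1))], pan))
# → [((10, 10), (0, 0)), ((50, 60), (2, 2))]
```

After compensation the background feature has a null residual velocity, as in the static camera case, while the independently moving feature keeps its own motion.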

Since the computational load of the grouping procedure directly depends on the number of local motion measurements, discarding those that obviously belong to the background dramatically reduces computation time. Two simple strategies can be adopted to discard background measurements. If, for each image of the sequence, a detection map is available that indicates which regions of the image belong to the background and which regions are moving, only features corresponding to moving regions need to be processed. For example, such a detection map can be obtained by applying an automatic moving-region detection method as proposed in [36]. This strategy is preferred when working with SIPLL or SIFT descriptors. The other strategy consists in discarding all features whose estimated inter-frame velocity magnitude is smaller than a given threshold, typically 1 pixel. This threshold corresponds to the image sampling step and is not very demanding. This second strategy is preferred when working with KLT features. Features remaining after discarding those belonging to the background are termed moving features. When applied to moving features, the task of the clustering procedure is to detect groups of features corresponding to each object moving independently and consistently over time. A similar background subtraction strategy is adopted in [28].
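Both background-discarding strategies reduce to simple filters on the `(x_t, v)` measurements. In this sketch the detection map is assumed to be a hypothetical boolean grid `mask[row][col]` (True for moving pixels), with positions given as `(x, y) = (column, row)`; how the map is computed [36] is outside its scope. The 1-pixel default matches the typical threshold given above.

```python
import math


def moving_features(measurements, min_speed=1.0):
    """Velocity strategy: keep measurements whose displacement magnitude is at least min_speed pixels."""
    return [(x, v) for x, v in measurements if math.hypot(v[0], v[1]) >= min_speed]


def in_moving_region(measurements, mask):
    """Detection-map strategy: keep measurements whose position falls on a moving pixel."""
    kept = []
    for (x, y), v in measurements:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < len(mask) and 0 <= col < len(mask[0]) and mask[row][col]:
            kept.append(((x, y), v))
    return kept


ms = [((1, 1), (0.2, 0.1)), ((2, 0), (3, 4))]
print(moving_features(ms))  # → [((2, 0), (3, 4))]
```

The velocity threshold needs no extra input, which is one reason it suits KLT features, whose displacement is available directly from the tracker.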

4 Coherent motion detection by a contrario clustering

This section presents an efficient clustering algorithm that makes it possible to answer the questions of Section 1.1 in a unified framework. Let us consider a set of points $\{X_1, \dots, X_M\}$ in $\mathbb{R}^4$. Does this set contain any group? How many, and how meaningful are they? This problem is one of the numerous forms of cluster analysis. While many classical efficient techniques [37] propose sound cluster candidates, the above questions do not have a definitive answer. In particular, it is difficult to make a robust decision about the existence of a group (known as the problem of validity), or about whether a group should be cut into subgroups. These are precisely the problems this section deals with. Some ideas presented here have been inspired by Bock [38] or, more recently, by Gordon [39]. A parallel work [15] develops a theory of grouping, but for a completely different application, namely planar shape recognition. For the sake of completeness, the main results of this theory are developed here in the context of motion analysis.

4.1 Number of false alarms of a group and cluster validity

The fact that some of the $X_i$'s may form a group reveals a lack of independence among these points. Since the cause of the dependence is unknown, modeling the probability of such an event is difficult. Hence, the idea of the a contrario decision is that groups can be detected as large deviations from an independence model. Let us introduce the following background model.