• Aucun résultat trouvé

GESTURE RECOGNITION AS A MEANS OF HUMAN- MAcHINE INTERFACE

N/A
N/A
Protected

Academic year: 2022

Partager "GESTURE RECOGNITION AS A MEANS OF HUMAN- MAcHINE INTERFACE"

Copied!
204
0
0

Texte intégral

(1)
(2)
(3)
(4)
(5)
(6)

GESTURE RECOGNITION AS A MEANS OF HUMAN- MAcHINE INTERFACE

By

@RODNEYD.HALE,B.ENG.

A THEsISSUBMITI'ED TO THE SCHOOLOF GRADUAIE STUDIES IN PARTIAL FuLfILMENT OF THE

REQUlREMENTS FOR TIlEDEGREE OF MASTER OF ENGINEERING

FACULTV OF ENGINEERING AND APPLIED SCIENCE MEMORIAL UNIVERSITY OF NEWFOUNDLAND

JULY,1998

ST.JOHN ' S NEWFOUNDLAND CANADA

(7)

.+.

National Library ofCanada Acquisitionsand Bibliographic 5e rvices 39SWllIinglonSIrINl OlIawaON K1A0N4

c...-

Bibliotheque nalionale du Canada Acquisitionset services bibliographiques 395... W.uinglon

= O N K1AON"

The author has granteda non- exclusive licence allowing the NationalLIbraryofCanada to repr oduce, loan. distribute or sell copiesof this thesisinmicroform, paperorelectronicformats.

The authorretains ownership of the copyrightinthis thesis.Neitherthe thesis nor substantial extracts fromit may be printed or otherwise reproduced withoutthe author's permission.

L'auteura acccrdeunelicence non exclusivepermettant

a

la Bibliothequenationaledu Canadade reproduire, prsrer,distnbuer au vendre des copiesde cettethesesous laforme de microfiche/film,.de repr oductionsur papier ausur format

electronique. .

L'auteurconservelaproprietedu droitd'auteur quiprotegecettethese.

Ni la thesenidesextraits substantiels de celle-cine doiventwe imprimes ou autrementreproduitssans son autorisation.

0-612-36 129 -2

Canada

(8)

Abstract

The development of areliable multi-modalhuman-machine interfacebasmanypotential applications.The interface with a personal computerhas becomevery common yet many disabled users havelimi ted access due10the restrictiveness of the current interface.An improved interface wouldimprovethequality oflife for disabled users andbas applications in conuolling machinery in anindustrialsettin g. Many different types of gestures ranging from head gestures, headpointing.hand and arm gestures are being investigated. A wide variety of classification techniques are available.These techniques rangefrom simpleclustering routinestocomplexadaptive routines.This work compares therecognitionresults of four pattern recognition techniques, the k-nearestneighbor, a Maha.lanobisdistance classifier,a rulebased classifier and hidden Markov models.The techniqueswere tested on a set of six hand gestures captured using The Flock of Birds data collectionsystem. The best average recognitionresult was 97"/0obtained from the k- nearest neighborclassifier,the Mahalanobisdistanceclassifierhad an averagerecognition rateat 92%, the rulebased classifierhadan average recognition rateat 89%and the bidde nMarkov modelsbadthe lowest average recognition resultsat 83%. The hidden Markov modelsare the most complexof thefour techniques studied. Althoughthe averagerecognition resultswerelower.they are richinmalhematical structure and can be usedto model very complex observations.

(9)

Acknowledgments

The pursuit of graduate work has many challenges, both financialandacademic. The completion of this workwasmade possibleinpartby the help of colleagues, friends, and family. I would specifically liketothankDr.Ray Gosine for his academic supervision and financial support, Dr.John Molgaard for his academic supervision, the Facultyof Engineeringand Applied Scienceand the School of Graduate Studies for partial financial support. lowe many thankstomycolleagues, including Chris Fowlerand Blair Carroll.

who provided constant support and tolerance.Finally,I would liketothankJennifer and myfamilyfor continued supportinmy pursuits.

iii

(10)

CODtents ListofFigtns....

ListofTahles...

Listof Symbols .

Chapter1-lntroduction .

1.0Back gro und .

1.1ScopeofResearch . 1.2Thesis Layout . Chapter2-Litera ture Review...

2.0lntroduc tion•. . . . .. . 2.1GestureRecogni tio n . 2.2SpeechRecogniti on . 2.3Character/ScriptRecognition

vii viii

. ix

.. . . .... .... .. ... . . ..1 ..•. . ... .... . . .. • •1 ....3 ....3

....5 ...5

. 5

. 11

...13

Chapter3 - PatternRecognitionSystems .17

3.0Introdu cti on . .. 17

3.1Gesture Recognition Systems . . 17

3.l.1Transducer.. .. . .•. .18

3.L.2Feature extraction/segmentation ... . 19

3.1.3Gesture recognition. . . .. • •... ... .. .... .. . 20 3.1.4Action .. .. . . ... . . ... . ... ..•. .,. . . .. . •.20

32TypesofGestures-StaticandDynamic 20

3.3TreIDOrFilter. . .... ... .. ... .. . ... ... ..••21 3.3.1Optimizing a TremorFilter... .... .•.... . ... 22

3.4 Wmdowing-Feature Extractionand Segmenution 25

3.5Feature Space Reduction.. .26

Chapter4 -Techni ques UsedinGesture Recognition. ... ... ... .29

4.0lntrodu ction ... ...•.. .. .. .29

4.1Hidden MarkovModels. .. ... . . . • . .. .29 4.1.1ADescription of Hidden MarkovMod e ls.. . ... . ..30 4.1.2 A Mathem aticalDescription of Hidden Mark o vModels. .3 1

4.1.3 Recognition. . 34

4.1.4Mod el re-estimation, training...•. .34

4.2The NearestNeighbor Algorithm •... ... . . . 37

4.3MahaIan ob isDistan ce Classifier 39

4.4VectorQuantization... . .••... . 40 4.4.1Choo singanInitialReproductio n Alpbabct. . 43

iv

(11)

4.5Rule-BasedGesture Recognition. . . . .• . . ... . ... ... . 44

4.5.1EstimatingtheUnderlying PDF , 44

4.5.1Rule Generation . ... ...•... ...48

4.5.3An Illustrative Exampl e _ _....•51

Chapter5• ApplicatioDtoHand GestureRecognition•...•...__ 55 5.0Introduction...•._ _ ... . ... . _ 55

5.1DataCollection _ 55

5.1.1FOBCalibration . . ... . . • . . . ... . 57 5.2 Gesture Segxnentatioo _. .. .. . ... . . .. .... . . ... . .••61

5.3Feature Extraction . .62

5.4 K· Nearest Neighbor. ... .. .65

5.5 MahalanobisDistanceClassifier.. . .. •69

5.6Rule BasedRecognition

n

5.7HiddenMarkovModels.. ••. .._ 80

5.8Discussion•... . . . 86

Chapter6-ConclusionsandFuture Work .. . 6.0Conclusions••.

6.1 FutureWork...

References.

AppendixA - PDFEstimate Plots....

Appendix B-Matlab ScriptFiles . Bibliography ... ...•.

. 90

. 90

..94 .. ...95 . .•98 ...117 ...184

(12)

Listof Figures

Figure 3.1-GeneralizedGesture Recognition System...••• . 18 Figure3.2-Tremor Filter .• • . . •.. . . . ... .. • • .. ... . .. .••.••. 21 Figure3.3-ErrorProbabilityinSingleThreshold Movem ent Dctcction... . . •.. ..23 Figure 3.4 •ErrorProbabilityinDualThresholdMovement Detection 24 Figure 3..5-Wmdowinglfeatureextraction.. . .. .... . . .... . . •... .. .. . . . ....25 Figure4.1 - GeneralizedHMMArt:hitecture(4states).. • •••.•... . .•.32

Figure4.2a-Samplingand Quantizationofa Signal 41

Figure4.2b-Vector Quantization..•. . ... .•... . . •41 Figure4.3-Probabi lity DensityFunctionofSingle-mode.SingleClass and Selection

Windowfor RuleGeneration. . ....• •. ... . ... . . ...49 Figure4.4-AGeneralized Connectivity Diagramfor RuleGeneration 50 Figure4.5 -Probability Density Reconstruc tionofDataSam pled from the Normal

Distribution.•...•... . 52

Figure4.6 •ProbabilityDensityFunction of aTwo-mode.Single Class.TwoFeature

~. .53

Figure4.7 -ContourLinesatThresholdPlanewithOverlayingRuleWindows 54 Figure5.1- Data collectionsetup . ... ... . . . .... .. .. ..•56 Figure 5.2 - GestureTypes. .. .. . . .... •... . .•••... . ...•..57 Figure5.3- Dynamic Cali bratio nTest ... ..•58

Figure 5.4 - Dynamic Calibration; zoom image . .. . •59

Figure 5..5 -Normal Probability Plot.Gesture3.FeatureI.••••.. ... .. . ...71

Figure 5.6-Normal Probability Plot.Gesture3.Feature 2 72

Figure 5.7-pdfestimate;gesture3(features2&3)...•.•.... ...•73 Figure5.8- ThresholdIntersecti on•.• ...•.... ... ... .. . ....•.75 Figure5.9-pdfestimate;.gestureI(featuresI&.3).•••... ... ... . ... . .•....76 Figure5.IO-pdfestimate; gesture 2 (features I &2) 77 Figure5.11- pdfestimate;gesture6 (features I & 3)...••... . ... . . ..••..

n

Figure5.12- pdfestimate; gesture 5 (features1&3)..•... . ... . ... ..78 Figure 5.I3-pdf estimat e;gesture 5 (features 2&3) 79 Figure 5.14- ObservationSequenceComparison; gestures3&4. . ... ... .. •84 Figure5.15 - ObservationSequenceComparison;gesmres 5&6... .... ....• .. . ..85

(13)

Table5_1-CalibrationResults .... .• .• . ...._.. . 60 Table 5.2 - Confusion Matrix; KNN -1c=1...•...•. ••..•.•.. _ 67

Table 5.3-Confusion Matrix;KNN-k-3 . 67

Table 5.4- Confusion Matrix; KNN -Ir-=5.• • • • • • • •••••• • •. • ••_•_•• .•.••• •• •.68

TableS.S- Confusion Matrix;Mahalaoobisdistance 70

Table5.6-Confusion Matrix;Rule based recognition 76

Table5.7-Confusion Matrix;HMM - N=8.Q=8 _.__ 83

Table5_8-Recogni tionResults (per gesture) _ _ _ . "87

Table 5.9-Average RecognitionResults(allgestures)..•.... •... ... ...88

vii

(14)

FOB FIFO T N pdf

·.P r

P c HMM A

.

BN Q ),

o

P(QlM)

'"

P ,

t, ft (,; 6 V P, d<Y.y) N, K, m, q

A"

S d(x,t) A, P(A,.J D.

List ofSymbols

Flock of Birds FirstInFirst Out filter overalltremor level buffer length probability density function probability error feature vector correlation covariance correlation matrix HiddenMarkov model state transition probability matrix current state probability matrix initial stateprobabilitymatrix number of modelstates number of possible observation symbols Hidden Markov Modelparameterdescription observationsequence

probability of 0given M

probability of first t symbols given 0 and M probability of symbols t10Tgiven the model state, 0 and M stateprobability given 0 and M

re-estimation of1t

consecutivestate probabilities given 0 and M re-estimationof elementsof matrix A re-estimation of elementsof matrix B volume

Parzen density estimate distance function total number ofsamplesin class i class covariance matrix classmcan

Nleve l k-dimensional mapping reproduction alphabetor codebook partition

mapping distortion

initial reproduction alphabet orcodebook minimum distortion partition average distortion

viii

(15)

asctofm features

multivariateconditionaldensityfunction given class Cj windowheight

stepsize

ix

(16)

C HAPTER 1

INTR ODUCTI O N

1.0 BatkgtouDd

Researchersforanumber ofyearshave beensearchingforalternate improved means of human-machineinteractio n.While acommon human-machineinterfacetodayis the interfacewith a personalcomputer,thecomm unica tio ns link between the user andthe processor isstill quiterestrictive.his argued that the full potentialof using personal computershasDOtbeenrealized dueinpart to the restrictions associatedwith the common interfac essuch as keyboardsandmice.Despitemany advancesincomputi ng power, the user'smain interface hasDOtchangedsignifican tlysincethecartytypewriters.

lmprovementsintheinterfacewith acomputer can benefit bothable-bodiedanddisabled persons.Theconven tional keyboard and mousecomputerinterface can be very difficult tooperateif onehas even a mildmotorcontroldisability.The developmentof an improvedinterface wou ld not only improvethe access.butwouldreduce thehandicap of not havingaccess to a computer,andtherefore significantlyimprove productivi ty.

(17)

lfweconsider the possibilityofcontrollingother machinesthrough an improved human- computer interface.thepossible applicationsbroadensignifican tly.The improved access toconventional computer applicati onsrelated 00office work DOw extendstoinclude the control cfrobotic workcellswhichcanenhancethequality oflife fordisabledpersons, and tothe contro lo{mac:bineryinharshor dangerous work places providinganobvi0U5 improvementinwork safety.

Researchershave been studyingmanydifferent hardware and software arrangements. Thehardware include systems for trackinghand/ fingergestures , head tracking,eye-gaze tracking and head pointing.Thesehardwaresystems range from completelynonintrusive trackingsystems,whichinclude video tape tracking similar tosomegait analysis systems 00attachingsensors10 the subject's head orband.Eachof these systems is more orless adaptabletodifferentapplicationsdependingonthe requiremen ts oCthatparticular application. Generally, iftrae:kingisrequired overafairlylargearea,motion trae:king with attachedsensors can becumbersome.However.if precise motiontrackingover smallranges is required, attachingsensorscan providelargeamountsofprecise data.

The variationinthe softwaresystems beingdevelo ped is even morediversethanthe hardwaresystems. Thereareseveral issues which mustbeaddressed bysoftware designers when buildinga human-machine interface. Signal Processing techniquesused forgesture segmentationanddata filtering must be addressed.Thetypeof hardwareused largelydictat esthe approach;video tracking requiresimage processing techniqueswhile

(18)

attachedsensortJackingrequiresthe analysis ofstreamsof geometricanddynami cdata.

Thenext problemischoosingan applicablepattern recognition techniq ue.Thereare manyoptionsandthechoicedependsonthe sophisticationoftbe system. Thecommon approach istobuild asystem.wbich canbeeasily transferredtoa widerangeofusers. Thisrequirementinvolves designingadaptiv esoftwaresystems.,makingthesystem complex.

Despite the amount of researchoverthepast years. there are currentlyno commercial systems capable offulfillin gthe variousdeman ds foranimp roved human-m achine interface .Systemshavebeen developed for specific users buttheseare generally expens ive and noteasily transferred to others.

1.1ScopeorResn.rclI

Thegoalofthisresearchistooutlinesomeof the potentialpatternrecognitio n techniques applicableto asensorbasedgesturetrackingsystem.Four techniques aredescribed and tested on aset of six band gestures.Recognition results for these fourtechniq ues are compared.Abrief discuss ionofsomeof theimplementation issuesrelating tothese pattern recognition techniques is included.Allgestur eswereanalyzedoff-lineand no attempt wasmadeto optimizeany oftberecognition systems .

1.2ThesisLayoul

The following section,chapter 2,includes a discussion of relevantpublicationsinthe area

(19)

ofpattern recognition. Specifically,literaturerelatedtothe areas of gesture, speech, word andscript recognition are included Chapter3isan introductiontothe major components of a gesture recognition sysIem.Chapter 4 providesadetaileddescription of thefour pattern recognition techniquestestedThetestingprocedureandrecognitionresultsfor the six gestures studied arcdescnbedinchapler5, followedby concludingremarlcsin chapter 6.

(20)

CHAPTER 2

LITERA TURE REVIEW

1.0Introduction

Throughouttheresearchcomm unity there is a growingnwnberoCproj ects exploring gesture, speech,andcharact er recognitio n. Thischapter will provid ean overvi ew of someof lhe researchperformedinthefield ofpattemlgesture recognition. specifically research applicabl e tothe developmentofa human-m achine interface.

2.1Gesture RecogaidoD

Muchresearchhas focusedon recognizing gestures performedbydifferent partsof the humanbody. 'Ibe gesturetypesstudied have varied considerably,ashavethe techniques usedforrecognition.Researchers at the UniversityorOxford.haveinvestigated recognizingarmgestures using accelerometers,The goalistousenaturalarm gestures, meas uredusingaccelerom eters.[0aidpeoplewith athetoid cerebral palsywhohave impaired speech andmolarcontrol (Hanington etaI.•1995).

Sixsingleaxisaccelerometersare attachedtothe subject'sforearm torecord thegesture

(21)

movement. Accelerometers havebeenchosenfur peoplewithathetoid cerebralpalsy who have difficulty targeting specifically withlimbs but have sufficientcontrol to make dynamic gestw'esthatare recognizable.Thecomputer recognition uses template matching which measures the difference between thetestgestureand the reference templates. The classccrrespcndiegto thereference templatewhichbestmatches the test gestureis recognized. Initial resultswithable-bodiedUSCf$familiarwith the systemhad recognitionrates around 95%andapproximately80% for theable-bodiedusers unfami liarwith the system.Resultsfrom users with athet oid cere bral palsyhad lower recognition ratesat approximat ely 60% (Harrington etal.,1995).

Takakashi etaI.at the TsulcuhaResearch Center,have proposed a spottingalgorithm to recognizethe meaning ofhwnangesturesfrommotion images.Hwn angesturesare observed as a sequenceof images througha camera, and a spatia-temporal vector field (edge feature images) is derived. To make the features robust, a spatial-reduction, temporal-averaging operation,andsaturation open.tionisadaptedto each vector element (Takahashi etal.,1994).The system consists ofan algorithm which extracts spatial- temporal vectorfields(composed ofthrecdimensional edges)frominputimage sequencesandmakes standard sequencepatterns thatcorrespondto gestures. The recogniti onprocess consists of spotting recognition by Conti nuousDynamic Programming.Some experiments havebeen performedtostudytheinfluence ofclothing andbackground variability. The initialresults indicatethemethod is robustagainst variationsinclothing andbackgro und.

(22)

BirkandMoeslundat the Aalborg Universityin Denmark haveinvesti gat ed using principal componentanalysisas a means ofhandgesturerecognition(Birk &Moeslund, 1996).Atrainingset of lowresolution imagesfor25lettenfromthe hand alphabet were collectedandanalyzed.They report off-linerecognitionresultsat99%.No recognition rateswen:quoted forthe online implementation.Tbegestures studied were static gestures where the onlyobject in theimage was theusers'band.

Astandard formof hum an-buman communicationfor peoplewithhearingdisabilitiesis the useofthe AmericanSignLanguage.Thislanguage contains approxim ate ly6000 gesturesrepresen tin g common wo rdsand anoptionof finger spelling for oth erless comm on words.Stamer and Pentlandat the MassachusettsInstituteofTec:hnology have investigated usingvideo capture forreal-timerecognitionof a subset ofgestures from the American SignLanguage(Stamer&Pentland. 1995).Gesturesare recognized using a set of Hidden Markovmodelswhichdo not explicitlymodeltheindividual fingers.

Twoexperimen tswereperformed usingatest set of500sentencesand a40gesture or wo rdlexicon.Enthe firstexperim enttheuserwore coloredgloves.,ayellowglov e onthe righthand,and a red gloveon theleft hand. Thevideoimageswere of the wholepenon and not specificallythe bands.Wearingthe colored glovesmade locatingand segmentation of thehandsless complicated.This experimentresultedinarecogni tio n me of96%. The secondexperimentinvolv ed videocaptureof the samesetof gestures withoutthegloves.Enthis experiment, the hueand saturationspecificallyassociated with

(23)

skin tone was used10locate and segment the hands. Recognition rates for this experiment were somewhat lower at83%(Stamer&Pentland, 1995).

Kimat the ffiM Research Center, has investigated the on-line recognition of band gestures using feature analysis(Kim.1988).The goal of theresearchwas to improvethe use ofspreadsheet applications by providing a gestural means ofcommand control. A number of gestures arerequiredfor a single command. Thegesturedata is filtered in a number of stages and finally directional features areusedfor recognition.Testingwas perfonnedon 18 gestures collected from twenty subjects with a reported recognition rate of about80%(Kim,1988).

Brown et al. have developed an eyegaze and headpointing system to aid persons with physical and/or mental disabilities to communicateina learning environment. The goal of the project was to determinefeatures, which are currently unavailable,in eyegazelheadpointing assistive technology.The project designed and developedeyegaze and headtracking hardware and software based on proven educational and training theory of peoplewith mental. retardation (Brown et al.,1991).Testing was performed on subjects with varying degrees of mental retardation and physical disabilities.The results with this new eygazetechnique are much improved from earlier versions but is still somewhat primitive (Brown et al.,1991).

Shein etaI.at the Hugh MacMillan Medical Center,have developed. a software

(24)

application that displaysan on-screenkeyboard. WIVIK (Windows Visual Keyboard) (SheinetaI.•1991).The user can entertextusing the WIVIK keyboard whichis displayed in a movable,re-sizeablewindow,with anypointing devicethatemulatesa Microsoft Mouse.Thepossibilitiesforthe pointing device includetouchscreensand headpointing.Allwindow's optionsare availableand the keyboard canbe customized.

No testing was perfonnedusing the WIVIK.In-housetechnical evaluationshavebeen carriedout to ensure the WIVIK does function as desired (Sheinetal.,199 1).

Tew and Gray, with the RehabilitationTechnologyResearchGroup,University of York, have used an electronicpointing deviceto emulateseveral mouse operations(Tew&

Gray,1993).The pointingdevicereplacesthe mousemovements and thespecial gesture movements of the pointer replacethemanualoperation of themousebuttons.Thespecial gesture movementsreferredtoarea leftbuttonclick,a rightbutton click, holddown mousebutton, cancellastgesture, pause and stationary pointer.The dynamic programming gesture recognitionalgoritlun utilizes a referencetemplate.Theincoming signals arecomparedto asetof templates.Tests were conducted on 10 normalsubjects with recognitionratesrangingfrom96-99"10for the various gestures observed.

Harwin, at the University of Cambridge, has used The Polhemus 3Space Isotrackto

capture the headgesturesofsubj ects withcerebral palsy.Thesubjects spelledwords, with theaid of avirtual.head stick whichconveyed their intention.Harwin then recognizedthese gestures using HiddenMarkovmodels.Testing indicated recognition

(25)

rates were around 90% (Harwin, 1991).

Perricos, also at the University of Cambridge,bas also used the Polhemus 3Space Isctrackto capture the headgesturesof subjectswith cerebral palsy (penicos, 1995). The subjects performed a series ofeight gestures which could be used to emulatemouse movements and mouse button clicking. Perricos used Dynamic Time Warping to recognize these gestures.Testingindicatedrecognitionrates between 85 and 90010.

Cairns,at the University of Dundee, compared the recognition rates of Dynamic Programming techniques, Hidden Markov models,and Artificial Neural Networks for hand and head gesture data collected using thePolhemus 3SpaceIsotrack (Cairns ,1993).

Able-bodied subjectsperfo nn ed five handgestures and the disabled subjectsperfo nn ed gestures theyfound easy to make. The recognition ratesusing theArtificialNeural Networks on data collected from able bodied subjects were not encouraging and further testing was not performed .Thetwo dynamic recognitionmethods,Dynamic Programming and hidden Markovmodels,performed fairlywellondata collectedfrom disabledUSeBwith 77% and70% recognition rates respectively.

A wide variety of systems have been developed andtested with some success in laboratory settings.There is howeveragap in theresearch which 1inks the many techniqu esto a system which is applicableto a wide variety ofpco plewith varying needs.

10

(26)

1.1SpeecbRec:oguitiOD

Researchershave been searchingforafastand reliablemeans ofspeech recognitionfor a nwnberof years, One of the more obvious uses ofa speechrc:cognitioninterface isthe Castandeasy access tocomp uterwordprocessing applications. Other uses could include an improved means of access formany disabled people.Commercial speechrecognition systems are available and computersystems canDOWbepurc hased withspeech recognitioncapabilities.Many ofthese systems require excessive demands on process ing capabilities makin g practicaluse imposs ible. These systemsaregeneral ly subject to interferen ce from back groundnoi seandhavelimited vocabularies. Muchresearc his still underw aytoimprove these systems.

Companies producing popularsoftware packages like Miscosott- Windows are researching speech recognition techniquesas additions to theirplatform packages. Huang etal.withtheMicroso~corporation have been refiningandextending Sponx-D technologies since199 3 in order to develop practical speech recognition atMicroso~.

They have developed Whisper (Wmdows HighlyIntelligent Speech Recognizer)which offers speechinput capabilities andcanbescaled tomeet different computer platfonn configurations(Huang etal., 1995).Some features ofthe system includecontinuous speechrecognitio n,speaker-indepen dence, noise robustness,en-line adaptation, dynamic:

vocabulariesandgrammars.Resultsbased ontesting ofa260word Windowscontin uous comm and-and-contro ltaslc, produ cedan average speaker-indepen dentwordrecognition erro r rateofl.4 % onat160utteran ce testing set(Huan getal.,1995).Thesystem is

II

(27)

reported to om inreal-tim e,

Otherresearchen haveinvestigated sophisticatedpattern~gnjtiontec hni q ues andtheir applicationtospeechrecognition.MorganandBourlard.withthe Internati o nal Computer Science Institute,Berk eley,CA,havestudied theuseofhybridbiddenMarkovmodels andArtificialNeuralNetworksto speecbrecognitio n problems (Morgan&Bourlard, 1995).Theattractiontomo resophisticated. algorithmsistohaveapplicabilityto more complexsystems thatcan incorporate deep acoustic and linguistic context.Thesehybrid systemshave been implementedonfairlysimplesystemsonlywitherrorratesbased.on 1000wordvocabularytestsaround5.8%.

Otherapplicationareas for speech recognition research istheimproved transmission and recognitionof vocal signals, specificall y overtelephon e lines.deVethandBoves, with theUniversityofNijmegen,are investigatin g techniqueswhichreduce therecogni tion degradati ondueto transferchanlcteristics of themicrophone andthetransfer channel (deV eth&Boves,199 7).Thisresearch involv estheimprovement of anexisting method ofnoisereductionwhich intumleadsto higherrecognitionrates.Adataset of911 utterancesof a connected sixdigitutterancewas testedusinga hidden Markov mod el classifier. The reponedrecognitionrate was approximately 98%.

Inadditionto improvementstotherecognition algorithms, researchers are investigating improved voice acquisition systemsinan attem ptto improve speechrecognition rates .

12

(28)

Omologoetat.are investigatingthe use of different numbers ofsensors andtheir placements (Omologoet al.•1997).Anarray of directionalmicrophones were arranged and testedinquietand noisy environments. Asetor ContinuousDensityHiddenMarkov modelswereused.The resultswith themicrophonearrayinthenoisy environment showed an improvementinrecognition resultsrelativeto results obtained usinga single microphone.Testing using a test setof2316 words characterizedby aword dictionary of 343 words indicated recognitionrates at&4%.

2.3 Cba ra cte r!S<:ri pt Recogn ition

Areaswhich can benefit fromtheimprovementsinscriptrecognition range from automated processing of banking documents to an improved means of computerdata entry.Due 10 thelarge amountof variationinthe character shapes andtheoverlapping nature of handwrittenwords.automaticrecognitionis a difficult problem.Many variations of segmentation based wordrecognition. and context based word recognition are currentlybeingexplored.Mohamed and Gader, with the University of Missouri.have investigated usingsegmentation-tree hidden Markov modeling and segmentation-based dynamic programming techniques(Mohamed&Gader,1996). The segmentation-free techniqueis attractive and can improverecognitionresultssignificantly when handwritten

words arenot easily segmented.This techniqueuses continuous density Markov character models 10construct wordmodels.Thedynam ic programmingsegmentatio n- based techniqueis lexicon based.Each wordis furthersegmented intoprimitives which represent singlecharacters. Whenbothoftbcsetechniques werelested together. a

13

(29)

recognition rate of97"10was reported (Mohamed&Gader, 1996).

Cursiveword recognition can alsobequiteuseful when sorting mail by recognizing ZIP codes.Gillies, at theEnvironmental Research Institute of Michigan,hasresearched discretehidden Markov models for recognizing handwrittenaddressblocks from the US Mail service(G illies,1992).The approach is to develophidden Markov modelsfor individualletters and to combinethese modelsto form a modelfor each word in the lexicon. Through a processof feature detectionandvector quantization thecursiveword imagesare transformed into observation sequenceswhich are then analyzed bythehidden Markov models.Testing was performed on a set of296 cursive words with arecognition rateof73%(Gillies,1992).

Another popular techniqueusedinpattern recognition and script recognition isArtificial Neural Networks .Wu et al., with theUniversity of Sydney,are studying a special network known as the self-organizing map 0Nu etaI.,1994). The self-organizing map is a sheet-likenetwork which builds representativesby analyzing input patterns and determining the most appropriatemodel.Using classifier based self-organizing maps can be viewed as a k-nearestneighbor classifier(WUetaI.,1994). The training set included a setof 10,426binary imagesof handwritten digits. Thissetcontained the characters 0, 1,2,3,4, 5,6 ,7,8and9.Sixty-four featureswere:extracted from theseimages which wereusedintrainingthe neural network.Specifically,WUetal,proposed usingtwo-- layer self-organizing maps. Thefirstlayer classifiesthe input patterns into subspaces,

14

(30)

similarinputpatterns are self-ammgedtobeneareachotherinthemap.The second layer thenself~rganizesthe inputpatternsandare differentiated into specified classes using the higher resolution maps located in the second layer.Testin g was performed using afurther 10,426handwrittendigits. A recognitionac:curacyof9P ;'wasreported based on thistestset(WUetaI..1994 ).

Other pattern recognition techniqueshavemo re recentlybeenapplied to handwri tten script recognition including variations offuzzylogic classifiers.Malaviya etal.,at the German Natio nal ResearchCenter forInfonnatio n Techno logy, haveinvesti gat ed. using automaticfuzzygenera ted rules for recognitionofhandwri tten script (Malaviy aetal.•

1996).Thefuzzyrules used for recognition contain the feature information extracted from the trainingdata set. The approachisbased on a multi-levelfuzzyrulebased paradigm.The segmentationof the handwritten script is performed by thefirstlayer or thelow-leve l layer ofthe recognitionscheme.Malaviyaet a1. have developeda dedicatedfuzzylanguage FOHDEL (Fuzzy On-Line Handwriting Description Language) to overcome someS}11tacticalunbiguities involved with thefuzzystructural features.

The hybrid approachutilizesfuzzystatisticalmeasuresand neural networksin combination with expert knowledgeto enhance thefinalclassification decisio n. Testset usedconsisted of 1800 distinctsymbolsfromten differentwriters with arecognition performance of92% (Malaviyaetal.,1996).

Past researchhas been extensiveandmuchresearch is continuing, allpushing towards 15

(31)

attainmentof an improved human-machineinterface.Somecommercial systems are available.specifically in theareaof speech andscript recognition. There is considerable room for the improvemen tofexistin g systems and specificallythe developmentofnew syst emsfor new applications.

16

(32)

CHAPTER 3

PATTERNRECOG NITIONSYSTEMS

3.0IntroductiOIl

Mostpatternrecognitio nsystems have a simi lar underl ying structureregardless of the application. Thepurpose of lhis chapteris to provide an overviewoftypicalpattern recognitio nsystems,specifically handgesture recognition systems.Secti on3.1provides a description of a generalizedgesture recognitio nsystem,section3.2provides abrief descriptionofthe different types of gestures, section 3.3pro vides a descriptionof segmentation techniques. and section 3.4provides a descriptionof feature extraction and segmentation through windowing.

J.tGestu~RecognitionSystems

The functionofa gesture recognitionsystemisto provide a means oftaking the transduced gesturesfrom the userandofproducing some meaningfulinpu toraction. Thereare a number of commonstagesto any gesture recogni tion processasshown in figure3.1.

17

(33)

,--

rqnanti:ll,

____ _ _______ :;::Ied _

Trusdueer

i)

J

Fe. turec lun ctioal

~ Segme ntat ioa

---:.>

GestureRet OIll.itioD.

Actioa:

J

l.ollo lic _ ort ruotioD., usismcdonricc

Figure3.1-Geaen..liud GestureRecoga itiODSystem

Therole cfeach oftbese stages is as follows:

3.1. 1 Treusduee r-

This stage convertsmovements into a representationusefulforfurthercomputer process ing.Inthe workreponedhere,the gestures aretrackedusing a sixdegree-of.

freedom measuringdevice calledthe Flock ofBirds (FOB)configuredto track the positionandorientati o n ofthe receiver.Eachreceiveriscapable ofmalcing from 10to 144 measurementsperseccod ofeach oCthethreepositions.,x,y andz,andthethree orientations.,roll..,pitchandyaw,when the receiver is within 3feetoCthe transmitter (Ascension, 1995).

The FOBdeterminesposition andorientation bytransmittingapulsedDCmagnetic field that issimultaneouslymeas ured bythe sensorsinthe flock. Each receiver independently

18

(34)

computes it' s position and orientationfrom the measuredmagneticfieldcharac teristics and transfers this infonnationto the host computer.

3.1.2Feature extractlon/segmentatlon

This stage transforms the data stream representingthe position andorientationof the transduced gesture intodatarepresentingsinglegestures.The segmentationand feature extract ion process canbe very difficultand canaffect the overall performance ofa gesture recogniti on system.Thereareseveraloptionsto cons ider in this stage:

Segment thetransduceddata stream into individual gestures bymeansofatremor filter (section3.3) andextractappropriatefeatures whichrepresent the gesture.

Thisproduces a singlefeature vector representing a singlegesture.

ii. Extract features from the transduceddata stream bymeans of windowing(section 3.4)and segmentintoindividual gestures based on the features extractedor segment the data stream using a tremor filter and extract features from the segmented data stream by means of windowing.This producesa sequence of feature vectors which represent a gesture.

The extraction of appropriatefeatur es can bechallenging.Choos ing thefeatures which bestdescrib ethe gesturescaninvolve the use of conventionalstatistical feature extraction by PrincipalComponentAnalysisor by selectingfeatureswhichintuitivelyrepresentthe gestures.

19

(35)

3.1.3Gesture rrrognitioD

This stage takes the fearures representing an individual gesneeand classifies it as a particular gesture or as not recognized (spurious).This isusually performedby comparin gitto a predefined set of gestures determinedoff-line. There are many techniquesavailable and thechoice:depends on the system requirements.Thisstag e of the process isusuaUythe most time intensive for any recogni tio nsystem (Cairns. 1993).

Typically,the classifierwillberestricted torunin real-time which canlimi t the number of gestures it can recognize andthe degree ofcomplexityofeachgesture.

3.1.4Action

Oncethegesture hasbeenrecognized, the user'sintentioncanberelayedtoan applicationwhich then perfonnssometask. This stage oftheproc ess is application dependent.For example, bandgestures canbe usedto controlwheelchairs. roboticwork stations or machiningequipment.

3.2TypesofGestu res ·Staticnd Dynamic

There aretwobasic typesof gestures:staticand dynamicgestures.Static gestures are the simplestfonnandrepresent gestures such as finger spelling languages,where eachICUer is represented by a static positionandorien tationof thehand and fingers. Time varying movementsarcnot invol vedinstaticgestures. Eachgesture isrepresentedby asingle vector containing position andorientatio ndata.

20

(36)

Dynamicgesturesaremore complexand representgestu:ressuchastheBriti sh Sign Language whereeachwordisformedbythemovementofthebands overatimeperiod..

Eachgestureisrepresented bya sequenceofvecto rsdetailingthefonnationofthe gesture overtime.

3.JTttmorFUter

Gestures are a difficultformofdata to represent visuallyespeciall ywhen theyconsist of bothpositionalandangular measurements.Before gestures can be analyzedbymost reco gnition algori thms theymustbe segmented.Thisinvolvesdetermining thebeginnin g andthe end of eachindividualgesture.

Theaveragelevelof movement oftbe transduced band position readily equateswith a measureof tremoror mo vementwhich can beused asameansofgesture segm entan c n.This approach involves having the user intentionallypro vide periods of

-

,~

[itl

.. ~ 100

.

~~

00 50 100 150 200 250 )00

Tbu indcx - -Tbrcsbo ld -- -SCl mclI1 Figure3.2 -TremorFilter littleor no movement between gestures. The tremor filter thendetectstheseperi ods of reduced movementsor..still"periodsat thestart orend ofeachgesture(figure 3.2).

21

(37)

Thetremorfilter consists of afixedIc::ngthFlFO(firstin,firstout)buffer.Thetremorin eachaxisismeasured bytak:i.ngthevariance ofthetransducerpositionover the fixed time spancorrespo ndin gto the bufferlength(perricos,1995).Thetremorleve lisdefined

where: N-buffer Ic::ngth

j - CUlTCntdimension of thettansduceddalavector X; - j...element of thebuffer

X;;- meanvalue ofxover thebufferlength

3.1

Ifthe positiondoesnot changeinaparticulardirectionor dimensionX;j'"Xi'therefore Tj=O.Theoveralltremor levelcanbe expressedasaEuclideandistance

T '~

~f;r IJ-

3..3.1 Opti m izio gaTrem orFilter

Two variablesaffecttheperfonnanceof a tremorfilter;lhe threshold valueT which 3.2

distinguishesintentio nal hand movementfrom meretremorandthebufferlength N.The simplesttremor filter is a singlethreshold filter.Afixedthreshold is setandanything abovethisthresholdisconsideredmovementwhileeverything belowthis thresholdis

(38)

considere dstill(figure3.3).The optimum tremor levelhereisthelevel atwhic h the still probability densityfunction(pdf) andthe movementpdf intersects.

A disadvantagetothisfilteristhe possibilitythat the tremorlevel mayfallbelow this thresholdduringa gesture.Assumingthat bothdistributions arenormal, the probability

- -r"mo~rh~tlhold

Oprimum TremorTh~uhold

r~tmo~LHtl

Figure3.3 -Error Probabililyin Single Threshold Movement Detection

ofbeingin the moving state and classifying thehand as stillis ana:typ e error (Devore, 1987) represen tedby thehatchedportion infigure 3.3.The shaded region in figure3.3 represents theprobability ofclassi fying the handasmoving whileinthestillstate or

p

type error (Devore,1987).Itshould be clear that inthisinstance it is notpossibl e to reducetheprobabi lityofa beta type errorwithoutincreasingtheprobability ofanalpha type error.

Fromaprob abilistic view,a single tremorthreshold appears topresent theoptimu m 23

(39)

solution to the problem.Inpractice however,errors occurring in the segmentation task thatcouldlead to a recognition errorwill result from the tremor level briefly crossing the thresholdleve l when the hand is in a given state.There are two possibilities here:

the tremorinthestillstatebrieflyrisesabovethethreshold into themovingstate as wouldoccurinthe case of jitter.Inthis case thejitter would be considereda

ii. thetremorinthemovingstate briefly falls belowthe thresholdvalue in the middle of agesture.Inthiscase the movement isprematurelysegmented as a gesture.

Whatisrequired fora reliabletremor filteristhe maximization of the probabilityof remainingina given stateonce the hand isinthatstate.Inprobabilitytheorythis requires minimizing the beta typeerrors for each state. Apractical solutionis tohave twotremorthresholdsasinfigure3.4. Ifthe hand isina stillstate, itis consideredstill

Slfllpdl

~Mo.I"6Trulo,.n.rflhoUi SlIl IT.. ",orn.rflhold

Figure 3.4- ErrorProbabUityinDual Thresbold Movement Detectl.oD untilthe tremorlevel surpassesthe still threshold.Ifthehand isina movementstate, itis

24

(40)

consideredmoving until the tremorlevelcrosses the movingthreshold. Gesturesare now requiredtobe moredistinct. Thehandmustbe morestillat theendofa gesturethan withthesingle thresholdcase andthebeginningof the gesture willbedetectedat alarger handtremorthanwith the singlethreshold case.

Thesecondvariablewb.ichaffectstheperformance of a tremorfilteristhebuffer length.The appropriatebuffer lengthcanbedeterminedexperimentally.Itshouldbe notedthatfora givenbufferlength thetwo tremorthresholds are uniqueandmustbe recalculated if the bufferlength changes.

3.4Windowing - FeatureExtraction and Segmentation

Analyzinga portionorwindow of the datastreamis widelyusedinsignalandimage

Transduced datascream x , ~ roll pite" yaw

r, r, r,

- ,

B,

o,T

Fflaturfl yeetors

r, y, r, », B, 0, .. ilId""

r,

r,

»,

"

B, 0,

1.

$~qM~IIC~of

r, y, r,

"

B, 0,'1

», y, r,

- ,

B, 0,

=> =>

f~aNnnpn U lftill g a.~cto,~

r, y, r,

- ,

B, 0, willdow g~sNn

r, y, r,

- ,

B, 0,

r, y, r.

- ,

B,

o ,:L

r, y, r,

- ,

B, 0,

l' 1

1

:r ...

~

. ~"

Figure3.5 -Wlndowing/featureemaction 25

(41)

processing.Theprocess ofanalyzing onlyaportion ofthe signalat atimehasalso been usedinautomatic speech recognitionsystems.Aportion oftbesignalfromthesensor, posstbly a microphone,isgenerally transformedinto the frequency domain forfurther analysis. Thisprocess canalsobeapplied totheamiofgesturerecognition.POrUons(a window) ofthe datastream from the transducer canbetransformed(featureextraction) priortoanalys is byarecognitionsystem(figure3.5).Thisprocesscanallowthe recogni tion ofmore complexgestures withouthaving 10usemore com plexfeatures.Itis intuitivethatdescribingaportionofagesturerequiresfewer features than the descriptio n ofan entire complex gesture.

The windo wsizeisan imponantfactorwhich canaffect performance.Itisclearthatthe window size can nolexceedthe length ofasingle gesture.This will make recognizing individual gestures impossible.Thewindow size should be keptassmallaspossible without adding computational expenseby over characterizingagesture.The appropriate window size can be detcnnincd experimentally.

3.5 Featu reSpace Reduction

Feature selection canprove difficultforanum ber of reasons.Selecting theappropriat e features which willgivethebestclass ificati onwilloftenlead to selectin g a larg enumber of featur es.Althoughthismayprovidebetterclassificationresults,with largenumbersof features,computationsmaybecomeunmanageableandimprac tical.Aswell,larg e numbers offeatures (increasingthe dimensionality ofthe feature space) will makethe

26

(42)

feature space increasingly sparseif the number oftraining vectorsisnot increased.

A number of solutions to this problemexist.Two commonsolutionsare the use of factoranalysis or principalcomponentanalysis(Duda & Hart,1973). Boththe areas of factoranalysis and principal component analysisare extensiveand willnot be discussed here.Components of each canbe usedto find a lower dimensionalrepresentationby combiningor removing features which are highly correlated andby combining or removing features whichhave low inter-feature variance.

A hierarchical clusteringprocedure can be usedtoreduce the dimensionality usingthe correlationbetween two features.The correlation between two features can be defined as (Duda&Hart,1973)

p._

a

'-

'1 .fOYu

3.4

where oijis the covariance of the features i andj;0;;andOjjare the variances offeatures i andj respectively. Completelyuncorrelatedfeatures willhavep/=Oand completely correlated features will havePij~1.[fthecorrelation between two features is abovea specified threshold,those featuresare combined, orboth can be removed without combining,if sufficient datarepresentation is conserved. A hierarchicaldimensional reductionprocedure can be followed (Gamage,1993).

I.Setdor the dimensionalityasthenumber of features in the dataset.

27

(43)

2.Calculate lhe correlationmatrixof features

(aeJ .

J.Detenninethe most correlatedpair (a.;and

aeJ.

4.Combine~and

a.;.

removeone; or removeoneorboth initially.

S.Adjust dimensionalitydappropriately.

6.Gotostep2.

Thisprocedurecanbefollowed untilthedesired dimensionalityisreached oruntillhe largestcorrelationbetweenanytwo features falls belowa preset value.Inthis thesis.this proc edur ewas followeduntilthe largest correlationbetweenfeatures fellbelow adesired lev el andthenfurtherdimensi onal reducti o n wasperfo rmedby removing feature swith lo w inter-feature variance.

28

(44)

CHAPTER 4

TECHNIQUESUSEDIN GESTURERECOGNITION

4.0Introduction

The roleof the gesture recognition algorithm isto classify the pre-processed gestures as oneor a setof previouslydefined gestures, or as a nongesture.There are manypattern recognition techniques available, examples include heuristic methods, statisti cal classification.clusteringandtemplate matching, andsyntactical methods.Sections4.1to 4.5describe thetechni ques consideredin thisresearc h.

4.1Hidden MarkovModels

Hidden Markovmodels(HMM)havebeenusedextensivelyinthe area ofspeech recognition.Recently ,however.theuseofhiddenMarkovmod els has becomemore widespread.Themodelsare veryrichinmathematical structure and hence canform the theoreticalbasisfor use in a widerange of applications(Rabiner,1989).Theyhave been usedrecentlyinthearea of bead gesture recognition(Harwin,1991 andCairns,1993).

There are differentformsoffil\.W'swithvaryingarchitecture depending on the 29

(45)

application.The simplestfonnofHMM and the mostwidelyusedisthe discrete HMM.

Inthisinstanceweconsider the process wewishto model to have emittedor produceda setofdiscreteobservationswhichcan be modeledThediscreteHMM will bediscussed inmore detailinfollowin g sections.

Incertain applicatio ns.,thelimitatio nsimposed bya discreteHMMmaynotbe acceptable.Forthese casesacontinuousHMM canbe used.1be principaldifferenceis thatcontinuous HMM'shave continuous observation density functions.

4.1.1 A Description of HiddeoMarkovModels

Beforeintroducingthemathematicalrepresentationof aHMM,beginwith a conceptual representation ofwhataHMMrepresents. Perhapstheeasiestwayto illustratea HMM conceptuallyisthroughtheuscofLevinsoo'sgenie(Levinsonetal.,1983 ).Cons idera geniewhohasseveral urnsathisdisposaLEachurnholdsa numberof different colored balls.Thegeniepicks a bailfromoneoftheurnsandshowsittoyou(a single observation symbol).Hethen replacestheballandselects another ball fromanother urn (possibly from the sameurn).Hedoesthis Ntimesafter whichyou havea discrete observ ationsequence0,of N observations .Thegenie does have a preferencein choosingan urn whichwedonot know.

Wenowwish todetermi ne,based 00 the set of observations shown to us by the genie.the numberofumsathis disposalandthepropo rtionof colored balls in each urn (model

30

(46)

estimation ).Once we havedeterminedthe rulesthe genie usedtoproducethe observation sequence.we can nowdetenninethe probabilitythatanoIher observation sequencewasproducedbythesame genieusingthesame Dumber ofurnscontaining the sameproporti on of colored baIls (n:cogoition).

Consider next a simplifiedspeech recognitio nexamp le.[fthe word isbrokeninto a set ofphonem es, the sequenceofthesephonem escanbereco gni zed as aparticularword.A different set of phon em es willrepresent a different word.Asubset of these phonem es taken asanobserv atio n sequ ence canbe modeledandusedforrecogni tio n.Itshould be clearatthispoint thatthis approac h can beappliedtomany temporaleven ts. TheonJy requirementisthatwehaveaset of observati onswhic hweknow represents aword or gesture (mode lestimation).We can then usethis model todetermin etheprobabilitythat a differentset of observations (pattern) wasproducedby thai model.

4.1.2A MathematicalDesCriptiODof RiddeDMarkovModels

A hidden Markov model can be described mathematicallyby aset of matricesA..Band 'It.The matrix Aisthe matrix of state transitionprob ab ilities, the matrix B is the matrix ofprob abilities ofobservingaparticular symbol whil einthat stale,andthe1'tmatrixis thematrixof prob abil itiesofbcin gina particular staleinitiall y (Rabiner, 1989 ).

Determiningthesematri ces isthe problemof model estima tio n andtraining.

31

(47)

Figure4.1 -Genera lized HMMArehrteetnre(4 states)

Consider asystem with4possiblestates(figure 4.1).Inthe most general model architecture, itis possibleto move from anygiven statetoanoth er ata giventime (ergodic model).Itis alsoposs ible to remaininthecurrentstat e.Anarbitrarymodel for this system will havea NxN matrix A givenby:

whe real2is theprobability of movingfrom stale Ito2 given you areinstateI.

Assuming thatwhileineach state itis possible to observeoneofQsym bols (010

0:2,.

0Q),themodel will havean NxQ matrix Bgivenby:

32

4.1

(48)

6,(°1)bl(O,).•. .6,(0';

B"6,(0 ,)b,(O,).. ..6,

(°';

'; 0,> ' ; 0,> .... ';0,1

4.2

wherebl{OJismeprobabilityof observing symbol0,whileinstate1.Finally x,me initial state distribution, canbegivenby:

4.3

wbeee,istheprobability ofbeinginstateIinitially.

Itisnow possibleto comp letelydefinea particul ar HMM (l)usingthe characteristic matrices:

4.4

Givenanobservati onsequenc e0-(°1,0,•...,

Or}.

modelparametersA.,B and'Itcanbe estima tedwhich defin e anewHMM, or0 canbeusedwithan existing HMM modelto determinetheprobabilitythat thecbservatic nsequencewasproducedbythatmodel, P(OIM).This leadstothe issueofbowto compute theseprobabilities ,P(OIM) .andhow touse agivenobservationsequencetoestimate,orre-estimate,A.,B.and'Jr.

33

(49)

4.1.3Rrro&nitioa

Beginby definingthe forward probability oftbefirstt symbolsoccurringand the model beinginstate;attimet,giventhe modelM.

4.5

A recursivemethodcanthenbeusedto calculate the probability of the sequences of symbols up tot+1 occurringconditional on thestate.theobservationsequenceO.and the modelM.

4.6

Initialization.or theprobability of the first observationsymbolis given by:

4.1

Oncetheprecedingforwardprobabilitycalculationhasbeen performedandCtrbas been calculated. theprobabili tyoftheobservationsequence0 beingprod ucedgiventhemodel Mcanbecalculatedas:

4.8

4.1.4Model re-estimadoD,training

Model re-estimation isperformedusing the Baum-welcbalgorithm(RAbiner, 1989 ).

34

(50)

Define thebackwardprobab ility,theprobability oftbesymbolsequence from ttoT occwringgiventheslate.theobservationsequence0 and the modelM.

~TisinitiaUysettoaNXI matrixofl's.

. . .

Combining4.6and4.9;theprobability of beinginstateiattime t giventhe observatio n sequence 0 and themod elM canbecalculatedas:

lJP)~,(i)=P(OlM) "'F("tate(t)"l IO.M) ap)~,(,) a.,(I)~,(i) Y,(f) :F(O IA{) :-.- - -

~••(ijP.(ij Tofollow a proper probability mass function

4.10

4.11

The re-estimetionof1thor theprobability ofoccupyingstateiat~Ican bewritten as:

4.U

Definenowanoth ervariable~as the probab ilityofbeinginstale Sj at time t andstale Sjat timet+Jgiventhe observationsequence0andthe model M:

)5

(51)

F.,(/J)..";(I)Q,b, (O,.,)P,. I(J)

~ ~

",(i)Q,b, (O,.llP"I(J)

The model param eterAcan nowbere-estimatedas;

EF.Pi l

d=-" '- -

'1

~ y,(J)

andB can be re-estimatedas:

t

Y,(/)

b(t) "'~

, t Y , (J)

Againtofollow a proper probability mass function

t d=1 1,{,l s N

/"1 '

b bj(t)=, IsjsN

4.13

4.14

4.15

4.16

The probability ofanobservationsequence0beingprodu cedby theModelMcar. now becalculated basedonthere-estimatedmodelmatricesA andB.Foramoredetai led discussionofthetheoryof hiddenMarkov models andsomeimplementation issues. refer

36

(52)

to Rabiner.1989.

4.1TheNearest Nei£hborAI;orith m

Oneofthedecision rules usedinthis thesisiscalled thek-oearestocighbor decision rule.J[is a decision rolebasedon oooparamcaic: estimationof me probability density fora number of categories.The oooparamcaic:estimatio nprocedureiscalled the Pancnestimationwlticb.providesanestimate ofthe classconditionalde nsitics.Fora set ofNindependentsamplesyI.

'i. .._.

y''''a volumeVcenteredat

9

andkbeingthe numbe rofneighborsinconsideration.it canbeshownthatthe densityestimatecanbe expres sed as (Therr ien.1989)

- t

P,f.i)=;;y 4.17

Wecanfixthenumberof samples.k:.and letthevolume V be determinedsothatthe regionaround

9

containjustIi:samples.We can then defin eV as thatofan m-dimensional"hyperspbere" centered around

9.

Visthevolumeofaregioninspace definedby

d(j.y)sr 4.18

wherethe distance function d isthe Euclidean distance andris the radius of thehyper sphere.

Lettheradiusofthehyper sphere foreachclassbedeterminedsothatthehypersp herc 37

(53)

enclosesjustIesamples ofthe dass,thatisrbethedistance totheIe"nearest neighbor of

9-

lei:N,bethetotalnumber ofsamplesofclassi andrepresentthedensity fuoction usingequation 4.17.

The decisionruleforminimumprobability oferrorsbecomes(Therrien,1989)

.,

lIN'V'1> N/(N.+N2 ) lIN2V'2~",,<N.+N2 )

. ,

4.19

" >

v;-

< I

.,

This decision rule fixes thenumber of samplesofeachclass andcomparesthe volumes oftbetwohyper spheres.This is thegroupedform ofmenearest neighbor dec ision rule.

IfViis smallerthanV20this impliesthat there aremore samplesofWIinthevicinityof

9

thanare samp lesofwJ.Thepoint

9

isther ef ore assignedto w•.

Anotherform of thek-near estneighborrule,andprobablythemost common,isthe pooledform..Thehype r spherearound

9

isdetermined[0includeIetotal samples regardlessofclass.Thisprocedureresultsincqualvolumesforthe twoclassesanda

38

(54)

different numberof samplesk, for eachclass.Thedecisionrulebecomes t,

-.

>

t;

< I

",

4..20

9

isassignedto the classthathasthe largest number of samplesinthe hyper sphere(k sOOuldbelaken tobe an oddnumber toavoidties).

Thereisthepossibility her etohavewe ighteddecisionrules.Thatisif thereisloss associatedwith choosingaparticularclass,thealgori thmcanbeweightedorbiased to onlychoosethe class whichmay causedamageswhen the vectorisclear lyamember of thatgroup.

TbeMahalanobisdistanceclass ifier(Therrien,1989)usesadiscriminate functionto classify an unknowngesture. Ibis parametric classifierusesthe trainingdata for parameter estimation onlyandthisdatais notusedduringclassification.The assumption hereisthattheunderlyingdata is normallydistributedalong eacb feature axes.This leadsto the assumptionthat the class clusters are byper-ellipsoids.The discriminate functionis definedby:

39

4.21

(55)

where:xistheunknowngesture Iii;isthemeanofclasst

K.-1isthe inverse ofclassicovariancematri:J:

TheMahalanobi sdistancetothemeanofeachclassiscalculated.andtheunknown gestureisclassi fiedasthe class correspondingtolbcminimumdistance.The inclusionof theclass covariance matrixK.inthe distancecalculationprovid es ameasure ofthe varianceineachfeature axisandthe correlationbetweenfeatures .Featureaxeswithhigh variationwill factor differentl ythanthose withlow variation.

Thetraining gesturesare used onlytodetermine the classmeans.ID;.and the class covariancematrices,K;.nusclassifierismuchfaster thanthe k-nearestneighbor classifier.

4.4VtttorQuantizatioa

Vectorquantizationisanimpo rtant applicatio nrelated toclusterin gwhichhasbeen wide lyusedinthe field of digitalsignalprocessing andcomm unication.Itis oftenused asapreprocessorto aDynamicProgrammingorhiddenMarkovmodelclassification techniques usedinmanyspeechrecognition systems.The range of possibilitieswhich describea gesture can be divided into anumber of levels.Given a samplegesture. it will bedenoted bya discrete value representing the particularlevel whichbest represents it.

Figure 4.2aillustratesbowacontin uous eventcan be transformedinto a series of discrete 40

(56)

values.Figure4.2billustra tes how vector quantization canbeused tomapeachdiscrete value toa reprodaeticn vector.

Tm.c -:.- Tm.c illdc 1t....

Figu re 4.23-Sam pUngand Quantizati on ofa Signa l([herrie n,1989) ConsideranNlevelk-dimensicna l

quantizerasa mapping.q,that assigns toeachinputvector, x-<:x...• xk-t).a reproductionvector.x-q(x). drawn from a finitereproduction alphabet.

A..'"'{y;;i=l•...•N}.Thereproduction

alphabet(orcodebook)A,and the F"1gUl'e4.2b· VectorQun tiza tion(Therrie n.

1989) partition,S={Si;i-l•.._.N}of the

input vectorspace into the sets Si-{x:q(x)=Y;J ofinputvectors mappinginto thei·

reproduction vector (orcodeword) completelydescribes the quantizer q (Lindeetal., 1980).

Assuming the distortioncausedby the mapping ofthe input vectors is somenon-negative value denoted d(x.l).the goalistominimize:thisdistortion during mapping.There are

41

(57)

many differentdistortion measures.The most commonfor mathematical convenienceis thesquared-enordistortion givenby:

4.2'

Theproc ess of mapping an input vector follows foursteps.

(I)Initializati on:For an N levelquantizer, with a distortion threshold e:tO.an initi alNdeve lreproductionalphabetAo.andatraining sequence{~; j:O•...•0.11.set

(2)GivenA",-{Y;;i-I•..•n- II.findtheminimumdisto rtio n partitionP(A.}-{S I;

i-I•..• N}of thetraining sequence:~e"Siif d("i.y.)s.d(Xj.yJ for alln,Findtheaverage distortion

DaD ({ A:• •.P (A:

))) "..!..E.:~n

"I ..T~.d(x J')I

(3)Determineifminimumdistortionisreached:

4.23

4.24

If the distortionis belowthe minimum.threshold,haltwithA",asthe finalreproduction alphabet,otherwisecontinueto step4.

(4)Find theoptimalreprodu cti onalphabet x(P(A,..))=o{x(SJ;i-I, ..•N} for P(A,..).

SetA... e x(P(A,.».Replacembym+l and goto 2.

42

(58)

While designingthequantizer, only partitions of theinput alphabet or training sequence areconsidered.Oncethefinalcodebook

A".

isobtained, itis usedon newdata outside thetrainingsequenc ewith theoptimum nearest-neighbor rule or an optimum partition of k-dimensionalEuclideanspace .

4.4.1Choosing anInitialReproduction Alphabet

Thereareseveralways tochoosethe initial reproduction alphabe tA,..One methodisto usea uniform quantizer overallor most of thesourcealphabet.Asecond techniqu e,

"sp litting",isusefulwhenaquantizeristobe designedof successivelyhigher ratesuntil achievin gan acceptablelevelof distortion. Consider aM-levelquantiz er withM=-2R Ra{l.I, ...untilaninitialguessfor an appropriateNdevelquantizerisachieved . A simple algorithmusingthistechnique(Linde etal.,1980)is:

(1)Initiall y set R=O (M =I)and defineA,.(lrx(A).the centroidoftheentire alphabet.

(2) Giventhe reproductionalphabetA,.(M)containingMvectors{y;;i= l •..,M}.

"split" eachvectorYiintotwo closevectorsYi+eandYi·e,whereeisa fixed perturbationvector.The collectionA of{y i+E,y-e,iel•..,M}has2Mvectors.

ReplaceM by 2M.

(3)IfM=N.setA,.=AJ.M'Jand thatis theinitialreproduction alphabet. Ifnot,run thesame procedure for an M-IevelquantizeronA,..(M)toproduce a good reproductionalphabetA,,(M)andreturn to step 2.

Usingthissplittingalgorithm onatraining sequence,the initial quantizer isa one-level 43

(59)

quantizer consistingof the centro idof the trainingseq uence. This vec tor is then"split"

intotwovectorsand a two-lev elquantizeralgorithmis om on thispair to obtain a good two-levelquantizer .Each ofthesetwo vectorsisthen splitand the algorithm is omto produceagood four-levelquantizer and so on resultinginfixed-pointquantizersforI. 2.

4,8•...• Nlevel s.

4.5Rule-BasedGestureRecognitiOD

This sectio n describ esthe generation of rules which canbeusedto recognize gestures.

Thestatisti cal theoryrequiredto estimate theunderlying probabilitydensityfunctions (pdf)foreachclass isoutlinedinitially.Theprocessleading10the generation of rulesis thendescribed. Asho rt description ofthe application of these techni q uesasapplied to knownunderlying distributions (the Gauss ian distribution) isincluded to illustrat ethe effectofvaryingsam p ling parametersand todemonstratethetechni que.

4.5.1EstimatingtheUnderlyingPDF

Co nsiderasetofmfeatures ack,k=1.2•...•m.Thissetof featuresadequately describesa singlegesture,CjThere are nsam plesof this gesture ; the featurevector representin gthis gesture ismcolwnns wideand nrows long;~l' ~'...•a&.a}.Therules for this gesturewillbe centered around thehigh density regio ns of theR"feature space.Itis there forenecessarytoestimatethedensity function.Theprocedure forestim atin g the densityfunction isdesc ribedbelow(Gamag e etat.•1996).

44

(60)

Theprobab ili tyPthatafeature vectorill:ofgroupCJbelonging tothe region Ris given by

4.25

where~Jis the multivari ateconditional probability density function.given that the classis Cptheprobability density functionof 11<.Withn samplesofclass

<;

drawn from thedensi tyfunctio np(adCJ ,theprobab ility thatIesamplesbelongto the regionRis given bythebinomial law

4.26

The expectedvalue ofkisgiven by E(k)=nP.The maximumlike lihood estimateforPis given as

. .

p=-

.

Ifthe regionisasswnedtobe small, P can be approximatedas

4.21

4.28

whereiU:isthepointwithin the regio n Rand V isthe volumeofthehyper cube enclosed by R. Combining 4.25,4.27 and 4.28 p(GICJ canbe estimated as

45

(61)

4.29

If thesample size islarge,kinwill convergeto P.For fixedvolumeV, tJUdCj)isan

p fp{g£JC)d~

p~C)=-=-'---

Y

fll

d!!£. 4.30

With a finite sam ple,Vcan be expressedas afunction ofn.Generall y,as n increases V should decreas e.For a univariate casethe lengthh should decreas eas

rr'"

(Gam age er al.,1996) .Generalizedfor m-dimensicns ,the volume can beexpressed as

4.31

Thehinequation 4.3 1 isthe size ofeachsubran ge. With allfeature vecto rsnorm alized over theirrespectiveranges,h can bethesame foreach class.Inm-dimensional spaceh canalsobecons idered as thelength of eachside of thehypercub e.Determiningthesize ofhisshown in an illustrative example of thereproduction ofseveralGaussian distribution s.

46

(62)

Define thefunction ¢l(u) as

¢l(Il)={I O

IIlJs i 1"=1,2, otherwise

This function ¢l(u) definesa unit hyper cubeinm-dimensional space centered at the origin.The function

4.32

4.33

is a unit hyper cubecentered at~and willbe unityifJ&;belongsto the hyper cube of volume V•. The function will be zero otherwise.Thenumber of samplesIe,.in the hyper cube can now be determined as

t••

i- f;r1J'l. .( '" -"")

h where!!£.!!£tECJ

Theestima te ofP<idCj)can now be written as

Ii- I

~ao -",,)

p~C)= -L,..- =---=.

J ft,_,V. h

4.34

4.35

Thewindowsizedefinedby the sub-rangeheighth is an important factorindetennining the estimateoftXa&lCj).Thenumberof timesthis window is sampledover the feature data is also important.known asthe step size. The effect cfthe variation ofthese

47

(63)

variableson the abilitytoreproduce aknowndistribution is discussedinsection4.5.3.

The highdensity regions will correspondtothelocal maximaof

t'hdC;).

whic hcanbe determinedbytaking the partial derivativesofl'h&ICJwith respecttoeachfeature

ac..

andequatingtozero.The secondpartialderivativeswithrespecttoallfeatures,mustbe negativefora maximum.Thiscanbe writtenas

fo",1,..1.2...

'V.t..1.2•..•m

4.36

4.5.2RuleGneratioD

Withan estimateofthe underlyingprobab ility densityfunction for eachclass,thenext stepis10 generate therulesusedfor recognition.Us ingBayestheorem P(CjlW=l',,{a&ICJP(C); where

P(CJ

istheaprioriprobabi lityofclass~.Consider a singl e mode one dimens ional case (asingle feature).The estimateof~CJis illustra tedinfigure4.3

Thegray shaded regionbetweenactandactrepresentsthe rule for thisclass.The area of the shaded regionis setequal 10thearea under thecurve fromac,totez.Thatis. the window height wisgivenby

48

(64)

Thesinglerule for this class

wi n

beIF3C.<ac<lIC]then the <:lassisC.-The 4.37

intersecti onbetweenthe curveand a predefinedthreshold determine the valuesof actand

Figure 4.3•Probabili tyDen sityFunction

or

Sing le-mode.,Slagle Class and selection Wind owrorRule Generatio n(Camage.,1993)

ac:-

The threshold can bedetermined through an iterative process.A minimum allowableerror is selected and compared to the errorassociatedwithme set threshold lev el.Theerroris given by

II!'rror"l-

.'.

!p(acIC1)P(C t)

K.

49

4.38

(65)

This is the probability that a givenfeature vectorac,will falloutside the rule window and notget classified.

A slightlydifferentapproach to rule generationisoutlinedbyGamage(Gamage,1993).

This techniqueuses the possible relationshipsamong the features within each group to generaterules.Theprocess is illustratedinfigure 4.4by a connectivity diagram.The set ofm features arc arrangedincolumns ofasccnding order.Thedata.values of eachfeature

AC.2J

Arclatlo nship Eearure2 linc segmcnt 'acz)

y AC

2t j

It-I "",,-_~_====---

....

Arclatiooship ACDj

Featurem

~ AC.II

AC ••,

AC ~ AC"", AC_

- - "- , -. - - - -

- - A - - - -

ACt.. ACu/ AC.Jj

Ovcrlapping - - - -

relationships

AC~ AC...

Figu re4.4 .A Generalized Con nectivity Diagram for RilleOeneratten(Gamaa;e.

1993)

for a particulargroupareplottedinthe correspondin g column. The feature correspon din g to a particulargesture arc joined by line segments.Arelationshipisfonned by them- t

so

(66)

lines connecting asingle gesture.Thesubrangesforeach colwnnare determinedby a windowandthenumber ofrelationshipsthatfallwithinthatsubrangeismaximized,The subrangesan:denoted A<;. wherek..iandjan:the feature number. the subrangenumber andthegroupnum ber respectively.Therules foraparticulargroupan:determined by the density oftherelationships inthatgroup.

4.5.JAn IUQstrative Eum ple

The variationofthe samplingwindowsize determin edbyh and the samp ling rate determin ed bythestepsize willaltertheappearan ce oftheestimatedprobabi litydensity function.Asanexerciseinpredictingappropriatewindowandstep sizes basedon the samplesize,datawas collectedfrom known distributio nsandtheprobabili tydensity functions estimated.Aset of200 datapointswere sampledfrom thenormaldistribut ion, normalizedtoforce the data between 0 and Iand the techniquesdetailedinsection 4.5.1 applied.Theresultingdisai butionestimateisshowninfigure4.5.

The windowsizehinthisexampleisset 10 0.3 andthestepsizeis10.Ibismeans thai the datawere sampledsteplh-33 times fromtherange from

a

toI.At eachstepthe density of the windowortengthl¥-O.3was estimatedresultinginthe curvesho wn.The combination ofthisparticular handstepsizeforasamplesize of 200reprod ucesthe normal curve reasonablewell.Again thisexample will havea singleruleforeach class.

5\

Références

Documents relatifs

Section 4 is the core of this paper: we apply the decision making method, which is theoretically presented in [9], to our specific problem, and we use its formalism as a framework

CONCLUSION AND FUTURE WORK In the context of an “air percussion” electronic musical instrument, we presented here a data correction and pre- diction algorithm based on LPC method..

It directly applies LSTMs -but without applying CNNs beforehand- to the skeletal data (and to the handcrafted features). Introducing a convolution step before the LSTMs could

We present here our hierarchical system to address the problem of automotive hand gesture recognition for driver vehicle interaction. In particular, we only

Accelerometer data analysis consists of extracting information on time spent at different levels of activity.. Then, this information is generally used in a

We choose the state number of HMM for each gesture according to the experiment results and find that the recognition rate is maximum when the state number is 11 states for the

While hand-crafted feature- based approaches have used informative joints to improve recognition accuracy, deep learning approaches were based on attention mechanism to

Abstract— In this paper, we propose a new hand gesture recognition method based on skeletal data by learning SPD matrices with neural networks.. We model the hand skeleton as a