HIERARCHICAL ATTRIBUTED GRAPH REPRESENTATION AND
RECOGNITION OF HANDWRITTEN CHINESE CHARACTERS
By
Ying Ren, B.Sc.
A thesis submitted to the School of Graduate Studies in partial fulfillment of the requirements for the degree of Master of Science
Department of Computer Science, Memorial University of Newfoundland
August, 1991
St. John's, Newfoundland, Canada
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services Branch / Direction des acquisitions et des services bibliographiques
395 Wellington Street, Ottawa, Ontario K1A 0N4 / 395, rue Wellington, Ottawa (Ontario) K1A 0N4
The author has granted an irrevocable non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.
The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without his/her permission.
IS BN \:)·Jl:i - 781Jl -7
Canada
ABSTRACT
This thesis presents a system which is capable of recognizing handwritten Chinese characters. The hierarchical attributed graph representation (HAGR) is introduced to describe the structural information of handwritten Chinese characters. The first level describes the relations between radicals within a character; the second level describes the strokes and the relations between strokes in a radical. With HAGR, the recognition process becomes a task of graph matching. A cost function is introduced to map the graph of an input character to that of its model. This approach can tolerate variations of HAGR which are due to the instability and variability of handwritten Chinese characters resulting from different writing styles.
Several rules have been developed to re-arrange the order of the vertices of the graphs in order to avoid combinatorial explosion in graph matching. The model database is organized as a heterogeneous multi-way tree structure, in which local decisions at different levels of the tree lead to the corresponding models in the database. The system can acquire new models of characters by a learning process: several HAGRs of samples of a character can be synthesized into a single HAGR, which can then be included in the model database, and the learning process can update the models of characters using their samples. The system is implemented in C.
ACKNOWLEDGEMENTS
I wish to express my thanks to my supervisor, Dr. Siwei Lu, for his constant encouragement, insightful guidance, and constructive suggestions. Without his generous contribution, it would have been impossible to give this thesis its current quality. I would like to thank the Systems Support staff for providing all their assistance during my research. I am also very grateful to the Administrative staff who have helped in one way or another in the preparation of this thesis. In addition, I would like to acknowledge the financial support received from the Department of Computer Science and the School of Graduate Studies. Special thanks are due to my fellow graduate students and good friends, and in particular to Todd Wareham, Anthony Wing-Kay Szeto, Helmut Hoth, and Sean Hogan for their valuable comments and useful suggestions. I would like to use this chance to thank Prof. Jane Foltz, Patricia Murphy, and HcXII for their help and assistance.
"'''i.~1J,....,i.~i._dn/i nl/"I /oIlly jNIn'III.'fI/Ill,,,!/...;.•tr r fm·/I"i'·f l"· OI 'l 'fjn","I /{,mf/ !jl ,,w/
1/"n"I1~"_of mynh". ,/li oll.
Table of Contents
Chapter 1 Introduction
1.1 Structure of the system
1.2 Organization of the thesis
Chapter 2 Survey of Techniques for Chinese Character Recognition
2.1 Introduction
2.2 On-line handwritten CCR
2.2.1 Feature analysis
2.2.2 Time sequence of zones, directions, or extremes
2.2.3 Curve matching
2.2.4 Stroke codes
2.2.5 Analysis-by-synthesis
2.2.6 Other methods
2.3 Handwritten and printed CCR
2.3.1 Statistical techniques
2.3.1.1 Template matching
2.3.1.2 Transformation
2.3.1.3 Stroke distribution
2.3.1.4 Background feature distribution
2.3.1.7 Combination scheme
2.3.2 Structural techniques
Chapter 3 Preprocessing
3.1 Introduction
3.2 Thinning
3.3 Tracing
3.5 Stroke grouping
3.5.1 Connection clustering
3.5.3 Binary division clustering
3.5.4 Distance measurement clustering
4.1 Introduction
5.1 Introduction
5.2 Radical attributed graph matching
Chapter 6 Model Database Organized by a Heterogeneous Multi-way Tree
6.1 Introduction
6.2 Heterogeneous multi-way tree organization
6.2.1 Basic concepts for the heterogeneous multi-way tree
6.3 Construction of multi-way tree
6.4.1 Searching algorithm
6.5 Memory reduction
7.1 Introduction
7.2 Attributed graph synthesis
7.3 Updating the model database with the samples
Chapter 8 Conclusions
8.1 Summary of contributions
References
List of Figures
Figure 1.1 Structure of proposed system
Figure 3.4 Codes and directions for Freeman chain coding
Figure 3.5 Encoding examples
Figure 3.6 Classification of the line segments
Figure 3.7 False-stroke eliminations
Figure 4.3 Radical attributed graph
Figure 4.7 Construction of HAGR of a handwritten Chinese character
Figure 5.6 Example for ordering strokes a and b with Rule 1
Figure 5.7 Example for ordering strokes a and b with Rule 2
Figure 5.8 Example for ordering strokes a and b with Rule 3
Figure 5.9 Example for ordering strokes a and b with Rule 4
Figure 5.10 Example for ordering strokes in a radical
Figure 6.9 Compression vector
Figure 7.10 HAG integration structure of an input character
Figure 7.11 Table network organization of database
Figure 7.14 Final updating results for the input character
Figure 8.1 Character samples being rejected or misclassified
Chapter 1
Introduction
Handwritten Chinese character recognition is a well-known and very difficult problem. Because of its pragmatic value and its particularity among image and pattern recognition problems, the topic has attracted many researchers and has become one of the most challenging research subjects in the field of pattern recognition. The major difficulties come from the following factors: (1) There are more than fifty thousand Chinese characters, and over a tenth of them are in common daily use [Willl 1988]. Such a large number of character categories makes recognition very difficult and slow. (2) The structures of Chinese characters are very complicated, and handwritten characters vary widely in shape, style, the positions of radicals, and the directions and lengths of strokes.
In spite of these difficulties, many techniques have been developed for solving the problem. They can basically be classified into two major approaches, namely, the statistical method and the structural method. Typical statistical approaches are correlation matching [Yamashita, et al. 1982], background feature distribution [Natio, et al. 1981], background analysis [Akamatsu and Komori 1981], stroke analysis [Morishita, et al. 1988], and orthogonal expansion [Arakawa 1983]. Correlation matching typically requires some form of normalization. However, normalization has only a limited capability to compensate for the huge variety of writing styles, such as radical positions and the direction and length of strokes. The last four approaches depend on extracted features such as stroke length, stroke direction, stroke sequence, and the relations between strokes. Usually these features are represented as a feature vector, and character recognition is performed by selecting the reference character with the minimum distance from the input character. However, these features tend to be unstable, as handwriting differs from person to person. One method to overcome such difficulties is to shift the burden to the users by imposing constraints on their writing [Wakahara and Umeda 1983]. However, not all users will accept this, and some may have difficulty observing the specified rules. Another way is to increase the number of models for each character category in the database to allow for the variation caused by the instability of the features. Unfortunately, this enlarges the Chinese character database enormously. For example, the handwritten Kanji database ETL8 contains 152,960 samples of only 881 different Kanji characters. Each character cat-
character set, the matching process will be very time-consuming.
Although Chinese characters have a very complicated structure, they are structured according to rules which are independent of writing styles. This structure can be divided into three levels: the whole-character level, the radical level, and the stroke level. The geometric characteristics of the strokes vary to some extent with different writers, but their spatial and geometric relations are usually well maintained. These invariant features make structural approaches more attractive for the recognition of handwritten Chinese characters. One line of research in structural approaches represents a character pattern as a string and a grammar (either context-free or context-sensitive, such as an indexed grammar or a programmed grammar); a parser for that particular grammar is then built to recognize the pattern [Zhou and Xia; Tai and Liu 1980; Zhao 1990]. String grammars are not powerful enough to handle very complicated characters with several radicals, so higher-dimensional grammars (tree, plex, or web grammars) have been defined. However, higher-dimensional grammars are much more complicated and lack practical value. Another approach is to use pattern matching instead of parsing [Cllll nan.1Cllt~llll~I!JII~J.1.''1:011111Khu,I!JM!II].
A character pattern is represented as a relational graph in which the vertices represent the strokes while the arcs represent the relationships between the strokes. Despite these advantages, current graph approaches do not fully use the structural properties and do not provide an effective organization of the huge model database for fast and accurate searching. For this reason, a new method which is not sensitive to writing styles is proposed; it offers users a high degree of flexibility for the effective recognition of handwritten Chinese characters. This method is derived from the stable features of the characters. However, in the recognition process, some of the unstable features, such as the orientations of the strokes, are also necessary. Hierarchical attributed graphs are developed to describe both invariant and unstable features, with adjacency matrices used to represent these attributed graphs.
The bits of the entries in a matrix describe the attribute set associated with a vertex or an edge, respectively. Using this bit-wise representation, a cost function is introduced to map the graph of an input character to that of its model. This approach can provide some tolerance to the variations of characters written in different styles. For graph matching, several rules are applied to re-arrange the order of the vertices of the graph in order to avoid combinatorial explosion. Furthermore, the model database is organized as a heterogeneous multi-way tree according to the spatial relations between radicals, the number of strokes per radical, and the geometric configuration of strokes in each radical. Using the hierarchical attributed graph representation and the multi-way tree organization of the database, the efficiency of the matching process can be improved considerably.
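The bit-wise idea can be illustrated with a small sketch: the diagonal of an adjacency matrix holds each vertex's (stroke's) attribute set and the off-diagonal entries hold each edge's (relation's) attribute set, one bit per attribute, and a candidate vertex mapping is scored by counting the attribute bits that disagree. The attribute names, the XOR-and-count cost, and the helper names `mapping_cost` and `best_mapping` are hypothetical choices for illustration; they are not the cost function defined in this thesis.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical attribute bits, chosen only for illustration.
# Diagonal entries hold vertex (stroke) attributes; off-diagonal entries hold edge (relation) attributes.
HORIZONTAL, VERTICAL, LEFT_FALLING, RIGHT_FALLING = 1, 2, 4, 8
CONNECTED, CROSSING, ABOVE, LEFT_OF = 1, 2, 4, 8

Matrix = List[List[int]]

def mapping_cost(inp: Matrix, model: Matrix, order: Sequence[int]) -> int:
    """Number of attribute bits that differ when input vertex order[i] is mapped to model vertex i."""
    n = len(model)
    cost = 0
    for i in range(n):
        for j in range(n):
            diff = inp[order[i]][order[j]] ^ model[i][j]   # bits set where attributes disagree
            cost += bin(diff).count("1")                   # count the mismatched bits
    return cost

def best_mapping(inp: Matrix, model: Matrix) -> Tuple[int, Tuple[int, ...]]:
    """Try every vertex order; this n! search is what vertex re-ordering rules aim to avoid."""
    n = len(model)
    return min((mapping_cost(inp, model, p), p) for p in permutations(range(n)))

# Model of a two-stroke character: a horizontal stroke crossing a vertical one.
model = [[HORIZONTAL, CROSSING],
         [CROSSING, VERTICAL]]
# The same character with its strokes extracted in the opposite order.
inp = [[VERTICAL, CROSSING],
       [CROSSING, HORIZONTAL]]
print(best_mapping(inp, model))   # (0, (1, 0))
```

The exhaustive search makes the combinatorial problem explicit: with n strokes there are n! vertex orders, which is why fixing a canonical vertex order before matching pays off.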
1.1 Structure of the system
The proposed system (Figure 1.1) consists of two functional parts. The first part runs from the input of the samples of a character to the table network organization of the database; this part of the figure illustrates the procedure for building and updating the model database. The rest of the flowchart shows the second part, which performs the recognition task.
The recognition procedure may be divided into three levels: low, intermediate, and high. The low level essentially involves thinning, skeleton tracing, segment merging, and stroke grouping. At the intermediate level, the hierarchical attributed graph representation of the input handwritten character is generated. Recognition at the high level involves multi-way tree searching, graph matching, and mapping cost computation.
1.2 Organization of the thesis
The thesis is organized into eight chapters. Chapter two introduces existing techniques for the recognition of Chinese characters, such as machine-printed Chinese character recognition, off-line Chinese character recognition, and on-line Chinese character recognition. Chapter three describes how the structural properties of an input character are obtained and organized for further image analysis and representation. Chapter four is devoted to the hierarchical attributed graph representation of handwritten Chinese characters and the construction of this representation. Chapter five gives the method for hierarchical attributed graph matching; a mapping cost function is introduced to match the attributed graph of an input character to that of its model. In Chapter six, the heterogeneous multi-way tree is used to organize the model database for fast and efficient searching; based on the multi-way tree, the search process can be limited to a small part of the model database. The learning procedure for building and updating the model database is described in Chapter seven. Chapter eight gives the conclusions and suggestions for further research.
Figure 1.1 Structure of proposed system
Chapte r 2
Survey of Techniques for Chinese Character Recognition
2.1 Introduction
Character recognition is a subset of pattern recognition. It was character recognition that gave the initial impetus for making pattern recognition and image analysis mature fields of science. Chinese character recognition (CCR) offered a challenge that was in certain respects representative of the larger world of pattern recognition, and CCR provides an important way to input Chinese characters in a massive manner. In recent years, a considerable amount of work has been done on CCR. CCR techniques can be classified into three major categories, namely, 1) printed CCR, which is the recognition of specific Chinese fonts (Song, Hei, Kai, etc.); 2) on-line handwritten CCR, which is the recognition of single handwritten Chinese characters where not only the character image but also the timing information of each stroke is provided; and 3) handwritten CCR, which is the recognition of single handwritten Chinese characters which are unconstrained and not written in calligraphy. So far, printed CCR and on-line handwritten CCR systems are already available in the market, and techniques for multi-font printed CCR are available in the laboratories. Handwritten CCR, however, is still far from practical use. A typical CCR system includes the following functional components. During pre-
images. Certain types of noise are eliminated at this stage. In on-line handwritten CCR, the stroke's position and direction are recorded while each stroke is drawn, so it is easy to obtain the stroke sequence of each character. Thinning is required to represent the information needed for the next step. After preprocessing, the extracted features of the two-dimensional character matrix are matched against sets of pre-stored features of the reference Chinese characters, and the best match is used to identify the character. To reduce the recognition time and achieve a high recognition rate, a preliminary classification first divides the set of Chinese characters into small groups, and a final classification then identifies the characters within each group.
J(,gi,~ since it is given. The problems of printed Chinese character recognition, on-line Chinese character recognition, and handwritten Chinese character recognition will be discussed.
2.2 On-line handwritten CCR
On-line handwritten CCR is simpler than printed CCR and off-line handwritten CCR because the machine can capture the exact sequence of strokes. In on-line CCR, the system only needs to recognize fewer than one hundred different kinds of strokes. As long as the machine can recognize the strokes and the relationships between them, it can recognize up to several thousand Chinese characters. Many methods are available for on-line classification of handwritten Chinese characters; they are described below.
2.2.1 Feature analysis
A set of features can represent a handwritten Chinese character. The features might be based on the static properties of the character, the dynamic properties, or both.
The features can be binary. With binary features, the name assigned to an unknown character is often determined by a decision tree [Hanaki, et al. 1976; Hanaki and Yamazaki 1980]. A disadvantage of this method is that it may not produce alternative character choices, which are usually desirable for post-processing. Recently, a binary decision tree using simple features has been used to reduce the set of candidate characters to a small set for subsequent analysis by complex features [Kerrick and Bovik 1988].
The features can also be nonbinary. A fixed number of nonbinary features is common in pattern recognition, and many classification methods are available for dividing such a feature space into decision regions. For example, a multi-stage classifier with a general tree structure based on the division of Walsh coefficients has been used [Gu, et al. 1983].
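The candidate-pruning use of a binary decision tree can be sketched as a tree whose internal nodes each test one binary feature and whose leaves hold sets of candidate characters rather than single labels, so that alternatives survive for later analysis by more complex features. The feature names and the toy tree are invented for illustration; they do not come from the cited systems.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union

@dataclass
class Node:
    feature: str        # binary feature tested at this node (hypothetical name)
    if_true: "Tree"
    if_false: "Tree"

Tree = Union[Node, FrozenSet[str]]   # a leaf is a set of candidate characters

def candidates(tree: Tree, features: Dict[str, bool]) -> FrozenSet[str]:
    """Follow the binary tests from the root to a leaf and return the leaf's candidate set."""
    while isinstance(tree, Node):
        tree = tree.if_true if features[tree.feature] else tree.if_false
    return tree

# Toy tree with invented features; leaves keep several candidates for later analysis.
tree = Node("has_crossing",
            Node("more_than_two_strokes", frozenset({"木", "本"}), frozenset({"十"})),
            frozenset({"一", "二", "三"}))
print(candidates(tree, {"has_crossing": True, "more_than_two_strokes": True}))
```

Returning a set at each leaf addresses the disadvantage noted above: a plain decision tree commits to one name, whereas a candidate set leaves alternatives for post-processing.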
2.2.2 Time sequence of zones, directions, or extremes
These methods rely primarily on dynamic information. A sequence of zones can represent a character [Engdahl 1977; Hanaki and Yamazaki 1980]. The zones are specified by dividing up the rectangle that surrounds the written character. When the character is superimposed on the rectangle, the sequence of zones traversed by the pen tip is determined. This sequence, or a corresponding sequence of features, assigns a name to the unknown character, often by exact match against a dictionary of zone sequences. A similar method uses the sequence of directions of pen motion during the writing of a character [Chang and Lo; Crane and Savoie 1977; Groner 1968]. Using four primitive directions (up, down, left, right), one system found the first four directions of the sequence and then classified the character by table lookup, where the table had 256 entries [Groner 1968]. As the number of directions and time intervals increases, table lookup becomes less practical, and the sequences are instead compared by curve matching.
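The four-direction table lookup can be sketched as follows: each pen movement is quantized to up, down, left, or right, consecutive repeats are merged, and the first four directions are read as a base-4 number, giving an index into a table of 4^4 = 256 entries. The quantization rule, the merging of repeats, and the example table entry are assumptions of this sketch rather than details of the cited system.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
DIRS = "UDLR"   # up, down, left, right -> digits 0, 1, 2, 3

def direction(p: Point, q: Point) -> str:
    """Quantize one pen movement into a primitive direction (y grows upward)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if abs(dx) >= abs(dy):
        return "R" if dx >= 0 else "L"
    return "U" if dy > 0 else "D"

def first_directions(trace: Sequence[Point], k: int = 4) -> List[str]:
    """Return the first k directions of the pen trace, merging consecutive repeats."""
    out: List[str] = []
    for p, q in zip(trace, trace[1:]):
        if p == q:
            continue                  # the pen did not move between these samples
        d = direction(p, q)
        if not out or out[-1] != d:
            out.append(d)
        if len(out) == k:
            break
    return out

def table_index(dirs: Sequence[str]) -> int:
    """Read four directions as a base-4 number, giving an index in 0..255."""
    idx = 0
    for d in dirs:
        idx = idx * 4 + DIRS.index(d)
    return idx

def classify_by_directions(trace: Sequence[Point], table: Dict[int, str]) -> Optional[str]:
    dirs = first_directions(trace)
    if len(dirs) < 4:
        return None                   # too few direction changes for this simple scheme
    return table.get(table_index(dirs))

square = [(0, 0), (1, 0), (1, -1), (0, -1), (0, 0)]   # right, down, left, up
print(first_directions(square), table_index(first_directions(square)))   # ['R', 'D', 'L', 'U'] 216
```

The fixed table size is what limits the scheme: each extra direction or time interval multiplies the table by the number of directions, which is why finer sequences are compared by curve matching instead.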
2.2.3 Curve matching
Curve matching is a popular image processing method. Curves from an unknown character are matched against those of prototype characters, and the name of the prototype that best matches the unknown is assigned to it. The curves are usually functions of time, such as preprocessed x and y values or the direction angle of the tangent to the trajectory of the writing [Ishigaki and Morishita 1988; Ishii; Odaka, et al. 1986]. Using Freeman code, a character has been divided into ten regions [u, et al. 1967]. Since Chinese characters consist mostly of straight strokes, approximating their strokes by a small number of fixed points has been found successful [Odaka, et al. 1982]. An alternative to matching functions of time is matching Fourier coefficients obtained from the x(t) and y(t) curves [Arakawa, et al. 1978; Impedovo]. This method is appropriate when the characters can be represented by a reasonably small number of Fourier coefficients. Since straight-line strokes require high-order Fourier coefficients, this method has been useful for characters consisting mostly of curved strokes, like numerals, or of concatenations of many straight strokes, like Chinese characters [Arakawa, et al. 1978]. Curve matching becomes equivalent to pattern matching in feature space when the number of points characterizing the curve is constant and there is a one-to-one correspondence [Odaka, et al. 1982]. This is a linear alignment of the points of the curve. However, due to nonlinearities, the best fit is usually not a linear matching or alignment. For many sequence comparison problems, elastic matching has been successful [Ikeda, et al. 1978; Seto and Adachi 1985; Wakahara and Umeda; Yoshida and Sakoe 1982]. Because elastic matching is computationally intensive, the prototypes must first be pruned to reduce the computation.
Application of a local affine transformation can enhance the shape discrimination of elastic matching. Using the point correspondence from elastic matching between input and reference patterns, a deformation vector field is generated and then approximated by iterative applications of local affine transformations. Finally, further elastic matching between the input pattern and the deformed reference pattern, superimposed with low-order local affine transformation components, enhances shape discrimination and reduces the error rate [Wakahara 1988].
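Elastic matching is commonly formulated as dynamic programming over two point sequences, in the form usually called dynamic time warping. The sketch below aligns two pen traces, using Euclidean distance between points as the local cost and the three standard moves (advance either sequence, or both); these are common textbook choices assumed here, not the exact formulation of the cited systems, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def elastic_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Dynamic time warping distance between two point sequences."""
    n, m = len(a), len(b)
    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    D: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])   # local Euclidean distance
            D[i][j] = cost + min(D[i - 1][j],      # advance in a only
                                 D[i][j - 1],      # advance in b only
                                 D[i - 1][j - 1])  # advance in both
    return D[n][m]

def nearest_prototype(sample: Sequence[Point], prototypes: Dict[str, Sequence[Point]]) -> str:
    """Return the label whose prototype trace has the smallest elastic distance to the sample."""
    return min(prototypes, key=lambda label: elastic_distance(sample, prototypes[label]))

horizontal = [(0, 0), (1, 0), (2, 0)]
vertical = [(0, 0), (0, 1), (0, 2)]
print(nearest_prototype([(0, 0.1), (1, 0), (2, 0.1)],
                        {"horizontal": horizontal, "vertical": vertical}))   # horizontal
```

The O(nm) table per prototype is the computational cost referred to above, and it is the reason prototypes are pruned before elastic matching is applied.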
2.2.4 Stroke codes
The stroke code method classifies subparts of a character and then identifies the character from the sequence of classified subparts [Gr(~lI.lIinsuI1l1 Yhupl!lli2; Hua 1988; Kuo, et al. 1988; Lin and Teai 1988]. One system uses 7f) stroke codes of constituent shapes to specify and recognize more than three thousand Kanji characters [Yurugi, et al. 1985]. Stroke classification uses the sequence of direction changes; decision trees of stroke code sequences under relative positional constraints on strokes then classify the radical or character. Another system uses a formalism based on an initial stroke-sequence decision tree and position matching [Ch" n, d:.1.I!lHijl]. This formalism has the advantage of using the features of strokes, stroke sequences, and geometric relations, but avoids the disadvantages caused by the instability of all of the above features.
2.2.5 Analysis-by-synthesis
Yet another approach is analysis-by-synthesis, sometimes called recognition-by-generation. Several studies are concerned with modeling handwriting generation [Yasuhara 1975; Morishita, et al. 1988]. These methods usually use strokes and rules for connecting them to build characters. Characters generated from the inventory of strokes constitute idealized standard representations of the characters. An approximation to real handwritten characters can be attained by specifying these strokes with mathematical models that describe the motion of the pen tip as a function of time. A handwritten character can then be divided into strokes, which are classified using the model parameters and the stroke sequences [Yoshida and Eden 1973].
A similar approach uses dynamic programming to match real and modeled strokes [Wakahara and Umeda 1984]. Due to its optimality property, dynamic programming can be used to obtain the minimum distance between an input and a reference pattern, which handles the problem caused by the distortion existing in the input pattern.
2.2.6 Other methods
Perceptual studies have been instrumental in the development of pairwise distinction methods, which separate each pair of characters that might be confused. Studying the way humans distinguish between such pairs led to a theory of characters based on functional attributes [Cox, et al. 1982; Watanabe, et al.]. Pair distinction by functional attributes has led to robust recognition methods, notably that in the commercial system by Percept. Sometimes the same attribute differentiates more than a pair of characters [Sakai, et al.]. Another method represents a character by the number, order, and relative position of strokes; some strokes are divided into more parts, particularly those of characters with few strokes [1{lIIO, et al.]. The statistical method of Markov models is particularly suitable for dynamic information: [Farag 1979] uses a first-order Markov model with eight states corresponding to eight pen-tip directions.
A system unifies the statistical and syntactical approaches for on-line Chinese character recognition [Tai and Liu 1990]. A family of attributed finite-state automata is introduced for stroke recognition. According to the intrinsic structure of Chinese characters, a two-dimensional character is transformed into a one-dimensional attributed string on the basis of the order arrangements of Chinese characters. Such strings can be easily recognized by template matching.
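A first-order Markov model over eight pen-tip directions, as described above for [Farag 1979], can be sketched by quantizing each pen movement into one of eight chain-code directions and scoring each character class by the log-likelihood of the resulting direction sequence. Training by counting transitions, add-one smoothing, and the class and function names are assumptions of this sketch, not details reported by the cited work.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
N_STATES = 8   # direction 0 = east, counted counter-clockwise in 45-degree steps

def direction8(p: Point, q: Point) -> int:
    """Quantize a pen movement into one of eight directions."""
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    return round(angle / (math.pi / 4)) % N_STATES

def directions(trace: Sequence[Point]) -> List[int]:
    return [direction8(p, q) for p, q in zip(trace, trace[1:]) if p != q]

class MarkovModel:
    """First-order Markov chain over direction states, trained by counting transitions."""
    def __init__(self) -> None:
        self.start = [1.0] * N_STATES                               # add-one smoothing
        self.trans = [[1.0] * N_STATES for _ in range(N_STATES)]

    def train(self, traces: Sequence[Sequence[Point]]) -> None:
        for trace in traces:
            seq = directions(trace)
            if not seq:
                continue
            self.start[seq[0]] += 1
            for a, b in zip(seq, seq[1:]):
                self.trans[a][b] += 1

    def log_likelihood(self, trace: Sequence[Point]) -> float:
        seq = directions(trace)
        if not seq:
            return -math.inf
        ll = math.log(self.start[seq[0]] / sum(self.start))
        for a, b in zip(seq, seq[1:]):
            ll += math.log(self.trans[a][b] / sum(self.trans[a]))
        return ll

def classify_by_likelihood(trace: Sequence[Point], models: Dict[str, MarkovModel]) -> str:
    return max(models, key=lambda label: models[label].log_likelihood(trace))

h = MarkovModel(); h.train([[(0, 0), (1, 0), (2, 0), (3, 0)]])   # left-to-right strokes
v = MarkovModel(); v.train([[(0, 3), (0, 2), (0, 1), (0, 0)]])   # top-to-bottom strokes
print(classify_by_likelihood([(0, 0), (1, 0), (2, 0)], {"horizontal": h, "vertical": v}))   # horizontal
```

Because the model only looks at the previous direction, its parameters reduce to an 8-entry start vector and an 8-by-8 transition table per class.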
2.3 Handwritten and printed CCR
Handwritten (off-line) and printed CCR are more difficult than on-line CCR because they are performed after the writing or printing is completed, and there is no temporal or dynamic information such as the number and order of the strokes, the direction of writing of each stroke, or the speed of writing within each stroke. Moreover, handwritten CCR is the most difficult aspect of character recognition, because noise and structural distortions must be dealt with simultaneously, especially the large scope of distortions produced by different writers. The techniques used in handwritten and printed CCR can be roughly divided into statistical and structural approaches. The statistical decision, or decision-theoretic, approach involves the use of transformation functions, distribution decision functions, or their equivalents, such as the Bayesian classifier and the statistical equivalent block classifier. The syntactical, or structural, approach uses various two-dimensional grammars for character description, parsers for analyzing the structure of an unknown character, and attributed graphs for describing character components and their relationships.
2.3.1 Statistical techniques
Template matching
The earliest approach for CCR was reported twenty-five years ago [Casey and Nagy 1966]. The authors used one of the simplest pattern recognition techniques: template matching. Each character is assigned a template or mask, which is a matrix of black and white pixels. To classify a given character sample, its matrix is compared to all templates. Classification is achieved if one of the templates provides a sufficiently good match with the character sample. To speed up the matching, a two-stage matching process was introduced: similar characters were grouped first, then masks were grouped, and finally individual masks were employed. In general, this method involved an expensive pixel-by-pixel comparison of the matrix of the input character with the template. In addition, such a method is only applicable to printed characters, in which the size and position of the radicals are almost constant and stable. In the case of handwritten characters, normalizing a character does not necessarily normalize a radical, which constitutes a subpattern of the character.
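The template matching and two-stage grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the helper names (`mismatch`, `match_template`, `two_stage_match`), the pixel-mismatch score and the `max_mismatch` acceptance threshold are all hypothetical choices made for the example.

```python
# Hypothetical helpers sketching binary template matching (1 = black pixel).

def mismatch(sample, mask):
    """Number of pixels at which two equal-sized binary matrices differ."""
    return sum(p != q for row_s, row_m in zip(sample, mask) for p, q in zip(row_s, row_m))


def match_template(sample, templates, max_mismatch):
    """Return the label of the closest template, or None if even the closest
    template differs from the sample in more than max_mismatch pixels."""
    best_label = min(templates, key=lambda label: mismatch(sample, templates[label]))
    if mismatch(sample, templates[best_label]) <= max_mismatch:
        return best_label
    return None


def two_stage_match(sample, groups, max_mismatch):
    """groups maps a group name to (representative_mask, {label: mask}).
    Stage 1 picks the closest group; stage 2 searches only inside it."""
    _, members = min(groups.values(), key=lambda g: mismatch(sample, g[0]))
    return match_template(sample, members, max_mismatch)


HORIZONTAL = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
VERTICAL = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
templates = {"horizontal": HORIZONTAL, "vertical": VERTICAL}
sample = [[0, 0, 0], [1, 1, 1], [0, 0, 1]]  # horizontal bar plus one noisy pixel
print(match_template(sample, templates, max_mismatch=2))  # horizontal
```

The pixel-by-pixel cost noted in the text is visible here: every comparison touches every pixel of every candidate mask, which is what the grouping stage tries to reduce.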
Transformation
Fourier, Hadamard, and KL (Karhunen-Loeve) transformations have been applied to printed Chinese character recognition [Nakata, et al. 1972; Cu, et al.; Sakai, et al. 1976; Leung 1985], but only KL has been used for both handwritten and printed CCR. One of the most attractive properties of the two-dimensional Fourier transform is its ability to recognize position-shifted patterns, since it observes the magnitude spectrum and ignores the phase. It is well recognized that the precision of center location is a problem for the scanner, and it is anticipated that this will also be a problem for identifying printed Chinese characters. The Hadamard transform is more suitable for high-speed processing since its arithmetic computation involves only addition and subtraction. The major drawback of applying this technique in pattern recognition is that its performance depends too heavily upon the position of the pattern. KL, in particular, was very successful in printed CCR, in which three orthogonal axes were used in order to absorb the variations of displacement and line width. However, more than ten such axes are needed for handwritten CCR. Furthermore, KL is not practical there due to the heavy computation involved in diagonalizing the N² x N² correlation matrix corresponding to sample images digitized on an N x N grid [Leung 1985].
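The two properties claimed above, that the Fourier magnitude spectrum ignores a shift of the pattern and that the Hadamard transform needs only additions and subtractions, can be checked with a small sketch. It assumes a naive 2-D DFT (fine for tiny images, far too slow for real characters) and a shift that does not push the pattern past the image border (implemented as a circular shift); the function names are illustrative.

```python
import cmath


def dft2(img):
    """Naive 2-D discrete Fourier transform of a small N x M image (list of rows)."""
    n, m = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * cmath.pi * (u * x / n + v * y / m))
                 for x in range(n) for y in range(m))
             for v in range(m)]
            for u in range(n)]


def magnitude_spectrum(img):
    """Keep |F(u, v)| and discard the phase."""
    return [[abs(c) for c in row] for row in dft2(img)]


def circular_shift(img, dr, dc):
    """Shift the pattern by (dr, dc) pixels, wrapping around the borders."""
    n, m = len(img), len(img[0])
    return [[img[(r - dr) % n][(c - dc) % m] for c in range(m)] for r in range(n)]


def fwht(vec):
    """Fast Walsh-Hadamard transform; the length must be a power of two.
    Only additions and subtractions are used."""
    a = list(vec)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


pattern = [[0, 1, 1, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
shifted = circular_shift(pattern, 2, 1)  # no pixel wraps, so this is a plain translation
same = all(abs(a - b) < 1e-9
           for ra, rb in zip(magnitude_spectrum(pattern), magnitude_spectrum(shifted))
           for a, b in zip(ra, rb))
print(same)                # True: the magnitude spectrum ignores the shift
print(fwht([1, 1, 1, 1]))  # [4, 0, 0, 0]
```

A shift only multiplies each coefficient by a unit-modulus phase factor, which is why dropping the phase gives position invariance.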
For Chinese character patterns represented by their outer contour, Fourier descriptors are very useful in recognition [Krzyzak and Iluaeehi 1989]. Among the different techniques, Fourier descriptors are distinguished by their invariance to standard shape transformations such as scaling, rotation, translation, and mirror reflection. The major drawbacks of Fourier descriptors are that (1) they are insensitive to spurs on the boundary, and (2) disconnected patterns give a completely different spectrum, while style variations are reflected in the lower-order spectrum [Verschueren, et al. 1984].
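A minimal sketch of contour Fourier descriptors follows. The normalization is one possible choice, assumed here rather than taken from the cited work: the DC term F(0) is dropped (translation), magnitudes are used (rotation and starting point), |F(u)| and |F(-u)| are summed (mirror reflection swaps them), and everything is divided by the first descriptor (scaling). The helper `transform` is hypothetical and only exists to build a distorted copy of a contour.

```python
import cmath


def fourier_descriptors(contour, k):
    """contour: boundary points (x, y) listed in tracing order.
    Returns d(1)..d(k) with d(u) = |F(u)| + |F(-u)|, divided by d(1)."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)

    def coeff(u):
        # F(u) = (1/n) * sum_t z(t) * exp(-2*pi*i*u*t/n); F(0) is never used.
        return sum(z[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n)) / n

    d = [abs(coeff(u)) + abs(coeff(-u)) for u in range(1, k + 1)]
    return [value / d[0] for value in d]


def transform(contour, scale, dx, dy, start, mirror=False):
    """Scale, rotate by 90 degrees, translate, optionally mirror, and
    change the starting point of a contour."""
    pts = [(-scale * y + dx, scale * x + dy) for x, y in contour]
    if mirror:
        pts = [(-x, y) for x, y in pts]
    return pts[start:] + pts[:start]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
a = fourier_descriptors(square, 3)
b = fourier_descriptors(transform(square, 3.0, 5, -2, start=3, mirror=True), 3)
print(all(abs(p - q) < 1e-9 for p, q in zip(a, b)))  # True
```

The sketch also hints at drawback (2) in the text: the method assumes a single closed contour, so a character made of disconnected parts has no single sequence z(t) to transform.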
Stroke distribution
A distribution of local stroke features can be taken in a two-dimensional plane or projected onto a one-dimensional axis. A popular example is to use the line directions as the local stroke features. Such a scheme is called direction matching [Yasuda and Fujisawa 1979; Saito, et al. 1982]. The boundary direction is calculated at each boundary point by following the contour between two pre- and post-points along a binary pattern on a 64 x 64 grid plane. The direction is quantized into four directions and mapped to a 16 x 16 plane. This method is quite simple; however, the recognition rate is very low. For improvement, size normalization and shift similarity should be used. Another example of a stroke feature distribution on the two-dimensional plane uses stroke length [Hagita and Masuda 1981], which can be considered a successive use of the distance representation introduced in the field effect method [Mori, et al.]. At each black point, eight quantized directions are taken, and the distance is measured along each direction from this center point to the boundary point. Then the distance values along two opposite directions are summed, and a four-dimensional distance vector is obtained at each pixel on a 128 x 128 plane. To get a compact feature distribution, this plane is divided into 8 x 8 zones, each of size 16 x 16, and the vectors are averaged over each zone. Matching is done by the minimum distance decision rule, which is essentially the same as correlation. This scheme extracts more global features than a local directional feature, so that some complex features such as intersection points are reflected. However, it cannot distinguish characters which are very similar to each other.
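The stroke-length feature above can be sketched as follows, assuming that white pixels contribute zero vectors to the zone averages and that the minimum distance decision uses squared Euclidean distance; both are illustrative choices, and the function names are hypothetical. Small zones are used so the example stays readable; the text describes a 128 x 128 plane split into 8 x 8 zones of 16 x 16 pixels.

```python
# Direction pairs: horizontal, vertical and the two diagonals, as (row, col) steps.
DIRECTION_PAIRS = [((0, 1), (0, -1)), ((1, 0), (-1, 0)),
                   ((1, 1), (-1, -1)), ((1, -1), (-1, 1))]


def run_length(img, r, c, dr, dc):
    """Count black pixels met when stepping from (r, c) in direction (dr, dc)
    until the stroke boundary is reached; (r, c) itself is not counted."""
    rows, cols = len(img), len(img[0])
    steps = 0
    r, c = r + dr, c + dc
    while 0 <= r < rows and 0 <= c < cols and img[r][c]:
        steps += 1
        r, c = r + dr, c + dc
    return steps


def distance_vector(img, r, c):
    """Four-dimensional vector: for each direction pair, the two opposite run
    lengths are summed (plus the pixel itself). White pixels get zeros."""
    if not img[r][c]:
        return (0, 0, 0, 0)
    return tuple(run_length(img, r, c, *fwd) + run_length(img, r, c, *bwd) + 1
                 for fwd, bwd in DIRECTION_PAIRS)


def zone_features(img, zone):
    """Average the distance vectors over zone x zone blocks and concatenate them."""
    rows, cols = len(img), len(img[0])
    features = []
    for zr in range(0, rows, zone):
        for zc in range(0, cols, zone):
            total, count = [0.0] * 4, 0
            for r in range(zr, min(zr + zone, rows)):
                for c in range(zc, min(zc + zone, cols)):
                    for i, value in enumerate(distance_vector(img, r, c)):
                        total[i] += value
                    count += 1
            features.extend(t / count for t in total)
    return features


def min_distance_label(features, references):
    """Minimum distance decision with squared Euclidean distance."""
    return min(references, key=lambda label: sum(
        (f - g) ** 2 for f, g in zip(features, references[label])))


bar = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
print(distance_vector(bar, 1, 0))  # (4, 1, 1, 1): long horizontally, short elsewhere
```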
An almost identical method was used to recognize printed Chinese characters on the license plates of moving vehicles [Dni, et al.]. A standard X and Y profile was defined for each character, and for a given character sample, its profiles were constructed and compared to all standard profiles. The criterion for recognition was the pair of profiles yielding the closest correspondence. This approach has some advantages: 1) using two one-dimensional patterns per character, as opposed to one two-dimensional pattern, results in considerable information reduction; 2) the projection profiles are easily extracted from the original pattern; 3) since the projection profiles are obtained by an integrative process, they tend to be less sensitive to noise. However, the projection profiles suffer from position errors between the input and standard patterns. In addition, different characters with the same projection profiles cannot be discriminated.
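The X and Y profile matching described above can be sketched in a few lines. The city-block distance between profiles is an assumption made for the example; the text only says the closest pair of profiles wins. The last line illustrates the stated weakness: two different patterns can share both profiles.

```python
def profiles(img):
    """X profile: black pixels per column; Y profile: black pixels per row."""
    x_profile = [sum(column) for column in zip(*img)]
    y_profile = [sum(row) for row in img]
    return x_profile, y_profile


def profile_distance(p, q):
    """City-block distance summed over the X and Y profiles."""
    return (sum(abs(a - b) for a, b in zip(p[0], q[0]))
            + sum(abs(a - b) for a, b in zip(p[1], q[1])))


def recognize(img, standards):
    """Return the label whose standard profile pair is closest."""
    prof = profiles(img)
    return min(standards, key=lambda label: profile_distance(prof, standards[label]))


T = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
L = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
standards = {"T": profiles(T), "L": profiles(L)}
print(profiles(T))                                              # ([1, 3, 1], [3, 1, 1])
print(recognize([[1, 1, 1], [0, 1, 0], [0, 0, 0]], standards))  # T
# Two different patterns with identical profiles cannot be told apart:
print(profiles([[1, 0], [0, 1]]) == profiles([[0, 1], [1, 0]]))  # True
```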
Background feature distribution
The background feature distribution technique [Suen 1982; Naito, et al. 1981] is based on a slight modification of Glucksman's well-known method of background feature extraction [Glucksman 1967]. For every background picture element of a binary pattern, scanning lines are derived in four directions: top, bottom, left, and right. In each scan, the number of crossings between the scanning line and the strokes is counted. However, this does not give an exact stroke density in each direction, because the four quantized directions are not necessarily perpendicular to the strokes; sometimes they cross the strokes tangentially or even run parallel to them. For improvement, a crossing is counted only when the scanning direction is nearly perpendicular to a stroke.
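A minimal sketch of the four-direction crossing count follows. The text does not say how "nearly perpendicular" is tested, so the optional `max_run` parameter is a hypothetical stand-in: a scanning line that runs along a stroke stays on black pixels for a long stretch, so runs longer than `max_run` pixels are not counted as crossings.

```python
# Scan directions as (row step, col step).
SCANS = {"top": (-1, 0), "bottom": (1, 0), "left": (0, -1), "right": (0, 1)}


def crossing_counts(img, r, c, max_run=None):
    """For background pixel (r, c), count the strokes crossed by the scanning
    line in each of the four directions. With max_run set, a stroke run longer
    than max_run pixels is ignored (hypothetical stand-in for the
    'nearly perpendicular' test)."""
    rows, cols = len(img), len(img[0])
    counts = {}
    for name, (dr, dc) in SCANS.items():
        crossings, run = 0, 0
        rr, cc = r + dr, c + dc
        while 0 <= rr < rows and 0 <= cc < cols:
            if img[rr][cc]:
                run += 1                       # still on a stroke
            elif run:                          # just left a stroke
                if max_run is None or run <= max_run:
                    crossings += 1
                run = 0
            rr, cc = rr + dr, cc + dc
        if run and (max_run is None or run <= max_run):
            crossings += 1                     # stroke touching the image border
        counts[name] = crossings
    return counts


two_bars = [[0] * 5, [1] * 5, [0] * 5, [1] * 5, [0] * 5]
print(crossing_counts(two_bars, 2, 2))  # {'top': 1, 'bottom': 1, 'left': 0, 'right': 0}
```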
Background analysis
Instead of propagating black and white information as in Glucksman's method, more exact information on the strokes can be extracted and propagated. The idea is to propagate edges, namely the edge value and direction [R. Oka 1982; Yamamoto 1984]. In this method, a cell is defined on a cellular space with meshes of 7 x 7, and each cell has eight intracells for eight directions. Each intracell stores the strength of the edge (edge value) whose direction is normal to the direction of that intracell. These intracell features represent geometric features of the input pattern around the cell. For each cell, the edge values of all its intracells are averaged. Only those cells whose averaged edge value exceeds a specified threshold are selected and used in matching, which is carried out by distance measurement. However, as the image quality of the input character varies, it is difficult to select a single threshold value that works even for two images of different quality of the same input character.
Stroke analysis
Stroke analysis is the most traditional approach to Chinese character recognition. Generally, stroke analysis is based on the skeleton obtained by thinning preprocessing, but it is well known that such results are not satisfactory because of noise and distortion [Kimura, et al. 1978]. For printed Chinese characters, the typical types of noise are distorted intersection points, whiskers, and branches caused by touching strokes, since the strokes of printed characters are usually very thick. A major problem of stroke segmentation still remains after thinning. However, thinning preprocessing is still attractive because of the simplicity of the algorithm. Thus, research continues on thinning algorithms and on their application to stroke segmentation as well [Pavlidis 1982; Wakayama 1982; Liao and Huang 1990].
On the other hand, admitting that not all of the noise sources mentioned can be removed, a practical approach is to consider some non-local but still simple noise removal method which would be effective against the major types of noise. In this sense, the idea of the so-called "good continuity rule" of Gestalt psychology is very useful for removing noise such as distorted intersections. In fact, this rule is used in a very simple method of removing noise and detecting strokes [Kasvand 1979]. All pairs of segments joining at an intersection point are considered, with the continuation of the segments being measured according to some conditions. The pair of segments with maximum continuity, provided it is greater than some threshold value, is chosen as one stroke, and the rest are treated in the same manner.
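The pairing procedure based on the good continuity rule can be sketched as below. The continuity measure is an assumption for illustration (the text only says continuation is measured "according to some conditions"): each segment is reduced to its outgoing direction at the intersection, and two segments continue each other best when those directions are opposite. The names `continuity` and `pair_strokes` are hypothetical.

```python
from itertools import combinations


def angle_between(a, b):
    """Smallest angle in degrees between two directions a and b."""
    d = abs(a - b) % 360
    return 360 - d if d > 180 else d


def continuity(a, b):
    """1.0 when the two outgoing directions are exactly opposite (a straight
    continuation through the intersection), falling to 0.0 when they coincide."""
    return 1.0 - abs(180.0 - angle_between(a, b)) / 180.0


def pair_strokes(directions, threshold):
    """directions: {segment_id: outgoing direction in degrees at the junction}.
    Repeatedly joins the most continuous remaining pair whose continuity is at
    least threshold; unpaired segments become strokes on their own."""
    remaining = dict(directions)
    strokes = []
    while len(remaining) >= 2:
        (s1, s2), score = max(
            (((p, q), continuity(remaining[p], remaining[q]))
             for p, q in combinations(remaining, 2)),
            key=lambda item: item[1])
        if score < threshold:
            break
        strokes.append((s1, s2))
        del remaining[s1], remaining[s2]
    strokes.extend((s,) for s in remaining)
    return strokes


# A cross-shaped intersection: four segments leaving at 0, 90, 180 and 270 degrees.
print(pair_strokes({"a": 0, "b": 90, "c": 180, "d": 270}, 0.9))  # [('a', 'c'), ('b', 'd')]
```

Because the decision looks at all segments around the intersection at once rather than at single pixels, it is non-local in the sense used above, yet still very cheap.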
For the extraction of stable strokes, a technique using the Hough transform (HT) was proposed recently for handwritten CCR [Cheng, et al. 1989]. First, the character pattern is thinned and transformed from the spatial domain to the parametric domain by the HT. As most strokes of Chinese characters are almost linear, they can easily be detected as lines by the HT, even in a very noisy image. This is a new application of the HT and a new attempt at stroke extraction for handwritten Chinese characters. The method is still very time-consuming, as no preclassification exists to reduce the number of characters to be matched using the features obtained by the HT.

Combination schemes
For recognition methods based on feature extraction, the decision on the character category is made by finding the reference vector with the least distance from the input pattern. Motivated by the need for more effective and more reliable features, much research effort has been made and various character features have been proposed. Nevertheless, it is apparent that none of these features can yield sufficient accuracy when used alone. This problem is caused by two inherent drawbacks common to all of the features.
One of these drawbacks is the lack of discriminatory information. Theoretically speaking, the purpose of feature extraction is to assure the reliability of recognition by removing redundant and irrelevant information while maintaining the separability among pattern classes. In practice, features are often extracted by means of some measurements of the character pattern. A general side effect of feature extraction is that not only is redundant information removed, but, for a small number of characters, some important discriminatory information is lost as well. In the Chinese character set there are many pairs of similar characters that differ from each other only slightly. If such a significant difference is ignored by the measurement, ambiguity will arise between these characters. Worse is the fact that some dissimilar characters may have very similar feature vectors. If the reference vectors of such characters are located close together in the feature space, recognition becomes very difficult. For example, the feature vector of an input pattern may deviate from its reference by a small distance, as is often the case for a somewhat noisy character sample, and end up closest to the reference vector of another character category, resulting in a misrecognition that is difficult to avoid.
Another essential weakness of the character features is their low stability against noise or disturbance. In characters scanned from handwritten or printed documents, there may be many kinds of disturbance, such as blurred strokes, broken strokes, positional shift, character rotation, stroke thickness variation, and noisy points. Every feature is particularly sensitive to some disturbances, which can considerably affect the results of the measurements on which the feature is based. Under such a disturbance, a large distance will exist between the feature vector of an input pattern and its reference, thereby degrading the recognition performance.
It is natural to consider an appropriate combination of several methods in order to obtain a better result. Because different features are obtained by different measurements of the character patterns, it is reasonable to suppose that the characters may have different distributions in the different feature spaces. Characters that are troublesome for certain features may be easily distinguishable by other features. Thus, the separability among the character categories will be greatly enhanced if several different features are used jointly in recognition. This strategy is commonly adopted for Chinese character recognition. Hagita and Masuda combined the stroke distribution method and the directional stroke length distribution method [Hagita and Masuda 1981]. Immediately after this work, research which combined three kinds of features, namely line direction, crossing count, and background features, was reported [Fujii, et al. 1981]. For preclassification, local line direction can be used together with the peripheral area vector [Takahashi 1982]. Most feature combination methods were first applied to handprinted Chinese characters. The major problem with the combination schemes is that it is difficult to choose features which are mutually independent. It is mutual independence that makes the selected features sensitive to different disturbances and improves the stability of recognition by means of feature combination. To cope with this problem, an approach combining four independent features, namely crossing count, stroke proportion, and two peripheral features, has been proposed [Zhang, et al. 1989]. To find these four features, a correlation analysis of the distances was made for all possible pairs among the candidate features.
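The combination idea and the correlation check on distances can be sketched as follows. The weighted sum of per-feature Euclidean distances and the use of Pearson correlation are assumptions for illustration; the cited works may normalize or combine distances differently. Note that a correlation near zero only shows that two features' distances are uncorrelated, which is weaker than true independence.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def combined_classify(sample, references, weights):
    """sample: {feature_name: vector}; references: {label: {feature_name: vector}};
    weights: {feature_name: weight}. Picks the label with the smallest
    weighted sum of per-feature distances."""
    def total_distance(label):
        return sum(w * euclidean(sample[f], references[label][f]) for f, w in weights.items())
    return min(references, key=total_distance)


def pearson(xs, ys):
    """Correlation between the distances produced by two features over the same
    sample/reference pairs; values near 0 mean they react to different things."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# "p" and "q" are indistinguishable by the crossing feature but not by the profile feature.
refs = {"p": {"crossing": [1, 2], "profile": [0, 0]},
        "q": {"crossing": [1, 2], "profile": [5, 5]}}
sample = {"crossing": [1, 2], "profile": [4, 5]}
print(combined_classify(sample, refs, {"crossing": 1.0, "profile": 1.0}))  # q
```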
2.3.2 Structural techniques
Chinese characters are highly structured patterns. The structure of Chinese characters can be divided into three levels: the whole character level, the sub-character (radical) level, and the stroke level. The character level is the highest level, while the stroke level is the lowest. For two radicals in a character, one radical may be on the left side of the other radical, above the other radical, or surrounded by the other radical. The strokes inside a radical may be unconnected, or one stroke may contain some joining points which join it to another stroke. It is very difficult to use classical statistical approaches to describe such complex pattern structures and the relations between subpatterns. It is the structural properties of Chinese characters that make structural approaches very promising in handwritten and printed CCR. Recently, more effort has been devoted to this direction. The structural approaches for CCR can be divided into two main streams, namely grammar approaches and graph approaches.
Grammar methods
In the grammar methods, the pattern is represented as a string, and a grammar, either context-free or a restricted context-sensitive grammar such as an indexed or programmed grammar, is used to describe the characters [Zhang and Xia 1983; Tai]. A parser for that particular grammar is built to recognize the pattern. Further development along this line includes stochastic languages [Fu 1982], error-correcting parsing, and stochastic error-correcting parsing [Lee and Fu 1977]. However, string grammars are not powerful enough to handle very complex objects like handwritten Chinese characters, and therefore higher-dimensional grammars (plex or web grammars) have been developed. These traditional grammar approaches are still weak in handling noisy or distorted patterns and numerical semantic information [Tsai and Fu 1980]. This shortcoming can be overcome if the attributed grammar approach is used. Considering the characteristics of handwritten Chinese characters and the existing problems, a two-dimensional extended attributed grammar has been proposed [Zhau]. This method carries both overall character shape information and the statistical features of samples, and conducts top-down matching and bottom-up reduction. As the Chinese character set is very large and the grammar describing the characters in such a set is quite complicated, parsing is very difficult. Hence, it is not practical to use grammar approaches for CCR.
Graph methods
Another line of development is to use pattern matching instead of parsing. In this approach, the pattern is usually represented as a relational graph, and graph matching is used. In order to incorporate more information into the relational graph, attributed graphs are used to combine the structural and statistical approaches [Tsai and Fu 1979; Shi and Fu 1983]. The attributed graph gives a very flexible representation of structural patterns, especially for handwritten Chinese characters, in which case the pattern primitives or vertices of the attributed graph can represent the strokes, while the arcs or edges of the attributed graph can represent the relationships between the strokes. Since many natural properties and relations of handwritten Chinese characters are fuzzy, the next step is to include fuzzy attributes in the attributed graph and to use this information for further processing. A fuzzy attributed graph for handwritten CCR was proposed [Chan, et al.]. The major drawback of this approach is that the structural properties of Chinese characters are not fully utilized, and it is very difficult to organize the model database for efficient searching. Based on the three structural levels of Chinese characters, a hierarchical attributed graph representation (HAGR) is proposed in this thesis for handwritten CCR. The hierarchical attributed graph represents a whole character, with its vertices describing the radicals in the character and its arcs describing the spatial relations between the radicals. In the hierarchical attributed graph, a radical attributed graph corresponding to each vertex is used to represent a radical, with its vertices describing the strokes and its arcs describing the relations between the strokes. With the HAGR, the huge model database can be organized as a tree with several levels, which facilitates accurate and fast searching.
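As a rough picture of the two-level structure, the sketch below models a character graph whose vertices are radical graphs, each of whose vertices is a stroke, and indexes models in a tree. Everything here is hypothetical: the attribute names, the relation labels and especially the tree key (radical count, then stroke count per radical) are illustrative choices, not the attributes or tree organization used by the HAGR in later chapters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Stroke:
    """Vertex of a radical attributed graph (attribute names are illustrative)."""
    orientation: str
    length: float


@dataclass
class RadicalGraph:
    """Vertices are strokes; arcs map a stroke pair to a relation label."""
    strokes: List[Stroke]
    relations: Dict[Tuple[int, int], str] = field(default_factory=dict)


@dataclass
class CharacterGraph:
    """Top level: vertices are radical graphs; arcs hold spatial relations."""
    label: str
    radicals: List[RadicalGraph]
    relations: Dict[Tuple[int, int], str] = field(default_factory=dict)


def index_path(ch):
    """Hypothetical coarse-to-fine key: radical count, then strokes per radical."""
    return (len(ch.radicals),) + tuple(len(r.strokes) for r in ch.radicals)


def build_model_tree(models):
    tree = {}
    for ch in models:
        node = tree
        for key in index_path(ch):
            node = node.setdefault(key, {})
        node.setdefault("models", []).append(ch.label)
    return tree


def candidates(tree, ch):
    """Follow the unknown character's key down the tree; only the models in
    the reached leaf would need detailed graph matching."""
    node = tree
    for key in index_path(ch):
        if key not in node:
            return []
        node = node[key]
    return node.get("models", [])
```

For example, a model with one radical of two strokes and another with two radicals of one and three strokes land in different branches, so an unknown one-radical, two-stroke character is only compared with the first.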
Chapter 3
Preprocessing
3.1 Introduction
Images of the input handwritten Chinese characters are obtained by a scanner and are then normalized and transformed into binary images. A binary image is a two-dimensional array in which each element (pixel) is either 0 or 1. The character pattern consists of the pixels of value 1, and each stroke segment of the character pattern is more than one pixel thick. Various types of information can be extracted from a binary image by performing low-level operations. The input character pattern in the binary image is skeletonized by a thinning algorithm. Afterwards, the skeleton of the input character is traced to obtain the stroke segments, which are then merged to form the strokes of the input character. Geometric information such as the direction and position of the strokes can then be extracted. According to the positions of the strokes and the connection relations between them, strokes can be grouped into the radicals of the input character. All these operations are considered preprocessing, which provides the local properties of the input character. These properties are then organized for further image analysis and representation.
3.2 Thinning
Thinning is a process by which a binary pattern is transformed into another binary pattern consisting of its skeleton. The major objectives of thinning in pattern recognition and image processing are to reduce data storage and transmission requirements, to reduce the amount of data to be processed, and to facilitate the extraction of features from the pattern.
Many thinning algorithms have been reported [Wakayama 1982; Lu and Wang 1985; Pavlidis 1982; Zhang and Suen 1984; Naccache and Shinghal 1984; Zhang and Fu 1984; Plamondon and Suen 1989]. The thinning algorithm described by Zhang and Suen is simple and fast and can be implemented in parallel; however, it cannot prevent excessive erosion, so lines or curves that represent the true features of the character tend to be excessively shortened. In our system, the Zhang-Suen algorithm is modified to overcome this erosion problem.
The Zhang-Suen algorithm extracts the skeleton of the character pattern by removing all the edge pixels of the pattern except those that belong to the skeleton. In order to preserve the connectivity of the original pattern in the skeleton, iterative transformations are applied to the binary image of the character pattern, and each iteration is divided into two subiterations.
A 3 x 3 window is used to extract the skeleton. Let n0 represent a given pixel (i, j); the eight neighbors inside the window are n1 to n8 (see Figure 3.1).
Figure 3.1 Eight neighboring pixels of a pixel n0.

In the first subiteration, according to the values of the eight neighboring pixels, the contour pixel n0 is deleted from the pattern if it satisfies all of the following conditions:

$$2 \le Z(n_0) \le 6 \qquad (3.1)$$
$$N(n_0) = 1 \qquad (3.2)$$
$$n_1 \cdot n_3 \cdot n_5 = 0 \qquad (3.3)$$
$$n_3 \cdot n_5 \cdot n_7 = 0 \qquad (3.4)$$

where Z(n0) is the number of nonzero neighbors of n0, and N(n0) is the number of "01" patterns in the ordered set n1, n2, ..., n8, n1.
In the second subiteration, conditions (3.3) and (3.4) are changed into

$$n_1 \cdot n_3 \cdot n_7 = 0 \qquad (3.5)$$
$$n_1 \cdot n_5 \cdot n_7 = 0 \qquad (3.6)$$

and the rest remain the same.
By conditions (3.3) and (3.4) of the first subiteration, the south-east edge pixels and the north-west corner pixels which do not belong to the skeleton are removed. Similarly, a pixel removed by conditions (3.5) and (3.6) in the second subiteration might be a north-west boundary pixel or a south-east corner pixel. The iteration continues until no more pixels can be removed.
Zhang and Suen concluded that:
• By condition (3.1), the end points of a skeleton line are preserved.
• Also, condition (3.2) prevents the deletion of those pixels that lie between the end points of a skeleton line.
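The original Zhang-Suen procedure summarized above can be sketched as follows. This covers only conditions (3.1) to (3.6); the alternative conditions of the modified algorithm are not included. Two assumptions are made: the neighbors n1 to n8 are labeled clockwise starting at the north neighbor (a labeling consistent with which pixels each subiteration is said to remove), and the image has a one-pixel background border. In each subiteration all deletions are decided first and applied together, as in a parallel implementation.

```python
# Offsets of n1..n8 around n0, clockwise from north (assumed labeling), as (row, col).
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def zhang_suen(image):
    """Thin a binary image (list of rows of 0/1) whose border rows and columns are 0."""
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for sub in (0, 1):                      # first and second subiteration
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if not img[r][c]:
                        continue
                    n = [img[r + dr][c + dc] for dr, dc in OFFSETS]
                    z = sum(n)                  # Z(n0): nonzero neighbors
                    transitions = sum(1 for i in range(8)
                                      if n[i] == 0 and n[(i + 1) % 8] == 1)  # N(n0)
                    n1, n3, n5, n7 = n[0], n[2], n[4], n[6]
                    if sub == 0:                # conditions (3.3) and (3.4)
                        extra = n1 * n3 * n5 == 0 and n3 * n5 * n7 == 0
                    else:                       # conditions (3.5) and (3.6)
                        extra = n1 * n3 * n7 == 0 and n1 * n5 * n7 == 0
                    if 2 <= z <= 6 and transitions == 1 and extra:
                        to_delete.append((r, c))
            for r, c in to_delete:              # delete in parallel
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


# A bar three pixels thick and seven pixels long inside a 7 x 9 image.
bar = [[0] * 9 for _ in range(7)]
for r in (2, 3, 4):
    for c in range(1, 8):
        bar[r][c] = 1
skeleton = zhang_suen(bar)
print([(r, c) for r in range(7) for c in range(9) if skeleton[r][c]])
# [(3, 2), (3, 3), (3, 4), (3, 5)]
```

Even on this simple bar, the skeleton is two pixels shorter at each end than the original pattern. This is a mild form of the shortening effect discussed next.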
If the algorithm is applied to the pattern in Figure 3.2(a), the pattern is excessively shortened. This pattern consists of a horizontal section and a diagonal section, each of which is two pixels wide. After thinning, the diagonal section is deleted because of the over-erosion of the algorithm (see Figure 3.2(b)).
Figure 3.2 Example of over-erosion by the Zhang-Suen algorithm.
In the first iteration, the end pixel (6,1) is deleted because it satisfies all the conditions of the first subiteration. After pixel (6,1) is deleted, the pixel (6,2) is deleted in the second subiteration because it satisfies all the conditions for deletion. No other pixel of the diagonal segment can be removed during the first iteration, as none of them satisfies all the conditions of the first or second subiteration. In a similar way, pixels (5,2) and (5,3) of the diagonal segment are removed in the second iteration, pixels (4,3) and (4,4) are removed in the third iteration, and so on. By the fifth iteration, no further pixel can be removed from the diagonal segment. Hence, only one pixel, (2,5), of the diagonal segment is preserved after thinning.
It is clear that over-erosion is a severe problem for the Zhang-Suen algorithm. In an extreme case, when a two-pixel-wide diagonal segment is very long, the segment would vanish entirely. This increases the difficulty of further image analysis and of representing the true features of the character for recognition. In order to avoid the over-erosion problem, a modified algorithm is proposed. In the first subiteration, in addition to the set of original conditions (3.1) to (3.4), a set of alternative conditions is introduced:
N(fl.)
=
2 Z(UI)=4(3.7) (3.8) (3.9)
The contour pixel n0 is deleted if either the set of original conditions or the set of alternative conditions is satisfied. The alternative conditions guarantee that if the pixel n0 is on a diagonal segment which is two pixels wide and its neighboring pixel is not on the background, then n0 is removed. Therefore, the segment at n0 becomes one pixel wide, and that neighboring pixel can be preserved in the following iteration.
Similarly, in the second subiteration, n0 is removed if either the set of original conditions (3.1), (3.2), (3.5), and (3.6) or the following set of alternative conditions is satisfied:
II . ' (11:1
+
II~+
tI~)=2N(lId=2
Z(ltl)=4
(3.10) (3.11) (3.12)
With the modified algorithm, the over-erosion problem is overcome. The comparison is illustrated in Figure 3.3. Figure 3.3(a) is the original pattern. Figure 3.3(b) is the skeleton of the pattern obtained by the Zhang-Suen algorithm; the diagonal section of the pattern is clearly overshortened. The diagonal section is, however, preserved by the modified algorithm (see Figure 3.3(c)).
Figure 3.3 Comparison of the thinning algorithms: (a) original pattern; (b) Zhang-Suen algorithm; (c) modified algorithm.
3.3 Tracing
After thinning, the skeleton of a character is traced in order to extract line segments.
the skeleton.
pixels in the skeleton.
points and end-points) are detected. A segment can be extracted by tracing from one feature point to another. When a skeleton pixel is visited, it is marked and its coordinates are recorded. By scanning row by row from top to bottom, if an unmarked pixel (called a locating point) is found, the remaining skeleton connected to this pixel can be traced and marked, and the corresponding segments can then be extracted.
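A sketch of feature point detection and raster-scan tracing follows. The exact list of feature point types is partly lost in the text above; this example assumes they include end points and fork points, and classifies a skeleton pixel by the number of 0-to-1 transitions around it (the N(n0) count from the thinning section): one transition marks an end point, three or more a fork point. A closed loop has no such points, which is why a locating point found by the row-by-row scan is needed to start tracing it. Function names are hypothetical.

```python
# Neighbors n1..n8, clockwise from north (same assumed labeling as the thinning sketch).
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def neighbour_values(skel, r, c):
    rows, cols = len(skel), len(skel[0])
    return [skel[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in OFFSETS]


def feature_points(skel):
    """One 0->1 transition around a skeleton pixel marks an end point;
    three or more mark a fork point."""
    ends, forks = [], []
    for r, row in enumerate(skel):
        for c, value in enumerate(row):
            if not value:
                continue
            n = neighbour_values(skel, r, c)
            transitions = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
            if transitions == 1:
                ends.append((r, c))
            elif transitions >= 3:
                forks.append((r, c))
    return ends, forks


def trace(skel):
    """Scan row by row; each unmarked skeleton pixel found starts a trace
    (a locating point). Every visited pixel is marked and its coordinates recorded."""
    rows, cols = len(skel), len(skel[0])
    marked, traces = set(), []
    for r in range(rows):
        for c in range(cols):
            if skel[r][c] and (r, c) not in marked:
                marked.add((r, c))
                stack, visited = [(r, c)], []
                while stack:
                    pr, pc = stack.pop()
                    visited.append((pr, pc))
                    for dr, dc in OFFSETS:
                        qr, qc = pr + dr, pc + dc
                        if (0 <= qr < rows and 0 <= qc < cols
                                and skel[qr][qc] and (qr, qc) not in marked):
                            marked.add((qr, qc))
                            stack.append((qr, qc))
                traces.append(((r, c), visited))
    return traces


t_shape = [[1, 1, 1, 1, 1],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0]]
print(feature_points(t_shape))  # ([(0, 0), (0, 4), (2, 2)], [(0, 2)])
```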
Definition 3.1: A separating point is a pixel connecting two line segments which have different directions.
The marked segments are first encoded by Freeman's chain code. The codes "A" to "H" are used to indicate eight different directions, as shown in Figure 3.4.

Figure 3.4 Codes and directions for Freeman's chain coding.
Each segment is coded from top to bottom and from left to right. For a segment, coding starts from its feature point at the top or left and ends at its other feature point at the bottom or right, respectively. The coding of a segment in a loop starts from its locating point "*" and ends at the same point, proceeding in an anti-clockwise direction. Several examples are given in Figure 3.5.
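A minimal chain coding sketch is given below. Figure 3.4 is not recoverable here, so the assignment of letters to directions is an assumption inferred from the loop example later in this section: "A" east, then clockwise on screen ("B" south-east, "C" south, "D" south-west, "E" west, "F" north-west, "G" north, "H" north-east), with row numbers increasing downward.

```python
# Assumed mapping from a (row step, col step) move to a Freeman code letter.
CODE_OF_STEP = {(0, 1): "A", (1, 1): "B", (1, 0): "C", (1, -1): "D",
                (0, -1): "E", (-1, -1): "F", (-1, 0): "G", (-1, 1): "H"}


def chain_code(path):
    """path: consecutive 8-connected (row, col) pixels of a traced segment."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in CODE_OF_STEP:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-adjacent")
        codes.append(CODE_OF_STEP[step])
    return "".join(codes)


# Five steps to the right, then one step down-right.
print(chain_code([(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 6)]))  # AAAAAB
# A small loop coded anti-clockwise from its top-left locating point.
print(chain_code([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]))
# CCAAGGEE
```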
For example, according to Freeman's chain code, the segment in Figure 3.5(a) has the chain code "AAAAAB". For the segment in the loop in Figure 3.5(b), the chain code starts at * and ends at *. The chain code is "DCCCCBAAAAAAHGGGGFEEEEEE". The loop can be separated into line segments with different directions. The separation is performed according to these rules:
1. Each line segment has one or two kinds of code.
2. A code in a chain is defined as a primitive code if it is the majority in the line segment.
The procedure is as follows:
1. Scan the chain code of a segment.
2. Count the number of different kinds of code (denoted as Nd) and the number of codes which are consecutive and identical (denoted as Ns).
3. If Nd reaches three, then the substring scanned before the current code is selected as a line segment.
4. If the selected substring is too short (say, fewer than four codes), the first code is neglected to reduce Nd to less than three, and the procedure continues until Nd reaches three again.
5. If the number of consecutive and identical codes (Ns) is greater than three, then the chain code is selected and the corresponding line segment is extracted.
6. If two line segments are connected to each other and have the same primitive code, then they are combined, and a new line segment replaces them.
For example, consider scanning the chain code of the loop in Figure 3.5(b). The first five codes contain two kinds of code, so Nd = 2. When the sixth code is scanned, Nd reaches three, so the first five codes are selected and the corresponding line segment with the vertical direction is extracted. The procedure continues until the end of the chain code is reached. The loop is represented by four line segments: "DCCCC" (the left vertical line segment), "BAAAAAA" (the bottom horizontal line segment), "HGGGG" (the right vertical line segment), and "FEEEEEE" (the top horizontal line segment).
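Rules 1 to 3 and rule 6 of the separation procedure can be sketched as follows. This is a simplified sketch: rule 4 (dropping the first code of a too-short substring) and rule 5 (the Ns test) are omitted, and ties in the majority vote for the primitive code are broken by first occurrence. On the loop example it reproduces the four line segments listed above.

```python
from collections import Counter


def split_chain(chain):
    """Rules 1-3: cut just before the code that would introduce a third
    kind of code (Nd = 3), so each piece has one or two kinds of code."""
    segments, current = [], ""
    for code in chain:
        if len(set(current + code)) == 3:
            segments.append(current)
            current = code
        else:
            current += code
    if current:
        segments.append(current)
    return segments


def primitive(segment):
    """The majority code of a line segment."""
    return Counter(segment).most_common(1)[0][0]


def merge_same_primitive(segments):
    """Rule 6: adjacent segments with the same primitive code are combined."""
    merged = []
    for seg in segments:
        if merged and primitive(merged[-1]) == primitive(seg):
            merged[-1] += seg
        else:
            merged.append(seg)
    return merged


print(merge_same_primitive(split_chain("DCCCCBAAAAAAHGGGGFEEEEEE")))
# ['DCCCC', 'BAAAAAA', 'HGGGG', 'FEEEEEE']
```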
3.4 Merging
Two line segments can be merged to form one stroke if they share the same key point (feature point, locating point, or separating point) and have the same orientation. The remaining line segments, which cannot be merged, are treated as individual strokes. For each line segment, the orientation θ can be determined by:
where (x1, y1) and (x2, y2) are the two key points of the line segment. Then the orientation of a line segment is:
Figure 3.6 Classification of the line segments (right diagonal: 22.5° ≤ θ < 67.5°; vertical: 67.5° ≤ θ < 112.5°; left diagonal: 112.5° ≤ θ ≤ 157.5°).
1) horizontal (H), if 0° ≤ θ < 22.5° or 157.5° < θ ≤ 180°;
2) right diagonal