Cursive Bengali Script Recognition for Indian Postal Automation



HAL Id: tel-01748429

https://tel.archives-ouvertes.fr/tel-01748429v2

Submitted on 25 Mar 2011


Cursive Bengali Script Recognition for Indian Postal Automation

Szilárd Vajda

To cite this version:

Szilárd Vajda. Cursive Bengali Script Recognition for Indian Postal Automation. Engineering Sciences [physics]. Université Henri Poincaré - Nancy 1, 2008. English. NNT: 2008NAN10083. tel-01748429v2.


UFR STMIA

Cursive Bengali Script Recognition for Indian Postal Automation

THESIS

presented and publicly defended on 12/11/2008 for the Doctorate of the Université Henri Poincaré – Nancy 1 (specialty: computer science)

by

Szilárd VAJDA

Composition of the jury

President:
Thierry Paquet, Professor, University of Rouen

Reviewers:
Jean-Marc Ogier, Professor, University of La Rochelle
Laurence Likforman-Sulem, Associate Professor, Telecom ParisTech

Examiners:
Thierry Paquet, Professor, University of Rouen
René Schott, Professor, University of Nancy 1
Abdel Belaïd, Professor, University of Nancy 2

Invited member:
Christophe Choisy, Research Engineer, Itesoft Company


First of all I would like to express my gratitude to Prof. Abdel Belaïd for supervising me during this thesis and giving me precious hints concerning research. He was the person who initiated me in handwriting recognition and he also helped me during the hard moments.

I would like to thank everyone else in the Read Group, especially Hubert Cecotti, Yves Rangoni, Hatem Hamza and André Alusse. The laboratory has been an ideal environment, both socially and technically, in which to conduct research.

Special thanks must go to Dr. Christophe Choisy for all the help concerning the basic NSHP-HMM recognition system.

Special thanks must go to Prof. B. B. Chaudhury and Dr. Umapada Pal for all the help supplied during my stay in the CVPR Unit, Indian Statistical Institute, Kolkata, India.

I would also like to thank everyone in the Loria Research Center, especially Nadine Beurne for her indulgence and help concerning administration-related problems.

The Indian Post should be thanked for providing the Indian postal documents used to carry out our research work. The Service de Recherche Technique de la Poste (SRTP) is to be thanked for providing the bank check amount dataset.

The Indo-French Center for the Promotion of Advanced Research (IFCPAR) is to be thanked for providing the financial support necessary for me to carry out this work.


Contents

Chapter 1: Introduction

Chapter 2: Postal documents recognition
2.1 Postal documents recognition
2.1.1 History
2.1.2 Postal document preprocessing
2.1.3 Automatic address recognition systems
2.1.4 The particularities of the Indian postal documents
2.1.5 Conclusions
2.2 Handwritten word recognition
2.2.1 Introduction
2.2.2 Handwriting recognition systems
2.2.3 Lexicon reduction strategies in handwriting recognition
2.2.4 Conclusions
2.3 Handwritten digit recognition
2.3.1 Introduction
2.3.2 Neural network based classifiers for handwritten digit recognition
2.3.3 Stochastic approaches for separated handwritten digit recognition
2.3.4 Conclusions
2.4 Conclusion

Chapter 3: Limits of the baseline NSHP-HMM handwriting recognition model
3.1 The NSHP-HMM on digit and word recognition
3.1.1 General framework
3.1.3 Formal definition of the NSHP-HMM
3.1.4 Likelihood calculus for the NSHP-HMM
3.1.5 Training of the model
3.1.6 Decoding in the NSHP-HMM
3.1.7 Experiments and results
3.1.8 Conclusions
3.2 Analytical extension of the NSHP-HMM
3.2.1 General framework
3.2.2 Analytical approach
3.2.3 Formal definition of the models
3.2.4 Model fusion
3.2.5 Cross-learning concept
3.2.6 Word normalization by the NSHP-HMM
3.2.7 Experiments and results
3.2.8 Conclusions
3.3 General conclusions concerning the NSHP-HMM
3.4 Proposed approach

Chapter 4: High-level information implant in the baseline NSHP-HMM
4.1 Objectives
4.2 General description of the implant problem
4.3 Formal description of the implant
4.3.1 The NSHP-HMM formalism
4.3.2 The weighting mechanism
4.3.3 Local weight and global weight
4.3.4 The nature of the weight
4.3.5 The source of the weight
4.3.6 The weight calculus
4.3.7 The weight normalization
4.3.8 Model complexity
4.4 Experiments and results
4.4.1 Databases
4.4.2 Image preprocessing
4.4.3 Perceptual feature extraction
4.4.4 The structural NSHP-HMM parameters
4.4.8 Comparison study with the state of the art
4.5 General conclusions

Chapter 5: Time complexity reduction in the Viterbi decoding
5.1 Objectives
5.2 General description of the reduction process
5.3 Formal description of the reduction
5.3.1 The Viterbi algorithm
5.3.2 Threshold mechanism
5.3.3 Natural length estimation
5.4 Experiments and results
5.4.1 Results concerning the symmetry in the NSHP-HMM
5.4.2 The Viterbi pruning results
5.4.3 Natural length estimation results
5.5 Conclusions

Chapter 6: Neural and stochastic methods in handwritten digit recognition
6.1 Introduction
6.2 Proposed neural and stochastic strategies in digit recognition
6.2.1 The multi-layer perceptron: ReadNet
6.2.2 Conclusions
6.2.3 The NSHP-HMM in digit recognition
6.2.4 Experiments and results
6.2.5 Conclusions
6.3 Classifiers combination in a digit recognition framework
6.3.1 Combination rules
6.3.2 Experiments and results
6.3.3 Conclusions

Chapter 7: Conclusion
7.1 Summary of results
7.2 Contributions
7.3 Future work

Appendices

Appendix A: Databases description
A.1 Modified NIST database
A.2 Bangla digit and city name database
A.2.1 Statistics concerning the Bangla vocabulary
A.3 SRTP French bank check database

Appendix B: The Bengali script
B.1 Origins
B.2 Notable features
B.3 Used to write
B.4 The Bengali alphabet

List of Figures

2.1 The structure of the convolutional neural network used by Wolf and Platt
2.2 A challenging sample file. As can be observed, even after preprocessing there is still a large amount of background noise. While the left address candidate is almost correct, the ZIP code is truncated. The right candidate (shown in the lower right) gives the complete address.
2.3 The system flowchart considered by Blumenstein et al. for postal address recognition
2.4 Indian multi-script postal documents with the corresponding DAB (destination address block) identified
2.5 Representation of the first and second digit in an Indian pin-code
2.6 Indian postal codes distribution on the map of India
2.7 For Arabic handwriting each line of text is divided into frames and each frame is divided into cells [BSM99]
2.8 The PHMM proposed by Gilloux [Gil94]
2.9 The corresponding PHMM for the Arabic PAW [ABE98]
2.10 The NSHP-HMM considered by Choisy for bank check amounts recognition
2.11 The complex letter model considering the different graphemes proposed by El-Yacoubi et al. [YGSS99]
2.12 Generic word shape coded by segments in [CA04]
2.13 An overview of a basic handwriting recognition system as described by Koerich [KSS03]
2.14 Tree representation of English words coming from a dictionary
2.15 A multi-layer perceptron scheme with the corresponding weights
2.16 The sigmoid function
2.17 Architecture of LeNet1. Each plane represents a feature map, i.e. a set of units whose weights are constrained to be identical. Input images are sized to fit in a 16x16 pixel field, but enough blank pixels are added around the border of this field to avoid edge effects in the convolution calculations
2.18 An HMM modeled by a Dynamic Bayesian network, where (X_t)_{1≤t≤T} are the hidden states and (Y_t)_{1≤t≤T} are the observations
3.1 The column probabilities observed by the different HMM states in the system of Saon
3.2 Sets of pixels Θ(i,j), Σij related to site (i,j)
3.3 The neighborhood orders which can be used
3.4 The NSHP-HMM scheme by Saon [Sao97], where the states represent the states of the HMM mapping the different image columns
3.5 Word meta-models for the French word "francs" and the different abbreviations occurring in the bank checks
3.6 A left-to-right model and a model with specific states where the state duration in the final state is modified
3.7 The general word model creation process of the word "et" in [Cho02]
3.8 The cross-training mechanism for the letter "i" considering different word models in [Cho02]
3.9 The complete scheme for using the HMM in normalization and the NN in recognition
3.10 Normalization of the French word "et" by the corresponding NSHP-HMM. The normalization is based on the mean value of the columns observed by the same state of the model
4.1 The general system overview of the structural information implant in the NSHP-HMM
4.2 The NSHP-HMM model
4.3 Busy-zone finding for the Bangla word Dhanekhali using projection profiles
4.4 Busy-zone finding for the Bangla word Dhanekhali using water reservoir based features
4.5 (a) Original image and (b) normalized image of the word "four" in French
4.6 (a) Original image and (b) normalized image of the Bangla word Dhaniekhali
4.7 Ascender and descender extraction based on the middle zone of writing
4.8 The structural NSHP-HMM analyzing the word Darjiling considering the structural information extracted from the word shape
5.1 The considered symmetric aspects in the NSHP-HMM
5.2 General system overview for lexicon reduction
5.3 The NSHP-HMM with the different threshold values fixed at each letter limit. The letter limits are known as the general word NSHP-HMMs are built considering the word meta-models and the letter models
6.1 Samples of Bangla handwritten numerals
6.2 (a) English Nine and Bangla Seven, (b) English and Bangla Two
6.3 The 141 test patterns misclassified by ReadNet. Below each image is displayed the correct answer (left side) and the corresponding network answer (right side). These errors are mostly caused either by genuinely ambiguous patterns or by digits written in a style that is underrepresented in the training set
6.4 Confusion for the 16-class classifier (Bangla and English)
6.5 Confusion matrix for the 10-class Bangla classifier
6.6 Confusion matrix for the 10-class English classifier for digits coming from Indian postal documents
6.7 The samples distribution in the classes for the different constructed datasets
6.8 The NSHP-HMM scheme for separated handwritten digit recognition
A.1 Digit samples from the MNIST dataset
A.2 Some word city name samples for the Bangla city name dataset
A.3 The distribution of the word entries in the Bangla vocabulary based on the number of letters in the words
B.1 Bengali vowels and vowel diacritics
B.2 Bengali consonants
B.3 A selection of conjunct consonants in Bengali
B.4 Bengali numerals


Introduction

Over the years, many approaches have been proposed to build the core of recognition systems that satisfy the needs raised by different real-life applications such as automatic reading of postal documents, bank check reading, form processing, printed document recognition, etc. Despite the impressive progress achieved during the last few decades in this field, the performance of handwriting recognition systems is still far from human performance. Most of these systems, while offering a large spectrum of perspectives on the problem, share the same difficulties.

The automatic reading of handwritten addresses is quite a dynamic research field in which several research teams all over the world are interested. Such reading systems are typically composed of several processing stages: image acquisition and image pre-processing, followed by address block location, segmentation into lines and words, location of the ZIP code and city name, recognition of the ZIP code and city name, and finally the fusion of the results to produce a final decision. Each of these processes hides quite serious challenges, which guided us to investigate such a mail processing system.

During this thesis, we have tried to address a restricted part of these handwriting recognition issues through the design and implementation of a system for cursive Bengali handwritten address recognition. As this research work is positioned in the field of postal document recognition, different types of questions have been raised, such as:

– What kind of word recognizer should be considered? Should we segment or not in a script environment (Bengali) which has never been segmented before?

– How can the graphical richness of this largely unknown script which Bengali is be exploited? How can this extra information be integrated into the recognizer?

– Is it possible to extend the word recognizer to handle larger vocabularies? If so, how can we do this?

– What kind of digit recognizer should be considered for this purpose? What kind of learning strategy should be considered?

– What kind of improvements can be proposed to improve the digit recognition scores?

Considering all these issues, we have concentrated our efforts on applying and extending an existing word recognizer, the so-called NSHP-HMM (Non-Symmetric Half-Plane Hidden Markov Model), which is a totally 2D model able to recognize words without any kind of physical segmentation. This choice was motivated by the fact that there was no existing solution to segment handwritten Bengali words into letters or graphemes. To exploit the specific graphical shape properties of this ancient script, we propose a combination of low-level information with high-level information, considering them as one entity instead of using them separately as other models do. This natural combination follows human reading habits, where the whole word is considered and there is no physical separation between the different types of information.

In order to extend the existing model to larger vocabularies, we propose an appropriate stopping mechanism in the decomposition process: since there is no physical segmentation, there are no well-defined boundaries between the letter components.

Equally, we were interested in recognizing pin codes coming from these postal documents. On this topic we have focused our attention on neural network solutions and we have proposed a new learning mechanism. Meanwhile, we have conducted some research on classifier combination in order to reach higher accuracy and robustness as well.

However, the main challenge of this work was to apply these solutions to Indian postal documents written in Bengali. While the Latin script, which is familiar to us Europeans, contains just a restricted number of letters and digits, the Bengali script (the second most popular script in India and the 5th most popular script in the world) is much richer in the number of symbols as well as more complex in its graphical shapes. Our target was to apply and successfully adapt the existing word and digit recognizers by exploiting the specificities of this Indian script.

The proposed HWR (Handwriting Recognition) system, which is the outcome of a strong collaboration¹ between Indian and French scientists, has been used with success on different recognition tasks. The main application area is the recognition of Bengali city names and pin codes coming from Indian postal documents. Several experiments carried out on Latin and Bangla scripts also allow us to assess the accuracy and the robustness of the model. A success can also be observed for separated handwritten digits, where the system also gives promising results.

The recognition of handwritten Bangla city names is a pioneering work as, to the best of our knowledge, there is no existing research in this field.

The thesis structure can be described as follows:

¹ The international project 2702-1 "HANDWRITING RECOGNITION FOR POSTAL AUTOMATION" has been hosted by IFCPAR (Indo-French Centre for the Promotion of Advanced Research) and has been deployed by CVPR (Computer Vision and Pattern Recognition), Calcutta, India and READ (REcognition of writing and Document Analysis), Nancy, France.

– Chapter 2 reviews the postal document recognition problems encountered by researchers during the last thirty years. We will review some word recognition strategies and digit recognition strategies as well. In order to get a clear idea about the current strategies applied to reduce the vocabulary in the handwriting recognition paradigm, a section addresses this issue.

– In Chapter 3, a detailed formal description is given concerning the NSHP-HMM system and its different applications in handwritten word and digit recognition, highlighting the system's advantages and the drawbacks derived from the model's nature.

– Chapter 4 contains a personal contribution concerning the implant of high-level information in the NSHP-HMM HWR system to create a more reliable and robust system. A detailed description is given concerning the extraction of the features, the combination of the low-level and high-level features in the framework of the NSHP-HMM, and the different normalization techniques proposed by us. The evaluation of this new technique is performed on the handwritten Bangla city name dataset and the SRTP dataset, which is a handwritten French bank check amount collection.

– Chapter 5 contains the contribution concerning a new pruning methodology in the Viterbi decoding process. We describe the theoretical aspects of the threshold mechanism used by this strategy, followed by an evaluation of the technique on handwritten Bangla city names.

– Chapter 6 presents a comparison study between the stochastic and neural models used for separated handwritten digits. This chapter shows our achievements on digit recognition using HMM based techniques and neural network based approaches, respectively. In order to highlight the strengths and weaknesses of each type of method, we propose some combination schemes to exploit the complementarity of these classifiers. The evaluation of the methods is performed on different handwritten digit datasets.

– The final Chapter 7 is dedicated to the conclusions concerning our contribution to the field of handwriting recognition, pointing out new directions to explore based on the work proposed by us.


Postal documents recognition

Throughout this chapter we review the difficulties raised by automatic postal document recognition, with specific attention to Indian postal documents, which are the main concern of this thesis. The main objective is not to give a full description of the field but to highlight the specificities of this task. Similarly, we review handwritten word recognition and handwritten digit recognition, but mainly some specific domains are considered in order to allow a direct comparison with the solutions proposed in this research work.

2.1 Postal documents recognition

Automatic sorting of handwritten mail pieces is a very challenging task. The main problems in handwritten address recognition are parsing and recognizing a set of correlated entities such as ZIP codes, street names and building numbers, in the presence of incomplete information. It is a computer vision problem which has stringent performance requirements in commercial applications.

The task of accurately recognizing and interpreting a handwritten address is complicated by the variability and the complexity of the address, word shape distortion due to non-linear shifting, unpredictable writing styles, and failure to locate the actual address in the database due to severe postcode recognition errors and intrinsic deficiencies in the address database.

2.1.1 History

As a result of extensive socioeconomic activity in recent years, Japan's information traffic has been increasing rapidly. As stated by Wada in [Wad33], the total volume of mail in Japan has been rising steadily, increasing by 7% ever since 1975. Considering the fact that the amount of mail per capita in Japan is 1/2 of that in the US, they estimate with some confidence that the volume of mail will become twice that of the early 90s. For that purpose, postal mechanization was considered a high priority issue by the Postal Bureau of the Ministry of Posts and Telecommunications and has a history of more than 4 decades.

In order to allow such an automatic sorting strategy, the standardization of envelopes as well as the introduction of postal codes was necessary. In 1962, in order to ensure postal item harmonization, the JIS standard [Tok93] was initially instituted, with eight such standards being formulated as recommended envelope standards by the Ministry. Meanwhile, the introduction of the postal code system was a longer process and was accepted by the public only in 1975.

The first automatic postal mail sorting system was installed at the Tokyo Central Post Office, and it was the world's first machine that could read 3-digit numerals within red postal code frames through OCR equipment. Similarly, in 1968 the first culling, facing and canceling machine (CFC) started working at the Shinjuku Post Office. In 1971 they made it possible to interconnect the OCR sorters with the CFC, establishing one of the first entirely automatic postal document sorting systems.

The same development started in France due to the growing demand for such automatic mail sorting. The Technical Research Department of La Poste (SRTP), established in 1984, was responsible for many research projects in the field, but the mainstream was to adapt the existing systems rather than innovate [Bur93].

In order to follow up on modern solutions, a total rethink was necessary, as stated by Burbaud. An outcome of this strategy is the Rennes-Cesson parcel center, a sorting unit experimentally using self-guided vehicles which serve the container unloading platform. The same direction has been followed by projects focused on address recognition on small envelopes, where a former segmentation method was considered and, for digit recognition, a neural network was found to be the ideal solution. An extended version of the former project, stated by Gilloux in [Gil93], was the address recognition of flat mail, where the address is often surrounded by other information like advertisements, sender identification, magazines, etc. For that reason, a preliminary address location process should precede the recognition. As La Poste offers banking solutions for its clients, a project was also oriented toward a bank check reading system [LLGL97], which carries many advantages: the size of the vocabulary is reduced (the shape profiles of the courtesy amounts in French are quite discriminant) and the locations of the different contents are restricted.

The United States Postal Service (USPS) has also invested significantly in automated processing of mail pieces to speed up the sorting as well as to reduce the labor cost. The letter mail automation program of the United States hosted by USPS utilizes recognition software developed by numerous vendors [SLGS02]. Using such a strategy, the cost of processing per 1000 mail pieces drops from 47.78 USD for manual processing to 27.46 USD for mechanized processing and 5.30 USD for automated processing. The savings multiply rapidly, as about 400 million pieces of letter mail per day are considered for sorting by the USPS. Setlur and his colleagues [SLGS02] describe the different databases and standards imposed by the USPS to resolve the address issue on the mail pieces. The standards are fairly well adhered to, especially in the machine printed mail streams. Handwritten mail shows greater deviations from the standards.

2.1.2 Postal document preprocessing

When mixed mail enters a postal facility, it must first be faced and oriented, so that the address is readable by the mail processors used. Existing USPS systems face and orient the domestic mail pieces based on the fluorescing indicia on each mail piece. However, as stated by the authors in [NCA+03], stamps and foreign-originated mail pieces do not fluoresce, so the processing systems cannot sort foreign mail. For this purpose, they propose a system which analyzes both faces of the mail piece and then faces and orients it in the right order. After a preliminary binarization process, the address is located without considering the position of the address candidate. From these candidates, based on a priori knowledge (stamp position, presence of postal delimiters, bar codes, etc.), the best one is selected. Even though the results are promising, 7.1% of errors were made during the facing and orientation, and the system is still dependent on the a priori knowledge given by the human operator. However, as stated by the authors, the beta test of their system will save around 500,000 hours annually. The system has been deployed by USPS in November 2002.

Once the facing and orientation is performed, the location of the address block should be considered. Different attempts have been made, based mainly on the structural composition of the address. El Yacoubi et al. propose quite an interesting solution [YBG95]. Instead of using the classical way, they use word spotting for that purpose. Using this technique they not only locate but also recognize the word they are looking for within the address structure. Their model is based on HMMs. For each digit they design a letter HMM, and the word HMMs are made of a simple concatenation of the corresponding letter HMMs. As features they consider the presence of: upper strokes, lower strokes and closed loops. The system has been tested on a reduced-size database (122 images) containing 350 street names. The achieved 92.1% is quite impressive, and the authors concluded that the remaining errors come from pieces where some segmentation errors occurred or where the image was quite noisy.
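The letter-to-word concatenation idea recurs in several of the HMM-based systems discussed in this chapter. The following minimal sketch (our own illustration with arbitrary state counts and transition values, not the implementation of [YBG95] or any other cited system) builds the transition matrix of a word HMM by chaining left-to-right letter HMMs:

```python
import numpy as np

def letter_hmm(n_states, p_stay=0.6):
    """Left-to-right letter HMM: each state loops on itself or moves to the
    next state. Returns the transition matrix and the probability mass left
    on the last state, used to enter the next letter model."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states):
        A[s, s] = p_stay
        if s + 1 < n_states:
            A[s, s + 1] = 1.0 - p_stay
    return A, 1.0 - p_stay

def word_hmm(letter_sizes):
    """Concatenate letter HMMs into one word HMM: the last state of letter k
    links to the first state of letter k+1. In a full system the leftover
    mass of the final state would go to a terminal (end) state."""
    total = sum(letter_sizes)
    A = np.zeros((total, total))
    offset = 0
    for k, n in enumerate(letter_sizes):
        sub, exit_prob = letter_hmm(n)
        A[offset:offset + n, offset:offset + n] = sub
        if k + 1 < len(letter_sizes):
            A[offset + n - 1, offset + n] = exit_prob  # bridge to next letter
        offset += n
    return A

# Example: a 3-letter word whose letter models have 4, 5 and 4 states.
print(word_hmm([4, 5, 4]).shape)  # (13, 13)
```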

Lii and Srihari [LS95] considered a similar challenge for fax cover pages. They tried to locate the name and the address on this special type of document. The system attempts to locate and recognize words which are data field indicators, so as to figure out the position of such keywords as "TO" and "COMPANY". These keywords are used as references in block segmentation to segment out their associated data regions. First, they build a spatial map grid where all the spatial relationships between the text objects are stored. Locational information concerning the individual connected components can directly be retrieved from this map. The connected components serve for labeling purposes. Instead of word recognition, they concentrate more on character recognition, using a two-layer backpropagation neural network with chain code features. The word recognition for the reduced vocabulary (To, Attention, From, Company, Message, Pages) is based on matching at the letter level. Results are given for both machine printed and handwritten fax cover pages. While for printed documents 100% correct location performance was achieved, for handwritten ones the result was only 80%. The results are quite impressive, but the size of the dataset (12 machine printed documents versus 115 handwritten) cannot really show the quality of the technique.

For the address block location, a contour clustering algorithm has been proposed by Govindaraju and Tulyakov [GT03], meeting the criterion of being invariant to the document style (printed or handwritten). Their strategy is to extract connected component contours, extract contour features and cluster these features in the feature space. Once these points are clustered using heuristics, the cluster corresponding to the address block is detected. Finally, the other clusters close to the designated one are discarded. The algorithm was developed for a parcel image set which contains images with well separated address blocks as well as non-separated or incorrectly separated address blocks. For that purpose they considered the HWAI system described in detail in [Sri00]. While without this address location algorithm the system considered 240 images as finalized, when the algorithm was deployed this number of finalized documents increased to 272, which is quite a success. The advantage of the system is its invariance, which can be exploited in such a document environment as postal documents.

Another kind of strategy is considered by the authors of [WP94] to locate the address block. They use the same strategy as LeCun used for handwritten digits [LBBH01]. They also consider this issue a challenging one, and they propose a convolutional neural network with four outputs to find the different corners of the address block.

Figure 2.1 – The structure of the convolutional neural network used by Wolf and Platt

The system has been tested on 500 test images. One challenging test image and the corresponding output are presented in Fig. 2.2. The 98.2% score is a very good one, and we also consider such a strategy as being useful for this address block location issue.

Figure 2.2 – A challenging sample file. As can be observed, even after preprocessing there is still a large amount of background noise. While the left address candidate is almost correct, the ZIP code is truncated. The right candidate (shown in the lower right) gives the complete address.

2.1.3 Automatic address recognition systems

The goal of this section is to give a brief idea about the current systems and their architecture. Unfortunately, there are just a few research papers describing a whole system from image acquisition to final recognition. Mostly, research groups focus on particular problems within this processing flow, like: image pre-processing, address location, line and word segmentation, digit recognition and word recognition. We have decided to follow this structure as well, presenting first just a few systems, after which we develop in more detail the word recognition part as well as the digit recognition part, as these are also the main concerns for our postal automation system.


Blumenstein and Verma [BV97] proposed the implementation of a recognition system for printed and handwritten postal addresses based on Artificial Neural Networks (ANN). They were interested in comparing different types of networks to analyze the recognition performance as well as the accuracy.

Figure 2.3 – The system flowchart considered by Blumenstein et al. for postal address recognition

They performed all the steps necessary before sending the image for recognition. The acquisition was made by a scanner and a simple binarization technique was implemented. The segmentation is also based on connected components, paying attention to handwritten words, where the cutting path is defined by a sparse pixel density. In order to allow the recognition of characters by the ANN, a size-regulated normalization was performed. The recognition is performed by a multi-layer ANN trained with the backpropagation algorithm. The flowchart of the system is depicted in Fig. 2.3.

While the results for printed characters are good (84.72% accuracy), the same cannot be said for the handwritten characters, where just 58.59% was reached. Similarly, for the RBF-type ANN the results are even worse. However, the result for the whole address recognition scheme is quite promising. For printed addresses the system varies between 83.33% and 97.62% accuracy, while, as expected, for handwritten addresses it cannot reach more than 68.75% accuracy. The low result can be explained by the quality of the images as well as the reduced size of the datasets. Unfortunately, no comparison can be made due to the variations in the database.

Another complete system is proposed in [MSM98]. The authors present a system based on four modules: over-segmenter, dynamic zip locator, zip candidates generator and city-state verifier.

Instead of using a linear system architecture as we have seen in [BV97], the authors use a more complicated structure with a possible return to the segmenter if no zip code is found correctly. First, the address lines are separated using projection, allowing a skew angle of -10 to +10 degrees. The over-segmenter is responsible for finding a set of split points for a word or text line image. They applied heuristics for this purpose, like the location of a set of split points on the upper or lower contour of each connected component, looking for sharp or smooth valleys, horizontal and vertical overlaps of the graphemes, etc. The zip code locator is powered by an ANN in order to generate posterior probabilities in the matching. As they cannot know the size of the ZIP code, they look for both 9-digit and 5-digit ones. Once this segmentation is successful, an HMM is used to generate the candidate list. The output of the HMM is an ordered list of valid zip codes. These zip codes are coupled with the corresponding city names available in the database. The flexible matcher is used for matching the list of graphemes against every entry in the lexicon. For the selection, a criterion is defined: when a sequence of graphemes is matched to a string entry, it is not necessary that each character of the string is ranked first among all the possible character classes for the corresponding segment. Based on a match ranking, the final decision is taken. The overall system achieves an accuracy rate of 83.5% with 3.6% error for 5-digit encoding on 805 cursive addresses. Similarly, as in the previous case, due to the specificity of the dataset, no direct comparison can be made.
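The projection-based line separation mentioned above can be illustrated with a minimal, generic sketch (skew handling is omitted; the `min_gap` threshold is an assumption of ours, not a parameter taken from [MSM98]):

```python
import numpy as np

def split_lines(binary_img, min_gap=2):
    """binary_img: 2D array with 1 = ink. Rows with no ink for at least
    `min_gap` consecutive rows are treated as separators between text lines.
    Returns a list of (top_row, bottom_row) intervals."""
    profile = binary_img.sum(axis=1)   # horizontal projection profile
    inked = profile > 0
    lines, start, gap = [], None, 0
    for r, has_ink in enumerate(inked):
        if has_ink:
            if start is None:
                start = r
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:         # blank band long enough: close the line
                lines.append((start, r - gap + 1))
                start, gap = None, 0
    if start is not None:
        lines.append((start, len(inked)))
    return lines
```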

The real-time system proposed by Kim et al. [KG95] is really usable for a small-size lexicon. For preprocessing they consider chain code extraction, coded in an array. Each data node in the structure represents one of the eight grid nodes that surround the previous data node. They also consider slant correction, noise removal, smoothing and normalization. After this process, they segment the characters into graphemes based on the following assumptions: the number of segments per character must be at most 4, and all touching characters should be separated. As features they consider 74 chain code based features. The recognition is based on dynamic matching, comparing several possible combinations of segments against reference feature vectors of code words.
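Since chain codes recur in several of the systems reviewed here, a minimal sketch of the Freeman 8-direction chain code may help; this is a generic formulation and does not reproduce the 74 features of [KG95]:

```python
# Freeman 8-direction codes: index = code, value = (d_row, d_col).
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def freeman_chain_code(contour):
    """contour: ordered list of (row, col) boundary points, with consecutive
    points 8-connected. Returns the sequence of direction codes."""
    lookup = {d: code for code, d in enumerate(DIRECTIONS)}
    return [lookup[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(contour[:-1], contour[1:])]

print(freeman_chain_code([(5, 5), (5, 6), (4, 7), (4, 8)]))  # [0, 1, 0]
```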

The results are reported on 3,000 images including firm names, street names, personal names and state names. Using all 74 features they achieved 96.23% (10-word lexicon), 87.40% (100-word lexicon), and for a large vocabulary the results decrease to 72.30% (1000-word lexicon). Using a subset of these features, lower results were achieved. First of all, this is a complete recognition system giving high results for reduced vocabularies, but at the same time it is quite fast (100-200 msec) because of the chain code representation, so it can meet the requirements of a real-time application.

In [Sch78] the author describes the design principles of a multi-font word recognition system developed for reading German postal documents. He also describes the whole workflow, but he concentrates more on the separated character recognition and contextual postprocessing than on the image preprocessing. The main idea is to feed each separated character into an SCR (Single Character Recognizer). These SCRs serve different purposes; they decide whether the analyzed character is a capital letter, a small letter, or belongs to the numerical dataset. Each recognizer outputs a rank-ordered list, and these outputs serve for the final decision by matching them against a hash-coded table look-up. Even though the author gives a very detailed description of the method, it cannot be considered a complete reading system, and there is a lack of precision, as no results are given about the accuracy of the SCR, nor about the table look-up strategy.

2.1.4 The particularities of the Indian postal documents

As the main subject of this thesis is the recognition of Indian postal documents, we would like to show the specificity of such documents.

India has a multi-lingual and multi-script character. In India there are about 19 official languages, and an Indian postal document can be written in any of these official languages. Moreover, some people write the destination address part of a postal document in two or more language scripts. For example, in Fig. 2.4 the destination address is written partly in Bengali script and partly in English. Bengali is the second most popular language in India after Hindi and the fifth most popular language in the world.

Figure 2.4 – Indian multi-script postal documents with the corresponding DAB (destination address block) identified

The Indian postal code is a six-digit number. Based on this six-digit pin-code we cannot locate a particular post office in a village; we can locate a post office of a town or sub-town by this six-digit pin-code. The representation of the pin-code digits is shown in Figure 2.5. Fig. 2.6 is the spatial representation of pin-codes all over India.

In India there is a wide variation in the types of postal documents. Some of these are post-cards, inland letters, special envelopes, etc. Post-cards, inland letters and special envelopes are sold in Indian post offices, and there is a pin-code box of six digits in which to write the pin number on the postal document. Also, because of varying educational backgrounds, there is a wide variation in writing style and medium. For example, Kol-32 is written instead of Kolkata-700032. Also, sometimes people do not mention the pin-code on the Indian postal document. Thus, the development of an Indian postal address recognition system is a challenging issue [RVP+].

2.1.5 Conclusions

Considering the postal address recognition subject, we can conclude several things. Due to the growing traffic of postal documents all over the world, the design and development of such systems is a top priority for the different postal services. During the development of such systems we encounter different challenging issues like: finding the right orientation of the document, performing noise cleaning, locating the destination address block, segmenting the DAB into lines and words, spotting the city/town name and the pin-code, and finally the recognition.

All these issues have been addressed in different scientific papers, but in the whole literature one can find just a few works where complete systems are described with the corresponding details. The different research groups focus on specific problems like segmentation, DAB location, word recognition or digit recognition. Due to the vast number of different postal datasets used for test purposes, it is quite impossible to compare the results of the different systems.

Finally, as this thesis focuses on Indian postal document recognition, we should note the difficulty of this task. As described, the Indian documents are much more complex than other documents due to the multi-script environment which India has. The addresses are often written in different scripts, the quality of the medium varies, and so the image acquisition quality is often also low. At the same time, another challenge is that nobody has worked on handwritten Bengali word recognition.

In the next few sections we will discuss in detail the word recognition achievements as well as the different attempts at handwritten digit recognition, focusing on issues like feature extraction and feature combination, in order to give a global view of the existing systems and models. For digit recognition we focus more on neural network strategies and classifiers. We discuss these issues from this point of view because we would like to show what kind of extensions we propose in this research work.

2.2 Handwritten word recognition

2.2.1 Introduction

The recognition of handwritten words by computers is a challenging task. Despite the impressive progress achieved during the last few decades and the increasing power of computers, the performance of handwriting recognition systems is still far from human performance. Words are fairly complex patterns owing to the great variability in handwriting style; handwritten word recognition is a difficult matter.

The first difficulties are due to the high variability and uncertainty of human writing, not only because of the great variety in the shape of characters but also because of the overlapping and the interconnection of the neighboring characters. In handwriting we may observe either isolated letters such as hand-printed characters, groups of connected letters, i.e. sub-words, or entirely connected words.

Furthermore, when observed in isolation, characters are often ambiguous and require context to minimize the classification error. The most natural unit of handwriting is the word, and it has been used by many HWR systems. One of the main advantages of using whole-word models is that they are capable of capturing within-word co-articulations. When such whole-word models are adequately trained they will usually yield the best recognition performance. Global or holistic approaches treat words as single indivisible entities and attempt to recognize them as a whole, bypassing the segmentation issue. Therefore, for small-vocabulary recognition such as bank check reading applications, where the lexicon does not have more than 30-40 entries, whole-word models are the preferred choice.

While words are suitable baseline units for recognition, they are not a practical choice for large-vocabulary handwriting recognition. Since each word has to be processed individually and data cannot be shared between word models, this implies prohibitively large amounts of training data. Instead of using whole-word models, analytical approaches use sub-word units such as characters or pseudo-characters, also called graphemes, requiring the segmentation of words into these units.

The second type of difficulty lies in the segmentation of handwritten words into characters. While in the case of hand-printed characters the segmentation is not so difficult, as the characters are more or less written separately, for cursive words this task becomes very difficult.

Even with this difficulty and the errors introduced by the segmentation, the most successful approaches are segmentation-based recognition, in which words are first segmented into characters, or parts of them, and after that dynamic programming techniques driven by a lexicon are used to find the best word hypothesis.

2.2.2 Handwriting recognition systems

The different handwriting recognition systems can be classified according to different criteria, like:

– the nature of the features used by the different systems
– the size of the lexicon considered by the system
– whether the analyzed shape is considered as an entity or not (analytical vs. holistic)
– whether the analyzed script is printed or handwritten
– the nature of the recognizer

The classification criterion adopted in this thesis is based on the nature of the input: we can consider systems where low-level features are used, others where the features carry a semantic aspect transmitted by human vision, and, more recently, systems that combine the discriminative power of these low-level and perceptual features. A special section will be dedicated to the 2D, bi-dimensional models, as our extended model is also based on a 2D architecture. Considering such a classification will also allow us to describe the different systems with respect to the lexicon and the writing dimensionality. The writing dimension is defined as the dimension of the writing, which can be considered as a pattern realized on a 2D plane. This classification is considered important as the improvements proposed in this thesis are also based on such criteria.

Low-level features based handwriting recognition systems

Over the last several years, machine learning techniques, particularly when applied to neural networks, have played an increasingly important role in the design of different pattern recognition systems.

As stated in [LBBH01], better recognition systems can be built by relying more on automatic learning and less on hand-designed heuristics. In their case study of separated handwritten digits, the authors show that hand-crafted feature extraction can be replaced by carefully designed learning machines (classifiers) that operate at the pixel level.

In a classical pattern recognition system, a feature extractor gathers the relevant information from the input pattern and then a trainable classifier categorizes the resulting feature vectors into classes. The new idea was to rely as much as possible on learning the feature extraction itself. In such a case the classifier can be fed with almost raw images, and the feature extraction process is embedded in the system, which can extract the different features and at the same time is able to learn them.

In [KFK02] the authors use a basic classifier based on the Euclidean distance for unconstrained handwriting, but the feature extraction is extensive. After preprocessing containing skew correction and slant removal, a script identification is performed. The line segmentation into words is based on horizontal projection. The word segmentation into characters is based on a technique which automatically extracts the required knowledge in the form of IF-THEN rules. Once the segmentation is finished, a 280-dimensional feature vector is extracted for each character. After a size normalization to 32 × 32, the horizontal and vertical profiles are extracted from each shape, containing the number of black pixels counted during the sweep.

They also define some new features using a radial histogram: the number of black pixels lying on a ray that starts from the center of the character matrix and ends at its edge. The radial histogram is calculated by rotating the ray in steps of 5 degrees. Additionally, an out-in radial profile is defined as the position of the first black pixel on the ray, looking from the center of the character to the periphery.
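To make these feature families concrete, here is an illustrative re-creation of projection profiles and a radial histogram for a size-normalized binary character; this is not the [KFK02] code, and its dimensionality differs from their 280-dimensional vector:

```python
import numpy as np

def profile_and_radial_features(char_img, step_deg=5):
    """char_img: size-normalized binary character image, e.g. 32 x 32.
    Returns horizontal/vertical projection profiles and a radial histogram
    of black pixels along rays cast from the center every step_deg degrees."""
    img = (char_img > 0).astype(int)
    h_profile = img.sum(axis=1)            # black pixels per row
    v_profile = img.sum(axis=0)            # black pixels per column
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = int(min(cy, cx))
    radial = []
    for deg in range(0, 360, step_deg):
        a = np.deg2rad(deg)
        count = 0
        for r in range(r_max + 1):         # walk outward along the ray
            y = int(round(cy + r * np.sin(a)))
            x = int(round(cx + r * np.cos(a)))
            count += img[y, x]
        radial.append(count)
    return np.concatenate([h_profile, v_profile, np.array(radial)])
```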


Elms et al. proposed a comparison study [EPI98] between a commercial OCR and an HMM approach for faxed word recognition. In such documents, transmission distortions can be observed. The results achieved by the HMM approach are greater than the OCR's results. Here the researchers concentrated on the problem of isolated character recognition, assuming that words are easy to segment into characters prior to character recognition; the difference between images from books and faxes is that facsimile images commonly have the characters blurred together, making them very difficult to segment. For the OCR, OmniPage was used, while for the HMM the characters were viewed as a sequence of columns. For each pixel column, the shape aspect (the arrangement of pixel values within the line) and the location (the position of the line with respect to preceding lines in the sequence) were considered.

The HMM considered is a classical Bakis chain, where the number of states was set based on the average length of the observations to be modeled. For training purposes the classical Baum-Welch formula was used. While the reported results for the OCR are much better for clean documents, the superiority of the lexicon-driven HMM model is shown for noisy faxed inputs.

In order to solve the problems raised by the different affine transformations, the authors of [SLD94] define a new distance measure which can be made locally invariant to any set of transformations of the input and can be computed efficiently. The metric, the so-called tangent distance, is based on the iteration of a Newton-type algorithm which finds the points of minimum distance on the true transformation manifolds. The test results show that the algorithm can handle rotations in the range of (−15°, 15°). It is mentioned that spaces other than pixel space should give better results.
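To make the idea more concrete, a commonly quoted formulation of the two-sided tangent distance is given below; this is a sketch following the general definition of the method, not necessarily the exact notation used in [SLD94]:

```latex
% Two-sided tangent distance between patterns x and y.
% x + L_x a approximates the transformation manifold of x near x, where the
% columns of L_x are tangent vectors (e.g. rotation, translation, scaling
% directions) and a is a small parameter vector; likewise for y and L_y b.
D(x, y) \;=\; \min_{a,\,b}\, \bigl\lVert (x + L_x a) - (y + L_y b) \bigr\rVert^{2}
```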

The method presented in [CK00] has been used for handwritten Arabic word recognition with a reduced-size vocabulary. The approach does not require segmentation into characters, and it is applied to a script where ligatures, overlaps and style variations pose challenges to segmentation-based methods. While other methods extract high-level perceptual features, in this method there is no need for such an extraction process.

The authors propose to transform each word into polar coordinates, then apply a two-dimensional Fourier transform to the polar map. The resulting spectrum tolerates variations like size, rotation and displacement, which often occur in handwriting. For this purpose just half of the Fourier spectrum was used, and only the lower frequencies were selected. As classifier, the simple Euclidean metric was used. The word templates were built using an average of the coefficient values.

The obtained results (93% accuracy) for both printed and handwritten words are encouraging. The extension of the model to a large vocabulary becomes difficult due to the resemblance of the shapes.
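The following sketch illustrates the general idea of a polar map followed by a 2D Fourier transform. It is an assumed, simplified reading of this family of descriptors; the sampling resolution and the number of retained low frequencies are arbitrary choices here:

```python
import numpy as np

def polar_fourier_descriptor(img, n_r=32, n_theta=64, keep=(8, 16)):
    """img: 2D binary/grayscale word image. Samples the image on a
    (radius, angle) grid and returns a block of low-frequency Fourier
    magnitudes of the polar map (rotation becomes a cyclic shift along the
    angle axis, so the magnitude spectrum is rotation tolerant)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    radii = np.linspace(0, r_max, n_r)
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    polar = img[ys, xs]                       # nearest-neighbor polar map
    spectrum = np.abs(np.fft.fft2(polar))     # 2D Fourier magnitude
    return spectrum[:keep[0], :keep[1]].ravel()
```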


To absorb the rotation of handwritten characters, the authors in [CCB04] propose a dynamic network topology. To preserve as much of the available information as possible, the raw image is considered as input.

The interest is to handle the network architecture dynamically by taking into account the rotation variation of the analyzed shape. In that sense, the rotation problem in 2D is transformed into a 1D problem, which is easier. The reported results are for a reduced vocabulary of about 30 characters, where some classes are grouped based on similar shape considerations.

The comparison study has shown the superiority of this method over others like the Fourier transform or the Fourier-Mellin transform, but a clear superiority can be achieved only if the character is deslanted.

For omni-font English and Arabic open-vocabulary recognition, a complete OCR system is described in [BSM99]. The system is script-independent, the feature extraction techniques, based mainly on low-level information, are also script-independent, and for modeling and recognition purposes a segmentation-free technique has been used. The analyzed shape is divided into overlapping frames, whose number is a system parameter, and each frame is decomposed into 20 cells as presented in Fig. 2.7.

Figure 2.7 – For Arabic handwriting each line of text is divided into frames and each frame is divided into cells [BSM99].

As features, some low-level features have been used, like: intensity (percentage of black pixels within each cell) as a function of vertical position, vertical derivative of intensity, horizontal derivative of intensity, local slope, and correlation across a window of two cells, in order to avoid extracting script-dependent features. The result is a set of 80 simple features per frame. For letter models a 14-state left-to-right HMM is used. The achieved recognition scores are excellent, but the data is clean data without much variation.

For writer-dependent vocabularies containing 150 words, Bunke et al. [BRST95] propose a Hidden Markov Model based technique. The input vector is composed of shape descriptors extracted from the skeleton graph. These features are somehow at an intermediate level of abstraction, providing a good compromise between discriminatory power and extraction reliability and reproducibility. The disadvantage of the system is the assumption of a cooperative writer who is willing to adapt his or her personal writing style so that the recognition performance of the system is improved. The ISADORA system used for this purpose allows a highly flexible HMM-based pattern recognition architecture to build structural models from simple constituents. The number of states for each letter HMM is fixed as a function of heuristics based on the number of minimal edges for the given letter in the skeleton graph. The word HMM is a concatenation of the letter HMMs, so all the words in the vocabulary share the same letters, which allows much larger training data to be used.

We can conclude that a correct recognition rate of over 98% can be considered quite a satisfactory result, but the data is of quite good quality without much variability. Hence an exhaustive comparison with the other techniques using noisy data is not possible.

For a postal OCR system, Kornai proposes an experimental HMM approach [Kor97], also based on low-level features extracted from a word shape height-normalized to 64 pixels, using a sliding window technique and feature extraction by pre-segmentation. The features coming from the window frames are based on pixel density, upper/lower contour, etc. While for the sliding window method a 12-16 dimensional feature vector is proposed, the segmentation is based mainly on valleys (local minima) in the contour. To increase the perplexity of the system, some language models based on the vocabulary have been developed and implanted in the system. The results obtained for handwritten zip codes (84.5%) coming from the CEDAR dataset are quite good, but the second experiment concerning city/state name recognition (63.6%) has shown that such a method is not tuned for such a task.

Using a discrete HMM as word recognizer, the authors in [GB03a] propose an interesting feature selection technique based on two empirical observations: 1) two HMM classifiers using different feature sets but the same HMM topology often have similar (or identical) paths for the correct class; 2) the HMM with the highest score given one feature set is also very often among the HMMs with very high scores using another feature set.

After a slant and skew correction and a normalization procedure, a sliding window is moved from left to right over the word. The extracted features are: the proportion of black pixels in the window, the center of gravity, and the second-order moments. These features characterize the word from a global point of view. Other features are also considered, like the position of the uppermost and the lowermost pixel, the number of black-white transitions in the window, and the fraction of black pixels between the uppermost and the lowermost black pixels. As lower and upper case are considered, an HMM is built for each letter. The character HMMs are concatenated into word models, so this approach allows training data to be shared across different words. The result of 77.2% can be considered a good score if we consider the fact that 2,296 word classes have been used for the different experiments.
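For illustration, a minimal sketch of such sliding-window features is shown below; the window width and normalizations are assumptions of ours, and this is not the [GB03a] implementation:

```python
import numpy as np

def window_features(binary_word, win=4):
    """binary_word: 2D array with 1 = ink. For each window: ink proportion,
    vertical center of gravity, second-order vertical moment, upper/lower-most
    ink rows and average black-white transitions per column."""
    img = (binary_word > 0).astype(int)
    h, w = img.shape
    feats = []
    for x0 in range(0, w - win + 1, win):
        patch = img[:, x0:x0 + win]
        ys, _ = np.nonzero(patch)
        density = patch.mean()
        if ys.size:
            cog = ys.mean() / h
            moment2 = ((ys - ys.mean()) ** 2).mean() / h ** 2
            upper, lower = ys.min() / h, ys.max() / h
        else:
            cog = moment2 = upper = lower = 0.0
        transitions = np.abs(np.diff(patch, axis=0)).sum() / win
        feats.append([density, cog, moment2, upper, lower, transitions])
    return np.array(feats)   # one feature vector per window
```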

In another approach, several linear HMMs are combined to recognize handwritten Hangul characters belonging to a large vocabulary. Each HMM has as input a given regional projection contour profile (RPCP), such as horizontal, vertical, horizontal-vertical or diagonal-diagonal, in order to consider all possible writing directions. Such an RPCP allows a compound pattern or a multi-contour pattern to be transformed into a unique outer contour. The combination is based on the idea that classifiers with different methodologies or features are usually complementary to each other. For this purpose, weighted combining and majority voting [BVM+04] were used.

This approach leads us to think that a simple linear HMM is not sufficient to handle the dimensionality of handwriting, and that more sophisticated methods should be proposed for such a task, where not just the temporal aspect but also the spatial aspect should be preserved.

So far, the different systems presented can be classified as mono-dimensional (1D) systems, meaning that the models developed consider the handwriting as a one-dimensional signal. Namely, the observation symbols are coded accordingly, as presented in the pioneering work of Rabiner [Rab89].

As stated before, a truly 2D extension of the architecture raises high computational complexity problems, so some scientists have proposed different techniques to bypass this drawback. Our intention is not to give an exhaustive survey of these systems but to review some of the more interesting ones.

Bi-dimensional system architectures using low-level features

An innovative idea is proposed by Levin et al. [LP92] to model handwritten digits. As stated by the authors, the one-dimensional models proposed by Rabiner for speech cannot work properly for signals which are 2D in nature, or more precisely 1.5D, as handwriting is.

To bypass the handwriting constraint, they propose a planar modeling of the handwritten digits, where the information is pixel-based, considering the color of each pixel composing the word shape. The results achieved using this new technique have shown the ability of the model to deal with handwriting, and the choice of pixels as input seems to be a good solution.

Dynamic time warping (DTW), known as a suitable solution to match a reference vector against an extracted measurement vector giving an exact distance, should be extended to dynamic planar warping (DPW). One solution is to divide the image into sub-images where the classical warping function can be found, but such a solution is sub-optimal, as stated by [DE87]. Therefore the algorithm is impractical for real-size images. The solution proposed by the authors is to impose some constraints on the model in order to reduce the algorithm complexity to polynomial. The idea is to limit the number of admissible warping sequences in such a way that an optimal solution to the constrained problem can be found in polynomial time. The additional constraints used are not arbitrary, but instead reflect the geometric property of the specific set of images being considered. Assuming statistical independence among the image columns, the authors introduced the PHMM (Planar Hidden Markov Model), or Pseudo-2D Hidden Markov Model. Each local state in the PHMM was represented by its own binary probability distribution, i.e., the probability of a pixel being 1 (black) or 0 (white).
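As a small illustration of this kind of emission model (a sketch under the stated pixel-independence assumption, not the original implementation), the log-likelihood of one binary image column under a state with per-pixel Bernoulli parameters can be computed as:

```python
import numpy as np

def column_log_likelihood(column, p_black):
    """column: 1D binary array (1 = black pixel); p_black: per-pixel Bernoulli
    parameters of one state (same length). Returns log P(column | state)
    assuming pixels are independent given the state."""
    column = np.asarray(column, dtype=float)
    p = np.clip(np.asarray(p_black, dtype=float), 1e-6, 1 - 1e-6)  # avoid log(0)
    return float(np.sum(column * np.log(p) + (1 - column) * np.log(1 - p)))
```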

The achieved results for separated handwritten digits show the superiority of the model. The constraints imposed at the beginning help to reduce the computational complexity drastically and find the optimal solution in linear time, which designates the PHMM as a powerful tool in 2D object recognition problems.

Based on the work proposed by Levin [LP92], Gilloux proposed a new system for handwritten digit recognition [Gil94]. The PHMM (see Fig. 2.8) observes the pixel colors. Such a low-level approach, adopted also by Gilloux, shows its importance among the others, where perceptual features have been considered. The PHMM used here can be considered as a continuation of the basic PHMM proposed by Levin, but it was extended in different points. In that case the model structure is different: instead of considering the distribution in the super-states of the PHMM, the authors consider super-state classes, into which the secondary HMM states can be integrated. The approach is really innovative as the distribution is calculated not column-wise but state-wise, allowing to model more precisely the 2D deformations of writing. This method also preserves the hypothesis concerning the independence between the different image columns.

Figure 2.8 – The PHMM proposed by Gilloux [Gil94].

A major contribution of the author is the usage of the Markov network, which can be trained with exponential complexity. The training process of such a model is exponential, as stated also by Levin [LP92]. However, considering that the information repartition is given by the previous PHMM, the bi-dimensional dependency between the states can be calculated directly as their distribution is given a priori. The drawback of this model is that the repartition of the information in the different states is sub-optimal. Even if the repartition is correct (which is not necessarily assured), the algorithm is based on a Viterbi search, which is of course a sub-optimal search mechanism.

Park and Lee propose a fully bi-dimensional Markov model [PL95], namely the Hidden Markov Mesh Random Field (HMMRF), for handwritten character recognition. The images are decomposed into n × n windows where the black pixel density is considered as the observation for the model. The authors propose for this model a new decoding algorithm which allows to preserve the completely bi-dimensional relation between the different observations. For this reason the decoding is based on a hypothesis called "look-ahead", which means the marginal distribution is considered as being optimal.
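A sketch of this window-based observation extraction is shown below; the window size and the handling of image borders are assumptions for illustration.

```python
import numpy as np

def window_densities(img, n=8):
    """Decompose a binary image (1 = black) into n x n windows and return
    the black-pixel density of each window as the observation grid."""
    rows, cols = img.shape[0] // n, img.shape[1] // n
    obs = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            obs[r, c] = img[r * n:(r + 1) * n, c * n:(c + 1) * n].mean()
    return obs
```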

The experimental results reported by the authors on the digit database of Concordia University outperform the results reported by 1D linear HMMs or PHMMs for the same dataset.

For printed Arabic word recognition, Amara [ABE98] also proposes a PHMM without any a priori segmentation. The approach is global, trying to model pseudo-words (see Fig. 2.9) occurring often in Arabic, which is a semi-cursive script. A word can be constructed from up to 10 pseudo-words, also called PAWs (Pieces of Arabic Word). Such a modeling approach is considered because these elements can quickly be isolated in the script using connected component finding schemes. Even if they can be found in different positions, they do not change their shape very much, whereas the letters can have different shapes depending on their position in the word.

The topology used here is derived directly from the input, as the horizontal pixel sequences having the same color are considered as being the observations. The observation depends on the duration of the identical pixel sequence and, for the black color sequences, the immediate upper neighborhood is also considered. This allows to highlight the correspondence between the image lines. The secondary HMM observes the image lines, more precisely the succession of black and white pixel sequences, while the main HMM states observe the succession of the lines, which provides a bi-dimensional aspect to the model. The results achieved for printed Arabic city names are excellent (96.87%-100%) but the size of the vocabulary is reduced: 100 PAWs have been considered, containing up to 3 characters.
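The run-length style observation coding described above can be illustrated by the following sketch, which encodes one image line as a succession of (color, duration) runs; the upper-neighborhood context of the original model is not shown.

```python
import numpy as np

def run_lengths(line):
    """Encode a binary image line as a succession of (color, duration) runs."""
    runs, start = [], 0
    for i in range(1, len(line) + 1):
        if i == len(line) or line[i] != line[start]:
            runs.append((int(line[start]), i - start))
            start = i
    return runs

print(run_lengths(np.array([0, 0, 1, 1, 1, 0, 1])))
# -> [(0, 2), (1, 3), (0, 1), (1, 1)]
```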

As stated also by Choisy [Cho02], this work shows the generality of the model, applied here to Arabic script as well, and at the same time the discriminative power of the pixel information characterizing the different pseudo-words in a pseudo 2D representation.

Figure 2.9 – The corresponding PHMM for the Arabic PAW [ABE98]

Derived from this theory based on the extension of the DTW to DPW, Saon [Sao97] proposed a system called NSHP-HMM (Non Symmetric Half-Plane Hidden Markov Model) for the recognition of handwritten words on literal bank check amounts. The designed scheme (see Fig. 3.4) combines advantageously a HMM (Hidden Markov Model) and a MRF (Markov Random Field). It operates at pixel level, in a holistic manner, on height normalized images which are considered as random field realizations. The HMM analyzes the image along the horizontal direction of writing, considering in the different states of the HMM the observation probabilities given by the different image columns, estimated by causal MRF-like pixel conditional probabilities. Since the considered vocabulary has a reduced size containing just 26 words, such a holistic method is applicable. No grapheme segmentation step is required, so the commonly encountered under- or over-segmentation problems are avoided.
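The column observation idea can be sketched roughly as follows: the probability of a column is a product of causal pixel conditional probabilities given already-seen neighbors. The neighborhood offsets and the table-lookup parameterization here are illustrative assumptions only; the actual NSHP-HMM formulation is presented later in this document.

```python
def column_probability(img, x, cond_prob, neighborhood):
    """Probability of column x of a binary image as a product of causal pixel
    conditional probabilities.  cond_prob[y] maps a tuple of neighbor values
    (taken at the (dx, dy) offsets in `neighborhood`, pointing into the causal
    half-plane) to P(pixel(y, x) = 1 | context)."""
    h = img.shape[0]
    prob = 1.0
    for y in range(h):
        ctx = tuple(
            int(img[y + dy, x + dx]) if 0 <= y + dy < h and x + dx >= 0 else 0
            for dx, dy in neighborhood
        )
        p1 = cond_prob[y].get(ctx, 0.5)   # P(pixel = 1 | context), default uninformative
        prob *= p1 if img[y, x] == 1 else 1.0 - p1
    return prob
```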

To extend the previous system, Choisy proposed to introduce an implicit segmentation in the NSHP-HMM [Cho02]. This system (see Fig. 3.1) is also based on pixel column observations produced by the NSHP, but instead of using general word models as in the case of Saon, the author proposes to build a general word NSHP-HMM. The word model, based on letter NSHP-HMMs and word meta-models, is able to re-estimate the letter models and the ligatures between letters throughout the general word model. Such a re-estimation is much more precise than the basic letter HMM concatenation used so often in the literature.

Figure 2.10 – The NSHP-HMM considered by Choisy for bank check amount recognition

For this scheme a cross-learning mechanism [CB02] was developed and implemented with success for different handwritings, always belonging to reduced size vocabularies. The cross-learning resides in the classical re-estimation of the global word model; based on this re-estimation, the letter models and the word meta-model are also re-estimated, allowing to consider the different contexts for the letter models and the different ligatures for the word meta-models. The novelty of the approach resides in the training mechanism, based on the convergence of the well known Baum-Welch algorithm [Rab89] and on the information dispatch into the meta-models and letter models considering the general word models. While such a re-estimation flow is considered for training, for testing the general word models are built from the letter models and the meta-models respectively.
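As a point of reference, the plain letter-to-word concatenation that the cross-learning scheme improves upon can be sketched as below: letter HMM transition matrices are chained left to right into one word-level matrix. The exit probability is an assumed illustrative parameter; the re-estimation flow itself is not shown.

```python
import numpy as np

def concatenate_letter_models(letter_transitions, exit_prob=0.2):
    """Chain left-to-right letter HMM transition matrices into a single
    word-level matrix: the last state of each letter is linked to the
    first state of the next letter with probability `exit_prob`."""
    n = sum(A.shape[0] for A in letter_transitions)
    word_A = np.zeros((n, n))
    offset = 0
    for i, A in enumerate(letter_transitions):
        k = A.shape[0]
        word_A[offset:offset + k, offset:offset + k] = A
        if i < len(letter_transitions) - 1:
            last = offset + k - 1
            word_A[last, :] *= (1.0 - exit_prob)   # keep the row stochastic
            word_A[last, offset + k] = exit_prob   # transition into the next letter
        offset += k
    return word_A
```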

For Arabic handwriting, Touj et al. [TNEBA04] recently consider a planar architecture for modeling and recognition. The proposed scheme is based on the work of Levin [LP92], where five different horizontal HMMs have been considered. Each of them is associated to one horizontal zone of the Arabic handwriting. The different zones are: the upper diacritics zone, the upper zone part, the middle zone or busy zone, the lower zone part and the lower diacritics respectively. These HMMs are considered as being the observations for the up-down HMM which models the variations between the different writing zones, considering also the different morphological variations of Arabic script. In that sense the segmentation procedure for such a scheme is vital. The segmentation is subdivided into four parts: firstly a horizontal segmentation followed by a vertical one in the middle zone is performed, while the third and fourth segmentations concern the position of the graphemes associated to extensions and diacritics. After the segmentation process, a feature extraction is performed based mainly on perceptual features like diacritics, which can be distinguished based on their dimension and pixel density, ascenders and descenders, and in the middle zone, which contains a large variety of information, an 8-dimension vector is extracted.

The results obtained on the IFN/ENIT dataset containing handwritten Tunisian city names are encouraging (72%), but considering the size of the vocabulary (25 entries) the results fall behind. As mentioned above, the main drawback of the system is its sensitivity to the different geometrical transformations, as the different horizontal parts of the writing should be clearly distinguishable.

Considering the same baseline scheme as Touj, Wang et al. in [WBKR00] propose an HMM based modeling together with an extended sliding window feature extraction method to decrease the influence of the baseline detection error. The results show that the model can achieve better recognition performance and reduce the error rate significantly compared with classical models. The coding of the frames into observations has a weak point: the generated feature vectors depend upon the accuracy of the baseline detection. As stated by the authors, such a reliable detection method does not exist, so they propose a new feature extraction scheme which is much more tolerant to the errors committed by the baseline extractor. The new feature vector is composed of local means of the different writing zones, divided into frames where the percentage of black pixels is calculated.
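A rough sketch of such zone-based frame features is given below; the zone boundaries, frame width and the use of plain black-pixel percentages are assumptions for illustration, not the exact feature set of [WBKR00].

```python
import numpy as np

def zone_frame_features(img, zone_bounds, frame_w=4):
    """For each sliding frame of a binary word image (1 = ink), compute the
    percentage of black pixels inside each writing zone, where zones are
    given as (top_row, bottom_row) bounds."""
    h, w = img.shape
    feats = []
    for x in range(0, w - frame_w + 1, frame_w):
        frame = img[:, x:x + frame_w]
        feats.append([frame[top:bottom, :].mean() for top, bottom in zone_bounds])
    return np.array(feats)

# toy usage: upper, middle and lower zones of a 30-row image
# zone_frame_features(img, zone_bounds=[(0, 10), (10, 20), (20, 30)])
```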

To achieve higher accuracy, in [MG96] the authors combine a segmentation-free technique based on matching with a segmentation based one, where dynamic programming has been used. The feature extraction is based on low-level features extracted from each image column, like the location and number of transitions from background to foreground pixels along the processed image vertical lines (columns). The combination based on thresholds and Borda count is successful, but the results of the classifiers are still not satisfactory due to the sensitivity of the models to the different slant and skew modifications.
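The column-wise transition features mentioned above could be extracted along the lines of the following sketch, returning, per column, the number and row locations of background-to-foreground transitions.

```python
import numpy as np

def column_transitions(img):
    """For each column of a binary image (1 = foreground), return the number
    of background-to-foreground transitions and the rows where they occur."""
    counts, locations = [], []
    for col in img.T:
        rising = np.where((col[1:] == 1) & (col[:-1] == 0))[0] + 1
        counts.append(int(rising.size))
        locations.append(rising.tolist())
    return counts, locations
```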

In summary, as we can observe, the different handwriting recognition systems working on low-level features achieve good recognition scores for small-size and middle-size vocabularies, but they are very sensitive to the different variations and distortions introduced by the writer, the writing device and the digitization process.

For a reduced vocabulary like separated digits, the superiority of the neural based approach over the stochastic one is considerable. The results can be explained by the fact that, in the case of digits, the number of classes is reduced and the variability is not so large, while for word recognition such a technique does not work satisfactorily, as only a stochastic model considering the temporal aspect of the input signal is able to model cursive handwriting correctly.

Handwriting recognition systems based on high-level perceptual features

For off-line unconstrained handwritten word modeling and recognition, El-Yacoubi et al. [YGSS99] proposed a hidden Markov model-based approach designed to recognize handwritten words from a large vocabulary. To reduce irrelevant information such as noise and intra-class variations, a four step preprocessing mechanism is proposed: first a baseline slant normalization is performed, followed by a lower letter area (upper-baseline) normalization and, when dealing with handwritten cursive words, character skew correction. Finally a smoothing is applied in order to be able to extract features like ascenders, descenders, loops, etc.

As a context is available, the feature extraction is performed at segment level, but considering also the positions of the loops. The explicit segmentation is based on the minima of the image upper contour, allowing the segmentation to propose a high number of segmentation points. After the extraction of global features (27), a feature set based on the analysis of the contour transition histogram is computed (14 symbols). Some segmentation features (5) have also been used.

The complex letter model presented in Fig. 2.11 allows to model the different letters as a succession of two or three graphemes. To model the different words written in lowercase or uppercase, two parallel models have been integrated in the general word model in order to consider the word in uppercase, the word in lowercase and a mix of lowercase and uppercase letters. The results obtained for real French city names extracted manually from envelopes are excellent considering the huge variability of writing in the dataset: for lexicon sizes of 10, 100 and 1,000 words, accuracies of 99.02%, 96.3% and 87.9% respectively were obtained without any kind of rejection criteria. We should also mention that these high results were achieved by considering the information coming from the pincode recognizer, having a confidence value.

Figure 2.11 – The complex letter model considering the different graphemes, proposed by El-Yacoubi et al. [YGSS99]

A similar approach can be found in [Koe02], where the author discusses this letter modeling aspect in detail, with special consideration to the complexity. If we consider a large vocabulary and take into account all the possible ways to mix uppercase and lowercase characters in the same word, the number of decoded states blows up.

Up to now this problem has not been addressed in handwriting recognition. The complexity of the search in lexical trees using multiple character models is a real challenge. To overcome the complexity of the problem, Koerich uses the maximum approximation to select only the most likely combination, considering the local context.

Guillevic and Suen [GS95] propose a method for recognizing unconstrained, writer independent handwritten cursive words belonging to a small static lexicon, i.e. legal bank check amounts. After preprocessing, mainly slant correction, amount segmentation into words and extraction of global features for the recognition module are performed. Seven types of global features are extracted from the word image: ascenders, descenders, loops, estimated length of the word, vertical strokes, horizontal strokes and diagonal strokes. Thresholds for ascenders and descenders are determined empirically and are expressed as a percentage of the main body height. Word length is estimated as the number of central threshold crossings. Strokes are extracted using mathematical morphology operations. For classification purposes a nearest neighbor classifier is used.
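A rough sketch of part of this global feature extraction is given below; the 0.5 projection threshold, the 0.3 extension ratio and the central-line crossing count are illustrative assumptions, not the empirically tuned values of [GS95].

```python
import numpy as np

def global_word_features(img, ext_ratio=0.3):
    """Detect ascenders/descenders relative to the main body height and
    estimate the word length as the number of central-line crossings."""
    proj = img.sum(axis=1)                        # horizontal projection
    busy = np.where(proj > 0.5 * proj.max())[0]   # rows of the main body
    top, bottom = busy[0], busy[-1]
    body_h = bottom - top + 1
    ys, _ = np.nonzero(img)
    has_ascender = bool((ys < top - ext_ratio * body_h).any())
    has_descender = bool((ys > bottom + ext_ratio * body_h).any())
    center = img[(top + bottom) // 2]             # central row of the body
    word_length = int(((center[1:] == 1) & (center[:-1] == 0)).sum())
    return has_ascender, has_descender, word_length
```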

Madhvanath et al. in [MG01b] discuss the use of holistic features for an address reading classifier implemented at CEDAR. Features used by the system are word length, number and position of ascenders, descenders, loops, and points of return. Macro features or composite features such as "ty" are also extracted and used to enhance the classifier scores. Feature equivalence rules provide a means of normalization among different styles.

An innovative fuzzy approach is proposed by Rodrigues and Ling in [RL01] to extract features from handwriting based on a corpus of Brazilian bank checks and to classify them with a fuzzy
