
International Journal of Electronics, Communication & Instrumentation Engineering Research and Development (IJECIERD), ISSN 2249-684X, Vol. 2, Issue 3, Sep 2012, 27-37, © TJPRC Pvt. Ltd.

HANDWRITTEN GURMUKHI CHARACTER RECOGNITION USING WAVELET TRANSFORMS

PRITPAL SINGH & SUMIT BUDHIRAJA

Department of Electronics and Communication Engineering, UIET, Panjab University, Chandigarh, India

ABSTRACT

This paper presents an OCR (optical character recognition) system for handwritten Gurmukhi characters using different wavelet transforms. A large amount of research work is available on optical character recognition of languages such as English, Chinese and Arabic, but comparatively little work has been reported for handwritten Gurmukhi script. In this paper, different wavelet transforms have been used for feature extraction, and zonal densities of different zones of the image have also been included in the feature set. In this work, 50 samples of each character have been used. A back propagation neural network has been used for classification, and an average recognition accuracy of 81.71% has been achieved.

KEYWORDS: Optical Character Recognition, Handwritten Gurmukhi Script, Wavelet Transforms, Feature Extraction, Zonal Densities.

INTRODUCTION

Handwritten character recognition is a challenging research area in the field of pattern recognition. Character recognition is the process of converting an image of handwritten or printed text into a computer-editable format. Handwritten character recognition is broadly classified into two types:

1. Online handwritten character recognition
2. Offline handwritten character recognition

In online handwritten character recognition, the character is recognized as soon as it is written. In offline handwritten character recognition, on the other hand, the character is written first and recognition is performed later. In this paper, an optical character recognition system for handwritten Gurmukhi characters is proposed. The recognition of handwritten characters is very difficult: many external and internal factors cause difficulty in the recognition of characters in an OCR system for handwritten characters. The external factors are:



1. Variation in the shapes of characters
2. Variation in writing styles of different writers
3. Possibility of wrong recognition due to similarity between different characters

The internal factors are:

1. Distortion in the character images during scanning
2. Addition of noise during image acquisition
3. Degraded and broken character images

The above-mentioned factors are responsible for the reduction in the accuracy of handwritten character recognition, and the same problems arise in the recognition of handwritten Gurmukhi characters. There are 35 basic characters in Gurmukhi script, which are shown in Fig. 1. Gurmukhi script is used for the Punjabi language, which is the 14th most widely spoken language in the world and the main language of northern India. Some of the features of Gurmukhi script are given below:

1. Gurmukhi script is cursive.
2. It is written from left to right.
3. There is no concept of upper or lowercase characters in Gurmukhi script.

Figure 1. Gurmukhi Characters

The handwritten recognition of Gurmukhi characters has the following applications:

1. Automatic sorting of letters in postal services
2. Automatic processing of various handwritten forms in government departments and institutes
3. Digitization of old manuscripts
4. Automatic verification of customer signatures in banks

In the case of Gurmukhi characters, there are some problems which lead to a reduction in recognition accuracy. These problems are discussed below.

1. There is always variation in the shapes of characters and in the writing styles of different writers, as shown in Fig. 2.


Figure 2. Few Samples of Gurmukhi Characters

2. There is similarity between different characters, which leads to incorrect recognition of visually similar character pairs.
3. Images may be distorted during scanning, and noise may be added during image acquisition.

The organization of this paper is as follows: Section 2 discusses previous related work, Section 3 describes the proposed recognition system, Section 4 presents the experimental results, and Section 5 draws the conclusions.

PREVIOUS RELATED WORK

Several techniques have been proposed for handwritten as well as printed character recognition in various research works. Vikas J Dongre et al. have given a review of various techniques used for feature extraction and classification in Devnagari character recognition [1]; feature extraction techniques such as Fourier transforms, wavelets, zoning and projections are discussed there. O. D. Trier et al. have also given a detailed survey of various feature extraction techniques [2]. Raju G. has proposed an OCR system for Malayalam characters in which different wavelet filters have been used for feature extraction and an MLP network has been used as the classifier [3]; an average recognition rate of 81.3% has been achieved in that work. Another OCR system for Malayalam characters has been given by M Abdul Rahiman et al. [4]; it uses the Daubechies wavelet (db4) for feature extraction and neural networks for recognition, and the accuracy achieved was 92%. Piu Upadhyay et al. have developed an OCR system for Bangla characters [5]. In that work, images are scaled to a pre-defined area, characteristic points are extracted to form the feature vector, and an artificial neural network is used for classification; a recognition accuracy of 98% has been achieved. G S Lehal et al. have given an OCR system for printed Gurmukhi script [6]. Feature extraction is done using structural features, and binary classifier trees along with a nearest neighbour classifier are used; an accuracy of 96.6% has been obtained in that work. Puneet Jhajj et al. first resized the original image to a 48*48 pixel normalized image and created 64 (8*8) zones to find zonal densities, which were taken as features [7]. SVM and K-NN classifiers were used for classification, and the highest accuracy of 72.83% was obtained with an SVM (Support Vector Machine) using an RBF kernel. Ubeeka Jain et al. created horizontal and vertical profiles for each character, stored the height and width of each character, and used a neocognitron artificial neural network for feature extraction and classification [8]; an accuracy of 92.78% has been achieved. Kartar Singh Siddharth et al. have used statistical features such as zonal density, projection histograms (horizontal, vertical and both diagonal) and distance profiles (from the left, right, top and bottom sides) for feature extraction [9]. In addition, background directional distribution (BDD) features have also been used. The images are normalized to 32*32 size, and SVM, K-NN and PNN classifiers are used for classification. The highest accuracy of 95.04% has been obtained with 5-fold cross validation of the whole database, using zonal density and background distribution features in combination with an SVM classifier with an RBF kernel. Pritpal Singh et al. have proposed an OCR system for Gurmukhi script in which feature extraction is done using Daubechies wavelet transforms and a back propagation neural network is used for classification [10]; in that work, only the first five characters of Gurmukhi have been considered. Kartar Singh Siddharth et al. have also proposed an OCR for handwritten Gurmukhi numerals [11]. Distance profiles and Background Directional Distribution (BDD) features have been used for feature extraction, and an SVM classifier with an RBF (Radial Basis Function) kernel has been used for classification; the maximum recognition accuracy was 99.2%. Another approach for handwritten Gurmukhi character recognition has been discussed in [12], in which Gabor filters are used for feature extraction and an SVM classifier with an RBF kernel is used; a recognition accuracy of 92.29% has been obtained.

THE PROPOSED RECOGNITION SYSTEM FOR GURMUKHI CHARACTERS

The handwritten OCR system for Gurmukhi characters consists of several stages. The stages of the recognition process for handwritten Gurmukhi characters are given below:

1. Image acquisition
2. Pre-processing
3. Feature extraction using different wavelet transforms
4. Classification using a BP neural network
5. Recognised character
Figure 3. Block Diagram of the Handwritten Gurmukhi Character Recognition System

The block diagram of the handwritten Gurmukhi character recognition system is shown in Fig. 3. The stages of the proposed OCR system for handwritten Gurmukhi characters are explained below.

Image Acquisition

The samples of handwritten Gurmukhi characters have been taken from different writers. A total of 1750 samples (50 samples for each character) have been used in the proposed recognition system. Out of these, 1400 samples have been used for training the neural network and 350 samples for testing. The samples have been obtained by scanning the handwritten Gurmukhi characters at 400 dpi. Some samples of Gurmukhi characters are shown in Fig. 4.

Pre-Processing

In the pre-processing stage, the recognition system receives a raw scanned colour image, and the following operations are performed on it (a code sketch of these steps follows the list):

1. The colour image is converted into a grey-scale image.
2. Median filtering is applied to the image to remove noise.
3. The image is then converted into a binary image using thresholding.
4. The binary character image is normalized to 32*64 pixels.
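As an illustration of the pre-processing stage, the sketch below uses OpenCV and NumPy. This is only an assumed implementation: the paper does not name a library, the choice of Otsu's method for thresholding is an assumption (the text only says "thresholding"), and the file name is a placeholder.

```python
import cv2
import numpy as np

def preprocess(path):
    """Pre-process one scanned character image into a 32x64 binary array.

    Sketch only: OpenCV, Otsu thresholding and the interpolation mode are
    assumptions; the paper specifies grey conversion, median filtering,
    thresholding and normalization to 32*64.
    """
    colour = cv2.imread(path)                          # raw scanned colour image
    grey = cv2.cvtColor(colour, cv2.COLOR_BGR2GRAY)    # step 1: colour -> grey
    denoised = cv2.medianBlur(grey, 3)                 # step 2: median filtering
    # Step 3: thresholding (Otsu's method assumed); ink pixels become 255
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Step 4: normalize to 32 rows x 64 columns (cv2.resize takes (width, height))
    norm = cv2.resize(binary, (64, 32), interpolation=cv2.INTER_NEAREST)
    return (norm > 0).astype(np.uint8)                 # 1 = black (ink) pixel

# Usage (hypothetical file name):
# char_img = preprocess("samples/char_001.png")        # shape (32, 64)
```

The 32-row by 64-column orientation follows from the feature-extraction description, where the row-count vector has 32 elements and the column-count vector has 64 elements.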



Figure 4. Samples of Gurmukhi Characters



Feature Extraction

Wavelets are localized basis functions which are translated and dilated versions of a fixed mother wavelet. The decomposition of the image into different frequency bands is obtained by successive low-pass and high-pass filtering of the signal and down-sampling of the coefficients after each filtering stage. In this work, various discrete wavelet transforms, e.g. Daubechies, symlet, coiflet and biorthogonal wavelets, have been used. Feature extraction is done using the following algorithm; for each pre-processed image, the following steps are repeated (a code sketch of this pipeline is given after Table 1):

1. The number of black pixels along each row of the binarized image is counted to form a vector of size 32.
2. The 1D wavelet transform (two levels) is applied to the row-count vector.
3. The approximation (low frequency or average) coefficients are taken directly as feature values.
4. The number of black pixels along each column is counted to form a vector of size 64.
5. The 1D wavelet transform (three levels) is applied to the column-count vector.
6. The approximation coefficients are taken directly as the next feature values.
7. Each 32*64 image is divided into 16 zones of size 8*16.
8. The mean zonal densities of these 16 zones are computed.
9. These are taken as the next 16 values of the feature vector.
10. Each 32*64 image is divided into 8 zones of size 16*16.
11. The sum of pixels is computed for each of the 31 diagonals of a zone, and the average value is found for each zone.
12. In this way, eight values are obtained for the eight zones.
13. These are taken as the next 8 values of the feature vector.
14. The aspect ratio is taken as the last element of the feature vector.

The above steps are repeated with different wavelet filters, viz. db1, db4, sym2, sym4, coif3, coif5, bior1.3 and bior3.9. The resulting feature vector lengths are summarized in Table 1.

Table 1. Length of Feature Vectors

S.No.   Wavelet filter   Length of feature vector
1       db1              41
2       db4              52
3       sym2             45
4       sym4             52
5       coif3            67
6       coif5            87
7       bior1.3          48
8       bior3.9          71
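The feature-extraction algorithm can be sketched in Python. The paper does not name an implementation toolkit, so PyWavelets (pywt) and NumPy are used here purely for illustration; the 8*16 zone orientation (8 rows by 16 columns), the bounding-box definition of aspect ratio and the dummy image in the usage example are assumptions. With pywt's default symmetric signal extension, the resulting vector lengths match Table 1 (e.g. 41 for db1 and 52 for db4), which suggests a comparable convolution-based DWT was used in the original work.

```python
import numpy as np
import pywt

def extract_features(binary_img, wavelet="db1"):
    """Feature vector for one pre-processed 32x64 binary character image.

    binary_img: array of shape (32, 64) with 1 for black (ink) pixels.
    Sketch only; the paper does not specify the implementation library.
    """
    assert binary_img.shape == (32, 64)

    # Steps 1-3: row-wise black-pixel counts -> 2-level DWT approximation.
    # pywt may warn that the level is high for the longer filters; the
    # coefficients are still computed.
    row_counts = binary_img.sum(axis=1)                       # length 32
    row_approx = pywt.wavedec(row_counts, wavelet, level=2)[0]

    # Steps 4-6: column-wise black-pixel counts -> 3-level DWT approximation
    col_counts = binary_img.sum(axis=0)                       # length 64
    col_approx = pywt.wavedec(col_counts, wavelet, level=3)[0]

    # Steps 7-9: mean zonal densities of 16 zones of size 8x16 (rows x cols assumed)
    zone_densities = [
        binary_img[r:r + 8, c:c + 16].mean()
        for r in range(0, 32, 8)
        for c in range(0, 64, 16)
    ]

    # Steps 10-13: average of the 31 diagonal sums in each of 8 zones of 16x16
    diag_means = []
    for r in range(0, 32, 16):
        for c in range(0, 64, 16):
            zone = binary_img[r:r + 16, c:c + 16]
            diag_sums = [zone.diagonal(k).sum() for k in range(-15, 16)]
            diag_means.append(np.mean(diag_sums))

    # Step 14: aspect ratio, taken here as width/height of the ink bounding box
    # (assumed definition; the paper does not define it explicitly)
    ys, xs = np.nonzero(binary_img)
    aspect_ratio = (xs.max() - xs.min() + 1) / (ys.max() - ys.min() + 1) if ys.size else 0.0

    return np.concatenate([row_approx, col_approx,
                           zone_densities, diag_means, [aspect_ratio]])

# Example: feature-vector lengths for the eight filters (compare with Table 1)
img = np.zeros((32, 64), dtype=np.uint8)
img[8:24, 16:48] = 1                                          # dummy character blob
for w in ["db1", "db4", "sym2", "sym4", "coif3", "coif5", "bior1.3", "bior3.9"]:
    print(w, extract_features(img, w).size)
```

Running the loop above with pywt's default settings prints 41, 52, 45, 52, 67, 87, 48 and 71, the same lengths as Table 1.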



CLASSIFICATION

The back propagation neural network has been used for classification of the Gurmukhi characters. The Back Propagation Neural Network (BPNN) is a multilayer neural network trained with the back propagation algorithm. It is based on the extended gradient-descent based delta learning rule, commonly known as the back propagation rule. The basic architecture of a back propagation neural network is shown in Fig. 5.

Figure 5. Back Propagation Neural Network Architecture

In this network, the error signal between the desired output and the actual output is propagated backwards from the output layer to the hidden layer and then to the input layer in order to train the network. In this work, the back propagation neural network has been used in two stages:

1. To classify the 35 Gurmukhi characters into 7 classes with 5 characters in each class.
2. To classify the characters of each class individually.

In the first stage, the number of input nodes equals the number of feature vector elements, one hidden layer with 30 nodes is used, and there are 7 output nodes. The test input is fed into the input layer, and the feed-forward network generates a result based on the knowledge stored in the trained network. In the second stage, provided the classification in the first stage is correct, the five characters of that class are classified individually. The second-stage input feature vector consists of 25 elements (16 average zonal densities, 8 averages of the diagonal values for each zone, and the aspect ratio), the hidden layer has 30 neurons, and the number of output nodes is changed to 5. Each second-stage neural network has been trained on the 200 samples of the five characters of its class. There are seven neural networks in the second stage, and the network to use is selected based on the output of the first stage. A sketch of this two-stage arrangement is given below.
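The two-stage classification can be illustrated with scikit-learn's MLPClassifier, a multilayer perceptron trained by back-propagated gradients. This is a sketch under assumptions: the paper does not name its neural-network toolkit, does not state how the 35 characters are grouped into the 7 classes (consecutive groups of five are assumed here via integer division), and the array names are placeholders. The last 25 feature elements are assumed to be the zonal densities, diagonal averages and aspect ratio used by the second stage.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# X_train: (1400, n_features) wavelet + zonal + diagonal + aspect-ratio features
# y_char:  (1400,) character labels 0..34; these arrays are assumed to come
# from the feature-extraction step sketched above.

def train_two_stage(X_train, y_char, hidden=30, seed=0):
    y_class = y_char // 5                      # 7 coarse classes of 5 characters (assumed grouping)

    # Stage 1: full feature vector -> one of 7 classes
    stage1 = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000,
                           random_state=seed).fit(X_train, y_class)

    # Stage 2: one 5-way network per class, using only the last 25 features
    # (16 zonal densities + 8 diagonal averages + aspect ratio, as in the paper)
    stage2 = {}
    for c in range(7):
        idx = np.where(y_class == c)[0]        # roughly 200 training samples per class
        stage2[c] = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000,
                                  random_state=seed).fit(X_train[idx, -25:],
                                                         y_char[idx] % 5)
    return stage1, stage2

def predict_two_stage(stage1, stage2, x):
    c = int(stage1.predict(x.reshape(1, -1))[0])            # coarse class
    within = int(stage2[c].predict(x[-25:].reshape(1, -1))[0])
    return c * 5 + within                                    # character label 0..34
```

With 1400 training samples over 35 characters, each second-stage network sees about 200 samples (40 training samples per character times the 5 characters of its class), which matches the figure given in the text.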

EXPERIMENTAL RESULTS

In this work, various discrete wavelet transforms, e.g. db1, db4, sym2, sym4, coif3, coif5, bior1.3 and bior3.9, have been used to extract the wavelet coefficients. A feature vector has then been obtained by combining the wavelet coefficients, zonal densities, average diagonal values and aspect ratio, and this vector is given as input to the BPN network of the first stage. The first stage classifies the character given as test input into one of the seven classes. The second stage then selects the trained neural network belonging to that class, which recognises the character from among the five characters of that class. The outcomes of the first stage are summarized in Table 2.

Table 2. Comparison of Recognition Accuracy Using Different Wavelets for the First Stage (using 35 characters)

S.No.   Wavelet filter   % Recognition accuracy (35 characters)
1       db1              90.6
2       db4              90.3
3       sym2             91.7
4       sym4             92.3
5       coif3            92.3
6       coif5            90.0
7       bior1.3          89.4
8       bior3.9          92.0
The highest recognition accuracy for the first stage is 92.3%, obtained using the sym4 and coif3 wavelet transforms, and the accuracy using bior3.9 is 92.0%. However, it has been observed that two of the characters are used very rarely, so the experiment has been repeated without these two characters. The wavelet transforms used for this experiment were db1, sym4, coif3 and bior3.9. The recognition accuracy with 33 characters is summarized in Table 3.

Table 3. Comparison of Recognition Accuracy Using Different Wavelets for the First Stage (using 33 characters)

S.No.   Wavelet filter   % Recognition accuracy (33 characters)
1       db1              93.6
2       sym4             94.5
3       coif3            94.8
4       bior3.9          96.1



Hence, from Table 3 it is clear that bior3.9 classifies the Gurmukhi characters into the 7 classes with the highest accuracy, 96.1%. Based on the output class of the first stage, the neural network of the second stage is then selected, and it recognises the character from among the characters of that class. The outcomes of the second stage are listed in Table 4.

Table 4. Final Recognition Accuracy (%) Using Different Wavelets after the Second Stage

Wavelet filter   Class 1   Class 2   Class 3   Class 4   Class 5   Class 6   Class 7   Average
db1              72        82.5      72.5      76        78        80        88        78.4
sym4             80        87.5      72.5      76        68        80        86        78.6
coif3            82        82.5      72.5      74        80        80        88        79.9
bior3.9          78        77.5      72.5      76        86        80        88        79.7
Best value       82        87.5      72.5      76        86        80        88        81.7

The highest value of average recognition accuracy, obtained by taking the best value for each class, is 81.7% (a short arithmetic check of this figure is given below).
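As a quick check of the reported figures (illustrative only, using the values from Table 4), the "Best value" row is the per-class maximum over the four wavelet filters, and the reported averages are the row means:

```python
import numpy as np

# Per-class accuracies from Table 4 (rows: db1, sym4, coif3, bior3.9)
acc = np.array([[72, 82.5, 72.5, 76, 78, 80, 88],
                [80, 87.5, 72.5, 76, 68, 80, 86],
                [82, 82.5, 72.5, 74, 80, 80, 88],
                [78, 77.5, 72.5, 76, 86, 80, 88]])

print(acc.mean(axis=1).round(1))    # [78.4 78.6 79.9 79.7] per-filter averages
best = acc.max(axis=0)              # best value for each of the 7 classes
print(best, round(best.mean(), 2))  # mean 81.71 -> the reported 81.7%
```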

CONCLUSIONS

In this handwritten Gurmukhi character recognition system, fewer elements have been used in the feature vector than in other OCR systems reported so far, and the result obtained is comparable with similar works reported earlier. An average recognition rate of 81.7% has been achieved. As the size and quality of the database are major factors influencing HCR systems, a relatively large database can be used in future work, which will help to enhance the recognition accuracy. Adding some more features can also help to enhance the recognition accuracy.

REFERENCES

[1] Vikas J Dongre et al., "A Review of Research on Devnagari Character Recognition", International Journal of Computer Applications, Vol. 12, No. 2, November 2010.

[2] O. D. Trier et al., "Feature Extraction Methods for Character Recognition - A Survey", Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.

[3] Raju G., "Wavelet Transform and Projection Profiles in Handwritten Character Recognition - A Performance Analysis", 16th International Conference on Advanced Computing and Communications (ADCOM), IEEE, pp. 309-314, 2008.

[4] M Abdul Rahiman et al., "OCR for Malayalam Script Using Neural Networks", International Conference on Ultra Modern Telecommunications & Workshops (ICUMT '09), IEEE, pp. 1-6, 2009.

[5] Piu Upadhyay et al., "Enhanced Bangla Character Recognition using ANN", International Conference on Communication Systems and Network Technologies, IEEE, pp. 194-197, 2011.

[6] G S Lehal et al., "A Gurmukhi Script Recognition System", Proceedings of the International Conference on Pattern Recognition (ICPR'00), 2000.

[7] Puneet Jhajj et al., "Recognition of Isolated Handwritten Characters in Gurmukhi Script", International Journal of Computer Applications, Vol. 4, No. 8, 2010.

[8] Ubeeka Jain et al., "Recognition of Isolated Handwritten Characters of Gurumukhi Script using Neocognitron", International Journal of Computer Applications, Vol. 10, No. 8, 2010.

[9] Kartar Singh Siddharth et al., "Handwritten Gurmukhi Character Recognition Using Statistical and Background Directional Distribution Features", International Journal on Computer Science and Engineering, Vol. 3, No. 6, June 2011.

[10] Pritpal Singh et al., "OCR for Handwritten Gurmukhi Script using Daubechies Wavelet Transforms", International Journal of Computer Applications, Vol. 45, No. 10, May 2012.

[11] Kartar Singh Siddharth et al., "Handwritten Gurmukhi Numeral Recognition using Different Feature Sets", International Journal of Computer Applications, Vol. 28, No. 2, August 2011.

[12] Sukhpreet Singh et al., "Use of Gabor Filters for Recognition of Handwritten Gurmukhi Character", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2, Issue 5, May 2012.

