International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395 -0056
Volume: 03 Issue: 07 | July-2016
p-ISSN: 2395-0072
www.irjet.net
Devanagari OCR Using Projection Profile Segmentation Method Mr. Akshay J. Kant1, Dr. Mrs. Arati J. Vyavahare2 1Student, 2Professor,
E&TC Engineering, Modern College of Engineering, Maharashtra, India Dept. of E&TC Engineering, Modern College of Engineering, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract -Optical character recognition (OCR) is
processing, in this the matched image is processed to output character in text file using ASCII code.
electronic translation of scanned images of printed or handwritten text into machine editable text. In India, for documentation Devanagari script is used by more than 300 million people. There are lots of research done on character recognition of Devanagari script. In OCR there are main two types of documents processed. Typed text that is machine printed and handwritten text. Character recognition of machine printed text is easy as compared to handwritten text. Because in typed text the font size and text quality is good as compared to handwritten text. In handwritten the style of writing vary from person to person, so it is very difficult to recognize the characters. There are so many languages covered under the Devanagari script like Sanskrit, Hindi, Marathi, Nepali, etc. In the proposed work the main focus is on Marathi character recognition. The process of OCR is first input image which is handwritten text is scanned and pre-processed. The pre-processed image is segmented into lines, lines into words and words into characters. The features are extracted from segmented characters using classifiers. Finally post processing where recognized image is converted into editable text.
2.FEATURES OF DEVNAGARI SCRIPT In India there are so many languages which are used in daily basis like Hindi, Marathi, Sanskrit, etc. In most of the language Devanagari script is used. Devanagari script has 11 vowels and 33 consonants. These are basic characters. The Devanagari script is written in left to write format. The word is divided into three parts. The upper part is called header part. All characters have horizontal line in upper part known as header line or Shirorekha. The middle part which contains actual body of characters. And the lower part. The following diagram shows all the three part of the word.
Key Words:Devanagari, OCR, Handwritten. Fig -1: Three zones of Devanagari word
3. PROPOSED OCR SYSTEM
1.INTRODUCTION
Following steps have been followed in the design of proposed OCR system: • Preprocessing • Segmentation • Feature Extraction • Classification.
Since the evolution of digital computer the interaction between machine learning and human computer is most challenging research field. In optical character recognition (OCR), the input image which is handwritten text is preprocessed first. In this stage the input image is converted into grayscale image and then to binary image using different method like Otsu method. If any noises are present in image then it get remove in this stage. We have to process image clean and clear for next step. In next step i.e. segmentation the image is broken into characters. The image is set of lines, words and characters. In segmentation step each line, word and characters are separated. This step is very important. As if we get proper character segmented image then it helps to recognize image properly. The next classification step, in this the segmented characters are matched with stored database characters set. The matching process like feature based, edge based, and pattern based, etc. Final step is post
© 2016, IRJET
|
Impact Factor value: 4.45
Fig -2: Steps of OCR system
|
ISO 9001:2008 Certified Journal
|
Page 132