Essay: Optical character recognition

Essay details and download:

  • Subject area(s): Computer science essays
  • Reading time: 15 minutes
  • Price: Free download
  • Published: 1 November 2022*
  • Last Modified: 28 August 2024
  • File format: Text
  • Words: 4,167 (approx)
  • Number of pages: 17 (approx)

Optical character recognition, language identification, translation, and a display mechanism are the main components of optical character translation using spectacles (OCTS). This essay reviews the available literature on optical character recognition and the related techniques, which helps in designing the character conversion mechanism.

2.1 Optical Character Recognition

A brief history of Optical Character Recognition (OCR) is as follows. In 1929, Gustav Tauschek obtained a patent on OCR in Germany. Handel followed with a US patent on OCR in 1933, and in 1935 Tauschek was also awarded a US patent on his method. Later he introduced a mechanism for detecting photos in documents [1].

In 1949, engineers at the Radio Corporation of America (RCA) worked on the first primitive computer-type OCR for the United States Veterans Administration, to help blind people. Instead of only converting printed characters into machine language, their device converted them into machine language and then spoke the letters aloud. It proved very expensive and was not pursued after testing [2, 3].

OCR is used in a number of varied applications, such as invoice imaging [4], the legal industry [4], banking, and the health care industry [4]. It is also widely used in many other fields, including CAPTCHA [5], institutional repositories and digital libraries [6], optical music recognition without any human correction or effort [7], automatic number plate recognition [8], handwriting recognition [8], and MCQ marking.

OCR technology has been widely used to identify the characters of many languages and convert them into editable text. Much OCR research follows the pattern of extracting features and feeding them to different models for recognition. Many systems have been developed for Latin-script languages such as English and for Asian scripts such as Arabic [9, 10] and Devanagari [11].

Optical character recognition is divided into two types: offline recognition and online recognition. In offline recognition, the source is either an image or a scanned form of the document, whereas in online recognition the successive points are represented as a function of time [12]. Only offline recognition is dealt with here.

Akmal and Ragel [13] discuss the importance of OCR within the education field. The pool of a digital collection depends on the available learning resources, which may range from old printed documents to born-electronic materials. Traditional libraries play a very important role in disseminating and preserving this material, and the material held in traditional libraries is rapidly being transferred into digital form. A large amount of manual effort is required if the form and look of the electronic documents are to stay the same as their written or printed counterparts. Therefore, document conversion plays a very important role in building digital libraries.

Siromoney et al. [14] described a method for recognizing machine-printed Tamil characters using an encoded character string dictionary. The scheme employs string features extracted by row- and column-wise scanning of the character matrix. Depending upon the complexity of the script to be recognized, the features in each row (column) are encoded suitably. A given text is presented symbol by symbol, and information from each symbol is extracted in the form of a string and compared with the strings in the dictionary. When there is agreement, the letters are identified and printed out in Roman letters following a special method of transliteration. Numerals printed above each letter indicate the lengthening of vowels and the hardening of consonants.
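As a rough illustration of the row- and column-wise string encoding described above, the following Python sketch encodes each row and column of a small binary character matrix by its number of foreground runs and looks the resulting string up in a dictionary. The run-count encoding, the function names, and the toy glyphs are assumptions made for illustration; they are not the coding scheme of Siromoney et al.

```python
def count_runs(line):
    """Count maximal runs of foreground (1) pixels in one row or column."""
    runs, previous = 0, 0
    for pixel in line:
        if pixel == 1 and previous == 0:
            runs += 1
        previous = pixel
    return runs


def encode_glyph(matrix):
    """Encode a binary character matrix as 'rows|columns' run-count strings."""
    row_code = "".join(str(count_runs(row)) for row in matrix)
    col_code = "".join(str(count_runs(col)) for col in zip(*matrix))
    return row_code + "|" + col_code


# Toy 5x5 glyphs (hypothetical shapes, not real Tamil characters).
GLYPH_L = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
GLYPH_U = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

# The dictionary maps encoded strings to Roman-letter labels.
DICTIONARY = {encode_glyph(GLYPH_L): "L", encode_glyph(GLYPH_U): "U"}


def recognize(matrix):
    """Return the label whose stored string matches, or None if none agrees."""
    return DICTIONARY.get(encode_glyph(matrix))
```

A symbol is recognized only when its encoded string agrees exactly with a dictionary entry, so a practical system would need an encoding that tolerates noise in the scanned matrix.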

2.1.1 Feature Extraction Tesseract Algorithm

Eikvil [15] described a method for extracting the features that characterize symbols. The symbols are characterized and unimportant attributes are omitted. The feature extraction technique does not match concrete character patterns, but rather makes note of abstract features present in a character, such as intersections, open areas, lines, etc.

Feature extraction is done using the Tesseract algorithm. Feature extraction is concerned with the representation of the symbols. By extracting special characteristics of the image in the feature extraction phase, the character image is mapped to a higher level.

Fig. 1 Tesseract OCR Architecture

Fig. 2 Tesseract: Recognize Word

2.1.2 Binarization Algorithm

Mollah et al. [16] describe a method for binarization. The authors developed an efficient technique for binarizing a skew-corrected text region before segmenting it. The algorithm is given below. It is essentially an improved version of Bernsen’s binarization method [17]. In Bernsen’s method, the arithmetic mean of the maximum (Gmax) and the minimum (Gmin) gray levels around a pixel is taken as the threshold for binarizing the pixel. In the present algorithm, the eight immediate neighbors around the pixel being binarized are also taken as deciding factors. This type of approach is especially useful for connecting the disconnected foreground pixels of a character.

Binarization algorithm proposed by Mollah et al.

begin
    for all pixels (x, y) in a text region (TR)
        if intensity(x, y) < (Gmin + Gmax)/2 then
            mark (x, y) as foreground
        else if no. of foreground neighbors > 4 then
            mark (x, y) as foreground
        else
            mark (x, y) as background
        end if
    end for
end
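The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Gmin and Gmax are taken over the whole text region (a local window could be substituted), the neighbor count uses the result of a first thresholding pass, and the function name `binarize_region` is hypothetical.

```python
def binarize_region(gray):
    """Binarize a grayscale text region given as a list of rows of ints.

    Returns a mask of the same shape where True marks foreground (text).
    """
    height, width = len(gray), len(gray[0])
    # Assumption: Gmin and Gmax are computed over the whole region.
    g_min = min(min(row) for row in gray)
    g_max = max(max(row) for row in gray)
    threshold = (g_min + g_max) / 2

    # Pass 1: mid-range thresholding (dark pixels are foreground).
    first = [[gray[y][x] < threshold for x in range(width)]
             for y in range(height)]

    # Pass 2: promote a background pixel when more than four of its eight
    # immediate neighbors were foreground in pass 1.
    result = [row[:] for row in first]
    for y in range(height):
        for x in range(width):
            if first[y][x]:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and first[ny][nx]):
                        neighbors += 1
            if neighbors > 4:
                result[y][x] = True
    return result
```

In a region where a bright speck sits inside a dark stroke, the second pass fills the speck in, which is the "connecting disconnected foreground pixels" effect the text describes.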

2.1.3 Detection Segmentation and Classification

Louloudis et al. [18] described a text line detection and segmentation methodology that consists of three distinct steps. The first step includes connected component extraction and average character height estimation on the binary image. In the second step, a block-based Hough transform is used to detect potential text lines. A third step is used to correct possible splitting, to detect text lines that the previous step did not reveal and, finally, to separate vertically connected characters and assign them to text lines. These are the steps suggested by Louloudis et al. for text line segmentation in OCR.
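The first of these steps can be illustrated with a short Python sketch that extracts 8-connected components from a binary image and estimates the average character height from their bounding boxes. Using a plain mean of bounding-box heights, and the function names themselves, are assumptions for illustration; Louloudis et al. may estimate the height differently.

```python
from collections import deque


def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 8-connected blobs."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def average_character_height(boxes):
    """Mean bounding-box height in pixels; 0.0 when there are no components."""
    if not boxes:
        return 0.0
    return sum(bottom - top + 1 for top, _, bottom, _ in boxes) / len(boxes)
```

The estimated height can then serve as a scale for later steps, for example to set block sizes or to filter out components that are too small or too large to be characters.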

Suresh et al. [19] introduce an approach that uses fuzzy concepts to classify handwritten Tamil characters. The unknown and prototype characters are preprocessed and then considered for recognition. The theory of fuzzy sets provides an approximate but effective means of describing the behavior of ill-defined systems. Patterns of human origin, such as handwritten characters, are to some extent fuzzy in nature, so the authors chose a fuzzy conceptual approach.

2.1.4 Hidden Markov Models (HMM)

Hidden Markov Models (HMM) are stochastic methods to model temporal and sequence data. They are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following and bioinformatics.

Hidden Markov Models were first described in a series of statistical papers by Leonard E. Baum and other authors in the second half of the 1960s. One of the first applications of HMMs was speech recognition, starting in the mid-1970s. Indeed, one of the most comprehensive explanations on the topic was published in “A Tutorial On Hidden Markov Models And Selected Applications in Speech Recognition”, by Lawrence R. Rabiner in 1989[20].

Traditionally, HMMs have been defined by the following quintuple:

$$\lambda = (N, M, A, B, \pi)$$

where

  • N is the number of states for the model

  • M is the number of distinct observation symbols per state, i.e. the discrete alphabet size

  • A is the N×N state transition probability distribution given in the form of a matrix A = {a_ij}

  • B is the N×M observation symbol probability distribution given in the form of a matrix B = {b_j(k)}

  • π is the initial state distribution vector π = {π_i}

If we leave out the structure parameters M and N, we have the more often used compact notation

$$\lambda = (A, B, \pi)$$
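To show how the parameters A, B and π are used together, the following Python sketch implements the standard forward algorithm, which computes the probability of an observation sequence given the model. The function name and the list-of-lists representation are assumptions for illustration, and the sketch omits the scaling that long sequences need to avoid numerical underflow.

```python
def forward_probability(A, B, pi, observations):
    """Return P(observations | model) for a discrete HMM (A, B, pi).

    A[i][j] : probability of moving from state i to state j (N x N)
    B[j][k] : probability of emitting symbol k in state j (N x M)
    pi[i]   : probability of starting in state i
    observations : non-empty sequence of symbol indices in range(M)
    """
    n_states = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
            for j in range(n_states)
        ]
    # Termination: sum the forward variables of the last time step.
    return sum(alpha)
```

In a recognizer, one model is typically trained per character or word, and the model that gives the observed feature sequence the highest probability is chosen.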

2.2 Language Identification

Language identification is the task of identifying the language of a given document or hard copy. After optical character conversion, the system needs to identify the language, such as Tamil, Sinhala, Hindi or English. This identification step is very important for the language translation that follows, because it makes the source language easy to determine. Several studies have been conducted in this field.

Padma et al. (2009) proposed an algorithm for identifying the language in an Indian multilingual printed document containing text in three languages: English (general language), Hindi (national language) and Kannada (regional/state language). The algorithm uses a local geometric approach to identify the language [21].

Milne et al. (2012) showed that language identification behaves differently for short documents and long documents. The approaches they studied, including a baseline, were evaluated on both long and short documents from data sets collected from Wikipedia and Europarl. The study covered four languages: English, German, Spanish, and French. The principal approach is to build a profile for each language from its most common words. In testing, the words in a test document are compared with the most common words learned for each language during training [22].
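The common-word idea can be sketched in a few lines of Python. This is a simplified illustration rather than the method evaluated by Milne et al.: the hand-written word lists in `PROFILES`, the overlap score, and the function names are assumptions, and a real system would learn its profiles from training corpora such as Wikipedia or Europarl.

```python
from collections import Counter


def build_profile(training_text, top_n=5):
    """Return the set of the top_n most frequent words in the training text."""
    words = training_text.lower().split()
    return {word for word, _ in Counter(words).most_common(top_n)}


# Tiny hand-written profiles (assumed for illustration only).
PROFILES = {
    "en": {"the", "of", "and", "to", "in"},
    "de": {"der", "die", "und", "in", "das"},
    "es": {"de", "la", "que", "el", "en"},
    "fr": {"de", "la", "le", "et", "les"},
}


def identify_language(document, profiles=PROFILES):
    """Return the language whose profile shares the most words with the text."""
    words = set(document.lower().split())
    scores = {lang: len(words & profile) for lang, profile in profiles.items()}
    return max(scores, key=scores.get)
```

Short documents contain few of the common words, so their scores are low and close together, which is consistent with the observation that short and long documents behave differently.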

2.3 Language Translation

Language is vital for human communication. There are many countries where people speak a single language. In India, there are over eighteen constitutional languages, such as Hindi, Bengali, Gujarati, Oriya, Punjabi, Telugu, Kannada, Tamil, Malayalam, etc. Hindi is the national language of India and English is the common language for all states. Although Hindi is the national language, it is spoken only in the northern states, whereas in the southern region, particularly in Tamil Nadu, most people speak only their regional language (i.e. Tamil). So, for better communication, English to Indian language machine translation is important.

Machine translation is the process of translating a text from a source language into a target language with the help of computers. The translation process converts a text in one human language to another in a way that preserves not only the meaning, but also the form, effect, and style. Today most information on the web is available in English. In a multilingual society, different languages are spoken in different regions, so machine translation is needed for this purpose [23].

Photocopier as a translator: Fuji Xerox has developed a device that can scan a printed sheet of Japanese text from a newspaper or magazine and output a translated version in Chinese, English or Korean, while retaining the original layout [24].

Transliteration is a process that accepts a character string in the source language as input and generates a character string in the target language as output. The process mainly involves segmenting the source string into transliteration units and relating the source-language units to units in the target language by resolving different combinations of alignments and unit mappings. The transliteration model is trained on the basis of these alignments [25] [26].
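The segment-then-map process described above can be illustrated with a small Python sketch. The unit table `UNIT_MAP` is a hypothetical, hand-written example that ignores vowel length, and greedy longest-match segmentation stands in for the learned alignments that the cited models use; neither is taken from [25] or [26].

```python
# Hypothetical unit mapping for illustration; a real model learns these
# correspondences from aligned name pairs.
UNIT_MAP = {"ka": "க", "ma": "ம", "la": "ல", "a": "அ", "m": "ம்"}


def segment(word, units):
    """Greedily split word into the longest known units, left to right."""
    longest = max(len(unit) for unit in units)
    pieces, position = [], 0
    while position < len(word):
        for size in range(min(longest, len(word) - position), 0, -1):
            candidate = word[position:position + size]
            if candidate in units:
                pieces.append(candidate)
                position += size
                break
        else:
            # No unit of any length matches at this position.
            raise ValueError(f"no unit covers {word[position:]!r}")
    return pieces


def transliterate(word, unit_map=UNIT_MAP):
    """Map each source unit to its target-script unit and join the results."""
    pieces = segment(word.lower(), unit_map)
    return "".join(unit_map[piece] for piece in pieces)
```

Greedy segmentation picks a single alignment, whereas a trained model weighs several candidate segmentations and mappings against each other.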

M. S. Vijaya et al. [27] demonstrated a transliteration model for English to Tamil transliteration based on a multi-class classification approach and trained with Weka. In this model, Weka’s J48 decision tree classifier was used for classification. The feature patterns were extracted from a parallel corpus of thirty thousand person names and thirty thousand place names and used to train the transliteration model. The accuracy of the model was tested with one thousand English names that were outside the corpus. The model produced an exact Tamil transliteration of English words with an accuracy of eighty-four percent. The accuracy can be increased by considering the next best output from the classifier. The same mechanism could also be used to generate all possible transliterations for a given English word.

A. P. Uzoma [28] discusses the five main components of language: phonemes, morphemes, lexemes, syntax and context. Along with grammar, semantics, and pragmatics, these components work together to create meaningful communication among individuals. A phoneme is the smallest unit of sound that can cause a change of meaning within a language but that does not have meaning by itself. A morpheme is the smallest unit of a word that provides a specific meaning to a string of letters (which is called a phoneme). There are two main types of morpheme: free morphemes and bound morphemes. A lexeme is the set of all the inflected forms of a single word. Syntax is the set of rules by which a person constructs full sentences. Context is how everything within language works together to convey a particular meaning. Semantics consists of vocabulary and how concepts are expressed through words.

2.4 Character regeneration with Arduino

Character regeneration involves the process of reconstructing or reproducing the characters identified and translated by the Optical Character Recognition (OCR) system into a format that can be displayed or further processed. In the context of the Optical Character Translation using Spectacles (OCTS), this regeneration can be efficiently managed using Arduino, a widely used open-source electronics platform.

Arduino is favored for its simplicity, flexibility, and the vast community support it enjoys. Its application in character regeneration involves interpreting the digital signals received from the OCR module and subsequently driving a display mechanism, such as an LCD or OLED, to visually present the recognized text.

One of the key components in this process is the Arduino microcontroller, which acts as the central unit for processing the data. After the OCR system converts the scanned image into a machine-readable string of characters, this string is transmitted to the Arduino through a communication protocol, such as I2C, SPI, or UART. Arduino, in turn, processes this data and regenerates the characters onto a connected display.

2.4.1 Integration of OCR Output with Arduino

The integration process begins with receiving the OCR-processed text. The text, once converted into a character array, is passed to Arduino through a serial communication interface. Arduino then interprets these characters, and with the help of libraries like LiquidCrystal for LCDs or Adafruit_GFX for OLEDs, it sequentially regenerates the characters on the screen.

For instance, if the OCR system identifies and transmits the word “HELLO,” Arduino will iterate over each character, converting it into the corresponding ASCII value and sending it to the display controller. This method ensures that each character is accurately represented in the display format selected.
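The per-character step can be illustrated with a short sketch. Python is used here only to show the logic; on the board itself this would be written as an Arduino sketch. The 16-column display width and the function names are assumptions.

```python
def to_ascii_codes(text):
    """Return the ASCII code of every character; raises on non-ASCII input."""
    return list(text.encode("ascii"))


def split_into_rows(text, columns=16):
    """Break text into fixed-width rows for a character display."""
    return [text[start:start + columns]
            for start in range(0, len(text), columns)]
```

For the word "HELLO", `to_ascii_codes` yields the codes 72, 69, 76, 76 and 79, one per character, which is the sequence a character display controller would receive.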

2.4.2 Challenges in Character Regeneration with Arduino

While Arduino is versatile, it faces several challenges in character regeneration, primarily related to memory and processing speed. The Arduino Uno, for example, has only 2 KB of SRAM, which can quickly become a limitation when dealing with large texts or complex character sets, such as those found in Asian languages.

To overcome these limitations, developers often employ techniques like using external EEPROM or optimizing the code to handle strings more efficiently. Additionally, care must be taken to ensure that the communication between the OCR system and Arduino is robust, particularly in environments where signal noise might corrupt the transmitted data.
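One common way to make such a link more robust is to wrap each message in a frame with a length field and a checksum, so that the receiver can reject corrupted data. The sketch below shows the idea in Python; the frame layout, the start byte, and the function names are assumptions for illustration, and a simple XOR checksum detects single-bit errors but not every error pattern.

```python
START_BYTE = 0x02  # assumed frame marker (ASCII STX)


def xor_checksum(payload):
    """XOR all payload bytes together into a single check byte."""
    check = 0
    for byte in payload:
        check ^= byte
    return check


def build_frame(text):
    """Frame layout (assumed): START, length, payload bytes, checksum."""
    payload = text.encode("ascii")
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    header = bytes([START_BYTE, len(payload)])
    return header + payload + bytes([xor_checksum(payload)])


def parse_frame(frame):
    """Return the decoded text, or None if the frame fails validation."""
    if len(frame) < 3 or frame[0] != START_BYTE:
        return None
    length = frame[1]
    if len(frame) != length + 3:
        return None
    payload, check = frame[2:2 + length], frame[-1]
    if xor_checksum(payload) != check:
        return None
    if any(byte > 127 for byte in payload):
        return None
    return payload.decode("ascii")
```

The one-byte length field also caps each message at 255 bytes, which keeps the receive buffer well within the 2 KB of SRAM mentioned above.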

2.4.3 Practical Applications and Examples

A practical application of character regeneration using Arduino can be found in assistive technology for visually impaired individuals. For example, an OCR-enabled wearable device could scan printed text, process it through Arduino, and then display the text on a Braille display or read it aloud using a speech synthesis module.

Another example includes educational tools where Arduino-driven systems can regenerate characters for language learning devices, helping users to see the text in both its native script and a translated version, thereby enhancing the learning process.

2.4.4 Future Prospects and Enhancements

The future of character regeneration with Arduino looks promising, especially with the advent of more powerful microcontrollers like the Arduino Due, which offers more RAM and a faster clock speed. These advancements will enable more complex character sets and larger volumes of text to be processed more efficiently.

Furthermore, as display technologies advance, Arduino’s role in driving high-resolution and even 3D displays could open new possibilities in how characters are regenerated and presented, making the system more immersive and interactive.

By combining OCR with Arduino’s capability for character regeneration, OCTS can evolve into a more comprehensive and user-friendly system, capable of bridging the gap between printed and digital worlds, especially in accessibility and educational applications.

2.5 Character regeneration with Raspberry Pi

The idea behind the Raspberry Pi, a small and cheap computer for children, came in 2006, when Eben Upton, Rob Mullins, Jack Lang and Alan Mycroft, based at the University of Cambridge’s Computer Laboratory, became concerned about the year-on-year decline in the numbers and skill levels of the A Level students applying to read Computer Science. From a situation in the 1990s where most of the students applying came to interview as experienced hobbyist programmers, the landscape in the 2000s was very different; a typical applicant might only have done a little web design [29] [30].

In 2006, early concepts of the Raspberry Pi were based on the Atmel ATmega644 microcontroller. Its schematics and PCB layout are publicly available. Foundation trustee Eben Upton assembled a group of academics, teachers and computer enthusiasts to devise a computer to inspire children. The computer is inspired by Acorn’s BBC Micro of 1981. Model A, Model B and Model B+ are references to the original models of the British educational BBC Micro computer, developed by Acorn Computers. The first ARM prototype version of the computer was mounted in a package the same size as a USB memory stick. It had a USB port on one end and an HDMI port on the other [31].

V. Gawande et al. [32] discuss the pros and cons of the Raspberry Pi. Its advantages are that it is an ATM-card-sized single-board computer. Because of its low cost, it is affordable. Because of its size, it can be hidden anywhere: behind TV sets, inside walls, and so on. It provides high performance and basic computer functions such as word processing and web browsing. Its disadvantages are that, although it can be used as a computer, it is closer to a mobile device. Since it is not enclosed in a case, it is exposed and can be touched easily, which may cause damage. It is time consuming to download and install software, and it cannot handle complex multitasking.

D. Naga Madhavi and G. S. Sarma [33] discuss the CMOS camera. CMOS sensors have circuitry at the pixel level. This means that every pixel on the sensor is read and transmitted simultaneously, preparing a voltage for the chip. The chip then uses additional technology, such as amplifiers, noise correction, and digitization, to convert the voltage to digital data. This means that CMOS sensors do not need a separate image processor. Because CMOS sensors are able to convert visual information to digital data more quickly than CCDs, they need less power, which preserves battery life.

L. Nagaraja et al. [34] describe the Raspberry Pi as being built around an SoC (System on Chip), which integrates several functional elements into one chip or chipset. The SoC used in the Raspberry Pi 2 is the Broadcom BCM2836 multimedia processor. The CPU of the Raspberry Pi contains a 900 MHz ARM Cortex-A7 processor that benefits from its architecture design and low power draw. It is not compatible with traditional PC software. It has to be connected to a monitor separately, and so it is often referred to as a mini PC.

V. Ajantha Devi and S. S. Baboo [35] present a methodology implemented to recognize the sequence of characters and the line of reading. As part of the software development [36], the OpenCV (Open Source Computer Vision) libraries are used to capture images of Tamil text and to perform the character recognition. Optical character recognition (OCR) is the translation of captured images of printed Tamil text into machine-encoded text. It is widely used to convert books and documents into electronic files for use in storage and document analysis. OCR makes it possible to apply techniques such as machine translation, text-to-speech and text mining to the captured or scanned page.

References

[1] K. Tyagi, V. Rastogi, “Survey on Character Recognition using OCR Techniques,” In International Journal of Engineering Research and Technology 2014 Aug 3, Vol. 3, No. 2 February-2014.

[2] L.A. Zadeh, “Fuzzy sets”, Information and Control, vol. 8 (3), pp. 338– 353, 1965.

[3] C. Olivier, “H2M: A set of MATLAB/OCTAVE functions for the EM estimation of mixture and hidden Markov model”, retrieved July 2016: http://webcache.googleusercontent.com/search?q=cache:http://perso.telecom-paristech.fr/~cappe/Code/H2m/h2m-doc.pdf

[4] R. Gossweiler, M. Kamvar, S. Baluja, “What’s up CAPTCHA?: a CAPTCHA based on image orientation,” in Proceedings of the 18th International Conference on World Wide Web, Apr. 2009, pp. 841–850. ACM.

[5] J. Barwick, “Building an institutional repository at Loughborough university: Some experiences,” Program: electronic library and information systems, vol. 41, no. 2, pp. 113–123, 2007.

[6] A. Singh, K. Bacchuwar, A. Choubey, S. Karanam, D. Kumar, “An OMR Based Automatic Music Player”, in 3rd International Conference on Computer Research and Development (ICCRD) in, (IEEE Xplore), 2011, Vol. 1, pp. 174-178, 2011.

[7] S. Chang, L. Chen, Y. Chung, and S. Chen, “Automatic license plate recognition,” IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 1, pp. 42–53, Mar. 2004.

[8] R. Plamondon and S. N. Srihari, “Online and off-line handwriting recognition: A comprehensive survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63–84, 2000.

[9] H. Almohri, J. Gray and H. Alnajjar, “A Real-time DSP-Based optical character recognition system for isolated Arabic characters using the TI TMS320C6416T,” in IAJC-IJME International Conference,(The USA, 2008), 228.

[10] M. Ismail and S. Abdulla, “Online Arabic handwritten character recognition based on a rule-based approach,” online journal of Computer Science, 8 (11), 1859-1868, 2012.

[11] N. Sahu, S. Gupta and S. Khare, “Neural network based approach for recognition for Devanagari characters,” International journal of advanced Technology in engineering and science, 2 (05), 187-197,2014.

[12] “Document Analysis and Recognition”, 2005. Eighth International Conference on 29 Aug.-1 Sept. 2005, Alon, Jonathan

[13] M. A. C. Akmal Jahan and R. G. Ragel, “Locating tables in scanned documents for reconstructing and republishing”, In Information and Automation for Sustainability (ICIAFS), 2014 7th International Conference on, pp. 1-6, IEEE, 2014.

[14] G. Siromoney, R. Chandrasekaran, and M. Chandrasekaran, “Computer recognition of printed Tamil characters,” Pattern Recognition, vol. 10, no. 4, pp. 243–247, Jan. 1978.

[15] K. Aas and L. Eikvil, “Text page recognition using grey-level features and hidden Markov models,” Pattern Recognition, vol. 29, no. 6, pp. 977–985, Jun. 1996.

[16] A. F. Mollah, S. Basu, N. Das, R. Sarkar, M. Nasipuri, M. Kundu, Binarizing Business Card Images for Mobile Devices, in Computer Vision and Information Technology Advances and Applications, pp. 968-975, edited by K. V.Kale, S. C. Mehrotra and R. R. Manza, I. K. International Publishing House, New Delhi, India.

[17] J. Bernsen, “Dynamic thresholding of grey-level images”, Proc. Eighth Int’l Conf. on Pattern Recognition, pp. 1251-1255, Paris, 1986.

[18] G. Louloudis, B. Gatos and C. Halatsis, “Text Line Detection in Unconstrained Handwritten Documents Using a Block-Based Hough Transform Approach”, 9th International Conference on Document Analysis and Recognition (ICDAR’07), pp. 599-603, Curitiba, Brazil, September 2007.

[19] Suresh et al., “Recognition of Handprinted Tamil Characters Using Classification Approach”, ICAPRDT’ 99, pp: 63-84, 1999.

[20] L. R. Rabiner, “A Tutorial on Hidden Markov Models, and Selected Applications in Speech Recognition,” Proc. IEEE, Vol. 77, No. 2, pp. 257–286, Feb. 1989.

[21] M. C. Padma and P. A. Vijaya, “Language identification of Kannada, Hindi and English text words through visual discriminating features,” International Journal of Computational Intelligence Systems, vol. 1, no. 2, pp. 116–126, May 2008.

[22] R. M. Milne, R. A. O. Keefe, A. Trotman, “A Study in Language Identification”, ADCS’ 12, December 05-06, Dunedin, New Zealand, 2012.

[23] P. C, D. V, A. Kumar M, and S. K P, “Rule-based sentence simplification for English to Tamil machine translation system,” International Journal of Computer Applications, vol. 25, no. 8, pp. 38–42, Jul. 2011.

[24] “Photocopier that speaks English, Japanese, Chinese,” 2007. [Online]. Available: http://www.smh.com.au/news/technology/photocopier-that-speaks-english-japanese-chinese/2007/09/28/1190486545353.html. Accessed: Aug. 13, 2016.

[25] M. Vijaya, R. Loganathan, G. Shivapratap, V. Ajith, and K. Soman, “English To Tamil Transliteration Using Sequence Labelling Approach,” in International Conference on Asian Language Processing, Thailand, 2008.

[26] M. Vijaya, V. Dhanalakshmi, G. Shivapratap, V. Ajith, and K. Soman, “Sequence Labeling Approach for English to Tamil Transliteration using Memory-based Learning,” in 6th International Conference on Natural Language Processing ICON, Pune, 2008.

[27] M. Vijaya, R. Loganathan, G. Shivapratap, V. Ajith, and K. Soman, “English to Tamil transliteration using Weka,” International Journal of Recent Trends in Engineering, vol. 1, no. 1, 2009.

[28] A. P. Uzoma, “Translators as Agents of National Linguistic Promotion Revival and Development,” Middle-East Journal of Scientific Research, vol. 24, no. 8, pp. 2558–2562, 2016.

[29] “Raspberry pi foundation – about us,” Raspberry Pi, 2012. [Online]. Available: https://www.raspberrypi.org/about/. Accessed: Aug. 13, 2016.

[30] W. Stallings, Cryptography and Network Security Principles and Practices, 4th ed. Prentice Hall, 2005.

[31] J. Shepherd, “Red hat access labs,” 2016. [Online]. Available: http://securityblog.redhat.com/category/Cryptography/page/3/. Accessed: Aug. 13, 2016.

[32] M. S. Sejal, V. Gawande, D. R. Prashant, and R. Deshmukh, “Raspberry Pi Technology,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 5, no. 4, pp. 37–40, Apr. 2015.

[33] D. Naga Madhavi and G. S. Sarma, “VISION BASED ASSITIVE SYSTEM FOR LABEL DETECTION,” International Journal of Engineering Science & Advanced Technology, vol. 5, no. 3, pp. 262–268, Jun. 2015.

[34] L. Nagaraja, R. S. Nagarjun, N. M Anand, D. Nithin, and V. S Murthy, “Vision based Text Recognition using Raspberry Pi,” in National Conference on Power Systems & Industrial Automation (NCPSIA), International Journal of Computer Applications, 2015, pp. 1–3.

[35] V. Ajantha Devi and S. S. Baboo, “Embedded Optical Character Recognition On Tamil Text Image Using Raspberry Pi,” International Journal of Computer Science Trends and Technology (IJCST), vol. 2, no. 4, pp. 127–131, Aug. 2014.

[36] J. George, V. George, A. Deepan, J. Ninan, and Valson, “Optical Character Recognition Based Hand-Held Device for Printed Text to Braille Conversion,” in 3rd International Conference on Machine Learning and Computing (ICMLC 2011), 2011, pp. 172–175.

About this essay:

If you use part of this page in your own work, you need to provide a citation, as follows:

Essay Sauce, Optical character recognition. Available from:<https://www.essaysauce.com/computer-science-essays/optical-character-recognition-2/> [Accessed 17-01-25].

* This essay may have been previously published on EssaySauce.com and/or Essay.uk.com at an earlier date than indicated.