OCR

=**What is OCR?**=
OCR is the acronym for Optical Character Recognition. This technology allows a machine to recognize characters automatically through an optical mechanism. Human beings recognize many objects in this manner; our eyes are the "optical mechanism." But while the brain "sees" the input, the ability to comprehend these signals varies from person to person according to many factors. By reviewing these variables, we can understand the challenges faced by the technologist developing an OCR system.

First, if we read a page in a language other than our own, we may recognize the individual characters but be unable to recognize words. On the same page, however, we can usually interpret numerical statements, because the symbols for numbers are used almost universally. This helps explain why many OCR systems recognize numbers only, while relatively few handle the full alphanumeric character range.

Second, many numerical and alphabetical symbols have similar shapes. For example, in a string of characters combining letters and numbers, there is very little visible difference between a capital letter "O" and the numeral "0." As humans, we can re-read the sentence or the entire paragraph to determine the intended meaning. This procedure is much more difficult for a machine.

Third, we rely on contrast to help us recognize characters. We may find it very difficult to read text that appears against a very dark background or is printed over other words or graphics. Again, programming a system to interpret only the relevant data and disregard the rest is a difficult task for OCR engineers.

There are many other problems that challenge the developers of OCR systems. In this paper, we will review the history, advancements, abilities and limitations of existing systems. This analysis should help determine whether OCR is the correct application for your company's needs and, if so, which type of system to implement.
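To make the "O" versus "0" problem concrete, the sketch below shows one simple way a program might use surrounding context, much as a human re-reads a line. It is an illustration only, not a description of how any particular OCR product works. The function name `resolve_o_zero` and the majority rule it applies are assumptions introduced here.

```python
# Hypothetical sketch: resolve the ambiguous shapes "O" and "0" in one token
# by looking at the unambiguous characters around them.
AMBIGUOUS = {"O", "0"}


def resolve_o_zero(token: str) -> str:
    """Return the token with 'O'/'0' rewritten to match its context."""
    others = [ch for ch in token if ch not in AMBIGUOUS]
    digits = sum(ch.isdigit() for ch in others)
    letters = sum(ch.isalpha() for ch in others)
    if digits > letters:
        # Mostly numeric context: treat every ambiguous shape as a zero.
        return token.replace("O", "0")
    if letters > digits:
        # Mostly alphabetic context: treat every ambiguous shape as the letter.
        return token.replace("0", "O")
    # No clear context: leave the token unchanged for manual review.
    return token


print(resolve_o_zero("1O5O"))
print(resolve_o_zero("C0ST"))
print(resolve_o_zero("O0"))
```

A token with no unambiguous neighbors, such as the last one, is returned unchanged, which mirrors the point made above: without context, the shapes alone do not settle the question.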

=**What are its Applications?**=
OCR has been used to enter data automatically into a computer for dissemination and processing. The earliest systems were dedicated to high-volume variable data entry. The first major use of OCR was in processing petroleum credit card sales drafts. This application recognizes the purchaser from the imprinted credit card account number and enters the transaction. The early devices were coupled with punch units, which made small holes to be read by the computer. As computers and OCR devices became more sophisticated, the scanners provided direct access into the CPU (central processing unit). This quickly led to the payment processing of credit card purchases, known as "remittance processing." These two remain the major applications for OCR.

Over time, other applications evolved, including cash register tape readers and page scanners. Any standard form or document with repetitive variable data is a candidate application for OCR. Some very imaginative applications have also emerged. Perhaps the most innovative are the Kurzweil scanners, which read for the blind. With these devices, the optically scanned pages are converted to spoken words.

=**What are its Limitations?**=
OCR has never achieved a read rate that is 100% perfect. Because of this, a system that permits rapid and accurate correction of rejects is a major requirement. Exception item processing is always a problem because it delays the completion of job entry, particularly the balancing function. Of even greater concern is the misreading of a character (a substitution). In particular, if the system does not accurately balance dollar data, customer dissatisfaction will follow. The success of any OCR device in reading accurately without substitutions is not the sole responsibility of the hardware manufacturer. Much depends on the quality of the items to be processed.

Through the years, the goals have been:
* To increase the accuracy of reading, that is, to reduce rejects and substitutions
* To reduce the sensitivity of scanning so that less-controlled input can be read
* To eliminate the need for specially designed fonts (characters)
* To read handwritten characters

Today's systems, while much more forgiving of printing quality and more accurate than earlier equipment, still work best when specially designed characters are used and attention is paid to printing quality. These limits are not objectionable in most applications, and the number of dedicated users of OCR systems grows each year. But the ability to read a special character is not, by itself, sufficient to create a successful system.
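The difference between rejects and substitutions described above can be shown with a small calculation. The sketch below is a minimal illustration, assuming the reader outputs a placeholder character for anything it declines to classify. The `REJECT` marker, the function name `read_rates`, and the sample account number are all hypothetical.

```python
# Hypothetical sketch: measure reject and substitution rates by comparing
# what was printed with what the reader reported, character by character.
REJECT = "?"  # assumed marker for a character the reader refused to classify


def read_rates(expected: str, recognized: str) -> tuple[float, float]:
    """Return (reject_rate, substitution_rate) for one scanned field."""
    if not expected or len(expected) != len(recognized):
        raise ValueError("fields must be non-empty and the same length")
    # A reject is flagged by the reader, so an operator can correct it.
    rejects = sum(r == REJECT for r in recognized)
    # A substitution is a wrong character reported as if it were correct.
    substitutions = sum(
        r != REJECT and r != e for e, r in zip(expected, recognized)
    )
    total = len(expected)
    return rejects / total, substitutions / total


reject_rate, substitution_rate = read_rates("4012888810", "4O12?88810")
print(f"rejects: {reject_rate:.0%}, substitutions: {substitution_rate:.0%}")
```

In this sample, one character is rejected and one is misread as the letter "O". The reject is visible and can be routed to correction, while the substitution passes through silently, which is why it is the greater concern.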
