Handwritten pattern recognition with modeFRONTIER


Each of us has his or her own handwriting. When two people are asked to write the same text, they may produce two documents that look completely different, even though the content is the same. Moreover, the same person will not produce the same document when writing at two different times. A person’s handwriting reflects his or her emotional state; it also depends strongly on the surroundings and on the tools being used, e.g. the pencil, pen and paper. Figure 1 shows an image of the same simple text written by different people. It is immediately clear that different styles are present. Words can have a “rounded style” or, on the contrary, be full of corners; characters can be placed close to or far from each other; they can all look equal in size or differ according to their position in the word; they can be deformed, and so on. Despite this high degree of variability, we are able to read these texts and understand the meaning of the message.
Handwritten pattern recognition is the discipline that tries to automatically recognize a handwritten text contained in an image and translate it into editable text, written in a single standardized font, making it available for further modification. The result should be a machine (a combination of physical devices and software) able to emulate the human capacity to recognize letters and words and give them a meaning, independently of the graphical style used. Ideally, this activity should be fast and accurate, even for very long texts.

Figure 1: The same simple text written by different persons may appear very different.



This activity, usually known as Optical Character Recognition (OCR), is of great interest in various industrial contexts. For example, the United States Postal Service has been using OCR machines since the mid-1960s to read names, addresses and postal codes on letters in order to speed up delivery. Another field of application is bank document processing, where a great number of documents (such as cheques) and signatures have to be validated.


OCR systems are usually divided into two categories: online OCRs recognize the text on the fly as it is written, for instance on a tablet (usually a small handheld computer, also known as a personal digital assistant, or PDA), while offline OCRs read a complete, previously written text and translate it into an editable one. Handwritten pattern recognition is still a subject of research; achieving a high recognition rate remains difficult, especially for Chinese and Arabic characters.
Over the last decades, researchers have tried to combine a letter-by-letter recognition strategy, which alone does not seem sufficient to achieve high success rates, with the dictionary-based and contextual strategies that human readers always use.
As explained above, the high variability in graphical styles is a key issue for any efficient recognition strategy. For this reason, the vast majority of OCR systems are based on “fuzzy logic” algorithms, which can efficiently manage databases containing vaguely defined or conflicting data.
Extensive research has been conducted in order to design efficient OCRs, and different approaches have been followed, using, for example, Bayesian decision theory, artificial neural networks and many other tools from multivariate data analysis, statistics and signal processing, suitably combined (see for example [1] and [2]).
A commonly used strategy for offline character recognition (summarized in Figure 3) first needs an image containing the text to be analyzed. Second, the words have to be identified in the image and the letters extracted. This phase is usually known as segmentation, and it can be performed with many different approaches. Then, the letters have to be preprocessed so that they can be handled by the recognition algorithm: the main objectives are to reduce the noise and to normalize the letter.

Figure 2: Some handwritten alphabets used to build the database used in this paper. It can easily be seen that the same letter can be drawn in very different ways.



The OCR first has to be trained on a sufficiently large and heterogeneous dataset. This is the crucial phase for getting the best performance from the OCR, and it is usually known as the learning phase. Once the training has been completed, the OCR can be employed to process automatically new texts which were not used during the learning phase. This is the classification phase.
It is easy to understand that the preprocessing phase plays a fundamental role, both while the OCR is learning and while it is running. Indeed, high recognition scores are very often due to sophisticated preprocessing. In this paper, we focus on the last two phases, training and classification, showing how some tools provided in modeFRONTIER can be used to build a simple but efficient letter-by-letter OCR for the Italian alphabet. Finally, a free OCR software (available for download, see [3]) has been used as a term of comparison to judge the quality of the results.


The database construction
The database used in this paper has been built artificially using the following approach. Some handwritten fonts have been downloaded (see [4]), and an image of the Italian capital alphabet has been produced for each font. These images, 201 in total, can be regarded as a collection of alphabets written by different people.

Figure 3: A possible strategy for handwritten pattern recognition. An image containing the text to be analyzed is first loaded into the system, which starts with the segmentation phase. The words are isolated in the text and then decomposed into letters. The letters are normalized and the noise is reduced. The training phase (black arrows and boxes) consists in giving the OCR system a certain number of different letters, thereby teaching it the “rules” underlying handwriting recognition. Once the training has been completed, the OCR can be asked to classify new letters not contained in the training set (classification phase, red arrows).



This way of building the database has been chosen among many others because it is easy and makes it possible to generate a sufficiently large number of alphabets quickly. The main drawback is probably a lack of the “randomness” in the graphics which is typical of real human handwriting. Obviously, the interested reader can verify the effectiveness of the proposed techniques using other databases, perhaps built from real handwritten alphabets.
A first Matlab script reads the images, extracts the letters and writes each alphabet into a new set of 21 images, one for each letter of the alphabet. This procedure can be seen as a simple form of text segmentation.
A second Matlab script reads the letters, preprocesses them and writes some key information to a text file; this phase consists of the following steps:

  1. read the image previously extracted from the text,
  2. transform the image into pure black and white (i.e. the letter image is transformed into a matrix containing 1 where the pixel is on and 0 otherwise),
  3. skeletonize the image (reduce the image to its skeleton, to highlight the most important features of the letter),
  4. normalize the image (resize the image to 15 rows by 10 columns),
  5. probe the image (measure the distance from the image boundary to the first black pixel at some given points),
  6. write a database (write on a text file all the relevant information).

The steps described above make it possible to extract some key features from a letter image and store them in a database. The first fundamental step is the “binarization” of the image, with a consequent reduction of noise; this makes the recognition process less sensitive to image quality and color. The skeletonization reduces the stroke thickness to just one pixel while preserving the continuity of the stroke; this operation is fundamental for normalizing the letters because it removes the thickness of the graphical sign, which is one of the most important sources of variability. The image downsizing is the last operation, but it is fundamental for handling images which can originally have any dimension and resolution.
Other steps could be introduced with the aim of improving the standardization of the letter: the removal of insignificant features (such as curls), the reduction of the letter inclination and the closure of gaps in the pen stroke. All these operations certainly help to improve the recognition rate, but they require a rather complex implementation; this is beyond the scope of this work, and we therefore decided to set up the simplest possible preprocessing that still preserves a certain effectiveness and quality in the results.
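The binarization, cropping and downsizing steps can be sketched in a few lines of Python. This is only an illustration and not the Matlab script used for this work: the function names, the threshold of 128 and the "a cell is on if any source pixel in its block is on" resizing rule are assumptions, and the skeletonization step is omitted because it requires a dedicated thinning algorithm.

```python
def binarize(gray, threshold=128):
    """Return a 0/1 matrix: 1 where the pixel is dark (ink), 0 otherwise.

    The threshold value is an assumption made for this sketch.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def crop(binary):
    """Remove the empty rows and columns around the letter."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        return binary  # blank image: nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def normalize(binary, out_rows=15, out_cols=10):
    """Resize a 0/1 matrix to out_rows x out_cols.

    Each output cell covers a block of source pixels and is switched on
    if any pixel of the block is on (assumed rule, chosen so that thin
    strokes are not lost when the image is reduced).
    """
    in_rows, in_cols = len(binary), len(binary[0])
    out = []
    for i in range(out_rows):
        r0 = i * in_rows // out_rows
        r1 = max((i + 1) * in_rows // out_rows, r0 + 1)
        line = []
        for j in range(out_cols):
            c0 = j * in_cols // out_cols
            c1 = max((j + 1) * in_cols // out_cols, c0 + 1)
            on = any(binary[r][c] for r in range(r0, r1) for c in range(c0, c1))
            line.append(1 if on else 0)
        out.append(line)
    return out
```

A grey-level image given as a list of rows with values between 0 and 255 would be processed as normalize(crop(binarize(image))), which always yields a 15 x 10 matrix whatever the original resolution.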
The database contains the matrix describing the normalized image (as explained above) and a vector of 21 components, one for each letter of the alphabet; the i-th component of the vector takes a nonzero value if the letter under examination is the i-th letter of the alphabet, and 0 otherwise.
Moreover, three probes on each side of the letter (see Figure 4) measure the distance between the boundary and the first black pixel in the image; this adds some information on the shape of the letter to the database.
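The probes can be illustrated with a short Python sketch. The positions of the three probes along each side are not stated in the text, so placing them at one quarter, one half and three quarters of the side is an assumption, as are the function name and the convention that an empty line returns the full width or height.

```python
def side_probes(binary, fractions=(0.25, 0.5, 0.75)):
    """Distance from each side of a 0/1 image to the first black pixel.

    Three probes per side are used; their positions along the side
    (the fractions argument) are an assumption of this sketch.
    """
    n_rows, n_cols = len(binary), len(binary[0])
    rows = [min(int(f * n_rows), n_rows - 1) for f in fractions]
    cols = [min(int(f * n_cols), n_cols - 1) for f in fractions]

    def first_on(values):
        # Index of the first black pixel, or the full length if none is found.
        for distance, value in enumerate(values):
            if value:
                return distance
        return len(values)

    left = [first_on(binary[r]) for r in rows]
    right = [first_on(binary[r][::-1]) for r in rows]
    top = [first_on([binary[r][c] for r in range(n_rows)]) for c in cols]
    bottom = [first_on([binary[r][c] for r in range(n_rows - 1, -1, -1)])
              for c in cols]
    return {"left": left, "right": right, "top": top, "bottom": bottom}
```

For a schematic letter L drawn on a 15 x 10 grid (first column and last row switched on), the left and bottom probes return 0 while the right and top probes return large values, which is the kind of shape information the pixel matrix alone does not emphasize.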
The resulting database can be loaded into modeFRONTIER by following the wizard step by step. The variable names are loaded automatically, which simplifies the user's work.

Figure 4: Three probes are used on each side of the letter to measure the distance between the boundary and the first black pixel in the image. These probes give important information on the shape of the letter, which can in some cases be decisive for correctly identifying the letter under examination. For the sake of clarity, some white borders have been left around the letters in the picture; the adopted procedure applies the probes to a cropped image which has no white space around the letter.



The SOM approach

As explained above, the database used for the learning phase consists of three contributions. The first is given by 15 x 10 (=150) variables which take the value 0 or 1 according to the status of the corresponding image pixel. In this way, the distance between two images that are identical except for a single pixel is exactly 1. Then, there are 3 x 4 (=12) columns, one for each probe, which range between 0 and 50: the value given by the probe, i.e. the distance between the image boundary and the first black pixel, is multiplied by 50 and then divided by 10 or 15 according to the direction of the probe (horizontal in the first case, vertical in the second). This means that a difference in a probe counts, at most, 50 times more than a change in a single pixel. The last contribution is given by 21 columns, one for each letter of the alphabet, which take the value 50 if the letter under examination corresponds to the column and 0 otherwise. This allows a unique identification of the letters in the database, and it is fundamental during the learning phase.
Obviously, this last contribution does not appear in the dataset used to test the OCR, which should itself tell us which letter we are looking at.
The different weights are fundamental to obtaining an efficient OCR, and they have to be chosen with some care.
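The weighting scheme can be made concrete with a small Python sketch that assembles one database row of 150 + 12 + 21 = 183 columns. The function names, the column order and the dictionary layout of the probes are assumptions made for illustration; the weights (1 for pixels, 50 for probes and letter flags) and the Euclidean distance follow the description above.

```python
import math

ITALIAN_ALPHABET = "ABCDEFGHILMNOPQRSTUVZ"  # the 21 letters of the Italian alphabet
WEIGHT = 50.0  # weight applied to probes and letter flags


def build_row(pixels, probes, letter=None):
    """Assemble one database row.

    pixels: 15 x 10 matrix of 0/1 values.
    probes: dict with "left", "right", "top", "bottom" lists of 3 distances
            (hypothetical layout chosen for this sketch).
    letter: the known letter for a training row, None for a test row.
    """
    n_rows, n_cols = len(pixels), len(pixels[0])
    row = [float(value) for line in pixels for value in line]
    # Horizontal probes are divided by the number of columns (10).
    for side in ("left", "right"):
        row += [d * WEIGHT / n_cols for d in probes[side]]
    # Vertical probes are divided by the number of rows (15).
    for side in ("top", "bottom"):
        row += [d * WEIGHT / n_rows for d in probes[side]]
    # Letter flags are present only in the training set.
    if letter is not None:
        row += [WEIGHT if letter == c else 0.0 for c in ITALIAN_ALPHABET]
    return row


def distance(a, b):
    """Euclidean distance between two rows of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With these weights, two rows that differ in a single pixel are at distance 1, while two rows that carry different letter flags are at least 50 times the square root of 2 apart, which is what drives the separation of the letters in the map.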
Once the dataset has been built, a self-organizing map (SOM) can be easily built by following the wizard step by step. Obviously, the database values must not be scaled, so as not to nullify the effect of the weights described above.
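For readers who want to see what a sequential SOM does with such a dataset, the following Python sketch trains a small map one sample at a time. It is not the modeFRONTIER implementation: the rectangular grid, the Gaussian neighborhood, the linear decay of learning rate and radius, and all parameter values are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(codebook, sample):
    """Grid position (row, col) of the neuron closest to the sample."""
    best, best_d2 = None, float("inf")
    for r, line in enumerate(codebook):
        for c, weights in enumerate(line):
            d2 = sum((x - y) ** 2 for x, y in zip(sample, weights))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(data, grid_rows, grid_cols, epochs=20, lr0=0.5, seed=0):
    """Train a rectangular SOM sequentially and return its codebook.

    The data are used as they are, without any scaling, so that the
    weights built into the columns keep their effect.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Each neuron starts from a randomly chosen training sample.
    codebook = [[list(rng.choice(data)) for _ in range(grid_cols)]
                for _ in range(grid_rows)]
    radius0 = max(grid_rows, grid_cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for index in order:
            sample = data[index]
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)                    # decaying learning rate
            radius = max(radius0 * (1.0 - progress), 0.5)  # shrinking neighborhood
            br, bc = best_matching_unit(codebook, sample)
            for r in range(grid_rows):
                for c in range(grid_cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = lr * math.exp(-grid_d2 / (2.0 * radius ** 2))
                    weights = codebook[r][c]
                    for k in range(dim):
                        weights[k] += h * (sample[k] - weights[k])
            step += 1
    return codebook
```

Each sample pulls its best matching neuron, and to a lesser extent the neighboring neurons, towards itself; samples with different letter flags are far apart in the unscaled space and therefore end up in different regions of the map.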
Table 1 collects the setup used for the construction of the sequential SOM applied in this work.

Table 1: The sequential SOM setup used in this work.



Other choices are possible but, provided that “reasonable” values are chosen, the final result is not greatly affected. Once the SOM training has been completed, it is possible to look at the D-matrix, as shown in Figure 5. A clustering of the data is clearly present, as expected and desired, and it is primarily due to the weights adopted for the dataset components. If a new SOM were built with range scaling of the data, this clustering would disappear.

Figure 5: In the D-matrix the mean distance between neurons in the map is plotted. In this case 21 groups of data corresponding to the letters are clearly visible. The square sides are proportional to the number of designs (letters) that have been “captured” by the neuron.



The sides of the squares superimposed on the D-matrix graph are proportional to the number of designs captured by each neuron during the learning phase.
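A mean-distance map of this kind can be computed from a trained codebook as in the Python sketch below. The rectangular grid with four neighbors per neuron and the function name are assumptions; the map layout actually used by modeFRONTIER may differ.

```python
import math


def d_matrix(codebook):
    """Mean Euclidean distance between each neuron and its grid neighbors.

    codebook[r][c] is the weight vector of the neuron at row r, column c.
    A rectangular grid with up to four neighbors is assumed.
    """
    n_rows, n_cols = len(codebook), len(codebook[0])
    result = []
    for r in range(n_rows):
        line = []
        for c in range(n_cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    d2 = sum((x - y) ** 2
                             for x, y in zip(codebook[r][c], codebook[rr][cc]))
                    dists.append(math.sqrt(d2))
            line.append(sum(dists) / len(dists) if dists else 0.0)
        result.append(line)
    return result
```

Low values mark neurons that sit inside a cluster, while high values mark the borders between clusters, which is why groups of letters show up as separate regions in such a plot.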
In Figure 6, the organization of the designs into 21 groups is made more evident by means of colors and white circles which delimit the area of each letter. It is interesting to note that some letters which look similar (such as E and F, or C and O) fall far apart in the SOM; this means that the probes are effective and that the difference between such letters, even when small, is captured. The same cannot be said for the letters U and V, which fall very close to one another: the probes cannot efficiently register the difference between these two letters, and this is a possible cause of recognition failures.

Figure 6: The database components pertaining to the letters have been all plotted on the net using different colors. The clusterization of data, as shown in Figure 5, is here highlighted with some white circles.

 


The letter recognition

In this section, we present the results obtained using a free OCR software (freeOCR, see [3]) and the SOM, built as described above, applied as a predictive tool. Two different alphabets not used for the SOM learning (TEST 1 and TEST 2, shown in Figure 10) have been adopted to build two incomplete datasets: the third contribution, corresponding to the 21 columns taking the value 0 or 50 as described above, has not been included.
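Using a SOM as a predictive tool on an incomplete row can be sketched as follows in Python. The sketch assumes that each neuron stores the known columns first (pixels and probes) followed by the 21 letter columns, that the best matching unit is searched on the known columns only, and that the proposed letter is the one with the largest letter component in that neuron; how modeFRONTIER performs this step internally is not described in the text.

```python
ITALIAN_ALPHABET = "ABCDEFGHILMNOPQRSTUVZ"  # the 21 letters of the Italian alphabet


def predict_letter(codebook, known):
    """Propose a letter for an incomplete row.

    codebook[r][c] is a weight vector whose first len(known) components
    match the known columns and whose next 21 components are the letter
    columns (assumed layout).
    """
    n_known = len(known)
    best, best_d2 = None, float("inf")
    for line in codebook:
        for weights in line:
            # Compare only the columns that are present in the test row.
            d2 = sum((x - y) ** 2 for x, y in zip(known, weights[:n_known]))
            if d2 < best_d2:
                best, best_d2 = weights, d2
    labels = best[n_known:n_known + len(ITALIAN_ALPHABET)]
    winner = max(range(len(labels)), key=labels.__getitem__)
    return ITALIAN_ALPHABET[winner]
```

In a real run the known part of each row would have 162 columns (150 pixels and 12 probes), and the codebook would come from a map trained on the complete 183-column database.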


Figure 7: Some letters for which the recognition process fails.

Figure 8: The freeOCR software used to test the effectiveness of the OCR system implemented in modeFRONTIER. Specifically, a letter M is tested on the left and the software correctly recognizes it on the right.

Figure 9: The SOM can be used as a predictive tool; in this case an incomplete dataset (TEST 1) has been loaded and the SOM is asked to guess the letter corresponding to a given row of the dataset. In the picture, the first record of the dataset corresponds to an A; the BMU (best matching unit, right) falls in the zone of the net corresponding to the letter A (left).

Figure 10: The TEST 1 and TEST 2 alphabets used to test the character recognition based on the SOM.


In Table 2, a comparison between the results obtained with these two approaches is reported. When the recognition fails, the letter or letters proposed by the algorithm are reported. Both approaches have difficulty recognizing some letters, although the answers given cannot be considered complete failures. For example, if the Q of TEST 1 is considered, as shown in Figure 10, it can be immediately seen that it has a certain affinity with a G and that it could be divided into two letters, a C and a smaller L. freeOCR, for its part, fails to recognize the F of TEST 1, while the V of TEST 2 is not recognized as a letter but rather as two characters (\|), which nevertheless look like a V.

Table 2: A comparison between the results obtained with the SOM approach proposed in this work and those obtained with the freeOCR software. For TEST 1, an alphabet not used during the learning phase has been adopted, while for TEST 2 a real handwritten alphabet has been used. When the recognition fails, the letters proposed by the algorithm are reported in brackets.



Conclusions
In this paper we have shown how some tools of modeFRONTIER can be used to build a simple but efficient letter-by-letter OCR system. This naïve system has been compared with a free software downloadable from the internet, and the results are good: as Table 2 shows, both systems have difficulty with only some letters of the two test alphabets.
As mentioned, a high recognition rate can be reached only with a sophisticated text preprocessing phase; this goes beyond the scope of this work, and the interested reader is therefore referred to the following literature for more details.


References

  1. Gosselin, Bernard (1995), Application de réseaux de neurones artificiels à la reconnaissance automatique de caractères manuscrits, Ph.D. thesis, Faculté Polytechnique de Mons.
  2. Vuori, Vuokko (2002), Adaptive methods for on-line recognition of isolated hand-written characters, Acta Polytechnica Scandinavica, Mathematics and Computing Series No. 119, Espoo 2002, 93 pp.
  3. The freeOCR software has been downloaded from http://softi.co.uk/freeocr.htm
  4. The handwritten fonts used in this work for the training phase have been downloaded from http://www.1001freefonts.com/handwriting-fonts.php


For more information about this article:
info@enginsoft.it
