Recognizing Typeset Documents using Walsh Transformation

Attila Fazekas, András Hajdu

Abstract


In this paper we present an effective character recognition algorithm, which can be applied mainly to typeset documents. Our aim was to compose a character recognition algorithm, which can be used to recognize simple typeset documents in a fast and reliable way. To get a good result by this algorithm the input text document should contain characters from the same character set with a small number of symbols. This condition does not mean a strong restriction as the documents in practice usually have this property. The main character recognition part of the algorithm is based on the Walsh transformation, which gives a verbose description about the image, like the symmetrical relations, placement of the foreground and background pixels, and so on. That is why we tried to apply it to recognize characters, and the algorithm proved to be fairly efficient and reliable for simple documents, since the feature vectors extracted by Walsh transformation can be well distinguished. Moreover, our method had very good results in tolerating different types of noise corruption.

Full Text:

PDF


DOI: https://doi.org/10.2498/cit.2001.02.01

Creative Commons License
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Crossref Similarity Check logo

Crossref logologo_doaj

 Hrvatski arhiv weba logo