Skew Detection Technique for Binary Document Images based on Hough Transform

Document image processing has become an increasingly important technology in the automation of office documentation tasks. During document scanning, skew is inevitably introduced into the incoming document image. Since the algorithms for layout analysis and character recognition are generally very sensitive to page skew, skew detection and correction are critical steps that must precede layout analysis. In this paper, a novel skew detection method is presented for binary document images. The method selects some characters of the text, subjects them to thinning, and applies the Hough transform to the result to estimate the skew angle accurately. To assess the robustness of the proposed method, several experiments have been conducted on various types of documents: English documents, journals, textbooks, documents in different languages, documents with different fonts, and documents scanned at different resolutions. The experimental results show that the proposed method is accurate in comparison with well-known existing methods.
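The Hough-transform step mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the character selection and thinning have already produced a list of (x, y) foreground points, and the function name `estimate_skew_hough`, the search range `max_angle`, and the resolution `step` are hypothetical choices made for this example. For each candidate angle, every point votes for the offset of the line through it at that angle, and the angle whose accumulator has the strongest peak is returned.

```python
import math
from collections import Counter


def estimate_skew_hough(points, max_angle=15.0, step=0.1):
    """Return the candidate angle (degrees) with the strongest Hough peak.

    points    : iterable of (x, y) foreground coordinates (assumed to come
                from an earlier selection/thinning stage, not shown here)
    max_angle : search range is [-max_angle, +max_angle] degrees (assumption)
    step      : angular resolution in degrees (assumption)
    """
    points = list(points)
    if not points:
        raise ValueError("at least one point is required")

    best_angle, best_votes = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        # A line inclined at `angle` satisfies y*cos(t) - x*sin(t) = rho.
        # Points on the same text line share rho only at the true skew.
        acc = Counter(round(y * c - x * s) for x, y in points)
        votes = max(acc.values())
        if votes > best_votes:
            best_angle, best_votes = angle, votes
    return best_angle


# Usage: four synthetic text lines skewed by 3 degrees.
slope = math.tan(math.radians(3.0))
pts = [(x, y0 + x * slope) for y0 in (0, 50, 100, 150) for x in range(0, 200, 5)]
print(estimate_skew_hough(pts))
```

Because the accumulator is quantized to whole pixels, neighbouring angles can tie with the true one, so the estimate is only as fine as the point spread and `step` allow. The sign of the returned angle depends on whether the y axis of the input points up or down.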



