Recognition of Grocery Products in Images Captured by Cellular Phones

In this paper, we present a robust algorithm for recognizing text extracted from grocery product images captured by mobile phone cameras. Recognizing such text is challenging because it varies in size, orientation, style, and illumination, and can suffer from perspective distortion. Pre-processing is performed to make the characters scale and rotation invariant. Since the text degradations cannot be adequately described by well-known geometric transformations such as translation, rotation, affine transformation, and shearing, we use all black pixels of each character as the feature vector. Classification is performed with a minimum distance classifier under the maximum likelihood criterion, which delivers a promising Character Recognition Rate (CRR) of 89%. We achieve a considerably higher Word Recognition Rate (WRR) of 99% when lower-level linguistic knowledge about product words is used during the recognition process.
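As a minimal sketch of the pipeline described above, the Python fragment below shows a minimum distance classifier over size-normalized binary character images, together with a simple lexicon-based word correction step standing in for the linguistic post-processing. Under a Gaussian class model with equal, isotropic covariances, the maximum likelihood decision reduces to choosing the class whose mean feature vector is closest in Euclidean distance. The names (normalize_character, MinimumDistanceClassifier, correct_word), the fixed 16x16 grid, and the difflib-based matching are illustrative assumptions, not the authors' exact procedure.

```python
import difflib
import numpy as np


def normalize_character(char_img, size=16):
    """Scale-normalize a segmented binary character to a fixed grid.

    char_img is assumed to be a 2-D 0/1 array holding one character;
    rotation normalization (e.g. deskewing along the principal axis)
    would be applied beforehand and is omitted here.
    """
    rows = np.linspace(0, char_img.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, char_img.shape[1] - 1, size).astype(int)
    # Nearest-neighbour resampling; the flattened black-pixel map is
    # used directly as the feature vector.
    return char_img[np.ix_(rows, cols)].astype(float).ravel()


class MinimumDistanceClassifier:
    """Nearest-class-mean classifier over raw pixel feature vectors.

    With Gaussian classes of equal isotropic covariance, the
    maximum-likelihood decision is the class with the closest mean.
    """

    def fit(self, features, labels):
        self.labels_ = sorted(set(labels))
        self.means_ = np.stack([
            np.mean([f for f, l in zip(features, labels) if l == lab], axis=0)
            for lab in self.labels_
        ])
        return self

    def predict(self, feature):
        dists = np.linalg.norm(self.means_ - feature, axis=1)
        return self.labels_[int(np.argmin(dists))]


def correct_word(raw_word, product_lexicon):
    """Snap a recognized word to the closest lexicon entry, standing in
    for the word-level linguistic knowledge used in the paper."""
    match = difflib.get_close_matches(raw_word, product_lexicon, n=1, cutoff=0.0)
    return match[0] if match else raw_word
```

In use, each segmented character would be classified with classifier.predict(normalize_character(img)), and the assembled word then corrected with correct_word(word, lexicon) against a list of known product words.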




