Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian

The paper presents combined automatic speech
recognition (ASR) of English and machine translation (MT) for the
English-Croatian and Croatian-English language pairs in the
domain of business correspondence. The first part presents the results of
training a commercial ASR system on English data sets, supplemented
by an error analysis. The second part presents the results of machine
translation performed by a free online tool for the English-Croatian
and Croatian-English language pairs. Human evaluation in terms of
usability is conducted, its internal consistency is measured with
Cronbach's alpha coefficient, and the results are supplemented by an
error analysis. Automatic evaluation is performed with the WER (Word
Error Rate) and PER (Position-independent word Error Rate) metrics,
followed by an investigation of Pearson's correlation with the human
evaluation.
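
To make the two automatic metrics concrete, the following is a minimal Python sketch, not the evaluation code used in the paper. It assumes whitespace tokenization, case-sensitive matching, and one common formulation of each metric: WER as the word-level Levenshtein distance divided by the reference length, and PER as a bag-of-words comparison that ignores word order. The function names `wer` and `per` and the example sentences are hypothetical.

```python
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference length.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic programming over one row at a time.
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent word Error Rate (bag-of-words formulation).

    Assumption: PER = (max(|ref|, |hyp|) - matches) / |ref|, where
    matches is the size of the multiset intersection of the two
    word bags. Other variants of PER exist.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: same words, different order.
    reference = "please send the invoice today"
    hypothesis = "please the invoice send today"
    print("WER:", wer(reference, hypothesis))
    print("PER:", per(reference, hypothesis))
```

In this example the hypothesis contains exactly the reference words in a different order, so PER is zero while WER is not. The gap between the two values gives a rough indication of how much of the error is due to word order rather than word choice.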




