A Black-box Approach for Response Quality Evaluation of Conversational Agent Systems

The evaluation of conversational agents, or chatterbot question answering systems, is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available. However, as chatterbots became more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions while still achieving high-quality responses. This paper discusses the inappropriateness of the existing measures for response quality evaluation and brings forward the call for new standard measures and related considerations. As a short-term solution for evaluating the response quality of conversational agents, and to demonstrate the challenges in evaluating systems of different natures, this research proposes a black-box approach using observation, a classification scheme and a scoring mechanism to assess and rank three example systems: AnswerBus, START and AINI.




