Response Quality Evaluation in Heterogeneous Question Answering System: A Black-box Approach

The evaluation of question answering systems is a major research area that needs much attention. Before the rise of domain-oriented question answering systems based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available. However, as question answering systems became more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions and, at the same time, to achieve higher-quality responses. This paper discusses the inappropriateness of existing measures for response quality evaluation and then puts forward the call for new standard measures and the related considerations. As a short-term solution for evaluating the response quality of heterogeneous systems, and to demonstrate the challenges in evaluating systems of different natures, this research presents a black-box approach that uses observation, a classification scheme and a scoring mechanism to assess and rank three example systems (i.e. AnswerBus, START and NaLURI).




