How Valid Are Our Language Test Interpretations? A Demonstrative Example

Validity is an overriding consideration in language testing. If a test score is intended for a particular purpose, that use must be supported by empirical evidence. This article addresses the validity of a multiple-choice achievement test (MCT) that is administered at the end of each semester to make decisions about students' mastery of a course in general English. To provide empirical evidence pertaining to the validity of this test, two criterion measures were used: a Cloze test and a C-test, both of which are reported to gauge general English proficiency. The analyses show statistically significant correlations among participants' scores on the MCT, the Cloze test, and the C-test. On the basis of these findings, it can be cautiously inferred that the three tests measure the same underlying trait. However, given the limitations of using criterion measures to validate tests, no absolute claim can be made about the validity of the MCT.




