Development of a Rating Scale for Elementary EFL Writing

In EFL programs, rating scales used in writing
assessment are often constructed by intuition. Intuition-based scales
tend to provide inaccurate and divisive ratings of learners’ writing
performance. Hence, following an empirical approach, this study
attempted to develop a rating scale for elementary-level writing at an
EFL program in Saudi Arabia. Towards this goal, 98 students’ essays
were scored and then coded using a comprehensive taxonomy of
writing constructs and their measures. Automatic linear modeling
was run to identify which measures would best predict essay scores.
A nonparametric ANOVA, the Kruskal-Wallis test, was then used to
determine which measures could best differentiate among scoring
levels. Findings indicated that certain measures served as good
predictors of essay scores, as differentiators among scoring levels,
or as both. The main conclusion was that a rating
scale can be empirically developed using predictive and
discriminative statistical tests.
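
As an illustration of the discriminative step, the Kruskal-Wallis H statistic can be computed as in the minimal sketch below. This is not the study's own analysis: the function name, the measure (words per essay), and the three groups of values are hypothetical and chosen only to show how one measure is compared across scoring levels.

```python
from itertools import chain


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)

    # Assign average (1-based) ranks to tied values and accumulate the tie term.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i  # number of observations sharing this value
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j

    # H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1)
    rank_sum_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)

    # Correct for ties; the factor is zero only if every value is identical.
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical values of one measure (words per essay) at three scoring levels.
low = [42, 55, 48, 51]
mid = [60, 67, 58, 72]
high = [85, 79, 91, 88]

h = kruskal_wallis_h([low, mid, high])
print(round(h, 3))  # 9.846
```

With three groups, H is compared against a chi-square distribution with 2 degrees of freedom, whose critical value at the .05 level is 5.991. A measure whose H exceeds that value would count as differentiating among scoring levels, although the chi-square approximation is rough for groups this small.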




