Validity Evaluation

Validity evaluation in foreign language assessment:
Understanding and improving test use in the
Georgetown University German Department

John M. Norris

Northern Arizona University and
Center for Advanced Study of Language, University of Maryland
German Research Seminar, Georgetown University

October 16, 2003

1. Introduction: The challenge of assessment in college FL education

· Many reasons, pressures, traditions for assessment
· Many problems for college FL assessment: expertise, relevance, purpose, use, usefulness? (Norris & Pfeiffer, 2003)
· Translates into difficulties for development, use, and validation: guidelines, priorities, feasibility?
· GUGD approach

* align assessment expertise with curricular expertise
* develop assessment according to intended uses
* evaluate assessment according to intended uses

2. Setting the stage: Specification of intended test use and development

· Curricular/instructional innovation, but lack of assessment alignment (Byrnes & Kord, 2001; Pfeiffer, 2002)
· Specification of intended test use: who, what, why, impact? (Norris, 2000)
· Initial assessment priorities

* placement exam
* task-based writing assessment
* external proficiency assessment

a. GUGD Placement Exam (Norris, forthcoming)

* short-cut estimate for quick/trustworthy placement decisions
* considered variety of possibilities
* agreed on C-test + Listening + Reading comprehension tests

b. Task-based writing assessment (Byrnes, 2002)

* prototypical performances for understanding/improving student learning at each curricular level
* prioritization of writing initially, given literacy focus of curriculum
* task + content + language focus

3. Validity evaluation

· Received view: Focus on construct validity of tests as scientific measures (Messick, 1989; AERA, APA, NCME, 1999)
· Validity evaluation: Focus on provision of useful information to particular audiences for informing test improvement (Cronbach, 1969; Kane, 2001; Shepard, 1993)

a. C-test validity evaluation:

* instrument effectiveness,
* alignment with curriculum,
* cut-score accuracy,
* scoring reliability,
* teacher/student perceptions and understandings,
* communication,
* placement-achievement relationship,
* placement-language background relationship

The above are all used for assessment improvement

b. Task-based writing assessment evaluation:

* reliability in assessing writing performance
* transactional development/evaluation
* baseline task comparisons
* ETC...(ongoing)

The above are all used for assessment improvement

4. Conclusion

· Integration of development-use-evaluation via focus on intended use
· Assessment as educational process rather than measurement
· Alignment of assessment methods with purposes
· Use of program evaluation models for validity evaluation
· Transformative potential of assessment validity evaluation

References and a few related sources

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Bachman, Lyle F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, Lyle F., & Palmer, Adrian S. (1981). A multitrait-multimethod investigation into the construct validity of six tests of speaking and reading. In A. S. Palmer, P. J. M. Groot, & G. A. Trosper (Eds.), The construct validation of tests of communicative competence (pp. 149-165). Washington, DC: TESOL.

Bachman, Lyle F., & Palmer, Adrian S. (1996). Language testing in practice. Oxford: Oxford University Press.

Barnes, Betsy, Klee, Carol, & Wakefield, Ray. (1990). A funny thing happened on the way to the language requirement. ADFL Bulletin, 22(1), 35-39.

Brennan, Robert L. (2001). Some problems, pitfalls, and paradoxes in educational measurement. Educational Measurement: Issues and Practice, 20(4), 6-18.

Brown, James D., & Hudson, Thom. (2002). Criterion-referenced language testing. New York: Cambridge University Press.

Byrnes, Heidi. (1998). Constructing curricula in collegiate foreign language departments. In H. Byrnes (Ed.), Learning foreign and second languages: Perspectives in research and scholarship (pp. 262-295). New York: The Modern Language Association.

Byrnes, Heidi. (2002). The role of task and task-based assessment in a content-oriented collegiate foreign language curriculum. Language Testing, 19, 419-437.

Byrnes, Heidi, & Kord, Susanne. (2001). Developing literacy and literary competence: Challenges for FL departments. In V. Scott & H. Tucker (Eds.), SLA and the literature classroom: Fostering dialogues (pp. 31-69). Boston: Heinle & Heinle.

Center for Advanced Research on Language Acquisition. (2003). Minnesota language proficiency assessments. Retrieved July 24, 2003 from

Chapelle, Carol. (1999). Validity in language assessment. Annual Review of Applied Linguistics, 19, 254-272.

Cronbach, Lee J. (1969). Validation of educational measures. In Proceedings of the 1969 Invitational Conference on Testing Problems: Toward a theory of achievement measurement (pp. 35-52). Princeton, NJ: Educational Testing Service.

Eldridge, Marlene H. (1999). The German undergraduate foreign language placement process: A national survey of procedures. Unpublished doctoral dissertation. State University of New York at Stony Brook.

Georgetown University German Department. (2003). Developing multiple literacies. Retrieved April 01, 2003 from

Grotjahn, Rüdiger. (1987). How to construct and evaluate a C-Test: A discussion of some problems and some statistical analyses. In R. Grotjahn, C. Klein-Braley, & D. K. Stevenson (Eds.), Taking their measure: The validity and validation of language tests (pp. 219-254). Bochum, Germany: Brockmeyer.

Grotjahn, Rüdiger. (1992a). Der C-Test. Einleitende Bemerkungen. In R. Grotjahn (Ed.), Der C-Test: Theoretische Grundlagen und praktische Anwendungen (Vol. 1, pp. 1-18). Bochum, Germany: Brockmeyer.

Grotjahn, Rüdiger, Klein-Braley, Christine, & Raatz, Ulrich. (1992). C-Tests in der praktischen Anwendung. Erfahrungen beim Bundeswettbewerb Fremdsprachen. In R. Grotjahn (Ed.), Der C-Test: Theoretische Grundlagen und praktische Anwendungen (Vol. 1, pp. 263-296). Bochum, Germany: Brockmeyer.

Jakschik, Gerhard. (1994). Der C-Test für Erwachsene Zweitsprachler als Einstufungsinstrument bei der Schulausbildung. In R. Grotjahn (Ed.), Der C-Test: Theoretische Grundlagen und praktische Anwendungen (Vol. 2, pp. 259-278). Bochum, Germany: Brockmeyer.

Kane, Michael T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319-342.

Klein-Braley, Christine. (1997). C-Tests in the context of reduced redundancy testing: An appraisal. Language Testing, 14(1), 47-84.

Köberl, Johann, & Sigott, Günther. (1994). Adjusting C-test difficulty in German. In R. Grotjahn (Ed.), Der C-Test: Theoretische Grundlagen und praktische Anwendungen (Vol. 2, pp. 179-192). Bochum, Germany: Brockmeyer.

Linacre, John M. (1998). FACETS computer program for many faceted Rasch measurement. Chicago: Mesa Press.

Lynch, Brian K. (1996). Language program evaluation: Theory and practice. New York: Cambridge University Press.

Messick, Samuel J. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). New York: American Council on Education and Macmillan.

Moss, Pamela A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62(3), 229-258.

Norris, John M. (2000). Purposeful language assessment. English Teaching Forum, 38(1), 18-23.

Norris, John M. (2003). Validity evaluation in foreign language assessment. Unpublished doctoral dissertation. Honolulu, HI: University of Hawaii at Manoa.

Norris, John M. (forthcoming). Development and evaluation of a curriculum-based German C-test for placement purposes. In R. Grotjahn (Ed.), Der C-Test: Theoretische Grundlagen und praktische Anwendungen (Vol. 5). Bochum, Germany: Brockmeyer.

Norris, John M., & Pfeiffer, Peter. (2003). Exploring the uses and usefulness of ACTFL Guidelines oral proficiency ratings and standards in college foreign language departments. Foreign Language Annals.

Norton, Bonnie. (2000). Writing assessment: Language, meaning, and marking memoranda. In A. J. Kunnan (Ed.), Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida (pp. 20-29). New York: Cambridge University Press.

Patton, Michael Q. (1997). Utilization-focused evaluation: The new century text (3rd ed.). Thousand Oaks, CA: SAGE Publications, Inc.

Pfeiffer, Peter. (2002). Preparing graduate students to teach language and literature in a foreign language department. ADFL Bulletin, 34, 11-14.

Popham, William J. (2000). Modern educational measurement: Practical guidelines for educational leaders (3rd ed.). Boston: Allyn & Bacon.

Shavelson, Richard, & Huang, Liu. (2003). Responding responsibly to the frenzy to assess learning in higher education. Change, 35(1), 10-19.

Shepard, Lorrie A. (1993). Evaluating test validity. Review of Research in Education, 19, 405-450.

Stokes, Gale. (2002). Guidelines for foreign language and literature teaching responsibilities. Retrieved April 01, 2003 from

van den Branden, Kris, DePauw, Veerle, & Gysen, Sara. (2002). A computerized task-based test of second language Dutch for vocational training purposes. Language Testing, 19(4), 438-452.

For further information, please contact me via e-mail at: