Validity Evaluation

Validity evaluation in foreign language assessment:
Understanding and improving test use in the
Georgetown University German Department

John M. Norris

Northern Arizona University and
Center for Advanced Study of Language, University of Maryland
German Research Seminar, Georgetown University

October 16, 2003

1. Introduction: The challenge of assessment in college FL education

· Many reasons, pressures, traditions for assessment
· Many problems for college FL assessment: expertise, relevance, purpose, use, usefulness? (Norris & Pfeiffer, 2003)
· Translates into difficulties for development, use, and validation: guidelines, priorities, feasibility?
· GUGD approach

* align assessment expertise with curricular expertise
* develop assessment according to intended uses
* evaluate assessment according to intended uses

2. Setting the stage: Specification of intended test use and development

· Curricular/instructional innovation, but lack of assessment alignment (Byrnes & Kord, 2001; Pfeiffer, 2002)
· Specification of intended test use: who, what, why, impact? (Norris, 2000)
· Initial assessment priorities

* placement exam
* task-based writing assessment
* external proficiency assessment

a. GUGD Placement Exam (Norris, forthcoming)

* short-cut estimate for quick/trustworthy placement decisions
* considered variety of possibilities
* agreed on C-test + Listening + Reading comprehension tests

b. Task-based writing assessment (Byrnes, 2002)

* prototypical performances for understanding/improving student learning at each curricular level
* prioritization of writing initially, given literacy focus of curriculum
* task + content + language focus

3. Validity evaluation

· Received view: Focus on construct validity of tests as scientific measures (Messick, 1989; AERA, APA, NCME, 1999)
· Validity evaluation: Focus on provision of useful information to particular audiences for informing test improvement (Cronbach, 1969; Kane, 2001; Shepard, 1993)

a. C-test validity evaluation:

* instrument effectiveness,
* alignment with curriculum,
* cut-score accuracy,
* scoring reliability,
* teacher/student perceptions and understandings,
* communication,
* placement-achievement relationship,
* placement-language background relationship

The above are all used for assessment improvement

b. Task-based writing assessment evaluation:

* reliability in assessing writing performance
* transactional development/evaluation
* baseline task comparisons
* etc. (ongoing)

The above are all used for assessment improvement

4. Conclusion

· Integration of development-use-evaluation via focus on intended use
· Assessment as educational process rather than measurement
· Alignment of assessment methods with purposes
· Use of program evaluation models for validity evaluation
· Transformative potential of assessment validity evaluation

