Academic Year 2019: Theory of Assessment in Intercultural Language Education
Chapter 1
Mediating Assessment Innovation: Why Stakeholder Perspectives Matter
1.4 Assessment Validation
1.4.1 Fundamental Considerations
- assessment is the basis for the evolution
of teaching and learning processes
Assessment Stakes
- low-stakes assessment has little effect on stakeholders and the teaching/learning process = small consequences
- high-stakes assessment (where negative consequences such as 'failing the grade' are possible) has a great effect on stakeholders
- As Shohamy (2001b) explains, assessment
results have substantial social consequences. High-stakes decisions
(winner/loser, acceptance/rejection) are usually made based on limited media (grades,
marks, percentages or comments).
- Tests are seen as an unpleasant experience: a source of anger, pressure, competition and humiliation. Test takers often do not feel that real knowledge is being tested, and are not given a proper explanation of why tests are important. Learning is supposed to be fun and rewarding, so testing feels like betrayal (Shohamy, 2007).
- It is important for assessment results to closely represent students' abilities.
- In foreign language education, a test should be an opportunity for its takers to present their "best" performance. There is also a need for accurate discrimination between different levels.
1.4.2 The Contributions of Assessment
Score Evidence to a Validity Argument
- To test the validity and reliability of newly proposed assessments, a psychometric model, based on psychology, can be applied.
- construct validity = agreement
between a test score or measure and the quality it is believed to measure (Do
the scores of a test tell us how well students are able to perform in a certain
activity?)
- reliability: how scores are awarded and whether the assessment process is consistent
  - parallel forms: comparison with a different assessment concerning the same construct
  - test-retest reliability: will the results differ if the same assessment is completed at a different time?
- fairness: the construct is clearly defined,
meaningfully operationalized and the scores are reliable.
- If we guarantee the fairness of a test, high-stakes decisions become more beneficial and can be made with peace of mind.
- Do performance outcomes alone provide sufficient evidence for the evaluation of fairness?
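The two reliability checks listed in this section (parallel forms and test-retest) can be illustrated numerically. The following is only a minimal sketch: it assumes that reliability is estimated as the Pearson correlation between two sets of scores from the same students, which is a common convention but is not stated in the chapter. The function name `pearson` and all score lists are hypothetical and invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the two means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five students (invented data)
first_sitting = [12, 15, 9, 18, 14]    # assessment A, first administration
second_sitting = [13, 14, 10, 18, 15]  # assessment A, repeated later
parallel_form = [11, 16, 8, 17, 15]    # assessment B, same construct

# Test-retest: same assessment, different time
test_retest = pearson(first_sitting, second_sitting)
# Parallel forms: different assessment, same construct
parallel_forms = pearson(first_sitting, parallel_form)

print(f"test-retest reliability:    {test_retest:.2f}")
print(f"parallel-forms reliability: {parallel_forms:.2f}")
```

A coefficient close to 1 suggests that students are ranked consistently across the two sets of scores. As the section's closing question implies, such a figure describes score consistency only and says nothing by itself about fairness or consequences.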
1.4.3 The Limitations of Assessment Score
Evidence to a Validity Argument
- assessment in practice is often
unpredictable and frequent innovation of assessment can cause frustration to
teachers.
- Norris (2008) argues that assessment
validity evaluation requires more than a small sample of data. McNamara and Roever
(2006) criticize the psychometric approach as being too disconnected from the
reality of "far-reaching and unanticipated social consequences."
- In high-stakes decisions, ranking or grading (abbreviating results into a numerical score) is unavoidable, so the validity of ranking/grading will not be discussed.
- McNamara and Roever (2006) and others suggest that performance scores are not sufficient evidence for the validity of an assessment.
1.4.4 Towards a Broader Understanding of
Assessment Validation
- several reasons for measurement errors:
  - construct under-representation: the assessment task does not include important aspects of the construct
  - construct irrelevant variance: the assessment includes variables that are not relevant to the construct (when some aspects of the task make the task irrelevantly easy / difficult for some test takers)
- converse:
  - positive effect on test takers' performance
  - not fully relevant to speaking proficiency
- interact:
  - created with focus on reliability and validity
  - more representative of spoken communication
  - peer interaction might cause variance in difficulty, causing one or both test takers to under-perform
- therefore, validity should have both an evidential basis (scores etc.) and a consequential basis (value implications and social consequences)
- opposite side of the argument represented
by Newton and Shaw (2014): ethical concern about impacts of assessment is
taking away the 'simple' focus from scores.
- not only the conditions within the test,
but also the conditions around it (teaching and learning process, assessment
process) should be considered.
- assessment results (performance outcome)
alone can't provide enough evidence for the validity of interact.
1.4.5 A Qualitative Perspective on
Assessment Validation
- Lazaraton (2002) argues that the most important development in language testing in the 1990s was the introduction of qualitative research methods. There are multiple qualitative approaches. These methods help to better understand phenomena in qualitative data, and should play a bigger role in applied linguistics (Lazaraton, 2002).
- stakeholders' judgements about an assessment are important for determining consequential validity. Teachers' judgements are particularly important, as teachers have insight into administering tests and into the consequences of assessment.
- The positive effects of tests can also be promoted by involving test takers in the design and development of the test.
- Fundamental questions posed by this book:
  - What are teachers and students making of interact?
  - What is working, what is not working, what could work better?
  - What are the implications, both for on-going classroom practice and for on-going evaluation of the assessment?
1.6 Conclusion
- ideological and theoretical perspectives affect language teaching, but we need to confirm whether the innovations they bring are truly beneficial
- central control: students change their
behavior to match test form → authorities use tests to assimilate students' behavior to their own
(Shohamy, 2001b)
- proponents of a new assessment method must provide convincing justification. However, quantitative data alone is no longer enough for such justification (McNamara, 1997).
Discussion Points
(1) Try to think of actual examples of construct
under-representation and construct irrelevant variance.
(2) If interact scores depend on the conversational partner, can we consider this test to lack the quantitative property of reliability, since the outcome will be different each time? Is this really a quantitative / qualitative issue?