University of Tsukuba, Graduate School of Humanities and Social Sciences / Modern Languages and Cultures Program / Akiyo Hirai Laboratory



2018 Academic Year: Evaluation in Intercultural Language Education


Final Report

              Speaking assessment is a widely discussed topic, since speaking proficiency is valued so highly in the modern world. In order to prepare the right assessment, it is important to make sure that the test is valid, reliable, and appropriate for the context in which it is taken.

              One of the most trusted and influential approaches to test validation is the argument-based approach commonly known as the Assessment Use Argument (AUA) framework, initially proposed by Bachman (2003, 2005) and later refined by Bachman and Palmer (2010). The AUA framework is a system of logical arguments, each consisting of data, a claim, a warrant, a backing, a rebuttal, and rebuttal data, which together connect test performance to decision making and support the validity of a particular test construct. In other words, applying the AUA framework to a test produces a systematic chain of arguments, relevant to the test's stakeholders and its specific context, that offers evidence of the test's validity. Bachman offers a simple illustration of how the framework is structured:

Data: Mark was born in the USA.

Claim: Mark is a U.S. citizen.

Warrant: All individuals born in the U.S. are U.S. citizens.

Backing: According to the U.S. Constitution, anyone born in the U.S. is a U.S. citizen.

Rebuttal: Mark has renounced his U.S. citizenship.
Rebuttal data: Mark's affidavit renouncing his U.S. citizenship.

In this situation, given the clear evidence against the claim (rebuttal data), we conclude that Mark is not a U.S. citizen. 

(Bachman, 2005)
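
To make this structure concrete, the chain of data, claim, warrant, backing, and rebuttal can be modeled as a small data object. The following Python sketch is my own illustration; the Argument class and its field names are hypothetical and not part of Bachman's formulation:

from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    """One link in an AUA chain: a claim inferred from data via a warrant."""
    data: str                    # observed evidence (e.g., test performance)
    claim: str                   # the inference we want to justify
    warrant: str                 # general rule licensing data -> claim
    backing: str                 # evidence supporting the warrant
    rebuttals: List[str] = field(default_factory=list)      # counter-claims
    rebuttal_data: List[str] = field(default_factory=list)  # evidence for them

    def holds(self) -> bool:
        # The claim stands only if no rebuttal is backed by actual evidence.
        return len(self.rebuttal_data) == 0


citizenship = Argument(
    data="Mark was born in the USA.",
    claim="Mark is a U.S. citizen.",
    warrant="All individuals born in the U.S. are U.S. citizens.",
    backing="The U.S. Constitution: anyone born in the U.S. is a U.S. citizen.",
    rebuttals=["Mark has renounced his U.S. citizenship."],
    rebuttal_data=["Mark's affidavit renouncing his U.S. citizenship."],
)

print(citizenship.holds())  # False: the rebuttal is supported, so the claim falls

The holds() check mirrors the logic of the example above: a claim stands only as long as no rebuttal is supported by actual evidence.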

              This example is very basic and not related to language assessment. However, the AUA framework is commonly used to build evidence regarding a given test's validity. One example is Long, Shin, Geeslin, and Willis's (2018) evaluation of a Spanish placement test: the group created a placement test from scratch, administered it to 2,201 participants, followed up with statistical analysis of the results, built up the AUA framework, and used the statistical analysis as evidence for backings and rebuttals. The study concluded that, based on the AUA framework, the test was indeed valid and reliable.
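
Statistical results of this kind typically enter an AUA as backing for reliability-related warrants. As one illustration (not the actual analysis reported by Long et al.), an internal-consistency coefficient such as Cronbach's alpha can be computed directly from an item-response matrix; the data below are invented:

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal-consistency reliability; rows = test takers, columns = items."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy item-response matrix: 5 test takers x 4 items (0 = wrong, 1 = right)
responses = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")  # 0.80: backs a reliability warrant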

              Another example is the use of the AUA for validating the TOEIC test. Schmidgall (2017) implements the argument-based approach and offers evidence from the test design process, statistical and procedural monitoring, and research data to back the arguments. In fact, the AUA claims are published on the TOEIC website to communicate them to the major stakeholders (ETS.org, n.d.). Schmidgall (2017) additionally states that since the TOEIC is designed for a variety of uses, it is also reasonable to build individual AUAs to offer context-specific claims and evidence.

              Additionally, interact, the new speaking assessment system introduced in New Zealand in 2012-2013, was also submitted to an argument-based evaluation. In this case interact was assessed along six dimensions to test the usefulness of the method: construct validity, reliability, interactiveness, impact, practicality, and authenticity, using evidence from surveys and interviews of the stakeholders (East, 2016). The results reflected the overall usefulness of interact as a form of assessment, although the perceived usefulness was communicated largely from the teachers' point of view.
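
Survey evidence of this kind is usually summarized per dimension before it is weighed as backing. A minimal sketch, assuming Likert-scale teacher ratings (all numbers below are invented, not East's data):

from statistics import mean

# Hypothetical 1-5 Likert ratings from teacher surveys, one list per quality
# (the six qualities named above; every number here is invented)
ratings = {
    "construct validity": [4, 5, 4, 4],
    "reliability":        [3, 4, 4, 3],
    "interactiveness":    [5, 5, 4, 5],
    "impact":             [4, 3, 4, 4],
    "practicality":       [3, 3, 2, 3],
    "authenticity":       [5, 4, 5, 4],
}

for quality, scores in ratings.items():
    print(f"{quality:>18}: {mean(scores):.2f}")  # one mean rating per dimension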

              Im, Shin, and Cheng (2019) mention another 33 journal articles and dissertations, and provide an analysis of eight of them, which utilize traditional as well as more creative approaches to argument-based validation of test systems. They provide a thorough review of the AUA framework, along with other validation methods, concluding that while the AUA framework is a modern and widely accepted method, it requires testing organizations to conduct ongoing validation studies to accommodate the continuous shift in social, political, and cultural contexts, which inevitably influences the AUA arguments. Additionally, these studies need to consider the perspectives of stakeholders in order to accurately shape the intended scores and their application for these specific contexts (Im, Shin, & Cheng, 2019).

              All these examples illustrate how the argument-based approach can validate specific claims (whether validity, reliability, usefulness, etc.) within specific tests. When it comes to speaking assessment, the rules remain the same. Test developers can design a variation of a speaking activity with an evaluation matrix, ask the students to participate, and, based on the results, evaluate the speaking test using the AUA framework (a minimal sketch of such a matrix follows below). However, while evaluating reading or listening skills is more straightforward, and can even be done automatically with a computer algorithm, speaking carries additional layers related to social status, cultural background, gender, and so on, which a computer cannot always recognize, given the present but still slow progress in speech-to-text recognition software. These additional layers influence the way we speak, resulting in different accents, lexicons, proficiency levels, etc. And since testing organizations are moving towards evaluating speaking proficiency automatically, the issue of validity is urgently raised.
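
As the minimal sketch of such an evaluation matrix, the weighted analytic rubric below combines per-category band scores into a single task score; the categories and weights are hypothetical, not a standard taken from the literature:

# Hypothetical analytic rubric for a speaking task: weighted band scores (0-5).
# Categories and weights are invented for illustration only.
RUBRIC_WEIGHTS = {
    "fluency": 0.25,
    "pronunciation": 0.25,
    "grammar": 0.25,
    "vocabulary": 0.25,
}

def weighted_score(bands: dict) -> float:
    """Combine per-category band scores into a single rubric score."""
    return sum(RUBRIC_WEIGHTS[cat] * band for cat, band in bands.items())

student = {"fluency": 4, "pronunciation": 3, "grammar": 4, "vocabulary": 5}
print(weighted_score(student))  # 4.0 -> feeds the 'data' slot of an AUA argument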

              In order to address the issue of validity in automated assessment of speaking proficiency, the assessment software has to contain corpora with a large variety of speaking patterns, covering accents, intonation, lexicon, syntactic patterns, and so on, in order to recognize and evaluate speech. Of course, given current technological development, such a large task is still far-reaching. However, an alternative strategy could be a narrowly focused testing system, which has a limited database but is relevant to the context of the testing environment. For example, consider a company of IT developers in Russia who have to be able to talk about their projects in English. The company could develop speaking assessment software that is limited to testing IT jargon, is able to recognize the phonetic peculiarities of English infused with Russian phonetic norms, and perhaps contains the syntactic and idiomatic variations of English common among Russian speakers, since it is common for students with lower proficiency to translate directly from their own language, thereby disrupting the familiar SVO structure of English and translating Russian idioms literally into English. In the end, this new assessment system could be evaluated with a newly created AUA framework, fitted to this specific context, in order to confirm that this particular assessment system delivers the results expected from the test.
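
One small piece of such a narrowly focused system could be a check of the recognized transcript against the domain lexicon. The sketch below assumes speech-to-text has already happened upstream; the lexicon, the sample sentence, and the coverage measure are all hypothetical:

import re

# Hypothetical domain lexicon for the IT-company scenario described above.
IT_LEXICON = {"deploy", "backend", "frontend", "repository", "sprint",
              "database", "refactor", "release", "microservice", "api"}

def domain_coverage(transcript: str) -> float:
    """Share of domain terms used at least once in a (pre-recognized) transcript."""
    tokens = set(re.findall(r"[a-z]+", transcript.lower()))
    return len(IT_LEXICON & tokens) / len(IT_LEXICON)

sample = ("In this sprint we will refactor the backend, deploy a new release, "
          "and update the API in the repository.")
print(f"{domain_coverage(sample):.0%}")  # 70%: could feed one scoring criterion

A coverage figure like this would be only one criterion among several; pronunciation and syntax would need their own, far more complex, models.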

              In conclusion, it is important to check the validity and reliability of tests in order to provide speaking assessment accurate to the context in which the test is taken. And the AUA framework can be a very useful tool, even when preparing an instrument for speaking proficiency evaluation.


References

Bachman, L. F. (2003). Constructing an assessment use argument and supporting claims about test taker-assessment task interactions in evidence-centered assessment design. Measurement: Interdisciplinary Research and Perspectives, 1, 63–65.  https://doi.org/10.1207/S15366359MEA0101_03

Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2, 1–34.  https://doi.org/10.1207/s15434311laq0201_1

Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: Developing language assessments and justifying their use in the real world. Oxford: Oxford University Press.

East, M. (2016). Assessing foreign language students' spoken proficiency: Stakeholder perspectives on assessment innovation. Springer.

Im, G.-H., Shin, D., & Cheng, L. (2019). Critical review of validation models and practices in language testing: Their limitations and future directions for validation research. Language Testing in Asia, 9(14). https://doi.org/10.1186/s40468-019-0089-4

Long, A. Y., Shin, S.-Y., Geeslin, K., & Willis, E. W. (2018). Does the test work? Evaluating a web-based language placement test. Language Learning & Technology, 22(1), 137–156. https://dx.doi.org/10125/44585 

Schmidgall, J. E. (2017). Articulating and evaluating validity arguments for the TOEIC tests. ETS Research Report Series, 2017(1), 1–9. https://doi.org/10.1002/ets2.12182

ETS.org. (n.d.). The theory behind the TOEIC® program. https://www.ets.org/toeic/organizations/research/theory/