University of Tsukuba, Graduate School of Humanities and Social Sciences                                                Modern Languages and Cultures Program                                           Akiyo Hirai Laboratory



Academic Year 2018  Evaluation in Cross-Cultural Language Education

 

The Summary of the Second Paper I Read

Title:

Distinguishing features in scoring L2 Chinese speaking performance: How do they work?

 

1. Introduction

  Distinguishing features have been widely used in the construction of rating scales for scoring L2 performances. International English tests such as TOEFL and IELTS all use distinguishing features to rate L2 learners’ speaking performance. However, some researchers have argued that using distinguishing features to judge L2 learners’ speaking ability may not be suitable.

 To validate the distinguishing features widely used in scoring L2 learners’ speaking ability, this paper examines the relation between L2 learners’ speaking scores and distinct speaking features.

 

2. Literature Review

  Although previous L2 English studies (Brown et al., 2005; Brown, 2006a; Iwashita et al., 2008) suggested that there is a link between distinguishing features and scores, this issue has not been clearly examined in L2 Chinese studies.

Previous L2 Chinese studies (Wang, 2002; Zhu, 2009) mainly show that there is a relationship between distinguishing features and L2 learners’ language proficiency: the features differed significantly among learners’ performances at different proficiency levels. These studies, however, did not clearly examine how the distinguishing features relate to L2 learners’ speaking scores. This paper aims to fill this gap by investigating how distinguishing features contribute to L2 Chinese learners’ speaking scores.

 

3. Identification of Distinguishing Features

  In this study, distinguishing features were identified by analyzing four sets of influential documents on L2 Chinese teaching and testing syllabi that focus on speaking proficiency. The titles of the documents are listed below:

 1. Chinese Proficiency Scales and Syllabus of Graded Grammar

 2. Chinese Language Proficiency Scales for Speakers of Other Languages

 3. Five-Band Holistic Scoring Standards for L2 Chinese Speaking in Test Syllabus for HSK-Advanced Level

 4. Spoken Chinese Proficiency Grading Standards and Testing Guidelines

  The analysis of these documents identified seven distinguishing features: target-like syllables, speech rate, pause time, word tokens, word types, grammatical accuracy and grammatical complexity. These seven features were analyzed in this study.

 

4. Research Questions

In this study, two research questions were addressed:

1. In what way does each of the distinguishing features relate to the scores?

2. How do the distinguishing features contribute to the scores?

 

5. Method

5.1 Measurement of Distinguishing Features

5.1.1 Pronunciation

 In examinations of L2 Chinese learners’ pronunciation, the syllable has been widely used as the basic unit of analysis. A syllable consists mainly of three units: an initial, a final and a tone. In this study, therefore, a syllable counted as target-like if the learner pronounced all three of its units clearly.

    

5.1.2 Fluency

  Two distinguishing features, speech rate and pause time, fell under the category of fluency. Speech rate was calculated by dividing the number of syllables by the total speech duration in seconds. Pause time was calculated by dividing the total duration of unfilled pauses by the total speech duration in seconds.
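The two fluency measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample numbers are hypothetical and do not come from the study, and the syllable count and pause durations are assumed to have been obtained beforehand from the recordings.

```python
def speech_rate(syllable_count, total_speech_seconds):
    """Syllables per second: number of syllables / total speech duration."""
    return syllable_count / total_speech_seconds


def pause_time_ratio(unfilled_pause_seconds, total_speech_seconds):
    """Share of the response spent in unfilled pauses.

    unfilled_pause_seconds is a list with the duration of each unfilled pause.
    """
    return sum(unfilled_pause_seconds) / total_speech_seconds


# Hypothetical response (not data from the study):
# 180 syllables in 60 seconds, with three unfilled pauses.
rate = speech_rate(180, 60.0)
ratio = pause_time_ratio([1.5, 2.0, 2.5], 60.0)
print(f"speech rate: {rate:.2f} syllables/s, pause time ratio: {ratio:.2f}")
```

Both measures share the same denominator, so a response with long pauses tends to have both a lower speech rate and a higher pause time ratio.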

 

5.1.3 Vocabulary

  With regard to vocabulary, word tokens and word types have been widely accepted as effective measures of L2 learners’ vocabulary. In this study, word tokens and word types were counted in terms of segmented words, as presented in (1):

 

(1)

 

 

Example (1) shows segmented Chinese words. The example contains 5 word tokens and 4 word types.
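The difference between tokens and types can be illustrated with a short Python sketch. The segmented sentence below is a hypothetical one chosen for illustration, not the example from the paper, and the function name is likewise an assumption. Word segmentation is assumed to have been done already.

```python
def count_tokens_and_types(segmented_words):
    """Return (word tokens, word types) for a list of segmented words.

    Tokens are all running words; types are the distinct words among them.
    """
    return len(segmented_words), len(set(segmented_words))


# Hypothetical segmented sentence: "我 喜欢 我 的 猫" (I like my cat).
# The word 我 occurs twice, so it adds two tokens but only one type.
words = ["我", "喜欢", "我", "的", "猫"]
tokens, types = count_tokens_and_types(words)
print(tokens, types)  # 5 4
```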

 

5.1.4 Grammar

  In Chinese, the ‘sentence’ has to be taken as the basic unit for analyzing grammatical accuracy and grammatical complexity. This study therefore measured grammatical accuracy as the percentage of error-free sentences and grammatical complexity as the mean length (number of syllables) of the sentences L2 learners produced.
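The two grammar measures can be sketched as follows. This is a minimal illustration, assuming that each sentence has already been annotated by hand with its syllable count and with whether it is error-free; the `Sentence` class, the function names and the sample data are hypothetical and not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    syllable_count: int  # length of the sentence in syllables
    error_free: bool     # True if annotators found no grammatical error


def grammatical_accuracy(sentences):
    """Percentage of error-free sentences."""
    error_free = sum(1 for s in sentences if s.error_free)
    return 100.0 * error_free / len(sentences)


def grammatical_complexity(sentences):
    """Mean sentence length in syllables."""
    return sum(s.syllable_count for s in sentences) / len(sentences)


# Hypothetical annotated response (not data from the study).
response = [
    Sentence(8, True),
    Sentence(12, False),
    Sentence(10, True),
    Sentence(10, True),
]
print(grammatical_accuracy(response))    # 75.0
print(grammatical_complexity(response))  # 10.0
```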

 

5.2 Tasks

This study used two integrated tasks and one independent task to measure learners’ speaking performance.

 

5.3 Participants

The participants were 66 students from L2 Chinese speaking courses for advanced-level learners at a comprehensive university in Shanghai. All of the learners’ speaking performances were recorded for further analysis. In addition, two raters scored the speech data.

 

5.4 Scoring Method

A five-level scoring rubric was developed for the scoring:

 

 Raters scored holistically, assigning a single whole level to each L2 learner’s performance on all three tasks.

 

6. Results

6.1 In what way does each of the distinguishing features relate to the scores?

  The results first showed that all seven distinguishing features were moderately or strongly correlated with the scores, as can be seen in Table 1:

 

Table 1 Correlation Matrix

 

Table 1 shows that speech rate, word tokens and word types had large effect sizes (|r| ≥ 0.50) in their correlations with the scores, while target-like syllables, pause time, grammatical accuracy and grammatical complexity had medium effect sizes (0.30 ≤ |r| ≤ 0.49). These results indicate that the distinguishing features were strongly or moderately associated with the scores.
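How a correlation coefficient is computed and then sorted into the effect-size bands used above can be sketched in Python. The function names and the sample values are hypothetical; the thresholds for "large" and "medium" are the ones given in this section, while the label for values below 0.30 is an assumption added only so that the function always returns something.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def effect_size_label(r):
    """Classify |r| with the bands used in this section."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    return "below medium"  # assumption: this band is not named in the text


# Hypothetical feature values and holistic scores for six learners.
speech_rates = [2.1, 2.8, 3.0, 3.4, 3.9, 4.2]
scores = [1, 2, 3, 3, 4, 5]
r = pearson_r(speech_rates, scores)
print(round(r, 2), effect_size_label(r))
```

The absolute value matters here because a feature such as pause time would be expected to correlate negatively with the scores and still count as a medium or large effect.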

 

6.2 How do the distinguishing features contribute to the scores?

 To answer this question, the researchers carried out two standard multiple regressions, in each of which six distinguishing features (with word tokens and word types entered in separate regressions[1]) were regressed against the scores. This made it possible to examine the contribution of each feature to the scores. The results can be seen in Tables 2 and 3:

 

Table 2 Regression 1 (including word tokens)

 

 

Table 3 Regression 2 (including word types)

 

The tables show that target-like syllables, word tokens, word types and grammatical accuracy made significant contributions to L2 learners’ scores. These features also represent three crucial categories in assessing L2 learners’ speaking ability: pronunciation, vocabulary and grammar.
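The idea of running two regressions, one with word tokens and one with word types, can be sketched in Python using only the standard library. This is a simplified illustration, not the analysis from the study: it estimates ordinary least squares coefficients only (no significance tests), it uses two predictors per regression instead of six, and all names and data are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_ols(predictor_rows, scores):
    """Return [intercept, b1, ..., bk] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(row) for row in predictor_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data for five learners (not from the study).
accuracy = [60.0, 70.0, 75.0, 85.0, 90.0]
word_tokens = [80.0, 95.0, 120.0, 130.0, 160.0]
word_types = [50.0, 55.0, 70.0, 72.0, 90.0]
scores = [1.0, 2.0, 3.0, 4.0, 5.0]

# Tokens and types are highly correlated, so each enters its own regression.
with_tokens = fit_ols(list(zip(accuracy, word_tokens)), scores)
with_types = fit_ols(list(zip(accuracy, word_types)), scores)
print(with_tokens)
print(with_types)
```

Keeping the two highly correlated predictors in separate models avoids the unstable coefficient estimates that arise when both are entered together.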

 

 

7. Conclusion and Implication

  This study showed that there was a strong relation between distinguishing features and L2 learners’ performance. This further suggests that distinguishing features can provide evidence for constructing valid rating scales. In addition, the results offer empirical evidence for developing automated scoring of L2 Chinese speaking in the future.

 

My thoughts from the two papers I read

Among the four language skills (reading, writing, listening and speaking), speaking is considered the most difficult to evaluate. I was eager to know which features should be taken into consideration when assessing students’ speaking ability. By reading the journal articles, I learned that distinguishing features can contribute to L2 learners’ scores. The results clearly pointed out critical features that should be included when evaluating L2 learners’ speaking ability.

 

Both of the studies I read showed that word tokens and word types can be significant features in assessing learners’ speaking ability. This implies that the size of learners’ vocabulary can strongly influence their speaking performance. In addition, the studies demonstrated that grammatical accuracy could be a crucial factor, which indicates that learners’ grammar knowledge can also strongly affect their speaking performance. In this respect, vocabulary and grammar knowledge are both crucial factors in L2 learners’ speaking performance.

 

However, with regard to pronunciation, the two studies showed different outcomes. Iwashita et al. (2008) showed that this feature did not strongly influence learners’ speaking performance, whereas Jin and Mak (2012) demonstrated that it could greatly influence it. The studies also disagreed on pause time and speech rate: Iwashita et al. (2008) found that these features could significantly affect learners’ speaking performance, while Jin and Mak (2012) did not. More studies examining speaking fluency and pronunciation are therefore necessary. The main results from these studies are presented in the following table:

 

 

 

 

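Table 4 The main results from the two studies

Category        Features                 Effects
                                         Yes    No
Pronunciation   Target-like syllables    ✓      ✓
Fluency         Speech rate              ✓      ✓
                Pause time               ✓      ✓
Vocabulary      Word tokens              ✓
                Word types               ✓
Grammar         Grammatical accuracy     ✓

(A mark in both columns means that the two studies reported different results.)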

 

  Given these results, I think that vocabulary and grammar features have to be taken into consideration when assessing L2 learners’ speaking ability.

 

Although these studies did point out crucial features for evaluating learners’ speaking performance, they also have some limitations. First, they did not clearly distinguish between adjacent levels: the results do not show how, for example, level 1 and level 2 learners perform differently in the speaking tests. I think that future studies should find ways to resolve this problem. Second, neither study took accent into consideration. According to Derwing and Munro (2001), accent can be a crucial factor influencing learners’ language performance. Future research could therefore include accent as a feature and observe whether it influences learners’ speaking scores.

 

In sum, from previous studies on the assessment of speaking, I learned that vocabulary and grammar features can be good measures of learners’ speaking ability. I also learned to interpret the results of statistical analyses such as Pearson’s r correlation coefficients and standard multiple regression. In this way, I not only gained more knowledge about ways to assess speaking, but also learned more about how to carry out and interpret statistical analyses.

 

References

Brown, A. (2006a). An examination of the rating process in the revised IELTS Speaking Test. In P. McGovern & S. Walsh (Eds.), IELTS research reports 2006:41–70. Canberra & Manchester: IELTS Australia and British Council.

Brown, A., & Taylor, L. (2006). A world survey of examiners’ views and experience of the revised IELTS Speaking Test. Cambridge ESOL: Research Notes 26:14–18.

Derwing, T., & Munro, M. (2001). What speaking rates do non-native listeners prefer? Applied Linguistics 22.3:324–337.

Iwashita, N., Brown, A., McNamara, T., & O’Hagan, S. (2008). Assessed levels of second language speaking proficiency: How distinct? Applied Linguistics 29.1:24–49.

Jin, T., & Mak, B. (2012). Distinguishing features in scoring L2 Chinese speaking performance: How do they work? Language Testing 30.1:23–47.

Wang, J. (2002). A study of the scoring of three types of oral test items (in Chinese). Chinese Teaching in the World 4:63–77.

Zhu, S. (2009). A study on the dynamic oral text of Korean learners of Chinese (in Chinese). Beijing: China Master Theses Full-text Database.



[1] The results also showed a high correlation between word tokens and word types. To examine clearly how each of these two features contributes to the scores, the researchers entered them separately, so two standard multiple regressions were conducted.