IELTS Teacher e-newsletter – September 2020

Insights into the Aptis scoring system



Recently, Dr. Karen Dunn of the British Council’s Assessment Research Group published an updated version of a manual that explains in detail the scoring system of the British Council’s Aptis test. This article answers some common questions about Aptis scoring, using information taken from that manual. You can download Dr. Dunn’s full report here.

How can we interpret CEFR scores and numerical scores?

For each of the skills (reading, listening, writing and speaking) included in the Aptis test package purchased by the test user, both a numerical score out of 50 and a CEFR (Common European Framework of Reference) level are provided. The CEFR comprises six levels of proficiency: level A1 represents language users with only the most basic language ability, followed by levels A2, B1, B2, C1 and C2, the last representing users of expert ability. At present, no CEFR level is provided for the Grammar & Vocabulary test paper, which is included in every test package.

The numerical score is derived from the number of questions the test taker answered correctly. The CEFR level for each skill is the result of a process of standard setting, in which numerous experts evaluate the difficulty of each question or task used in the test. Based on these evaluations, cut scores are set that serve as the minimum numerical scores required for each CEFR level to be awarded. Since the standard setting sessions for each skill are held separately, the resulting cut scores are likely to differ between skills. As a result, the numerical scores awarded are not directly comparable across skills and were never meant to be.

It would therefore be wrong to conclude that a test taker who scored 35 on speaking and 37 on listening is better at listening than at speaking. We can only make observations about test takers’ strengths and weaknesses across skills based on the CEFR levels provided: while numerical scores cannot be directly compared across skills, a B2 will always have performed significantly better than a B1, which in turn is always better than an A2, and so on.
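
The point can be illustrated with a minimal sketch. The cut scores below are invented purely for illustration and are not real Aptis values; the names `CUT_SCORES` and `cefr_level` are likewise hypothetical. The sketch only assumes what is stated above: each skill has its own minimum score per CEFR level.

```python
from bisect import bisect_right

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical cut scores (NOT real Aptis values): for each skill, the
# minimum score out of 50 required for the level at the same position
# in LEVELS. Starting A1 at 0 is a simplification for this sketch.
CUT_SCORES = {
    "listening": [0, 12, 24, 38, 44, 48],
    "speaking": [0, 10, 22, 34, 42, 47],
}


def cefr_level(skill: str, score: int) -> str:
    """Return the highest level whose cut score the score reaches."""
    if not 0 <= score <= 50:
        raise ValueError("score must be between 0 and 50")
    cuts = CUT_SCORES[skill]
    # bisect_right counts how many cut scores are <= score.
    return LEVELS[bisect_right(cuts, score) - 1]


print(cefr_level("speaking", 35))   # lower number, higher level here
print(cefr_level("listening", 37))  # higher number, lower level here
```

With these made-up cut scores, the speaking score of 35 reaches B2 while the listening score of 37 only reaches B1, which is why the raw numbers should not be compared across skills.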

The CEFR levels are quite broad, so a test taker scoring a borderline B1 in a skill may be noticeably weaker than a B1 test taker who almost achieved B2 in that same skill. The CEFR levels awarded by Aptis do not distinguish between high and low achievers within each CEFR band. This is where, according to Dunn, the numerical scores can help us rank test takers’ performances within each skill. For example, a B2 in writing with a score of 44 will have performed better than a B2 in writing with a score of 40. Keep in mind that such comparisons can only be made between performances on the same skill, never across different skills. Also, if the difference between two test takers’ numerical scores is very small, it may not reflect any real difference in ability (see the explanation of the standard error of measurement later on).

How are overall scores calculated?

For test takers who have taken the four skills package, overall scores are provided in addition to the scores for each test component. Like the component scores, overall scores come in the form of a CEFR level as well as a numerical score, in this case out of 200. The latter is simply the sum of the numerical scores for all four skills. The score for the Grammar & Vocabulary paper is not included in the overall numerical score. The overall CEFR level is the rounded average of the four skills’ CEFR levels, as the following examples show:

Figure 1: examples of how overall CEFR level is derived

Listening  Reading  Writing  Speaking  Overall
B2         B1       B1       B1        B1
B2         B2       B1       B1        B2
B2         B2       B1       A1        B1

How should we interpret overall scores?

When examining the Aptis group report, you may notice that sometimes two test takers with identical overall numerical scores are assigned different overall CEFR levels. In some cases, a test taker can even be awarded a lower CEFR level than a test taker with a slightly lower numerical score.

We have seen that the overall CEFR level and the overall numerical score are calculated independently. Because the overall numerical score is simply the sum of the four skill scores, and each skill has different cut scores for each CEFR level, an overall score of, for example, 163 for one candidate is not automatically equivalent to another candidate’s overall score of 163. For this reason, when comparing test takers’ overall results, we should use the CEFR levels and not the numerical scores.
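
A short sketch of the two independent calculations may help. It assumes the levels are averaged as equally spaced ranks (A1 = 0 up to C2 = 5) and that a half is rounded up, which is consistent with the examples in Figure 1 but is otherwise an assumption; the function name `overall_scores` and both candidates’ results are hypothetical.

```python
import math

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def overall_scores(results):
    """results maps each skill to (score out of 50, CEFR level).

    Returns (overall score out of 200, overall CEFR level).
    """
    total = sum(score for score, _ in results.values())
    ranks = [LEVELS.index(level) for _, level in results.values()]
    mean_rank = sum(ranks) / len(ranks)
    # Round half up (assumption). Python's built-in round() rounds
    # halves to the nearest even number, so it is avoided here.
    overall_rank = math.floor(mean_rank + 0.5)
    return total, LEVELS[overall_rank]


# Two hypothetical candidates with identical overall numerical scores.
candidate_a = {
    "listening": (40, "B2"), "reading": (41, "B2"),
    "writing": (41, "B1"), "speaking": (41, "B1"),
}
candidate_b = {
    "listening": (44, "B2"), "reading": (39, "B1"),
    "writing": (40, "B1"), "speaking": (40, "B1"),
}

print(overall_scores(candidate_a))
print(overall_scores(candidate_b))
```

Both invented candidates total 163, yet under these assumptions the first averages to B2 and the second to B1, because the overall level depends only on the four skill levels and not on the sum.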

How are the results for the Grammar & Vocabulary paper used?

As you may have noticed, test taker performance on the Grammar & Vocabulary (G&V) paper is not included in the overall Aptis score, nor is it awarded a CEFR level. Like the skill-based papers, performance on G&V is expressed as a numerical score out of 50. Performance on this paper is a factor in deciding which CEFR level to award to test takers who score close to the cut score separating two levels.

As with measurement of any kind, decisions on language ability are subject to measurement error. While test development and validation processes serve to minimize such error, it cannot be eliminated completely (Carr, 2011). In practice, this means that two test takers of equal ability will not necessarily achieve exactly the same score every time they take a test; their scores may vary, if only slightly. The typical size of this variation is quantified by the standard error of measurement (SEM). Given the relatively large difference between CEFR levels in terms of the range of ability each level represents, awarding the wrong CEFR level would have serious consequences for the reliability of the test. To minimize the chance of this happening, whenever a test taker’s numerical score falls within one SEM below the cut score for the next higher CEFR level, a strong performance on the G&V paper may allow the next higher CEFR level to be awarded for that skill. This adjustment is applied to each of the four skills because knowledge of grammar and vocabulary, while not in itself synonymous with communicative ability, does correlate with performance in each of the communicative skills Aptis aims to test (see Dunn, 2020 for a listing of research corroborating this).
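
The borderline rule can be sketched roughly as follows. Every number here is an assumption made for illustration: the cut scores, the SEM of 2 points and the G&V threshold of 40 that stands in for a "strong performance" are not taken from the Aptis manual, and `adjusted_level` is a hypothetical name.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def adjusted_level(score, cuts, sem, gv_score, gv_strong):
    """Level for one skill, with the borderline G&V adjustment.

    cuts      -- minimum score for each level, in LEVELS order
    sem       -- standard error of measurement for this skill
    gv_strong -- G&V score treated as "strong" (assumed threshold)
    """
    # Highest level whose cut score has been reached.
    rank = max(i for i, cut in enumerate(cuts) if score >= cut)
    if rank + 1 < len(LEVELS):
        gap = cuts[rank + 1] - score  # points short of the next level
        # Within one SEM of the next cut score AND strong G&V:
        # award the next higher level.
        if gap <= sem and gv_score >= gv_strong:
            rank += 1
    return LEVELS[rank]


# Hypothetical values, for illustration only.
speaking_cuts = [0, 10, 22, 34, 42, 47]
print(adjusted_level(33, speaking_cuts, sem=2, gv_score=43, gv_strong=40))
print(adjusted_level(33, speaking_cuts, sem=2, gv_score=30, gv_strong=40))
print(adjusted_level(30, speaking_cuts, sem=2, gv_score=45, gv_strong=40))
```

In this sketch only the first call is upgraded from B1 to B2: the second test taker is borderline but has a weaker G&V score, and the third has a strong G&V score but is more than one SEM below the B2 cut score.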

References

Carr, N.T. (2011). Designing and Analyzing Language Tests. UK: Oxford Handbooks.

Dunn, K. (2020). Aptis scoring system 2.0. Aptis Technical Report (TR/2020/002). London: British Council.

 

 
