The argument-based validation research reported in this chapter was conducted from the perspective of an outside evaluator with concerns about the consistency of scores on the Telephone Standard Speaking Test (TSST), a telephone-based test of second language (L2) English speaking proficiency used to assess improvement in speaking proficiency over time. This use of the test requires that the warrant for generalization be plausible: that observed scores are estimates of expected scores, which are consistent across test tasks, forms, occasions, and raters. To guide the investigation, a rebuttal was formulated: that observed scores fail to estimate expected scores because of error introduced in the testing process. The research investigated two of the rebuttal's assumptions. TSST scores collected twice within a month from 55 undergraduates at two Japanese universities indicated that the test forms had equivalent means and SDs and that each participant's two scores were highly correlated. For one-third of the participants, however, the two scores differed by one score level. The results thus provided partial support for one of the assumptions underlying the rebuttal. The chapter concludes by highlighting the important role rebuttals play in bringing threats of concern to test users into an interpretation/use argument.
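To illustrate the kind of consistency check described above, the following minimal Python sketch compares means and SDs across two testing occasions, computes the test-retest correlation, and tallies how many paired scores differ by one score level. The score arrays and the level scale are hypothetical stand-ins for illustration only, not the actual TSST data.

import numpy as np
from scipy import stats

# Hypothetical paired scores for the same participants on two
# occasions about a month apart (invented values, not TSST data).
occasion1 = np.array([4, 5, 5, 6, 4, 5, 6, 5, 4, 5])
occasion2 = np.array([4, 5, 6, 6, 4, 4, 6, 5, 5, 5])

# Compare occasion/form means and SDs.
print(f"Means: {occasion1.mean():.2f} vs {occasion2.mean():.2f}")
print(f"SDs:   {occasion1.std(ddof=1):.2f} vs {occasion2.std(ddof=1):.2f}")

# Test-retest correlation between the paired scores.
r, p = stats.pearsonr(occasion1, occasion2)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# Proportion of participants whose two scores differ by one level.
diff_one = np.mean(np.abs(occasion1 - occasion2) == 1)
print(f"Scores differing by one level: {diff_one:.0%}")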
This argument-based validation research investigates the validity of score interpretations on a computer-based, graphic-prompt writing test, focusing on the generalization inference. The graphic-prompt writing test assesses examinees' ability to incorporate visual graphic information into their writing. Both analytic ratings on Graph Description, Content Development, Organization, and Grammar/Vocabulary (n = 2,424) and composite ratings (n = 606) on written test responses from 101 ESL students were analyzed using Generalizability (G) Theory and Multi-Faceted Rasch Measurement (MFRM). Findings indicated that three of the four analytic scales, as well as the composites, yielded dependable scores. In addition, the G-studies and MFRM analyses revealed that the relative effect of the raters on total score variance was not trivial for either the composite or the analytic scores, and that the three raters were not equivalent in rating severity. Nevertheless, the findings support the generalization inference to a large extent. Thus, it can be claimed that scores on the graphic-prompt writing tasks were dependable enough to be used for the intended purposes, particularly under the two-rater, three-task administration design.
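As a rough illustration of the G-study computation involved, the Python sketch below estimates variance components for a fully crossed persons x raters design and projects a Phi (dependability) coefficient for a two-rater decision study. The ratings are invented, and the single-facet crossed design is a simplification of the chapter's persons x raters x tasks design; none of the numbers correspond to the reported results.

import numpy as np

# Hypothetical ratings: rows = persons, columns = three raters
# (fully crossed p x r design; invented values for illustration).
X = np.array([
    [3, 4, 3],
    [5, 5, 4],
    [4, 4, 4],
    [2, 3, 2],
    [5, 4, 5],
    [3, 3, 4],
], dtype=float)
n_p, n_r = X.shape

grand = X.mean()
p_means = X.mean(axis=1)  # person means
r_means = X.mean(axis=0)  # rater means

# Mean squares from a two-way ANOVA without replication.
ss_p = n_r * np.sum((p_means - grand) ** 2)
ss_r = n_p * np.sum((r_means - grand) ** 2)
ss_pr = np.sum((X - p_means[:, None] - r_means[None, :] + grand) ** 2)
ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

# Estimated variance components (negative estimates set to zero).
var_pr = ms_pr
var_p = max((ms_p - ms_pr) / n_r, 0.0)
var_r = max((ms_r - ms_pr) / n_p, 0.0)

# Phi (dependability) coefficient projected for a k-rater design.
k = 2
phi = var_p / (var_p + (var_r + var_pr) / k)
print(f"var(person)={var_p:.3f}, var(rater)={var_r:.3f}, var(residual)={var_pr:.3f}")
print(f"Phi with {k} raters = {phi:.3f}")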