The long case was the cornerstone of the clinical examination from its inception in 1842 until the introduction of the Objective Structured Clinical Examination (OSCE) in 1964. In 2003 the Royal College of Psychiatrists replaced the long case in the Part I membership examination (MRCPsych) with the OSCE. In 2008 the College replaced the two clinical examinations with a single OSCE and renamed it the Clinical Assessment of Skills and Competencies (CASC), to be taken at the end of 3 years of training. As this new format of examination has been in use for a decade now, it is time to compare the merits of the two systems.
The traditional long case
The old long case required the candidate to spend about 60 min with a real patient, taking a history and performing an examination, unobserved by the examiners. The candidate would then present their findings, diagnostic formulation, management options and prognosis to a pair of examiners and discuss them. The candidate would also be asked to interview the patient in front of the examiners to demonstrate and/or clarify aspects of the case. Marking criteria were based on the aforementioned domains.
The long case was evidently excellent at testing the very essence of medical practice, namely history-taking, examination, formulation, differential diagnosis and management planning. Having candidates present the case and defend their findings and conclusions enabled the examiners to get a true sense of the candidates’ clinical skills and competence. The use of real patients, requiring genuine sensitivity, lent authenticity to the case and enabled assessment of the candidate’s overall clinical approach. In formative assessments, the long case helps highlight areas for improvement.
The disadvantages of the long case include low reliability, low validity and the inability to generalise from one long case about the candidate’s ability in other cases. The single-case format does not allow the breadth of skills to be tested Reference Benning and Broadhurst1 or to sample the curriculum widely, Reference Ponnamperuma, Karunathilake, McAleer and Davis2 which results in low reliability. Moreover, there is a lack of standardisation of diagnostic complexity. Reference Benning and Broadhurst1
The use of flexible, subjective and global judgements and the lack of clarity in the marking system lead to poor reliability Reference Wilson, Lever, Harden and Robertson3 and examiner bias. Reference Maxim and Dielman4 Hubbard et al Reference Hubbard, Levitt, Schumacher and Schnabel5 found that the correlation of independent evaluations by two examiners was only 25%. Furthermore, Leichner et al Reference Leichner, Sisler and Harper6 showed that the luck of the draw in selection of examiners and patients played a significant role in the outcome of postgraduate examinations in psychiatry.
The candidate is not observed during the interview with the patient, except for a brief period during the viva. This gives the examiners little opportunity to reliably assess the candidate’s ability to communicate with the patient, Reference Gleeson7 thereby compromising the validity of the long case. Competence in one long case does not indicate a candidate’s ability across a range of other cases and clinical situations. Reference van der Vleuten8,Reference Wass and Jolly9 The inability to assess candidate competence through a single case has been termed ‘case specificity’. Reference Eva10 Wilkinson et al Reference Wilkinson, Campbell and Judd11 estimated that at least five or six 85-minute long cases (60 min with the patient and 25 min with the examiners) were necessary to achieve 0.8 dependability (a more conservative measure of reliability).
OSCE/CASC
The OSCE/CASC replaced the long case as it was purported to have better reliability. Reference Badger, deGruy, Hartman, Plant, Leeper and Ficken12-Reference Vu and Barrows14 Currently, the MRCPsych clinical examination consists of 16 stations, split into two circuits; one circuit consists of 8 individual stations of 7 min with a preceding 1 min ‘preparation’ time, and the other circuit consists of 4 pairs of linked stations of 10 min each with an additional 2 min of ‘preparation’ time. The examination lasts 160 min.
The OSCE helps examine a greater breadth of problems than the traditional long case. Having a greater number of examiners reduces the effects of examiner variability. Standardised patients improve reliability and validity. The OSCE also enables the testing of scenarios that might have been distressing to a ‘real’ patient, for example bereavement and terminal illness.
Nevertheless, the OSCE has a number of disadvantages. The face validity and content validity of the OSCE are sufficient for testing the knowledge of junior trainees. Reference Hodges, Regehr, Hanson and McNaughton15-Reference Thompson17 However, it has questionable construct validity in assessing senior trainees (at core trainee year 3 (CT3) level), because a checklist approach would be unsuitable for assessing complex knowledge, practical and communication skills and competence of senior trainees, and risks oversimplifying real-life situations. Reference Hodges, Regehr, McNaughton, Tiberius and Hanson18 Consequently, the CASC explicitly requires the use of global scores to assign pass/fail decisions. More recently, the College has been exploring domains of competency instead of a single summative judgement of pass or fail. This suggests that global judgement of mastery is more reliable than checklists. Reference Hodges, Regehr, McNaughton, Tiberius and Hanson18,Reference Regehr, MacRae, Reznik and Szalay19 In spite of this, the OSCE is not suitable for assessing more complex, yet vital phenomena such as transference or ‘interpersonal connection’. Reference Hodges, Hanson, McNaughton and Regehr20
Validity is intrinsically linked to context. A number of 10-minute stations could never mimic a thorough, 1-hour clinical assessment, the daily bread of a jobbing psychiatrist. Reference Hodges21 The current CASC requires candidates to assess a new patient or explain a diagnosis and its management in 7-10 min. This is unlikely to reflect their skills and competencies in completing such tasks in day-to-day clinical practice. Moreover, there is a risk of trainees who are competent in routine clinical work failing the exam, whereas those who may be clinically inept, but have prepared specifically for the CASC exam, may pass it.
Another, albeit anecdotal, observation is that since the introduction of the CASC trainees have become unwilling and/or unable to assess, formulate and present whole cases. They have adapted their learning methods and focus during clinical work to passing the exam, for example undertaking only those tasks that can be completed in 10 min. In the OSCE/CASC, history-taking, examination, formulation and management skills are only partially assessed, if at all, which has led to exam-focused, short-case-competent trainees. Consequently, the new generation of trainees miss out on the vital experience of conceptualising whole cases. This represents a gradual undermining of a holistic, biopsychosocial approach central to the culture of psychiatry. Reference Benning and Broadhurst1
How do the two formats compare?
There is ample evidence suggesting that the long case is at least as reliable as the OSCE. Reference Marwaha22-Reference Wass, Jones and van der Vleuten24 Norman Reference Norman25 suggested that observed multiple long-case examinations may have better reliability than the OSCE. Wass et al Reference Wass, Jones and van der Vleuten24 found that the reliability of the long case, when carried out with two pairs of examiners, was neither better nor worse than that of the OSCE. They estimated that a reliability of 0.8 can be achieved with ten long cases on history-taking, with two examiners observing each long case. However, the time, logistics and cost-effectiveness issues related to running multiple long cases with multiple examiners preclude their use in standard examinations.
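The relationship between the number of cases and overall reliability can be illustrated with the Spearman-Brown prophecy formula. This is only a rough sketch: the studies cited here used their own generalisability analyses, not necessarily this formula, and the function names and the single-case reliability values below are illustrative assumptions, not figures reported in those studies.

```python
import math


def spearman_brown(single_case_reliability: float, n_cases: int) -> float:
    """Predicted reliability of an examination made up of n_cases parallel cases."""
    r = single_case_reliability
    return n_cases * r / (1 + (n_cases - 1) * r)


def cases_needed(single_case_reliability: float, target: float = 0.8) -> int:
    """Smallest number of cases predicted to reach the target reliability."""
    r = single_case_reliability
    exact = target * (1 - r) / (r * (1 - target))
    # Subtract a tiny tolerance so floating-point noise does not add a spurious case.
    return math.ceil(exact - 1e-9)


def implied_single_case_reliability(n_cases: int, target: float = 0.8) -> float:
    """Single-case reliability implied if n_cases are needed to reach the target."""
    return target / (n_cases - (n_cases - 1) * target)


# Back-calculation (an assumption-laden illustration, not a reported result):
# if ten cases are needed for 0.8, one case alone would have a reliability near 0.29.
print(round(implied_single_case_reliability(10), 2))  # 0.29

# Hypothetical single-case reliability of 0.4: six cases would be needed for 0.8.
print(cases_needed(0.4))  # 6
```

Under these assumptions, the sketch shows why a single long case cannot reach conventional reliability thresholds and why each extra case adds less than the one before it.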
As we know from clinical experience, the whole is more than the sum of its parts and artificially breaking down a 1-hour clinical encounter into a number of 7- or 10-minute bites is not sufficient to assess the complex clinical skills and competencies of senior trainees. The Psychiatric Trainees’ Committee of the Royal College of Psychiatrists has been concerned about the disconnection between routine clinical practice and the CASC stations and has requested longer CASC stations (G. A. Lomax, personal communication, 2013).
Perhaps one could conceptualise the difference between the long case and the CASC in terms of competencies and competence. Competencies are a series of discrete skills that are learnt and assessed separately. They are limited to visible behaviour and its measurement. They are necessary, but not in themselves sufficient for safe and effective practice. Competence, in contrast, is a holistic understanding of practice and all-round ability to carry it out. Competence takes into account the subtleties of sensitivity, imagination, wisdom, judgement and moral awareness that are the marks of a wise doctor, and is a better goal than competencies. Reference Fish and de Cossart26 The CASC, as it states in its name, assesses competencies whereas the long case is more attuned to assess competence.
Workplace-based assessments
Workplace-based assessments (WPBAs) were introduced with the hope that they would preserve some of the advantages of the long case. They were rolled out at the same time as the CASC, as part of the formative assessment of a range of core skills mapped on to the curriculum, with the CASC offering the summative assessment at the end of basic training. One of the WPBAs, the Assessment of Clinical Expertise (ACE), gives supervisors the opportunity to observe trainees in a range of clinical situations and has the potential to assess the more abstract aspects of consultation, akin to the long case. Fitch et al Reference Fitch, Malik, Lelliott, Bhugra and Andiappan27 argue that there is a limited evidence base for WPBAs and that they have been designed neither specifically for psychiatry nor for postgraduate education in the UK. Trainee psychiatrists can attempt the CASC only after they have successfully demonstrated their competencies through WPBAs. The CASC pass rate of 39.3% Reference Bateman28 suggests that the WPBAs are not assessing what they are supposed to assess. In addition, there are questions about the reliability of the WPBAs, with no systems to add any external quality assurance to the process.
Trainee attitude towards the WPBAs is revealing. Menon et al Reference Menon, Winston and Sullivan29 found that most trainees, both junior and senior, are unimpressed with WPBAs. They are dissatisfied with the evidence underpinning the assessments, the manner of introduction of the WPBAs and the training of assessors. Furthermore, the majority of trainees did not find that the WPBAs benefited their supervision, training, clinical practice or confidence. They opined that the new system was unacceptable, did not accurately reflect their progress, was no better than the previous system and should not be retained. This is a fairly damning indictment, even from a group of cynical trainees.
The training and support of trainers/assessors is fundamental in maintaining the quality of any assessment system, including the WPBAs. Noel et al Reference Noel, Herbers, Capow, Cooper, Pangaro and Harvey30 suggested that brief training interventions are insufficient to produce the required accuracy. In this context, the finding that 22% of trainers had received no training whatsoever and that only half of those receiving training felt confident in undertaking the WPBA Reference Babu, Htike and Cleak31 is worrying.
The WPBAs were hastily introduced to address the gap caused by the substitution of the long case with the CASC, but have convinced neither the trainees nor the assessors about their utility, let alone led to a reasonable pass rate.
Approaches to improve reliability/validity of examinations
There have been various attempts to meet the urgent need to increase the validity of the final clinical examination while maintaining its reliability. Reference Fitch, Malik, Lelliott, Bhugra and Andiappan27 Modifications to the long case include incorporating the best aspects of the OSCE, for example structuring the format and the marking scheme; increasing the number of examiners; observing the candidate’s behaviour; increasing the number of cases to 4-6; and shortening the assessment (e.g. to 20-45 min). Attempts to improve the OSCE involved increasing the duration and having the examiners question the candidates directly. Reference Wass and Jolly9,Reference McKinley, Fraser, van der Vleuten and Hastings32-Reference Olsen, Coughlan, Rolfe and Hensley35
The Royal Australian and New Zealand College of Psychiatrists, for example, employ a hybrid scheme in their final clinical exams, first using an Observed Clinical Interview (OCI), essentially an observed long case, before the trainee progresses to their OSCE, akin to the Royal College of Psychiatrists’ CASC. 36 The Royal College of Physicians and Surgeons of Canada employ a two-stage clinical examination. The first is a structured OSCE, 37 which can be taken at any stage of training. This consists of eight to ten 20-minute stations. There is no contact with ‘real’ or simulated patients. Much of the assessment takes the form of direct questioning by an examiner, akin to the Royal College of Psychiatrists’ now abandoned ‘Patient Management Problems’. The second is the Standardized Assessment of a Clinical Encounter Report (STACER), 38 which is taken before training is completed, at the stage of a ‘junior consultant’. This is similar to the traditional long case but the candidate is observed throughout the assessment.
Gleeson Reference Gleeson39 introduced the Objective Structured Long Examination Record (OSLER), a 10-item analytical record of the traditional long case, with an examiner-observed history-taking and physical examination, and a criterion-referenced marking scheme to improve the reliability of the long case. Van der Vleuten & Schuwirth Reference van der Vleuten and Schuwirth23 noted the educational value of the OSLER in terms of providing feedback. They opined, however, that reliability could be better improved by increasing the number of cases than by focusing on observing the student during the long case. In a study using observed long cases, Pavlakis & Laurent Reference Pavlakis and Laurent40 established that postgraduate trainees did not pay attention to physical examination skills as these had not previously been observed. They highlighted the value of observation of the long case as it forced the candidates to master clinical assessment skills. They disapproved of the focus on the discussion of patient management in the long case at the expense of the assessment of clinical examination technique.
Olsen et al Reference Olsen, Coughlan, Rolfe and Hensley35 evaluated a structured question grid for the long case using two examiners, one of whom marked using a structured question grid and the other did not. They found no significant difference in ‘the chance of students being assessed as failing’ or in the likelihood of a discrepancy between the ratings. Standardising aspects of the case presentation and viva improves not only reliability but also the candidates’ perception of fairness.
Wass & Jolly Reference Wass and Jolly9 incorporated observation and multiple examiners into the long case. A pair of examiners observed and marked the history-taking and another pair marked the presentation, both using checklists and global scores. They found higher inter-examiner reliability for observation (checklist 0.72; global 0.71) than for the presentation (checklist 0.38; global 0.60). They concluded that observation of history-taking in the long case is a distinct component of clinical competence, which the traditional ‘presentation only’ format does not measure.
Norcini Reference Norcini41,Reference Norcini42 argued that: (a) case specificity; (b) examiner stringency; and (c) the aspects of competence evaluated contributed to the unreliability of the long case. He proposed: (a) increasing the number of cases; (b) minimising differences among examiners by increasing their number and standardising across examiners by training them; and (c) increasing the number of aspects of competence assessed, providing the examiners with lists of competencies and using examiner observation.
A different approach was adopted by Hamdy et al Reference Hamdy, Prasad, Williams and Salih33 in the Direct Observation Clinical Encounter Examination (DOCEE) and the Integrated Direct Observation Clinical Encounter Examination (IDOCEE). Reference Abouna and Hamdy43 In this method, two to three examiners together observed the candidates carrying out history-taking and physical examination of four to six patients. The generalisability coefficient for four cases and two examiners was 0.84 for each case. A similar reliability was achieved by using an observed 14-minute history-taking component of the long case followed by a 7-minute interview. Reference Wass, Jones and van der Vleuten24 It was found that if each long case was observed by one unique examiner, at least ten observed history-taking long cases were required to achieve 0.8 reliability. In addition, Luiz et al Reference Luiz, Roberto, Fernando, Eduardo, Lio and Ana44 reported 89% examiner agreement on candidate achievement of clinical skills when each candidate took two structured, standardised, observed long cases, each marked by a different examiner. There are other approaches to increasing reliability that use more than one long case of different durations. Reference Kroboth, Hanusa, Parker, Coulehan, Kapoor and Brown45
A proposal for a more reliable and valid examination
Given that the current summative examination has questionable validity and that the WPBAs have not filled the gap, we propose that the College replace the current CASC with fewer but longer stations. Instead of the current 16 stations in 160 min, we recommend 6-8 stations of 20-30 min duration, from different subspecialties. The stations could involve combinations of different skills from a blueprint based on the curriculum, for example history-taking, mental state examination, physical examination, formulation, explaining diagnosis, management and prognosis. There could be linked stations where candidates formulate the case they assessed in the previous station and present it to the examiners. The marking scheme for these stations could use global judgements, as in the current CASC examinations, in addition to marks for two to three sub-domains. The logistics of organising such an examination and examiner training need further exploration. Two examiners at each station would increase the reliability.
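The timing of the two formats can be checked with simple arithmetic. The sketch below uses the station counts and durations given in this article; the helper name `circuit_minutes` is illustrative, and the absence of preparation time in the proposed format is an assumption, as the proposal does not specify any.

```python
def circuit_minutes(stations: int, task_min: int, prep_min: int = 0) -> int:
    """Total time for a circuit: each station costs task time plus preparation time."""
    return stations * (task_min + prep_min)


# Current CASC as described in this article:
# 8 individual stations of 7 min, each with 1 min preparation, plus
# 4 pairs of linked stations (8 stations) of 10 min, each with 2 min preparation.
current_total = circuit_minutes(8, 7, 1) + circuit_minutes(4 * 2, 10, 2)

# Proposed format: 6-8 stations of 20-30 min.
# ASSUMPTION: no separate preparation time, since none is specified.
proposed_shortest = circuit_minutes(6, 20)
proposed_longest = circuit_minutes(8, 30)

print(current_total)                        # 160
print(proposed_shortest, proposed_longest)  # 120 240
```

On these assumptions the proposed examination would take between 120 and 240 min of station time, a range that brackets the current 160 min.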
This format would retain the main benefits of the OSCE/CASC but would also address the problems associated with the shorter stations, namely not assessing whole cases, not assessing the ability to formulate cases and develop comprehensive management plans, etc. It would align the examination more closely with the day-to-day clinical work of psychiatrists. In addition, it would render examination preparation crash courses obsolete, as simply taking a meticulous approach to the routine clinical work would enable passing the examination, which would benefit both patients and trainees. This system is also likely to simplify the process of organising the examination as well as making it cheaper to run, and might ensure that the future generations of psychiatrists are suitably equipped with the complex skills essential to the practice of psychiatry.