Introduction
As research in the field of therapist education moves towards evidence-based training for psychotherapists, a growing body of literature focuses on which methods are most effective in improving therapist skills (Bennett-Levy et al., 2009a; Hill et al., 2017; Henrich et al., 2023; Rakovshik and McManus, 2010). In this context, experience-based techniques, such as role-playing, modelling, and self-reflective practice, are increasingly coming to the fore (Bennett-Levy et al., 2009a). The present study focuses on self-reflection, as it is considered to be a fundamental element in the development of psychotherapeutic skills (Bennett-Levy, 2006; Jennings and Skovholt, 1999; Prasko et al., 2023; Rønnestad and Skovholt, 2003; Schön, 1983; Thwaites et al., 2014).
Bennett-Levy et al. (2009b; p. 121) define self-reflection as a ‘specific form’ of reflection, in which the focus is directed towards one’s own thoughts, emotions and behaviours or personal history. Applied to the clinical context, this means that self-reflection enables the therapist to pay attention to his or her own internal processes and behaviours during therapy sessions, and to analyse them in a way that leads to an increased understanding of the therapeutic process (Griffith and Frieden, 2000). According to Bennett-Levy and Thwaites (2007), this is especially important when dealing with complex interpersonal interactions, which is why self-reflection is seen as a key element for the development of interpersonal skills and psychotherapeutic skills in general (Bennett-Levy, 2006).
Self-reflection within training programs for psychotherapeutic skills
The value of self-reflection as a method to improve psychotherapeutic competence has mostly been studied within the framework of broader training programs, especially within self-practice/self-reflection (SP/SR) programs (Bennett-Levy et al., 2001; Bennett-Levy et al., 2003; Scott et al., 2021). SP/SR is embedded in the framework of cognitive behavioural therapy (CBT) and consists of two main components: a self-practice component, which involves practising CBT techniques on oneself (e.g. completing thought records or conducting behavioural experiments), and a self-reflection component, which refers to reflecting on the experiences made during self-practice (Bennett-Levy et al., 2003). Several studies emphasize the beneficial effects of SP/SR as a training method (Chaddock et al., 2014; Chigwedere et al., 2020; Gale and Schröder, 2014; Prasko et al., 2012). For example, Bennett-Levy et al. (2001) conducted a study in which 19 trainees participated in a one-semester course in CBT, which included SP/SR as a formal course requirement. Using a qualitative approach, the authors found that SP/SR led to a ‘deeper sense of knowing’, that is, a better understanding of the role of the therapist, the cognitive model, and the change process. Other positive effects reported by trainees were improvements in therapeutic self-concept and in self-reported therapeutic skills. In a follow-up study, Bennett-Levy et al. (2003) found that the positive effects of SP/SR, such as a refinement of specific psychotherapeutic skills, increased attention devoted to the therapeutic relationship, and increased therapeutic flexibility, were reported even 1–5 months after participation in SP/SR activities. Because this follow-up study focused on practising therapists, it supports the notion that SP/SR may be helpful not only at the beginning of one’s career, but also after a certain level of competence has been reached (Thwaites et al., 2014).
Video analysis and self-reflection
Given the promising reports of the positive effects of video analysis in psychotherapy and medical training, it seems worthwhile to consider combining self-reflection with video analysis. For example, in their article, Gonsalvez et al. (2016) report on two specific video techniques in CBT supervision and outline the benefits of using video feedback in supervision. They argue that video feedback allows supervisees to effectively observe and reflect on their competencies and improve potential deficiencies. In this way, they link the use of video with deeper self-reflection to enhance learning. The benefits of analysing videotaped therapy sessions were also demonstrated by Lane and Gottlieb (2004). They conducted a study of 60 medical students who participated in a training program to improve their interviewing and self-assessment skills. Part of the program included two reviews of videotaped patient encounters, one week apart, with one or two faculty members. The results showed that, from the faculty member perspective, the improvement in students’ overall performance from one video review session to the next was large (d=1.0). In addition, the concordance between students’ and faculty members’ ratings increased significantly from the first to the second measurement point (i.e. within one week). Although this study did not explicitly include self-reflection, the results support the notion that reviewing videotapes of one’s own patient encounters can facilitate skill improvement and possibly enhance the accuracy of self-evaluation (Aafjes-van Doorn et al., 2022).
By contrast, recalling a therapy session based on memory alone carries the risk of not only forgetfulness (Gonsalvez et al., 2016), but also assessment bias and distortion (e.g. Haggerty and Hilsenroth, 2011; Yourman and Farber, 1996). These risks could be minimized or even avoided by using video recordings. Another important aspect is that video analysis shows aspects of behaviour of which therapists are not necessarily aware, such as automated non-verbal behaviour (Briggie et al., 2016; Haggerty and Hilsenroth, 2011). According to Haggerty and Hilsenroth (2011), awareness of one’s own non-verbal behaviour can lead to increased empathy and improved therapeutic alliance.
Nevertheless, it is important to note that not all studies report clear beneficial effects of video analysis on skill development. For example, a recent study by Weck et al. (2024) did not find significant differences in psychotherapeutic competence between students who received supervision based on video analysis and students who received supervision based on verbal reports alone. However, this study was not about self-reflection but about feedback from a supervisor. It is quite possible that video analysis may have a different effect in the context of self-reflection.
Self-assessment vs independent assessment of competence
Several studies have shown a low level of agreement between self-assessment and independent assessments of competence. For example, therapists and trainees tend to over-estimate their competence compared with independent observers (Brosan et al., 2008; Longley et al., 2023; Parker and Waller, 2015). However, Loades and Myles (2016) found a significant correlation between the reflective ability of mental health professionals and the level of agreement between self-assessments and independent assessments of competence. Thus, when examining the role of self-reflection in competence development, it seems important to consider both perspectives and to gain greater insight into how self-reflection influences both the therapist’s own judgements and individually perceived development as well as externally observable skills.
Our study
Overall, there is evidence that self-reflection can be an enriching method for the therapeutic development of trainees and experienced therapists. Based on the literature, it also seems plausible that video analysis may even enhance its beneficial effects. However, most studies on the effects of self-reflection rely only on qualitative analysis and/or self-reports of trainee and therapist psychotherapeutic competencies. Accordingly, there is a need to expand research using quantitative methods and independent competence ratings (McGillivray et al., 2015).
In this study, we investigated the effect and characteristics of structured self-reflection as a method for practising and enhancing therapeutic skills in a university seminar with undergraduate psychology students, using both quantitative and qualitative methods. Firstly, we investigated whether structured self-reflection is more effective when based on video recordings of therapeutic role-plays than on memory recall alone. Other studies have shown that students with minimal clinical experience can improve their skills after only a few hours of training (e.g. Maaß et al., 2024; Weck et al., 2024), even without feedback, but with other interventions such as modelling (e.g. Kühne et al., 2022). In this sense, we believe that in our study, self-reflection combined with other active learning strategies typically associated with significant learning success (role-playing, video analysis; Knox and Hill, 2021) can lead to skill improvement even without the external input of a trainer. Secondly, we explored the characteristics of the content of students’ written self-reflections. More specifically, we investigated the aspects on which students reflected, whether the content of the self-reflections differed between students who practised video-based structured self-reflection (VSR) and students who practised memory-based structured self-reflection (MSR), and whether the content changed over the course of the seminar. Finally, by examining a focus group discussion that took place at the end of the training, we analysed how the students experienced the self-reflection process and what advantages and disadvantages they perceived. Our research questions were as follows:
(1) Do the two types of self-reflection differ in terms of their effect on therapeutic competence and therapeutic alliance?
Hypothesis: Students practising video-based self-reflection (VSR) improve their therapeutic skills significantly more than students practising memory-based self-reflection (MSR), as judged by (a) independent raters (primary outcomes) and (b) student self-reports (secondary outcome).
(2) What are the characteristics of students’ self-reflections?
(3) Does the content of the self-reflections differ between students practising VSR and students practising MSR?
(4) Does the content on which students reflect change during 4 weeks of practice?
(5) What are students’ experiences with self-reflection? What advantages and disadvantages do they perceive?
Method
This study is a mixed-method randomized controlled trial (RCT), comparing VSR with MSR in a university seminar for undergraduate psychology students. The main topic of the seminar was interpersonal therapeutic competencies, and it consisted of a 6-week theory phase, followed by a 6-week practice phase. Two independent and trained raters assessed the students’ psychotherapeutic competence and the therapeutic alliance before (pre-assessment; t1) and after the practice phase (post-assessment; t6) based on their performance in videotaped role-plays with standardized patients. During the practice phase (t2–t5), the students carried out peer role-plays and completed a structured self-reflection questionnaire, either based on video (VSR) or based on memory (MSR), and provided a self-assessment of their psychotherapeutic skills. After the post-assessment, there was a guided focus group discussion in which students were asked to report their experiences during the practice phase, and what advantages and disadvantages they perceived (t7).
Ethical approval and pre-registration
The study was pre-registered with the Open Science Framework in June 2021 (https://osf.io/r5wdy). Ethical approval was obtained from the University Ethical Committee (reference number 60/2021). The study conformed to the Declaration of Helsinki, and informed consent was obtained from all participants.
Eligibility criteria
Eligibility criteria for study participation were assessed during the first two seminar sessions: (a) study of psychology (MSc), (b) participation in the seminar, and (c) a signed informed consent. The seminar was not compulsory for graduation, and students could attend it without taking part in the study (i.e. provision of study data was voluntary).
Randomization and blinding
We used simple randomization without extra stratification, and a 1:1 allocation ratio to assign participants to the study groups (VSR vs MSR). The presentation of the videotapes to the raters was also randomized to reduce recall bias or halo effects. An independent researcher, who was not involved in the project, programmed all randomization codes using the R package randomizr (RStudio Team, 2015; Ternovski and Coppock, 2017). Study group allocation was disclosed to participants after all had provided signed informed consent. All standardized patients, raters, and data analysts were blinded to the study group.
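As a rough illustration of the allocation scheme described above, the following Python sketch shows simple (unstratified) randomization with a 1:1 allocation probability. It is not the randomization code used in the study, which was written with the R package randomizr. The function name, the seed, and the participant IDs are hypothetical, and the sketch assumes that each participant is assigned independently with probability 0.5, so equal group sizes are not guaranteed.

```python
import random


def simple_randomize(participant_ids, seed=None):
    """Assign each participant to VSR or MSR by an independent fair coin flip.

    Hypothetical helper for illustration only; it is an assumption that
    allocation was independent per participant.
    """
    rng = random.Random(seed)  # local generator so the allocation is reproducible
    return {pid: rng.choice(["VSR", "MSR"]) for pid in participant_ids}


# 34 hypothetical participant IDs and an arbitrary seed
allocation = simple_randomize(range(1, 35), seed=2021)
group_sizes = {g: list(allocation.values()).count(g) for g in ("VSR", "MSR")}
print(group_sizes)
```

Fixing the seed makes the allocation list reproducible, which is what allows an independent researcher to prepare it in advance.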
Participants
Thirty-six students enrolled in the seminar, two of whom did not consent to participate in the study (both from the VSR-group; see Fig. 1). The final sample consisted of N=34 students (32 female) with a mean age of 25.0 years (SD=3.08; VSR: M=24.6, SD=3.44; MSR: M=25.3, SD=2.78; see Table 1). Participants indicated their level of prior helping experiences on eight items (Maaß et al., 2024), with a 5-point rating scale (1 = ‘never’ to 5 = ‘at least once a week’; example item: ‘Within the last five years, I completed internships in a consulting or therapeutic setting’). The students’ average level of prior helping experience was moderate (M=2.74, SD=0.57; VSR: M=2.79, SD=0.48; MSR: M=2.70, SD=0.66). The differences in age and prior helping experience between the groups were not significant (age: t(28.89)=0.60, p=.551; prior helping experience: t(30.8)=−0.45, p=.658).

Figure 1. Participant flow chart.
Table 1. Mean age and prior helping experience of students

VSR, video-based self-reflection; MSR, memory-based self-reflection.
Standardized patients
Two undergraduate psychology students who worked as student assistants in our department acted as standardized patients (SPs) during the pre- and post-assessments (one female: MSc student, 23 years old; one male: BSc student, 21 years old). We ensured that SPs and participants did not know each other. The SPs were prepared for their task by attending a total of 8 hours of training. SP authenticity was evaluated by the independent raters using the Authenticity of Patient Demonstrations Scale (APD; Ay-Bryson et al., 2022), which includes 10 items on a 4-point rating scale (1 = ‘strongly disagree’ to 4 = ‘strongly agree’; example item: ‘The way of speaking (pitch, tempo) is appropriate in the context of the disorder’). The average authenticity rating across time points and raters was good (M=3.64, SD=0.38, range: 2.80–4.0), and reliability at pre- and post-measurement was excellent (t1: α=.96; t6: α=.97).
Measures
Primary outcomes
The primary outcomes were the students’ psychotherapeutic skills and the therapeutic alliance, as assessed at pre- and post-assessment (t1 and t6) by two independent raters (both female, advanced psychotherapy trainees, clinical experience of at least 2 years). In order to counteract rater drift (i.e. changes in rater performance over time), the common understanding of the rating scales was discussed not only during the rater training (8 hours), but also after the first 18 videos had been rated. The intraclass correlation (ICC) was initially only fair, ICC(2,2)<.60. We suspect that the low inter-rater reliabilities were a consequence of the difficulties in rating associated with the online setting. We had initially trained our raters to assess the behaviour of therapists in ‘real life’ therapy sessions, i.e. sitting across from their patients in the same room. Due to the COVID-19 pandemic, the practice phase of the seminar had to take place online from the beginning, which meant that the raters could not see a ‘regular’ therapy setting, but only two frontal views of the faces displayed in separate windows. As a result, they lacked both spatial and non-verbal information (e.g. eye contact, posture, and gestures related to one another, such as leaning forward or nodding), which made the situation more ambiguous and possibly made it more difficult to reach a consistent judgement. It was then decided that the raters should come to a common assessment for a random subsample of k=24 videos within a consensus rating procedure. The final ICCs reported in the section below therefore refer to the average of the consensus and independent ratings. The independent raters used the following standardized rating scales to assess the therapeutic alliance, students’ CBT skills, and communication skills:
Therapeutic alliance
The German Helping Alliance Questionnaire (HAQ; Luborsky, 1984; German version: Bassler et al., 1995) assesses the collaborative and affective bond between therapist and patient with 11 items (1 = ‘strongly disagree’ to 6 = ‘strongly agree’). We used the independent rater’s version of the HAQ in German, which was developed by Richtberg et al. (2016). An example item is: ‘I believe the patient is working together with the therapist in a joint effort’. The intraclass correlation in this study was ICC(2,2)=0.70.
CBT skills
The German version of the Cognitive Therapy Scale (CTS; Weck et al., 2010; Young and Beck, 1980) was used to assess competence in delivering CBT skills. Although the seminar focused on interpersonal competencies and the therapeutic relationship, we were also interested in whether the process of self-reflection would influence a broader range of therapeutic skills. Therefore, we included the CTS as an additional measure of CBT skills. The CTS consists of 14 items that each represent a specific CBT skill (e.g. ‘agenda setting’, ‘guided discovery’, ‘resource activation’). Skills are assessed on a 7-point rating scale (0 = ‘poor’, 6 = ‘excellent’). Five items (i.e. Item 7: ‘reviewing previous homework’; Item 11: ‘rationale’; Item 12: ‘selection of appropriate strategies’; Item 13: ‘appropriate implementation of techniques’; Item 14: ‘homework setting’) were excluded because they were considered inappropriate for the specific task in this study (conducting an initial interview with the patient). The intraclass correlation was ICC(2,2)=0.90.
Communication skills
The Clinical Communication Skills Scale – Short Version (CCSS-S; Maaß et al., 2022) assesses professional clinical communication skills with 14 items (1 = ‘extremely inadequate’ to 4 = ‘extremely adequate’). A sample item is: ‘The therapist summarizes interim results’. If any particular item cannot be judged, the option ‘not applicable’ can be used. The intraclass correlation in this study was ICC(2,2)=0.74.
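To make the reliability coefficients reported for the three rating scales more concrete, the sketch below computes an ICC(2,k) from a videos-by-raters table of scores. It assumes that the notation ICC(2,2) follows the Shrout and Fleiss convention (two-way random effects, absolute agreement, average of k=2 raters); the source does not state which software or variant was used. The function name and the example scores are hypothetical.

```python
def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, average of k raters.

    `ratings` is a list of rows, one per rated video, each holding one score
    per rater (no missing values).
    """
    n = len(ratings)      # number of rated videos
    k = len(ratings[0])   # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-video mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical scores from two raters for five videos (not study data)
example = [[4, 5], [2, 3], [5, 5], [3, 2], [1, 2]]
print(round(icc_2k(example), 2))
```

Because the between-rater mean square enters the denominator, systematic differences in rater leniency lower this coefficient, which is one reason why a consensus procedure can raise it.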
Secondary outcomes
The secondary outcomes include both qualitative and quantitative data. The qualitative data refers to the written self-reflections. The quantitative data represents the students’ self-ratings of their CBT skills during practice (t2–t5).
CBT skills
As part of the self-reflections during the practice phase (t2–t5), students judged their own CBT skills by rating the following three items from the CTS: ‘interpersonal effectivity’, ‘feedback’, and ‘focusing on key cognitions and behaviors’ (α=.50 at the first measurement point).
Role-plays
In all role-plays, the students’ task was to conduct an initial interview with a patient with the aim of exploring the patient’s main problem. However, there were two types of role-plays: with an SP and with a peer. The role-plays with an SP were conducted at pre- and post-assessment, were videotaped, lasted 20 minutes, and formed the basis for the independent raters’ assessment of the primary outcomes. The peer role-plays were conducted during the practice phase, lasted 15 minutes, and formed the basis for analysing the secondary outcomes (written self-reflections and self-assessments of competence), as students completed a self-reflection questionnaire and provided self-assessments after each peer role-play. From practice session to practice session, students alternated between the roles of therapist and patient in their peer role-plays. In the VSR group, peer role-plays were videotaped, whereas in the MSR group, they were not.
Self-reflection
The structured self-reflection questionnaire mentioned above included the following aspects: (1) own behaviour (e.g. empathy, collaboration, appreciation), (2) SP behaviour during the session (e.g. shy, dominant), (3) own reaction to SP (e.g. own body language, facial expression, feelings, and thoughts; e.g. ‘nervous’, ‘stern look’), and (4) suggestions for improvement. Students were asked to write individual words or short sentences in response to the questions. In addition, they were asked to rate their own performance on three items of the CTS (‘interpersonal effectivity’, ‘feedback’, and ‘focusing on key cognitions and behaviors’; see secondary outcomes).
Procedure
The undergraduate seminar focused on interpersonal skills and the therapeutic relationship. It was part of a module called ‘Prevention and Treatment’ in a Master’s program at our university specializing in clinical psychology, psychotherapy, and counselling psychology. The aim of the seminar was to combine scientific theoretical input with hands-on practice, thus promoting a close link between research and practice. During the theory phase, students familiarized themselves with current research on the therapeutic relationship. During the subsequent practice phase, they developed their interpersonal skills through self-reflection in practical role-plays.
Theoretical input
During a 6-week theory phase, students and teachers discussed current evidence on the therapeutic relationship, and students gave presentations on different aspects of the topic, which were later graded by the teachers. Topics included the measurement of therapeutic alliance, strategies for establishing a strong therapeutic alliance, CBT and psychodynamic approaches, and alliance ruptures. Theoretical models were presented, and current research was discussed.
Practice with peer role-plays
Students were then familiarized with the procedure for the following practice phase, which consisted of four weekly practice sessions. Each practice session included two peer role-plays and structured self-reflection. There were different role scripts for the patient role for each practice session. This means that, since there were two peer role-plays per practice session, students used the same standardized therapy situation for two consecutive peer role-plays in one session, and a different situation for the next session.
Students remained in the same role (i.e. ‘therapist’ or ‘patient’) during one practice session, but switched roles during the next practice session one week later. This means that if a student played the ‘therapist’ in two peer role-plays during the first practice session, he or she would play the ‘patient’ in the next session. Then, he or she would play the ‘therapist’ again in the third session and the ‘patient’ again in the fourth session. Ultimately, each student acted as both the ‘therapist’ and the ‘patient’ in two practice sessions, each of which included two peer role-plays. Thus, each student played each role four times. After the first peer role-play, ‘therapists’ in the VSR group watched the videotape of their role-play and completed a structured self-reflection questionnaire. ‘Therapists’ in the MSR group completed the same questionnaire. This was done from memory, as their sessions were not video recorded. The time frame for the self-reflection unit in both groups was 20 minutes (i.e. students in the VSR group were not given extra time to watch the video). After a short break, students repeated their peer role-play (15 minutes of peer role-play followed by 20 minutes of self-reflection).
Pre- and post-assessment
The pre- and post-assessment role-plays were scheduled outside of the seminar time and took place before and after the practice phase.
Group discussion
During the last session of the seminar, a guided group discussion was conducted in each study group and recorded on audiotape (t7; Fig. 1). The discussion was guided by the following questions: (1) What went well with the self-reflection, and what was difficult? (2) What did you learn? (3) What are the main advantages of self-reflection? (4) What are the main disadvantages? (5) Were there any side-effects, i.e. did it hurt you in any way?
Data analysis
All statistical analyses were performed using RStudio (RStudio Team, 2015).
Sample size
We expected a moderate effect size for the difference between the two study groups. The a priori power analysis for the repeated measures ANOVA resulted in a total sample size of N=34 (G*Power; Faul et al., 2009; power = 0.80, d=0.50, alpha = .05).
Changes in therapeutic skills and therapeutic alliance
To answer research question (RQ) 1 and to test the corresponding hypothesis of VSR superiority, we conducted two analyses. First, we analysed the primary outcomes with a repeated measures MANOVA (i.e. independent ratings of the CTS, CCSS-S, and HAQ from pre- to post-assessment, t1 vs t6). The modified ANOVA-Type Statistic (MATS) from the R package MANOVA.RM (RStudio Team, 2015; Friedrich et al., 2023) was used to interpret the results. We chose MATS because the data violated distributional assumptions, and MATS is less sensitive to such violations (Friedrich and Pauly, 2018). Second, we analysed the secondary outcome with a repeated measures ANOVA (i.e. students’ self-assessments during the practice phase, t2–t5). In both analyses, the group variable (i.e. VSR=1 vs MSR=0) was the independent variable. Post hoc comparisons between measurement points were performed using t-tests with Bonferroni-Holm correction. Generalized eta squared (η²G>.01 = small effect, η²G>.05 = moderate effect, η²G>.14 = large effect) and Cohen’s d (d<0.50 = small effect, 0.50<d<0.80 = moderate effect, d>0.80 = large effect) were used to interpret the effect sizes (Cohen, 1988).
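The Bonferroni-Holm correction and the effect size cut-offs named above can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis code used in the study, which was run in R. The function names and the example p-values are hypothetical, and the handling of values exactly at 0.50 or 0.80 is an assumption, since the text does not say to which category the boundaries belong.

```python
def holm_adjust(p_values):
    """Return Holm (Bonferroni-Holm) adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The p-value at 0-based rank r is multiplied by (m - r) and capped at 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def label_cohens_d(d):
    """Label |d| using the cut-offs stated in the text (0.50 and 0.80).

    Assumption: exactly 0.50 counts as moderate and exactly 0.80 as large.
    """
    size = abs(d)
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "moderate"
    return "large"


# Hypothetical p-values and effect size, not taken from the study
print(holm_adjust([0.01, 0.04, 0.03]))
print(label_cohens_d(0.6))
```

The step-down procedure is less conservative than a plain Bonferroni correction because only the smallest p-value is multiplied by the full number of comparisons.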
There were no missing data for the primary outcomes (i.e. rater-based data). For the secondary outcome (i.e. students’ self-ratings), missing values on individual items (three in total) were imputed using the student’s mean score on the remaining items of the CTS. If all of a student’s values were missing at a given measurement point, the student was excluded from the analysis of the corresponding measurement point. This was the case for four questionnaires.
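The handling of missing self-ratings described above amounts to a person-mean imputation with an exclusion rule. A minimal Python sketch follows, assuming that missing items are represented as None; the function name and the example ratings are hypothetical and do not come from the study data.

```python
def impute_person_mean(item_scores):
    """Replace missing items (None) with the mean of the student's remaining items.

    Returns None when every item is missing, mirroring the exclusion rule
    described in the text.
    """
    observed = [s for s in item_scores if s is not None]
    if not observed:
        return None  # nothing to impute from: exclude this measurement point
    mean_observed = sum(observed) / len(observed)
    return [mean_observed if s is None else s for s in item_scores]


# Hypothetical self-ratings on the three CTS items (0 to 6 scale)
print(impute_person_mean([4, None, 5]))
print(impute_person_mean([None, None, None]))
```

Imputing from a student's own remaining items leaves that student's scale mean unchanged, so the imputation does not shift the scores that enter the repeated measures ANOVA.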
Written self-reflections
To answer RQs 2 to 4 (characteristics, group differences, and changes over time of the self-reflections), we analysed the written self-reflections from each peer role-play, using qualitative content analysis and frequency analyses, and compared the content and frequency of each topic in the self-reflections, both between groups (VSR vs MSR) and over time (t2 to t5). The qualitative content analysis was performed using MAXQDA 2020 (VERBI Software, 2019) and following an inductive-deductive procedure (Kuckartz and Rädiker, 2019). This means that the main content categories were pre-determined by the structure of the self-reflection questionnaire, while the subcategories were derived inductively, based on the specific content of the individual text segments. We were particularly interested in students’ reflections on their ‘own behaviour’, as this is the category most closely related to observable performance and competence. Therefore, we will focus on analyses of the ‘own behaviour’ category. A preliminary system of subcategories of ‘own behaviour’ was created and discussed by two researchers after having analysed three self-reflection sheets. Inter-rater agreement was then calculated from another 14 randomly selected self-reflection sheets (κ=.74, 11% of data material). Of course, narrowing the focus in this way and analysing only the ‘own behaviour’ category has its drawbacks (e.g. limiting the breadth of responses to our qualitative research questions). At the same time, the seminar focused on improving students’ (observable) skills, and students’ thoughts about this aspect are best reflected in their descriptions of their external behaviour. Therefore, to follow a clear thematic line, we decided to focus on the category of ‘own behaviour’ and leave out the other categories.
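The inter-rater agreement reported above can be illustrated with a short kappa computation. The sketch assumes that κ refers to Cohen's kappa for two coders who each assign one subcategory per text segment; the source does not state which kappa variant was used. The function name, the subcategory labels, and the codings are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who assigned one category per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments with identical codes
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal proportions per category
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codings of four text segments (not study data)
first_coder = ["empathy", "empathy", "structure", "structure"]
second_coder = ["empathy", "empathy", "structure", "empathy"]
print(cohens_kappa(first_coder, second_coder))
```

Kappa is undefined when chance agreement equals 1, for example when both coders use a single identical category for every segment; the sketch does not handle that case.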
Students’ experiences with self-reflection and their perceived advantages and disadvantages
To answer RQ 5, we analysed the key points of the focus group discussions using a procedure based on Braun and Clarke’s (2006) instructions for thematic analysis. Prior to the analysis, both discussions were transcribed using MAXQDA 2020 (VERBI Software, 2019). All analysis steps (familiarization with the data, generation of codes, collation of codes into themes and subthemes) were carried out by the first author, and applied across the two study groups, rather than separately for each group. Again, an inductive-deductive approach was used, as the main themes were pre-determined by the questions that guided the discussion (‘advantages’, ‘disadvantages’, ‘learning progress’), and the subthemes were derived inductively from the data itself.
Deviations from pre-registration
Due to the COVID-19 pandemic, we had to switch to an online setting from the fourth week of the seminar. This meant that all role-plays were conducted using Zoom software (Zoom, 2021). As an additional analysis, we have included in this paper the quantitative examination of students’ self-assessments on the three items of the CTS, which was not previously planned. Also, instead of applying qualitative content analysis to the transcripts of the group discussions, we analysed them using thematic analysis, as this enabled a more flexible approach to the data.
Results
The descriptive statistics for the primary and secondary outcomes are displayed in Tables 2 and 3.
Table 2. Descriptive statistics of the primary outcomes (independent ratings of therapeutic skills and alliance)

VSR, video-based self-reflection; MSR, memory-based self-reflection; HAQ, Helping Alliance Questionnaire; CTS, Cognitive Therapy Scale; CCSS-S, Clinical Communication Skills Scale – Short Version.
Table 3. Descriptive statistics of the secondary outcome (students’ self-rated CBT skills)

VSR, video-based self-reflection; MSR, memory-based self-reflection; RP, role-play.
Changes in therapeutic skills and alliance
Primary outcomes
The repeated measures MANOVA for the primary outcomes (i.e. independent ratings of CTS, CCSS-S, HAQ) yielded no significant differences for the main effects of time (pre- vs post-assessment, MATS=0.85, p=.582), or group (VSR vs MSR; MATS=2.20, p=.551), or for the time × group interaction (MATS=0.54, p=.741).
Secondary outcomes
The repeated measures ANOVA for the secondary outcome (i.e. students’ self-ratings of the CTS items during the practice phase) revealed a significant time effect, F(1.92, 53.77)=9.94, p<.01, η²G=0.12, but no significant group effect, F(1, 28)=1.62, p=.214, η²G=0.03, and no significant time × group interaction, F(1.92, 53.77)=3.13, p=.054, η²G=0.04 (see Fig. 2). Post hoc comparisons are displayed in Table 4. All students self-reported improved CBT skills from the first to the fourth peer role-play (t2–t5, d=0.68), and within a practice session, that is, from the first to the second peer role-play (t2–t3, d=1.16), and from the third to the fourth peer role-play (t4–t5, d=0.77). In addition, there was a significant decrease in self-rated skill performance from the second to the third role-play (t3–t4), that is, from one session to the next with a 2-week break in between (see Fig. 2).
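The fractional degrees of freedom reported for the time effect are what one would expect if a sphericity correction (such as Greenhouse-Geisser) had been applied. This is an inference from the numbers rather than something stated in this section. Under that assumption, and assuming four measurement points (t2 to t5), the implied correction factor ε can be recovered from the reported values, as in the following sketch. All variable names are illustrative.

```python
# Assumption: four repeated measurements (t2, t3, t4, t5) in the practice phase.
n_timepoints = 4
# Error df of the between-group effect, taken from the reported F(1, 28).
df_error_between = 28

# Uncorrected degrees of freedom for the within-subject (time) effect.
df_time_uncorrected = n_timepoints - 1                           # 3
df_error_uncorrected = df_time_uncorrected * df_error_between    # 84

# Degrees of freedom as reported for the time effect.
df_time_reported = 1.92
df_error_reported = 53.77

# A sphericity correction multiplies both df by the same factor epsilon,
# so both ratios should give (nearly) the same value.
epsilon_from_effect = df_time_reported / df_time_uncorrected
epsilon_from_error = df_error_reported / df_error_uncorrected

print(f"epsilon (effect df): {epsilon_from_effect:.3f}")
print(f"epsilon (error df):  {epsilon_from_error:.3f}")
```

Both ratios come out at roughly 0.64, which is consistent with a single correction factor having been applied to both degrees of freedom. The small difference between them reflects rounding in the reported values.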

Figure 2. Development of students’ self-assessed skills during training.
Table 4. Post hoc comparisons between measurement points for the secondary outcome (students’ self-rated CBT skills)

Written self-reflections
Characteristics of the written self-reflections
Students reflected on their behaviour regarding six categories, which were mentioned with the following frequencies relative to the total number of comments in the dataset: (1) empathy (reflections on validation, empathy, and understanding, 30%), (2) appreciation (reflections on praise, respect, and appreciation of the patient, 23%), (3) collaboration (reflections on cooperation, support, and collaboration, 10%), (4) encouragement (reflections on encouragement and conveying confidence and hope, 11%), (5) normalization (reflections on normalization and eradicating the fear of being crazy, 1%) and (6) communication (reflections on structuring the conversation, summarizing and exploring, 23%). We divided each category into a ‘positive’ subcategory, including all text segments that emphasized positive aspects of one’s own behaviour, and a ‘negative’ subcategory, including all text segments that emphasized negative aspects. Frequency analysis revealed that, across both groups and across all measurement points, students reflected more on positive aspects of their behaviours than on the negative ones (79% vs 21%, see Table 5).
Table 5. Frequencies of the individual subcategories of ‘own behaviour’

n refers to the absolute number of text elements within the whole dataset.
Differences between the two study groups
In addition, we examined whether the content of the self-reflections differed between the VSR and the MSR groups. Our analyses showed that almost all of the subcategories derived in the first step had similar frequency distributions in both study groups, with discrepancies occurring only in the categories of appreciation and collaboration. Specifically, the frequency distributions in both study groups were as follows: empathy: 30% (VSR and MSR), appreciation: 19% (VSR) vs 27% (MSR), collaboration: 15% (VSR) vs 6% (MSR), encouragement: 11% (VSR) vs 12% (MSR), normalization: 2% (VSR) vs 1% (MSR), communication: 23% (VSR) vs 24% (MSR; see Supplementary material). However, it was noticeable that the self-reflections of the students from the MSR group tended to be more extensive, as indicated by a greater number of text segments coming from the MSR group than from the VSR group [n=464 (MSR) vs n=398 (VSR)]. We also compared the two study groups in terms of the ratio of positive to negative reflections, and found that the MSR group reflected more on positive aspects than the VSR group (VSR: positive n=275 [74%], negative n=98 [26%]; MSR: positive n=366 [84%], negative n=72 [16%]).
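As a quick arithmetic check, the percentages of positive and negative reflections can be recomputed from the counts reported above. The sketch below uses only those four counts; the dictionary layout and the helper name `percent_shares` are illustrative assumptions.

```python
# Counts of positive and negative text segments as reported in the text.
counts = {
    "VSR": {"positive": 275, "negative": 98},
    "MSR": {"positive": 366, "negative": 72},
}


def percent_shares(group_counts):
    """Return each category's share of the group total, rounded to whole percent."""
    total = sum(group_counts.values())
    return {label: round(100 * n / total) for label, n in group_counts.items()}


# Shares within each study group.
per_group = {group: percent_shares(c) for group, c in counts.items()}

# Shares pooled across both groups.
pooled_counts = {
    valence: sum(c[valence] for c in counts.values())
    for valence in ("positive", "negative")
}
pooled = percent_shares(pooled_counts)

for group, shares in per_group.items():
    print(group, shares)
print("pooled", pooled)
```

The recomputed shares agree with the reported values (74% vs 26% for VSR, 84% vs 16% for MSR, and 79% vs 21% overall). Note that the valence counts sum to 373 (VSR) and 438 (MSR), which is fewer than the 398 and 464 text segments reported for the two groups; the text does not explain this difference.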
Changes over time
Finally, we examined whether the frequencies of the subcategories changed over time. This was largely not the case (see Fig. 3), with two exceptions: the frequency of the communication category decreased considerably from the first to the fourth role-play (29% vs 19%), and the frequency of the collaboration category increased by 6 percentage points (7% vs 13%). In terms of the ratio of positive to negative reflections, the proportion of positive reflections increased over the course of the practice phase in both study groups, as shown in Fig. 4.

Figure 3. Frequencies of the individual subcategories of ‘own behaviour’ over time.

Figure 4. Frequencies of positive and negative reflections of ‘own behaviour’ over time.
Experience of self-reflection and perceptions of advantages and disadvantages
The thematic analysis of the focus group discussions provided deeper insights into student perceptions of the advantages, disadvantages, and learning progress associated with self-reflection. With regard to the main theme of ‘advantages’, students mainly mentioned the ‘helpful aspects of the self-reflection sheet’ (subtheme 1) and of the ‘video analysis’ (subtheme 2). Regarding the self-reflection sheet, they appreciated the fact that it gave them structure and drew their attention to aspects that would otherwise have been overlooked, such as facial expressions and posture. Students also felt that they became more comfortable answering the questions as the practice phase progressed. Students in the VSR group found it helpful that the videos enabled them to perceive immediate (non-verbal) reactions (e.g. facial expressions). With regard to the main theme of ‘disadvantages’, many arguments concerned ‘difficulties associated with the self-reflection sheet’ (subtheme 1), e.g. a lack of clarity of the questions and lack of an open-ended response field for free reflection. Students also pointed out the ‘difficulties associated with video analysis’ (subtheme 2), which some of them found demanding in the short time available. Other disadvantages related to ‘difficulties associated with other structural aspects of the practice phase’ (subtheme 3), i.e. the lack of external feedback and the brevity of the self-reflection units, which was perceived as overwhelming. Students also pointed out the brevity of the practice phase as a whole as a major disadvantage. Students in the VSR group indicated that the self-reflection process generally increased their awareness of their own weaknesses. With regard to the main theme of ‘learning progress’, students reported improvements in self-evaluation, structuring a therapy session, and dealing with difficult situations.
Discussion
The aim of this study was to investigate whether video-based structured self-reflection (VSR) and memory-based structured self-reflection (MSR) differed in terms of their effect on students’ psychotherapeutic competence and on the therapeutic alliance. Contrary to our expectations, there were no significant differences between the two study groups. This means that VSR was not more effective than MSR in improving students’ CBT and communication skills and the therapeutic alliance. In fact, according to the judgement of the independent raters, structured self-reflection did not improve students’ skills and the alliance from pre- to post-assessment at all. There were also no differences between the two study groups in terms of students’ self-assessed skills during the practice phase, but students reported significant improvements in their skills from one role-play to the next and from the first to the last role-play. The qualitative analysis of the written self-reflections showed that students reflected on various aspects of their behaviour during the standardized therapy sessions, such as empathy, appreciation, collaboration, encouragement, normalization and communication. Both study groups focused considerably more on the positive aspects of their behaviour than on the negative aspects, with the proportion of positive reflections actually increasing over the course of the practice phase. Finally, the analysis of the focus group discussions showed that the students perceived both advantages and disadvantages of the self-reflection practice, particularly with regard to the self-reflection sheet and the video analysis. Important difficulties from the students’ perspective were the time constraints (both for the individual self-reflection units and for the practice phase as a whole) and the lack of external feedback. Students in the VSR group, but not the MSR group, indicated that the self-reflection process increased their awareness of their own weaknesses.
Several possible explanations exist for the lack of differences between MSR and VSR in the quantitative outcomes. Firstly, in order to keep the time factor constant, we did not give the students in the VSR group extra time to watch the video, which in turn may have created time pressure and excessive demands that prevented the students from benefiting from the video analysis. This is reflected not only in the arguments that occurred in the group discussion, where the students in the VSR group mentioned that they had great difficulty in combining self-reflection with video analysis within the given time frame, but also in the written self-reflections, where the number of reflections in the MSR group exceeded that of the VSR group. Secondly, the students were not given any instructions on how to analyse the video. It is possible that a brief introduction to important aspects to pay attention to when watching the video would have helped the students to benefit from the video analysis. However, despite the non-significant results regarding the quantitative outcomes, the qualitative results of our study provide evidence that VSR offers advantages that MSR does not. Although the two study groups reflected on similar content, the tendency to focus on the positive aspects of one’s behaviour was less pronounced in the VSR group, indicating that students in the VSR group appeared to have a more critical view of their own performance. This suggests that watching videotapes of one’s own performance could help to focus on one’s weaknesses and thus recognize more easily what still needs to be improved. This is consistent with the results of the group discussion, where students in the VSR group noted that the videos allowed them to see immediate non-verbal reactions and that the self-reflection process generally made them more aware of their weaknesses.
Looking at the effect of self-reflection in general, regardless of the study group, the current findings regarding the independent raters’ perspective are somewhat at odds with the literature, which emphasizes the positive impact of self-reflection on professional (and personal) development (e.g. Bennett-Levy et al., 2001; Bennett-Levy et al., 2003; Chigwedere et al., 2020; Scott et al., 2021). One explanation for this discrepancy is that methodological aspects of our study design may have limited the benefits of the self-reflection practice for the students. For example, the structured self-reflection sheet may have been too long and extensive for a 20-minute self-reflection unit. This is also reflected in the feedback from the students in the focus group discussions, who pointed out the time limitations during the self-reflection units. Furthermore, the students only received instructions during the theoretical phase of the seminar and were not given further guidance during the following practice phase. They also reported in the group discussions that they missed external feedback. This is consistent with the recommendations of Collard (2024), who advocates combining SP/SR with expert feedback. In line with this, Henrich et al. (2023) conclude in their review that additional supervision or guidance by an expert has a significant incremental benefit on competence development. This suggests that external feedback is a necessary factor in the development of psychotherapeutic competence, and that the lack of it may have been an important reason for the absence of a significant increase in externally rated competence among the students in our study.
Another important aspect may be that our participants had too little practical experience to really engage in self-reflection in a way that would benefit them. According to Bennett-Levy’s (2006) three-systems (DPR) model of therapist skill development, the reflective system compares past, present, or future experiences with existing knowledge, and adjusts the existing information if necessary. It is considered the central system through which experienced therapists refine their skills, as they no longer need to learn skills from scratch, but must apply existing knowledge to new contexts. With this in mind, it seems possible that self-reflection is most effective when a certain level of competence already exists, which then needs to be refined and adapted to different contexts. At the beginning of a career, there may not be enough experience for the reflective system to draw on, which could reduce the impact of self-reflection.
Also, when attempting to explain the non-significant results on the primary outcomes, it is important to note that the literature has mainly looked at self-reflection in combination with self-practice components and has not isolated the effect of self-reflection. Furthermore, it has mainly focused on self-reports of psychotherapeutic skills. In our study, too, the students themselves noticed a significant improvement in CBT skills from one role-play to the next within one practice session and from the first to the last role-play. On the one hand, this finding is not surprising, given that the same standardized therapy situation was repeated twice within a practice session. Other studies also point to a training effect, especially in training sessions in which difficult situations can be repeated (Kühne et al., 2022; Maaß et al., 2024; Weck et al., 2024). On the other hand, the discrepancies in judgements between the independent observers and the students are consistent with the general finding in the literature that the two perspectives often differ. One interpretation is that the self-perceived progress indicates a positive self-assessment bias, i.e. an over-estimation of one’s own abilities relative to the judgement of independent observers. As mentioned in the introduction, such a bias is well documented in the literature (Longley et al., 2023; Probst et al., 2022; Walfish et al., 2012). Another explanation might be that the self-reported progress indicates an increase in self-awareness and self-confidence rather than objectively measurable skills. This improvement may be a direct result of the self-reflective task completed by all students, regardless of their study group.
From this perspective, the results are consistent with the qualitative responses as well as the literature on SP/SR, which shows that self-reflection is often associated with increased self-awareness (e.g. Bennett-Levy, 2019). Although such improvements are inconsistent with observers’ ratings of students’ skills, they remain an important milestone in therapist development (Pieterse et al., 2013). However, there was a decline in self-rated skill performance from the second to the third role-play (t3–t4). We hypothesize that this decline was due to the 2-week break for the therapist role and the associated change in the therapy scenario, suggesting that students found they could not maintain the level of competence they had achieved in the first practice session and were unable to generalize their learning experiences to the new context of the second practice session 2 weeks later. This is consistent with the notion that learning is often highly context-specific and requires extra effort and strategies to transfer new insights and skills to new situations (Butler et al., 2017; Rivière et al., 2019). For our study, this means that explicit practice of different scenarios would likely have been needed to maintain the perceived positive effects of self-reflection and to further strengthen the development of self-awareness and self-confidence that began during the first practice session.
The qualitative analysis of the written self-reflections shows that the students focused mainly on the positive aspects of their performance. Furthermore, 74% of the comments in the written self-reflections belong to subcategories that reflect skills that are important for building a sustainable therapeutic relationship and motivating the patient (i.e. empathy, appreciation, collaboration, encouragement). This is certainly due in large part to the specific task in the role-play (conducting an initial interview) and the focus of the seminar (therapeutic relationship and interpersonal skills), but it may also indicate that students do not pay attention to other skills (e.g. guided discovery, identifying dysfunctional thoughts, etc.). In this context, it may be relevant to note that the frequency of the communication subcategory decreased by 10 percentage points from the first to the last role-play (29% vs 19%). This subcategory entailed more structuring skills (e.g. summarizing the conversation) compared with relationship-related subcategories, and seems to be one of the areas of competence that students tend to lose track of unless their attention is explicitly drawn to it, and with which they may still be struggling. Overall, however, the themes in the self-reflections did not change considerably over time, but the focus on strengths became more entrenched and the basic tone became more and more positive. Explicit feedback and guidance would probably have been needed to counteract the tendency to over-emphasize strengths over weaknesses.
Strengths and limitations
This RCT contributes to research in the field of therapist development by examining the effects of structured self-reflection on competence, using not only qualitative, but also quantitative methods, and relying not only on self-ratings of competence but also on competence ratings by independent observers. In addition, it took place in the naturalistic setting of a university seminar, which facilitates converting conclusions into practice, and it has isolated the effect of self-reflection, in contrast to other studies examining multi-component training programs. This approach enhances our understanding of the effects of individual training elements and may help develop effective training programs. However, there are several limitations to be aware of. First of all, our sample consisted of undergraduate students with little therapeutic experience. While examining competence at all levels (novice, advanced, expert, etc.) may help establish benchmarks (see, e.g. Hill and Knox, 2023) for therapist competence, it is plausible that the lack of experience of our participants may have limited the potential benefits of self-reflection, and that with more experienced therapists, the self-reflection practice in our study may have resulted in greater changes in competence. With regard to our study design, it is important to note that all the role-plays, including the pre- and post-assessments, had to take place online, which led to difficulties with the independent competence ratings. The self-ratings of the students during the practice phase consisted of an average value of three items of the CTS, which resulted in a low measurement reliability. Furthermore, as already mentioned above, the self-reflection sheet we used may have been too extensive, and we should perhaps have either reduced the number of guiding questions or provided more time for the single self-reflection units.
The time aspect also refers to the duration of the practice phase in total. Four weeks of practice with four role-plays in the therapist role may not have been sufficient to improve skills significantly. Finally, the generalizability of our findings to training programs is limited because our program is likely to be significantly different from most other training programs due to the lack of external input from a trainer.
Implications for practice and future research
As mentioned above, self-reflection in general, and video-based self-reflection in particular, must probably meet certain requirements in order to be effective in visibly improving psychotherapeutic skills at the beginning of a therapist’s professional career. Most importantly, self-reflection may need to be accompanied by external input and guidance from a trainer. This includes individualized competency feedback and, with respect to the self-reflection process, specific instructions on what exactly to reflect on, how to analyse video recordings, how to focus specifically on negative aspects of one’s performance, and how to draw conclusions and suggestions from this. In addition, the focus of self-reflection should be narrowed and directed to specific individual aspects. It is possible that some participants in our study were overwhelmed by the range of competencies to which they had to pay attention. Finally, our results support the notion that learning self-reflection is time-consuming (Bennett-Levy et al., 2001; Spafford and Haarhoff, 2015). A series of multiple practice sessions is probably necessary for self-reflection to have beneficial effects, with self-reflection units ideally lasting more than 20 minutes. Together with the finding that a trainer is needed for effective self-reflection, this leads to the conclusion that self-reflection is not as time- and resource-efficient as one might initially think.
It is important to note that our results do not allow conclusions about the validity of the DPR model, as we examined SR as an isolated skill. According to the DPR model, an integration of all three systems (declarative, procedural, and reflective) is necessary to promote change. For novice therapists with limited experience, engaging in self-reflection alone may not be sufficient to promote psychotherapeutic skills. Therefore, in addition to written self-reflections, the students in our study may have needed the sharing of theoretical knowledge and direct instruction on how to practice certain skills.
For future research, our study suggests that self-reflection should be combined with theoretical input and direct feedback and instructions in order to investigate the effect on competence development. For example, to determine whether self-reflection has an incremental benefit on students’ development of psychotherapeutic competence, future studies could compare training consisting of feedback and self-reflection with training consisting of feedback alone. In addition, future studies should allow longer time frames for the self-reflection units and should examine not only changes in skills but also changes in self-awareness and self-confidence as primary outcomes, using not only qualitative but also quantitative measures [e.g. the Counselor Activity Self-Efficacy Scales-Revised (CASES-R; Hahn et al., 2021)]. Larger samples should also be used, and competence development should be monitored over time to investigate the sustainability of potential training effects. Finally, the effects of different methods of promoting self-reflection should be compared between samples of inexperienced trainees and licensed therapists, and open formats for self-reflection should be evaluated.
Conclusion
In summary, the results of this RCT suggest that VSR, as it was conducted in our study, was not more effective than MSR in improving students’ therapeutic skills. Doubts have also been raised about whether self-reflection alone, without further guidance and feedback, can visibly improve students’ psychotherapeutic competencies. However, it is important to note that students reported progress in their perceived skills (quantitative measures) and mainly reported positive aspects of their behaviour in their self-reflections (qualitative measures), suggesting a potential increase in self-awareness. This leads to the conclusion that well-prepared self-reflection, combined with more instruction and external feedback, could increase students’ psychotherapeutic competence.
Key practice points
(1) For self-reflection to be effective, students may need to be extensively prepared and trained (e.g. on important aspects to reflect on, how to analyse video recordings, and how to focus specifically on negative aspects of one’s performance).
(2) It may be necessary to guide the self-reflection process and provide performance feedback.
(3) Self-reflection can quickly become overwhelming for students, and it is therefore necessary that they take sufficient time to adequately process and reflect.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S1754470X25100287
Data availability statement
This study design and its analysis were pre-registered at the Open Science Framework (7 October 2021). All analysis codes are available at https://osf.io/r5wdy/. The datasets are available from the corresponding author on request.
Acknowledgements
We thank our student assistants for supporting the study and Brian Bloch (University of Münster, Germany) for his English language editing of the article.
Author contributions
Klara Eisert: Data curation (equal), Formal analysis (equal), Writing - original draft (lead), Writing - review & editing (lead); Franziska Kühne: Conceptualization (equal), Funding acquisition (equal), Investigation (equal), Methodology (equal), Project administration (equal), Writing - review & editing (equal); Florian Weck: Resources (equal), Writing - review & editing (equal); Anna Schimmrigk: Formal analysis (supporting); Ulrike Maaß: Conceptualization (equal), Data curation (equal), Formal analysis (equal), Funding acquisition (equal), Investigation (equal), Methodology (equal), Project administration (equal), Supervision (lead), Writing - review & editing (equal).
Financial support
U.M. and F.K. received funding for this study from a program of the University of Potsdam for innovative teaching projects (grant number: 2021-03-15).
Competing interests
The authors declare none.
Ethical standards
Ethical approval was obtained from the University Ethical Committee (reference number 60/2021). The study conformed to the Declaration of Helsinki, and informed consent was obtained from all participants.