Nomenclature
- AI: artificial intelligence
- APU: auxiliary power unit
- B738: Boeing 737-800
- CASA: Civil Aviation Safety Authority (Australia)
- CBT: competency-based training
- EBT: evidence-based training
- FCOM: Flight Crew Operating Manual
- HMD: head-mounted display
- ICAO: International Civil Aviation Organization
- MCC: multi-crew cooperation
- SET: simulation for experiential training
- SQLR: systematic quantitative literature review
- VE: virtual environment
- VR: virtual reality
- VRFS: virtual reality flight simulator
1.0 Introduction
Since the advent of aviation, flight simulators have been an integral and essential part of flight crew training [Reference Hays, Jacobs, Prince and Salas1]. Virtual reality (VR) is increasingly being used for gaming and has provided multiple opportunities in industries such as education, fashion, healthcare and tourism [Reference Cross, Boag-Hodgson and Mavin2, Reference Tadeja, Seshadri and Kristensson3]. With recent developments in immersive technologies, virtual reality flight simulators (VRFSs) are attracting increasing attention (e.g. Ref. [4]) since they offer significant advantages over traditional training methods, including versatility, mobility, increased throughput and reduced size and cost [Reference Cross, Boag-Hodgson, Ryley, Mavin and Potter5]. In addition, immersive training has been associated with improved psychomotor performance, knowledge acquisition, engagement and spatial ability when individuals or teams are trained with VR [Reference Abich, Parker, Murphy and Eudy6].
Although the feasibility of VRFSs for flight training has been demonstrated (e.g. Refs. [Reference Cross, Boag-Hodgson and Mavin2, Reference Aslandere, Dreyer and Pankratz7–Reference Tran, Behrend, Funning and Arango9]), it is limited to single-user simulators. Most commercial pilot operations employ at least two pilots, and their collaboration is paramount in any multi-crew operation [Reference Hamman, Kanki, Helmreich and Anca10–Reference Salas, Fowlkes, Stout, Milanovich and Prince12]. Since evidence-based training (EBT) is a global safety improvement initiative that is being introduced for multi-crew professional pilot training [13], the efficacy of collaborative VRFS employing EBT must be researched.
The overarching aim of this paper is therefore to explore the feasibility of incorporating EBT in a multi-crewed VRFS using the application of professional pilot training competencies. While competencies will be observed against standards prescribed by the International Civil Aviation Organization (ICAO) [13], this study’s focus lies in assessing the viability of a VRFS for EBT while also offering a comparative analysis against a non-immersive environment, namely a desktop simulator. The study does not aim to gauge proficiency in specific competencies; rather, it employs ICAO competencies as a benchmark to ascertain the feasibility of employing a VRFS for EBT purposes.
To assess competencies such as teamwork, it is important to ensure collaboration between two pilots in a VRFS, and that the system provides adequate fidelity and usability to complete the required tasks. To further align the research to commercial pilot operations, the study simulated a multi-crewed Boeing 737-800 (B738) aircraft with exercises aligned to that of a typical initial First Officer airline training programme. For comparison, competencies were measured in a virtual environment (VE) and a desktop simulator.
2.0 Literature review
2.1 Collaboration
Multi-crew cooperation (MCC) is the collaboration, or teamwork, required by pilots to operate in a multi-crewed aircraft. Operationalisations of the terms collaboration and teamwork are often blurred, especially when describing the interactions of pilots on a flight deck. Although the difference is subtle, this paper will predominantly use the term collaboration, since this research is focused on investigating the competencies of participants working together in a VE to achieve a shared goal.
This ambiguity has led to many interpretations of collaboration, but a generic definition put forward by Ref. [Reference Terveen14] that has stood the test of time is that collaboration is ‘a process in which two or more agents work together to achieve a shared goal’ (p. 67). Therefore, the flight crew can be considered as two or more pilots who are responsible for individual tasks, yet they must collaborate to operate an aircraft efficiently. Since aviation is safety-critical, the shared goal exceeds the constituent parts, analogous to Gestalt theory, which emphasises that the whole of anything is greater than its parts [Reference Geremek, Greenlee, Magnussen, Geremek, Greenlee and Magnussen15]. When working within a team, collaborators bring their individual expertise to achieve a shared goal [Reference Salas, Sims and Burke16]. Without individuals willing to share their expertise through collaboration, achieving the shared goal may not be possible. Therefore, the collaborative team is not merely a summation of individuals, but a ‘distinguishable set of two or more people who interact dynamically, interdependently and adaptively toward a common valued goal/object/mission, who have each been assigned specific roles or functions to perform and who have a limited life span of membership’ (Ref. [Reference Salas, Dickinson, Converse, Tannenbaum, Swezey and Salas17], p. 4). More recent research has focused on teamwork and the importance of achieving the shared goal. Teamwork is subsequently defined as a set of interrelated thoughts, actions, and feelings of each team member that are needed to function as a team and that combine to facilitate coordinated, adaptive performance and task objectives resulting in value-added outcomes (e.g. Refs [Reference Salas, Sims and Burke16, Reference Crawford and Lepine18]).
2.2 Event-driven training
It is essential to measure the suitability and transferability of the skills gained in simulators for real-world flight operations. Traditional training sessions have focused on the application of manoeuvres and procedures through repeated exposure to known emergencies based on the evidence of hull losses from early generation jets [13]. The assumption was that to mitigate a risk, simply repeating an event in a training programme was sufficient. Over time, many new events occurred and the subsequent addition of these events to the training requirements saturated recurrent pilot training programmes and created an inventory or ‘tick box’ approach to training [13]. Regulations also dictate that pilot training be standardised, and scenarios are therefore relatively predictable. Not surprisingly, trainees became aware of this list of events and rehearsed them, which led to an inability to recognise and react in a suitable manner to variations of those events when they unexpectedly occurred in real life. Skills taught in this manner are ‘brittle’ rather than adaptive: they transfer well to predictable situations like tests, but they may not be sufficient in emergency situations, which are typically novel and unexpected [Reference Landman, van Oorschot, van Paassen, Groen, Bronkhorst and Mulder19].
There are many examples where simulator-taught skills were inadequate and led to fatal accidents, including the notorious case of Air France 447, an Airbus A330 with 228 occupants that fatally crashed into the Atlantic Ocean on 1st June 2009. The cause was attributed to the crew, who failed to recognise that the aircraft had stalled and subsequently made inappropriate control inputs that destabilised the flight path even though stall recovery was possible [20]. Although the crew had practised stall recovery on numerous occasions in the airline’s fixed-base simulator, one explanation for the inappropriate control input is that the training and testing for these situations had become a highly predictable routine with pilots often aware of what to expect [Reference Casner, Geven and Williams21–Reference White, Padfield, Lu, Advani and Potter23].
2.3 Evidence-based training and pilot competencies
Event-driven simulator training is currently being superseded by the concept of competency-based learning called evidence-based training (EBT) [13]. Although EBT and its assessment are still based on operational data, EBT is characterised by developing and assessing the overall capability of a trainee across a range of core competencies, rather than by measuring their performance in individual events or manoeuvres. Competencies are a combination of knowledge, skills and attitudes required to perform a task to the prescribed standard. EBT is designed to ensure trainees possess the required competencies rather than demonstrate the minimum skills required for an event. An industry-wide consensus led to the implementation of EBT in order to reduce aircraft accident rates after reviewing existing training for airline pilots [13]. Therefore, the goal of an EBT programme is to identify, develop and assess the competencies required by pilots to operate safely, effectively and efficiently in a commercial air transport environment by managing the most relevant threats and errors based on evidence collected in operations and training.
Modern aviation systems are highly reliable, but also complicated. It is therefore impossible to foresee all plausible accident scenarios. EBT addresses this by moving from pure event-based training to prioritising the development and assessment of key competencies [13]. A recent enhancement of EBT is the introduction of simulation for experiential training (SET) which provides a more focused framework than operational and simulator data [Reference Cameron, Dahlström and Kennedy24]. Mastering a finite number of competencies should allow a pilot to manage unforeseen situations in flight for which the pilot has not been specifically trained. EBT and SET therefore avoid rote-memorised skills which may not enable pilots to identify and respond in an appropriate manner during a real flight [Reference Casner, Geven and Williams21], potentially avoiding accidents such as Air France 447. However, there has been criticism of competency-based training (CBT) and assessment methods, which are a key component of modern EBT. For example, Ref. [Reference Franks, Hay and Mavin25] propose that while CBT may be used appropriately for initial development of physical flying skills for ab initio level pilots, its application is limited in areas of training which require complex decision-making and critical judgement because it fails to comprehensively address the full range of training requirements.
Since the publication of a systematic quantitative literature review (SQLR) by Ref. [Reference Cross, Boag-Hodgson, Ryley, Mavin and Potter5], there has been little empirical evidence collected on the use of any type of VRFS. Evidence of multi-crewed VRFSs is limited to Ref. [Reference Cross and Boag-Hodgson26], who demonstrated the fidelity and usability of a collaborative VRFS, as well as reduced participant workload (effort) and heightened team situational awareness in the VE compared to a desktop flight simulator. Therefore, a similar methodology will be utilised in this study. There is no known literature on VRFSs employing EBT, although Ref. [Reference Dapica, Hernández and Peinado27], for example, provides a proof of concept for gamification of flight instructor learning in EBT scenarios.
2.4 The importance of presence in a VRFS
A large proportion of the immersive experiments examined in the SQLR were only partially successful due to issues with the technology that led to a lack of presence, which in turn created an unrealistic user experience. Therefore presence, the adopted definition of which is a feeling of ‘being there’, is essential for successful simulation training in a VE. Composed of immersion and involvement factors, presence was successfully demonstrated in a single-pilot VRFS [Reference Cross, Boag-Hodgson and Mavin2], and most critical to this study is that presence can be demonstrated in a multi-crewed VRFS.
3.0 Method
3.1 Measuring collaboration
Since there is no consensus on what constitutes collaboration, various survey instruments are used to assess collaborative dimensions. To determine appropriate measurement tools, a conceptual model must be applied that is relevant to the team task. A ‘generic’ teamwork model was suggested by Ref. [Reference Salas, Sims and Burke16] that, regardless of the team task that is being examined, has components that are found in almost all teamwork taxonomies. Their ‘Big Five’ model is composed of the components team leadership, mutual performance monitoring, backup behaviour, adaptability and team orientation. However, Ref. [Reference Salas, Sims and Burke16] also concur with Ref. [Reference Dyer, Muckler, Neal and Strother28] in that teamwork is dynamic and its manifestation can vary based on many variables, including the team environment, type of task, individual differences and perceived workload. Therefore, to fully understand team performance, academics typically stipulate that it is insufficient to take a single snapshot of team performance such as a post-experiment questionnaire (e.g. Refs [Reference Harrison, Mohammed, McGrath, Florey and Vanderstoep29, Reference Marks, Mathieu and Zaccaro30]). Instead, performance should be sampled during a variety of conditions and situations [Reference Salas, Sims and Burke16].
Measuring collaboration requires a conceptual consistency between the method employed and the theory explored in the research context [Reference Valentine, Nembhard and Edmondson31]. Furthermore, methods for assessing teamwork skills generally involve the use of observer-based rating scales, which often are specific to the task at hand [Reference Wright, Segall, Hobbs, Phillips-Bute, Maynard and Taekman32]. Rating scales have the distinct advantage of continuous assessment throughout the task, although they are subject to rater bias (discussed later). Specifically for aviation, behavioural markers are commonly used in training and crew resource management [33, Reference Flin and Martin34]. The behavioural markers are descriptions of observable, non-technical behaviours that are present in teams or individuals. Therefore, this study adopted the various recommendations suggested in the literature by using live teamwork observation coupled with post-experiment analysis to measure collaboration as follows:
• Observation of the pertinent behavioural indicators such as leadership and teamwork during the experiment
• Post-experiment participant feedback on how well they worked together
• Successful completion of a scenario, which necessitated collaboration
3.2 Evidence-based training behavioural competencies
EBT was assessed against the core-competency framework proposed by ICAO [35]. The eight competencies and their associated descriptions are shown in Table 1. The table also includes a sample of the behavioural indicators/observables that were observed, a full list of which can be found in Ref. [13, p. II-App 1-1].
Table 1. The eight pilot competencies observed in this study

3.3 Measuring participant feedback
It is important to understand from the participant’s perspective how user-friendly and lifelike the experiment interfaces were. It is also valuable to obtain participant feedback on how well they feel they worked together and addressed the competencies. For this purpose, a post-experiment debrief was utilised. The qualitative experiential aspect of this study therefore focused on participant interpretations of their experiences, as discussed by Ref. [Reference Braun and Clarke36]. This was achieved using semi-structured reflective debriefs to elicit detailed descriptions of participant experiences during the study.
The debriefs were led by the researcher and undertaken with both participants simultaneously upon completing the experiment, and lasted between 10 and 20 min with each participant pair. All debriefs were audio-recorded and then subjected to thematic analysis. In addition, the researcher took notes during the debriefs. The debrief structure comprised asking participants if they knew each other, discussing previous flight, simulator and VRFS experience, their experience during the experiment in the VE or real-world, and other pertinent questions.
As part of a qualitative research paradigm, thematic analysis of the reflective debriefs was used to generate the key descriptive themes in the participants’ narratives using the six-phase guidelines and checklist described by Ref. [Reference Braun and Clarke36]. The first phase involved transcribing each reflective debrief, which included initial familiarisation and examining the breadth and depth of the data. This was performed by reading all transcriptions in a single pass, and non-relevant narrative was removed at this point. For example, in one instance, a participant excused himself momentarily from the debrief. In his absence, the conversation with the remaining participant was not related to the study and was deleted from the transcript. When the participant returned, the reflective debrief resumed, and the transcript continued.
The second phase involved creating initial thematic codes. These codes were based on meanings and patterns observed in the transcript. During this phase, a semantic approach was employed, in that the codes were identified within the explicit or surface meanings of the transcript, and the analysis did not look for anything beyond what had been transcribed. Codes were independently verified by a second researcher. As discussed by Ref. [Reference Bazeley37], all codes were emergent since the reflective debrief structure comprised non-leading and unbiased prompts. Each individual transcript was analysed in its entirety before moving to the next one.
The third phase involved selecting which excerpts to code, still employing a semantic approach, underpinned by an essentialist paradigm [Reference Braun and Clarke36]. This was achieved by reading through the transcripts, identifying interesting excerpts and applying appropriate codes to them. Excerpts with the same meaning had the same code applied to them. New codes were added as necessary. The codes were generated inductively, allowing them to be driven from the transcript rather than trying to fit it into a pre-existing coding frame or the researcher’s analytic preconceptions. The fourth phase was to collate all excerpts associated with a particular code. This allowed all the excerpts for a given code to be analysed and compared against each other. Some adjustment and revision of the codes was made during this phase. During the fifth phase, all the excerpts associated with a particular code were grouped together into a theme. The sixth phase revealed the themes discovered by the thematic analysis and is discussed in the Results section.
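Phases four and five can be pictured as two grouping steps. The following is a minimal Python sketch, not the authors’ tooling: the code labels, theme names, transcript identifiers and excerpt placeholders are all hypothetical and are not taken from the study data.

```python
from collections import defaultdict

# Hypothetical coded excerpts: (transcript id, code, excerpt placeholder).
# None of these values come from the study; they only illustrate the shape.
coded_excerpts = [
    ("pair01", "felt_present", "example excerpt 1"),
    ("pair01", "checklist_access", "example excerpt 2"),
    ("pair02", "felt_present", "example excerpt 3"),
    ("pair03", "shared_workload", "example excerpt 4"),
]

# Hypothetical mapping from codes to themes.
code_to_theme = {
    "felt_present": "Presence",
    "checklist_access": "Usability",
    "shared_workload": "Collaboration",
}


def collate_by_code(excerpts):
    """Phase four: gather every excerpt that carries the same code."""
    by_code = defaultdict(list)
    for transcript_id, code, text in excerpts:
        by_code[code].append((transcript_id, text))
    return dict(by_code)


def group_into_themes(by_code, mapping):
    """Phase five: place each collated code under its theme."""
    themes = defaultdict(dict)
    for code, items in by_code.items():
        themes[mapping[code]][code] = items
    return dict(themes)


by_code = collate_by_code(coded_excerpts)
themes = group_into_themes(by_code, code_to_theme)
```

Keeping the transcript identifier with each excerpt makes it possible to compare excerpts for a code across participant pairs, which is the comparison phase four calls for.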
In terms of presence, Ref. [Reference Cross, Boag-Hodgson and Mavin2] employed a quantitative approach to measure the feeling of ‘being there’. However, to reduce the number of measures imposed on participants, this study adopted a qualitative approach in which participants were invited to discuss their feeling of ‘being there’ in the semi-structured reflective debriefs (as discussed by Ref. [Reference Cross, Boag-Hodgson and Mavin2]).
3.4 Scenario development and competency observation
Scenarios were developed to ensure that high levels of interdependence were incorporated so that the ‘Big Five’ teamwork processes [Reference Salas, Sims and Burke16] would apply. It was also necessary to create scenarios sufficiently challenging to demonstrate the competencies prescribed by ICAO, yet not overly complicated given the limited flying experience and aviation knowledge of participants. The concepts of pilot flying (PF) and pilot not flying (PNF) roles were utilised in this study and implemented as follows:
• The PF occupied the left seat. The PF operated the flight controls of the aircraft and was responsible for all the activities which directly affected flight path management (i.e. taxiing and flying the aircraft). The PF was also responsible for confirming the actions of the PNF.
• The PNF occupied the right seat. The PNF monitored the course of the flight and was responsible for reading and actioning the checklists, navigation (on the ground and in the air), all radio communication, landing gear and flap operation and generally assisting the PF as necessary.
The scenarios in this study were based on a typical initial First Officer type rating and the ICAO Evidence-Based Training Implementation Guide [38]. A type rating assumes a level of flying proficiency and aviation knowledge. This study was not concerned with a participant’s flying proficiency or aviation knowledge, but rather the exploration of EBT in a multi-crewed VRFS. Therefore, this study did not adopt the assumptions of the type rating. In addition, the lack of professional pilots in this study did not hinder the concepts that were measured, which are independent of operational experience.
Participants only received sufficient instruction to adequately complete each phase of their allotted scenario. For example, aircraft systems (electrical, pneumatic, hydraulic, fuel) and the flight management system were not explained, but basic operational use of the autopilot and autoland was provided. This, along with the supporting materials, provided sufficient information to enable all participants to complete their designated scenario. In addition, some allowance was made for the lack of professional pilot experience among participants during measurement of the ICAO behavioural indicators/observables. For example, participants were not expected to adhere to standard radiotelephone phraseology or manage the flight path with any degree of accuracy. In addition, although participants were observed against each ICAO competency and the same level of scrutiny consistently applied by the same researcher across the whole experiment, they were not experienced commercial pilots, so it was not expected that they could complete the ICAO competency to the same standard as commercial pilots.
3.4.1 The type rating
Preparation for a type rating comprises theoretical and simulator training components that cover areas such as aircraft systems, flight procedures, aircraft handling and crew cooperation for a particular ‘type’ (make and model) of aircraft [39, 40]. Since these programmes can be several weeks in duration, for realism and potential transferability, this study duplicated two typical simulator sessions, one an exercise involving aircraft handling and normal procedures, and the other an exercise involving non-normal and emergency procedures. These two simulator sessions will be referred to as scenarios.
3.4.2 Scenario competencies
Both scenarios involved the operation of a B738, which is categorised as a third-generation jet by ICAO [38]. Therefore, the scenarios were designed with reference to the training modules for third-generation jets in this manual and the ICAO Manual of Evidence-Based Training [13]. A scenario typically involves all eight ICAO competencies to a greater or lesser extent. However, the two scenarios in this study were designed to focus on the specific competencies shown in Table 2. Participants were required to perform all actions specified in the scenario. A summary of the scenarios is as follows:
• Scenario 1 (aircraft handling and normal procedures): based at Heathrow Airport during daylight, in a heavy rainstorm, and involved pushback, taxi, takeoff, a circuit, landing and parking.
• Scenario 2 (non-normal and emergency procedures): based at Sydney Airport during daylight in CAVOK (i.e. fair weather), and involved starting the Auxiliary Power Unit (APU), pushback, starting the main engines and APU shutdown, taxi and managing an aborted takeoff due to an engine fire.
Table 2. Scenario competencies

To increase realism, various static aircraft were present in the scenario, along with eight operational aircraft controlled by artificial intelligence (AI).
3.5 Description of the system
Two computers were set up alongside each other, both running the X-Plane Flight Simulator. The left computer was for the PF (as viewed from behind the PF, or from the flight deck door) and the right computer for the PNF. The two computers were networked and set up as ‘master’ (for the PF) and ‘slave’ (for the PNF) within X-Plane to allow independent views of the same simulation; that is, the PF controlled their own view, and the PNF controlled their own view. See Fig. 1.
The PF controlled the aircraft using a yoke, throttle quadrant and rudder pedals connected to the master computer. The PNF operated the cockpit switches, buttons and levers using a mouse, also connected to the master computer. Any simulation flight data (e.g. location, speed, altitude) or movement of switches was fed from the master to the slave computer. This enabled both PF and PNF to be in the same simulation and cockpit environment. A researcher sat behind both the PF and the PNF and could monitor their screens, and also follow the simulation on a tablet.
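The data flow described above can be sketched in a few lines of Python. This is an illustrative model only and does not use or represent X-Plane’s networking interface; the class names, field names and the `push_state` helper are assumptions introduced here.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SimState:
    """Shared simulation data; the fields are illustrative only."""
    speed_kt: float = 0.0
    altitude_ft: float = 0.0
    switches: dict = field(default_factory=dict)


@dataclass
class Station:
    """One computer: a private view plus a copy of the shared state."""
    view_heading_deg: float = 0.0
    state: SimState = field(default_factory=SimState)


def push_state(master: Station, receiver: Station) -> None:
    """Copy the master's state to the receiver, leaving its view untouched."""
    receiver.state = copy.deepcopy(master.state)


pf_station = Station()                        # 'master' computer (PF)
pnf_station = Station(view_heading_deg=45.0)  # second computer (PNF), own view

# Both the PF's flight controls and the PNF's mouse act on the master.
pf_station.state.speed_kt = 140.0
pf_station.state.switches["landing_gear"] = "down"

push_state(pf_station, pnf_station)
```

After the push, both stations hold the same flight data and switch positions, while each keeps its own view direction, which is the property that lets the two participants look around independently within one shared cockpit.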
Three configurations of the real-world and the VE were used in the study, as shown in Table 3, and Figs 2, 3 and 4. Participants immersed in the ‘Mixed VR/Desktop’ configuration can be seen in Fig. 5.
3.6 Materials developed for the study
Substantial supporting materials were developed for the study, summarised below:
• Participant briefing PowerPoint slides used pre-experiment
• Flight crew operating manual (FCOM), based on a typical commercial airline’s B738 FCOM, although significantly simplified
• Scenario guides (objectives and instructions), one for each scenario
• Apron and aerodrome ground movement charts, obtained for both Heathrow and Sydney airports
• Checklists, based on a real B738 checklist
In any configuration that involved the real-world, participants had access to hardcopy documentation including the checklists. In any configuration that involved the VE, participants were restricted to virtual checklists and reliant on the researcher to provide additional guidance where necessary (discussed later).
Table 3. Virtual environment and real-world configurations


Figure 1. Hardware configuration and data flow.

Figure 2. ‘Dual Desktop’ configuration.

Figure 3. ‘Mixed VR/Desktop’ configuration.

Figure 4. ‘Dual VR’ configuration.

Figure 5. Participants in the ‘Mixed VR/Desktop’ configuration.
Furthermore, Excel-based electronic competency observation sheets were generated for both scenarios for the researcher’s use. While the primary focus of this study was not on assessing participants’ competency levels in specific areas, but rather on evaluating the feasibility of a VRFS for EBT, these sheets offer insights into the effectiveness of various competencies within the system. The sheets were based on those presented by ICAO (Ref. [38], pp. 40–45 & p. 87). The spreadsheets were aligned to the scenario guides, and broken down into phase of flight (e.g. Taxi, Takeoff) since the breakdown is seen as advantageous [Reference Helmreich, Foushee, Wiener, Kanki and Helmreich11]. Within each phase of flight, a list of the ICAO competencies being observed was presented. An example is given in Fig. 6. Note that it was not appropriate to complete every ICAO competency for both the PF and the PNF for every phase of flight. For example, the PF’s responsibility was to taxi the aircraft, and the PF was therefore observed against the ICAO competency Aircraft Flight Path Management – Manual Control (AFPM – Man) during taxi. Since the PNF was not required to taxi the aircraft, they were not assessed against this ICAO competency.

Figure 6. An extract of the competency observation spreadsheet.
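As a rough illustration of how such a sheet could be represented, the sketch below models rows keyed by phase of flight, role and competency, and rejects entries for combinations that do not apply (such as AFPM – Man for the PNF during taxi). It is a hypothetical Python structure, not the authors’ Excel sheet: the class names are invented, the 1 to 5 range follows the ICAO grading the study reports using, and treating only grades 1 and 5 as the ‘low or high’ scores to annotate is an assumption.

```python
from dataclasses import dataclass, field

GRADE_MIN, GRADE_MAX = 1, 5  # grade range reported in the study (1 low, 5 high)


@dataclass
class Observation:
    phase: str        # e.g. "Taxi"
    role: str         # "PF" or "PNF"
    competency: str   # e.g. "AFPM - Man"
    grade: int
    note: str = ""


@dataclass
class ObservationSheet:
    # competency -> set of (phase, role) pairs it is observed for (hypothetical)
    applicable: dict
    rows: list = field(default_factory=list)

    def record(self, phase, role, competency, grade, note=""):
        """Add one graded observation after checking it is applicable."""
        if (phase, role) not in self.applicable.get(competency, set()):
            raise ValueError(f"{competency} is not observed for {role} in {phase}")
        if not GRADE_MIN <= grade <= GRADE_MAX:
            raise ValueError(f"grade must be between {GRADE_MIN} and {GRADE_MAX}")
        self.rows.append(Observation(phase, role, competency, grade, note))

    def flagged(self):
        """Rows at either end of the scale (assumed threshold for annotation)."""
        return [r for r in self.rows if r.grade in (GRADE_MIN, GRADE_MAX)]


sheet = ObservationSheet(applicable={"AFPM - Man": {("Taxi", "PF")}})
sheet.record("Taxi", "PF", "AFPM - Man", 5, note="hypothetical annotation")
```

Encoding applicability in the sheet itself means an inapplicable cell cannot be filled in by mistake, mirroring the point that not every competency is completed for both roles in every phase.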
3.7 Experimental procedure
A within-participants design was utilised which involved four stages: pairing and scenario assignment, pre-simulation, flight simulation and post-simulation, as follows:
3.7.1 i. Pairing and scenario assignment
In the first stage, participants were randomly paired together, and within the pair, randomly assigned to act as the PF or the PNF. Participant-pairs were assigned to either the first or the second scenario and allocated to one of the three configurations in a cyclic fashion.
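One way to realise this allocation is sketched below in Python. It is an assumption-laden illustration rather than the authors’ procedure: the function name, the use of a seeded random generator and the exact cycling order over scenario and configuration are all hypothetical; only the random pairing, random role assignment and cyclic allocation come from the text.

```python
import random
from itertools import cycle, product

SCENARIOS = ("Scenario 1", "Scenario 2")
CONFIGURATIONS = ("Dual Desktop", "Mixed VR/Desktop", "Dual VR")


def allocate(participants, seed=None):
    """Randomly pair participants, assign roles, and cycle through the cells."""
    rng = random.Random(seed)
    pool = list(participants)
    if len(pool) % 2:
        raise ValueError("an even number of participants is needed")
    rng.shuffle(pool)  # random pairing: neighbours in the shuffled list pair up

    # Assumed cycling order: every scenario/configuration cell in turn.
    cells = cycle(product(SCENARIOS, CONFIGURATIONS))

    plan = []
    for i in range(0, len(pool), 2):
        pair = pool[i:i + 2]
        rng.shuffle(pair)  # random role assignment within the pair
        scenario, configuration = next(cells)
        plan.append({
            "PF": pair[0],
            "PNF": pair[1],
            "scenario": scenario,
            "configuration": configuration,
        })
    return plan


plan = allocate(range(24), seed=0)
```

With 24 participants this yields 12 pairs, and cycling over the six scenario and configuration cells gives each configuration four pairs and each scenario six.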
3.7.2 ii. Pre-simulation
In the second stage, participant-pairs were asked to read an information sheet and provide their consent. They then completed a pre-simulation demographic survey, after which they were given a short presentation which was completed at a spare workstation to allow practice with the flight controls. Participants indicated familiarity with the simulator and head-mounted display (HMD) (if used) by demonstrating fluid and proficient actions, i.e. confidence with the use of the controls (yoke, throttles, rudders, mouse, etc.) and the flight simulator environment. No measures were taken at this stage, but adequate familiarity was deemed to have occurred after about 10 min, which established a baseline of experience. This reduced confounds or unrelated performance errors due to the effects of familiarity, apprehension or novelty bias.
3.7.3 iii. Flight simulation
In the third stage, participant pairs were taken to the two flight simulator computers where use of the flight controls was demonstrated once again, and the scenario objectives were restated. If the PNF was operating in any configuration that involved the real-world (see Figs 2 and 3), that participant was provided with the necessary materials (e.g. scenario guide, checklists, charts and paper/pen). If the PF, the PNF or both were assigned to a configuration that involved the VE (see Figs 3 and 4), they donned the HMD and were shown how to adjust the strap fittings and the interpupillary distance. Participants were advised to stop the simulation if they felt any simulator sickness and then asked to complete their designated scenario.
During the experiment, the researcher acted as air traffic control and provided some cues while avoiding unnecessary interference, in line with ICAO instructor training guidelines (Ref. [13], p. I–7-2, para. 7.4.3). The researcher also took observational notes and completed the appropriate competency observation sheet (see Fig. 6) by recording grades against each ICAO competency for all phases of the scenario’s flight. Marking was performed consistent with the competency grades (range 1 – low to 5 – high) as described by ICAO (Ref. [13], p. 150). Where any low or high scores were recorded, these were annotated, as were general observations, for use in the reflective debrief.
A concern for ensuring the quality of any system for rating pilots’ behaviour is the reliability of the raters’ judgements [Reference Flin and Martin34]. A degree of bias or systematic error can be expected in any performance rating task, arising from personal interpretation, scale use and biases due to motivation. However, such errors are largely mitigated by using the same rater (in the case of this study, the researcher) across the whole experiment who consistently applied the same level of marking as described by ICAO (Ref. [13], p. 150). The researcher was previously employed as a commercial check pilot and is proficient in this process.
3.7.4 iv. Post-simulation
In the fourth stage immediately after completion of the scenario, participant pairs were asked to take part in the reflective debrief, driven by the predefined structure and researcher annotations compiled during the experiment.

Figure 7. Number of participants familiar with desktop simulators and number of participants familiar with virtual reality flight simulators.

Figure 8. The last time participants used desktop simulators and the last time participants used virtual reality flight simulators.
4.0 Results
4.1 Participants
After ethical clearance was obtained from Griffith University, data were collected from 24 participants sourced from the Griffith University aviation programmes and the local flight school utilised by the university for flight training. One participant already had some experience as a commercial pilot. The participants comprised males (n = 21, 88%) and females (n = 3, 12%) between the ages of 18 and 42 (M = 21.1, SD = 4.8). Post-experiment debriefs revealed that most participant-pairs did not know each other before the study (8 pairs; n = 16, 67%).
Familiarity with desktop simulators and VRFSs is shown in Fig. 7, and the most recent time desktop simulators and VRFSs were used by each participant is shown in Fig. 8. Most participants had some actual flying experience (n = 21, 88%) in light, single-engine aircraft, and only one participant had multi-engine turbine experience. Familiarity with a B738 and MCC is shown in Fig. 9. The B738 familiarity was revealed in the post-experiment debrief to be recreational simulator usage, that is, no participants had any actual B738 experience.

Figure 9. The number of participants familiar with the B738 and the number of participants familiar with multi-crew cooperation.
As determined by the participant pairing, there was an even distribution of participants across the two scenarios and the three configurations. In total, 12 participants acted as PF and 12 acted as PNF. Also, in total, 12 participants operated in any configuration that involved the real-world, and 12 operated in any configuration that involved the VE. Although this study did not measure simulator sickness, no participants reported any sickness symptoms. Even though the sample size was small, which could lead to more type II errors (i.e. an increase in false negatives), it was deemed appropriate for the aims of the study.
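The counts in this paragraph follow from the pairing design and can be checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions: that two pairs were assigned to each scenario-configuration cell (inferred from 12 pairs spread evenly over two scenarios and three configurations), and that each 'Mixed VR/Desktop' pair had one member in the VE and one in the real-world. The variable names are not taken from the study materials.

```python
from itertools import product

SCENARIOS = ["Scenario 1 (Heathrow)", "Scenario 2 (Sydney)"]

# Number of pair members in the VE per configuration.
# Assumption: a mixed pair has exactly one member in the VE.
VE_MEMBERS = {"Dual Desktop": 0, "Mixed VR/Desktop": 1, "Dual VR": 2}

# Assumption: 12 pairs / (2 scenarios x 3 configurations) = 2 pairs per cell.
PAIRS_PER_CELL = 2

cells = list(product(SCENARIOS, VE_MEMBERS))      # 6 scenario-configuration cells
n_pairs = len(cells) * PAIRS_PER_CELL
n_participants = 2 * n_pairs                      # each pair is one PF and one PNF
n_pf = n_pnf = n_pairs
n_ve = sum(VE_MEMBERS[config] * PAIRS_PER_CELL for _, config in cells)
n_real_world = n_participants - n_ve
pairs_per_configuration = {config: PAIRS_PER_CELL * len(SCENARIOS)
                           for config in VE_MEMBERS}

print(f"pairs={n_pairs}, participants={n_participants}, PF={n_pf}, PNF={n_pnf}")
print(f"VE={n_ve}, real-world={n_real_world}")
print(pairs_per_configuration)
```

Under these assumptions the totals agree with the text: 24 participants, 12 in each role, 12 in the VE and 12 in the real-world, with four pairs in each configuration.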
4.2 Pilot competencies
Given that this study aimed to evaluate the feasibility of a VRFS for EBT rather than to assess participants’ specific competencies, statistical analysis of individual competency observation data was not conducted. Moreover, participants generally lacked professional experience in these competencies, and the small sample size (i.e. four participant-pairs in each configuration) would hinder meaningful interpretation. Instead, competency observation served to determine which aspects of the scenarios worked well and to steer the post-experiment debriefs.
In order to complete a scenario, participants had to utilise all the competencies outlined in Table 2 to some degree. Given that all participant pairs successfully completed their designated scenarios, the study illustrates the efficacy of a VRFS in eliciting professional pilot training competencies, although they may not be to industry standards. As anticipated, certain competencies were more effectively demonstrated in one scenario compared to the other. For instance, Scenario 2 (Sydney) provided optimal conditions for observing competencies such as Application of Procedures, Communication and Leadership and Teamwork, given the involvement of relatively intricate checklists during engine and APU start procedures. Similar to how EBT highlights common problem areas to airline management during pilot training and assessment, incorporating pilot competencies in this study helped identify which aspects of the scenarios worked well for participants and which did not.
4.3 Reflective debriefs and observations
Scenario 1 (Heathrow) required more researcher guidance than Scenario 2 (Sydney) because it involved some flying and a landing, and participants had very little B738 operating experience. Similarly, design limitations imposed some restrictions in the VE, including restrictive checklists (items had to be presented one at a time), the inability to take notes and the unavailability of scenario guides and charts. Although this meant that more researcher guidance was given to participants in the VE, there was no evidence in the audio recordings to suggest that this affected their ability to complete the scenario. In addition, the guidance provided was limited in nature (i.e. to specific B738 operational issues and navigation), and as such did not interfere with the researcher’s objective observations.
The themes discovered by the thematic analysis and researcher observations are discussed in the following sections.
4.3.1 Leadership, teamwork and collaboration
In both the ‘Dual Desktop’ and ‘Dual VR’ configurations, participants reported, and were observed, engaging in a constant flow of communication with one another and working well together as a cohesive team, with checklist actions flowing from one item to the next. However, in the ‘Mixed VR/Desktop’ configuration, the synergy was less apparent. There appeared to be a lack of unison in this mixed environment, and participants – the majority of whom had not met before the experiment – appeared to bond less and there was greater miscommunication.
The following two narratives demonstrate the differences between participants’ self-observed synergy in a ‘dual’ environment compared to a ‘mixed’ environment:

On two occasions, when participant pairs were using the ‘Mixed VR/Desktop’ configuration, it was observed that the VE participant tended to take the lead in the scenario (e.g. took control of the taxiing navigation or dealing with the engine fire), irrespective of their PF or PNF status and even if their role did not warrant this. A possible explanation for the VE participants taking the lead is that they felt more involved in the scenario and experienced more situational awareness than their real-world co-pilot [Reference Cross and Boag-Hodgson26].
4.3.2 Presence, immersion, involvement and situational awareness
The debriefs revealed positive feedback on presence (i.e. ‘being there’) for participants immersed in the VE, especially for the ‘Dual VR’ configuration where both participants utilised VR. Participants using VR reported being immersed in the VE, whereas those operating in the real-world reported a feeling of not being involved in the flight.
Participants in the VE stated that they were very aware of the surroundings in the simulation environment. For example, in the first scenario (Heathrow, in a heavy rainstorm) some participants (n = 3, 13%) stated that they could almost ‘feel’ the rain around them, which was made more intense with the noise of the rain ‘drumming’ against the aircraft windshield. Real-world participants had an opposite experience, with participants recalling an unidentified background noise, and observing the rain on the monitor/screen in front of them which tended to diminish their experience due to blurring. It can therefore be concluded that the soundscape provided by the VE contributed to a sense of presence.
In the following narrative, participants have just settled themselves at the beginning of the scenario:

Participants reported physiological responses to the critical event in Scenario 2 (engine fire) and their reactions were observed to be more intense when immersed in the VE as opposed to the desktop simulator. These findings are consistent with those reported in Ref. [Reference Cross, Boag-Hodgson and Mavin2]. The same conclusion can be drawn from the observations in this study, which is that VEs give rise to a sense of presence due to involvement and immersion factors, and instigate stress during critical events.
In the following narrative, the participants are discussing the critical event. The fire alarm for the engine fire – which is a bell and a red flashing ‘master caution’ light – immediately gets the attention of the participant immersed in the VE, whereas the real-world participant doesn’t initially associate the bell with the flight simulator:

4.3.3 Communication
Generally, communication improved as participant pairs progressed through the scenario, becoming more familiar with both the procedures and each other. However, the ‘Mixed VR/Desktop’ configuration generated the least desirable outcomes in terms of communication. Some participants (n = 5, 21%) who used VR commented that initially it was ‘odd’ not being able to see the other participant or see their hands and follow which switches/buttons their co-pilot was accessing. However, some of these participants (n = 3, 13%) added that they did not feel that it interfered with communication since it caused them to focus more on the semantics of communication, while others (n = 2, 8%) reported that they initially felt disconnected from their co-pilot and required constant reassurance that the other participant was sitting next to them.
In the following narrative, the participants are discussing their experience in the VE:

Two international participants in the ‘Dual VR’ configuration, whose first language was not English (and different to each other), reported that they had to concentrate hard on the words used in communication, both as the transmitter and the receiver. Like most participants, they were observed to adopt a closed communication loop to overcome the disconnect felt in the VE.
4.3.4 Non-verbal communication
Non-verbal communication includes deictic gestures and body language. Observed deictic gestures predominantly consisted of pointing, although some instances of giving and reaching were also seen in the ‘Dual Desktop’ configuration. Body language observed included body shifts, raised eyebrows, lip pursing, pouting and shoulder shrugs. It is beyond the scope of this paper to interpret the exact meaning of such body language, but in the context in which these gestures were made, they can broadly be likened to frustration, disagreement or confusion. Pointing was predominantly observed in the ‘Dual Desktop’ configuration and consisted of participants indicating their own screen, the other participant’s screen or a hardcopy document (e.g. a chart, or scenario guide). While working through the checklists, some participant pairs (n = 4, 17%) tended to point at and share the same big screen, even though both participants had their own screen, and only one of them had control of the view on the big screen.
Pointing was also observed in the ‘Mixed VR/Desktop’ and the ‘Dual VR’ configurations, as demonstrated by the following narrative, where the participants are discussing taxiing onto the takeoff runway:

Body language was also observed indicating frustration (shaking of the head, blowing, fist clenching). Such episodes sometimes included pointing out of frustration, even though one or both participants were immersed in the VE. The most observed pointing was while working through checklists in the VE, especially when accessing the B738 overhead panel.
In the following narrative, the participants are discussing the engine start and using the checklists:

4.3.5 Locating switches and checklist usage
A software design limitation caused four of the switches not to be mapped correctly between the slave computer and the master computer. This issue was identified pre-experiment, which allowed participants to be forewarned.
Many participants (n = 13, 54%) reported that they felt some level of frustration when one or both of them were in the VE and trying to describe the location of a button or switch. In response to the question, ‘What was hard/difficult/challenging (Why? Describe…)’, the majority of participants (n = 15, 63%) mentioned that it took some time to locate the appropriate switches, especially on the overhead panel. This might also be the case during a real type rating course where pilots would be unfamiliar with the cockpit layout. In this study, only three participants had any familiarity with the B738 (see Fig. 9), which was in a recreational capacity.
In the following narrative, one participant is discussing their difficulty with checklists:

4.3.6 Overall visual experience
Towards the end of the experiment, a PF and PNF pair reported observing different static aircraft (not the AI controlled aircraft). This occurred while they were parking the aircraft when the PF observed an occupied gate while the PNF observed the gate to be vacant. The participants (correctly) concluded that this was a software issue and proceeded to a different gate.
In response to the questions, ‘What was easy (Why? Describe…)’ or ‘Any other things you would like to talk about?’, a few participants who operated in the VE (n = 4, 17%) commented that it was easy to look around which provided a better overall ‘picture’ of the cockpit, the location of various switches and controls, and the outside environment. This is in comparison to participants operating in the real-world and using a traditional desktop simulator in which the mouse (and sometimes the keyboard cursor keys, pre-defined numeric keypad keys or any combination thereof) had to be used to change the view. This type of two-dimensional representation of the environment is less similar to an actual B738 cockpit which does wrap around the crew to ensure controls are readily within reach.
5.0 Discussion
The study demonstrated that EBT and observation of the eight ICAO core pilot competencies (Table 1) can be accomplished in both a multi-crewed VRFS and a multi-crewed desktop simulator. The competencies, coupled with the post-experiment debrief, highlighted the extensive use of deictic gestures. This is in keeping with Mehrabian’s 7-38-55 rule [Reference Mehrabian41], which states that 7% of meaning is communicated through spoken word, 38% through tone of voice and 55% through body language. Although the limitations of the rule have been subsequently exposed and it is not a measure of cockpit communication, it still serves to emphasise the relative insignificance of the actual words used in normal communication. It was observed that by creating a barrier to the normal communication channel (i.e. donning of the HMDs), non-verbal communication was replaced by verbal communication, the language of which became more precise during the experiment. This finding is consistent with [Reference Müller, Radle and Reiterer42] who state that ‘Collaborators used significantly less deictic gestures (in a VE) in favour of more unambiguous verbal references’ (para. 1). Therefore, participants recognised the various barriers to communication and compensated by replacing the majority of the 55% of body language with spoken words.
The adoption of precise verbal communication is particularly beneficial since there is currently no commercially available technology that permits collaborators in a VRFS to usefully see each other in a VE, including the use of ‘passthrough’ functionality [Reference Cross and Boag-Hodgson26]. In addition, single-pilot commercial aircraft operations may be introduced in the early 2030s [Reference Harris43], which may ultimately involve physically separated users, and the ability of VRFSs to deliver such remote training would be advantageous.
The scenarios were designed to necessitate collaboration for their successful completion. The fact that all participant pairs managed to fully complete their assigned scenarios in all configurations suggests that the scenarios effectively facilitated collaboration. This implies that users: agreed on the shared goals; planned, allocated responsibility, and coordinated; shared context; communicated; and adapted and learnt [Reference Terveen14].
The study has shown that behaviours related to the ICAO professional pilot training competencies can be invoked within a multi-crewed VRFS and a desktop simulator, indicating their applicability for EBT. The scenarios in this research replicated two typical simulator sessions for initial First Officer type rating training, resulting in varying degrees of competency demonstration across different areas. For example, despite none of the participants having any MCC experience flying a B738, the ICAO competency Leadership/Teamwork (i.e. collaboration) was successfully demonstrated. In a similar manner, task completion demonstrated that the competencies of Problem Solving and Decision Making, Application of Procedures (i.e. checklists) and Workload Management can successfully be employed in a VE. For example, all participants successfully followed checklist procedures to manage the engine failure (fire) before takeoff (Sydney, Scenario 2) by extinguishing the fire and shutting down the engine.
When contrasting a VRFS with a desktop simulator, this study not only demonstrated the viability of integrating EBT into a multi-crewed VRFS but also highlighted that, once initial communication hurdles were overcome, a VRFS fosters a stronger sense of presence. This, in turn, may amplify the benefits of immersive environments mentioned in the introduction, such as improved psychomotor performance, knowledge acquisition and spatial ability.
Limitations of the study have been discussed in the main body of the text, and include restrictive checklists, inability to take notes and the unavailability of scenario guides and charts in the VE (Section 4.3), switch mapping between the slave computer and the master computer (Section 4.3.5), and the small sample size (Section 4.1).
6.0 Conclusions
The present research contributes to a growing body of evidence suggesting that VRFSs can be used to augment professional pilot training methods, such as computer-based training and full flight simulators. Furthermore, the research has demonstrated that a multi-crewed VRFS using a complex jet, with scenarios aligned to a typical initial First Officer airline training programme, can be used to develop and assess pilot core competencies. However, to ensure the efficacy of VRFSs for professional pilot training, much more research needs to be undertaken, especially in the areas of knowledge acquisition, development of procedures and flying skills, transfer of training and the application of human factors principles. Although this study has demonstrated that a VRFS can elicit the eight pilot core competencies, further research needs to be conducted on specific measurement of each ICAO competency during EBT.
Competing interests
The authors declare none.