Introduction
Mood is an affective dynamic that naturally varies across time and contexts (Trull et al. 2015). Problems with regulating mood can play a key role in the development and trajectory of a range of psychopathologies (Paris, 2004; Crowell et al. 2009; Marwaha et al. 2015). Traditionally, mood has been assessed with retrospective measures (Trull et al. 2015). This can increase the risk of recall bias, subsequently reducing accuracy (Schwartz et al. 1999; Reid et al. 2009). The relatively recent use of ecological momentary assessment (EMA) facilitates the real-time assessment of mood by collecting data on multiple occasions throughout the day (Wenze & Miller, 2010). Thus, it may be more suitable for understanding daily mood changes (Cristobal-Narvaez et al. 2016; Myin-Germeys et al. 2016; van Knippenberg et al. 2016).
Various EMA techniques exist, ranging from paper-and-pencil to physiological assessment (Wenze & Miller, 2010) to digital data collection. A number of UK governmental reports (HM Government, 2011; Department of Health, 2013) highlight the benefits of digital tools and Information and Communications Technology (ICT) in aiding the objective, reliable assessment and care of mental health problems. With demand for mental health services outgrowing available resources (Department of Health, 2013), technology might relieve some of this pressure by providing remote resources that increase access to effective treatment while reducing clinician load.
Applications (‘apps’) offer great promise to young people, who are disproportionately affected by mental illness or may struggle to engage with mental health services (Seko et al. 2014). Apps are delivered in a medium young people are familiar with. Figures from Ofcom (2015) indicate that 90% of youth between the ages of 16 and 24 own a smartphone, regardless of sociodemographic domain. Given this widespread ownership and apparent attachment to mobile technology (Ofcom, 2015), youths might feel more comfortable with assessments and treatments utilising mobile apps.
Mental health services increasingly use apps (Olff, 2015), many of which have the capacity for EMA to monitor mood (e.g. Sandstrom et al. 2016b). Several reviews with mainly adult studies (e.g. Donker et al. 2013; Naslund et al. 2015; Nicholas et al. 2015; Torous & Powell, 2015; Bakker et al. 2016; Faurholt-Jepsen et al. 2016; Walsh et al. 2016) have appraised evidence for the use of mood-monitoring apps.
Studies included in these reviews provide some evidence for the psychometric properties of these apps, e.g. internal consistency (Palmier-Claus et al. 2012) and concurrent validity (Faurholt-Jepsen et al. 2014). There is also evidence for usability (Bardram et al. 2013). Participation rates are generally high across studies sampling adults, ranging from 65% (Depp et al. 2015) to 88% (Ainsworth et al. 2013), though Depp et al. (2012) reported much higher completion rates for paper-and-pencil than for app measures (82.9% v. 42.1%). Evidence also suggests that apps may help people with mental health problems to monitor triggers (Bardram et al. 2013), that the capacity to convey experience can be therapeutic, and that apps could be a useful tool for improving patient–clinician communication (Palmier-Claus et al. 2013).
Less is known about the use of mental health apps, particularly mood-monitoring apps, in youth (10–24 years). A scoping review by Seko et al. (2014) suggested that mood-monitoring apps are positively perceived by youth (Matthews et al. 2008a), may improve treatment adherence (Matthews et al. 2008b) and possibly improve mental wellbeing (Kauer et al. 2012). While intriguing, findings were preliminary due to the low quality of available evidence (NCCMH, 2014), the small number of studies on mood-monitoring apps specifically and the limited number of apps studied (n = 2) (NCCMH, 2014; Seko et al. 2014).
In summary, mood-monitoring apps offer a potentially important step change in the assessment of mood and delivery of youth mental health services. Despite this potential and the widespread advocacy for their use (e.g. Firth et al. 2016; Sandstrom et al. 2016a), there are no extant reviews examining the psychometric properties, usability and clinical impacts of mood-monitoring apps in young populations. Therefore, a systematic review was completed to address the following research questions: (1) what are the psychometric properties of mobile mood-monitoring apps; (2) what is their usability; and (3) what are their positive and negative clinical impacts among clinical and non-clinical youth populations? Our secondary aims were to frame our findings within the adult literature, and to conduct a quality assessment to examine potential sources of bias.
Method
Following a scoping review, the authors developed the protocol delineating the planned methodology. The review was conducted in adherence to this protocol, and in line with the PRISMA statement (Moher et al. 2009).
Information sources and search strategy
The following sources were searched: Medline, EMBASE, PsycINFO, ProQuest Dissertations & Theses, ProQuest SciTech Collection, the Association for Computing Machinery (ACM) Guide to Computing Literature and Web of Science for articles published from 2008 [the year when the first app was launched (Donker et al. 2013)]. Search terms were informed by previous reviews (Seko et al. 2014), and modified following advice from a medical librarian and field experts. The search was conducted by combining five groups of terms (see online Supplementary Table S1) relating to: type of technology (e.g. ‘mhealth’), type of assessment (e.g. ‘ambulatory assessment’), mood-related outcome or problem (e.g. ‘bipolar disorder’), youth population (e.g. ‘youth’), and usability/treatment-related outcomes and psychometric properties (e.g. ‘reliability’, ‘validity’). We were interested in all forms of validity potentially examined in the app literature, e.g. concurrent, face or predictive (Faurholt-Jepsen et al. 2016), though we anticipated a paucity of studies due to the novelty of the field. We defined the ‘usability’ of mood-monitoring apps in accordance with the International Organisation for Standardisation (2001) definition of usability, i.e. ‘the capability of the software product to be understood, learned, used and attractive to the user, when used under specified conditions’. Consistent with previous systematic reviews (Donker et al. 2013), we included young people's participation rates (i.e. compliance, response and completion) and how apps were perceived by youths (including their acceptability – how satisfied they were with the app, whether it could be used with ease) as markers of usability.
MD conducted a hand search of articles published over the last 5 years in Cyberpsychology, Behavior, and Social Networking, the Journal of Medical Internet Research (JMIR), JMIR Mental Health, and JMIR mHealth and uHealth. An additional search of the first 15 pages of Google Scholar was conducted (search terms ‘mood’, ‘phone’, ‘app’ and ‘monitoring’). Reference lists and in-text citations of relevant articles were inspected. Finally, subject experts were approached to identify additional articles.
Study selection
Inclusion criteria were:
(1) Apps must have been developed for, and delivered through, mobile phones or smartphones;
(2) Participants aged 10–24 years (consistent with the World Health Organisation's definition of young people; World Health Organisation, 1986);
(3) Studies included published and unpublished research reported in the grey literature;
(4) Studies must have been published in the English language;
(5) Studies must have been published in 2008 or later;
(6) Studies must have included community or clinical populations (to ensure the inclusion of sub-clinical youth, who may subsequently access care).
Screening procedure
Following removal of duplicates, MD and ML independently screened 100% of titles and abstracts for full-text retrieval. MD assessed full-text articles against the inclusion criteria and extracted relevant data.
Quality assessment
MD evaluated the quality of included studies for potential risk of bias using Cochrane's risk of bias tool, in which studies are allocated a rating of high, low or unclear risk of bias (Higgins et al. 2011).
Data synthesis
Quantitative and qualitative data were synthesised narratively.
Results
Study selection
A total of 1747 articles were identified in the initial search, and 19 from the hand search (Fig. 1). Following removal of duplicates, 1176 abstracts were screened, 86 of which were selected for full-text retrieval. There was a high level of agreement between raters (κ = 0.90). In total, 64 articles were excluded following full-text review. Three additional articles were identified following inspection of included studies. Twenty-five articles were included in the final review.
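As an illustration of how an inter-rater agreement statistic such as the κ reported above can be obtained, the following minimal sketch computes Cohen's kappa for two raters. The screening decisions shown are hypothetical and are not the review's data; the function name `cohens_kappa` is likewise introduced here for illustration only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions for ten abstracts (not study data).
rater_1 = ["include", "exclude", "exclude", "include", "exclude",
           "exclude", "exclude", "include", "exclude", "exclude"]
rater_2 = ["include", "exclude", "exclude", "include", "exclude",
           "exclude", "exclude", "exclude", "exclude", "exclude"]

kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

In this toy example the raters disagree on one of ten abstracts, so raw agreement is 90%, but kappa is lower because part of that agreement would be expected by chance.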
Study characteristics
Table 1 outlines study methodology, the characteristics and features assessed in the studies, and main findings. Three studies reported on a randomised controlled trial (RCT): one was the primary RCT (Reid et al. 2011), and two reported secondary analyses with the same dataset (Kauer et al. 2012; Reid et al. 2013). The remaining studies were non-experimental or quasi-experimental. The search identified 19 published studies and six unpublished studies (four conference proceedings; two theses). The majority of studies (n = 16) were quantitative; the remaining nine employed mixed methods.
Notes to Table 1:
a The accessibility of mood-monitoring apps was assessed through a search of Google and three app stores (iTunes, Google Play and Microsoft store) in June 2016.
b Please refer to Table 2 for coefficient values.
c These studies utilised the same data.
d These studies utilised the same data.
e These studies utilised the same data.
f These studies partly utilised the same data.
g These studies utilised the same data.
Sample size ranged from 6 to 108 996 participants. Eight studies recruited healthy participants. Eleven studies recruited participants from clinical populations including youth with a range of mental health, emotional or behavioural problems, such as depression (n = 8), high-functioning autism/Asperger's disorder (n = 2) and substance or alcohol use (n = 1). The remaining six studies recruited participants from mixed populations comprising healthy, mentally ill or substance-using individuals. Mean ages across studies ranged from 10.95 to 23.7 years.
Methods across studies varied greatly. For example, some studies lent participants a phone, whereas others let participants use their own device. Please see Table 1 for a description of the different data collection methods used in each study. As observed in the adult literature, terminology also varied greatly across studies (please see Usability section for more details).
Various apps were used, the most frequent of which was the ‘Mobiletype’ programme (Reid et al. 2009). Mood outcomes were either direct mood assessments, or described mood-related constructs or behaviours (e.g. stress, hostility). Outcomes were monitored over variable time periods. The shortest period was 24 h (Bossmann et al. 2013), the longest 326 days (Matthews & Doherty, 2011). Monitoring schedules also varied, and could comprise hourly, daily or weekly monitoring, or requirements to complete measures a fixed number of times per day (with or without pre-specified time intervals). Reimbursements or incentives were available in 18 studies (e.g. payments, gift vouchers).
Psychometric properties of mood-monitoring apps
Nine studies reported on the reliability or validity of mood-monitoring apps.
Reliability
Internal consistency (the correlation between items within a scale) was assessed in four studies (Dunton et al. 2011, 2014; Huh et al. 2014; Ansell et al. 2015). As demonstrated in Table 2, levels ranged from questionable to excellent (George & Mallery, 2003).
Note: O, overall; WS, within-subject level; BS, between-subject level. Interpretation of internal consistency coefficient values: ‘>0.9 – excellent, >0.8 – good, >0.7 – acceptable, >0.6 – questionable, >0.5 – poor and <0.5 – unacceptable’ (George & Mallery, 2003, p. 231).
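The interpretation bands quoted in the note can be applied mechanically to a coefficient. The sketch below, offered as an illustration only, computes Cronbach's alpha (one common internal consistency coefficient; the included studies may have used other estimators) and maps it to the George & Mallery labels. The item scores are hypothetical, the function names are introduced here, and treating a value of exactly 0.5 as ‘unacceptable’ is an assumption, since the quoted rule does not cover that boundary.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def interpret_alpha(alpha):
    """Label a coefficient using the George & Mallery (2003) bands."""
    bands = [(0.9, "excellent"), (0.8, "good"), (0.7, "acceptable"),
             (0.6, "questionable"), (0.5, "poor")]
    for threshold, label in bands:
        if alpha > threshold:
            return label
    # Assumption: values at or below 0.5 fall in the lowest band.
    return "unacceptable"


# Hypothetical scores: three mood items rated by five respondents.
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 3],
]
alpha = cronbach_alpha(items)
label = interpret_alpha(alpha)
print(round(alpha, 2), label)
```

Because the bands use strict inequalities, a coefficient of exactly 0.9 is labelled ‘good’ rather than ‘excellent’ under this reading.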
Validity
Concurrent validity
Three studies examined concurrent validity (the correlation between an assessment and a previously validated assessment of the same construct). Concurrent validity was mostly moderate across studies (see Table 1). Khor et al. (2014a) compared relationships between participant- and parent-reported data from the retrospective Responses to Stress Questionnaire (Connor-Smith et al. 2000) and mobile app data recording participants’ responses to stress. In two studies of university students, Ben-Zeev et al. (2015) and Wang et al. (2014) compared momentary app and retrospective questionnaire data on perceived stress.
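Since concurrent validity is defined here as a correlation between two assessments, a minimal sketch of that calculation may be helpful. It computes a Pearson correlation between each participant's mean momentary app rating and their retrospective questionnaire score. The data are hypothetical, and the choice of Pearson's r (rather than whatever statistic each included study actually used) is an assumption made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical per-participant scores (not study data).
momentary_mean = [2.0, 3.5, 4.0, 1.5, 3.0]   # mean of app (EMA) stress ratings
retrospective = [2.5, 3.0, 4.5, 2.0, 2.5]    # questionnaire stress score

r = pearson_r(momentary_mean, retrospective)
print(round(r, 2))
```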
Face validity
Two studies described participants’ views on the face validity of the ‘Mobiletype’ app (see Table 1 for numerical details). Reid et al. (2012), using a sample with various mental health problems, found that the app was relatively successful in capturing participants’ feelings and current situation. Khor et al. (2014a), using a sample with high-functioning autism and Asperger's, found that the app was not quite as successful in these domains. In both studies, the app was less successful in capturing participants’ thoughts.
Usability of mood-monitoring apps
Participation rates
Twenty-one studies examined participation rates, which ranged from 30% to 99%. Average percentages were not computed in four studies. Instead, these studies described the mean number of diary entries per participant (Bossmann et al. 2013), between-group differences (Matthews et al. 2008b; Kauer et al. 2009), or evidence of ongoing compliance (Tregarthen et al. 2015). There was some indication that response rates were higher in studies with incentives. For example, Dennis et al. (2015) offered an incentive of $50 per week, and had a participation rate of 89% (see Table 1 for comparative rates and incentive details). Participation rates also appeared to be affected by response fatigue. In Reid et al. (2009), for instance, response rates decreased from 91% on day 1 to 67% on day 7. Finally, participation rates were potentially affected by sample-specific characteristics. In a study with high-functioning autistic participants, Khor et al. (2014a) found a significant positive correlation between full-scale IQ and compliance rates (r = 0.46, p < 0.01).
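To make the participation-rate figures concrete, the sketch below computes a rate as completed entries divided by prompted entries and expresses a day 1 to day 7 decline in percentage points. The counts are hypothetical and were chosen only so that the rates match the 91% and 67% quoted above; they are not taken from any included study, and individual studies defined compliance, response and completion in different ways.

```python
def participation_rate(completed, prompted):
    """Percentage of prompted entries that were completed."""
    if prompted <= 0:
        raise ValueError("prompted must be positive")
    return 100.0 * completed / prompted


# Hypothetical (completed, prompted) counts per monitoring day.
daily_counts = {1: (91, 100), 7: (67, 100)}

rates = {day: participation_rate(done, asked)
         for day, (done, asked) in daily_counts.items()}

# Decline between the first and last day, in percentage points.
decline = rates[1] - rates[7]
print(rates, decline)
```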
Participants’ perceptions
Nine studies considered participants’ perceptions of the apps. Three of these studies specifically referred to the ‘acceptability’ of apps. In Dennis et al. (2015), 95% of adolescents felt that the EMA app ‘was not too long’. Tregarthen et al. (2015) measured app utilisation data as a proxy for acceptability. There were over 100 000 users over a 2-year period (with 89% using the application at least three times), which the authors interpreted as a demonstration of broad acceptability. While they did not define acceptability specifically, Reid et al. (2009) concluded that their app was ‘acceptable’ based on the data they captured (e.g. completion rates, participants’ feedback).
Across studies, 93–100% of respondents found apps easy to learn or use (Dennis et al. 2015; Kenny et al. 2015; Sacco, 2015). In addition, participants rated apps as useful (Kenny et al. 2015), convenient, user-friendly (Bachmann et al. 2015), youth-friendly and non-invasive (Reid et al. 2009). Despite these positive experiences, technological difficulties (e.g. software crashes, reduced battery life) were reported to negatively affect user experience and participation (Loventoft et al. 2012; Huh et al. 2014; Dennis et al. 2015; Sacco, 2015). Although most young people reported a preference for mobile phone mood charting in comparison to paper diaries (Matthews et al. 2008b), not all young people preferred mobile technology (Reid et al. 2009; Scotti, 2015). For example, Scotti (2015) found that several participants from a sub-diagnostic eating disorder sample favoured paper-and-pencil to track their data.
Positive and negative clinical impacts of mood-monitoring apps
Mental health and awareness
Five studies (two of which were from the same RCT) examined potential clinical impacts of the apps. Reid et al. (2011) found a significant improvement in emotional self-awareness, but no significant improvements in depression, anxiety or stress scores in youth with mental health or emotional problems. In a secondary analysis of the same RCT, Kauer et al. (2012) reported an indirect association between app use and depression symptoms via increased emotional self-awareness. The app, however, did not significantly reduce rumination.
Qualitative feedback from two studies also suggested that mood-monitoring apps can help improve self-awareness (Kenny et al. 2015), and self-reflection on emotions or behaviours (Sacco, 2015).
Though they did not test this premise directly, Ansell et al. (2015) hypothesised that app-based monitoring could have promoted self-awareness in participants, subsequently reducing (perceived) interpersonal hostility.
In Khor et al. (2014b), parents rated their children with high-functioning autism as showing fewer symptoms of behavioural and emotional problems following use of the self-monitoring app.
Treatment implications
Five studies reported results that could have implications for the prevention and treatment of mental health problems. Mobile app data gathered by Dennis et al. (2015) were used to identify high-risk groups for substance use, which could potentially help with relapse prevention. Crooke et al. (2013) suggested that mood-monitoring apps could help investigate adolescents’ motivations for drinking, thus informing the development of interventions.
Qualitative feedback from therapists suggests that the use of mobile apps could help facilitate engagement with participants suffering from various mental health problems (Matthews & Doherty, 2011). Reid et al. (2012) reported that the Mobiletype app facilitated the assessment and management of youth mental health problems and reduced consultation time with paediatricians; the data captured enabled more individually focused consultations, which assisted in rapport building and communication.
In the third of a series of papers detailing their RCT, Reid et al. (2013) explored the potential treatment benefits of ‘Mobiletype’. In comparison to the control programme, the app significantly increased general practitioners’ (GPs’) understanding of their patients’ health and current functioning, and aided diagnoses, communication, medication and referrals. However, there was no significant effect on doctors’ confidence, doctor–patient rapport or pathways to care.
Finally, in a conference paper by Loventoft et al. (2012), clinicians highlighted the usefulness of self-monitoring when combined with therapy.
Quality assessment
Please see online Supplementary Fig. S1 for an overall depiction of the risk of bias domains across studies.
Risk of selection bias was difficult to assess in many studies, as they often lacked treatment, control or comparison groups. Three studies (all using the same RCT data) were deemed at low risk of selection bias due to a clear description of the randomisation and allocation concealment process (Reid et al. 2011, 2013; Kauer et al. 2012). Two studies were at unclear risk of selection bias because randomised sequence generation and method of allocation concealment were not sufficiently described (Matthews et al. 2008b; Reid et al. 2009). One study was considered at high risk of selection bias (Scotti, 2015) as there was no random allocation process for the control condition.
Only the RCT study (three publications) addressed the blinding of participants and personnel, and was thus considered at low risk of performance bias (Reid et al. 2011, 2013; Kauer et al. 2012). The risk of detection bias in these studies was unclear due to a lack of clarity on blinding of outcome assessments.
The risk of attrition bias was difficult to ascertain in three studies. In one study (Kenny et al. 2015), a number of participants were not included in the final sample due to restrictions on school access (no other information was available). Bossmann et al. (2013) excluded 15 participants from the final sample due to ‘missing data’, but did not provide further information, including whether any analyses were performed to address missing data. Reid et al. (2012) was considered at unclear risk of attrition bias, as there was no information on the participants (21%) lost to follow-up. The remaining studies appeared to be at low risk of attrition bias. There was insufficient information to assess the risk of reporting bias in all studies except those reporting the RCT, which addressed pre-specified outcomes and appeared to be at low risk (Reid et al. 2011, 2013; Kauer et al. 2012). All studies appeared to be at unclear or high risk of other types of bias.
Discussion
The aim of this review was to summarise and evaluate evidence for the use of mobile mood-monitoring apps in young people (aged 10–24 years) from clinical and non-clinical populations. We specifically focused on psychometric properties, usability and clinical impacts.
Psychometric properties of mood-monitoring apps
Few studies assessed psychometric properties. There was limited evidence for reliability, with four studies demonstrating questionable to excellent levels of internal consistency. Studies examining concurrent (n = 3) and face (n = 2) validity were also sparse, making it difficult to draw firm conclusions. Face validity findings, for example, could have been moderated by sample characteristics such as reduced insight in participants with autism (Khor et al. 2014a).
The limited assessment of psychometric properties observed in the youth literature mirrors the adult literature. Evidence for concurrent validity in adult populations is inconclusive (Depp et al. 2012; Palmier-Claus et al. 2012; Faurholt-Jepsen et al. 2014). Inconsistent methodology across these studies, e.g. momentary (Depp et al. 2012) v. retrospective assessments (Faurholt-Jepsen et al. 2014) and varying periods between the event and participants’ recollection of the event (Palmier-Claus et al. 2012), likely contributes to variable findings. Previous evidence suggests that real-time mood measurement methods (e.g. EMA) only have a modest correlation with retrospective assessments, such as questionnaires (Ebner-Priemer & Trull, 2009). This leads to the conceptual question of whether retrospective measures are the most appropriate comparators when assessing the validity of mood-monitoring apps. Questionnaires measure an individual's retrospective view of their mood state over a number of days. While they are subject to recall bias, this bias incorporates other emotional processing (e.g. contexts) that the more instantaneous assessment of mood (e.g. EMA) may not capture, or at least not as richly. Thus, the two assessment methods may be measuring different types of affective experience.
As it is difficult to draw robust conclusions about the validity of apps using retrospective assessments, future studies should further examine psychometric properties using other sources of comparative data, e.g. by comparing active smartphone app data (i.e. app assessments) with passive smartphone sensor data (Nicholas et al. 2015; Sandstrom et al. 2016b), or by examining associations with clinical rating scales (Faurholt-Jepsen et al. 2016).
Usability of mood-monitoring apps
The usability of mood-monitoring apps was more extensively studied, and overall studies suggest that apps are usable for young people. However, there were some within- and between-study differences in participants’ perceptions of apps, and participation rates.
Generally, participation rates were lower in studies where participants had mental health difficulties (Reid et al. Reference Reid, Kauer, Hearps, Crooke, Khor, Sanci and Patton2011; Kauer et al. Reference Kauer, Reid, Crooke, Khor, Hearps, Jorm, Sanci and Patton2012), problematic drinking patterns (Kauer et al. Reference Kauer, Reid, Sanci and Patton2009) or autism spectrum disorders – especially those with lower IQ (Khor et al. Reference Khor, Gray, Reid and Melvin2014a ). In particular, participation levels were low for those living without set routines (Kauer et al. Reference Kauer, Reid, Sanci and Patton2009). This is an important consideration, as youths with mood-related problems, e.g. borderline personality disorder, often have disorganised daily routines (Fleischer et al. Reference Fleischer, Schäfer, Coogan, Häßler and Thome2012). This suggests a need to tailor apps for different clinical populations (Kauer et al. Reference Kauer, Reid, Sanci and Patton2009).
Some studies indicated that incentives could positively influence participation rates (e.g. Ansell et al. Reference Ansell, Laws, Roche and Sinha2015; Dennis et al. Reference Dennis, Scott, Funk and Nicholson2015). It may not be financially feasible to offer incentives in non-research settings. However, results tentatively suggest that participation rates may be better for mobile apps than traditional paper-based assessments irrespective of incentives (Matthews et al. Reference Matthews, Doherty, Sharry and Fitzpatrick2008b ). Participation rates for paper-based diaries are as low as 11% (Stone et al. Reference Stone, Shiffman, Schwartz, Broderick and Hufford2003) compared with 30–99% for mood-monitoring apps in the current review. This supports that apps could lead to better adherence rates than non-digital assessment tools in young populations. Factors that could improve participation rates include the use of less intensive assessments (e.g. once-daily rather than multiple times), shorter assessments and the incorporation of staff monitoring or automatic reminders (Huh et al. Reference Huh, Shin, Leventhal, Spruijt-Metz, Abramova, Cerrada, Hedeker and Dunton2014).
Studies from the adult literature are somewhat congruent in supporting the usability of mood-monitoring apps (Bardram et al. Reference Bardram, Frost, Szántó, Faurholt-Jepsen, Vinberg and Kessing2013), though evidence suggests that increasing age (e.g. ‘middle age’) may lower the likelihood of mood-monitoring app use (Depp et al. Reference Depp, Kim, de Dios, Wang and Ceglowski2012). Both adult (Palmier-Claus et al. Reference Palmier-Claus, Rogers, Ainsworth, Machin, Barrowclough, Laverty, Barkus, Kapur, Wykes and Lewis2013) and adolescent (Bradford & Rickwood, Reference Bradford and Rickwood2014) populations expressed some reservations about using apps due to the perceived risk of reduced personal contact (Palmier-Claus et al. Reference Palmier-Claus, Rogers, Ainsworth, Machin, Barrowclough, Laverty, Barkus, Kapur, Wykes and Lewis2013).
Overall, our review demonstrated that young people positively perceive apps (Reid et al. Reference Reid, Kauer, Dudgeon, Sanci, Shrier and Patton2009) and would be willing to use this technology in real-life settings (Kenny et al. Reference Kenny, Dooley and Fitzgerald2015; Tregarthen et al. Reference Tregarthen, Lock and Darcy2015). Very few studies considered clinician perspectives on mood-monitoring apps. Matthews & Doherty (Reference Matthews and Doherty2011) found that therapists’ confidence with technology was the biggest barrier to the use of mood apps. More qualitative studies are now needed to further explore young people’s (and clinicians’) perceptions (Hollis et al. Reference Hollis, Falconer, Martin, Whittington, Stockton, Glazebrook and Davies2016) to broaden our understanding of factors pertinent to the uptake of mood-monitoring apps in real-life settings.
Positive and negative clinical impacts of mood-monitoring apps
Few of the included studies assessed the clinical impacts of the mood-monitoring apps. Although evidence was generally positive (e.g. facilitating assessment, management and GPs’ understanding), most studies relied on subjective participant feedback (Sacco, Reference Sacco2015) rather than RCT methodology with objective outcome measures.
The preliminary evidence (Kauer et al. Reference Kauer, Reid, Crooke, Khor, Hearps, Jorm, Sanci and Patton2012) very tentatively suggests that electronic mood-monitoring apps could function as an intervention tool (Seko et al. Reference Seko, Kidd, Wiljer and McKenzie2014; Olff, Reference Olff2015; Faurholt-Jepsen et al. Reference Faurholt-Jepsen, Munkholm, Frost, Bardram and Kessing2016). Intriguingly, results from the one RCT indicated that mood-monitoring apps might reduce depression in youths by increasing their levels of emotional awareness (Kauer et al. Reference Kauer, Reid, Crooke, Khor, Hearps, Jorm, Sanci and Patton2012). Similarly, though in a non-experimental study, Khor et al. (Reference Khor, Melvin, Reid and Gray2014b ) reported that self-monitoring improved parent-reported behavioural and emotional problems in participants with autism. While these results are promising, they require replication and future studies may further explore the mechanisms via which apps could potentially impact on clinical outcomes. One possibility is that mood apps could have a positive impact on clinical symptoms due to patient/participant expectations regarding their benefits. This phenomenon, coined the digital placebo effect, is an overlooked area, which also merits future investigation (Torous & Firth, Reference Torous and Firth2016).
We were unable to fully examine the potential negative impacts of mood-monitoring apps in youth populations, as they were not directly investigated in studies. However, Reid et al. (Reference Reid, Kauer, Dudgeon, Sanci, Shrier and Patton2009) found that participants did not always respond to questions truthfully to avoid having to answer further questions. Thus, this type of assessment could potentially lead to the inaccurate assessment (and treatment) of mental health problems.
A small number of adult studies report on the negative effects of mood-monitoring apps. There is some suggestion that apps may increase negative reactivity (Ainsworth et al. Reference Ainsworth, Palmier-Claus, Machin, Barrowclough, Dunn, Rogers, Buchan, Barkus, Kapur, Wykes, Hopkins and Lewis2013), increase focus on negative symptoms and thoughts (Palmier-Claus et al. Reference Palmier-Claus, Rogers, Ainsworth, Machin, Barrowclough, Laverty, Barkus, Kapur, Wykes and Lewis2013), and potentially maintain depressive symptoms (Faurholt-Jepsen et al. Reference Faurholt-Jepsen, Frost, Ritz, Christensen, Jacoby, Mikkelsen, Knorr, Bardram, Vinberg and Kessing2015). Given the evidence from the adult literature, research on the possible harmful effects of app use in youths is needed before these tools are routinely used in clinical practice. Part of this endeavour should seek to identify a monitoring schedule that strikes the optimal balance between accurately capturing affective dynamic processes and minimising respondent workload (Bolger et al. Reference Bolger, Davis and Rafaeli2003; Trull et al. Reference Trull, Lane, Koval and Ebner-Priemer2015). This is particularly important, not only because it affects participation rates, but also because the responsibility of self-monitoring could impose a burden on young people (Shiffman et al. Reference Shiffman, Stone and Hufford2008), might result in unnecessary pressure (Lupton, Reference Lupton2013; Seko et al. Reference Seko, Kidd, Wiljer and McKenzie2014) and exacerbate mental health problems (Conner & Reid, Reference Conner and Reid2012; Faurholt-Jepsen et al. Reference Faurholt-Jepsen, Frost, Ritz, Christensen, Jacoby, Mikkelsen, Knorr, Bardram, Vinberg and Kessing2015).
Future work may investigate potential ethical issues surrounding the use of mood-monitoring apps. For example, their use could lead to an over-reliance on technology in young populations, which could exacerbate mental health problems (Thomée et al. Reference Thomée, Härenstam and Hagberg2011). There could also be information security-related risks (e.g. digital theft) that could compromise confidentiality (Prentice & Dobson, Reference Prentice and Dobson2014). Finally, youths could use apps as a replacement for treatment and health monitoring (Tregarthen et al. Reference Tregarthen, Lock and Darcy2015). Considering the importance of the therapeutic alliance for successful treatment outcomes (Karver et al. Reference Karver, Handelsman, Fields and Bickman2006), the efficacy of smartphone apps could be reduced if they are used without clinicians’ involvement (Prentice & Dobson, Reference Prentice and Dobson2014).
Strengths and limitations
As far as we are aware, this is the first review to systematically examine and quality assess the evidence for the psychometric properties, usability and clinical outcomes of mood-monitoring apps in youth. However, our results should be considered through the lens of a number of limitations.
First, despite undertaking a comprehensive search, there were very few high-quality studies available for inclusion in the review. There was only one primary RCT, highlighting the need for more trials on the efficacy of mood-monitoring apps in young people. Indeed, our quality assessment indicated that the majority of studies included some form of bias. For example, many studies were at high or unclear risk of sampling (e.g. self-selected samples) and attrition bias. This could have affected the generalisability of our findings or led to an overestimation of positive effects, e.g. our findings may only apply to individuals with less severe psychopathology who are more likely to engage with services.
Second, studies demonstrated a great variability in terminology (especially for implementation outcomes, e.g. acceptability) making interpretations and cross-study comparisons difficult (inconsistent terminology is also a common feature of the adult app literature). For example, we found that ‘acceptability’ was defined very differently across studies, ranging from proxy markers, i.e. utilisation data (Tregarthen et al. Reference Tregarthen, Lock and Darcy2015) to participants’ experience of burden (Dennis et al. Reference Dennis, Scott, Funk and Nicholson2015). This highlights the need for more careful delineation and measurement of implementation outcomes in future work (Proctor et al. Reference Proctor, Silmere, Raghavan, Hovmand, Aarons, Bunger, Griffey and Hensley2011).
Third, there were large variations in samples and methodologies, again making cross-study comparisons difficult and quantitative synthesis (i.e. meta-analysis) impossible. Thus, some of our conclusions remain tentative pending further rigorous, higher quality research (e.g. RCTs).
Fourth, it should be noted that studies in this review often used apps that were specifically developed for the study, and therefore not publicly available through app platforms (e.g. iTunes). Thus, there is a need for more research to assess the evidence for apps that are freely downloaded and used by youth, and whether their use can be incorporated into clinical care (Nicholas et al. Reference Nicholas, Larsen, Proudfoot and Christensen2015).
Clinical and research implications
Mood-monitoring apps could potentially have positive effects in both clinical and sub-clinical youth populations. Indeed, mood-monitoring apps may help youth identify and address burgeoning mental health and substance use problems (Dennis et al. Reference Dennis, Scott, Funk and Nicholson2015), and possibly utilise more adaptive coping strategies (Kauer et al. Reference Kauer, Reid, Crooke, Khor, Hearps, Jorm, Sanci and Patton2012). Further research is needed to examine the effects of these apps in samples with serious mental disorders, such as bipolar disorder (Grunerbl et al. Reference Grunerbl, Muaremi, Osmani, Bahle, Ohler, Troster, Mayora, Haring and Lukowicz2015), borderline personality disorder (Lederer et al. Reference Lederer, Grechenig and Baranyi2014) and psychosis (Ben-Zeev et al. Reference Ben-Zeev, Brenner, Begale, Duffecy, Mohr and Mueser2014; Palmier-Claus et al. Reference Palmier-Claus, Taylor, Ainsworth, Machin, Dunn and Lewis2014).
Evidence, though limited, suggests that mood-monitoring apps could potentially aid diagnosis and treatment decision-making (Reid et al. Reference Reid, Kauer, Hearps, Crooke, Khor, Sanci and Patton2013). Future studies should explore whether this technology could aid in the assessment of disorders that can be difficult to differentiate [e.g. borderline personality disorder, bipolar disorder (Yen et al. Reference Yen, Frazier, Hower, Weinstock, Topor, Hunt, Goldstein, Goldstein, Gill, Ryan, Strober, Birmaher and Keller2015)] by providing rich data about the timing and extent of mood fluctuations.
As technological innovations have been endorsed at a government level, integrating mood-monitoring apps within mental health services may relieve some of the strain these services are currently experiencing [e.g. by improving access to mental health treatment (Department of Health, 2013)]. However, to date, the potential positive and negative impacts of apps have not been sufficiently investigated in youth.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291717001659.
Acknowledgements
MD is funded by an Economic and Social Research Council (ESRC) Collaborative Award Studentship – ES/J500203/1. Funding for open access publication was kindly provided by Research Councils UK (RCUK).