Introduction
Bipolar disorder (BD) is a complex psychiatric disorder characterized by severe mood dysregulation (APA, 1994). BD is associated with significant psychosocial morbidity and mortality (Angst et al. 2002; Baldessarini & Tondo, 2003) and is among the leading causes of disability worldwide (WHO, 2004). Despite illness severity, the delay between onset and accurate diagnosis is typically between 5 and 10 years (Lish et al. 1994; Hirschfeld et al. 2003; Berk et al. 2007). Surveys of BD patients conducted over the past 20 years show no evidence of improvement in timely illness recognition (Hirschfeld et al. 2003; WHO, 2004; Berk et al. 2007). Delayed diagnosis in BD has adverse consequences in terms of increased periods in episode, greater psychosocial morbidity (Stensland et al. 2008) and emerging treatment resistance (Post et al. 2003; Ketter et al. 2006). Delayed diagnosis is also associated with increased treatment costs (Stensland et al. 2008) whereas early and accurate identification of BD leads to significant cost savings (Menzin et al. 2009).
The importance of early recognition in BD is further underscored by emerging yet compelling evidence that the illness is associated with evolving neurobiological changes that may drive subsequent clinical deterioration (Berk et al. 2009; Kapczinski et al. 2009). Therefore, the timely diagnosis of BD is currently the single most important unmet need in enhancing clinical and functional outcomes.
Neuroimaging studies to date have established that structural magnetic resonance imaging (sMRI) can be used to identify brain morphological differences between BD patients and controls. Meta-analytic studies that have synthesized the extensive available literature have confirmed that BD is reliably associated with structural abnormalities in the ventral prefrontal cortex, the cingulate gyrus, amygdala/parahippocampal complex and the basal ganglia (Kempton et al. 2008; Arnone et al. 2009; Vita et al. 2009; Bora et al. 2010; Ellison-Wright & Bullmore, 2010; Kempton et al. 2011; Selvaraj et al. 2012). Despite the contribution of these findings to our understanding of the pathophysiology of BD, their clinical usefulness has been negligible. This is primarily because conventional sMRI data analyses compute mean group differences in spatially localized anatomical regions and do not make use of information about the distributed pattern of relationships among regions or voxels. Information about these spatial patterns is of particular relevance as it can be used for the diagnostic classification of individual patients, thus bridging the gap between neuroscience and clinical practice. In this respect, recent advances in multivariate pattern recognition techniques represent a major development.
The most commonly used pattern recognition algorithm has been the support vector machine (SVM) classifier. The SVM classifier has been used for the classification of patients with Alzheimer's disease (Klöppel et al. 2008; Vemuri et al. 2008), autism (Ecker et al. 2010), aphasia (Wilson et al. 2009) and psychosis (Koutsouleris et al. 2009; Mourao-Miranda et al. 2012) and to predict clinical variables based on patterns of brain activation in functional MRI (Fu et al. 2008; Marquand et al. 2008). However, the SVM classifier yields binary (case or control) and not probabilistic outcomes. For many applications, probabilistic predictions are desirable as they have two key advantages: they provide accurate quantification of predictive uncertainty, reflecting variability within subject groups (e.g. in quantifying the probability that a subject has a psychiatric disorder within a population where illness severity can be expected to vary between individuals), and they allow adjustment of predictions to compensate for different frequencies of diagnostic classes within the general population (Bishop, 2006).
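The second advantage, adjusting predictions for class frequencies, can be illustrated with a short sketch. It assumes a classifier trained on a balanced sample (as in a matched case-control design) and rescales its output to a different class prior using Bayes' rule. The function name and the 1% prevalence figure are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def adjust_for_prevalence(p_case, train_prior, target_prior):
    """Rescale a predicted case probability to a new class prior (Bayes' rule).

    p_case       -- classifier output P(case | scan) under the training prior
    train_prior  -- proportion of cases in the training sample
    target_prior -- assumed proportion of cases in the population of interest
    """
    case_term = p_case * target_prior / train_prior
    control_term = (1.0 - p_case) * (1.0 - target_prior) / (1.0 - train_prior)
    return case_term / (case_term + control_term)


# Hypothetical example: balanced training sample, 1% assumed prevalence.
print(round(adjust_for_prevalence(0.90, 0.5, 0.01), 3))  # prints 0.083
```

A hard case/control label carries no information that could be rescaled in this way, which is why the correction requires probabilistic output.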
Gaussian process classifiers (GPCs) represent a significant advance over SVM as they are fully probabilistic pattern recognition models based on Bayesian probability theory. For neuroimaging, GPCs combine predictive performance equivalent to that of SVM with the additional benefit of probabilistic classification (Marquand et al. 2010).
Therefore, we used GPCs to examine the predictive value of whole-brain gray matter (GM) and white matter (WM) anatomy in discriminating patients with BD from healthy individuals. We embedded the classifier in a recursive feature elimination (RFE) framework (Guyon et al. 2002; Marquand et al. 2011) to identify and localize the subset of brain voxels that provide optimal discrimination accuracy. We focused on sMRI, rather than other neuroimaging techniques, as it is widely available, safe and has an established role in the diagnosis and management of brain disorders. Thus, a diagnostic aid based on sMRI data could be easily incorporated into routine clinical practice and is likely to have high patient acceptability. We enrolled patients with bipolar disorder, type 1 (BP-I; APA, 1994), whose diagnosis was further confirmed following detailed clinical assessment. Patients were in remission, free of any other lifetime psychiatric co-morbidity and matched to healthy controls on age, sex and general intellectual ability (IQ). This careful sample selection was designed to maximize the probability that discrimination between patients and controls would be attributable to brain structural changes relating to BD rather than other factors such as co-morbidity or general cognitive ability. Furthermore, we included two independent cohorts of patients and controls to determine the reliability of our findings.
Method
Samples
Cohort 1 comprised 26 patients fulfilling DSM-IV criteria for BP-I (APA, 1994) and 26 healthy controls derived from participants in the Maudsley Bipolar Disorder Project (Frangou, 2005; Frangou et al. 2005). Demographic and clinical information on the sample is shown in Table 1. Nineteen BD patients were prescribed psychotropic medications, often in combination (typical antipsychotics = 4, atypical antipsychotics = 4, lithium = 10, carbamazepine = 7, sodium valproate = 2). None of the patients were prescribed benzodiazepines, anticholinergics or any other medication.
Table 1 notes: BD, bipolar disorder; HAMD, Hamilton Depression Rating Scale; MRS, Mania Rating Scale; GAF, Global Assessment of Functioning. Continuous data are expressed as mean (standard deviation).
Cohort 2 comprised 14 patients fulfilling DSM-IV criteria for BP-I and 14 healthy controls derived from participants in the Vulnerability Indicators to Bipolar Disorder Study (VIBES; Frangou, 2009). Demographic and clinical information on the sample is shown in Table 1. All patients in this sample were on anticonvulsant monotherapy (sodium valproate = 10, carbamazepine = 4) and did not receive any other type of medication.
For both samples, patients had an established diagnosis of BD and were receiving out-patient treatment within secondary care services. They were individually matched on sex, age and IQ to an equal number of healthy controls without a personal or family history of any DSM-IV Axis I disorders. All participants were screened to exclude past, current and hereditary neurological disorders, current medical conditions, DSM-IV current or lifetime drug or alcohol dependence or abuse, other DSM-IV Axis I current or lifetime co-morbidity, and contraindications to MR imaging. All participants were assessed by qualified psychiatrists using the Structured Clinical Interview for DSM-IV for Axis I Disorders, Patient or Non-Patient Version (SCID-I/P and SCID-I/NP; First et al. 2002a, b) and the Family Interview for Genetic Studies (FIGS; Maxwell, 1992), with additional information supplemented by medical notes as appropriate. Psychopathology was assessed using the Hamilton Depression Rating Scale (HAMD; Hamilton, 1960) and the Mania Rating Scale (MRS; Spitzer & Endicott, 1978), and psychosocial functioning was assessed with the Global Assessment of Functioning (GAF) scale (APA, 1994). Patients were scanned when in remission, operationalized as (a) the absence of a syndromal episode for ⩾3 months, (b) being prescribed the same type and dose of medication for ⩾3 months, and (c) having HAMD and MRS total scores of <10 on the day of scanning. An estimate of general intellectual ability was obtained using the National Adult Reading Test (NART; Nelson & Wilson, 1992). Patients in cohort 2 were younger, had an earlier age of onset and higher IQ than those in cohort 1 (p < 0.01).
This study was approved by the Joint Ethics Committee of the Institute of Psychiatry and the South London and Maudsley National Health Service (NHS) Foundation Trust. Written informed consent was obtained from all participants after a detailed description of the study.
MRI data acquisition
Participants were scanned using a 1.5-T GE NV/i Signa MR system (GE Medical Systems, USA) at the Maudsley Hospital, London. The whole brain was imaged with a three-dimensional (3D) inversion recovery prepared fast spoiled gradient-recalled acquisition in the steady state (SPGR) T1-weighted dataset. These T1-weighted images were obtained in the axial plane with 1.5-mm contiguous sections (echo time = 5.1 ms, repetition time = 18 ms, flip angle = 20°, slice thickness = 1.5 mm, in-plane resolution = 0.9375 × 0.9375 mm, number of excitations = 1). Image contrast for all datasets was chosen with the aid of optimizing software (Simmons et al. 1996).
Data preprocessing
For both samples, all images were first visually inspected for artifacts or gross structural abnormalities using criteria described previously (Simmons et al. 1996, 2011). Subsequently, images were preprocessed using SPM5 (www.fil.ion.ucl.ac.uk/spm/software/spm5). Using the unified segmentation step included in SPM5, images were normalized and segmented (Ashburner & Friston, 2005). Normalized and modulated GM and WM segmented images were then smoothed with 8-mm isotropic Gaussian kernels and used as input into the classification algorithms.
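As a brief illustration of the smoothing step, the sketch below converts a kernel width into the standard deviation of the corresponding Gaussian. It assumes that the 8-mm figure is a full width at half maximum (FWHM), which is the usual SPM convention but is not stated explicitly in the text; the function name and the 2-mm voxel size are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm, voxel_size_mm=1.0):
    """Convert a Gaussian FWHM (mm) to a standard deviation in voxel units.

    With the default voxel size of 1 mm the result is simply sigma in mm.
    """
    sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma_mm / voxel_size_mm


print(round(fwhm_to_sigma(8.0), 3))       # sigma in mm
print(round(fwhm_to_sigma(8.0, 2.0), 3))  # sigma in voxels for hypothetical 2-mm voxels
```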
Pattern classification analysis
The probability of group membership was determined separately in each cohort by applying GPCs to the MRI data. Technical descriptions of GPC inference have been presented elsewhere (Bishop, 2006; Rasmussen & Williams, 2006; Marquand et al. 2008, 2010) and are summarized in the online Supplementary Material. In brief, the classifier is first trained to determine a predictive distribution that best distinguishes cases from controls; any parameters controlling the behavior of this distribution are computed by maximizing the logarithm of the marginal likelihood on the training data only. Then, in the test phase, the classifier predicts the group membership of a previously unseen example. This is achieved by integrating over the predictive distribution for the test case and passing the output through a sigmoidal function, resulting in predictive probabilities scaled between 0 and 1 that quantify the predictive uncertainty of the classifier for the test case.
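The test-phase step can be sketched under two assumptions that the text does not confirm: that the sigmoidal function is the probit (cumulative Gaussian) function, and that the predictive distribution over the latent function at the test case is Gaussian with a given mean and variance. Under those assumptions the integral has a well-known closed form. The function names are hypothetical.

```python
import math


def probit(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predictive_probability(latent_mean, latent_var):
    """P(class = +1) after averaging the probit over a Gaussian latent value.

    Uses the identity: integral of probit(f) * N(f | m, v) df = probit(m / sqrt(1 + v)).
    """
    return probit(latent_mean / math.sqrt(1.0 + latent_var))


# The same latent mean gives a less extreme probability when the variance is larger.
print(round(predictive_probability(1.0, 0.0), 3))
print(round(predictive_probability(1.0, 4.0), 3))
```

In this sketch, greater uncertainty about the latent value pulls the predicted probability towards 0.5, which is how the output reflects predictive uncertainty rather than only a class label.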
In each cohort, the GPCs for GM and WM were implemented separately in the PROBID software package (http://www.kcl.ac.uk/iop/depts/neuroimaging/research/imaginganalysis/Software/PROBID.aspx). We embedded each classifier in a recursive feature elimination (RFE) framework (Guyon et al. 2002; Marquand et al. 2011), which enabled us to identify the subset of brain voxels that provided the optimal discrimination accuracy and to accurately localize the most discriminative brain voxels. To achieve this, we used nested (three-way) cross-validation where we first excluded a matched pair of subjects (one from each group) to comprise the test set, and then performed a second split where we repeatedly repartitioned the remaining subject pairs into a validation and a training set. We then repeatedly trained the classifier on the training set, removing a subset of the least informative features at each iteration, until no features remained. We used a common ranking criterion based on the GPC predictive weights to quantify the information content of each voxel at each iteration (Marquand et al. 2011) and used a small step size (∼1% of voxels) to provide fine-grained control over the number of features retained. In each case we selected the number of features that produced maximal accuracy on the validation set before applying it to the test set. We thresholded the probabilistic predictions at 0.5 to convert them to class labels and computed the proportion of subjects having the correct label across all test splits to estimate the classification accuracy. The statistical significance of each classifier was determined by permutation testing, which yielded a p value indicating whether the classification accuracy exceeded chance levels (50%).
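The feature-elimination loop described above can be sketched as follows. This is only an illustration of the inner (training/validation) loop: a simple mean-difference linear scorer stands in for the GPC, the outer leave-one-pair-out test loop is omitted, the step size is taken as a fraction of the original feature count (one possible reading of "∼1% of voxels"), and all function names and the toy data are hypothetical.

```python
def train(X, y, features):
    """Fit a mean-difference linear scorer on a feature subset.

    NOTE: a stand-in for the GPC, used only to keep the sketch self-contained.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    weights, midpoints = {}, {}
    for j in features:
        mean_pos = sum(x[j] for x in pos) / len(pos)
        mean_neg = sum(x[j] for x in neg) / len(neg)
        weights[j] = mean_pos - mean_neg
        midpoints[j] = (mean_pos + mean_neg) / 2.0
    return weights, midpoints


def predict(model, x):
    weights, midpoints = model
    score = sum(weights[j] * (x[j] - midpoints[j]) for j in weights)
    return 1 if score > 0 else -1


def accuracy(model, X, y):
    return sum(predict(model, x) == label for x, label in zip(X, y)) / len(y)


def rfe_select(X_train, y_train, X_val, y_val, step_fraction=0.01):
    """Drop the lowest-|weight| features step by step; keep the best subset."""
    n_total = len(X_train[0])
    n_drop = max(1, round(step_fraction * n_total))
    features = list(range(n_total))
    best_features, best_accuracy = list(features), -1.0
    while features:
        model = train(X_train, y_train, features)
        acc = accuracy(model, X_val, y_val)
        if acc >= best_accuracy:  # on ties, prefer the smaller subset
            best_features, best_accuracy = list(features), acc
        ranked = sorted(features, key=lambda j: abs(model[0][j]))
        features = ranked[n_drop:]
    return sorted(best_features), best_accuracy


def make_subject(signal, k):
    """Toy subject: feature 0 carries the group signal, features 1-4 are small noise."""
    return [signal] + [0.1 * (((k + j) % 3) - 1) for j in range(4)]


X_train = [make_subject(2.0, k) for k in range(4)] + [make_subject(-2.0, k + 1) for k in range(4)]
y_train = [1] * 4 + [-1] * 4
X_val = [make_subject(1.5, k) for k in range(2)] + [make_subject(-1.5, k + 1) for k in range(2)]
y_val = [1, 1, -1, -1]

selected, val_accuracy = rfe_select(X_train, y_train, X_val, y_val)
print(selected, val_accuracy)  # prints [0] 1.0
```

In the nested scheme described in the text, the feature subset chosen in this way would then be applied to the held-out test pair, so that the test subjects play no part in feature selection.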
For the permutation test, we permuted the class labels from the training set 1000 times (i.e. each time randomly assigning class labels to each structural MRI pattern) and repeated the entire RFE procedure. We then counted the number of times the permuted test accuracy was equal to or greater than the one obtained for the true labels. Dividing this number by 1000, we derived a p value for the classification accuracy.
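The counting step of the permutation test can be sketched as below. The `evaluate` callable is a hypothetical placeholder for rerunning the complete cross-validated RFE procedure on a given labelling; here it is replaced by a trivial toy function so that the example runs on its own. All names and the toy predictions are assumptions made for illustration.

```python
import random


def permutation_p_value(observed_accuracy, permuted_accuracies):
    """Fraction of permutations with accuracy >= the observed accuracy."""
    n_at_least = sum(1 for a in permuted_accuracies if a >= observed_accuracy)
    return n_at_least / len(permuted_accuracies)


def permutation_test(labels, evaluate, n_permutations=1000, seed=0):
    """Compare the accuracy for the true labels with accuracies for shuffled labels."""
    rng = random.Random(seed)
    observed = evaluate(labels)
    permuted = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        permuted.append(evaluate(shuffled))
    return observed, permutation_p_value(observed, permuted)


# Toy stand-in for the full pipeline: agreement between fixed predictions
# and whichever labelling is supplied.
predictions = [1] * 8 + [-1] * 2 + [-1] * 8 + [1] * 2
true_labels = [1] * 10 + [-1] * 10


def evaluate(labels):
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


observed, p_value = permutation_test(true_labels, evaluate)
print(observed, p_value)
```

With 1000 permutations, a count of zero gives a value of 0 from this formula, which is conventionally reported as p < 0.001.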
Cross-validation
The performance of the GM and WM classifiers for each cohort separately was estimated in four ways. First, for each classifier within each sample we computed the proportion of images correctly classified as BD patients or controls (i.e. classification accuracy). Second, we quantified the sensitivity and specificity of each classifier defined as: sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP), where TP is the number of true positives (number of images of patients correctly classified), TN is the number of true negatives (number of images of controls correctly classified), FP is the number of false positives (number of images of controls classified as patients) and FN is the number of false negatives (number of images of patients classified as controls). Third, we compared the results obtained through the GPCs to those derived from conventional univariate voxel-based morphometry (VBM) implemented in SPM5. In VBM analyses we used a statistical threshold of p < 0.0001 uncorrected for multiple comparisons. We thus traded some specificity for increased sensitivity, as the aim of this analysis was to assist in making inferences about the contribution of different regions to the spatially distributed pattern associated with a diagnosis of BD. This is in contrast to the more stringent inferential methods used for controlling type I error to find highly localized, spatially segregated focal group differences. Fourth, regression analyses, thresholded at p < 0.001 uncorrected, were implemented in SPM5 using cumulative exposure to lithium or antipsychotics (based on doses transformed to chlorpromazine equivalents) to identify potential effects of medication on the GM and WM volumes of patients.
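The first two measures follow directly from the four counts defined above. In the sketch below the counts are hypothetical: they were chosen for a sample of 26 patients and 26 controls so as to be consistent with the percentages reported for the cohort 1 GM classifier, and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for 26 patients (tp + fn) and 26 controls (tn + fp).
sens, spec, acc = classification_metrics(tp=20, fn=6, tn=18, fp=8)
print(f"{sens:.0%} {spec:.0%} {acc:.0%}")  # prints 77% 69% 73%
```

When the two groups are of equal size, as here, accuracy is the mean of sensitivity and specificity.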
GPC discrimination maps
The discrimination map (Rasmussen & Williams, 2006) is a spatial representation of the vector of GPC predictive weights and describes the relative contribution of each brain voxel to the classifier decision. Technical details of GPC discrimination mapping have been published elsewhere (Bishop, 2006; Marquand et al. 2010) and are described in the online Supplementary Material. The process is illustrated in Fig. 1 based on a simplified hypothetical example of a two-voxel image.
Results
Prediction accuracy
Classification accuracy reflects the predictive power of the algorithm and is therefore of direct diagnostic relevance. For cohort 1, classification accuracy using GPC analysis of GM images was 73% with a sensitivity and specificity of 77% and 69% respectively. In other words, based on a GM anatomical scan, if a participant had a clinical diagnosis of BD, the probability of correct classification was 0.77. Conversely, if a participant did not have BD, the probability of being correctly classified as a control was 0.69. The GPC analysis using WM images for cohort 1 yielded an accuracy of 69% with a sensitivity of 69% and specificity of 69%. For cohort 2, the sensitivity and specificity of the GM classification were 64% and 99% respectively and the overall accuracy was 72%. In the same cohort, GPC analysis using WM images yielded an accuracy of 78% with a sensitivity of 71% and specificity of 86%. For both cohorts, the models were significant at p < 0.001.
Discrimination maps
Discrimination maps showing the global spatial pattern by which the groups differ are illustrated in Fig. 2 and detailed in Tables 2 (GM) and 3 (WM) for cohort 1 and in Fig. 3 and Tables 4 (GM) and 5 (WM) for cohort 2. For both cohorts, the maps highlight those regions that, according to our GPC-RFE classification approach, contain the most discriminating voxels between BD patients and controls. This optimal discriminative pattern was obtained following removal of ∼98% of all voxels. Because of the multivariate character of the GPC, the discrimination maps should not be interpreted as describing focal effects within individual brain regions. Instead they represent a spatially distributed pattern of coefficients that quantify the contribution of each voxel to the GPC decision function (i.e. the value of a voxel in the discrimination map reflects its contribution or predictive value towards one class or the other). We used the following convention: class 1 was the BD group, with labels +1, and class 2 was the control group, with labels −1. In the discrimination map, positive coefficients indicate voxels with a predictive value for BD (class 1; visualized in red color scale) and negative coefficients indicate voxels with a predictive value for controls (class 2; visualized in blue color scale); by definition, a voxel with negative predictive weight for patients has the same positive predictive weight for controls.
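The sign convention can be illustrated with a hypothetical two-voxel example. The sketch assumes a linear decision function, so that the score for an image is a weighted sum over all voxels; the weights, voxel values and function names are invented for illustration and do not come from the study.

```python
def decision_score(weights, voxels, bias=0.0):
    """Linear score: positive favours class +1 (BD), negative favours class -1 (control)."""
    return bias + sum(w * v for w, v in zip(weights, voxels))


def map_colour(weight):
    """Colour scale used for a voxel in the discrimination map."""
    if weight > 0:
        return "red"   # predictive value for BD (class +1)
    if weight < 0:
        return "blue"  # predictive value for controls (class -1)
    return "none"


weights = [0.8, -0.5]  # hypothetical two-voxel discrimination map
image = [1.0, 0.4]     # hypothetical voxel values for one subject

score = decision_score(weights, image)
print([map_colour(w) for w in weights], score > 0)

# Swapping the class labels negates every weight, and with it the score.
flipped = [-w for w in weights]
print(decision_score(flipped, image) == -score)
```

Because the score is summed over all voxels, no single weight determines the prediction, which is why the map is read as a distributed pattern rather than as a set of focal effects.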
Notes to Tables 2–5: BA, Brodmann area; R, right; L, left. Coordinates are shown in Montreal Neurological Institute (MNI) standard space; x = sagittal, y = coronal, z = axial. a Number of voxels. b Highest weights within individual clusters.
There is significant overlap between the cohorts. In terms of GM, regions within the frontopolar and ventral prefrontal cortex, the parietal lobules, the middle/superior temporal gyri, the lingual gyrus and cuneus and within the thalamus and cerebellum emerge as being implicated most consistently in the diagnosis of BD. A similar conclusion can be drawn for WM tracts traversing ventral prefrontal regions, parietal and postcentral regions, the middle occipital gyrus and the cuneus, the cingulum and genu of the corpus callosum. Discriminative regions were more extensive in cohort 2, which consisted of younger and less medicated patients than cohort 1, suggesting that the results of the GPCs are not driven by medication or age.
VBM
At the threshold of p < 0.0001 uncorrected, GM and WM volumetric differences between patients and controls were noted in multiple brain regions in both cohorts. Details of the regional maxima are provided in Supplementary Tables S1–S4. As discussed, the outputs of the GPC and VBM analyses are not directly comparable as the former reflects the predictive value of voxels in discriminating between patients and controls whereas the latter represents the mean differences between patients and controls. Nevertheless, there is significant overlap between the two outputs and between cohorts. An effect of medication was identified only in cohort 1, with cumulative lithium exposure being positively associated with the right anterior cingulate GM volume (x = 5.2, y = 41.1, z = 4, cluster size = 85). No correlations with medication dose were noted in cohort 2.
Discussion
To our knowledge, this is the first study to evaluate the feasibility of using pattern recognition algorithms for the automatic classification of sMRI data of patients with BD and healthy controls. We found that GPCs applied to GM and WM images reliably achieved above-chance discriminative power solely on the basis of anatomical data, with classification accuracy ranging between 69% and 78%.
Prediction accuracy of GPCs applied to sMRI in BD
Neuroanatomical studies using conventional analyses have established the presence of morphological changes in BD (Kempton et al. 2008; Arnone et al. 2009; Vita et al. 2009; Bora et al. 2010; Ellison-Wright & Bullmore, 2010; Kempton et al. 2011; Selvaraj et al. 2012). However, these findings have had limited translational application primarily for three reasons: (a) there is considerable between-group overlap in brain morphological variables derived from group-level neuroimaging analyses (Kempton et al. 2008, 2009, 2011), (b) voxel-based analysis methods are significantly biased toward detecting group differences that are highly localized in space but are limited in detecting group differences that are spatially distributed and subtle (Davatzikos, 2004), and (c) voxel-based analyses do not lend themselves to making predictions at the level of individual subjects.
The data presented here demonstrate that these limitations may be surmounted with the aid of multivariate pattern recognition techniques. The application of GPC analysis to anatomical scans in BD provided diagnostic accuracy in the range 69–78%. As with any new test, the accuracy of the GPC classification for BD was determined against ‘gold standard’ diagnostic assessments. In this study ‘true positive cases’ (i.e. patients with BD) were identified using the SCID-I, conducted by clinicians with expertise in mood disorders. The SCID-I is designed to elicit the presence or absence of the operational criteria that define the syndrome of BD itself and is therefore expected to have the highest diagnostic accuracy (Williams et al. 1992; Fennig et al. 1994; Segal et al. 1994). A more appropriate comparison would be with behavior-based case-finding instruments, whose sensitivity and specificity are about 70%, such as the Mood Disorder Questionnaire (MDQ; Hirschfeld et al. 2000; Hirschfeld, 2010; Zimmerman et al. 2011). However, even more important is the comparison of our results to ‘real world’ clinical assessments where BD is either missed or misdiagnosed, resulting in nearly a third of patients having to wait for approximately 10 years before they receive an accurate diagnosis (Lish et al. 1994; Hirschfeld et al. 2003; Berk et al. 2007). This delay reflects the substantial overlap between clinical symptoms of BD and those of other disorders, particularly major depressive disorder (MDD), as depressive symptoms are commonly present at onset (Perugi et al. 2000) and often dominate the clinical picture thereafter (Judd et al. 2002). Additionally, the presence of psychosis during manic or depressive episodes often leads to difficulties in distinguishing BD from schizophrenia and schizo-affective disorder (Schimmelmann et al. 2005). Further diagnostic challenges arise from the high level of co-morbidity of BD with other disorders, particularly substance abuse and anxiety disorders (McElroy et al. 2001; Merikangas et al. 2011). In this context, a classifier that is trained to identify true-positive BD cases might have an important role in assisting clinicians when used in combination with other clinical measures.
The results presented here for the predictive value of sMRI data in BD compare favorably with classification accuracies of approximately 80% reported for Alzheimer's disease and schizophrenia, even though the magnitude of neuroanatomical deviance is greater for these disorders (reviewed by Klöppel et al. 2012).
Brain regions discriminating patients with BD from controls
The GM and WM discriminative maps generated by the GPCs show that clusters contributing to the distinction between BD patients and healthy controls are spatially distributed within cortical and subcortical regions (Tables 2–5). GM discriminative clusters consistently associated with BD in both cohorts were localized primarily within the frontopolar and ventral prefrontal cortex, the inferior parietal lobule, the medial and lateral temporal cortex, the cingulate cortex, occipital regions in the lingual gyrus and cuneus, the thalamus and cerebellum. This is in keeping with previous morphometric studies that have repeatedly shown an association between volumetric changes in these regions and disease expression for BD (Kempton et al. 2008, 2009, 2011; Scherk et al. 2008; Arnone et al. 2009; Yu et al. 2010; Hallahan et al. 2011). Previous research on global (Scherk et al. 2008; Vita et al. 2009) and regional (McIntosh et al. 2005; Stanfield et al. 2009) WM volume changes in BD yielded variable results. However, there is increasing consensus for an association between disease expression for BD and WM pathology within the cingulum (Vederine et al. 2011) and the genu of the corpus callosum (Bellani et al. 2009; Walterfang et al. 2009a, b; Bearden et al. 2011). Our findings in both cohorts suggest that WM regions of predictive value for BD patients are widespread but consistently include the cingulum and genu. Although the results of the VBM and GPC analyses cannot be compared directly, they showed significant overlap in terms of the spatial distribution of regions influenced by the diagnosis of BD.
Methodological considerations and future directions
In cohort 1, the majority of BD patients were medicated with antipsychotic or mood-stabilizing medication or both. Treatment with lithium has been associated with volumetric changes in BD, and specifically increases in global and regional volumes (Bearden et al. 2007; Kempton et al. 2008, 2009; Phillips et al. 2008; Germana et al. 2010; van Erp et al. 2012). In line with this, we observed a positive association between lithium dose and GM volume in the anterior cingulate in cohort 1. However, medication effects related to lithium or antipsychotics cannot fully explain the results because patients in cohort 2 were not on these medications.
The GPCs were trained to segregate healthy controls from patients with BD. This represents the necessary first step in developing pattern recognition approaches for use as neurodiagnostic tools. Future studies are required to replicate these findings in larger samples and across different sites. Another important task is to evaluate the performance of pattern recognition classifiers for the identification of biologically meaningful subtypes of BD and for the differential diagnosis of BD from disorders with overlapping clinical phenotypes.
In summary, our results demonstrate that GPC-based neuroanatomical pattern recognition techniques may prove clinically useful in improving the timely diagnosis of BD, which currently relies entirely on clinical symptoms.
Supplementary material
For supplementary material accompanying this paper visit http://dx.doi.org/10.1017/S0033291713001013.
Acknowledgments
This work was partially supported by the European College of Neuropsychopharmacology, Networks Initiative, Neuroimaging Network, which had no further role in the study design, in the collection, analysis and interpretation of data, in the writing of the manuscript, and in the decision to submit it for publication. A.M. gratefully acknowledges support from the King's College London Centre of Excellence in Medical Engineering, funded by the Wellcome Trust and the Engineering and Physical Sciences Research Council (EPSRC) under grant no. WT088641/Z/09/Z. J.M.M. was funded by a Wellcome Trust Career Development Fellowship under grant no. WT086565/Z/08/Z. V.R.R. and A.S. were supported by the National Institute for Health Research (NIHR) Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Foundation Trust and Institute of Psychiatry, King's College London.
Declaration of Interest
None.