Chapter 5 is dedicated to the most important part of predictive modeling for biomarker discovery based on high-dimensional data: multivariate feature selection. When dealing with sparse biomedical data whose dimensionality is much higher than the number of training observations, the crucial issue is to overcome the curse of dimensionality by using methods capable of separating signal (predictive information) from the overwhelming noise. One way of doing this is to perform many (hundreds or thousands of) parallel feature selection experiments on different random subsamples of the original training data and then aggregate their results (for example, by analyzing the distribution of variables among the results of those parallel experiments). Two designs of such parallel feature selection experiments are discussed in detail: one based on recursive feature elimination, and the other on stepwise hybrid selection with T2. The chapter also includes descriptions of three evolutionary feature selection algorithms: simulated annealing, genetic algorithms, and particle swarm optimization.
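The subsample-and-aggregate design described above can be sketched roughly as follows. This is a minimal illustration, not the chapter's implementation: the function names (`rfe_ridge`, `selection_frequency`), the ridge-regularised linear scorer used inside recursive feature elimination, the subsample fraction, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def rfe_ridge(X, y, n_keep, ridge=1.0):
    """Recursive feature elimination with a ridge-regularised linear scorer.

    Refits w = (X'X + ridge*I)^-1 X'y on the surviving features and drops
    the feature with the smallest |w| until n_keep features remain.
    """
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        Xa = X[:, active]
        gram = Xa.T @ Xa + ridge * np.eye(len(active))
        w = np.linalg.solve(gram, Xa.T @ y)
        active.pop(int(np.argmin(np.abs(w))))
    return active

def selection_frequency(X, y, n_experiments=200, subsample=0.7, n_keep=3, seed=0):
    """Run RFE on many random subsamples; return how often each feature survives."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(subsample * n)
    counts = np.zeros(p)
    for _ in range(n_experiments):
        idx = rng.choice(n, size=m, replace=False)   # subsample without replacement
        Xs = X[idx]
        # Standardise within the subsample so that weights are comparable.
        Xs = (Xs - Xs.mean(axis=0)) / (Xs.std(axis=0) + 1e-12)
        ys = y[idx] - y[idx].mean()
        for j in rfe_ridge(Xs, ys, n_keep):
            counts[j] += 1
    return counts / n_experiments

# Synthetic data (hypothetical): 40 observations, 60 features,
# with only features 0 and 1 carrying class information.
rng = np.random.default_rng(42)
y = np.repeat([-1.0, 1.0], 20)
X = rng.normal(size=(40, 60))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y

freq = selection_frequency(X, y)
top = np.argsort(freq)[::-1][:5]
print("most frequently selected features:", top)
print("their selection frequencies:", freq[top])
```

Features that survive in a large share of the experiments are candidates for a stable multivariate biomarker, whereas features selected only occasionally are more likely to reflect noise in a particular subsample.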
Chapter 17 describes the second real-life study, whose goal is the identification of multivariate biomarkers for liver cancer. This study implements parallel recursive feature elimination experiments coupled with random forests and support vector machines. Also included are considerations for rebalancing class proportions. Three multivariate biomarkers for liver cancer have been identified. The study was performed in an R environment, and R scripts for all of its steps are provided.
In addition to psychiatric symptoms, cognitive deficits are a core feature of schizophrenia. Brain network control theory provides information on the role of specific brain regions in the cognitive control process, helping to explain the neural mechanisms of cognitive impairment in schizophrenia.
Objectives
To characterize the control properties of the functional brain network in first-episode untreated patients with schizophrenia, to examine the relationships between controllability and psychiatric symptoms, and to explore the predictive value of controllability in differentiating patients from healthy controls (HCs).
Methods
Average and modal controllability of brain networks were calculated and compared between 133 first-episode untreated patients with schizophrenia and 135 HCs. The associations between controllability and clinical symptoms were evaluated using sparse canonical correlation analysis. Support vector machine (SVM) and SVM-recursive feature elimination were applied to the controllability measures to establish an individual-level prediction model.
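The abstract does not define the two controllability measures. As a rough sketch, the code below uses the definitions commonly adopted in network control theory for the linear discrete-time model x(t+1) = A x(t) + B u(t) on a symmetric connectivity matrix: average controllability of node i is the trace of the infinite-horizon controllability Gramian with input at node i, and modal controllability is the sum over eigenmodes of (1 - λ_j²) v_ij². That the study used exactly these definitions, the scaling by 1 + the largest singular value, and the function name `controllability` are assumptions.

```python
import numpy as np

def controllability(adj):
    """Average and modal controllability of each node in a symmetric network.

    Assumes linear dynamics x(t+1) = A x(t) + B u(t) with a single control
    input placed at one node at a time.
    """
    adj = np.asarray(adj, dtype=float)
    # Scale the matrix so all eigenvalues have magnitude below 1 (stable system).
    sigma_max = np.linalg.svd(adj, compute_uv=False)[0]
    A = adj / (1.0 + sigma_max)
    lam, V = np.linalg.eigh(A)      # A = V diag(lam) V^T for symmetric A
    V2 = V ** 2                     # V2[i, j]: squared loading of node i on mode j
    # Trace of the Gramian sum_t A^t e_i e_i^T A^t equals sum_j v_ij^2 / (1 - lam_j^2).
    average = V2 @ (1.0 / (1.0 - lam ** 2))
    # Modal controllability: sum_j (1 - lam_j^2) v_ij^2.
    modal = V2 @ (1.0 - lam ** 2)
    return average, modal

# Toy 3-node chain network (hypothetical connectivity values).
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
avg_ctrl, modal_ctrl = controllability(adj)
print("average controllability:", avg_ctrl)
print("modal controllability:  ", modal_ctrl)
```

Each subject then contributes one vector of regional controllability values, which is the kind of feature vector an SVM with recursive feature elimination could be trained on.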
Results
Compared to HCs, the patients with schizophrenia showed increased average controllability and decreased modal controllability in the dorsal anterior cingulate cortex (dACC). Brain controllability, predominantly in the somatomotor, default mode, and visual networks, was associated with the positive symptomatology of schizophrenia. The established model could identify patients with an accuracy of 0.68. Furthermore, the most discriminative features were located in the dACC, medial prefrontal lobe, precuneus, and superior temporal gyrus.
Conclusions
Altered controllability in the dACC may play a critical role in the neuropathological mechanisms of cognitive deficits in schizophrenia, which could drive the brain function to different states to cope with varied cognitive tasks. As a symptom-related biomarker, controllability could also be beneficial to individual prediction in schizophrenia.
This chapter presents a staple of Feature Engineering: the automatic reduction of features, either by direct selection or by projection to a smaller feature space. Central to Feature Engineering are efforts to reduce the number of features, as uninformative features bloat the ML model with unnecessary parameters. In turn, too many parameters either produce suboptimal results, as they are easy to overfit, or require large amounts of training data. These efforts proceed either by explicitly dropping certain features (feature selection) or by mapping the feature vector, if it is sparse, into a lower-dimensional, denser space (dimensionality reduction). The chapter also covers some algorithms that perform feature selection as part of their inner computation (embedded feature selection or regularization). Feature selection takes the spotlight within Feature Engineering due to its intrinsic utility for Error Analysis. Some techniques, such as feature ablation using wrapper methods, are used as the starting step before a feature drill-down. Moreover, as feature selection helps build understandable models, it intertwines with Error Analysis, as the analysis profits from such understandable models.
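The contrast between dropping features and projecting them can be illustrated with a small sketch. It is only one possible reading of the summary, not the chapter's own code: the least-squares model standing in for "the ML model", the helper names (`fit_predict`, `ablation_deltas`, `pca_project`), and the synthetic data are assumptions.

```python
import numpy as np

def fit_predict(X_tr, y_tr, X_te):
    """Ordinary least squares with an intercept (a stand-in for any ML model)."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def ablation_deltas(X_tr, y_tr, X_te, y_te):
    """Wrapper-style ablation: increase in held-out error when a feature is dropped."""
    base = mse(y_te, fit_predict(X_tr, y_tr, X_te))
    deltas = []
    for j in range(X_tr.shape[1]):
        keep = [k for k in range(X_tr.shape[1]) if k != j]
        err = mse(y_te, fit_predict(X_tr[:, keep], y_tr, X_te[:, keep]))
        deltas.append(err - base)
    return np.array(deltas)

def pca_project(X_tr, X_te, k):
    """Dimensionality reduction: project onto the top-k principal directions."""
    mu = X_tr.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_tr - mu, full_matrices=False)
    return (X_tr - mu) @ Vt[:k].T, (X_te - mu) @ Vt[:k].T

# Synthetic data (hypothetical): six features, only the first two are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

# Feature selection keeps original, interpretable columns.
deltas = ablation_deltas(X_tr, y_tr, X_te, y_te)
selected = np.argsort(deltas)[::-1][:2]
print("features whose removal hurts most:", selected)

# Projection builds new columns that mix all of the original features.
Z_tr, Z_te = pca_project(X_tr, X_te, k=2)
print("projected shapes:", Z_tr.shape, Z_te.shape)
```

The ablation deltas point back to named input columns, which is what makes selection useful as a starting point for error analysis; the projected columns are linear mixtures and lose that direct interpretation.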