Introduction
Prevention, based on a person-centred approach, is a core task in family medicine and often involves a complex problem-solving approach (Sweeney et al., 1998; Van Weel and Knottnerus, 1999). As evidence-based medicine draws primarily on randomised controlled trials investigating specific questions about efficacy in narrowly defined population groups, it does not easily guide this area of decision-making. The challenge is therefore to develop research protocols that can address this task adequately (Rosser, 1999; Van Weel and Rosser, 2004). Mathematical models, by providing prognostic factors specific to a target outcome or a patient group, might be helpful in this respect (Sweeney et al., 1998; Campbell, 2006). Decisions on the choice of parameters for modelling involve researchers taking responsibility for knowledge collection and integration. However, for many problems in everyday primary care there are no clearly elaborated theories to guide this (Rosser, 1999; Van Weel and Rosser, 2004). This difficulty is even more apparent if one takes into account variation in local working environments, as well as the specific characteristics of local population groups. Using the example of influenza vaccination outcome, we suggest the use of a systems biology methodology, understood as involving both a step-wise research protocol, which allows the research to start from a poorly proved theory, and the use of a systematic record of relevant health parameters.
During recent decades, efforts have focused on defining markers that can identify individuals who are likely to respond poorly to influenza vaccine. This seemed important because of the conflicting results of published reports on the immune response to influenza vaccine in elderly persons (Webster, 2000). There has been uncertainty as to whether differences in health status, or simply in the age of those vaccinated, were responsible for the differences in immune responses between elderly and younger people observed in some, but not all, studies. In addition, differences in vaccination status and/or in past infections with influenza viruses between these two age groups have been proposed as potential explanations for the differential response. Now that rapid progress in biotechnology is likely to provide alternative vaccination approaches, it is even more important to answer this question (Tosh and Poland, 2008). However, the single-disease oriented reductionist research methods currently in use are unable to cope with the multifactorial nature of this task (Brydak and Machala, 2000). Attempts to investigate the association of potentially relevant factors with post-vaccination antibody responses using multivariate analyses are scarce (Remarque et al., 1996). A critical point in modelling is the choice of parameters used as the input, but in modelling influenza vaccination outcomes, a major difficulty is the wide range of factors related to chronic ageing diseases (Ligthart et al., 1984; Brydak and Machala, 2000).
In relation to this, the theoretical background is limited, as immunoregulatory disorders that might account for the deficient immune response to influenza vaccine observed in chronically ill elderly patients have not yet been found (Gross et al., 1989; Castle, 2000). It has been realised, for example, that differences in stages of a disease, comorbidity, lifestyle factors, or particular biochemical disorders can all contribute to the variation in immune response to influenza vaccine (Wick and Grubeck-Loebenstein, 1997; Brydak and Machala, 2000). To deal with the complexity of this task, we turned to the concept of systems biology, originally applied to analyse high-dimensional, non-linear data provided by new sophisticated diagnostic methods, such as genomics and proteomics (Larranaga et al., 2005). Theoretically, this concept derives from the science of complexity.
The science of complexity and a systems biology
According to the science of complexity, biological organisms, including humans and their diseases, must be studied as complex systems if their functioning is to be truly understood (Goldberger, 1996). In a complex system, components respond to the environment by using internalised sets of rules that drive the action of the system. In other words, the behaviour of a complex system emerges as an effect of physiological networks.
Whereas the science of complexity states that biological systems are complex, systems biology can be defined as the quantitative analysis of how components in a network interact with each other to produce a function, or a phenotype (Kitano, 2002; Iris, 2008). We cannot predict the behaviour of a complex system with certainty; however, we can draw inferences by mathematical modelling. Although, in mathematical terms, a complex system can be described by a range of numerical parameters, a working mathematical model may not need to include all possible parameters, as only a few of them are likely to control the outcomes of the system.
Because it rests on a multitude of poorly proved parameters, a systems biology approach is not strongly hypothesis-driven, unlike the classical, reductionist approach, in which only a few recognisable parameters can be evaluated. Instead, it is based on the use of a research protocol (Figure 1). Background information for the input is drawn from the literature. The model is created on the basis of this information and tested by experiment or by computer-based simulation (a model-building approach). The results are, in turn, used to correct the model.
An example – the case study on prediction of influenza vaccination outcomes
We applied a systems biology approach to identify health parameters to use in models to predict responses to influenza vaccine. This aimed to support decision-making about vaccination strategies and provide insight into the value of adopting a systems biology approach in exploring complex health problems.
Methods
Population
The examined sample consisted of 93 volunteers, 35 male and 58 female, aged 50–89 years (median 69), out of 150 patients vaccinated against influenza in the 2003–04 season in a family practice located in the town of Osijek, Croatia, in a region with a high prevalence of chronic diseases. The sample was drawn from the population at high risk of influenza complications who often require vaccination, consisting of older primary health care attenders with multiple medical conditions (Centers for Disease Control and Prevention, 2007). The study protocol was approved by the local ethics committee.
Influenza vaccination
Trivalent inactivated split vaccine containing A/H1N1/New Caledonia/20/99-like, A/H3N2/Moscow/10/99-like and B/Hong Kong-330/2001-like influenza virus strains, recommended for that season, was used. In addition, the type B vaccine component was tested for heterologous reaction against the virus antigen B/Sicuan 379/99, which had been included in vaccines in the recent past (Pyhala et al., 1994). To measure vaccination outcomes, specific antibody production was determined by the haemagglutination inhibition (HI) test; a positive response was defined as an at least fourfold increase in antibody titre. Serological measurements were performed in the Department of Virology of the Croatian Public Health Institute, Zagreb.
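The fourfold criterion described above can be illustrated with a short sketch. The function name and the example titres are hypothetical and are not taken from the study data; the sketch only assumes that titres are recorded as positive reciprocal dilutions.

```python
def is_positive_hi_response(pre_titre: float, post_titre: float,
                            min_fold: float = 4.0) -> bool:
    """Return True when the post-vaccination titre is at least
    `min_fold` times the pre-vaccination titre."""
    if pre_titre <= 0 or post_titre <= 0:
        raise ValueError("titres must be positive reciprocal dilutions")
    return post_titre / pre_titre >= min_fold


# Hypothetical reciprocal HI titres, for illustration only.
print(is_positive_hi_response(10, 40))  # exactly fourfold: True
print(is_positive_hi_response(20, 40))  # only twofold: False
```

The threshold is kept as a parameter so that the same check could be reused if a different fold increase were chosen as the criterion.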
The study design – a step-wise research protocol
We used a three-step research protocol (Figure 1). In the first step, we searched the MEDLINE/PubMed database and also screened references to find out how chronic ageing diseases and age-related pathogenetic perturbations alter immune system functions. Our assumption was based on novel theories of ageing stating that the mechanisms of ageing and age-related diseases can be better understood if viewed as networked reactions integrating all levels of bodily organisation, from the molecular and cellular to the personal (systems) level. In particular, this suggests that the diverse expression of ageing phenotypes can be explained through the dynamic interplay between three main control systems (insulin-dependent metabolism, the neuroendocrine system, and the immune system), with chronic inflammation acting as an intermediate mechanism linking changes in these systems together (Franceschi et al., 2000).
On the basis of this background information, in the second step of the protocol, we collected data on various health-related parameters indicating inflammation, nutritional, metabolic and neuroendocrine status, chronic renal impairment, latent infections, and humoral immunity (Pozzetto et al., 1993; Wick and Grubeck-Loebenstein, 1997; Schroecksnadel et al., 2002; Sipponen et al., 2003; Trzonkowski et al., 2003; Clarke et al., 2004). Owing to the large number of input parameters, it was necessary to reduce this number before starting to build a prediction model. Non-linear data mining algorithms were used for this purpose, resulting in a limited pool of selected parameters that are potential predictors of antibody responses to influenza vaccine (Figure 1).
To reach the final aim of our research, to develop a statistical model that may accurately predict responses to influenza vaccine, in the third step of the protocol, we combined previously selected candidate health parameters with information on past influenza exposure, using logistic regression to build a prediction model (Figure 1).
A data set – systematic health data record
We determined the health status of examined patients systematically, using 52 clinical parameters indicating age and sex, diagnoses of the main groups of chronic diseases (Table 1), anthropometric measures (indicating the nutritional status; Table 2) and haematological, and biochemical tests (Table 3). In order to be chosen, laboratory tests had to meet two criteria: to reflect the age-related health disorders found to have a negative impact on the immune system functions, and to be available in routine primary health care.
OGTT = Oral Glucose Tolerance Test; DEXA = dual-energy-X-ray-absorptiometry, the standard to diagnose osteoporosis (in 10 cases the data were missing); MMSE = Mini Mental State Examination Score, standard screening test on cognitive impairment (maximum score 30, <24 indicates positive on dementia).
WBC = white blood cell; CRP = C-reactive protein; RBC = red blood cell; MCV = mean cell volume; HbA1c = glycosylated haemoglobin; HDL = high-density lipoprotein; ANA = antinuclear antibodies; TSH = thyroid-stimulating hormone; fT3 = free triiodothyronine; fT4 = free thyroxine.
a Laboratory tests were performed using standard techniques. Descriptive statistics are not shown, as they are of minor relevance to the topic.
Blood sampling
From each patient, blood samples were collected twice, before and four weeks after vaccination. Serum samples for HI antibodies were separated on each occasion and kept at −40°C until analysed. At first blood sampling, specimens were also obtained for laboratory tests. Haematological analyses were carried out from fresh blood samples while sera for biochemical analyses were separated and stored until assayed. Laboratory analyses were performed in the Central Biochemical Laboratory of the Osijek Clinical Hospital, using standard techniques.
Data analysis
Selection of health parameters
On the prepared data set, we applied data mining algorithms based on Machine Learning methods to identify age-related health disorders with the potentially largest negative impact on serological responses to influenza vaccine. This was the key step in the research protocol, resulting in the selection of relevant health parameters (Figure 1).
Data mining is a group of robust computational techniques able to extract and interpret information content (patterns) from massive biomedical databases (Witten and Frank, 2005). In this study, we used algorithms of the ILLM (Inductive Learning by Logic Minimization) system, developed in the Laboratory for Information Systems, Institute Ruđer Bošković, Zagreb, because of the availability and the good classification and pattern recognition properties of this method (Gamberger and Šmuc, 2001; Gamberger et al., 2003). Applying the ILLM algorithms to the prepared data set yields a cluster of six parameters most strongly associated with the target outcome value, with the first parameter on the list ranking as most important. The statistical measures ‘sensitivity’ (the proportion of truly positive cases that the classification procedure identifies as positive) and ‘specificity’ (the proportion of truly negative cases that the classification procedure identifies as negative) are used to express the statistical properties of the parameters selected in the cluster (Gamberger and Šmuc, 2001; Gamberger et al., 2003). As no unique definition of the target outcome value (low antibody response to influenza vaccine) is possible, because influenza vaccines are trivalent and factors related to past influenza virus exposure strongly affect vaccination outcomes, we set up the maximum number of reasonable definitions (four; not presented), allowing the selection of four sets of health parameters from the initial data set. In making these definitions, we tried to exclude as far as possible the influence of factors related to past influenza virus exposure, so as to allow health-related parameters to show their full effect.
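As a reminder of how the two measures named above are obtained from a classification result, the following minimal sketch computes them from paired lists of true and predicted labels. It is not the ILLM implementation; the function name and the toy labels are hypothetical and serve only to illustrate the standard definitions.

```python
def sensitivity_specificity(actual, predicted):
    """Compute (sensitivity, specificity) for binary labels,
    where 1 marks the target class (e.g. a low antibody response)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in `actual`")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels, invented for illustration (not study data).
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
print(sens)  # 3 of 4 target cases found: 0.75
print(spec)  # 4 of 6 non-target cases correctly rejected
```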
Generating prediction models
For binary outcomes, such as, in this study, good or poor antibody induction after influenza vaccination, the classical approach is to develop prognostic logistic regression models (Campbell, 2006). We performed a full-model regression, with all parameters included, and two reduced forms, with forward and backward parameter selection, using STATISTICA (data analysis software system), version 8.0 (StatSoft, Inc., 2008; www.statsoft.com).
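For orientation, the general textbook form of a binary logistic regression model is given below. This is not a reproduction of the fitted model; the symbols $Y$ (the binary outcome as coded in the analysis), $x_j$ (the predictors), $\beta_j$ (the regression coefficients) and $k$ (the number of predictors) are notation introduced here for illustration.

```latex
\[
\ln\frac{p}{1-p} \;=\; \beta_0 + \sum_{j=1}^{k} \beta_j x_j ,
\qquad p = P(Y = 1 \mid x_1, \dots, x_k),
\qquad \mathrm{OR}_j = e^{\beta_j}
\]
```

Under this formulation, the odds ratio for predictor $x_j$ describes the multiplicative change in the odds of the outcome for a one-unit increase in $x_j$, with the other predictors held constant.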
A negative outcome for the model (poor response to influenza vaccine) was defined by a positive result of HI test for only one, or none, of three vaccine components (A/H1N1, A/H3N2, and B), whereas a positive outcome (good vaccination response) was defined by positive results of HI tests for two, or all three vaccine components. The model was based on 56 eligible patients.
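The outcome definition above can be expressed as a simple rule over the three per-component HI results. The sketch below is illustrative only; the function and argument names are hypothetical.

```python
def vaccination_outcome(h1n1_positive: bool, h3n2_positive: bool,
                        b_positive: bool) -> str:
    """Classify the vaccination response from the HI results for the
    three vaccine components: two or three positives count as 'good',
    one or none as 'poor'."""
    n_positive = sum([h1n1_positive, h3n2_positive, b_positive])
    return "good" if n_positive >= 2 else "poor"


print(vaccination_outcome(True, True, False))   # two positive components: good
print(vaccination_outcome(False, True, False))  # one positive component: poor
```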
In the model, candidate health parameters, which had been selected from the data set using data mining methods, were combined with information on past influenza virus exposure, including the number of past vaccinations, pre-existing antibody titres and heterologous reaction, indicated by the vaccine component B/Sicuan. The parameter ‘the number of past vaccinations’ was expressed as categories: vaccinated for the first time (n = 37), previously vaccinated once (n = 19), previously vaccinated two to three times (n = 13), and four or more previous vaccinations (n = 24).
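The categorical coding of past vaccinations can be sketched as follows. The function name and category labels are hypothetical; the group sizes are those reported above, and their sum is checked against the sample size of 93.

```python
def past_vaccination_category(n_past: int) -> str:
    """Map the number of previous influenza vaccinations to the
    four categories used for the model parameter."""
    if n_past < 0:
        raise ValueError("number of past vaccinations cannot be negative")
    if n_past == 0:
        return "first vaccination"
    if n_past == 1:
        return "once"
    if n_past <= 3:
        return "two to three times"
    return "four or more times"


# Group sizes as reported in the text.
group_sizes = {
    "first vaccination": 37,
    "once": 19,
    "two to three times": 13,
    "four or more times": 24,
}
print(sum(group_sizes.values()))  # 93, the size of the examined sample
```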
Results
Data mining models
By applying ILLM algorithms to the prepared database, four recognisable patterns (clusters) in the data associated with low serological response to influenza vaccination were identified (Table 4). Owing to the partial overlap among the patterns, the initial 24 selected parameters were further reduced to 16 parameters (Table 5).
fT4 = free thyroxine; MCV = mean cell volume; TSH = thyroid-stimulating hormone.
a Attribute ranking: strength of association with a poor response to vaccination.
Within this pool of 16 selected parameters, four of them, ranking best in a cluster, or according to the statistical measures ‘sensitivity’ and ‘specificity’, including ‘monocyte %’, ‘lymphocyte %’, ‘vitamin B12’, and ‘homocysteine’, can be especially important (Table 4). On the basis of the existing knowledge, these four parameters are likely to indicate two pairs of disorders, including increased percent of monocytes and decreased percent of lymphocytes in white blood cell (WBC) differential (indicating the switch from the specific to nonspecific immune reaction), and mutually related metabolic disorders, vitamin B12 deficiency, and hyperhomocysteinaemia.
Logistic regression models
The three types of logistic regression model correctly predicted 76.9% (full model) and 75.8% (both the forward and backward types) of responses to influenza vaccine (P = 0.00, 0.04 and 0.00, respectively; Table 6).
a MCV = mean cell volume; TSH = thyroid-stimulating hormone; fT4 = free thyroxine.
Among the parameters included in the model, the one indicating older age was not selected as an independent predictor (Table 6). Factors related to past influenza virus exposure showed the greatest influence, especially the number of past vaccinations (Table 6). In particular, two to three past vaccinations are likely to have a beneficial effect (OR 0.06, 95% CI 0.00–0.63; Table 6, full model). Immune reaction to an influenza vaccine component from the recent past (heterologous reaction), as in this case study with the B/Sicuan influenza vaccine component, also showed a significant, albeit negative, effect (OR 1.05, 95% CI 1.00–1.10; Table 6, full model). In contrast to the factors related to past influenza exposure, parameters indicating chronic health disorders showed only a minor (negative) effect (Table 6, full model). The results of the forward and backward model types showed that only a few health parameters are sufficient for prediction, including relative lymphopaenia (decreased percent of lymphocytes in WBC differential; OR 0.94, 95% CI 0.88–0.99), vitamin B12 deficiency (OR 0.99, 95% CI 0.99–1.00), and hyperhomocysteinaemia (OR 1.15, 95% CI 0.99–1.32; Table 6).
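Odds ratios close to 1 for continuous parameters can still correspond to sizeable effects over a clinically relevant range. The sketch below assumes, as is usual for logistic regression but not stated explicitly in the text, that each odds ratio refers to a one-unit change in the parameter; the chosen ranges (10 percentage points, 5 units) and the function name are hypothetical.

```python
import math


def scaled_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units` in a predictor, given the
    odds ratio for a one-unit change (assumes a linear logit)."""
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.exp(units * math.log(or_per_unit))


# OR 0.94 per unit of lymphocyte % (reported value), over 10 units.
print(round(scaled_odds_ratio(0.94, 10), 2))  # 0.54
# OR 1.15 per unit of homocysteine (reported value), over 5 units.
print(round(scaled_odds_ratio(1.15, 5), 2))   # 2.01
```

Read this way, a per-unit odds ratio of 0.94 would correspond to roughly a halving of the odds over a 10-unit difference, which may help to explain why parameters with odds ratios near 1 were nevertheless retained in the reduced models.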
Discussion
Modelling responses to influenza vaccine
In this study, we have shown that by using a systems biology methodology it is possible to identify health parameters that can be used to build useful models to predict responses to influenza vaccine. Good model performance, including a high likelihood level for prediction (indicated by a significant P-value) and good predictive accuracy (76.9%), supports its practical usefulness, although it was obtained on a small sample (Table 6; Campbell, 2006). In addition, the narrow confidence intervals of the health parameters are of the kind expected from a large rather than a small sample (Table 6; Campbell, 2006). This characteristic may be due to the health parameters having been pre-selected using Machine Learning methods. However, there are also some limitations in terms of the model's applicability. As the patients in the sample were recruited from a local area in Croatia where a high prevalence of chronic diseases has been recorded, the model is best applied in this local population group. To be valid as a practical screening tool in other settings, the model should be retested with a larger sample, relevant to the settings in which it might be used.
Our results indicate that factors related to past influenza exposure are preferable for prediction, compared with factors related to chronic health disorders (Table 6). The reason the parameter ‘older age’ remained unselected could be that it acts in logistic regression as a confounder relative to the two other factors, past influenza exposure and chronic health disorders, both known to be age-dependent (Webster, 2000). From a practical point of view, our results also indicate that only a few health parameters, such as those indicating B-vitamin deficiency, hyperhomocysteinaemia, and relative lymphopaenia, are needed for the model to achieve a reasonable level of prediction (Table 6, forward and backward model types). Parameters indicating metabolic disorders, B-vitamin deficiency, and hyperhomocysteinaemia, although not proved to be causally related to poor antibody responses to influenza vaccine, are likely to provide common mechanisms linking the burden of chronic ageing diseases with lymphopaenia and other age-related immune system dysfunctions.
There are two lines of evidence supporting this assumption. The first, derived from the literature, is that these parameters can serve as markers of decreased turnover of immunocompetent cells and of the switch from the specific to the non-specific and cellular immune response (in our results indicated by increased percent of monocytes and decreased percent of lymphocytes in the differential WBC; Fenech et al., 1997). In a broader context, these changes can be considered as markers of an impaired methylation reaction, a biochemical process that, when impaired, can manifest as DNA damage, genome instability, impaired cell proliferation and insufficient neurotransmitter synthesis, which are intermediate mechanisms in the development of chronic ageing diseases (Schroecksnadel et al., 2002).
The second line of evidence, suggesting there may be a causal relationship between key health parameters and the vaccination response that the model predicts, arises from the theoretical background of the complex systems science on which the study is based. Accordingly, the identified key components may reflect the functional integration of the elements within a common biological network (in this case, linking the burden of chronic ageing diseases with immune system dysfunction; Kitano, 2002; Iris, 2008). The remainder of the pool of 16 selected parameters may also be useful, although in other vaccination situations. However, this has yet to be proved by applying these results in different vaccination situations and to different samples of the defined high-risk population for influenza complications.
Those selected parameters that overlap between two or more data mining models are likely to indicate common intermediate mechanisms linking chronic diseases with immune system dysfunctions (Table 5). Parameters specifically selected in particular models are likely to indicate more specific, relatively well defined clinical conditions (clinical domains), very likely associated with poor responsiveness to influenza vaccine. On the basis of existing knowledge, these clinical conditions may be linked with impaired renal function, especially a syndrome characterised by hyperhomocysteinaemia (model 1); chronic gastritis caused by Helicobacter pylori infection and accompanied by a chronic nonspecific immune reaction (model 2); a syndrome composed of glucose metabolism impairment and protein malnutrition (model 3); and ageing of the hypothalamus and the pituitary gland, accompanied by neuroendocrine system dysfunction (model 4; Table 5). Although laboratory tests are preferred over diagnoses of chronic diseases, these results, implicating clusters of pathogenetic disorders, can provide physicians with the information needed for initial screening of older patients who are potentially at higher risk of deficient responses to influenza vaccine.
Implications of implementing a systems biology methodology approach in research in family medicine
Risk charts and scores have been developed to assess the risk for cardiovascular events. The major risk factors were identified some time ago in large prospective cohort studies, but as evidence accumulates there is a tendency to add new risk factors into revised risk scores (Cooper et al., 2005). Another emerging field, also using simple clinical parameters, is the early detection of type 2 diabetes (Rahman et al., 2008). It remains a challenge, however, to know whether a uniform, generally applicable risk assessment tool can be developed, as risk functions depend on the characteristics of the studied population (The European Society of Cardiology and The European Association for the Study of Diabetes Guidelines, 2007). This is because each population may have a different distribution of risk factors and the same risk factors may not have the same effect in determining diseases in different population groups. In addition, there are changes in trends over time, as well as the accumulation of new knowledge, which suggest there is a need for a more dynamic and adaptable framework to prepare effective risk scores (Majnarić-Trtica et al., 2007; 2010a). Efforts to develop simple, practical risk scores in some other clinical disciplines, such as prognosis of particular types of cancer, dementia, or influenza vaccination outcomes, face even more difficulties, since for these medical problems simple clinical and demographic prognostic markers have not yet been defined (Jellema et al., 2010; Landau et al., 2010).
On the basis of the results of this study, we propose that a systems biology methodology, understood as both a systematic health parameters record and a step-wise research protocol that allows research to begin from a poorly proved theory, might be a feasible research framework capable of resolving many uncertainties when planning preventive interventions in family medicine. This is made possible by recent advances in information technology such as Machine Learning. In addition, because the modelling process can be performed separately for each problem or data set, research can be specifically tailored to fit the needs of the local environment.
Subsequently, over time, a common health data set, appropriate for modelling many medical problems and reflecting the specific features of local population groups, can be selected. This is likely to be possible because of the characteristics of chronic ageing diseases, such as an overlap between risk factors and clinical expression of related diseases, shared pathogenetic mechanisms, and the tendency of these diseases to appear in clusters (several diseases and morbid conditions occurring in the same person; Buchanan et al., 2006). If information from other sources, such as family history, socio-economic status, local environment, occupation, or specific genetic traits, is added to the basic health database, various comprehensive conclusions, based on the modelling, can be drawn (Griffiths, 1998; Majnarić-Trtica et al., 2009). This may substantially facilitate the implementation of preventive programmes.
The increasing complexity of health care implies a need for new research methods (Van Weel and Knottnerus, 1999; Plsek and Greenhalgh, 2001). Decision-making relies on multiple factors, some of which are as yet undefined, suggesting that a systems biology methodology may prove useful. This is particularly appropriate in family medicine, where electronic health records provide the opportunity for data collection and integration using advanced computer-based techniques (Majnarić-Trtica et al., 2010b). We propose that a limited number of practices should focus on research and model construction, with computer programmers and mathematicians working as part of research teams, whereas the remaining practices can be the places where models are tested for their utility in everyday practice.
An ongoing process of model construction and application
Systems biology can be considered as a cyclic, long-term and ongoing research protocol (Iris, 2008; Figure 2). Construction of models, especially for problems lacking evidence, as in this case with influenza vaccination outcome, can only be the first, exploratory phase of research, in which parameters potentially relevant to the problem are mapped in the large, unknown input space. During the second, training phase of the protocol, the challenge is to repeat the same procedure of selecting parameters on other samples, to find out whether the same factors are extracted when the sample is changed. The next phase of the protocol is testing the feasibility of models in terms of their predictive accuracy, cost-effectiveness and the availability of the parameters that make up the model.
The practical usefulness of constructed models, when applied to programmes such as patient selection for immunisation, should be assessed through an ongoing process of real-life model application, by means of observational studies (Hannoun et al., 2004). During this process, models and parameters can be further corrected (Figure 2).
Adopting a systems biology approach in research in family medicine can generally be expected to be useful in identifying the most appropriate target groups for the implementation of preventive interventions.
Acknowledgement
We would like to express our thanks to Dr. Dragan Gamberger, Head of the Laboratory for Information Systems, Institute Ruđer Bošković, Zagreb, for his efforts and the time spent in building the data mining models necessary for data processing.