There is substantial evidence linking dietary factors to the primary and secondary prevention of major chronic diseases such as heart disease, diabetes and certain cancers(Reference Brunner, Rees and Ward1–3), as well as the improvement of functions, for example muscle function or immune response(Reference Asp, Cummings and Mensink4, Reference Asp, Cummings and Howlett5). Thus, it is important for public health agencies and the food industry to be aware of these links and to provide messages and products that will facilitate the consumption of healthy diets by consumers. However, it is important that these messages and products are supported by good scientific evidence.
The aim of the present study is to provide guidelines for the design, conduct and reporting of human intervention studies. These guidelines should assist with studies designed to support nutrition science in a broad sense, as well as with studies that aim to substantiate health claims for foods. In the present study, the term ‘foods’ is used to mean foods, dietary supplements and food constituents, but does not cover whole diets.
These guidelines, which are the consensus view of an International Life Sciences Institute Europe Expert Group, were finalised following exchanges between representatives from industry, academia and regulatory bodies(Reference Aggett, Antoine and de Vries6). The Expert Group carried out an initial survey of relevant research papers published within pre-defined periods in two leading peer-reviewed journals (Am J Clin Nutr; Eur J Nutr). This database was subsequently augmented with selected research papers, to provide examples not present in the initial selection. This survey facilitated the identification of the range, and the perceived strengths and weaknesses of currently reported methodologies, and these guidelines cite relevant selected papers as examples of current practice.
The major factors involved in the design, conduct and reporting of studies are identified in Table 1, which uses a similar structure to that in the Consolidated Standards of Reporting Trials (CONSORT) checklist for medical trials(Reference Moher, Hopewell and Schulz7). These factors are discussed in the text under the same main headings used in Table 1.
(Table 1 abbreviations: ITT, intention to treat; PP, per protocol.)
Summary of existing guidelines
Although earlier guidelines exist(Reference Sandstrom8), newer guidelines are emerging in response to legal requirements(9). These guidelines include those aimed at particular products and health benefits(Reference AbuMweis, Jew and Jones10–Reference Shane, Cabana and Vidry12). However, there is no comprehensive description of how to perform human nutrition intervention studies in general, and particularly for the evaluation and substantiation of health claims on foods. Besides the ‘PASSCLAIM (Process for the Assessment of Scientific Support for Claims on Foods) Consensus Criteria’(Reference Aggett, Antoine and Asp13), which so far is the most conclusive summary, various collections of advice exist, given in fragmentary form in legal regulations or guidance reports from international and national authorities or organisations. These are outlined below.
European Food Safety Authority Scientific and Technical Guidance(14) and the ‘Application Rules on Health Claims’ in Commission Regulation (EC) No 353/2008(15) provide only minor information on the conduct of human intervention studies. Within Europe, additional information is available from the Joint Health Claims Initiative(16), where some criteria on the validity of human studies are listed. The ‘US Food and Drug Administration Guidance for Industry’ (2009)(17) gives a broad description of human intervention studies in a question and answer style, which guides the reader through relevant human intervention issues. Important aspects for claim evaluation can also be drawn from the updated ‘Health Canada Guidance Document’ (2009)(18). The varying approaches to health claim evaluation in Europe, the USA, Canada, Australia and New Zealand, China and Japan have quite recently been reviewed in a supplement issue of J Nutr (Reference Jones, Asp and Silva19). Additionally, the ‘FAO/WHO Codex Alimentarius Commission’ (2008)(20) gives basic criteria for health claim evaluation. However, a concise summary of evaluation criteria focusing on the conduct of human intervention studies is not currently available. An attempt in this direction may be the ‘Nutrition, Health and Related Claims Consultation Paper (Proposal P293, to be finalized by late 2011)’(21) by ‘Food Standards Australia New Zealand’, which may provide details on the evaluation and substantiation requirements for health claims.
Factors to be considered in the design, conduct and reporting of human intervention studies
Study hypothesis
The primary hypothesis to be tested will directly influence all other aspects of the study, including the study design and duration, the eligibility criteria, the amount of the test product and the nature of the control. The hypothesis should be based on a thorough review of the available evidence. The primary outcome measure should be clearly defined and relate to the hypothesis.
Current practice
The review of current practice showed that only a minority of the papers mentioned the term ‘hypothesis’, with the majority framing the research question as ‘aims’, ‘objectives’ or ‘purposes’. While hypotheses can sometimes be inferred from aims or objectives, this is not always the case and therefore it is recommended that studies have a clearly stated hypothesis.
Study design
Human nutrition intervention studies test hypotheses that have been formulated based on prior knowledge. Prior knowledge will include data from other human intervention studies, and also epidemiological, animal and in vitro studies. Where possible, all the available evidence should be reviewed systematically(Reference Moher and Tricco22) for efficacy, and include an assessment of safety and potential risks.
Exploratory studies generally have a number of aims. These may include evaluations of food matrix issues and the amount to be consumed, and can also provide data on the variability and time scale of outcome responses and the size of the effect on outcome responses, which can be used for power calculations in subsequent studies.
Once these early studies have been completed, studies with greater rigour will test the hypothesis that the product will alter the expected outcome measures. Usually, a series of studies will be conducted, with later studies extending the work as the evidence accrues. Examples include increasing the range of populations studied, using new and/or longer-term outcome measures, assessing the minimum effective amount to be consumed and evaluating different forms of presentation or delivery of the test product.
The following three basic study designs are encountered: single-arm studies; cross-over studies; parallel studies. Early exploratory studies tend to be single arm with no control group, and these can be a cost- and time-effective way of assessing potential effects, but usually only as a forerunner to controlled studies. These studies add to the totality of evidence but cannot alone determine the effect of intervention.
In controlled studies, in addition to a group of participants receiving the active nutrition intervention, there will be another group that will act as a control. The outcome for this latter group provides a suitable comparator, as it is generally inappropriate to assume that changes observed in the group receiving the active intervention during the study are necessarily entirely attributable to that particular intervention; other factors may be responsible for them. For example, the knowledge that a participant is receiving an active intervention may alter their responses, particularly when dealing with more subjective outcomes such as quality of life scales.
In parallel-group designs, each participant receives only one of the nutrition interventions (product A or B, active intervention or control) under study. Comparisons between groups must therefore be made on a between-participant basis. However, in some studies, it may be feasible to use a different design in which participants receive more than one intervention. In studies using cross-over designs, participants receive all interventions under comparison, and the design specifies the order of interventions. This has the advantage that comparisons between interventions can be made on a within-participant basis with a consequent improvement in precision of the comparisons and therefore power of the study. In such designs, participants act as their own controls. In a cross-over design for two interventions, the participants are allocated to two groups which receive interventions in a different order. Assessments are performed at the end of each intervention period, although in some cross-over studies, baseline measurements may also be taken at the start of each intervention period. Depending on the intervention and outcome measure, a washout period may be required between intervention periods to avoid contamination or carry-over effects. A run-in period before the first intervention may also be desirable to minimise order effects. During this period, participants may be asked to avoid certain foods. A Latin square design may be used, where appropriate, to extend cross-over studies to more than two interventions. However, any increase in study length may increase participant dropout rate.
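As an illustration of how a Latin square can set the order of interventions in a cross-over study with more than two interventions, the following minimal sketch builds a simple cyclic square. The function name and the intervention labels are hypothetical. Note that a cyclic square balances interventions across periods but does not by itself balance first-order carry-over, so washout periods remain a separate design decision.

```python
def latin_square_sequences(interventions):
    """Return one intervention order per sequence group (cyclic Latin square).

    Row i is the input list rotated by i positions, so every intervention
    appears exactly once in each sequence (row) and each period (column).
    """
    k = len(interventions)
    return [[interventions[(i + j) % k] for j in range(k)] for i in range(k)]


# Hypothetical three-intervention cross-over: products A, B and a control C.
sequences = latin_square_sequences(["A", "B", "C"])
for group, order in enumerate(sequences, start=1):
    print(f"Sequence group {group}: {' -> '.join(order)}")
```

Participants would then be randomly allocated to the sequence groups, with equal numbers in each group where possible.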
For studies that require longer-term interventions, parallel studies are usually preferred, because of their shorter overall time frame. Furthermore, parallel studies are essential where a washout period may be ineffective at returning outcome measures to baseline, for example in certain tests of cognitive function. Parallel studies are also required where intentionally returning to baseline may be unethical, for example if body weight or bone mineral density may be affected. Parallel studies are least suited to outcomes that show large inter-participant variation. Cross-over studies are favoured where participant availability may be restricted, and in very short-term studies, for example postprandial studies to evaluate glycaemic responses, or satiety and energy intakes. However, they are adversely affected by dropouts and necessitate a more complex analysis methodology. The choice of study design will depend on these considerations, but also the availability of time and other resources, and the potential roles of confounding factors such as seasonal variations.
Other less commonly used types of study include (1) the factorial design (in which participants are allocated to all possible combinations of two or more interventions and which permits the evaluation of intervention interactions) and (2) the cluster randomised design (in which the unit of randomisation is not the individual but a cluster of individuals defined, for example, by family, school class or primary care group). Further guidance on these designs is available in statistical texts on clinical trials(Reference Pocock23–Reference Peace and Chen27).
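The arms of a factorial design can be enumerated mechanically, as in the short sketch below. The two factors (`fibre` and `protein`) and their levels are hypothetical examples chosen only to show the structure of a 2 x 2 design.

```python
from itertools import product

# Hypothetical factors and levels for a 2 x 2 factorial design.
factors = {
    "fibre": ["control", "active"],
    "protein": ["control", "active"],
}

# Each arm is one combination of levels, one level per factor.
arms = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, arm in enumerate(arms, start=1):
    print(f"Arm {number}: {arm}")
```

With two factors at two levels each, this gives four arms, which is what allows the interaction between the two interventions to be evaluated.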
Study duration
The study duration should be sufficient to allow changes in the primary outcome measure. Thus, the duration will be informed by data from exploratory studies, from knowledge of the underlying physiology and biochemistry, particularly the rates of turnover of relevant tissues such as erythrocytes, or from similar studies that have used the same outcomes. It also relates to the envisaged claim or hypothesis, which may focus on an acute effect (e.g. glycaemic response or increased alertness) v. a longer-term health outcome. Thus, no absolute guidelines can be given on study duration. However, researchers should aim to use the minimum feasible duration for ethical reasons, to conserve resources and to avoid participant fatigue leading to non-compliance or withdrawal. In some cases, post-study follow-up measures are desirable to evaluate persistence or other longer-term effects.
Test and control product
Test product
The test product will be the supplement, ingredient or food under investigation. Consideration must be given, however, to the intended use of the test product, and the study design should take this into account. For example, if it is intended that the test product be consumed once a day as part of a mixed meal, then the study design should test that pattern of consumption, and details of the frequency and timing of ingestion should be reported.
Amount consumed
The amount of the test product to be consumed will depend on a number of factors. These include evaluation of data from all previous studies, and consideration of the underlying physiology and of issues related to the food matrix, palatability and bioavailability. However, the amount to be consumed should be close to that intended for practical use. Furthermore, it is important to give relevant documentary evidence of the amount of the test product or the component with putative activity that is provided.
Control product
The control is a product that does not provide the component that is being tested, and this must also be analytically documented. The control should be matched for sensory characteristics and taken in the same way as the test product. A control is relatively easy to achieve in supplementation studies using pills or similar preparations, but in studies of foods, it is more difficult to develop a control product identical to the test product but which does not contain the component(s) under study. Blinding may not be possible for many foods where the test product is easily identifiable by the researchers, as may be the case with some minimally processed foods such as fruits or vegetables, and some manufactured consumer foods such as cereal products. However, some degree of blinding may be possible by the use of suitable packaging that conceals products from the researchers and other study participants.
Success in attaining an ideal control is likely to vary depending on the type of ingredient or product under test. Ideal controls for different product types are considered below.
Supplements (pills, powders, liquids of small volume)
The provision of high-quality control products should be relatively easy. Pills should be of identical size, shape, colour and appearance. Ensuring identical internal colour, appearance, mouthfeel and taste for all supplements is also desirable.
Food ingredients (e.g. fibres, starches, proteins, fats)
These can be evaluated in the form of supplements (see the previous section) or incorporated into suitable consumer foods (see the next section). The potential effects of interaction with other food components, and/or potential effects of processing (e.g. degree of loss or modification, effects on bioavailability) must be considered.
Manufactured consumer foods (e.g. cereal products, juices, prepared dishes, yogurts)
Test and control foods should be similar in energy content and in physical characteristics (gross morphology, appearance, volume and texture) and sensory qualities (mouthfeel, taste, palatability and breakdown characteristics in the mouth). If identical products cannot be produced, then it is possible that compositional, physical or sensory differences between the test and control products, unrelated to the factor under test, may exert confounding physiological or behavioural effects, and will also make effective blinding difficult. In these circumstances, appropriate physiological or behavioural responses should be monitored and compared for the test and control products.
Minimally processed foods (e.g. fresh fruit, vegetables, nuts, eggs, grains)
The formulation of control products is impossible if the aim is to use a single foodstuff, such as fruit or nuts. If the control arm only involves the provision of no test product or a smaller number of portions of the test product, then this may have effects on other aspects of diet and behaviour.
Outcome measures
All intervention studies will assess outcome measures, and will compare these between the test product and control groups, if a control group features in the study design. Most studies will have a range of outcome measures, but the study should be powered based on a pre-specified primary outcome and the sample size calculated based on that outcome (see the Size of study section). Similarly, if an outcome is assessed at several time points over the course of the study, either a single time point or a single summary measure of results at several time points should be pre-specified as the primary measure. All outcome measures, whether primary or secondary, should be stated and defined in the study protocol.
It is essential that the outcome measure is of biological relevance. In some cases, the outcome measure is clearly relevant as it is a direct, objective measure of the intended effect, for example body weight, or diagnosis of a disease or muscle strength. Subjective measures are also used, such as feelings of health, appetite or fatigue, and, in these cases, it is important to use validated instruments if these are available. When the effect cannot be measured directly, indirect or surrogate factors such as biomarkers or risk factors are used that reflect a functional, physiological or biochemical characteristic associated with a function or a disease, or that predict later development of the disease. Examples include glycated Hb as an indicator of long-term hyperglycaemia and risk of type 2 diabetes complications(28), plasma LDL-cholesterol as a measure of CVD risk(Reference Mensink, Aro and Den Hond29), bone mineral density as a measure of osteoporosis risk(Reference Prentice, Bonjour and Branca30), complex metabolomic or proteomic profiles as markers of function and disease risk(Reference Zhang, Yap and Wei31), and the presence of adenomatous colon polyps as an early indicator of colon cancer(Reference Winawer, Zauber and Ho32). Most indirect outcome measures are chosen because they reflect consensus guidelines or are commonly used by experts in the area. However, very few markers have been validated and recognised by expert consensus in terms of their specificity, variability, limitations and applicability to a range of population groups(Reference Aggett, Antoine and Asp13).
Methodological aspects
Efforts should be made to standardise all outcome measure assessments and reduce measurement error as far as possible (e.g. by standardising measurement protocols, training observers and averaging several measurements rather than using a single measurement), especially if measurement errors are known to be large.
Analytical variability
Laboratory analytical methods should be precise, accurate, sensitive and specific, and these performance characteristics should be recorded in standard operating procedures or similar quality record documents. Intra-laboratory analytical variability should be minimised by using automated equipment to analyse samples in duplicate or triplicate, in batches that represent the range of interventions, participants and sampling times, with suitable internal and external standards and participation in quality assurance programmes. Ideally, all samples from a study should be analysed at the same time, and all samples from an individual participant in one run, but this may be precluded by degradation in storage, even at low temperatures. Inter-laboratory analytical variability, which may be a factor in multi-centre studies, should either be minimised by the sharing of methods, standard operating procedures and calibration standards, or be overcome by centralisation of sample analysis. Where appropriate, statistical analysis should be used to account for any remaining inter-laboratory variation. Biomarkers that have high methodological variability will often require a higher number of participants.
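As a minimal sketch of how intra-laboratory variability from duplicate analyses might be summarised, the code below estimates a coefficient of variation (CV) from paired measurements. It uses the commonly applied duplicate-pair estimate of the standard deviation, sqrt(sum(d^2) / 2n), where d is the difference within each pair. This choice of estimator, the function name and the example values are assumptions for illustration and are not prescribed by these guidelines.

```python
from math import sqrt


def duplicate_cv(pairs):
    """Intra-assay CV (%) estimated from duplicate measurements.

    pairs: list of (first, second) results for the same sample.
    SD is estimated as sqrt(sum(d^2) / (2 * n)), with d the within-pair
    difference and n the number of pairs; CV is SD over the grand mean.
    """
    n = len(pairs)
    sd = sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n))
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    return 100 * sd / grand_mean


# Hypothetical duplicate results for five samples (arbitrary units).
example_pairs = [(5.1, 5.3), (4.8, 4.7), (6.0, 6.2), (5.5, 5.4), (4.9, 5.0)]
print(f"Intra-assay CV: {duplicate_cv(example_pairs):.1f}%")
```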
Biological variability
Biological variability has a number of underlying factors, and such variability is likely, in part, to be genetically determined. Many biomarkers, similar to most biological parameters, have circadian or seasonal variations. The basal value can also fluctuate due to biological rhythms, such as the menstrual cycle. This variability may introduce systematic bias into the results. Thus, it is important to understand the factors underlying this variability for the biomarkers, and to take samples or adapt the study design accordingly.
Biologically meaningful changes
A study may show a statistically significant change in a validated biomarker. However, a statistically significant response of an outcome measure to a nutrition intervention does not necessarily mean that the intervention will be effective in terms of benefit or risk reduction in target groups. Therefore, the size of changes and whether these will be of biological, clinical or public health significance must also be considered.
Selection of participants: eligibility criteria
Eligibility criteria are functional, physiological or clinical characteristics or demographic variables used to define the study population. Eligibility criteria may also include lifestyle factors such as smoking habit or level of physical activity, and dietary factors such as low fibre intakes, or the consumption of restricted diets. Eligibility criteria can be presented as inclusion and exclusion criteria. The criteria are likely to include factors such as age, sex, health status, and underlying physiological conditions or concurrent diseases.
Eligibility criteria should describe participants adequately, so that the results can be appropriately interpreted in terms of their generalisability. Eligibility criteria should be selected with the target population for the test product, and the hypothesis and outcome measures in mind. Inter-participant variation may usefully be reduced by using stricter eligibility criteria to select a more homogeneous set of participants for study, but this also has the disadvantage of restricting the target population and consequently limiting the generalisability of findings. Children and women of childbearing age may need to be excluded from studies of certain interventions with developmental implications or teratogenic potential.
It is important to define eligibility criteria using objective, quantitative descriptors wherever possible. For example, many nutrition interventions use ‘apparently healthy’ participants. Health may be evaluated by using a questionnaire on medical history and surgical events, or this may be extended to a physical examination and screening of blood and urine(Reference Perez-Martinez, Yiannakouris and Lopez-Miranda33). ‘Health’ may just refer to the absence of diagnosed disease, or refer to a specific aspect such as a healthy blood pressure, and in such cases, the criteria can be very specific(Reference Lee, Skurk and Hennig34) and may follow official guidelines. However, ‘apparently healthy’ may also include a healthy lifestyle, which could be assessed using questionnaires, for example, for physical activity, dietary habits, smoking, alcohol and medication use.
Current practice
The majority of human intervention studies report and define eligibility criteria to some extent. However, most papers do not give all eligibility criteria, quantifiable ranges for these criteria or a clear rationale for these criteria, and do not relate the criteria to the hypothesis being tested. In addition, many studies mention ‘healthy’ as an inclusion criterion without defining ‘health’ status. An example of good practice is Brink et al. (Reference Brink, Coxam and Robins35), where inclusion and exclusion criteria are described with quantifiable ranges for many of the criteria.
Statistical considerations
Randomisation
Randomisation is the allocation of interventions to participants using some random process such as the toss of a coin. Randomisation ensures that the investigator does not influence the intervention to which a participant is allocated. The main advantage of random allocation is that, in the long run, it will produce study groups that are comparable with respect to both known and unknown factors that could influence the outcome measure. Consequently, any observed difference in the responses of the two intervention groups is likely to be due to the effects of the intervention. Randomisation helps to ensure that the comparison of interventions is fair (unbiased) and the statistical analysis is valid.
To allocate individual participants to intervention groups, random numbers (either from tables or generated by computer) can be used. However, it is advisable to ensure that approximately equal numbers of participants are assigned to each group by using a block randomisation, in which participants are divided into blocks within which equal numbers of allocations to each intervention are made. To avoid any possible predictability of the allocations at the end of a block it is advisable to vary the block size. It is often desirable to stratify participants into subgroups defined by important variables such as age, sex and ethnicity that could influence the response to intervention. A restricted randomisation is then conducted within each subgroup. Stratification will generally result in more comparable study groups and can also reduce variability in the response measure when incorporated in the statistical analysis. Minimisation offers a more practicable approach to stratification on multiple variables.
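The block randomisation with varying block size and stratification described above can be sketched as follows. This is an illustrative sketch only: the function name, arm labels, block sizes, strata and seed are all hypothetical, and a real study would normally have the schedule generated and held by someone independent of recruitment.

```python
import random


def block_randomise(n_participants, arms=("test", "control"),
                    block_sizes=(2, 4, 6), rng=None):
    """Return an allocation list built from randomly sized permuted blocks.

    Every block contains equal numbers of each arm; block sizes are drawn
    at random so that the end of a block cannot easily be predicted.
    """
    rng = rng or random.Random()
    if any(size % len(arms) != 0 for size in block_sizes):
        raise ValueError("each block size must be a multiple of the number of arms")
    allocations = []
    while len(allocations) < n_participants:
        size = rng.choice(block_sizes)
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    # The final block may be truncated, so groups can differ slightly in size.
    return allocations[:n_participants]


# Stratified randomisation: an independent schedule per stratum (hypothetical strata).
rng = random.Random(2024)  # fixed seed only so that the schedule is reproducible
strata = ["female <50 y", "female >=50 y", "male <50 y", "male >=50 y"]
schedule = {stratum: block_randomise(20, rng=rng) for stratum in strata}

for stratum, allocations in schedule.items():
    print(stratum, allocations.count("test"), allocations.count("control"))
```

Because only the last block in each stratum can be incomplete, the imbalance within a stratum is at most half the largest block size.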
Concealment of the intervention allocations
The decision to enrol a participant in a study could be influenced subconsciously or otherwise by the knowledge of which intervention the participant would receive if entered into the study. A simple way to eliminate any possible bias of this sort is to implement randomisation using sealed envelopes. The list of random intervention allocations is concealed in sequentially numbered opaque sealed envelopes. Only after a participant has been enrolled in the study and consent obtained should the seal be broken to reveal which intervention the participant has been allocated.
Blinding
The assessment of study outcomes may be influenced by knowledge of which intervention was received, particularly for subjective outcomes. Such bias can be avoided by using blinded assessment. If neither the assessor nor the participant knows which intervention the participant received, then the study is double blind. If the participant knows but the assessor does not (or vice versa), then the study is single blind. Blinding should also be carried through into laboratory determinations and statistical analysis. The time of unblinding, usually after the freezing of the database, should be documented in the study report and may be mentioned in the publication.
Where possible, and particularly for food products, the effectiveness of blinding should be assessed at the end of the study and commented on in the study report. This can be achieved by the use of a simple questionnaire asking participants which product (test or control) they thought they were consuming. Currently, this information is rarely reported.
Size of study
Estimation of the number of participants required for the study is essential. Too small a study is likely to fail to detect important differences between interventions, while too large a study may needlessly waste resources and be unethical. In certain circumstances, trials may be designed for interim analysis as each participant's result becomes available (sequential design) or after pre-specified numbers of participants' results become available (group-sequential designs)(Reference Pocock23–Reference Peace and Chen27). These designs are ethically appealing because they ensure that inferior interventions are quickly identified, so minimising the numbers receiving them. However, even when such early termination is feasible, it is not always advisable since it can lead to intervention effects being estimated with poor precision. Usual methods for sample size estimation require specification of the magnitude of the smallest meaningful difference in the outcome variable. The study must be sufficiently large to have acceptable power to detect this difference as statistically significant, and must take into account possible non-compliance and the anticipated dropout rate. Information about the degree of variability in the outcome is also required and may come from previously published or unpublished results, or from a pilot or exploratory study specifically performed for the purpose. A multi-centre study may be necessary if the study size is too large to be performed in a single centre. Statisticians are key members of research teams, and it is recommended they are involved at an early stage, not only in study size calculation, but also in planning the design of the study.
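As a worked illustration of the inputs listed above (smallest meaningful difference, variability, power and dropout), the sketch below applies the standard normal-approximation formula for comparing two means in a parallel-group study, n per group = 2((z_(1-alpha/2) + z_(power)) x SD / difference)^2. The function name, the default values and the example numbers are assumptions; other designs and outcome types need different formulae, and the calculation is best agreed with the study statistician.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, dropout=0.0):
    """Participants per group for a two-arm parallel comparison of means.

    delta:   smallest meaningful difference in the primary outcome
    sd:      between-participant standard deviation of that outcome
    alpha:   two-sided significance level
    power:   desired probability of detecting delta
    dropout: anticipated proportion of participants lost (0 to <1)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n_completers = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    # Inflate recruitment so that enough participants complete the study.
    return ceil(n_completers / (1 - dropout))


# Hypothetical example: detect a 0.3 unit difference when the SD is 0.6 units.
print(n_per_group(delta=0.3, sd=0.6))
print(n_per_group(delta=0.3, sd=0.6, dropout=0.15))
```

Halving the smallest meaningful difference roughly quadruples the required sample size, which is why this quantity should be justified before the study starts.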
Ethical approval and study registration
Researchers should determine the appropriate local ethical approval and research governance procedures required for their study, and seek these approvals before the study commences. While not all nutrition research may be classified as medical research, it is recommended that researchers adhere to the World Medical Association's Declaration of Helsinki(36). One of the recommendations of the Declaration of Helsinki is that every clinical trial (including human nutrition intervention studies) must be registered in a publicly accessible database before recruitment of the first participant. Such registration, with accompanying protocol details, is intended to reduce the consequences of non-publication of studies (e.g. repetition of negative studies), selective reporting of outcomes, and of per-protocol rather than intention-to-treat analyses (see the Statistical analysis section). The WHO(37) has stated that ‘the registration of all interventional trials is a scientific, ethical and moral responsibility’ while the International Committee of Medical Journal Editors have, from September 2004, only considered trials for publication if they were registered before enrolment of their first participant(38). The academic view is that a priori trial registration is essential for ethical research in human participants. For industry-based nutrition studies, where the results may be commercially sensitive or where intellectual property rights may be jeopardised, a position similar to that in the pharmaceutical industry could be adopted, whereby five items of protocol information are kept hidden in a locked electronic depository that is publicly inaccessible until the information is no longer deemed commercially sensitive(Reference Krleza-Jeric39).
Recruitment and participant flow
The study protocol should state the methods by which participants will be recruited, and details of the recruitment process should be carefully described, with details of participants approached, screened, recruited and completing, and reasons for non-recruitment (ineligibility, lack of willingness to participate) and non-completion noted. Informed consent should be obtained. This information is best summarised in a participant flow diagram when reporting the study(Reference Moher, Hopewell and Schulz7).
Data collection
Data should be collected using standardised case report forms. Participants should be assigned a unique study number at the start of the study, and data held under that study number, i.e. there should be no participant identifiable information held by the researchers, other than a single sheet where the study number is linked to the participant contact details. All data, both paper and computer-based, should be kept securely and all data collection conducted in line with the required local regulations.
Background diet and change in diet during intervention
The nature of the participants' background diet may be one of the eligibility criteria. However, it is important, regardless of eligibility criteria, and particularly in longer-term studies, to collect background dietary information in order to characterise the habitual diet of the participants in terms of foods and diet composition. Diet should also be assessed during longer-term interventions, in order to detect any potentially confounding changes that may occur. A number of methodologies are available for these dietary assessments (e.g. FFQ, food diary, diet history(Reference Bingham40)). However, dietary intake assessment is subject to misreporting, and reported energy intakes should be compared with the estimated energy requirements of participants, particularly if these assessments are used for monitoring compliance.
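The comparison of reported energy intake with estimated energy requirement can be expressed as a simple ratio, as in the sketch below. The function names, the example values and, in particular, the `lower` and `upper` cut-offs are placeholders for illustration only; appropriate cut-offs and the method used to estimate energy requirements should be taken from the relevant literature for the population studied.

```python
def energy_reporting_ratio(reported_kj, estimated_requirement_kj):
    """Ratio of reported energy intake to estimated energy requirement."""
    return reported_kj / estimated_requirement_kj


def flag_misreporting(ratio, lower=0.8, upper=1.2):
    """Classify a ratio using placeholder cut-offs (assumed, not validated)."""
    if ratio < lower:
        return "possible under-reporting"
    if ratio > upper:
        return "possible over-reporting"
    return "plausible"


# Hypothetical participant: reports 6500 kJ/d against an estimated 9500 kJ/d.
ratio = energy_reporting_ratio(6500, 9500)
print(f"{ratio:.2f}: {flag_misreporting(ratio)}")
```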
Background health status and lifestyle, and changes in health status and lifestyle during intervention
In addition to their possible roles as eligibility criteria, it is also important to characterise the study population in terms of demographic background, health status and lifestyle behaviours, in order to allow suitable interpretation and generalisation of the results. Examples include age, sex, level of medication use, physical activity and smoking habit. The monitoring of health status and lifestyle behaviours should also be carried out in the course of longer-term studies to assess potential between-group differences, which may confound outcome measures. This is particularly important in studies where it is not possible to use closely matched test and control products.
Unintended effects
Aside from formal adverse events (AE), which are discussed below, nutrition studies may give rise to other unintended effects (to use recent CONSORT(Reference Moher, Hopewell and Schulz7) terminology). Unintended effects arising during nutrition studies are likely to be restricted to, for example, mild nausea or minor gastrointestinal discomfort as a result of changes in dietary pattern or consumption of unfamiliar or blinded products.
The recording of these unintended effects is desirable in human nutrition interventions, providing data on the tolerability of the product and enabling the compilation of a dossier for future reference. Questionnaires should be used to provide quantifiable data, using standardised formats where available, for example to assess gastrointestinal effects such as bloating or flatulence. Data should be collected at baseline and at suitable intervals during the study to assess onset and time course. Time, intervention and group effects should be tested statistically and, if significant, potential influence on compliance, withdrawal and outcome measures should be considered.
Adverse events
An AE is any untoward medical occurrence or undesirable clinical experience in a participant in a clinical trial, whether or not considered related to the intervention. Recording AE is of major importance in pharmaceutical studies, allowing a risk–benefit analysis. Hence, there is an abundance of guidelines for the management of AE in the clinical study setting (e.g. European Medicines Agency; International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; US Department of Health and Human Services, Food and Drug Administration)(41–43). There are no guidelines for nutrition intervention studies, given that these studies involve testing foods, supplements or ingredients in participants who are usually apparently healthy. However, the formal recording of AE is required for good practice in nutrition research.
Any serious or unexpected AE that are encountered, whether or not they appear related to the intervention, should be reported immediately to the principal investigator, the research ethics committee, the sponsor and relevant regulatory bodies in order to ensure appropriate management.
Compliance
Any deviations from protocol can affect the validity and relevance of an intervention study. Low levels of participant compliance in nutrition studies decrease the power to detect effects, which can lead to false negative findings and, ultimately, to a lack of evidence to support a potentially beneficial effect. Poor compliance in a particular subgroup will also reduce the generalisability of results. When compliance is very different between allocated groups, this may be because acceptability of the interventions differs. Therefore, a nutrition intervention study should aim to have measures in place to maximise and assess compliance.
Methods to measure and maximise compliance
The choice of compliance assessment methods will depend on study design, duration and intervention type. In acute or postprandial studies, the intervention is usually consumed only once under supervision and thus, compliance is not usually an issue. However, maintaining compliance throughout longer-term studies is very important. Informing participants that compliance will be measured by one of the methods below is likely, in itself, to improve adherence to the dietary intervention.
Complete provision of intervention, consumed under supervision
This will maximise compliance, but requires access to a metabolic suite or equivalent, and there will be resource implications. The suitability of this approach is likely to depend on the duration of the intervention.
Complete provision of intervention, with the return of unconsumed items
The return of unconsumed items is often used, but there can be no certainty that the participant has actually consumed all unreturned items.
Assessment of biomarkers in biological samples
Where possible, an independent and objective measure of compliance should be used, for example assessment of Se in serum or fatty acid composition of erythrocyte membranes(Reference AbuMweis, Jew and Jones10).
Using dietary records
Weighed food intakes, food diaries, food checklists and dietary recall methods, including diet histories, can be used. An inherent weakness of self-reported dietary intake data is the prevalence of socially desirable responding, where participants tend to under-report overall dietary intake and to over-report intakes of the intervention foods, and ‘healthy’ foods in general. Dietary records can usefully augment objective biological markers. If valid biomarkers of exposure do not exist, it is important to try to ensure that intakes are reliably reported.
A combination of the above
As no single method of assessing compliance is completely infallible, a combination of methods may be required.
In addition to the compliance assessment methods above, maintaining close, regular contact with participants is key to achieving good compliance, allowing any issues to be identified and dealt with at an early stage.
Acceptable levels of compliance
Acceptable levels of compliance for human nutrition studies have rarely been stated, and are difficult to comment on definitively. See later section on Statistical analysis for discussion of how compliance will affect statistical analysis. A decision on the statistical analysis approach will be partly influenced by whether studies are designed as tests of efficacy (biological effect) or effectiveness (potential to modify outcome in real-life situation), as the former studies will be more focused on maximising compliance. Making a decision on an acceptable level of compliance relies on an accurate, objective assessment of compliance as detailed above. A priori decisions should be made regarding the acceptable level of compliance for inclusion in a per-protocol analysis.
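As a minimal sketch of how an a priori compliance criterion might be applied, the example below estimates compliance from the return of unconsumed items and identifies the participants who would enter a per-protocol analysis. The 80 % threshold, the study numbers and the item counts are hypothetical assumptions chosen for illustration; as noted above, unreturned items cannot be assumed with certainty to have been consumed, so an estimate of this kind would ideally be supported by an objective biomarker.

```python
def percent_compliance(provided, returned):
    """Percentage of provided items presumed consumed (provided minus returned)."""
    if provided <= 0:
        raise ValueError("number of items provided must be positive")
    if not 0 <= returned <= provided:
        raise ValueError("returned items must lie between 0 and the number provided")
    return 100.0 * (provided - returned) / provided


# Hypothetical cut-off (%), to be fixed in the protocol before the data are examined
PER_PROTOCOL_THRESHOLD = 80.0

# Hypothetical records: study number -> (items provided, items returned)
records = {"S001": (56, 2), "S002": (56, 30), "S003": (56, 4)}

compliance = {sid: percent_compliance(*counts) for sid, counts in records.items()}
per_protocol_ids = sorted(
    sid for sid, pct in compliance.items() if pct >= PER_PROTOCOL_THRESHOLD
)

for sid, pct in compliance.items():
    print(f"{sid}: {pct:.1f} % compliance")
print("Per-protocol set:", per_protocol_ids)
```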
Current practice
An acceptable level of compliance is rarely reported quantitatively, with the extent of compliance usually reported qualitatively. However, as an example of good practice, Brink et al. (Reference Brink, Coxam and Robins35) monitored compliance by a combination of specially designed forms, return of unconsumed foods, and assessment of both plasma and urine concentrations of isoflavones. These authors reported that the level of compliance was high (94 %) and did not differ between intervention groups. Plasma isoflavone concentrations increased significantly in the active intervention groups.
Statistical analysis
There are a number of statistics books that cover the basics of randomised intervention trial methodology and analysis(Reference Pocock23–Reference Peace and Chen27). It is good practice to have a statistical plan that specifies the statistical methods to be used, the hypotheses to be tested for both primary and secondary outcomes (including whether one-sided or two-sided), and the significance level to be employed.
Rationale for using statistical methodology
In common with other research in medicine and the biological sciences, differences between groups which the investigator wishes to identify in a nutrition study are usually masked by several types of variation (inter- and intra-participant variation, measurement error, etc.), and strategies to minimise these have been outlined earlier.
These sources of variation mean that there is a need for the results of a study to be assessed objectively using appropriate statistical methodology. This section describes the basic statistical concepts necessary for the analysis of nutrition intervention studies. Although tests of hypotheses play a key role here, it is worth emphasising that the calculation of CI for intervention effects can often be more informative.
In general, statistical techniques require an assumption that the group under study may be considered to be a random sample from a target population about which inferences are to be made. In practice, there would be considerable practical difficulties in mounting an intervention study on a truly random sample from a target population, and usually a convenience sample such as a group of healthy volunteers or patients attending a hospital outpatient clinic will be studied. The investigator should be particularly cautious in any extrapolation of findings beyond the population from which the study sample was drawn. It is also worth emphasising that statistical methods will only take account of sampling error (i.e. variation arising from the process of sampling); they cannot quantify the extent of biases attributable to non-random sampling, particularly bias that may be introduced through non-response.
Preliminary steps in data analysis
Before attempting any formal statistical comparisons, it is important to visualise the data with histograms and scatter diagrams to examine the shapes of distributions, to check for outliers and to establish the nature of any relationships between variables.
Suitable descriptive statistics should also be presented to characterise the participants under study, and an indispensable step in a comparative study will be to construct a table of participant characteristics by group. For quantitative variables, this should include both measures of location and measures of dispersion, typically the mean and standard deviation for roughly symmetrically distributed variables or the median and interquartile range for variables whose distribution is heavily skewed. For categorical variables, both frequencies and percentages should be included in this table. In an adequately randomised study, it is not usually considered necessary to perform statistical tests on these baseline group characteristics since any differences observed between groups must be due to chance.
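A table of participant characteristics by group could be assembled along the following lines. This is a sketch using only the Python standard library; the records, the variable names and the choice of triacylglycerol as the skewed variable are hypothetical assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

# Hypothetical baseline records; 'tag' is fasting triacylglycerol (mmol/l), assumed skewed
rows = [
    {"group": "control", "age": 40, "tag": 1.0, "sex": "F"},
    {"group": "control", "age": 50, "tag": 1.2, "sex": "M"},
    {"group": "control", "age": 60, "tag": 1.1, "sex": "F"},
    {"group": "control", "age": 50, "tag": 3.5, "sex": "M"},
    {"group": "test", "age": 45, "tag": 0.9, "sex": "F"},
    {"group": "test", "age": 55, "tag": 1.3, "sex": "F"},
    {"group": "test", "age": 65, "tag": 1.0, "sex": "M"},
    {"group": "test", "age": 55, "tag": 2.9, "sex": "F"},
]


def summarise(values, skewed=False):
    """Mean (SD) for roughly symmetric variables, median (IQR) for skewed ones."""
    if skewed:
        q1, _, q3 = quantiles(values, n=4)  # quartile cut points
        return f"{median(values):.2f} ({q1:.2f} to {q3:.2f})"
    return f"{mean(values):.1f} ({stdev(values):.1f})"


def baseline_table(records):
    """Return one dictionary of summary strings per allocated group."""
    table = {}
    for group in sorted({r["group"] for r in records}):
        members = [r for r in records if r["group"] == group]
        n = len(members)
        females = Counter(r["sex"] for r in members)["F"]
        table[group] = {
            "n": n,
            "age, mean (SD)": summarise([r["age"] for r in members]),
            "TAG, median (IQR)": summarise([r["tag"] for r in members], skewed=True),
            "female, n (%)": f"{females} ({100 * females / n:.0f} %)",
        }
    return table


table = baseline_table(rows)
for group, summary in table.items():
    print(group, summary)
```

Consistent with the text above, the sketch reports descriptive summaries only and performs no significance tests on the baseline characteristics.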
Hypothesis tests for comparing groups
Along with the study design, the scale of measurement of the response variable is of fundamental importance in deciding which statistical analysis techniques to use. The following provides a brief description of statistical techniques suitable for simple randomisation studies.
Parametric methods
For a study using a parallel-group design and an interval-scale response variable (e.g. weight or blood pressure), the independent-samples t test will be used to compare two groups and one-way ANOVA to compare three or more groups(Reference Pocock23, Reference Machin and Fayers25). For the two-period cross-over study, a refinement of the paired t test, which takes account of variability attributable to period effects, is typically employed(Reference Hills and Armitage44). If baseline values of a response variable are available, then changes in the variable during the intervention may be calculated and used in the analysis, although it can be more beneficial to analyse the final value of the response in an ANCOVA with the initial value considered as the covariate. For studies that take more than two serial measurements of response variables, the derivation of a summary measure such as a slope or area under the curve may permit the application of straightforward statistical techniques and avoid more complex methods for correlated responses(Reference Matthews, Altman and Campbell45). Intervention effects, often expressed as means, or differences in means, should be estimated along with their associated 95 % CI.
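The difference between analysing final values, change scores and baseline-adjusted final values can be illustrated with a small sketch. The function below computes the one-covariate ANCOVA point estimate under the assumption of a common slope in both groups, using the pooled within-group regression of final value on baseline. The data are contrived and noiseless (hypothetical systolic blood pressures lying exactly on a line in each group) so that the arithmetic can be checked by hand; the function name is likewise an assumption.

```python
from statistics import mean


def ancova_adjusted_difference(base_a, final_a, base_b, final_b):
    """Difference in final values (A minus B) adjusted for baseline.

    Uses the pooled within-group slope of final on baseline, i.e. the
    one-covariate ANCOVA estimate assuming a common slope in both groups.
    Returns (adjusted difference, pooled slope).
    """
    def sums(x, y):
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        return mx, my, sxx, sxy

    mxa, mya, sxxa, sxya = sums(base_a, final_a)
    mxb, myb, sxxb, sxyb = sums(base_b, final_b)
    slope = (sxya + sxyb) / (sxxa + sxxb)
    return (mya - myb) - slope * (mxa - mxb), slope


# Contrived data with a chance baseline imbalance of 10 mmHg between groups
base_test, final_test = [130, 140, 150, 160], [125, 130, 135, 140]
base_ctrl, final_ctrl = [120, 130, 140, 150], [125, 130, 135, 140]

final_diff = mean(final_test) - mean(final_ctrl)
change_diff = (mean(final_test) - mean(base_test)) - (mean(final_ctrl) - mean(base_ctrl))
adjusted_diff, slope = ancova_adjusted_difference(
    base_test, final_test, base_ctrl, final_ctrl
)

print(f"Difference in final values: {final_diff:.1f}")
print(f"Difference in change scores: {change_diff:.1f}")
print(f"ANCOVA-adjusted difference: {adjusted_diff:.1f} (pooled slope {slope:.2f})")
```

With a baseline imbalance, the three estimates differ: the unadjusted comparison of final values ignores the imbalance, the change score implicitly assumes a slope of 1, and the ANCOVA estimate uses the slope observed in the data. The sketch gives point estimates only; the associated 95 % CI requires the residual variance and a t quantile, which would normally be obtained from a statistical package.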
Non-parametric methods
For ordinal-scale outcomes, non-parametric methods are typically employed with the Mann–Whitney U test used to compare two groups and Kruskal–Wallis one-way ANOVA of ranks to compare three or more groups(Reference Pocock23, Reference Machin and Fayers25). These techniques may also be used for analysing interval-scale response variables, which do not satisfy the assumptions for the parametric methods. However, these techniques focus on hypothesis testing, and confidence limits associated with these techniques are not widely available.
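For two groups, the Mann–Whitney U statistic can be computed directly by counting pairs, as in the sketch below. The P value uses the large-sample normal approximation without tie or continuity corrections, which is only a rough guide for samples as small as those shown; a statistical package would normally provide exact or corrected values. The Hodges–Lehmann estimate (the median of all pairwise differences) is included as one commonly used point estimate of the shift between groups; it is an addition for illustration and is not discussed in these guidelines. The symptom scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median


def mann_whitney_u(x, y):
    """Return (U for x, two-sided P value by normal approximation).

    Each pair with x > y counts 1 and each tied pair counts 0.5.
    No tie or continuity correction is applied to the approximation.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0 for xi in x for yi in y)
    expected = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x - y."""
    return median(xi - yi for xi in x for yi in y)


# Hypothetical ordinal symptom scores (0 to 10) in two groups
test_scores = [3, 5, 6, 8, 9]
control_scores = [1, 2, 4, 7, 10]

u_stat, p_value = mann_whitney_u(test_scores, control_scores)
shift = hodges_lehmann_shift(test_scores, control_scores)
print(f"U = {u_stat:.1f}, approximate two-sided P = {p_value:.3f}, shift = {shift}")
```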
Contingency table methods
For nominal-scale (or unordered categorical) outcome variables, the analysis is performed using χ² tests for contingency tables or Fisher's exact probability test where numbers are small. CI for proportions, for differences in proportions, for OR or for risk ratios may also be useful for characterising intervention effects.
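For the simplest case of a 2 × 2 table, these quantities can be computed with the Python standard library, as sketched below. The counts are hypothetical, the function name is an assumption, and the choices made (no continuity correction for the χ² test, a Wald interval for the difference in proportions, and a two-sided Fisher test that sums the probabilities of all tables no more likely than the observed one) are one set of common conventions among several.

```python
from math import comb, erfc, sqrt
from statistics import NormalDist


def analyse_2x2(a, b, c, d, level=0.95):
    """a, b: events and non-events in the test group; c, d: the same for control.

    Assumes no zero cells, so that the odds ratio and risk ratio are defined.
    """
    n1, n2 = a + b, c + d
    m1, n = a + c, a + b + c + d

    # Pearson chi-squared; with 1 df the upper tail area is erfc(sqrt(x / 2))
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n2 * m1 * (n - m1))
    p_chi2 = erfc(sqrt(chi2 / 2))

    # Fisher's exact test from the hypergeometric distribution with fixed margins
    def table_probability(k):
        return comb(m1, k) * comb(n - m1, n1 - k) / comb(n, n1)

    p_observed = table_probability(a)
    k_values = range(max(0, n1 - (n - m1)), min(n1, m1) + 1)
    p_fisher = min(1.0, sum(
        table_probability(k) for k in k_values
        if table_probability(k) <= p_observed * (1 + 1e-9)
    ))

    # Effect measures, with a Wald CI for the difference in proportions
    p1, p2 = a / n1, c / n2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return {
        "chi2": chi2,
        "p_chi2": p_chi2,
        "p_fisher": p_fisher,
        "risk_difference": p1 - p2,
        "risk_difference_ci": (p1 - p2 - z * se, p1 - p2 + z * se),
        "risk_ratio": p1 / p2,
        "odds_ratio": (a * d) / (b * c),
    }


# Hypothetical counts: 12 of 40 events in the test group, 20 of 40 in control
result = analyse_2x2(12, 28, 20, 20)
for name, value in result.items():
    print(name, value)
```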
If information on covariates is available, then it may be incorporated into a multiple regression analysis to improve the precision of comparisons between interventions on an interval-scale response. This technique may also be a useful approach in adjusting for chance imbalances between the intervention groups on factors relevant to the response. Similarly, for a categorical response variable, logistic regression analysis may be used.
The interpretation of analyses involving more than two intervention groups may be complicated by the multiplicity of statistical tests. If the aim of an analysis is restricted to making only a small number of pre-specified comparisons between groups as stated in the study protocol, then multiple testing is less of an issue. However, tests of hypotheses other than these (e.g. hypotheses formulated after looking at the results) require a more conservative approach in the statistical analysis to limit the risk of false positive findings. A similar issue arises in the interpretation of tests on multiple response variables. Ideally, investigators should nominate the primary response variable in the study protocol. Other responses may still be analysed but a stricter significance level may be appropriate to safeguard against false positive findings.
A recent development in nutrition research has been to use genomics, proteomics and metabolomics approaches as endpoints in nutrition intervention studies(Reference Zhang, Yap and Wei31). The use of multiple endpoints such as these raises some statistical issues. If the multiple endpoints are independent, then a simple Bonferroni correction is sufficient to control the risk of type 1 error with a significance level set not at the α level but at the α/k level, where k is the number of endpoints. An alternative approach, which retains more power than the Bonferroni correction and is more suited to microarray work, is to control the false discovery rate, the expected proportion of false positives among the results that are declared significant. For dependent endpoints, comparisons may be performed by a permutation test. This involves comparing the largest test statistic, not with a standard distribution (such as the t distribution or χ² distribution), but instead with its permutation distribution obtained by calculating the largest test statistic in every possible relabelling of the groups (or at least in a very large random sample of them).
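The three approaches described above can be sketched as follows. The Benjamini–Hochberg step-up procedure is used here as one widely used way of controlling the false discovery rate; the guidelines do not name a specific procedure, so this choice is an assumption. The permutation test enumerates every relabelling of two small groups and compares the largest absolute t statistic across endpoints with its permutation distribution. The P values, the endpoint data and all function names are hypothetical, and the data are chosen with distinct values so that no group has zero variance.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose P value is at most alpha / k."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Step-up procedure: reject the smallest P values up to the largest rank i
    for which the i-th smallest P value is at most i * alpha / k."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / k:
            largest_rank = rank
    reject = [False] * k
    for i in order[:largest_rank]:
        reject[i] = True
    return reject


def t_statistic(x, y):
    """Two-sample t statistic (unpooled form; equals the pooled form when n is equal)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def max_statistic_permutation_p(group_a, group_b):
    """P value for the largest |t| over endpoints, using every possible relabelling.

    Each participant is a tuple holding one value per endpoint.
    """
    pooled = group_a + group_b
    n_a, k = len(group_a), len(group_a[0])

    def max_abs_t(indices_a):
        in_a = set(indices_a)
        a = [p for i, p in enumerate(pooled) if i in in_a]
        b = [p for i, p in enumerate(pooled) if i not in in_a]
        return max(abs(t_statistic([p[j] for p in a], [p[j] for p in b]))
                   for j in range(k))

    observed = max_abs_t(range(n_a))
    relabellings = list(combinations(range(len(pooled)), n_a))
    hits = sum(max_abs_t(idx) >= observed - 1e-12 for idx in relabellings)
    return hits / len(relabellings)


# Hypothetical P values from five endpoints
p_values = [0.001, 0.012, 0.020, 0.030, 0.20]
bonferroni = bonferroni_reject(p_values)
fdr = benjamini_hochberg_reject(p_values)

# Hypothetical data: four participants per group, two endpoints each
test_group = [(5.1, 2.0), (5.3, 3.1), (5.6, 2.6), (5.8, 3.9)]
control_group = [(4.0, 2.4), (4.2, 3.5), (4.4, 2.9), (4.7, 3.3)]
p_max = max_statistic_permutation_p(test_group, control_group)

print("Bonferroni:", bonferroni)
print("Benjamini-Hochberg:", fdr)
print(f"Permutation P value for the largest statistic: {p_max:.4f}")
```

With four participants per group there are only seventy relabellings, so the smallest attainable P value is 2/70; for realistic sample sizes, a large random sample of relabellings would replace the full enumeration, as noted in the text.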
Intention to treat or per protocol
An important issue is to decide how protocol deviations should be handled in the analysis. Usually, the most relevant comparison of interventions will include all randomised participants who began the intervention, and the analysis will be conducted on an ‘intention-to-treat’ principle. Once participants have been randomised to intervention groups, all available results are analysed in the groups to which they were allocated regardless of whether or not the participants complied with the intervention. In nutrition, interest sometimes focuses on the subset of participants who showed good compliance with the intervention (for a discussion of adequate levels of compliance see the Compliance section) and a ‘per-protocol’ analysis may then be more relevant even though this approach has a greater potential for introducing bias into the comparison of interventions.
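The distinction between the two analysis sets can be made concrete with a short sketch. All records, field names and the 80 % compliance threshold below are hypothetical assumptions; the point is only that the intention-to-treat set keeps every randomised participant in the allocated group, whereas the per-protocol set is a compliance-defined subset and is therefore no longer protected by randomisation.

```python
from statistics import mean

# Hypothetical a priori compliance cut-off (%) for the per-protocol set
PER_PROTOCOL_THRESHOLD = 80

# Hypothetical records for all randomised participants who began the intervention
participants = [
    {"id": "S001", "allocated": "test", "compliance": 95, "outcome": 4.1},
    {"id": "S002", "allocated": "test", "compliance": 90, "outcome": 4.3},
    {"id": "S003", "allocated": "test", "compliance": 40, "outcome": 5.0},
    {"id": "S004", "allocated": "test", "compliance": 30, "outcome": 5.2},
    {"id": "S005", "allocated": "control", "compliance": 92, "outcome": 5.0},
    {"id": "S006", "allocated": "control", "compliance": 88, "outcome": 5.2},
    {"id": "S007", "allocated": "control", "compliance": 85, "outcome": 4.9},
    {"id": "S008", "allocated": "control", "compliance": 91, "outcome": 5.3},
]


def mean_difference(records):
    """Mean outcome in the test group minus mean outcome in the control group,
    with participants analysed in the groups to which they were allocated."""
    test = [r["outcome"] for r in records if r["allocated"] == "test"]
    control = [r["outcome"] for r in records if r["allocated"] == "control"]
    return mean(test) - mean(control)


# Intention to treat: everyone, regardless of compliance
itt_effect = mean_difference(participants)

# Per protocol: only participants meeting the pre-specified compliance threshold
per_protocol = [p for p in participants if p["compliance"] >= PER_PROTOCOL_THRESHOLD]
pp_effect = mean_difference(per_protocol)

print(f"Intention-to-treat estimate: {itt_effect:.2f} (n = {len(participants)})")
print(f"Per-protocol estimate: {pp_effect:.2f} (n = {len(per_protocol)})")
```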
Discussion and interpretation
The interpretation of study findings and the discussion section of a paper should include a consideration of the study limitations, including any potential sources of bias (e.g. imbalance in baseline characteristics) and imprecision (in outcome assessments), and an acknowledgement of the possibility of statistically significant findings arising from multiple comparisons. The generalisability of the study findings should also be considered and limitations acknowledged.
Conclusions
The conclusions should be confirmed and justified by the accompanying data. The conclusions should relate directly to the hypothesis, to the test product at the amount consumed and to the population included in the study. Conclusions about secondary outcome measures should be stated as such and interpreted appropriately.
Current practice
In practice, the hypothesis is often not clearly stated, and this can make the validity of the conclusions difficult to judge. Sometimes, statistically significant results are overemphasised when these were not the original focus of the study design. Another issue that occurs frequently is that findings of studies are generalised to broader populations than may be reasonable given the participant characteristics. Inappropriate generalisation of study findings can also occur in relation to the specific product and the amount consumed, and to the duration of the study.
Roles and responsibilities of the research team
The complex issues involved in potential conflicts of interest and scientific bias, particularly when research funding may come from the food industry, have recently been discussed(Reference Rowe, Alexander and Clydesdale46). Furthermore, many journals now require statements of the roles and responsibilities of all members of the research team, including the funders or sponsors, and declarations of any potential conflicts of interest. We recommend that this should be a standard practice.
Acknowledgements
The authors would like to thank participants of the workshop ‘Beyond PASSCLAIM – Guidance to substantiate health claims on foods’ held in Nice, France from 14 to 16 December 2009. This workshop brought together over seventy experts from industry, academia and public bodies to discuss guidelines to establish beneficial effects of functional foods and outputs of the workshop have been incorporated into this paper. The study was commissioned by the Functional Foods Task Force of the European branch of the International Life Sciences Institute (ILSI Europe). Industry members of this task force (in 2009 and 2010) were Abbott Nutrition, Barilla G. & R. Fratelli, Bayer CropScience BioScience, Bionov, Cadbury, Cargill, Coca-Cola Europe, Colloïdes Naturels International, CSM, Danisco, Danone, Dow Europe, DSM, FrieslandCampina, Frutarom, International Nutrition Company, Kellogg Europe, Kraft Foods, La Morella Nuts, Mars, Martek Biosciences Corporation, McNeil Nutritionals, Monsanto Europe, Naturex, Nestlé, PepsiCo International, Pfizer, Puleva Biotech, Red Bull, Rudolf Wild KG, Soremartec Italia – Ferrero Group, Südzucker/BENEO Group, Syral, Tate & Lyle, Ülker Bisküvi, Unilever, Valio and Yakult Europe. For further information about ILSI Europe, please email info@ilsieurope.be. All authors contributed to the development of these guidelines and approved the final manuscript, which was collated by J. V. W. and R. W. W. The opinions expressed herein are those of the authors and do not necessarily represent the views of ILSI Europe nor those of its member companies. J.-M. A. is employed by Danone, J. d. V. was employed by CSM, O. H. is employed by Danisco, M. J. is employed by Unilever, M. R. is employed by Nestlé, S. T. is employed by Südzucker/BENEO Group, M. S. and S. V. are employed by ILSI Europe. R. W. W. has made presentations sponsored by food companies, and has been a member of research teams that have carried out projects funded wholly or partly by food companies. 
For those experts affiliated with academic or non-industrial institutions, ILSI Europe covered the expenses related to their participation in expert group meetings and the workshop held in Nice, and an honorarium has been provided.