Fe deficiency is the most common form of malnutrition and the most important cause of anaemia worldwide. According to the results of the 2016 Global Burden of Disease study, Fe-deficiency anaemia, with 34·7 million years lived with disability, was the fourth leading cause of years lived with disability in the world(1). In 2010, anaemia affected more than 2 billion people worldwide(Reference Kassebaum, Jasrasaria and Naghavi2), and it is estimated that approximately 50 % of anaemia cases are due to Fe deficiency(3). Based on the estimates of WHO, in 2006, the prevalence of anaemia in Iran was 35 % in pre-school children, 33 % in non-pregnant women of reproductive age and 40 % in pregnant women(4). In several studies, the prevalence of Fe-deficiency anaemia in different age groups of women in Iran was estimated to be about 14–30 %(Reference Amirkhani, Ziaedini and Dashti5–Reference Shams, Asheri and Kianmehr7). In a recent systematic review, the overall prevalence of Fe deficiency and Fe-deficiency anaemia among the Iranian population under 18 years of age was estimated to be 26·9 and 13·9 %, respectively(Reference Akbari, Moosazadeh and Tabrizi8).
Fe supplementation in high-risk groups is one of the feasible and effective approaches in reducing Fe deficiency(Reference Schultink and Gross9). In 2001, Iran’s Ministry of Health designed an integrated Fe-deficiency control programme for groups at high risk of Fe-deficiency anaemia. A national health promotion programme for female high-school students through nutritional education and Fe supplementation was part of this integrated programme. This programme aimed to increase the Fe intake among students by delivering free weekly Fe supplements. Although the programme has been running for more than a decade, few studies have evaluated its performance. A study in the south-east of Iran showed that 62·3 % of students had taken all the pills delivered to them during the programme implementation period(Reference Khammarnia, Amani and Hajmohammadi10). In another study, complete consumption of supplements was about 31 %(Reference Khammarnia, Amani and Hajmohammadi11). It seems that the programme has been somewhat unsuccessful; a comprehensive programme evaluation is warranted.
In surveys, direct questioning (DQ) is the usual method for asking questions. It has been shown that DQ about sensitive issues can influence the participants’ responses. Under DQ, respondents tend to under-report socially undesirable behaviours or over-report socially desirable behaviours, which is known as social desirability bias(Reference Krumpal12). This kind of bias could also occur when the implementation fidelity of health programmes is evaluated by DQ, leading to distorted answers about the dose of interventions delivered to respondents or their adherence to the programme. Evaluation of health intervention programmes at schools could be subject to this under- or over-reporting. Some students might provide dishonest answers for fear of being punished by teachers or other school staff, or of being blamed for non-adherence to the intervention. Thus, there will be some degree of overestimation of the intervention delivered to the students. In asking sensitive questions, indirect questioning methods could help obtain more valid estimates. They might increase the confidentiality of answers and protect respondents’ privacy(Reference Chaudhuri and Christofides13, Reference Warner14). The crosswise model (CM), introduced by Yu et al. in 2008, is one of the newest indirect questioning methods(Reference Yu, Tian and Tang15). It has been used successfully to estimate the prevalence of sensitive issues such as illicit drug use(Reference Khosravi, Mousavi and Chaman16, Reference Shamsipour, Yunesian and Fotouhi17), sexual behaviour(Reference Vakilian, Mousavi and Keramat18), use of anabolic steroids by bodybuilders(Reference Nakhaee, Pakravan and Nakhaee19) and researchers’ misconduct(Reference Roberts and St. John20). However, to the best of our knowledge, this method has not been previously used in assessing the implementation of health programmes.
In the present study, we aimed to: (i) evaluate the implementation of the Fe supplementation programme at senior high schools in West Azerbaijan Province in the north-west of Iran; and (ii) assess the usefulness of the CM for evaluating the health programme’s implementation.
Methods
Iron supplementation programme
Based on a joint agreement between the Ministry of Health and Medical Education and the Ministry of Education, a national health promotion programme through nutritional education and Fe supplementation started in all thirty-one provinces of Iran in 2001. The target population of the programme was female students at senior high schools.
According to the programme guideline, the directory of Population, Family and School Health of the Ministry of Health should arrange sessions to explain the goals of the programme and the importance of the intervention in reducing Fe deficiency, and should prepare educational materials and deliver Fe supplements at the beginning of each school year. The main intervention was conducted from October to February for 16 weeks. It consisted of three parts: (i) delivering one Fe pill regularly every week (sixteen pills per student in total); (ii) consumption of the pills by students under the supervision of teachers; and (iii) holding nutritional training sessions along with delivering the pills. If students were absent from school, the Fe pills should be delivered to them the next day.
Crosswise model
The CM is a type of non-randomized response technique for sensitive issues and is designed to overcome the disadvantages of previous indirect questioning methods, such as the randomized response technique and its extensions(Reference Chaudhuri and Christofides13). In this model, each sensitive question (S) is combined with an insensitive question (I). Both questions have a binary response (yes (1)/no (0)). The pair of sensitive and insensitive questions is followed by a joint response with two options, for example ‘A’ and ‘B’. Respondents are asked to give a joint response to the pair of sensitive–insensitive questions. They are asked to choose option ‘A’ if their response to both questions is the same (S = 1 and I = 1, or S = 0 and I = 0), or choose option ‘B’ in the case of one ‘yes’ and one ‘no’ answer (e.g. S = 1 and I = 0, or S = 0 and I = 1). In this way, the options neither specify the existence of sensitive behaviours in the respondent nor dismiss them. This gives a feeling of privacy protection and confidentiality to the respondent, so they can trust the model and avoid false positive or negative responses. The insensitive question should be independent of the sensitive one with known proportion p (e.g. season or month of birth); p must be unequal to 0·5. Using questions with more extreme values of p (i.e. close to, but not equal to 0 or 1) will result in a smaller variance for the proportion of sensitive behaviour. In other words, it means higher efficiency for the prevalence estimation. In the case of p = 0 or p = 1, the model would be the most efficient, but due to a trade-off between statistical efficiency and privacy protection, it would not offer any privacy protection to respondents. Yu et al. argued that working with this model is easy for both respondent and investigator(Reference Yu, Tian and Tang15). 
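The joint-response rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the study: the function names `joint_response`, `prob_option_a` and `variance_inflation` are hypothetical, and the example values (a prevalence of 0·3 and p = 0·25) are assumptions chosen only for illustration. It relies on the stated independence of the two questions, under which the probability of choosing option ‘A’ is πp + (1 − π)(1 − p), where π is the prevalence of the sensitive attribute.

```python
def joint_response(s: int, i: int) -> str:
    """Return 'A' when the answers to S and I agree, 'B' when they differ."""
    return "A" if s == i else "B"


def prob_option_a(prevalence: float, p: float) -> float:
    """Probability of option 'A', assuming S and I are independent."""
    return prevalence * p + (1 - prevalence) * (1 - p)


def variance_inflation(p: float) -> float:
    """Factor 1 / (2p - 1)^2 that scales the variance of the 'A' proportion.

    It grows without bound as p approaches 0.5, which is why p must differ
    from 0.5 and why more extreme p values are more efficient.
    """
    if p == 0.5:
        raise ValueError("p must not equal 0.5")
    return 1.0 / (2 * p - 1) ** 2


if __name__ == "__main__":
    # Neither option reveals the answer to the sensitive question on its own.
    for s in (0, 1):
        for i in (0, 1):
            print(f"S={s}, I={i} -> option {joint_response(s, i)}")
    # Hypothetical values, for illustration only.
    print(prob_option_a(prevalence=0.3, p=0.25))
    print(variance_inflation(0.25), variance_inflation(0.1))
```

The inflation factor makes the trade-off concrete: moving p away from 0·5 shrinks the variance, but at p = 0 or p = 1 the joint response would reveal the sensitive answer directly.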
In a recent study, Hoffmann et al.(Reference Hoffmann, Waubert de Puiseau and Schmidt21) showed that the CM had greater comprehensibility than the other indirect questioning methods, because the respondents simply have to read two questions and integrate their response into one answer option. Some studies showed that the CM provides more valid estimates than DQ(Reference Hoffmann, Diedenhofen and Verschuere22, Reference Hoffmann and Musch23).
Study design
Female high-school students in West Azerbaijan Province formed our target population. We conducted two cross-sectional studies: one for the DQ and the other for the CM. Both studies were conducted simultaneously, due to logistical problems and the limited time until the end of the school year. We used a multistage stratified cluster-random sampling method to choose the participants from the public and private high schools in urban and rural areas of all seventeen cities in this province.
Sample size
We estimated a sample size of 440 students for the DQ method using P = 0·5, type 1 error = 0·05, 7 % precision, response rate = 90 % and a design effect of 2 for cluster sampling. In comparison to DQ, CM estimates have larger standard error(Reference Jann, Jerke and Krumpal24). To get an adequate level of precision, the sample size of the CM survey must be larger. Thus, we considered a fourfold higher sample size for the CM survey. We selected 2180 students in the form of 109 clusters of twenty students. A sample of 1740 students was surveyed using the CM method. To increase the comparability of the samples, twenty-two clusters of students were randomly assigned to the DQ survey.
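The DQ sample size can be reproduced approximately with a short calculation. The text does not state which formula was used, so the sketch below assumes the usual formula for estimating a proportion, n = z²P(1 − P)/d² with z = 1·96, multiplied by the design effect, divided by the expected response rate and rounded up to whole clusters of twenty students. The function name `cluster_sample_size` and its parameter names are hypothetical.

```python
import math


def cluster_sample_size(p, precision, design_effect, response_rate,
                        cluster_size, z=1.96):
    """Sample size for a proportion under cluster sampling (assumed formula).

    Returns (students needed, clusters needed, students after rounding up
    to whole clusters).
    """
    # Simple-random-sampling size for the requested precision.
    n_srs = z ** 2 * p * (1 - p) / precision ** 2
    # Inflate for clustering and for expected non-response.
    n_adjusted = math.ceil(n_srs * design_effect / response_rate)
    # Round up to complete clusters.
    n_clusters = math.ceil(n_adjusted / cluster_size)
    return n_adjusted, n_clusters, n_clusters * cluster_size


if __name__ == "__main__":
    print(cluster_sample_size(p=0.5, precision=0.07, design_effect=2,
                              response_rate=0.9, cluster_size=20))
```

Under these assumptions the calculation gives 436 students, which rounds up to twenty-two clusters and 440 students, matching the figures reported for the DQ survey.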
Data collection instruments
We used two questionnaires, each made up of two parts. The first part of both questionnaires had the same questions about demographic characteristics. These questions were kept to a minimum so that students would feel a high degree of privacy protection; they covered birth date, school grade and parents’ education.
In the second part of the questionnaires we intended to evaluate three aspects of the programme implementation: delivering Fe pills, consuming them and holding training sessions for students. The second part of the DQ questionnaire was composed of six questions: two questions for each aspect. Each question was followed by a yes/no answer. In addition, we asked the students to report the number of Fe pills consumed.
In the second part of the CM questionnaire, we used the same six questions of the DQ questionnaire, but each question was paired with one insensitive question with a known probability for positive response (Table 1). Each of the paired questions had a joint response with two options: ‘A’ and ‘B’. We used an additional combination of two insensitive questions with known probabilities and joint response. The similarity of the expected probabilities of these insensitive questions with their observed answers could demonstrate that the technique worked properly, and it did not confuse the students. In order to assess the students’ understanding of the instructions and their trust in the technique keeping their information confidential, we added two related questions to the CM questionnaire. The content validity of the questionnaire was evaluated qualitatively by eight academic members. The reliability of the questionnaire was evaluated in a pilot study and its split-half correlation was 0·78.
P, known probability of the insensitive question.
Data were collected by trained health workers in 2014–2015 school years. The first page of the questionnaire contained instructions to complete the questionnaire. However, oral explanations on how to answer the questions were presented to the students by health workers. Students were assured about the confidentiality of responses.
Statistical analysis
Delivering the Fe supplements and training sessions to the students (which were asked in questions 1, 2, 5 and 6) was considered as ‘intervention delivered to them’. In the DQ survey, to estimate the proportion of participants who had the intervention delivered to them (PrDQ; i.e. the proportion of students who gave positive answers to questions 1, 2, 5 and 6), we divided the number of positive answers by the total number of answers for each question (n 1). The sample variance of the PrDQ was estimated using $\left[ {{\rm{P}}{{\rm{r}}_{{\rm{DQ}}}}\left( {{\rm{1}}-{\rm{P}}{{\rm{r}}_{{\rm{DQ}}}}} \right)} \right]/\left( {{n_{\rm{1}}}-{\rm{1}}} \right)$ .
In the CM survey, the proportion of participants who had the intervention delivered to them (PrCM) was calculated using the following formula:
$ {\rm Pr}_{\rm CM} = \frac{\lambda + p - 1}{2p - 1}, $
where p is the known probability of the insensitive question and λ is the proportion of participants who selected option ‘A’. The sample variance of PrCM and its standard error were calculated as:
$ {\rm var}({\rm Pr}_{\rm CM}) = \frac{\lambda (1 - \lambda)}{(n_2 - 1)(2p - 1)^2} $
and
$ {\rm SE}({\rm Pr}_{\rm CM}) = \sqrt{{\rm var}({\rm Pr}_{\rm CM})}, $
where n 2 is the total number of answers to each question(Reference Jann, Jerke and Krumpal24).
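A minimal Python sketch of the CM calculation follows. It assumes the standard crosswise estimator, (λ + p − 1)/(2p − 1), with sample variance λ(1 − λ)/[(n − 1)(2p − 1)²]; the function name `crosswise_estimate` and the example counts are hypothetical and are not data from this study.

```python
import math


def crosswise_estimate(n_option_a, n_total, p):
    """Estimate a proportion from crosswise-model answers (assumed estimator).

    n_option_a : number of respondents who chose option 'A'
    n_total    : total number of answers to the question pair
    p          : known probability of 'yes' on the insensitive question
    Returns (estimate, variance, standard error).
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    if n_total < 2:
        raise ValueError("at least two answers are needed")
    lam = n_option_a / n_total  # observed proportion choosing 'A'
    estimate = (lam + p - 1) / (2 * p - 1)
    variance = lam * (1 - lam) / ((n_total - 1) * (2 * p - 1) ** 2)
    # The raw estimate can fall outside [0, 1] in small samples;
    # it is returned here without truncation.
    return estimate, variance, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    est, var, se = crosswise_estimate(n_option_a=600, n_total=1000, p=0.25)
    print(f"estimate={est:.3f}, SE={se:.4f}")
```

With p = 0·25 the variance of the ‘A’ proportion is multiplied by 4, which illustrates why the CM survey needed a larger sample than the DQ survey for comparable precision.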
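To compare the differences between CM and DQ estimates, a two-tailed Z test for independent samples was calculated using the following formula:
$ Z = \frac{{\rm Pr}_{\rm DQ} - {\rm Pr}_{\rm CM}}{\sqrt{{\rm var}({\rm Pr}_{\rm DQ}) + {\rm var}({\rm Pr}_{\rm CM})}}, $
where Z is assumed to have a standard normal distribution.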
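The comparison of the two estimates can be sketched as follows. The sketch assumes the usual form of a Z test for two independent estimates, in which the variance of the difference is the sum of the two sample variances; the function names `dq_estimate` and `two_sample_z` and the numbers in the example are hypothetical.

```python
import math


def dq_estimate(n_yes, n_total):
    """Proportion of 'yes' answers under DQ and its sample variance."""
    pr = n_yes / n_total
    return pr, pr * (1 - pr) / (n_total - 1)


def two_sample_z(est_1, var_1, est_2, var_2):
    """Two-tailed Z test for the difference of two independent estimates."""
    z = (est_1 - est_2) / math.sqrt(var_1 + var_2)
    # Two-sided p-value under the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    pr_dq, var_dq = dq_estimate(n_yes=300, n_total=400)
    z, p_value = two_sample_z(pr_dq, var_dq, 0.60, 0.0010)
    print(f"Z={z:.2f}, P={p_value:.4f}")
```

Using `math.erfc` keeps the sketch free of third-party dependencies; a statistics library would give the same two-sided P value.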
Results
Table 2 shows characteristics of the participants in the two surveys. No significant differences existed in the proportion of rural and urban students between the two surveys. There were, however, different proportions of students by school type.
† P values are based on the χ 2 test (for proportions).
Table 3 presents the estimated proportions for each of the questions about the Fe supplementation programme in the CM and DQ surveys. The CM estimates were consistently lower than the DQ estimates except for the responses to the question about delivering at least one training session to the students. The largest difference between the estimates of the two methods was observed in the proportion of Fe pills delivered weekly on a regular basis (73·2 % in DQ in contrast to 22·5 % in the CM survey). The proportions of students in the DQ and CM who reported delivery of at least one pill were 99·6 and 78·0 %, respectively. The proportions of all students in the DQ and CM who took all the pills delivered to them were 43·0 and 31·3 %, respectively. Of 364 students who reported the number of pills consumed in the DQ survey, 12 % had taken sixteen pills (complete dose) during the implementation period (results not shown).
* P = 0·2954,
** P < 0·001,
*** P = 0·001.
† The difference between the results of DQ and CM.
‡ Standard error of the difference, estimated by $\sqrt {({\mathop{\rm var}} ({{\Pr }_{\rm CM}}))} \times 100$ .
No significant differences existed between the expected and observed probabilities of the two insensitive questions that were paired together (Table 4).
When the second question presented as insensitive question: *P = 0·205.
When the first question presented as insensitive question: **P = 0·248.
In the CM survey, about 72 % of the students stated that they fully understood the instructions to complete the questionnaire, and 67 % of the students were highly or very highly confident that the CM kept their information confidential. The proportions of the students who had low or very low levels of understanding of the instructions and of trust in the CM method were very low (8·3 and 4·7 %, respectively). We found a positive correlation between understanding the technique and trusting the confidentiality of the information. In other words, students who understood the completion instructions well also tended to trust the method highly (Spearman’s ρ = 0·352, P < 0·001; Table 5).
Discussion
In the present study, we evaluated the implementation of the national health promotion programme for female high-school students using the CM and DQ methods. The CM resulted in estimates that were lower than the DQ estimates. In other words, degrees of over-reporting existed in the positive responses to the questions about the implementation of all aspects of the programme in the DQ survey. Our results indicate that all the three major aspects of the programme (delivering and consumption of the supplements, holding nutritional training sessions) were poorly implemented.
We considered evaluating the implementation of the programme as a kind of sensitive issue. Using DQ to estimate the prevalence of sensitive issues can have consequences including wrong answers, item non-response, over- or under-reporting of the problem and low response rates. Therefore, in addition to DQ, we evaluated the programme implementation by the CM, an indirect method for asking sensitive questions. It has been used successfully in research surveys about highly personal or sensitive questions such as addiction(Reference Khosravi, Mousavi and Chaman16), sexual behaviour(Reference Kazemzadeh, Shokoohi and Baneshi25), cheating and plagiarism(Reference Roberts and St. John20). A more-is-better hypothesis has been assumed in these studies, and results indicated that the CM method provides a higher prevalence for sensitive behaviour(Reference Yu, Tian and Tang15). In the present study, we assumed that social desirability bias could lead to inflated and biased estimates of services delivered to the students. Therefore, we assumed that the size of the CM estimates for each item would be lower than the DQ estimates, and we considered a lower-is-better hypothesis. Our findings revealed that DQ can be subject to considerable over-reporting in the evaluation of the health programme at schools. We found significant differences between the results of the DQ and CM methods with regard to the three major aspects of the programme. For example, we found a difference of 51 percentage points between the estimates of the DQ and CM methods for the proportion of students to whom Fe supplements were delivered weekly and regularly, and a difference of 12 percentage points for consuming all the pills delivered to the students (Table 3). Hence, it seems that the proportion of the students who consumed all sixteen pills might be lower than that reported in the DQ survey. These findings confirmed our lower-is-better hypothesis.
The success of the CM depends heavily on two factors: understanding of the instructions for completing the questionnaire, that is, the comprehensibility of the method; and trust in the confidentiality of personal information. In the CM survey, we found a positive relationship between understanding the method and trusting in its privacy protection. A considerable proportion of the students in the CM survey stated that they had understood the completion instructions well and believed in the confidentiality of their personal information. The proportion of the students who did not understand the completion instructions of the CM questionnaire or did not trust it was very low. The joint distribution of the students’ responses to these questions revealed that students who did not understand the method tended to have little trust in it, and vice versa. This is unlike the results of the study by Hoffmann et al.(Reference Hoffmann, Waubert de Puiseau and Schmidt21), who did not find any correlation between these two constructs. In addition, we considered the lack of missing data in the results of the CM, which indicates that distrust of the CM or lack of understanding of the instructions by some of the students did not lead to non-response to questions.
In our DQ survey, the proportion of students who received Fe supplements regularly and weekly was slightly lower than the results of the study conducted by Kheirouri and Alizadeh (79·3 %) in Tabriz, the capital of East Azerbaijan Province(Reference Kheirouri and Alizadeh26). This difference may be due to the fact that our study was done in rural and urban schools of all seventeen cities of the province. It is possible that the implementation of the programme in provincial cities and rural areas differs from its implementation in the capital city of a province. In our DQ survey, furthermore, the proportion of the students who consumed all the Fe pills delivered (43 %) was similar to that in the study conducted by Khammarnia et al.(Reference Khammarnia, Amani and Hajmohammadi11) in Semnan (38·2 %), but was remarkably lower than that found by Kheirouri and Alizadeh (62·3 %) in Tabriz(Reference Kheirouri and Alizadeh26). The low consumption rates of Fe pills could be attributed to adverse effects of the Fe pills, lack of water supply for taking the pills in class and students’ low levels of health knowledge. The latter may be the result of insufficient nutritional education provided by teachers, the limited number of training sessions, teachers’ unwillingness and reluctance to hold training sessions, and insufficient time to hold them, similar to what was found in other studies in Iran(Reference Khammarnia, Amani and Hajmohammadi11, Reference Kheirouri and Alizadeh26, Reference Karimi, Hajizadeh-Zaker and Ghorbani27). The results of our DQ survey revealed that almost one-third of the students received regular educational sessions along with the distribution of pills. The small number of training sessions might have affected the consumption of pills by students. It might explain the very low percentage of students (12 %) who consumed all sixteen pills during the programme implementation period.
These findings show that the programme was poorly implemented, and it cannot be successful in achieving its goals.
The present study has some strengths. To the best of our knowledge, it is the first study that used the CM for the purpose of examining the implementation of a health programme. Also, we carried out a DQ survey concurrent with the CM and compared the results. We selected clusters of students randomly for the two methods and believed that we could avoid bias from the time lag between the interviews in our results. We used a larger sample size in the CM method than the DQ method to get the estimates with sufficient precision. Furthermore, according to some authors, asking questions about an issue that may seem repetitive should be avoided(Reference Khosravi, Mousavi and Chaman16). It may affect the respondents’ trust. They may think that the researcher tries to find their response by linking questions to each other. To overcome that, we used a limited number of questions for each aspect of the programme, which in turn limited the amount of the information obtained.
Our study has some limitations. First, we did not ask any questions about the comprehension of the DQ survey and trusting in its privacy protection. Instead, we considered the non-responses to some items in the two surveys. Some degree of non-response appeared in the DQ, which means that some students did not trust in the DQ method completely. In an empirical study that compared the DQ method with some indirect questioning methods, Hoffmann et al.(Reference Hoffmann, Waubert de Puiseau and Schmidt21) showed greater comprehensibility and lower perceived privacy protection of DQ than indirect questioning methods, including CM. Second, we assumed that social desirability bias and non-response bias could threaten the validity of the programme implementation quality and the estimates obtained through the DQ method, so we used CM to obtain more valid estimates. But we had no valid estimates about the known prevalence of the intervention delivered, or adherence of the students to the intervention, which could call into question the validity of our CM estimates. To overcome this problem, we used the results of the DQ method as a control for the results of the CM method. In addition, we considered the lack of missing data and the high frequency of students who understood and trusted in the CM method. Furthermore, we used a set of two insensitive questions in the CM questionnaire and compared the corresponding estimates with their known probabilities to assess the degree of accuracy in the responses of the students. No significant differences existed. This means students understood the completion instructions and trusted in it, so that their correct answers to these questions could signify valid estimates for the questions about the programme implementation. 
Also, the results of two recent studies using empirical data showed that the CM method is capable of providing a valid estimate for the prevalence of sensitive behaviour compared with DQ(Reference Hoffmann, Diedenhofen and Verschuere22, Reference Hoffmann and Musch23). Some studies, nevertheless, showed that using the CM leads to non-ignorable false positives that make the validity of the CM estimates questionable(Reference Höglinger and Diekmann28, Reference Höglinger, Jann and Diekmann29). These studies, however, used a very rare condition in the population to show the rate of false positives in the estimates. More research is needed on the validity of CM estimates when the condition of interest is not rare, and both false positive and false negative results are possible.
Conclusion
In conclusion, the present results showing poor implementation of all aspects of the programme at schools were in line with the findings of previous studies. The poor implementation quality of the programme and the incomplete and irregular consumption of Fe supplements by the female high-school students make the programme ineffective in reducing both Fe deficiency and Fe-deficiency anaemia in this group. Furthermore, our results showed that the CM might give better estimates about the implementation of health programmes at schools. Much more research is needed, however, both in assessing the validity of the results of the CM, and on the application of this method to evaluate health programmes at schools and in other settings.
Acknowledgements
Acknowledgements: Deepest gratitude and appreciation go to all respected nutrition experts in the health departments of Urmia University of Medical Sciences and other individuals who have cooperated sincerely in conducting this plan. Financial support: This work was supported by research deputy of the Urmia University of Medical Sciences (grant number 92-01-41-1111). Research deputy of the Urmia University of Medical Sciences had no role in the design, analysis or writing of this article. Conflict of interest: No conflict of interest exists for any of the authors associated with the manuscript. Authorship: M.B. and S.N.S. were the study designers and programme managers. S.M. participated in data handling and analysis, interpretation of the results and full manuscript preparation. F.B., F.S. and F.R. participated in executive managing for sampling the participants and data collection. Z.A. participated in preparing the draft of the manuscript and interpretation of the results. All authors have seen and approved the final manuscript that has been submitted. Ethics of human subject participation: This study was conducted according to the guidelines laid down in the Declaration of Helsinki and all procedures involving human subjects were approved by the ethics committee of the Urmia University of Medical Sciences. Written informed consent was obtained from the parents of all students.