INTRODUCTION
The epidemiological pattern of tuberculosis (TB) in low-incidence countries is changing, with an increasing number of TB patients living in urban areas [Reference Fujiwara, Frieden, Rom and Garay1–Reference Hayward3]. This is due to overrepresentation of immigrants from countries with a high incidence of TB in large cities, and to urban risk groups for TB such as illicit drug users and homeless persons [Reference Valin4–Reference Story, Van Hest and Hayward6]. Conventional TB control methods such as contact-tracing and preventive treatment are inadequate among marginalized care-avoiders [Reference De Vries and Van Hest5, Reference Barnes7, Reference Bock8]. As an alternative, radiological screening programmes for illicit drug users and homeless persons have been recommended in European cities [Reference Kumar9–Reference De Vries, Van Hest and Šebek12].
TB re-emerged among illicit drug users and homeless persons in Rotterdam (population ∼600 000) in 2001, after periodic radiological screening was discontinued in 1996. In response, a periodic radiological screening programme was re-introduced in May 2002, using a mobile digital X-ray unit (MDXU) and visiting day and night shelters and hostels for homeless persons, methadone-dispensing centres and safe drug consumption rooms for opiate users, as well as the street prostitution zone in Rotterdam. The programme aimed to screen clients of these facilities and services bi-annually [Reference De Vries and Van Hest5, Reference De Vries, Van Hest and Richardus13].
For priority setting, service planning and resource allocation it is necessary to know the number of persons in a targeted group. This number can also be used to assess the coverage of an intervention [Reference Smit, Reinking and Reijerse14]. Often direct (enumeration) techniques are not feasible to estimate the size of hidden populations and indirect techniques have to be used. One such indirect technique, capture–recapture analysis [15–Reference Hook and Regal17], has been used to estimate the size of hidden populations, including illicit drug users [Reference Hay and McKeganey18, Reference Buster, van Brussel and van den Brink19], and homeless persons [Reference Fisher20, Reference Gurgel21]. However, capture–recapture analysis preferably needs at least three linked data sources, which are not always available for hidden populations. As an alternative, truncated models are described in the literature [Reference Hook and Regal17, Reference Wilson and Collins22, Reference Van Hest23]. Contrary to conventional capture–recapture analysis, truncated models can use frequency data from a single source of information. These models have been applied to estimate the size of hidden populations such as criminals [Reference Rossmo and Routledge24, Reference Van der Heijden, Cruyff and Van Houwelingen25], illegal residents [Reference Van der Heijden26], and illicit drug users and homeless persons [Reference Smit, Reinking and Reijerse14, Reference Hser27–Reference Hay and Smit30].
The objective of this study is to estimate the coverage of a mobile TB screening programme among illicit drug users and homeless persons in Rotterdam, using simple truncated models.
METHODS
Ethics committee approval was not required for this study.
Study design, participants and study years
Participants in this descriptive study are individuals who use the services of shelters and hostels for homeless persons, methadone-dispensing centres or safe drug-consumption rooms for opiate users, or who work in the street prostitution zone in Rotterdam, and who had at least one chest X-ray taken in the MDXU of the mobile TB screening programme between 1 January 2003 and 31 December 2005. Data from 2002 were excluded because it was an incomplete year of screening in which not all facilities were visited twice by the MDXU. A proportion of individuals in the target group use multiple facilities, so their chest X-rays can be taken at different locations, sometimes more than twice yearly. Chest X-rays were read by public health TB physicians on location or within a few working days at the Public Health Service.
Data collection and validation
Data on participants of the MDXU screening programme, such as name, date of birth, sex, date of chest X-ray and chest X-ray result, are routinely entered into the electronic Client Information System of the Tuberculosis Control Section of the Municipal Public Health Service Rotterdam-Rijnmond, using a unique personal identification number. To avoid misclassification of individuals due to clerical errors such as misspelling of names or typing errors, all names and dates of birth of the participants were double-checked in the Client Information System during data entry. Since 2005 the Universal Mobile Telecommunications System (UMTS) has provided a wireless connection between the MDXU and the Client Information System, which facilitates checking the personal data of participants on location. The number of individuals participating in the TB screening programme and the frequency of their visits per year and for the total study period were extracted from the Client Information System.
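The extraction step described above can be sketched as a simple tally. The following is a minimal illustration only, assuming that each chest X-ray is available as a (personal identification number, date) pair; the function name `frequency_classes` and the example records are hypothetical and are not taken from the Client Information System.

```python
from collections import Counter
from datetime import date


def frequency_classes(records, year):
    """Tally how many individuals were screened once, twice, ... in one year.

    records: iterable of (person_id, xray_date) pairs (hypothetical layout).
    Returns (freq, obs_n), where freq[k] is the number of individuals with
    exactly k chest X-rays in that year and obs_n is the number of distinct
    individuals screened at least once.
    """
    visits_per_person = Counter(
        person_id for person_id, xray_date in records if xray_date.year == year
    )
    freq = Counter(visits_per_person.values())
    return dict(freq), len(visits_per_person)


# Hypothetical records, for illustration only.
records = [
    ("A", date(2003, 2, 1)), ("A", date(2003, 9, 1)),
    ("B", date(2003, 3, 5)),
    ("C", date(2003, 3, 5)), ("C", date(2003, 10, 2)), ("C", date(2004, 3, 1)),
    ("D", date(2004, 4, 4)),
]

freq_2003, obs_n_2003 = frequency_classes(records, 2003)
print(freq_2003, obs_n_2003)
```

Only the resulting frequency counts, not the personal identifiers, are needed for the truncated models.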
Truncated models
The number of illicit drug users and homeless persons in the target group for the mobile TB screening programme, and hence the coverage of the programme, was estimated through simple truncated models. As two examples we used Zelterman's truncated Poisson mixture model and Chao's truncated heterogeneity model, although their results were expected to be similar. Both can be applied to frequency counts of observations of individuals in a single register [Reference Chao31–Reference Chao33]. Truncated models aim to estimate the number of unobserved persons in the (truncated) zero-frequency class from the information in the lower observed frequency classes, assuming a specific truncated distribution of the observed data, e.g. Poisson, binomial or a mixture [Reference Hook and Regal17, Reference Chao31–Reference Chao34]. Observed frequency distributions may not be strictly Poisson, and to relax this assumption Zelterman and Chao based their models on a Poisson mixture distribution. This makes the models more flexible and more applicable to real-life data, because they explicitly cater for departures from the strict Poisson assumption. Zelterman's Poisson mixture model of the estimated total population size, est(N), is given by
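$$\mathrm{est}(N) = \frac{\mathrm{obs}(N)}{1 - \exp\left(-2f_2/f_1\right)}$$
and Chao's heterogeneity model by
$$\mathrm{est}(N) = \mathrm{obs}(N) + \frac{f_1^2}{2f_2}$$
where $f_1$ denotes the number of persons falling in the first frequency class, $f_2$ denotes the number of persons falling in the second frequency class, obs(N) denotes the number of all observed individuals and exp is the exponential.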
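As an illustration of how little computation the two estimators require, the sketch below implements the standard published forms of Zelterman's and Chao's estimators, est(N) = obs(N)/[1 − exp(−2f2/f1)] and est(N) = obs(N) + f1²/(2f2). The function names and the input counts are hypothetical and are not the study data.

```python
import math


def zelterman_estimate(f1, f2, obs_n):
    """Zelterman's truncated Poisson mixture estimate of the population size."""
    if f1 <= 0 or f2 <= 0:
        raise ValueError("f1 and f2 must both be positive")
    rate = 2 * f2 / f1  # Poisson rate estimated from the two lowest classes
    return obs_n / (1 - math.exp(-rate))


def chao_estimate(f1, f2, obs_n):
    """Chao's heterogeneity estimate of the population size."""
    if f2 <= 0:
        raise ValueError("f2 must be positive")
    return obs_n + f1 ** 2 / (2 * f2)


# Hypothetical counts, for illustration only (not taken from Table 2).
f1, f2, obs_n = 1200, 600, 1900
print(round(zelterman_estimate(f1, f2, obs_n)))
print(round(chao_estimate(f1, f2, obs_n)))
```

Both functions use only the two lowest frequency classes and the observed total, which is why the point estimates can equally be obtained on a pocket calculator.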
The simple truncated models do not need statistical packages and have performed well when compared to log-linear capture–recapture analysis [Reference Hook and Regal35]. They supposedly perform well even when data are sparse. Frequency data are also less sensitive to privacy regulations. The truncated models of Zelterman and Chao were previously used to estimate the number of problematic illicit drug users in Rotterdam, and detailed conceptual aspects of these models have been described [Reference Smit, Reinking and Reijerse14, Reference Smit, Toet, Van der Heijden, Hay, McKeganey and Birks28, Reference Hay and Smit30]. An overview of a range of truncated models is given elsewhere [Reference Wilson and Collins22]. The underlying assumptions and limitations of truncated models will be discussed later.
Coverage
The annual coverage is defined as the number of individuals screened at least once per year [obs(N) or the annual case-ascertainment] divided by the estimated annual number of illicit drug users and homeless persons in the target group for periodic TB screening [est(N)], expressed as a percentage [obs(N)/est(N)×100]. This definition differs from Chao's use of the word coverage in her heterogeneity model article [Reference Chao31], where it refers to the proportion of times that the confidence interval includes the true number of cases in a simulation study, and from its use in another well-known publication of Chao's, where it refers to a measure that quantifies the source overlap information [Reference Chao36].
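The coverage definition can be written out as a short calculation. This is a sketch only: the second function assumes that the coverage at the intended screening frequency (two or more chest X-rays per year) is the number of individuals seen more than once, obs(N) minus the first frequency class, divided by est(N), which is our reading of the definition rather than a formula stated in the text. All names and input values are hypothetical.

```python
def coverage_percent(obs_n, est_n):
    """Annual coverage: individuals screened at least once / estimated target group."""
    if est_n <= 0 or obs_n > est_n:
        raise ValueError("est_n must be positive and at least obs_n")
    return 100 * obs_n / est_n


def repeat_coverage_percent(obs_n, f1, est_n):
    """Share of the estimated target group with two or more chest X-rays.

    Assumption: individuals screened at least twice = obs_n - f1.
    """
    if f1 > obs_n:
        raise ValueError("f1 cannot exceed obs_n")
    return 100 * (obs_n - f1) / est_n


# Hypothetical values, for illustration only.
obs_n, f1, est_n = 1900, 1200, 3100
print(f"at least once: {coverage_percent(obs_n, est_n):.1f}%")
print(f"at least twice: {repeat_coverage_percent(obs_n, f1, est_n):.1f}%")
```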
RESULTS
Between 1 January 2003 and 31 December 2005 a total of 7075 chest X-rays were made of 3034 individuals. Table 1 shows the total number of screened individuals per frequency class and number of chest X-rays taken. Nearly half of the individuals screened (45·6%) entered the programme only once.
Table 2 shows the annual number of screened individuals, people not previously screened and number of X-rays taken, per frequency class and in total. The annual number of individuals screened gradually decreased over the years. The annual number of people not previously screened strongly decreased but in 2004 and 2005 a considerable number of these persons still entered the programme. The annual number of individuals in the first frequency class (seen once), second frequency class (seen twice) and total number of individuals screened respectively represent $f_1$, $f_2$ and obs(N) in the formulas of the truncated models.
Note to Table 2: * Corrected for screening of a large shelter in January 2005 planned for December 2004.
Table 3 shows the annual observed and estimated number of illicit drug users and homeless persons in the target group for periodic TB screening for the two truncated models, as well as the estimated coverage of the mobile TB screening programme. The estimates of Chao's model are slightly higher than, but in the same range as, those of Zelterman's model. The radiological mobile targeted TB screening programme reaches about 63% of the estimated target population at least once per year. Coverage at the intended screening frequency (at least two chest X-rays per person per year) was about 22%, 25% and 21% in 2003, 2004 and 2005, respectively.
Note to Table 3: obs(N), number of individuals observed; est(N), number of individuals estimated; CI, confidence interval.
DISCUSSION
Main findings
This study demonstrates that truncated models can be used relatively easily on available single-source routine data to estimate the size of a hidden population of illicit drug users and homeless persons. Our results show that a radiological mobile targeted TB screening programme among illicit drug users and homeless persons in Rotterdam reaches about two-thirds of the estimated target population at least once per year. Between 21% and 25% of the estimated target population meets the objective of the programme and has two or more chest X-rays taken per year.
Limitations
As with capture–recapture analysis, the validity of the estimates of truncated models depends on the possible violation of the underlying assumptions. These assumptions are perfect identification (i.e. no misclassification of the number of visits of one client), a closed population (i.e. no in-migration or out-migration in the time period studied), ideally but not necessarily a homogeneous population (i.e. no subgroups with markedly different probabilities to be observed and re-observed), a constant probability of being observed (i.e. there should be no individual behavioural response and the probability of being re-observed should not be influenced by the experience of a previous visit) and, as explained earlier in the Methods section, a specific truncated distribution of the observed data [Reference Smit, Reinking and Reijerse14, Reference Hay and Smit30].
Perfect identification assumption
In this programme individuals were assigned unique identification numbers in the Client Information System and personal identifiers were double-checked upon data entry to avoid misclassification. The staff of the facilities visited assisted the programme by providing a list of names and dates of birth of clients eligible for screening. Most clients had personal identification cards, which were checked at screening. Furthermore, social workers from the services assisted on the day of screening, which also reduced the possibility of misclassifying individuals. Violation of the perfect identification assumption is therefore considered minimal.
Closed population assumption
To reduce bias as a result of violation of the closed population assumption we divided the study into 1-year periods. The MDXU visits each location for one day twice a year. This limits the opportunity for passers-by and short-term clients to be observed. Tables 1 and 2, however, show that every year a substantial number of people not previously screened enter the programme. These can be individuals belonging to the target group but not yet captured by the screening programme, individuals not belonging to the target group or individuals who recently joined the target group. Influx of the last two categories will result in annual estimates of the target population of long-term illicit drug users and homeless persons being too high and hence the estimate of the screening programme coverage being too low.
Homogeneity assumption
Some problematic illicit drug users and homeless persons, such as cocaine users or persistent rough sleepers, will never be reached. Their likelihood of attending the TB screening programme is zero because they never utilize the facilities and services. This group is not included in the truncated model estimate [Reference Smit, Toet, Van der Heijden, Hay, McKeganey and Birks28].
We cannot exclude the possibility that some individuals entering the screening programme, e.g. among those entering the programme only once, do not belong to the group of long-term illicit drug users and homeless persons. In a previous conventional log-linear capture–recapture estimation of the number of clients of a methadone maintenance programme it was demonstrated that differences in capture probabilities between the population of interest, problematic drug users, and the sampled population, which also included non-problematic drug users, could lead to considerable overestimation of the size of the population of interest [Reference Buster, van Brussel and van den Brink19].
We cannot exclude heterogeneity among individuals belonging to the target group entering the screening programme, but the opportunities to participate in the screening (opting-out strategy) or not to participate (not attending the facility or service on the day of screening) are assumed to be largely similar for the majority. The truncated models are arguably more robust to violation of the homogeneity assumption because they are partly based upon the lower frequency classes, which are assumed to resemble the zero-frequency class more closely. The relative insensitivity to violation of the homogeneity assumption of Zelterman's and Chao's models is also supported mathematically and through simulation studies [Reference Smit, Reinking and Reijerse14, Reference Wilson and Collins22, Reference Zelterman32]. However, in the presence of heterogeneity they can underestimate the population.
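The direction of this bias can be illustrated with a small simulation. The sketch below is not part of the study: it draws visit counts for a hypothetical population of known size, once with a single Poisson rate and once with two subgroups that have markedly different rates, and applies the standard forms of Chao's and Zelterman's estimators to the non-zero counts. The population size, the rates and the random seed are arbitrary assumptions.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw one Poisson(rate) count using Knuth's multiplication method."""
    limit = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def estimate_from_rates(rates, rng):
    """Simulate one visit count per person; return (obs_n, chao, zelterman)."""
    counts = [poisson_draw(rng, rate) for rate in rates]
    obs_n = sum(1 for c in counts if c > 0)  # persons seen at least once
    f1 = counts.count(1)
    f2 = counts.count(2)
    chao = obs_n + f1 ** 2 / (2 * f2)
    zelterman = obs_n / (1 - math.exp(-2 * f2 / f1))
    return obs_n, chao, zelterman


rng = random.Random(2003)  # fixed seed so the sketch is reproducible
TRUE_N = 20000             # hypothetical true population size

# Homogeneous population: everyone has the same expected number of visits.
homogeneous = estimate_from_rates([1.0] * TRUE_N, rng)

# Heterogeneous population: half rarely attend, half attend often.
half = TRUE_N // 2
heterogeneous = estimate_from_rates([0.2] * half + [3.0] * half, rng)

for label, (obs_n, chao, zelterman) in (
    ("homogeneous", homogeneous),
    ("heterogeneous", heterogeneous),
):
    print(f"{label}: observed {obs_n}, Chao {chao:.0f}, Zelterman {zelterman:.0f}")
```

Under these assumptions both estimators recover the true size closely in the homogeneous case and fall short of it in the heterogeneous case, while still improving on the observed count alone.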
An alternative approach to estimating a heterogeneous population would be to use a population mixture model. Such a model (for the data in Table 1) regards the eligible population for each visit as a mixture of ‘local clients at the facilities’, who have six opportunities to be observed, and ‘roaming clients’ from other facilities, who visit more places than their own facility and can therefore also be captured by the MDXU at other facilities. Roaming clients possibly have more than six opportunities in total to be observed. For each visit the capture of the local clients could be modelled as binomial (6, p1) and the capture of roaming clients as Poisson (lambda), where lambda is probably less than 6 times p1. However, in our population of homeless persons and illicit drug users a clear distinction between local and roaming clients would be arbitrary, as many clients use multiple services, e.g. methadone-dispensing centres due to their addiction and day- and night-care facilities due to their homelessness, and their need for specific services may change over time. Furthermore, we did not consider such a population mixture model or an E-M algorithm because, although more accurate, their complexity is at odds with the appealing ease of use of the simple truncated models. For the purpose of our study, more exact but computationally complex estimates were less important than the simplicity of a method that should be close enough. As described for capture–recapture analysis, simple truncated models are useful under certain circumstances, e.g. when the probable direction of the bias caused by violation of the underlying assumptions can be predicted and plausible lower and upper boundaries of the prevalence or incidence of a disease or the coverage of a community health-care intervention can be estimated [Reference Hook and Regal17, Reference Hook and Regal37, Reference Hook and Regal38].
Constant (re)observation probability assumption
For the majority of the individuals in the target group of the mobile TB screening programme the facilities and services where screening took place meet important needs, namely methadone and shelter. These needs are probably constant over time, creating a considerable probability of attending the services. Frequent users have the highest risk of TB but are also most likely to be screened. Although incentives, e.g. chocolate bars, were given to participants at some locations, it is unlikely that this creates an important positive behavioural response to participate again. This also applies to clients with radiographic abnormalities inconsistent with TB, as they are referred to a chest physician in one of the general hospitals in Rotterdam where further analysis and follow-up are performed. The opting-out strategy and (strong) persuasion by the staff of the social and medical services to participate prevent a negative behavioural response. The pressure particular institutions put on their clients to participate in the screening programme is considered relatively constant on each screening day. The coverage of the screening programme will never be perfect, as each year a proportion of the target group will temporarily have a low or zero probability of attending, e.g. due to admission to a rehabilitation clinic or a prison sentence. Finally, it has been explained elsewhere that the probability of being observed does not have to be constant as long as a capture or non-capture does not influence a possible change in probability [Reference Van der Heijden26].
Poisson distribution of the observed data assumption
Zelterman and Chao based their model on a Poisson mixture distribution, catering for departures from the strict Poisson assumption. We have examined whether the Zelterman model used tolerates the departures from the Poisson distribution observed in our data. We have performed negative binomial regression, with number of times screened as the covariate and number of individuals as the outcome, on the Table 1 data (counting >6 as 6). The variance of the data is larger than that of a Poisson distribution. This overdispersion is statistically significant (P=0·11), but small (alpha=0·024), and so does not invalidate the use of Zelterman's estimator [Reference Zelterman32]. Therefore it seems reasonable to use this simple model in the context of our study, as explained earlier.
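A quick informal check for departures from the Poisson assumption, separate from the negative binomial regression used here, is to look at the ratios (k+1)·f(k+1)/f(k) of neighbouring frequency classes. Under a strict Poisson distribution these ratios all estimate the same rate; when individuals differ in their rates (a Poisson mixture) the ratios tend to increase with k. The sketch below is a different, simpler diagnostic than the one reported in the text, and the counts are hypothetical rather than the Table 1 data.

```python
def poisson_ratios(freq):
    """Return {k: (k + 1) * f(k+1) / f(k)} for neighbouring frequency classes.

    freq maps the number of times screened (k >= 1) to the number of
    individuals screened exactly k times.
    """
    return {
        k: (k + 1) * freq[k + 1] / freq[k]
        for k in sorted(freq)
        if k + 1 in freq and freq[k] > 0
    }


# Hypothetical counts, for illustration only.
poisson_like = {1: 1200, 2: 600, 3: 200, 4: 50}
overdispersed = {1: 1200, 2: 600, 3: 300, 4: 150}

print(poisson_ratios(poisson_like))   # roughly constant ratios
print(poisson_ratios(overdispersed))  # ratios that increase with k
```

Zelterman's estimator uses only the first of these ratios, 2f2/f1, which is one way of seeing why it is relatively tolerant of departures from the Poisson distribution in the higher frequency classes.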
A further limitation is that persons in the target group could have indicated on the day of screening that a chest X-ray was recently taken in the MDXU, in a general hospital, upon detention in prison or at the Tuberculosis Control Section upon referral, exempting them from the screening exercise. This information, together with improved experience, better coordination and UMTS access over the years, would prevent some clients from being recorded twice or more than twice yearly in the screening programme, as reflected in Table 2, leading to overestimation, but we assumed this effect to be limited.
Cross-validation of the estimates of the target group
The number of problematic illicit drug users in Rotterdam, already including many homeless persons, was most recently estimated in 2003 with two-source capture–recapture analysis, using a similar case-definition, which observed and estimated 1910 and 2856 clients respectively [Reference Biesma, Snippe and Bieleman39]. These numbers are similar to our results in 2003.
Alternative simple truncated models
Although we used truncated Poisson mixture models, an alternative is to use a truncated binomial model such as $\mathrm{est}(N) = \mathrm{obs}(N) + f_1^2/(4f_2)$. This model, close to Chao's model, estimates a lower number of 2432, 2181 and 2015 illicit drug users and homeless persons in 2003, 2004 and 2005 respectively, resulting in a slightly higher estimated coverage of the screening programme.
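The relationship between this binomial variant and Chao's model can be seen directly from the formulas: the estimated number of unobserved persons, f1²/(4f2), is exactly half of Chao's f1²/(2f2). A minimal sketch with hypothetical counts (not the study data, and with function names of our own choosing) follows.

```python
def chao_estimate(f1, f2, obs_n):
    """Chao's heterogeneity estimate: obs(N) + f1^2 / (2 * f2)."""
    return obs_n + f1 ** 2 / (2 * f2)


def truncated_binomial_estimate(f1, f2, obs_n):
    """Truncated binomial variant quoted above: obs(N) + f1^2 / (4 * f2)."""
    return obs_n + f1 ** 2 / (4 * f2)


# Hypothetical counts, for illustration only.
f1, f2, obs_n = 1200, 600, 1900

for name, estimator in (
    ("Chao", chao_estimate),
    ("truncated binomial", truncated_binomial_estimate),
):
    est_n = estimator(f1, f2, obs_n)
    print(f"{name}: est(N) = {est_n:.0f}, coverage = {100 * obs_n / est_n:.0f}%")
```

Because the unobserved part is halved, the binomial variant always gives a lower population estimate and therefore a higher coverage than Chao's model for the same counts.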
CONCLUSION
Although the limitations of the simple single-source truncated models should be appreciated and bias cannot be excluded, alternative methods for estimating the number of illicit drug users and homeless persons have their own restrictions. Conventional two-source and three-source capture–recapture analyses have similar underlying assumptions and hence limitations, and for hidden populations sufficient adequate registers for record-linkage may not be available. Compared to alternative estimators the ease of use of the truncated models is appealing. We could extract, check and prepare the required data from an existing routine dataset in 2 days and calculate the point estimates on a pocket calculator. We assumed the probable overall bias in this study to be overestimation of the target population, and therefore the coverage of the targeted mobile TB screening programme among problematic illicit drug users and homeless persons in Rotterdam would be higher than the estimated 63% for at least one chest X-ray per year and 21–25% for at least two chest X-rays per year, especially among those with the highest risk.
ACKNOWLEDGEMENTS
We thank Monica Straal, software application manager of the Tuberculosis Control Section, for assistance in preparing the final data file.
DECLARATION OF INTEREST
None.