INTRODUCTION
Injecting drug use (IDU) represents the most common risk factor for hepatitis C virus (HCV) infection throughout the industrialized world [Reference Di1]. However, a substantial proportion of persons who test HCV antibody-positive report no risk factors for acquiring infection. In the USA, 30% of persons with acute HCV during 1991–1995 denied a specific exposure associated with becoming infected during the 6 months preceding illness onset, although over half of these reported a history of drug use [Reference Alter2]. In England and Wales, 71% (35 598/49 819) of confirmed HCV infections during 1992–2004 lacked risk factor information [Reference Gungabissoon, Balogun and Ramsay3]. Sentinel surveillance of acute HCV infection in the USA indicates sexual risk behaviour as a probable route of infection in a significant minority of cases [Reference Wasley, Grytdal and Gallagher4, Reference Williams5]. Either the potential risk factors for HCV acquisition were not carefully elicited in these studies, or there was a significant undefined source of viral transmission. A study in the USA showed that the route of HCV acquisition could be delineated in 88% of HCV chronically infected patients using a systematic interview approach; in nearly all cases, the initially unreported risk factor for HCV transmission was a remote history of IDU [Reference Flamm, Parker and Chopra6].
Deriving an estimate of the percentage of HCV-diagnosed persons with IDU risk from the observed data can be problematic (i.e. subject to bias) if there is a large amount of missing risk information, which is the case for the national HCV Diagnosis database held by Health Protection Scotland (HPS), a population-wide record of all individuals testing HCV antibody-positive since testing commenced in 1991. As at the end of 2009, 35% of records lacked data on risk factor(s), and of those with risk information, current/former IDU was specified for 89% [7]. An unknown percentage of persons with missing risk data will have acquired their infection through IDU.
In the current study, we combined data on IDU history available from four other data sources – HIV testing, hospital discharges, deaths, treatment for drug misuse – with that in the HCV Diagnosis database, using record-linkage methods to identify individuals observed across sources. The total number of HCV-diagnosed persons with IDU risk was then estimated using capture–recapture statistical methods [Reference Cormack8], originally developed for counting animal populations. Thus, the purpose of our study was to estimate the proportion of HCV antibody-positive and diagnosed persons who were likely to have acquired their infection through IDU. Such an estimate will provide public health policy-makers with a more accurate demographic picture of Scotland's HCV-infected population, and will consequently inform resource allocation for prevention, treatment and care.
METHODS
The IDU status of HCV-diagnosed persons was sourced from five databases (i.e. HCV Diagnosis, HIV Test, treatment for drug misuse, hospital discharges, and deaths databases). Data on opiate use, rather than IDU per se, were available from the latter two databases and served as a proxy indicator for IDU.
The HCV Diagnosis database, maintained by HPS, records all persons who have been diagnosed HCV positive (defined as laboratory detection of HCV antibody or a positive PCR test result) in Scotland since testing began in 1991. The database contains the following non-named information: surname Soundex code (multiple surnames possible; for instance, following marriage), forename initial, date of birth, sex, postcode district of residence, hospital/clinic number (generated from GUM clinic/hospital referrals only), and data concerning risk activities; as at 31 December 2009, this database contained records for 27 183 persons.
The second data source used was the national HIV Test database, also held by HPS. It records all HIV tests conducted within Scotland, excluding routine screening (e.g. antenatal, renal, travel/insurance), and persons aged <15 years. Data is provided by all NHS laboratories in Scotland that perform HIV testing. Individual records on this database contain the following non-named information: sex, date of birth, surname and forename initials, health board of residence (NHS Scotland administrative area) and a hospital/clinic number generated from GUM clinic/hospital referrals only, as well as data concerning risk activities. HIV test records were mapped to distinct individuals using a deterministic approach (i.e. the procedure required a complete match on either the set of identifiers sex, date of birth, and initials, or the set of sex, date of birth, and hospital/clinic number). The HIV Test database contained records for 523 251 HIV tests conducted between 1 January 1988 and 31 December 2009 (including the testing of some stored sera back to 1980). Internal linkage resulted in records for 412 994 distinct HIV-tested individuals. Linkage between the HCV Diagnosis and HIV Test databases was also performed in-house at HPS using deterministic methods [i.e. a complete match was required on the identifier set (i) sex, date of birth, and initials; or, if initials were missing, the set (ii) sex, date of birth, and hospital/clinic number].
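The deterministic rule used for the HCV Diagnosis to HIV Test linkage can be sketched as follows. This is a minimal illustration only, not the in-house HPS implementation: the `Record` type and its field names are hypothetical, and treating initials as 'missing' when they are absent from either record is an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class Record(NamedTuple):
    """Non-named identifiers; the field names are illustrative assumptions."""
    sex: Optional[str]
    date_of_birth: Optional[str]   # e.g. "1970-01-31"
    initials: Optional[str]        # surname and forename initials
    clinic_number: Optional[str]   # hospital/clinic number, if any


def _complete_match(a: Record, b: Record, fields: Tuple[str, ...]) -> bool:
    """True only if every listed field is present in both records and equal."""
    return all(
        getattr(a, f) is not None and getattr(a, f) == getattr(b, f)
        for f in fields
    )


def deterministic_match(a: Record, b: Record) -> bool:
    """Set (i): sex, date of birth, initials.
    Set (ii), used only when initials are missing (assumed here to mean
    missing from either record): sex, date of birth, hospital/clinic number."""
    if a.initials is not None and b.initials is not None:
        return _complete_match(a, b, ("sex", "date_of_birth", "initials"))
    return _complete_match(a, b, ("sex", "date_of_birth", "clinic_number"))
```

Because a complete match is required on every identifier in the chosen set, a record with a missing date of birth or sex can never link under this rule.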
The Scottish Morbidity Records (SMR01) is an episode-based patient record of all acute inpatient and day case hospital discharges from non-obstetric, non-psychiatric specialities. Information Services Division (ISD) routinely combines SMR01 data with death registrations held by the General Register Office for Scotland to form a linked dataset; the identifiers sex, date of birth, initials, and surname Soundex were available. We treated this linked dataset (of hospital and death records) as a single data source for the capture–recapture analysis. Linkage of records between the HCV Diagnosis database and the hospitalization/deaths dataset was carried out by ISD using probabilistic record-linkage techniques [Reference Kendrick and Clarke9], which allow for matches using incomplete identifiers. All hospital and death records (for 1 January 1981 to 31 December 2009) that had linked to HCV-diagnosed persons were provided for analysis.
The final data source used was the Scottish drug misuse database (SDMD), also held by ISD. The SDMD is a record of current/former drug users in contact with drug treatment and support services, including general practitioners, hospitals, specialist drug clinics, and non-statutory agencies. These agencies report information on new contacts (defined as first presentation or repeat presentation if it has been at least 6 months since last attendance) to the SDMD. The SDMD contains limited identifying information: sex, date of birth, forename initial, first and fourth letter of surname, and postcode sector of residence. Data were available from 1 April 1996 to 31 December 2009, containing 76 364 records representing 28 601 distinct individuals. Data-linkage between the SDMD and HCV Diagnosis databases was performed by ISD using probabilistic methods.
After exclusion of 1568 persons with insufficient identifiers (defined as missing date of birth and two of: sex, initials, surname Soundex), 25 615 records on the HCV Diagnosis database were available for linkage to the other databases. Approval for this linkage exercise was provided by the NHS National Services Scotland Privacy Advisory Committee. The study population was further restricted to all records with non-missing sex data, leaving 25 521 for analysis.
Definition of IDU risk
An individual in the HCV Diagnosis or HIV Test databases was considered to have IDU risk if IDU was listed as a risk factor for acquiring infection.
In the SMR01/death registrations linked dataset, both hospital discharge diagnosis and cause-of-death codes use the International Classification of Diseases – Ninth Revision (ICD-9) for events occurring before 2000, and the Tenth Revision (ICD-10) for 2000–2009. IDU risk was inferred if the discharge/death record contained a code for opiate use, i.e. either of the ICD-9 codes 304.0 ‘Opioid type dependence’ or 304.7 ‘Combinations of opioid type drug with any other drug dependence’, or any of the ICD-10 codes F11.0–F11.9 ‘Mental & behavioural disorders due to use of opioids’.
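The code-based definition above can be expressed as a short classification function. This is a sketch under stated assumptions: the function names are hypothetical, codes are assumed to be stored as dotted strings, and ICD-9 codes are matched by prefix so that any fifth-digit subdivisions of 304.0 and 304.7 are included.

```python
from typing import Iterable

# ICD-9 opioid dependence categories named in the text (prefix match is an assumption).
ICD9_OPIOID_PREFIXES = ("304.0", "304.7")


def is_opioid_code(code: str, icd_revision: int) -> bool:
    """Return True if a single diagnosis/cause-of-death code indicates opiate use."""
    code = code.strip().upper()
    if icd_revision == 9:
        return code.startswith(ICD9_OPIOID_PREFIXES)
    if icd_revision == 10:
        # F11.0 to F11.9: 'F11.' followed by a digit.
        return len(code) >= 5 and code[:4] == "F11." and code[4].isdigit()
    raise ValueError("icd_revision must be 9 or 10")


def record_has_idu_proxy(codes: Iterable[str], icd_revision: int) -> bool:
    """A discharge/death record counts if any of its codes indicates opiate use."""
    return any(is_opioid_code(c, icd_revision) for c in codes)
```

The revision is passed explicitly because the same record-level rule applies to both coding schemes, but the code lists do not overlap.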
IDU risk for an individual in the SDMD was defined according to self-report: if at any attendance at drug services the client reported having either ‘injected in the previous month’ or ‘injected in past/not previous month’, he/she was classified as having IDU risk.
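The 'at any attendance' rule can be illustrated with the following sketch. The two qualifying response strings are those quoted above; the function name, the data layout (one response per attendance) and the non-qualifying response used in the usage lines are hypothetical.

```python
from typing import Iterable

# The two self-reported responses that define IDU risk in the SDMD.
INJECTING_RESPONSES = {
    "injected in the previous month",
    "injected in past/not previous month",
}


def sdmd_idu_risk(attendance_responses: Iterable[str]) -> bool:
    """True if the client reported injecting at any attendance at drug services."""
    return any(r in INJECTING_RESPONSES for r in attendance_responses)


# Hypothetical client with three attendances; one qualifying report is enough.
client = ["never injected", "injected in past/not previous month", "never injected"]
client_has_risk = sdmd_idu_risk(client)
```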
Statistical methods
Log-linear modelling [Reference Cormack8] was used to analyse the overlap in the number of HCV-diagnosed persons with IDU risk in the four data sources (HCV Diagnosis, HIV Test, hospital/deaths, SDMD) and to estimate the total population size (i.e. the number of HCV-diagnosed IDUs including those who are unknown). Backwards stepwise regression was used to find a model which adequately described the data with the least number of parameters; two-way and three-way interaction terms were removed from the model specification if Akaike's Information Criterion (AIC) difference was <2 compared to the model including the interaction term. Confidence intervals were determined using the profile likelihood. Model fitting was performed using the Rcapture package [Reference Baillargeon and Rivest10] for R statistical software [11].
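To illustrate how a log-linear model yields an estimate of the individuals seen in none of the sources, the sketch below covers only the special case of the saturated model, in which the highest-order interaction is set to zero and all lower-order terms are retained. In that case the estimate has a closed form: the product of the counts for capture histories involving an odd number of sources, divided by the product for histories involving an even number. This is not the Rcapture fitting procedure used in the study; it performs no model selection and gives no confidence interval, and the counts shown are hypothetical.

```python
from math import prod
from typing import Dict, Tuple

History = Tuple[int, ...]  # e.g. (1, 0, 1, 1): seen in sources 1, 3 and 4


def saturated_unobserved_estimate(cells: Dict[History, int]) -> float:
    """Estimate the count for the all-zero capture history under the
    saturated log-linear model (highest-order interaction fixed at zero)."""
    n_sources = len(next(iter(cells)))
    if len(cells) != 2 ** n_sources - 1 or any(sum(h) == 0 for h in cells):
        raise ValueError("need every capture history except the all-zero one")
    if any(count <= 0 for count in cells.values()):
        raise ValueError("closed form requires strictly positive counts")
    odd = prod(c for h, c in cells.items() if sum(h) % 2 == 1)
    even = prod(c for h, c in cells.items() if sum(h) % 2 == 0)
    return odd / even


# Hypothetical two-source example (reduces to the Lincoln-Petersen estimator).
two_source = {(1, 1): 50, (1, 0): 100, (0, 1): 150}
unobserved = saturated_unobserved_estimate(two_source)   # 100 * 150 / 50 = 300.0
total = sum(two_source.values()) + unobserved            # 300 observed + 300.0
```

With four sources the same function takes the 15 observable capture histories; reduced models chosen by stepwise selection have no such closed form and require iterative fitting.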
Stratified log-linear analyses were conducted according to four covariates: sex, birth cohort (<1960, 1960–1969, 1970–1979, 1980+), health board of residence (Greater Glasgow & Clyde, and all other), and calendar year period of HCV diagnosis (<1995, 1995–1999, 2000–2004, 2005–2009). Additional analyses were restricted to HCV-diagnosed persons who had a non-IDU risk and to those for whom the risk activity leading to infection was unknown. Persons diagnosed with HCV near the end of the study period might not have had sufficient time to be ‘captured’ on the other data sources. To investigate this possibility, in a sensitivity analysis we restricted the inclusion period to HCV diagnoses made up to 31 December 2006 (with the other data sources censored as before, at end of 2009).
RESULTS
Study population
Of all HCV-diagnosed persons, 58% (14 836/25 521) reported IDU risk at the time of their HCV diagnosis, 7·3% (1862/25 521) reported a non-IDU risk (e.g. blood factor/transfusion) and for 35% the risk factor(s) for acquiring infection were unknown. Of those with a reported risk(s) on the HCV Diagnosis database, 89% (14 836/16 698) were current/former IDUs. The majority of HCV-diagnosed persons were male (68%), and were born during the 1960s and 1970s (70%) (Table 1).
HCV Diag, HCV Diagnosis database; SDMD, Scottish drug misuse database; SMR01/Deaths, Scottish Morbidity Records hospital discharge/deaths data; GGC, Greater Glasgow & Clyde Health Board.
IDU risk from other data sources
Of the 25 521 HCV-diagnosed persons, 38%, 49% and 30% were identified as having an IDU risk in the HIV Test, SDMD and hospital/deaths databases, respectively (Table 1). Of HCV-diagnosed persons who had reported a non-IDU risk at the time of HCV diagnosis, 11% (200/1862), 16% (294/1862) and 9% (171/1862) were identified as having IDU risk in the HIV Test, SDMD and hospitalization/deaths databases, respectively. The distribution of covariates across data sources varied somewhat for birth cohort (Table 1). Twenty percent of all HCV-diagnosed persons were born prior to the 1960s, compared to only 6% of those identified with IDU risk in the SDMD, and 10% of those with IDU risk in either the HCV Diagnosis or hospitalization/deaths data sources. Across data sources, the proportion of individuals with an IDU risk residing in the Greater Glasgow & Clyde Health Board was relatively constant, at 40–46%, as was the proportion of males, at 69–74%.
Of the 25 521 HCV-diagnosed persons, 18 782 (74%) had IDU risk recorded in at least one of the four data sources (i.e. in either the HCV Diagnosis, HIV Test, SDMD or hospital/deaths databases). Overall sensitivity of the HCV Diagnosis database for recording IDU risk (with respect to the ‘gold standard’ of IDU status determined from any of the four linked data sources) was 79·0%, which varied according to covariate level (Table 1).
Log-linear modelling
The log-linear model fitting procedure retained all two-way and three-way interaction terms, and predicted a further 2484 IDUs not identified from the four data sources (Table 2) for an estimated total of 21 266 IDUs (95% CI 20 582–22 140); this corresponded to an estimated IDU prevalence in all HCV-diagnosed persons of 83·3% (21 266/25 521). The sensitivity analysis, in which individuals diagnosed after 31 December 2006 were excluded (resulting n = 20 612), indicated a slightly higher estimated IDU prevalence (84·8%).
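The headline figures follow from simple arithmetic on the counts reported in this section, reproduced here as a worked check. Only the variable names are introduced for illustration; all counts are taken from the text.

```python
hcv_diagnosed = 25_521          # study population
idu_hcv_database = 14_836       # IDU risk recorded on the HCV Diagnosis database
idu_any_source = 18_782         # IDU risk recorded in at least one of four sources
idu_predicted_unseen = 2_484    # further IDUs predicted by the log-linear model

# Estimated total number of HCV-diagnosed IDUs.
idu_estimated_total = idu_any_source + idu_predicted_unseen        # 21266

# Estimated IDU prevalence in all HCV-diagnosed persons, as a percentage.
prevalence_pct = round(100 * idu_estimated_total / hcv_diagnosed, 1)   # 83.3

# Sensitivity of the HCV Diagnosis database against the four-source 'gold standard'.
sensitivity_pct = round(100 * idu_hcv_database / idu_any_source, 1)    # 79.0
```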
SMR01/Deaths, Scottish Morbidity Records hospital discharge/deaths data; SDMD, Scottish drug misuse database; Y, Yes; N, no.
Backward step-wise fitting of log-linear model (residual deviance of 0 on 0 residual d.f.) included main effects for data sources (a) HCV diagnoses, (b) HIV test, (c) hospital discharge/deaths and (d) SDMD, and all two-way and three-way interactions between data sources.
As a validity check, we also estimated IDU population size for only those persons with IDU risk present in the HCV Diagnosis database (n = 14 836) from the other three data sources; the estimated total number of IDUs was 14 336 (95% CI 14 069–14 629) and estimated IDU prevalence for this group of known IDUs was 96·6%, slightly less than the 100% one would expect. However, this analysis was necessarily based on the three data sources with the most impoverished IDU risk information. An additional analysis, excluding the 1862 persons whose HCV diagnosis record specified a non-IDU risk activity (reducing to n = 23 659), also resulted in fewer estimated HCV-diagnosed IDUs: 19 935 (95% CI 19 724–20 170) compared to 21 266 (the latter value estimated in Table 2).
Stratified log-linear models fitted to the data according to sex, birth cohort, period of HCV diagnosis, and health board group indicated substantial variation in estimated IDU prevalence (Table 3). Prevalence was lowest for the oldest cohort (born before 1960, 49·4%) and highest for individuals born 1970–1979 (93·4%). Estimated prevalence was also highest for males (84·6%) and for persons diagnosed with HCV during 1995–1999 (91·5%).
Y, Yes; N, no; CI, confidence interval; GGC, Greater Glasgow & Clyde.
‘Saturated model?’, Y refers to a log-linear model in which all main effects and all possible two- and three-way interactions were retained after applying the backward stepwise selection.
a All two-way interactions only (residual d.f. = 4, deviance = 2·53).
b All two-way interactions and a:b:d, a:c:d, and b:c:d only (residual d.f. = 1, deviance = 1·70).
c All two-way interactions and a:c:d only (residual d.f. = 3, deviance = 1·66).
d All two-way interactions only (residual d.f. = 4, deviance = 1·90).
e All two-way interactions except a:c and b:c (residual d.f. = 6, deviance = 3·47).
f All two-way interactions and a:b:c and a:c:d only (residual d.f. = 2).
g All two-way interactions and a:b:c and a:c:d only (residual d.f. = 2, deviance = 2·33).
h All two-way interactions except b:c (residual d.f. = 5, deviance = 5·50).
i All two-way interactions and a:b:c and b:c:d only (residual d.f. = 2, deviance = 0·42).
[a, HCV Diagnosis; b, HIV Test; c, Scottish Morbidity Records hospital discharge/deaths data (SMR01/Deaths); d, Scottish drug misuse database (SDMD).]
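The residual degrees of freedom quoted in the footnotes above can be checked by counting parameters: four sources give 15 observable capture histories, and each model uses one intercept, four main effects, and one parameter per retained interaction. The helper below is an illustrative sketch (the function name is hypothetical).

```python
from math import comb


def residual_df(n_sources: int, n_two_way: int, n_three_way: int) -> int:
    """Observable cells minus fitted parameters for a log-linear
    capture-recapture model with the given numbers of interaction terms."""
    observable_cells = 2 ** n_sources - 1          # the all-zero cell is unobserved
    parameters = 1 + n_sources + n_two_way + n_three_way
    return observable_cells - parameters


all_two_way = comb(4, 2)     # 6 possible two-way interactions
all_three_way = comb(4, 3)   # 4 possible three-way interactions

saturated = residual_df(4, all_two_way, all_three_way)   # 0, as for Table 2
two_way_only = residual_df(4, all_two_way, 0)            # 4, footnotes a and d
two_dropped = residual_df(4, all_two_way - 2, 0)         # 6, footnote e
```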
DISCUSSION
This application in Scotland is the first to demonstrate the use of log-linear modelling, based on capture–recapture data from four linked sources, to estimate the proportion of IDUs in HCV-diagnosed persons. The estimated prevalence of current/former IDUs was 83% in Scotland's HCV-diagnosed population, substantially higher than the 58% who had reported IDU as a risk activity. This estimated prevalence was somewhat lower than the estimate of IDU prevalence (89%) derived from the 65% of the study population with reported risk factor(s). However, if individuals diagnosed with HCV in the three most recent years of the study period are excluded (to allow more opportunity for ‘capture’ by the other data sources), the prevalence was estimated at 85%. The latter figure is closely comparable to the value (87%) obtained from laboratory surveillance in England & Wales in 1992–2004 [Reference Gungabissoon, Balogun and Ramsay3].
Stratified analyses indicated that estimated IDU prevalence was lowest (67%) in individuals diagnosed with HCV before 1995; this is consistent with an over-representative contribution to the early growth of the database from persons with blood clotting disorders. IDU prevalence was highest in those born in the 1960s and 1970s, reflecting the age groups in Scotland in which problem drug use is the most prevalent [Reference Hay12], and in which risky injecting practices are frequent [13]. Estimated IDU prevalence was also higher for male, compared to female, HCV-diagnosed persons (86% and 78%, respectively), which may be due to male sex being an independent risk factor for acquiring HCV infection in IDUs, leading to more male than female HCV-infected IDUs appearing on the HCV Diagnosis database.
The estimated sensitivity of the risk information field in the HCV Diagnosis database also varied according to birth cohort and period of diagnosis, with the lowest sensitivity observed for the youngest cohort (74%) and the most recent HCV diagnosis period (69%); the latter finding is consistent with there sometimes being a short lag in the collection and recording of risk activity data on the HCV Diagnosis database.
The only other study we are aware of in which the number of HCV-infected IDUs was estimated using capture–recapture methods was conducted in Porto Alegre, Brazil [Reference Caiaffa14]. In this study, the total number of IDUs attending needle-exchange programmes was estimated based on two interviewed samples about 1 month apart, and then overall HCV seroprevalence in this population (53%) was assumed for the estimated total IDUs (168/317). However, the proportion of IDU risk in HCV-diagnosed persons was not estimated in the study.
Although the application of capture–recapture and log-linear modelling methods to epidemiological questions has certain strengths, it also has a number of limitations.
First, we have had to assume that all four data sources reflect the same (closed) population. In reality, especially over the long study period, new individuals enter the population and others leave it, for example through initiation of drug use and through death. Second, within a given data source, each IDU was assumed to have the same chance of being included (i.e. to have the same ‘catchability’). Although we have attempted to address the issue of heterogeneity in being observed within a given data source by conducting stratified analyses, an unknown degree of variability will remain. Subgroups with low catchability might bias estimates of the prevalence of IDU within the HCV-diagnosed population downwards. Finally, violation of the assumed high accuracy of the record-linkage methods could also result in bias. Although the probabilistic methods used by ISD to link HCV Diagnosis with the SMR01/deaths linked dataset have historically low false-positive and false-negative rates (<5% [Reference Kendrick and Clarke9]), accuracy estimates were not available for the other linkages performed.
In conclusion, the proportion of Scotland's HCV-diagnosed population who were estimated to have acquired their infection through IDU was smaller than if estimated from only the data with non-missing risk information, but once opportunity for capture in the other data sources was increased, the proportion with IDU risk was more similar. Information on the route by which HCV infection is acquired is essential when targeting risk groups with educational and prevention interventions, and is also useful for governmental and public health professionals who develop policy and allocate funding for treatment and care. Our results – indicating a similar high prevalence of IDU in HCV-diagnosed individuals with missing data on risk activities, as for those with risk activity reported – provide evidence that efforts to prevent and treat HCV infection should focus on this risk group.
ACKNOWLEDGEMENTS
We thank Glenn Codere and Amanda Weir for their assistance with the HIV Test database, Information Services Division, NSS Scotland, for conducting the probabilistic record-linkage and the provision of hospitalization/deaths and SDMD datasets, and the various laboratories across Scotland for providing national data on HCV diagnoses and HIV testing.
DECLARATION OF INTEREST
None.