INTRODUCTION
Contact investigation around infectious tuberculosis (TB) cases can decrease or eliminate future TB transmission through identification and treatment of TB infection in contacts of TB cases. Historically, the strongest indication of TB transmission has been the diagnosis of active TB in a contact of an infectious case. Genotyping results for Mycobacterium tuberculosis isolates can support or refute transmission assumptions between epidemiologically linked cases [Reference Oelemann1–Reference Dahle4].
Transmission assessments based on genotype comparisons depend both on the genotyping methods used and on how genotype concordance is defined. IS6110-based restriction fragment length polymorphism (RFLP) analysis has been a widely used genotyping method to define TB strains. Data on the validity of clustering TB isolates of nearly matching genotype have been conflicting, even when prior evidence of transmission exists. Some studies have shown that the IS6110 site is relatively stable and the rate of gain or loss of IS6110 is estimated to be low [Reference van Soolingen5]. Match criteria other than an exact match between cases may result in an overestimation of transmission when RFLP is used [Reference Jonsson6, Reference Niemann7]. However, studies examining serial isolates obtained from the same patient or from a known transmission chain have shown IS6110 pattern changes [Reference Benjamin8–Reference Cave10]. Accounting for these events can impact transmission assessment between epidemiologically linked TB cases [Reference Niemann7, Reference Benjamin8, Reference de Boer11–Reference Tanaka and Francis15].
There has been limited description of how expanding genotype concordance definitions impacts transmission assessments in epidemiologically linked TB cases [Reference Behr16–Reference Benedetti18]. Studies that have evaluated M. tuberculosis genotypic relationships between linked cases are often from high-incidence countries [Reference van der Spuy19], focus on cluster investigations [Reference Lindquist20, Reference McNabb21], or determine strain relatedness by only one molecular method [Reference Grant2, Reference Niemann7, Reference Cave10, Reference Behr16, Reference McNabb21, Reference Case22]. While exact-matching genotype concordance criteria (using IS6110 patterns) have traditionally been used to characterize transmission between linked cases, this approach does not account for IS6110 changes that may occur during or after TB transmission [Reference Warren17]. To better understand TB transmission dynamics in New York City (NYC), we reviewed index-case M. tuberculosis isolate genotyping results and those of their contacts who subsequently developed active TB across two molecular methods. To account for possible IS6110 changes, we explored the use of an expanded genotype concordance definition, and estimated the additional transmission this would reveal. We anticipated that contacts who develop TB a short time after being exposed to the index case are more likely to have isolates with an exactly matching genotype than those who develop TB after a longer period of time.
MATERIALS AND METHODS
Study population
The NYC Department of Health and Mental Hygiene TB Registry contains information on TB cases reported in NYC as well as on persons identified as having been exposed to an infectious TB case (contacts) during contact investigation. This retrospective cohort study is based on a TB preventability analysis that includes both index cases and their associated contact cases (contacts of an index case that subsequently developed active TB). Study population selection methods have been described previously [Reference Anger23]. In brief, the NYC TB Registry was used to identify contacts of TB cases (aged ⩾5 years) that were diagnosed with TB in NYC from 1 January 1997 to 31 December 2003. These contacts were then matched to TB cases diagnosed in NYC from 1 January 1997 to 31 December 2007 by name, sex, date of birth, and country of birth. Included contact cases must have been living in NYC when identified as a contact and could not have been treated for active TB in the year prior to the index case's diagnosis. Our inclusion criteria differ from those of the parent study in that we included contact cases diagnosed throughout the study period as well as multidrug-resistant TB cases and their contact cases [Reference Anger23]. To assess comparability, we compared study population contact-case demographics to those of the overall population of contact cases in NYC 1997–2007.
For individuals identified as a contact multiple times, the most recent contact event was used. We hypothesized that isolates from contact cases were more likely to be discordant with the index cases’ isolates when more time elapsed between exposure to the index case and diagnosis of active TB disease. To assess this influence of time, we classified contact cases as either prevalent (active TB diagnosed up to 9 months after being identified as a contact) or incident (active TB diagnosed more than 9 months after being identified as a contact). During contact investigation, tuberculin skin tests (TSTs) were administered to contacts unless there was a documented positive TST result before the contact investigation (prior positive) or a prior TB diagnosis. If negative, the TST was repeated after the window period (8 weeks after last day of known exposure to the index case) to allow time for the immune system to manifest a response to a recent infection. From the contact record, TST results at the time of contact investigation were abstracted and classified as follows: positive (⩾5 mm induration, obtained either during or after the window period); negative (<5 mm induration, obtained after the window period); window negative (a negative TST result during the window period with no subsequent test result); prior positive; prior TB diagnosis; or not tested.
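The classification rules in this paragraph can be sketched in code. The following is a minimal illustration only, not the registry's actual logic: the function names, the approximation of 9 months as 274 days, the treatment of the boundary day of the 8-week window, and the precedence of prior TB diagnosis over prior positive are all assumptions made for the example.

```python
from datetime import date
from typing import Optional

# Assumptions for illustration: "9 months" is approximated as 274 days and the
# 8-week window period as 56 days, with the boundary day counted as inside.
PREVALENT_LIMIT_DAYS = 274
WINDOW_PERIOD_DAYS = 56


def classify_timing(identified_as_contact: date, tb_diagnosis: date) -> str:
    """Prevalent: active TB diagnosed up to 9 months after identification."""
    elapsed_days = (tb_diagnosis - identified_as_contact).days
    return "prevalent" if elapsed_days <= PREVALENT_LIMIT_DAYS else "incident"


def classify_tst(
    induration_mm: Optional[float],
    days_since_last_exposure: Optional[int],
    prior_positive: bool = False,
    prior_tb: bool = False,
) -> str:
    """Classify the most recent TST obtained during contact investigation."""
    if prior_tb:
        return "prior TB diagnosis"
    if prior_positive:
        return "prior positive"
    if induration_mm is None:
        return "not tested"
    if induration_mm >= 5:
        return "positive"  # counted whether read during or after the window
    if days_since_last_exposure is None:
        raise ValueError("timing is needed to interpret a negative TST")
    if days_since_last_exposure <= WINDOW_PERIOD_DAYS:
        return "window negative"  # negative in the window, no later result
    return "negative"
```

Passing only the most recent TST result keeps the sketch short; a fuller version would take the whole test history so that a window-period negative followed by a later test is resolved explicitly.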
We compared demographic (age, region of birth, sex, and race/ethnicity), clinical (time between index case and contact-case diagnoses, TB exposure setting, TST result, and HIV status at TB diagnosis), and social characteristics (history of drug use and homelessness at TB diagnosis). To explore the effect of time on contact-case genotype, we stratified data by prevalent (identified as a case during the contact investigation of the index case) and incident (identified as a case following conclusion of contact investigation of index case) time periods.
Genotyping and concordance classification
Since 2001, all initial culture-positive TB isolates have been routinely genotyped [Reference McNabb21, Reference Clark24, 25]. During the study period, the NYC Health Department used two genotyping methods to characterize TB strains: spacer oligonucleotide typing (spoligotyping) and IS6110 RFLP analysis, which were performed at the New York State Department of Health's Wadsworth Center in Albany, New York and at the Public Health Research Institute at Rutgers University in Newark, New Jersey, respectively. Details on both genotyping methods have been described previously [Reference Groenen26–Reference Thierry30].
Isolate genotype data were abstracted for index cases and associated contact cases (together considered a case-pair), and only case-pairs with complete genotype results (RFLP and spoligotype) were included in our analysis. Initially, case-pairs were classified as either an exact or a non-exact genotype match by examining both spoligotype and RFLP results. We performed bivariate analyses to compare contact cases’ clinical, social, and demographic characteristics in exact genotype-match case-pairs to those of non-exact genotype-match case-pairs within the prevalent/incident classification of the contact case.
To account for genotype changes in TB bacteria that may have occurred during or after TB transmission [Reference Niemann7, Reference Warren17, Reference van der Spuy19], isolate genotypes of non-exact genotype-match case-pairs were re-evaluated. These case-pairs were further categorized as near-match or genotype-discordant based on a non-blinded review of the RFLP patterns by a TB genotyping expert (Fig. 1). A near-match genotype was defined as a case-pair with the same spoligotype and RFLP patterns deemed to be closely related and differing by ⩽2 bands [Reference Cave10, Reference Anger23]. Case-pairs with genotype results that fell outside of the ‘near-match’ definition were classified as discordant. Based on this case-pair re-categorization, we repeated bivariate analyses of characteristics in exact genotype-match, near-match, and discordant contact cases with further stratification by prevalent or incident status.
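As a rough illustration of the three-way categorization described above, the sketch below encodes it as a rule. It is a simplification under stated assumptions: in the study the near-match call rested on expert review of the RFLP patterns, which code cannot reproduce. Representing a pattern as a set of band labels, counting the difference as the number of bands present in one pattern but not the other, and the names `classify_case_pair` and `max_band_diff` are all hypothetical choices for this example.

```python
from typing import Iterable


def classify_case_pair(
    index_spoligotype: str,
    contact_spoligotype: str,
    index_bands: Iterable[int],
    contact_bands: Iterable[int],
    max_band_diff: int = 2,
) -> str:
    """Return 'exact', 'near-match', or 'discordant' for one case-pair."""
    same_spoligotype = index_spoligotype == contact_spoligotype
    # Assumption: band difference = bands found in only one of the two patterns.
    band_diff = len(set(index_bands) ^ set(contact_bands))

    if same_spoligotype and band_diff == 0:
        return "exact"
    if same_spoligotype and band_diff <= max_band_diff:
        return "near-match"
    return "discordant"


# Hypothetical spoligotype codes and band labels, for illustration only.
print(classify_case_pair("S1", "S1", [1, 2, 3, 4], [1, 2, 3, 4]))
print(classify_case_pair("S1", "S1", [1, 2, 3, 4], [1, 2, 3, 4, 5]))
print(classify_case_pair("S1", "S2", [1, 2, 3, 4], [1, 2, 3, 4]))
```

Under this counting rule a single shifted band registers as two differences (one band lost, one gained), so the threshold behaves more strictly than a visual comparison might.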
Statistical analysis
We used Pearson's χ² and Fisher's exact tests for categorical data analyses and the Wilcoxon rank-sum test for comparing medians; P values <0·05 were considered statistically significant. All statistical analyses were performed using SAS v. 9.2 (SAS Institute Inc., USA).
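For readers unfamiliar with Fisher's exact test, which is typically preferred when a 2 × 2 table has small cell counts, the following minimal sketch shows how a two-sided P value can be computed from the hypergeometric distribution. It is not the SAS procedure used in the study; the function name and the counts in the usage line are illustrative choices of ours and are not taken from the study's tables.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k events in row 1, margins held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Illustrative counts: 1 of 70 in one group vs. 7 of 48 in the other.
print(fisher_exact_two_sided(1, 69, 7, 41))
```

The small tolerance in the comparison guards against floating-point ties between equally probable tables.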
RESULTS
Study population
Of 32 031 contacts of 5450 infectious TB cases reported in NYC during 1997–2003, 432 case-pairs were identified, 118 (27%) of which were included in the final study population (Fig. 2). These 118 contact cases were linked to 104 index cases (median of 1 contact case per index case, range 1–4; data not shown). Compared to all TB contact cases, the contact cases included in the study were less likely to be aged <5 years at TB diagnosis (6% vs. 18%, respectively, P = 0·002) and more likely to be aged 18–44 years at TB diagnosis (58% vs. 44%, respectively, P = 0·010). Additionally, included contact cases were significantly more likely to be born outside the United States (53% vs. 40% of all contact cases, P = 0·012) (Table 1).
a P values generated by Pearson's χ² or Fisher's exact tests for proportions, Wilcoxon rank-sum test for medians.
b Age at TB diagnosis.
c Includes birth in US territories.
Prevalent and incident contact cases
Of the 118 contact cases included, 70 (59%) were considered prevalent contact cases, and 48 (41%) were incident contact cases. Although no significant differences in demographic characteristics were observed between incident and prevalent contact cases, prevalent contact cases were more likely than incident contact cases to have had a positive TST result at the time of contact investigation (69% vs. 50%, respectively, P = 0·04), and less likely to have a history of homelessness at the time of TB diagnosis (1% vs. 15%, respectively, P = 0·01) (Table 2).
n.a., Not applicable.
a Prevalent contact cases were diagnosed within 9 months of the date of diagnosis of the associated index case. Incident contact cases were diagnosed >9 months after the date of diagnosis of the associated index case.
b P values generated by Pearson's χ² or Fisher's exact tests for proportions, Wilcoxon rank-sum test for medians.
c Includes birth in US territories.
d TST result when contact case was originally evaluated as a contact. Contacts that had a negative TST result in the window period (within 8 weeks of last known date of exposure) but did not have a subsequent test were assigned a window negative TST result.
e Contacts were eligible for TST conversion if they either had a known TST induration result within 2 years of the first TST after being identified as a contact, or if they had a negative (<5 mm induration) TST during the window period and then a second TST after the window period. An increase of 10 mm induration between the two TSTs qualifies as a conversion.
f Contact case and index case genotypes are comprised of both spoligotype and IS6110 restriction fragment length polymorphism (RFLP) results. Genotype concordance was determined by comparing the genotypes of case-pairs (consisting of the contact case and the associated index case). Initially, genotypes that matched exactly were considered exact matches and all others were categorized as non-exact matches.
g In non-exact matches, contact cases were further categorized as either near-match (case-pair with the same spoligotype and RFLP patterns differing by ⩽2 bands) or discordant (non-exact matches that did not meet the near-match definition).
Case-pair genotype-match analyses
Overall, 82% (n = 97) of all case-pairs were ultimately categorized as a near or exact genotype match. This proportion was greater in prevalent case-pairs than in incident case-pairs, but the difference was not significant (87% vs. 75%, respectively, P = 0·090). Prevalent case-pairs were more likely than incident case-pairs to be classified as an exact genotype match (79% vs. 58%, respectively, P = 0·02, Table 2). When our expanded genotype concordance definition was applied to the non-exact genotype-match case-pairs, the proportion reclassified as near-match genotypes was the same in prevalent and incident case-pairs, at 40%.
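The comparisons in this paragraph can be checked approximately with a hand-computed Pearson χ² test. The sketch below is our reconstruction, not the authors' SAS output. The cell counts (55 exact and 6 near-match in 70 prevalent case-pairs; 28 exact and 8 near-match in 48 incident case-pairs) are back-calculated from the reported percentages and are therefore assumptions, as is the choice to omit a continuity correction.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Counts reconstructed from the reported percentages (assumption).
PREVALENT = {"exact": 55, "near": 6, "discordant": 9}    # 70 case-pairs
INCIDENT = {"exact": 28, "near": 8, "discordant": 12}    # 48 case-pairs

# Exact match vs. non-exact match, prevalent vs. incident.
chi2_exact, p_exact = pearson_chi2_2x2(
    PREVALENT["exact"], PREVALENT["near"] + PREVALENT["discordant"],
    INCIDENT["exact"], INCIDENT["near"] + INCIDENT["discordant"],
)

# Exact or near match vs. discordant, prevalent vs. incident.
chi2_any, p_any = pearson_chi2_2x2(
    PREVALENT["exact"] + PREVALENT["near"], PREVALENT["discordant"],
    INCIDENT["exact"] + INCIDENT["near"], INCIDENT["discordant"],
)

print(f"exact match:         chi2 = {chi2_exact:.2f}, P = {p_exact:.3f}")
print(f"exact or near match: chi2 = {chi2_any:.2f}, P = {p_any:.3f}")
```

With these reconstructed counts the two P values come out close to the reported 0·02 and 0·090, and the near-match share of non-exact case-pairs is 6/15 in the prevalent group and 8/20 in the incident group, i.e. 40% in both.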
Patient comparisons by genotype concordance (expanded definition)
We examined contact-case characteristics by the expanded genotype concordance classifications within the prevalent/incident categorization. Overall, we found no significant demographic differences between exact- and near-genotype-match contact cases in either the prevalent or incident contact-case groupings (Tables 3 and 4).
n.a., Not applicable.
a Prevalent contact cases were diagnosed within 9 months of the date of diagnosis of the associated index case.
b Contact-case and index-case genotypes are comprised of both spoligotype and IS6110 restriction fragment length polymorphism (RFLP) results. Genotype concordance was determined by comparing the genotypes of case-pairs (consisting of the contact case and the associated index case). Genotypes that matched exactly were considered genotype-concordant. Near-match contact cases were originally categorized as genotype discordant in Table 2. Near match defined as a difference in no more than 2 bands (but in the same family) between index and contact-case isolate genotypes that share identical spoligotypes. All remaining were categorized as genotype discordant.
c Includes birth in US territories.
d TST result when contact case was originally evaluated as a contact. Contacts that had a negative TST result in the window period (within 8 weeks of last known date of exposure) but did not have a subsequent test were assigned a window negative TST result.
e Contacts were eligible for TST conversion if they either had a known TST induration result within 2 years of the first TST after being identified as a contact, or if they had a negative (<5 mm induration) TST during the window period and then a second TST after the window period. An increase of 10 mm induration between the two TSTs qualifies as a conversion.
n.a., Not applicable.
a Incident contact cases were diagnosed more than 9 months after the date of diagnosis of the associated index case.
b Contact-case and index-case genotypes are comprised of both spoligotype and IS6110 restriction fragment length polymorphism (RFLP) results. Genotype concordance was determined by comparing the genotypes of case-pairs (consisting of the contact case and the associated index case). Genotypes that matched exactly were considered genotype-concordant. Near-match contact cases were originally categorized as genotype discordant in Table 2. Near match defined as a difference in no more than 2 bands (but in the same family) between index and contact-case isolate genotypes that share identical spoligotypes. All remaining were categorized as genotype discordant.
c Includes birth in US territories.
d TST result when contact case was originally evaluated as a contact. Contacts that had a negative TST result in the window period (within 8 weeks of last known date of exposure) but did not have a subsequent test were assigned a window negative TST result.
e Contacts were eligible for TST conversion if they either had a known TST induration result within 2 years of the first TST after being identified as a contact, or if they had a negative (<5 mm induration) TST during the window period and then a second TST after the window period. An increase of 10 mm induration between the two TSTs qualifies as a conversion.
DISCUSSION
In this study, genotype concordance with the index case was found in 70% of the 118 contact cases (79% of the prevalent contact cases and 58% of the incident contact cases) using an exact-match criterion between epidemiologically linked case-pairs. However, by accounting for minimal changes in IS6110 RFLP patterns, we found near-matching genotypes in an additional 12% (n = 14). Ultimately, using an expanded definition of genotype concordance, genotyping results supported TB transmission in 82% (n = 97) of all contact cases, highlighting the need for programmes to evaluate their criteria for determining what constitutes a genotype match when making transmission inferences.
Our study population of contact cases with full genotyping results differed slightly from all TB contact cases initially identified: fewer included contact cases were aged <5 years at the time of diagnosis, and included contact cases were more likely to be foreign-born. Given the nature of our data, these results are expected. Patients aged <5 years diagnosed with TB are not likely to produce a culture-positive sputum sample, which is necessary to perform genotyping. Additionally, universal genotyping was mandated in NYC in 2001 [Reference McNabb21, Reference Clark24, 25]. Since that time, the majority of TB cases diagnosed and reported in NYC have been in foreign-born populations [31]. Therefore, there would be a smaller pool of US-born TB patients that met our eligibility criteria.
TB isolate genotype concordance has historically been used as a potential indicator of TB transmission within a specific population, a supposition that is strengthened when cases are epidemiologically linked by contact investigation. However, even in linked cases, the definition of genotype concordance influences transmission assessments. The inclusion of the near-match genotype in contact cases more accurately captures transmission events, accounting for IS6110 changes that can occur over time or when a TB strain is transmitted [Reference Warren14, Reference Warren17]. Most studies have estimated that, when such changes occur, they do so at a higher rate once active TB disease has developed and before effective anti-TB treatment slows replication of the M. tuberculosis bacterium [Reference Warren14].
In prevalent case-pairs, we expect genotype concordance, as prevalent contact cases had a documented exposure to TB and were diagnosed shortly thereafter. The finding of discordant genotype in nine prevalent contact cases is unexpected. Of these, six tested TST positive during contact investigation, and three converted their TST. Although we typically categorize conversion of TST as evidence of recent transmission, it is possible that this instead represented a boosted TST response of a remote infection. All nine were born in countries with a TB incidence rate ⩾15 times that of the United States [32], and undocumented TB infection prior to identification as a contact in NYC is possible.
As expected, compared to prevalent case-pairs, we found increased M. tuberculosis genotype discordance in incident case-pairs. These incident contact cases (most of whom were born in a country with high TB incidence [32]) had more time to either reactivate a latent infection (acquired prior to identification as a contact) or to have been infected (or re-infected) due to undocumented TB exposures subsequent to their identification as a contact in our study. Surprisingly, we did not find a statistically significant over-representation of traditional TB risk factors (e.g. age <5 years, birth in a foreign country, homelessness, HIV infection) in incident genotype-discordant contact cases when compared to exact-match or near-match incident contact cases.
We attempted to account for the relative stability of the TB genome, but at the same time to allow for the possibility of minor genomic changes over time by including an additional genotyping method, spoligotyping, in the overall genotype result of isolates in our study. The finding of the same proportion of incident and prevalent near-match case-pairs (40%) in case-pairs initially classified as a non-exact genotype match was unexpected, as we anticipated that the incident contact cases would have had increased opportunity for genotype change. This finding indicates that further study of M. tuberculosis genotypic changes over time is warranted.
Our results differ from a similar study conducted in San Francisco in 1998 [Reference Behr16]. The study authors, who defined patients as having matching genotypes if the IS6110 patterns were either the same or differed by one band, found a 30% genotype discordance proportion, a twofold increase compared to our study. These conflicting results may be explained by the differences in the definition of genotype concordance as well as differences in the study populations. During the study period in San Francisco (1991–1996), the majority of TB cases were diagnosed in foreign-born individuals [33], whereas in NYC, foreign-born predominance in cases did not occur until 1997 [31]. Contacts from high-incidence countries are more likely to have had TB exposures prior to their identification as a contact of a TB case in NYC compared to US-born contacts. Thus, the contacts from high-incidence countries may be less likely to be infected in the United States, which may explain why the San Francisco authors found higher discordance in their study population. Additionally, the study authors did not specify when contact cases were diagnosed with TB, and if a large number of included cases were diagnosed after a long period of time post-contact investigation, a higher rate of discordance would be expected.
This study had some limitations. Although our study reported on a relatively large number of epidemiologically linked case-pairs, we were limited in our ability to detect statistical significance in some variables of interest due to missing data. Misclassification of ‘index’ and ‘contact’ cases may have occurred. Some contact cases were named as a contact multiple times, but only the most recent contact event was used for this analysis, and contact cases may have been infected by an earlier index case. In prevalent case-pairs, the designated contact case may have been the true index case but was diagnosed at a later date. Similarly, a contact case may have been infected by an unidentified case.
By confining our ‘near-match’ definition to genotypes differing by ⩽2 bands, it is possible that genotypes in case-pairs in which transmission did occur were misclassified as discordant. This may be particularly true in strains with a high number of bands or complex banding patterns, where the number and position of bands may be difficult to accurately interpret in the laboratory [Reference Braden, Crawford and Schable34]. We also did not consider additional alternative definitions of genotype concordance in our study (exact RFLP and near-match spoligotype or near-match RFLP and near-match spoligotype). However, only three case-pairs met either of these additional concordance definitions. Finally, studies have shown that individuals may harbour multiple strains of the M. tuberculosis bacterium [Reference Warren35]. Case-pair genotype concordance would therefore depend directly on which of an individual's multiple strains was identified, which could have increased the proportion of case-pairs with discordant genotypes.
Despite limitations, our study covered nearly 10 years of TB case data and additionally included data on persons when they were identified as a contact, which few programmes routinely collect. The study included all case-pairs diagnosed during the study period that had complete isolate genotype data and is the largest of its kind to date.
Our study highlights the need for TB control programmes to further evaluate M. tuberculosis genotype data in previously linked cases. Two-fifths (40%) of all non-exact genotype-match case-pairs were ultimately re-classified as near-match, emphasizing the importance of developing methods to account for changes in genotype to aid in transmission detection and assessment. Including TB contact events and genotyping data for all TB cases in a comprehensive TB registry facilitates TB control programmes’ ability to assess transmission between epidemiologically linked cases. This study's methods provide an additional tool by which TB control programmes can determine when to conserve resources (when genotyping refutes transmission) or identify additional transmission by using alternative definitions of genotype concordance to support transmission assessments in epidemiologically linked cases.
ACKNOWLEDGEMENTS
We acknowledge our laboratory partners at the Public Health Laboratory, the New York State Wadsworth Center and the Public Health Research Institute. In addition, the authors thank Lisa Trieu for assistance with data analysis and the field and clinic staff who conduct contact investigations at the New York City Department of Health and Mental Hygiene Bureau of Tuberculosis Control.
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
DECLARATION OF INTEREST
None.